Implementation of the classic Dynamic Programming problem using the Needleman–Wunsch algorithm which requires quadratic space & time complexity.
Given 2 sequences, find the minimum cost of aligning the 2 sequences
Gaps can be inserted to 1 sequence or the other, but not at the same time
2 = Gap Penalty (δ)
If 2 characters are aligned with each other, there may be a mismatch penalty (αi j)
0= aligning identical letters1= aligning a vowel with a vowel, or a consonant with a consonant3= aligning a vowel with a consonant
Minimum cost = sum of mismatch & gap penalties (the optimal alignment)
O(M*N)
Storage: also O(M*N) which requires a large 2D array
- Letters only (no spaces, numbers or special characters)
_(Underscore character) is reserved to represent- View results in a fixed-width font for the 2 sequences to be lined up
- Leading & trailing whitespace is trimmed
- Uppercase / Lowercase is ignored (all converted to lowercase)
predecessorIndexesis calculated when creatingmemoTablebut not used to find the actual alignment, only to show where the values inmemoTablecame from- For example: if
predecessorIndexes[4][4]contained the array[4, 3], it means the value ofmemoTable[4][4](which is6) came frommemoTable[4][3](i.e.case3,seq2with a gap so it came from the left) predecessorIndexes[0][0]contains the array[-1, -1]because the upper left corner has no predecessor, it was the initial base case
- Provide 2 strings (edit
testSequences2D array) - Optionally change the
GAP_PENALTYand the mismatch penalties
CurrentlyVOWEL_VOWEL_PENALTYandCONSONANT_CONSONANT_PENALTYare the same, but are arbitrary
findAlignment()should only be called from insidecalcOptimalAlignment()because it usesmemoTable& the 2 sequences to display the actual alignmentseq1&seq2have a leading space added (this is after trimming any trailing/leading whitespace)
This is so thatseq1.charAt(i)&seq2.charAt(j)work in the loops & the string indexes match up
It causes some adjustments for the array sizes.memoTable = new int[seq1.length()][seq2.length()]uses the exact value oflength()which includes the space- Loops like
for(int i=0; i<seq1.length(); i++)use<seq1.length()to not go out of bounds of the string indexes - In
findAlignment(),int i = seq1.length()-1;&int j = seq2.length()-1;since the 2 sequences have leading spaces & arrays are indexes from0 findAlignment()basically retraces each calculation in thememoTableto see where it came from
There may be multiple optimal paths, but this only finds1of them
- String Alignment using Dynamic Programming - Gina M. Cannarozzi to retrace the memo table & find the alignment


