Smith–Waterman algorithm

The Smith–Waterman algorithm is a dynamic programming algorithm used for local sequence alignment. It identifies similar regions between two sequences by comparing segments and aligning them to optimize a similarity score. This algorithm is widely used in bioinformatics for comparing DNA, RNA, or protein sequences.

Problem Description

Given two sequences, ${\displaystyle A=a_{1},a_{2},...,a_{m}}$ and ${\displaystyle B=b_{1},b_{2},...,b_{n}}$ , the goal is to find the optimal local alignment between these sequences. A local alignment aligns a substring of ${\displaystyle A}$ with a substring of ${\displaystyle B}$ to maximize a similarity score.

Scoring System

The algorithm uses the following scoring system:

${\displaystyle +s}$ for a match between two characters.
${\displaystyle -d}$ for a mismatch (substitution).
${\displaystyle -g}$ for a gap (insertion or deletion).

Dynamic Programming Matrix

The algorithm constructs a dynamic programming matrix ${\displaystyle H}$ of dimensions ${\displaystyle (m+1)\times (n+1)}$ , where ${\displaystyle H(i,j)}$ represents the optimal alignment score between substrings ${\displaystyle A[1..i]}$ and ${\displaystyle B[1..j]}$ . The matrix is initialized as:

${\displaystyle H(i,0)=0\quad {\text{for all }}i\in [0,m]}$ ${\displaystyle H(0,j)=0\quad {\text{for all }}j\in [0,n]}$

The recurrence relation used to compute the score is:

${\displaystyle H(i,j)=\max {\begin{cases}0,\\H(i-1,j-1)+{\text{score}}(a_{i},b_{j}),\\H(i-1,j)-g,\\H(i,j-1)-g\end{cases}}}$

Here, ${\displaystyle {\text{score}}(a_{i},b_{j})}$ is ${\displaystyle +s}$ if ${\displaystyle a_{i}=b_{j}}$ (match) and ${\displaystyle -d}$ otherwise (mismatch).

Traceback

To determine the optimal local alignment, traceback is performed starting from the cell in ${\displaystyle H}$ with the maximum value. The traceback continues until a cell with a value of ${\displaystyle 0}$ is reached, indicating the end of the local alignment.

Output

The output of the algorithm includes:

The highest alignment score.
The aligned substrings of ${\displaystyle A}$ and ${\displaystyle B}$ .
The alignment path in the dynamic programming matrix.

Example

Consider the two sequences A = "ACACACTA" and B = "AGCACACA". Using a match score of ${\displaystyle +2}$ , mismatch penalty of ${\displaystyle -1}$ , and gap penalty of ${\displaystyle -2}$ , the Smith–Waterman algorithm identifies the optimal local alignment with a score of ${\displaystyle 14}$ , aligning the substrings "ACACACT" from A and "GCACACA" from ${\displaystyle B}$ .

FreeBASIC

Translation of: JavaScript

' Make the BLOSUM matrix into a class for easier retrieval Type BLOSUM62  scoring_matrix As String Ptr  matrix_size As Integer    Declare Constructor()  Declare Destructor()  Declare Function getItem(aa1 As String, aa2 As String) As Integer  Declare Sub loadMatrix() End Type Constructor BLOSUM62()  matrix_size = 0  scoring_matrix = 0  loadMatrix() End Constructor Destructor BLOSUM62()  If scoring_matrix <> 0 Then Deallocate scoring_matrix End Destructor Sub BLOSUM62.loadMatrix()  Dim As Integer ff = Freefile()  Dim As String line_text, key  Dim As String parts(100)  Dim As Integer part_cnt, score    If Open("BLOSUM62.txt" For Input As #ff) <> 0 Then  Print "Error: Could not open BLOSUM62.txt"  Return  End If    While Not Eof(ff)  Line Input #ff, line_text  line_text = Trim(line_text)  If Len(line_text) > 0 Then matrix_size += 1  Wend   Close #ff    scoring_matrix = Allocate(matrix_size * Sizeof(String))    Open "BLOSUM62.txt" For Input As #ff  Dim As Integer idx = 0    While Not Eof(ff) And idx < matrix_size  Line Input #ff, line_text  line_text = Trim(line_text)    If Len(line_text) > 0 Then  part_cnt = 0  Dim As Integer posic = 1, next_pos    Do  next_pos = Instr(posic, line_text, " ")  If next_pos = 0 Then next_pos = Instr(posic, line_text, Chr(9))    If next_pos > 0 Then  parts(part_cnt) = Trim(Mid(line_text, posic, next_pos - posic))  posic = next_pos + 1  ' Saltar espacios adicionales  While posic <= Len(line_text) And (Mid(line_text, posic, 1) = " " Or Mid(line_text, posic, 1) = Chr(9))  posic += 1  Wend  Else  parts(part_cnt) = Trim(Mid(line_text, posic))  Exit Do  End If  part_cnt += 1  Loop While posic <= Len(line_text) And part_cnt < 100    If part_cnt >= 2 Then  key = parts(0) & "," & parts(1)  scoring_matrix[idx] = key & ":" & parts(2)  idx += 1  End If  End If  Wend    Close #ff  matrix_size = idx End Sub Function BLOSUM62.getItem(aa1 As String, aa2 As String) As Integer  Dim As String search_key1 = aa1 & "," & aa2  Dim As String search_key2 = aa2 & "," & aa1  Dim As Integer i, colon_pos    For i = 0 To matrix_size - 1  colon_pos = Instr(scoring_matrix[i], ":")  If colon_pos > 0 Then  Dim As String key = Left(scoring_matrix[i], colon_pos - 1)  If key = search_key1 Or key = search_key2 Then  Return Val(Mid(scoring_matrix[i], colon_pos + 1))  End If  End If  Next    Return 0 End Function Function max_of_array(arr() As Integer, cnt As Integer) As Integer  Dim As Integer max_val = arr(0)  For i As Integer = 1 To cnt - 1  If arr(i) > max_val Then max_val = arr(i)  Next  Return max_val End Function Function index_of_max(arr() As Integer, cnt As Integer, value As Integer) As Integer  For i As Integer = 0 To cnt - 1  If arr(i) = value Then Return i  Next  Return 0 End Function Sub local_alignment(v As String, w As String, scoring_matrix As BLOSUM62, gap_start As Integer, gap_extend As Integer, Byref result_score As String, Byref v_aligned As String, Byref w_aligned As String)  Dim As Integer v_len = Len(v), w_len = Len(w), i, j    ' Initialize the 3 matrices  Dim As Integer Ptr Ptr M = Allocate((v_len + 1) * Sizeof(Integer Ptr))  Dim As Integer Ptr Ptr X = Allocate((v_len + 1) * Sizeof(Integer Ptr))  Dim As Integer Ptr Ptr Y = Allocate((v_len + 1) * Sizeof(Integer Ptr))  Dim As Integer Ptr Ptr back_track = Allocate((v_len + 1) * Sizeof(Integer Ptr))    For i = 0 To v_len  M[i] = Allocate((w_len + 1) * Sizeof(Integer))  X[i] = Allocate((w_len + 1) * Sizeof(Integer))  Y[i] = Allocate((w_len + 1) * Sizeof(Integer))  back_track[i] = Allocate((w_len + 1) * Sizeof(Integer))    For j = 0 To w_len  M[i][j] = 0  X[i][j] = 0  Y[i][j] = 0  back_track[i][j] = 0  Next  Next    ' Initialize the maximum scores  Dim As Integer max_score = -1  Dim As Integer max_i = 0, max_j = 0    ' Populate all three matrices  For i = 1 To v_len  For j = 1 To w_len  Dim As Integer y_val1 = Y[i-1][j] - gap_extend  Dim As Integer y_val2 = M[i-1][j] - gap_start  Y[i][j] = Iif(y_val1 > y_val2, y_val1, y_val2)    Dim As Integer x_val1 = X[i][j-1] - gap_extend  Dim As Integer x_val2 = M[i][j-1] - gap_start  X[i][j] = Iif(x_val1 > x_val2, x_val1, x_val2)    Dim As Integer cur_scores(3)  cur_scores(0) = Y[i][j]  cur_scores(1) = M[i-1][j-1] + scoring_matrix.getItem(Mid(v, i, 1), Mid(w, j, 1))  cur_scores(2) = X[i][j]  cur_scores(3) = 0    M[i][j] = max_of_array(cur_scores(), 4)  back_track[i][j] = index_of_max(cur_scores(), 4, M[i][j])    If M[i][j] > max_score Then  max_score = M[i][j]  max_i = i  max_j = j  End If  Next  Next    Print "Finished making the matrix"    ' Initialize the indices to start at the position of the high score  i = max_i  j = max_j    ' Initialize the aligned strings as the input strings up to the position of the high score  v_aligned = Left(v, i)  w_aligned = Left(w, j)    ' Backtrack to start of the local alignment starting at the highest scoring cell  While back_track[i][j] <> 3 And i * j <> 0 And i >= j  If back_track[i][j] = 0 Then  i -= 1  Elseif back_track[i][j] = 1 Then  i -= 1  j -= 1  Elseif back_track[i][j] = 2 Then  j -= 1  End If  Wend    Print "Finished backtracking"    ' Cut the strings at the ending point of the backtrack  v_aligned = Mid(v_aligned, i + 1)  w_aligned = Mid(w_aligned, j + 1)    result_score = Str(max_score)    ' Free memory  For i = 0 To v_len  Deallocate M[i]  Deallocate X[i]  Deallocate Y[i]  Deallocate back_track[i]  Next  Deallocate M  Deallocate X  Deallocate Y  Deallocate back_track End Sub ' Read the input file Dim As Integer ff = Freefile() Dim As String line_text, word1, word2 Dim As String chunk = "" Dim As Integer linenum = 0 Dim As Boolean reading_first = True If Open("rosalind_laff.txt" For Input As #ff) <> 0 Then  Print "Error: Could not open rosalind_laff.txt"  Sleep: End End If While Not Eof(ff)  Line Input #ff, line_text  line_text = Trim(line_text)  linenum += 1    If Len(line_text) = 0 Then Continue While    If Left(line_text, 1) = ">" Then  If linenum = 1 Then  chunk = ""  Elseif reading_first Then  word1 = chunk  chunk = ""  reading_first = False  End If  Else  chunk &= line_text  End If Wend word2 = chunk Close #ff Print "Sequence 1 length: " & Len(word1) Print "Sequence 2 length: " & Len(word2) ' Get the local alignment (given sigma = 11, epsilon = 1 in problem statement) Dim As BLOSUM62 blosum Dim As String result_score, v_aligned, w_aligned local_alignment(word1, word2, blosum, 11, 1, result_score, v_aligned, w_aligned) ' Print and save the answer Print result_score Print v_aligned Print w_aligned ' Save results ff = Freefile() Open "out_localali_test.txt" For Output As #ff Print #ff, result_score Print #ff, v_aligned Print #ff, w_aligned Close #ff Print "Results saved to out_localali_test.txt" Sleep

Go

Works with: Go version 1.10.2

Translation of: JavaScript

package main import ( "bufio" "fmt" "math" "os" "strconv" "strings" ) // BLOSUM62 represents the BLOSUM62 scoring matrix type BLOSUM62 struct { scoringMatrix map[string]int } // NewBLOSUM62 creates a new BLOSUM62 matrix from a file func NewBLOSUM62() (*BLOSUM62, error) { file, err := os.Open("BLOSUM62.txt") if err != nil { return nil, err } defer file.Close() blosum := &BLOSUM62{ scoringMatrix: make(map[string]int), } scanner := bufio.NewScanner(file) for scanner.Scan() { line := scanner.Text() entries := strings.Fields(line) if len(entries) >= 3 { key := entries[0] + "," + entries[1] value, err := strconv.Atoi(entries[2]) if err != nil { return nil, err } blosum.scoringMatrix[key] = value } } if err := scanner.Err(); err != nil { return nil, err } return blosum, nil } // GetScore returns the score for a pair of amino acids func (b *BLOSUM62) GetScore(a, c byte) int { key := string(a) + "," + string(c) return b.scoringMatrix[key] } // LocalAlignment performs local alignment of two sequences func LocalAlignment(v, w string, scoringMatrix *BLOSUM62, gapStart, gapExtend int) (string, string, string) { // Initialize the 3 matrices M := make([][]int, len(v)+1) X := make([][]int, len(v)+1) Y := make([][]int, len(v)+1) backTrack := make([][]int, len(v)+1) for i := range M { M[i] = make([]int, len(w)+1) X[i] = make([]int, len(w)+1) Y[i] = make([]int, len(w)+1) backTrack[i] = make([]int, len(w)+1) } // Initialize the maximum scores maxScore := -1 maxI, maxJ := 0, 0 // Populate all three matrices for i := 1; i <= len(v); i++ { for j := 1; j <= len(w); j++ { Y[i][j] = max(Y[i-1][j]-gapExtend, M[i-1][j]-gapStart) X[i][j] = max(X[i][j-1]-gapExtend, M[i][j-1]-gapStart) curScores := []int{ Y[i][j], M[i-1][j-1] + scoringMatrix.GetScore(v[i-1], w[j-1]), X[i][j], 0, } M[i][j] = maxSlice(curScores) backTrack[i][j] = indexOf(curScores, M[i][j]) if M[i][j] > maxScore { maxScore = M[i][j] maxI, maxJ = i, j } } } fmt.Println("Finished making the matrix") // Initialize the indices to start at the position of the high score i, j := maxI, maxJ // Initialize the aligned strings as the input strings up to the position of the high score vAligned, wAligned := v[:i], w[:j] // Backtrack to start of the local alignment starting at the highest scoring cell for backTrack[i][j] != 3 && i*j != 0 && i >= j { if backTrack[i][j] == 0 { i-- } else if backTrack[i][j] == 1 { i-- j-- } else if backTrack[i][j] == 2 { j-- } } fmt.Println("finished backtracking") // Cut the strings at the ending point of the backtrack vAligned = vAligned[i:] wAligned = wAligned[j:] return strconv.Itoa(maxScore), vAligned, wAligned } func max(a, b int) int { if a > b { return a } return b } func maxSlice(slice []int) int { if len(slice) == 0 { return math.MinInt32 } max := slice[0] for _, value := range slice { if value > max { max = value } } return max } func indexOf(slice []int, item int) int { for i, v := range slice { if v == item { return i } } return -1 } func main() { file, err := os.Open("rosalind_laff.txt") if err != nil { fmt.Println("Error opening file:", err) return } defer file.Close() var indata []string scanner := bufio.NewScanner(file) for scanner.Scan() { indata = append(indata, scanner.Text()) } indata = append(indata, ">") word1 := "" word2 := "" linenum := 0 var chunk []string for _, line := range indata { line = strings.TrimSpace(line) linenum++ if line == "" { continue } else { if line[0] == '>' { if linenum == 1 { chunk = nil } else if linenum > 1 && linenum != len(indata) { word1 = strings.Join(chunk, "") chunk = nil } else { word2 = strings.Join(chunk, "") } } else { chunk = append(chunk, line) } } } // Get the BLOSUM62 matrix blosum, err := NewBLOSUM62() if err != nil { fmt.Println("Error loading BLOSUM62 matrix:", err) return } // Get the local alignment (given sigma = 11, epsilon = 1 in problem statement) maxScore, vAligned, wAligned := LocalAlignment(word1, word2, blosum, 11, 1) // Print the results fmt.Println(maxScore) fmt.Println(vAligned) fmt.Println(wAligned) // Save the results to a file outFile, err := os.Create("out_localali_test.txt") if err != nil { fmt.Println("Error creating output file:", err) return } defer outFile.Close() _, err = fmt.Fprintf(outFile, "%s\n%s\n%s", maxScore, vAligned, wAligned) if err != nil { fmt.Println("Error writing to output file:", err) } }

JavaScript

Works with: NodeJS 16.14.2

// Ported from https://codereview.stackexchange.com/questions/187588 // Make the BLOSUM matrix into a class for easier retrieval class BLOSUM62 {  constructor() {  this.scoring_matrix = {};  const fs = require('fs');  const entries = fs.readFileSync('BLOSUM62.txt', 'utf8')  .split('\n')  .map(line => line.trim().split(/\s+/));    for (const entry of entries) {  if (entry.length >= 3) {  this.scoring_matrix[[entry[0], entry[1]]] = parseInt(entry[2]);  }  }  }  getItem(pair) {  return this.scoring_matrix[[pair[0], pair[1]]];  } } function local_alignment(v, w, scoring_matrix, gap_start, gap_extend) {  // Initialize the 3 matrices  const M = Array(v.length + 1).fill().map(() => Array(w.length + 1).fill(0));  const X = Array(v.length + 1).fill().map(() => Array(w.length + 1).fill(0));  const Y = Array(v.length + 1).fill().map(() => Array(w.length + 1).fill(0));  const back_track = Array(v.length + 1).fill().map(() => Array(w.length + 1).fill(0));    // Initialize the maximum scores  let max_score = -1;  let max_i = 0, max_j = 0;    // Populate all three matrices  for (let i = 1; i <= v.length; i++) {  for (let j = 1; j <= w.length; j++) {  Y[i][j] = Math.max(Y[i-1][j] - gap_extend, M[i-1][j] - gap_start);  X[i][j] = Math.max(X[i][j-1] - gap_extend, M[i][j-1] - gap_start);    const cur_scores = [  Y[i][j],   M[i-1][j-1] + scoring_matrix.getItem([v[i-1], w[j-1]]),   X[i][j],   0  ];    M[i][j] = Math.max(...cur_scores);  back_track[i][j] = cur_scores.indexOf(M[i][j]);    if (M[i][j] > max_score) {  max_score = M[i][j];  max_i = i;  max_j = j;  }  }  }    console.log('Finished making the matrix');    // Initialize the indices to start at the position of the high score  let i = max_i, j = max_j;    // Initialize the aligned strings as the input strings up to the position of the high score  let v_aligned = v.substring(0, i);  let w_aligned = w.substring(0, j);    // Backtrack to start of the local alignment starting at the highest scoring cell  while (back_track[i][j] !== 3 && i * j !== 0 && i >= j) {  if (back_track[i][j] === 0) {  i -= 1;  } else if (back_track[i][j] === 1) {  i -= 1;  j -= 1;  } else if (back_track[i][j] === 2) {  j -= 1;  }  }    console.log('finished backtracking');    // Cut the strings at the ending point of the backtrack  v_aligned = v_aligned.substring(i);  w_aligned = w_aligned.substring(j);    return [max_score.toString(), v_aligned, w_aligned]; } // Read the input file const fs = require('fs'); let indata = fs.readFileSync('rosalind_laff.txt', 'utf8').split('\n'); indata.push('>'); let word1 = ''; let word2 = ''; let linenum = 0; let chunk = []; for (const line of indata) {  const trimmedLine = line.trim();  linenum = linenum + 1;    if (trimmedLine === '') {  continue;  } else {  if (trimmedLine[0] === '>') {  if (linenum === 1) {  chunk = [];  } else if (linenum > 1 && linenum !== indata.length) {  word1 = chunk.join('');  chunk = [];  } else {  word2 = chunk.join('');  }  } else {  chunk.push(trimmedLine);  }  } } // Get the local alignment (given sigma = 11, epsilon = 1 in problem statement) const blosum = new BLOSUM62(); const alignment = local_alignment(word1, word2, blosum, 11, 1); // Print and save the answer console.log(alignment.join('\n')); fs.writeFileSync('out_localali_test.txt', alignment.toString());

Julia

From the same source code as the second Wren examples.

# Smith-Waterman implementation in Julia # see Java code at https://github.com/grace-raper/smith-waterman/blob/main/SmithWaterman.java import Random.shuffle! # Map of amino acids to corresponding index in the BLOSUM62 scoring matrix const AMINO_TO_INDEX = Dict{Char, Int}(  'A' => 1, 'R' => 2, 'N' => 3, 'D' => 4, 'C' => 5,  'Q' => 6, 'E' => 7, 'G' => 8, 'H' => 9, 'I' => 10,  'L' => 11, 'K' => 12, 'M' => 13, 'F' => 14, 'P' => 15,  'S' => 16, 'T' => 17, 'W' => 18, 'Y' => 19, 'V' => 20,  'a' => 1, 'r' => 2, 'n' => 3, 'd' => 4, 'c' => 5,  'q' => 6, 'e' => 7, 'g' => 8, 'h' => 9, 'i' => 10,  'l' => 11, 'k' => 12, 'm' => 13, 'f' => 14, 'p' => 15,  's' => 16, 't' => 17, 'w' => 18, 'y' => 19, 'v' => 20, ) # BLOSUM62 scoring matrix const BLOSUM62 = [  [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],  [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],  [-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3],  [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],  [0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],  [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],  [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],  [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],  [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],  [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],  [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],  [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2],  [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1, -1, 1],  [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],  [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],  [1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3, -2, -2],  [0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2, -2, 0],  [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3],  [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],  [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4], ] # Linear gap cost const GAP_COST = -4 mutable struct SmithWaterman  s1id::String  s2id::String  s1::String  s2::String  n::Int  score::Matrix{Int}  max_val::Int  max_i::Int  max_j::Int  function SmithWaterman(s1id::String, s2id::String, s1::String, s2::String, n::Int, gap_cost = GAP_COST)  sw = new(s1id, s2id, s1, s2, n, zeros(Int, length(s1) + 1, length(s2) + 1), 0, 0, 0)  fill_score!(sw, gap_cost)  optimize_alignment!(sw)  return sw  end end function fill_score!(sw::SmithWaterman, gap_cost)  # Initialize first row and column with zeros  sw.score[:, 1] .= 0  sw.score[1, :] .= 0  # Fill the scoring matrix  for i in 2:size(sw.score, 1)  for j in 2:size(sw.score, 2)  options = [  sw.score[i-1, j-1] + BLOSUM62[AMINO_TO_INDEX[sw.s1[i-1]]][AMINO_TO_INDEX[sw.s2[j-1]]],  sw.score[i-1, j] + gap_cost,  sw.score[i, j-1] + gap_cost,  0,  ]  sw.score[i, j] = maximum(options)  end  end end function optimize_alignment!(sw::SmithWaterman)  sw.max_val = 0  sw.max_i = 0  sw.max_j = 0  # Find the maximum score and its position  for i in 2:length(sw.s1)  for j in 2:length(sw.s2)  if sw.score[i, j] > sw.max_val  sw.max_val = sw.score[i, j]  sw.max_i = i  sw.max_j = j  end  end  end end # Return the score function score(sw::SmithWaterman)  return sw.max_val end # Compute p-value based on n iterations function pval(sw::SmithWaterman)  better = 0  for _ in 1:sw.n  # Shuffle s2  s2chars = collect(sw.s2)  shuffle!(s2chars)  s2random = join(s2chars)  sm = SmithWaterman("", "", sw.s1, s2random, 0)  if score(sm) >= score(sw)  better += 1  end  end  return (better + 1) / (sw.n + 1) end # Print the alignment function print_alignment(sw::SmithWaterman, gap_cost = GAP_COST)  i = sw.max_i  j = sw.max_j  s1print = ""  compare = ""  s2print = ""  # Begin by adding the last amino acid to the sequence  s1print = sw.s1[i] * s1print  s2print = sw.s2[j] * s2print  # Compare the match  if sw.s1[i] == sw.s2[j]  compare = sw.s1[i] * compare  elseif BLOSUM62[AMINO_TO_INDEX[sw.s1[i]]][AMINO_TO_INDEX[sw.s2[j]]] > 0  compare = "+" * compare  else  compare = " " * compare  end  # Backtrack to construct alignment  while sw.score[i, j] > 0  if sw.score[i, j] - gap_cost == sw.score[i-1, j]  i -= 1  s1print = sw.s1[i] * s1print  s2print = "-" * s2print  compare = " " * compare  elseif sw.score[i, j] - gap_cost == sw.score[i, j-1]  j -= 1  s1print = "-" * s1print  s2print = sw.s2[j] * s2print  compare = " " * compare  else  i -= 1  j -= 1  s1print = sw.s1[i] * s1print  s2print = sw.s2[j] * s2print  if sw.s1[i] == sw.s2[j]  compare = sw.s1[i] * compare  elseif BLOSUM62[AMINO_TO_INDEX[sw.s1[i]]][AMINO_TO_INDEX[sw.s2[j]]] > 0  compare = "+" * compare  else  compare = " " * compare  end  end  end  # Print basics  println("COMPARISON OF $(sw.s1id) AND $(sw.s2id)\n")  println("Score: $(sw.max_val)\n")  println("Alignment:")  # Print sequence alignment, 60 characters per line  a = "$(sw.s1id):\t$(i - 1)\t"  b = "\t\t\t"  c = "$(sw.s2id):\t$(j - 1)\t"  for k in eachindex(s1print)  if k > 1 && (k - 1) % 60 == 0  println(a)  println(b)  println(c)  println()  a = "$(sw.s1id):\t$i\t"  b = "\t\t\t"  c = "$(sw.s2id):\t$j\t"  end  a *= s1print[k]  b *= compare[k]  c *= s2print[k]  if s1print[k] != '-'  i += 1  end  if s2print[k] != '-'  j += 1  end  end  println(a)  println(b)  println(c)  # Print score matrix if both strings are less than 15 characters  if length(sw.s1) < 15 && length(sw.s2) < 15  println("\nScore Matrix:")  for l in 1:size(sw.score, 1)  println(sw.score[l, :])  end  end  # Print p-value if n > 0  if sw.n > 0  println("\np-value: $(pval(sw))")  end end const ACC_TO_SEQ = Dict(  "O95363" =>  "MVGSALRRGAHAYVYLVSKASHISRGHQHQAWGSRPPAAECATQRAPGSVVELLGKSYPQ" *  "DDHSNLTRKVLTRVGRNLHNQQHHPLWLIKERVKEHFYKQYVGRFGTPLFSVYDNLSPVV" *  "TTWQNFDSLLIPADHPSRKKGDNYYLNRTHMLRAHTSAHQWDLLHAGLDAFLVVGDVYRR" *  "DQIDSQHYPIFHQLEAVRLFSKHELFAGIKDGESLQLFEQSSRSAHKQETHTMEAVKLVE" *  "FDLKQTLTRLMAHLFGDELEIRWVDCYFPFTHPSFEMEINFHGEWLEVLGCGVMEQQLVN" *  "SAGAQDRIGWAFGLGLERLAMILYDIPDIRLFWCEDERFLKQFCVSNINQKVKFQPLSKY" *  "PAVINDISFWLPSENYAENDFYDLVRTIGGDLVEKVDLIDKFVHPKTHKTSHCYRITYRH" *  "MERTLSQREVRHIHQALQEAAVQLLGVEGRF",  "Q10574" =>  "MSWEQYQMYVPQCHPSFMYQGSIQSTMTTPLQSPNFSLDSPNYPDSLSNGGGKDDKKKCR" *  "RYKTPSPQLLRMRRSAANERERRRMNTLNVAYDELREVLPEIDSGKKLSKFETLQMAQKY" *  "IECLSQILKQDSKNENLKSKSG",  "P22816" =>  "MTKYNSGSSEMPAAQTIKQEYHNGYGQPTHPGYGFSAYSQQNPIAHPGQNPHQTLQNFFS" *  "RFNAVGDASAGNGGAASISANGSGSSCNYSHANHHPAELDKPLGMNMTPSPIYTTDYDDE" *  "NSSLSSEEHVLAPLVCSSAQSSRPCLTWACKACKKKSVTVDRRKAATMRERRRLRKVNEA" *  "FEILKRRTSSNPNQRLPKVEILRNAIEYIESLEDLLQESSTTRDGDNLAPSLSGKSCQSD" *  "YLSSYAGAYLEDKLSFYNKHMEKYGQFTDFDGNANGSSLDCLNLIVQSINKSTTSPIQNK" *  "ATPSASDTQSPPSSGATAPTSLHVNFKRKCST",  "Q8IU24" =>  "MEFVELSSCRFDATPTFCDRPAAPNATVLPGEHFPVPNGSYEDQGDGHVLAPGPSFHGPG" *  "RCLLWACKACKKKTVPIDRRKAATMRERRRLVKVNEAFDILKKKSCANPNQRLPKVEILR" *  "NAISYIEQLHKLLRDSKENSSGEVSDTSAPSPGSCSDGMAAHSPHSFCTDTSGNSSWEQG" *  "DGQPGNGYENQSCGNTVSSLDCLSLIVQSISTIEGEENNNASNTPR",  "Q90477" =>  "MELSDIPFPIPSADDFYDDPCFNTNDMHFFEDLDPRLVHVSLLKPDEHHHIEDEHVRAPS" *  "GHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNDAFETLKRCTSTNPNQRLP" *  "KVEILRNAISYIESLQALLRSQEDNYYPVLEHYSGDSDASSPRSNCSDGMMDFMGPTCQT" *  "RRRNSYDSSYFNDTPNADARNNKNSVVSSLDCLSSIVERISTETPACPVLSVPEGHEESP" *  "CSPHEGSVLSDTGTTAPSPTSCPQQQAQETIYQVL",  "P13904" =>  "MELLPPPLRDMEVTEGSLCAFPTPDDFYDDPCFNTSDMSFFEDLDPRLVHVTLLKPEEPH" *  "HNEDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAFETLKR" *  "YTSTNPNQRLPKVEILRNAIRYIESLQALLHDQDEAFYPVLEHYSGDSDASSPRSNCSDG" *  "MMDYNSPPCGSRRRNSYDSSFYSDSPNDSRLGKSSVISSLDCLSSIVERISTQSPSCPVP" *  "TAVDSGSEGSPCSPLQGETLSERVITIPSPSNTCTQLSQDPSSTIYHVL",  "P16075" =>  "MDLLGPMEMTEGSLCSFTAADDFYDDPCFNTSDMHFFEDLDPRLVHVGGLLKPEEHPHTR" *  "APPREPTEEEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAF" *  "ETLKRCTSTNPNQRLPKVEILRNAIRYIESLQALLREQEDAYYPVLEHYSGESDASSPRS" *  "NCSDGMMEYSGPPCSSRRRNSYDSSYYTESPNDPKHGKSSVVSSLDCLSSIVERISTDNS" *  "TCPILPPAEAVAEGSPCSPQEGGNLSDSGAQIPSPTNCTPLPQESSSSSSSNPIYQVL",  "P10085" =>  "MELLSPPLRDIDLTGPDGSLCSFETADDFYDDPCFDSPDLRFFEDLDPRLVHMGALLKPE" *  "EHAHFPTAVHPGPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR" *  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAFYAP" *  "GPLPPGRGSEHYSGDSDASSPRSNCSDGMMDYSGPPSGPRRQNGYDTAYYSEAARESRPG" *  "KSAAVSSLDCLSSIVERISTDSPAAPALLLADAPPESPPGPPEGASLSDTEQGTQTPSPD" *  "AAPQCPAGSNPNAIYQVL",  "P17542" =>  "MTERPPSEAARSDPQLEGRDAAEASMAPPHLVLLNGVAKETSRAAAAEPPVIELGARGGP" *  "GGGPAGGGGAARDLKGRDAATAEARHRVPTTELCRPPGPAPAPAPASVTAELPGDGRMVQ" *  "LSPPALAAPAAPGRALLYSLSQPLASLGSGFFGEPDAFPMFTTNNRVKRRPSPYEMEITD" *  "GPHTKVVRRIFTNSRERWRQQNVNGAFAELRKLIPTHPPDKKLSKNEILRLAMKYINFLA" *  "KLLNDQEEEGTQRAKTGKDPVVGAGGGGGGGGGGAPPDDLLQDVLSPNSSCGSSLDGAAS" *  "PDSYTEEPAPKHTARSLHPAMLPAADGAGPR",  "P15172" =>  "MELLSPPLRDVDLTAPDGSLCSFATTDDFYDDPCFDSPDLRFFEDLDPRLMHVGALLKPE" *  "EHSHFPAAVHPAPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR" *  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAAFYA" *  "PGPLPPGRGGEHYSGDSDASSPRSNCSDGMMDYSGPPSGARRRNCYEGAYYNEAPSEPRP" *  "GKSAAVSSLDCLSSIVERISTESPAAPALLLADVPSESPPRRQEAAAPSEGESSGDPTQS" *  "PDAAPQCPAGANPNPIYQVL", ) const ACCESSIONS = ["P15172", "P17542", "P10085", "P16075", "P13904",  "Q90477", "Q8IU24", "P22816", "Q10574", "O95363"] # Part 1 of the test cases. const sw = SmithWaterman("str001", "str002", "deadly", "ddgearlyk", 999) print_alignment(sw) println("\nScore: ", score(sw)) println("P-value: ", pval(sw)) # part 2 test cases are too verbose, skipped here # Part 3 of the test cases. const c1 = SmithWaterman("P15172", "Q10574", ACC_TO_SEQ["P15172"], ACC_TO_SEQ["Q10574"], 999) const c2 = SmithWaterman("P15172", "Q10574", ACC_TO_SEQ["P15172"], ACC_TO_SEQ["O95363"], 999) println("p-value for P15172 versus Q10574: $(pval(c1))") println("p-value for P15172 versus O95363: $(pval(c2))")

Output:

COMPARISON OF str001 AND str002 Score: 13 Alignment: str001: 0 d-eadly d ea ly str002: 1 dgearly Score Matrix: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [0, 6, 6, 2, 2, 0, 0, 0, 0, 0] [0, 2, 8, 4, 7, 3, 0, 0, 0, 1] [0, 0, 4, 8, 4, 11, 7, 3, 0, 0] [0, 6, 6, 4, 10, 7, 9, 5, 1, 0] [0, 2, 2, 2, 6, 9, 5, 13, 9, 5] [0, 0, 0, 0, 2, 5, 7, 9, 20, 16] p-value: 0.206 Score: 13 P-value: 0.222 p-value for P15172 versus Q10574: 0.001 p-value for P15172 versus O95363: 0.544

Phix

Translation of: Julia – (and Wren v2)

constant BLOSUM62 = { -- j,o,u entries are "fake", intended to trigger a typecheck if actually referenced: -- A B C D E F G H I j K L M N o P Q R S T u V W X Y Z   { 4,-2, 0,-2,-1,-2, 0,-2,-1,"",-1,-1,-1,-2,"",-1,-1,-1, 1, 0,"", 0,-3,-1,-2,-1}, -- A  {-2, 6,-3, 6, 2,-3,-1,-1,-3,"",-1,-4,-3, 1,"",-1, 0,-2, 0,-1,"",-3,-4,-1,-3, 2}, -- B  { 0,-3, 9,-3,-4,-2,-3,-3,-1,"",-3,-1,-1,-3,"",-3,-3,-3,-1,-1,"",-1,-2,-1,-2,-4}, -- C  {-2, 6,-3, 6, 2,-3,-1,-1,-3,"",-1,-4,-3, 1,"",-1, 0,-2, 0,-1,"",-3,-4,-1,-3, 2}, -- D  {-1, 2,-4, 2, 5,-3,-2, 0,-3,"", 1,-3,-2, 0,"",-1, 2, 0, 0,-1,"",-2,-3,-1,-2, 5}, -- E  {-2,-3,-2,-3,-3, 6,-3,-1, 0,"",-3, 0, 0,-3,"",-4,-3,-3,-2,-2,"",-1, 1,-1, 3,-3}, -- F  { 0,-1,-3,-1,-2,-3, 6,-2,-4,"",-2,-4,-3, 0,"",-2,-2,-2, 0,-2,"",-3,-2,-1,-3,-2}, -- G  {-2,-1,-3,-1, 0,-1,-2, 8,-3,"",-1,-3,-2, 1,"",-2, 0, 0,-1,-2,"",-3,-2,-1, 2, 0}, -- H  {-1,-3,-1,-3,-3, 0,-4,-3, 4,"",-3, 2, 1,-3,"",-3,-3,-3,-2,-1,"", 3,-3,-1,-1,-3}, -- I  {"","","","","","","","","","","","","","","","","","","","","","","","","",""}, -- j  {-1,-1,-3,-1, 1,-3,-2,-1,-3,"", 5,-2,-1, 0,"",-1, 1, 2, 0,-1,"",-2,-3,-1,-2, 1}, -- K  {-1,-4,-1,-4,-3, 0,-4,-3, 2,"",-2, 4, 2,-3,"",-3,-2,-2,-2,-1,"", 1,-2,-1,-1,-3}, -- L  {-1,-3,-1,-3,-2, 0,-3,-2, 1,"",-1, 2, 5,-2,"",-2, 0,-1,-1,-1,"", 1,-1,-1,-1,-2}, -- M  {-2, 1,-3, 1, 0,-3, 0, 1,-3,"", 0,-3,-2, 6,"",-2, 0, 0, 1, 0,"",-3,-4,-1,-2, 0}, -- N  {"","","","","","","","","","","","","","","","","","","","","","","","","",""}, -- o  {-1,-1,-3,-1,-1,-4,-2,-2,-3,"",-1,-3,-2,-2,"", 7,-1,-2,-1,-1,"",-2,-4,-1,-3,-1}, -- P  {-1, 0,-3, 0, 2,-3,-2, 0,-3,"", 1,-2, 0, 0,"",-1, 5, 1, 0,-1,"",-2,-2,-1,-1, 2}, -- Q  {-1,-2,-3,-2, 0,-3,-2, 0,-3,"", 2,-2,-1, 0,"",-2, 1, 5,-1,-1,"",-3,-3,-1,-2, 0}, -- R  { 1, 0,-1, 0, 0,-2, 0,-1,-2,"", 0,-2,-1, 1,"",-1, 0,-1, 4, 1,"",-2,-3,-1,-2, 0}, -- S  { 0,-1,-1,-1,-1,-2,-2,-2,-1,"",-1,-1,-1, 0,"",-1,-1,-1, 1, 5,"", 0,-2,-1,-2,-1}, -- T  {"","","","","","","","","","","","","","","","","","","","","","","","","",""}, -- u  { 0,-3,-1,-3,-2,-1,-3,-3, 3,"",-2, 1, 1,-3,"",-2,-2,-3,-2, 0,"", 4,-3,-1,-1,-2}, -- V  {-3,-4,-2,-4,-3, 1,-2,-2,-3,"",-3,-2,-1,-4,"",-4,-2,-3,-3,-2,"",-3,11,-1, 2,-3}, -- W  {-1,-1,-1,-1,-1,-1,-1,-1,-1,"",-1,-1,-1,-1,"",-1,-1,-1,-1,-1,"",-1,-1,-1,-1,-1}, -- X  {-2,-3,-2,-3,-2, 3,-3, 2,-1,"",-2,-1,-1,-2,"",-3,-1,-2,-2,-2,"",-1, 2,-1, 7,-2}, -- Y  {-1, 2,-4, 2, 5,-3,-2, 0,-3,"", 1,-3,-2, 0,"",-1, 2, 0, 0,-1,"",-2,-3,-1,-2, 5}} -- Z // Linear gap cost (used when a '-' is inserted into the alignment). constant GAP_COST = -4 enum S1ID, S2ID, S1, S2, N, SCORE, MAX_VAL, MAX_I, MAX_J function fill_score(sequence sw, integer gap_cost)  sequence s1 = sq_sub(upper(sw[S1]),'A'-1),  s2 = sq_sub(upper(sw[S2]),'A'-1),  score = sw[SCORE]  sw[SCORE] = 0  for i=2 to length(score) do  for j=2 to length(score[1]) do  score[i,j] = max({score[i-1,j-1] + BLOSUM62[s1[i-1],s2[j-1]],  score[i-1,j] + gap_cost,  score[i,j-1] + gap_cost,  0})  end for  end for  sw[SCORE] = score  return sw end function function optimize_alignment(sequence sw)  // Find the maximum score and its position  integer max_val = 0, max_i = 0, max_j = 0,  l1 = length(sw[S1]), l2 = length(sw[S2])  sequence score = sw[SCORE]  for i=2 to l1 do  for j=2 to l2 do  if score[i,j]>max_val then  max_val = score[i,j]  max_i = i  max_j = j  end if  end for  end for  sw[MAX_VAL] = max_val  sw[MAX_I] = max_i  sw[MAX_J] = max_j  return sw end function function SmithWaterman(string s1id, s2id, s1, s2, integer n, gap_cost = GAP_COST)  sequence sw = repeat(repeat(0,length(s2)+1),length(s1)+1)  sw = {s1id,s2id,s1,s2,n,sw,0,0,0}  sw = fill_score(sw, gap_cost)  sw = optimize_alignment(sw)  return sw end function // Compute p-value based on n iterations function pval(sequence sw)  integer better = 0, n = sw[N], swm = sw[MAX_VAL]  string s1 = sw[S1], s2 = sw[S2]  for iter=1 to n do  better += SmithWaterman("","",s1,shuffle(s2),0)[MAX_VAL]>swm  end for  return (better+1) / (n+1) end function procedure print_alignment(sequence sw, integer gap_cost = GAP_COST)  string s1 = sw[S1], k1 = sq_sub(upper(s1),'A'-1),  s2 = sw[S2], k2 = sq_sub(upper(s2),'A'-1),  s1id = sw[S1ID],  s2id = sw[S2ID],  -- aside: these are built backwards:  s1print = "",  alignmt = "",  s2print = ""  integer i = sw[MAX_I],  j = sw[MAX_J],  ml = max(length(s1id),length(s2id))  sequence score = sw[SCORE]  // Begin by adding the last amino acid to the sequence  s1print &= s1[i]  s2print &= s2[j]  // Compare the match  if s1[i] == s2[j] then  alignmt &= s1[i]  elsif BLOSUM62[k1[i],k2[j]]>0 then  alignmt &= '+'  else  alignmt &= ' '  end if  // Backtrack to construct alignment  while score[i,j]>0 do  if score[i,j]-gap_cost == score[i-1,j] then  i -= 1  s1print &= s1[i]  s2print &= '-'  alignmt &= ' '  elsif score[i,j]-gap_cost == score[i,j-1] then  j -= 1  s1print &= '-'  s2print &= s2[j]  alignmt &= ' '  else  i -= 1  j -= 1  s1print &= s1[i]  s2print &= s2[j]  if s1[i] == s2[j] then  alignmt &= s1[i]  elsif BLOSUM62[k1[i],k2[j]]>0 then  alignmt &= '+'  else  alignmt &= ' '  end if  end if  end while  assert(length(s1print)=length(alignmt))  assert(length(s1print)=length(s2print))  s1print = reverse(s1print)  s2print = reverse(s2print)  alignmt = reverse(alignmt)  printf(1,"COMPARISON OF %s AND %s\n",{sw[S1ID],sw[S2ID]})  printf(1,"Score: %d\n",sw[MAX_VAL])  printf(1,"Alignment:\n")  printf(1,"%s: %s\n%s %s\n%s: %s\n",{pad_head(s1id,ml),shorten(s1print,"",50),  repeat(' ',ml),shorten(alignmt,"",50),  pad_head(s2id,ml),shorten(s2print,"",50)})  // Print score matrix if both strings are less than 15 characters  if length(s1)<15 and length(s2)<15 then  printf(1,"\nScore Matrix:\n")  pp(sw[SCORE],{pp_Nest,1})  end if  // Print p-value if the specified iterations is > 0  if sw[N]>0 then  printf(1,"\np-value: %.3f\n",pval(sw))  end if  printf(1,"\n") end procedure constant ACC_TO_SEQ = new_dict({  {"O95363", "MVGSALRRGAHAYVYLVSKASHISRGHQHQAWGSRPPAAECATQRAPGSVVELLGKSYPQ"&  "DDHSNLTRKVLTRVGRNLHNQQHHPLWLIKERVKEHFYKQYVGRFGTPLFSVYDNLSPVV"&  "TTWQNFDSLLIPADHPSRKKGDNYYLNRTHMLRAHTSAHQWDLLHAGLDAFLVVGDVYRR"&  "DQIDSQHYPIFHQLEAVRLFSKHELFAGIKDGESLQLFEQSSRSAHKQETHTMEAVKLVE"&  "FDLKQTLTRLMAHLFGDELEIRWVDCYFPFTHPSFEMEINFHGEWLEVLGCGVMEQQLVN"&  "SAGAQDRIGWAFGLGLERLAMILYDIPDIRLFWCEDERFLKQFCVSNINQKVKFQPLSKY"&  "PAVINDISFWLPSENYAENDFYDLVRTIGGDLVEKVDLIDKFVHPKTHKTSHCYRITYRH"&  "MERTLSQREVRHIHQALQEAAVQLLGVEGRF"},  {"Q10574", "MSWEQYQMYVPQCHPSFMYQGSIQSTMTTPLQSPNFSLDSPNYPDSLSNGGGKDDKKKCR"&  "RYKTPSPQLLRMRRSAANERERRRMNTLNVAYDELREVLPEIDSGKKLSKFETLQMAQKY"&  "IECLSQILKQDSKNENLKSKSG"},  {"P22816", "MTKYNSGSSEMPAAQTIKQEYHNGYGQPTHPGYGFSAYSQQNPIAHPGQNPHQTLQNFFS"&  "RFNAVGDASAGNGGAASISANGSGSSCNYSHANHHPAELDKPLGMNMTPSPIYTTDYDDE"&  "NSSLSSEEHVLAPLVCSSAQSSRPCLTWACKACKKKSVTVDRRKAATMRERRRLRKVNEA"&  "FEILKRRTSSNPNQRLPKVEILRNAIEYIESLEDLLQESSTTRDGDNLAPSLSGKSCQSD"&  "YLSSYAGAYLEDKLSFYNKHMEKYGQFTDFDGNANGSSLDCLNLIVQSINKSTTSPIQNK"&  "ATPSASDTQSPPSSGATAPTSLHVNFKRKCST"},  {"Q8IU24", "MEFVELSSCRFDATPTFCDRPAAPNATVLPGEHFPVPNGSYEDQGDGHVLAPGPSFHGPG"&  "RCLLWACKACKKKTVPIDRRKAATMRERRRLVKVNEAFDILKKKSCANPNQRLPKVEILR"&  "NAISYIEQLHKLLRDSKENSSGEVSDTSAPSPGSCSDGMAAHSPHSFCTDTSGNSSWEQG"&  "DGQPGNGYENQSCGNTVSSLDCLSLIVQSISTIEGEENNNASNTPR"},  {"Q90477", "MELSDIPFPIPSADDFYDDPCFNTNDMHFFEDLDPRLVHVSLLKPDEHHHIEDEHVRAPS"&  "GHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNDAFETLKRCTSTNPNQRLP"&  "KVEILRNAISYIESLQALLRSQEDNYYPVLEHYSGDSDASSPRSNCSDGMMDFMGPTCQT"&  "RRRNSYDSSYFNDTPNADARNNKNSVVSSLDCLSSIVERISTETPACPVLSVPEGHEESP"&  "CSPHEGSVLSDTGTTAPSPTSCPQQQAQETIYQVL"},  {"P13904", "MELLPPPLRDMEVTEGSLCAFPTPDDFYDDPCFNTSDMSFFEDLDPRLVHVTLLKPEEPH"&  "HNEDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAFETLKR"&  "YTSTNPNQRLPKVEILRNAIRYIESLQALLHDQDEAFYPVLEHYSGDSDASSPRSNCSDG"&  "MMDYNSPPCGSRRRNSYDSSFYSDSPNDSRLGKSSVISSLDCLSSIVERISTQSPSCPVP"&  "TAVDSGSEGSPCSPLQGETLSERVITIPSPSNTCTQLSQDPSSTIYHVL"},  {"P16075", "MDLLGPMEMTEGSLCSFTAADDFYDDPCFNTSDMHFFEDLDPRLVHVGGLLKPEEHPHTR"&  "APPREPTEEEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAF"&  "ETLKRCTSTNPNQRLPKVEILRNAIRYIESLQALLREQEDAYYPVLEHYSGESDASSPRS"&  "NCSDGMMEYSGPPCSSRRRNSYDSSYYTESPNDPKHGKSSVVSSLDCLSSIVERISTDNS"&  "TCPILPPAEAVAEGSPCSPQEGGNLSDSGAQIPSPTNCTPLPQESSSSSSSNPIYQVL"},  {"P10085", "MELLSPPLRDIDLTGPDGSLCSFETADDFYDDPCFDSPDLRFFEDLDPRLVHMGALLKPE"&  "EHAHFPTAVHPGPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR"&  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAFYAP"&  "GPLPPGRGSEHYSGDSDASSPRSNCSDGMMDYSGPPSGPRRQNGYDTAYYSEAARESRPG"&  "KSAAVSSLDCLSSIVERISTDSPAAPALLLADAPPESPPGPPEGASLSDTEQGTQTPSPD"&  "AAPQCPAGSNPNAIYQVL"},  {"P17542", "MTERPPSEAARSDPQLEGRDAAEASMAPPHLVLLNGVAKETSRAAAAEPPVIELGARGGP"&  "GGGPAGGGGAARDLKGRDAATAEARHRVPTTELCRPPGPAPAPAPASVTAELPGDGRMVQ"&  "LSPPALAAPAAPGRALLYSLSQPLASLGSGFFGEPDAFPMFTTNNRVKRRPSPYEMEITD"&  "GPHTKVVRRIFTNSRERWRQQNVNGAFAELRKLIPTHPPDKKLSKNEILRLAMKYINFLA"&  "KLLNDQEEEGTQRAKTGKDPVVGAGGGGGGGGGGAPPDDLLQDVLSPNSSCGSSLDGAAS"&  "PDSYTEEPAPKHTARSLHPAMLPAADGAGPR"},  {"P15172", "MELLSPPLRDVDLTAPDGSLCSFATTDDFYDDPCFDSPDLRFFEDLDPRLMHVGALLKPE"&  "EHSHFPAAVHPAPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR"&  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAAFYA"&  "PGPLPPGRGGEHYSGDSDASSPRSNCSDGMMDYSGPPSGARRRNCYEGAYYNEAPSEPRP"&  "GKSAAVSSLDCLSSIVERISTESPAAPALLLADVPSESPPRRQEAAAPSEGESSGDPTQS"&  "PDAAPQCPAGANPNPIYQVL"}}) constant ACCESSIONS = {"P15172", "P17542", "P10085", "P16075", "P13904",  "Q90477", "Q8IU24", "P22816", "Q10574", "O95363"},  LA = length(ACCESSIONS) // Part 1 of the test cases. constant sw = SmithWaterman("str001", "str002", "deadly", "ddgearlyk", 999) print_alignment(sw) // Part 2 of the test cases. sequence test2 = repeat(repeat(0,LA),LA) for i,ai in ACCESSIONS do  for j,aj in ACCESSIONS from i do  sequence si = getd(ai,ACC_TO_SEQ),  sj = getd(aj,ACC_TO_SEQ),  temp = SmithWaterman(ai,aj,si,sj,0)  test2[i][j] = temp[MAX_VAL] if i=1 and j=2 then -- (comment out for excessively copious output)  print_alignment(temp) end if  end for end for pp(test2,{pp_Nest,1}) // Part 3 of the test cases. string id1 = "P15172", s1 = getd(id1,ACC_TO_SEQ),  id2 = "Q10574", s2 = getd(id2,ACC_TO_SEQ),  id3 = "O95363", s3 = getd(id3,ACC_TO_SEQ) sequence comp1 = SmithWaterman(id1, id2, s1, s2, 999),  comp2 = SmithWaterman(id1, id3, s1, s3, 999) printf(1,"\np-value for %s versus %s: %.3f\n",{id1,id2,pval(comp1)}) printf(1,"\np-value for %s versus %s: %.3f\n",{id1,id3,pval(comp2)})

Output:

(almost identical to Wren, when full output shown)

COMPARISON OF str001 AND str002 Score: 13 Alignment: str001: d-eadly d ea ly str002: dgearly Score Matrix: {{0,0,0,0,0,0,0,0,0,0}, {0,6,6,2,2,0,0,0,0,0}, {0,2,8,4,7,3,0,0,0,1}, {0,0,4,8,4,11,7,3,0,0}, {0,6,6,4,10,7,9,5,1,0}, {0,2,2,2,6,9,5,13,9,5}, {0,0,0,0,2,5,7,9,20,16}} p-value: 0.200 COMPARISON OF P15172 AND P17542 Score: 143 Alignment: P15172: TNADRRKAATMRERRRLSKVNEAFETLKRCTSSNP-NQRLPKVEILRNAI...PRSNCSDGMMDYSGPPSGARRRNCYEGAYYNEA-PSEPRPGKSAAVSSLD T RR RER R VN AF L++ ++P +++L K EILR A+...P + D + D P S + +GA ++ EP P K A SL P17542: TKVVRRIFTNSRERWRQQNVNGAFAELRKLIPTHPPDKKLSKNEILRLAM...P-PD--DLLQDVLSPNSSCGS-SL-DGAASPDSYTEEPAP-KHTA-RSLH {{1705,143,1496,1016,974,889,428,368,118,56}, {0,1726,128,129,128,112,144,123,150,66}, {0,0,1698,1039,998,921,440,367,118,52}, {0,0,0,1590,1147,1089,448,414,116,61}, {0,0,0,0,1537,1100,450,410,119,72}, {0,0,0,0,0,1475,449,410,114,62}, {0,0,0,0,0,0,1210,446,123,45}, {0,0,0,0,0,0,0,1741,118,74}, {0,0,0,0,0,0,0,0,740,67}, {0,0,0,0,0,0,0,0,0,2414}} p-value for P15172 versus Q10574: 0.001 p-value for P15172 versus O95363: 0.558

Wren

Library: Wren-ioutil

Library: Wren-math

Version 1

Translation of: JavaScript

import "./ioutil" for FileUtil import "./math" for Math, Nums class BLOSUM62 {  construct new() {  _scoringMatrix = {}  var lines = FileUtil.readLines("BLOSUM62.txt")  for (line in lines) {  var entry = line.trim().split(" ")  if (entry.count >= 3) {  var key = entry[0] + "," + entry[1]  _scoringMatrix[key] = Num.fromString(entry[2])  }  }  }  getItem(pair) {  var key = pair[0] + "," + pair[1]  return _scoringMatrix[key]  } } var localAlignment = Fn.new { |v, w, scoringMatrix, gapStart, gapExtend|  // Initialize the matrices.  var M = List.filled(v.count + 1, null)  var X = List.filled(v.count + 1, null)  var Y = List.filled(v.count + 1, null)  var B = List.filled(v.count + 1, null) // back-track  for (i in 0..v.count) {  M[i] = List.filled(w.count + 1, 0)  X[i] = List.filled(w.count + 1, 0)  Y[i] = List.filled(w.count + 1, 0)  B[i] = List.filled(w.count + 1, 0)  }  // Initialize the maximum scores.  var maxScore = -1  var maxI = 0  var maxJ = 0  // Populate the matrices.  for (i in 1..v.count) {  for (j in 1..w.count) {  Y[i][j] = Math.max(Y[i-1][j] - gapExtend, M[i-1][j] - gapStart)  X[i][j] = Math.max(X[i][j-1] - gapExtend, M[i][j-1] - gapStart)  var curScores = [  Y[i][j],  M[i-1][j-1] + scoringMatrix.getItem([v[i-1], w[j-1]]),  X[i][j],  0  ]  M[i][j] = Nums.max(curScores)  B[i][j] = curScores.indexOf(M[i][j])  if (M[i][j] > maxScore) {  maxScore = M[i][j]  maxI = i  maxJ = j  }  }  }  System.print("Finished making the matrix")  // Initialize the indices to start at the position of the high score.  var i = maxI  var j = maxJ  // Initialize the aligned strings as the input strings up to the position of the high score.  var vAligned = v[0...i]  var wAligned = w[0...j]  // Backtrack to start of the local alignment starting at the highest scoring cell.  while (B[i][j] != 3 && i * j != 0 && i >= j) {  if (B[i][j] == 0) {  i = i - 1  } else if (B[i][j] == 1) {  i = i - 1  j = j - 1  } else if (B[i][j] == 2) {  j = j - 1  }  }  System.print("finished backtracking")  // Cut the strings at the ending point of the backtrack.  vAligned = vAligned[i..-1]  wAligned = wAligned[j..-1]  return [maxScore.toString, vAligned, wAligned] } // Read the input file. var indata = FileUtil.readLines("sw_input.txt") indata.add(">") var word1 = "" var word2 = "" var linenum = 0 var chunk = [] for (line in indata) {  var trimmedLine = line.trim()  linenum = linenum + 1  if (trimmedLine == "") {  continue  } else {  if (trimmedLine[0] == ">") {  if (linenum == 1) {  chunk = []  } else if (linenum > 1 && linenum != indata.count) {  word1 = chunk.join("")  chunk = []  } else {  word2 = chunk.join("")  }  } else {  chunk.add(trimmedLine)  }  } } // Get the local alignment (given sigma = 11, epsilon = 1 in problem statement). var blosum = BLOSUM62.new() var alignment = localAlignment.call(word1, word2, blosum, 11, 1) // Print and save the answer. System.print(alignment.join("\n")) FileUtil.write("sw_output.txt", alignment.toString())

Version 2

This version is a translation of the Java code here which provides a basic implementation of the Smith-Waterman algorithm that has been tailored to the issue of finding the optimal alignment of two DNA protein sequences. The code is subject to the MIT Licence (reproduced below) and so therefore is this translation.

/* MIT License Copyright (c) 2022 Grace Raper Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ import "./math" for Nums import "random" for Random // Map of amino acids to corresponding index in the BLOSUM62 scoring matrix. var AMINO_TO_INDEX = {  "A": 0, "R": 1, "N": 2, "D": 3, "C": 4,  "Q": 5, "E": 6, "G": 7, "H": 8, "I": 9,  "L": 10, "K": 11, "M": 12, "F": 13, "P": 14,  "S": 15, "T": 16, "W": 17, "Y": 18, "V": 19,  "a": 0, "r": 1, "n": 2, "d": 3, "c": 4,  "q": 5, "e": 6, "g": 7, "h": 8, "i": 9,  "l": 10, "k": 11, "m": 12, "f": 13, "p": 14,  "s": 15, "t": 16, "w": 17, "y": 18, "v": 19 } // BLOSUM62 scoring matrix. var BLOSUM62 = [  [ 4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],  [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],  [-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3],  [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],  [ 0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],  [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],  [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],  [ 0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],  [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],  [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],  [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],  [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2],  [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1, -1, 1],  [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],  [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],  [ 1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3, -2, -2],  [ 0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2, -2, 0],  [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3],  [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],  [ 0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4] ] // Linear gap cost (used when a '-' is inserted into the alignment). var GAP_COST = -4 // Random number generator. var Rand = Random.new() class SmithWaterman {  construct new(s1id, s2id, s1, s2, n) {  _s1id = s1id  _s2id = s2id  _s1 = s1  _s2 = s2  _n = n  _score = List.filled(s1.count + 1, null)  for (i in 0..s1.count) _score[i] = List.filled(s2.count + 1, 0)  for (i in 1..._score.count) {  for (j in 1..._score[0].count) {  var options = []  var b = BLOSUM62[AMINO_TO_INDEX[s1[i - 1]]][AMINO_TO_INDEX[s2[j - 1]]]  options.add(_score[i - 1][j - 1] + b)  options.add(_score[i - 1][j] + GAP_COST)  options.add(_score[i][j - 1] + GAP_COST)  options.add(0)  _score[i][j] = Nums.max(options)  }  }  // Optimize alignment.  _maxVal = 0  _maxI = 0  _maxJ = 0  // Iterate through the list updating max when a larger value is found.  for (i in 1...s1.count) {  for (j in 1...s2.count) {  if (_score[i][j] > _maxVal) {  _maxVal = _score[i][j]  _maxI = i  _maxJ = j  }  }  }  }  // Get score.  score { _maxVal }  // Returns p-val (iterations are determined by constructor).  pval {  var better = 0  for (k in 0..._n) {  var s2chars = _s2.toList  for (i in 0...s2chars.count) {  var j = Rand.int(s2chars.count)  s2chars.swap(i, j)  }  var s2random = s2chars.join("")  var sm = SmithWaterman.new("", "", _s1, s2random, 0)  if (sm.score >= this.score) better = better + 1  }  return (better + 1) / (_n + 1)  }  // Prints alignment.  // If both string are less than 15 characters, scoring matrix is printed.  // If n > 0, p-value is printed.  printAlignment() {  // Backtrack for the maxVal to find the optimal solution!  var i = _maxI  var j = _maxJ  var s1print = ""  var compare = ""  var s2print = ""  // Begin by adding the last amino acid to the sequence.  s1print = _s1[i] + s1print  s2print = _s2[j] + s2print  // Compare that match is s1 and s2 to add to compare string.  if (_s1[i] == _s2[j]) {  compare = _s1[i] + compare  } else if (BLOSUM62[AMINO_TO_INDEX[_s1[i]]][AMINO_TO_INDEX[_s2[j]]] > 0) {  compare = "+" + compare  } else {  compare = " " + compare  }  // Continue to format strings.  while (_score[i][j] > 0) {  if (_score[i][j] - GAP_COST == _score[i - 1][j]) {  i = i - 1  s1print = _s1[i] + s1print  s2print = "-" + s2print  compare = " " + compare  } else if (_score[i][j] - GAP_COST == _score[i][j - 1]) {  j = j - 1  s1print = "-" + s1print  s2print = _s2[j] + s2print  compare = " " + compare  } else {  i = i - 1  j = j - 1  s1print = _s1[i] + s1print  s2print = _s2[j] + s2print  if (_s1[i] == _s2[j]) {  compare = _s1[i] + compare  } else if (BLOSUM62[AMINO_TO_INDEX[_s1[i]]][AMINO_TO_INDEX[_s2[j]]] > 0) {  compare = "+" + compare  } else {  compare = " " + compare  }  }  }  // Print some of the basics.  System.print("COMPARISON OF %(_s1id) AND %(_s2id)\n")  System.print("Score: %(_maxVal)\n")  System.print("Alignment:")  // Print sequence alignment using s1print, compare, s2print.  // 60 characters per line.  // At the start of line print identifier and location in input string.  var a = "%(_s1id):\t%(i)\t"  var b = "\t\t\t"  var c = "%(_s2id):\t%(j)\t"  for (k in 0...s1print.count) {  if (k != 0 && k % 60 == 0) {  System.print(a)  System.print(b)  System.print(c)  System.print()  a = "%(_s1id):\t%(i)\t"  b = "\t\t\t"  c = "%(_s2id):\t%(j)\t"  }  a = a + s1print[k]  b = b + compare[k]  c = c + s2print[k]  if (s1print[k] != "-") i = i + 1  if (s2print[k] != "-") j = j + 1  }  System.print(a)  System.print(b)  System.print(c)  // If both strings are less than 15 letters long, print score matrix.  if (_s1.count < 15 && _s2.count < 15) {  System.print("\nScore Matrix:")  for (l in 0..._score.count) System.print(_score[l])  }  // If n > 0 print p-value.  if (_n > 0) {  System.print("\np-value: %(pval)")  }  } } var ACC_TO_SEQ = {  "O95363": "MVGSALRRGAHAYVYLVSKASHISRGHQHQAWGSRPPAAECATQRAPGSVVELLGKSYPQ" +  "DDHSNLTRKVLTRVGRNLHNQQHHPLWLIKERVKEHFYKQYVGRFGTPLFSVYDNLSPVV" +  "TTWQNFDSLLIPADHPSRKKGDNYYLNRTHMLRAHTSAHQWDLLHAGLDAFLVVGDVYRR" +  "DQIDSQHYPIFHQLEAVRLFSKHELFAGIKDGESLQLFEQSSRSAHKQETHTMEAVKLVE" +  "FDLKQTLTRLMAHLFGDELEIRWVDCYFPFTHPSFEMEINFHGEWLEVLGCGVMEQQLVN" +  "SAGAQDRIGWAFGLGLERLAMILYDIPDIRLFWCEDERFLKQFCVSNINQKVKFQPLSKY" +  "PAVINDISFWLPSENYAENDFYDLVRTIGGDLVEKVDLIDKFVHPKTHKTSHCYRITYRH" +  "MERTLSQREVRHIHQALQEAAVQLLGVEGRF",  "Q10574": "MSWEQYQMYVPQCHPSFMYQGSIQSTMTTPLQSPNFSLDSPNYPDSLSNGGGKDDKKKCR" +  "RYKTPSPQLLRMRRSAANERERRRMNTLNVAYDELREVLPEIDSGKKLSKFETLQMAQKY" +  "IECLSQILKQDSKNENLKSKSG",  "P22816": "MTKYNSGSSEMPAAQTIKQEYHNGYGQPTHPGYGFSAYSQQNPIAHPGQNPHQTLQNFFS" +  "RFNAVGDASAGNGGAASISANGSGSSCNYSHANHHPAELDKPLGMNMTPSPIYTTDYDDE" +  "NSSLSSEEHVLAPLVCSSAQSSRPCLTWACKACKKKSVTVDRRKAATMRERRRLRKVNEA" +  "FEILKRRTSSNPNQRLPKVEILRNAIEYIESLEDLLQESSTTRDGDNLAPSLSGKSCQSD" +  "YLSSYAGAYLEDKLSFYNKHMEKYGQFTDFDGNANGSSLDCLNLIVQSINKSTTSPIQNK" +  "ATPSASDTQSPPSSGATAPTSLHVNFKRKCST",  "Q8IU24": "MEFVELSSCRFDATPTFCDRPAAPNATVLPGEHFPVPNGSYEDQGDGHVLAPGPSFHGPG" +  "RCLLWACKACKKKTVPIDRRKAATMRERRRLVKVNEAFDILKKKSCANPNQRLPKVEILR" +  "NAISYIEQLHKLLRDSKENSSGEVSDTSAPSPGSCSDGMAAHSPHSFCTDTSGNSSWEQG" +  "DGQPGNGYENQSCGNTVSSLDCLSLIVQSISTIEGEENNNASNTPR",  "Q90477": "MELSDIPFPIPSADDFYDDPCFNTNDMHFFEDLDPRLVHVSLLKPDEHHHIEDEHVRAPS" +  "GHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNDAFETLKRCTSTNPNQRLP" +  "KVEILRNAISYIESLQALLRSQEDNYYPVLEHYSGDSDASSPRSNCSDGMMDFMGPTCQT" +  "RRRNSYDSSYFNDTPNADARNNKNSVVSSLDCLSSIVERISTETPACPVLSVPEGHEESP" +  "CSPHEGSVLSDTGTTAPSPTSCPQQQAQETIYQVL",  "P13904": "MELLPPPLRDMEVTEGSLCAFPTPDDFYDDPCFNTSDMSFFEDLDPRLVHVTLLKPEEPH" +  "HNEDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAFETLKR" +  "YTSTNPNQRLPKVEILRNAIRYIESLQALLHDQDEAFYPVLEHYSGDSDASSPRSNCSDG" +  "MMDYNSPPCGSRRRNSYDSSFYSDSPNDSRLGKSSVISSLDCLSSIVERISTQSPSCPVP" +  "TAVDSGSEGSPCSPLQGETLSERVITIPSPSNTCTQLSQDPSSTIYHVL",  "P16075": "MDLLGPMEMTEGSLCSFTAADDFYDDPCFNTSDMHFFEDLDPRLVHVGGLLKPEEHPHTR" +  "APPREPTEEEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERRRLSKVNEAF" +  "ETLKRCTSTNPNQRLPKVEILRNAIRYIESLQALLREQEDAYYPVLEHYSGESDASSPRS" +  "NCSDGMMEYSGPPCSSRRRNSYDSSYYTESPNDPKHGKSSVVSSLDCLSSIVERISTDNS" +  "TCPILPPAEAVAEGSPCSPQEGGNLSDSGAQIPSPTNCTPLPQESSSSSSSNPIYQVL",  "P10085": "MELLSPPLRDIDLTGPDGSLCSFETADDFYDDPCFDSPDLRFFEDLDPRLVHMGALLKPE" +  "EHAHFPTAVHPGPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR" +  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAFYAP" +  "GPLPPGRGSEHYSGDSDASSPRSNCSDGMMDYSGPPSGPRRQNGYDTAYYSEAARESRPG" +  "KSAAVSSLDCLSSIVERISTDSPAAPALLLADAPPESPPGPPEGASLSDTEQGTQTPSPD" +  "AAPQCPAGSNPNAIYQVL",  "P17542": "MTERPPSEAARSDPQLEGRDAAEASMAPPHLVLLNGVAKETSRAAAAEPPVIELGARGGP" +  "GGGPAGGGGAARDLKGRDAATAEARHRVPTTELCRPPGPAPAPAPASVTAELPGDGRMVQ" +  "LSPPALAAPAAPGRALLYSLSQPLASLGSGFFGEPDAFPMFTTNNRVKRRPSPYEMEITD" +  "GPHTKVVRRIFTNSRERWRQQNVNGAFAELRKLIPTHPPDKKLSKNEILRLAMKYINFLA" +  "KLLNDQEEEGTQRAKTGKDPVVGAGGGGGGGGGGAPPDDLLQDVLSPNSSCGSSLDGAAS" +  "PDSYTEEPAPKHTARSLHPAMLPAADGAGPR",  "P15172": "MELLSPPLRDVDLTAPDGSLCSFATTDDFYDDPCFDSPDLRFFEDLDPRLMHVGALLKPE" +  "EHSHFPAAVHPAPGAREDEHVRAPSGHHQAGRCLLWACKACKRKTTNADRRKAATMRERR" +  "RLSKVNEAFETLKRCTSSNPNQRLPKVEILRNAIRYIEGLQALLRDQDAAPPGAAAAFYA" +  "PGPLPPGRGGEHYSGDSDASSPRSNCSDGMMDYSGPPSGARRRNCYEGAYYNEAPSEPRP" +  "GKSAAVSSLDCLSSIVERISTESPAAPALLLADVPSESPPRRQEAAAPSEGESSGDPTQS" +  "PDAAPQCPAGANPNPIYQVL" } var ACCESSIONS = ["P15172", "P17542", "P10085", "P16075", "P13904",  "Q90477", "Q8IU24", "P22816", "Q10574", "O95363"] // Part 1 of the test cases. var test1 = SmithWaterman.new("str001", "str002", "deadly", "ddgearlyk", 999) test1.printAlignment() System.print() // Part 2 of the test cases. var test2 = List.filled(ACCESSIONS.count, null) for (i in 0...test2.count) {  test2[i] = List.filled(ACCESSIONS.count, 0)  for (j in i...test2.count) {  var temp = SmithWaterman.new(ACCESSIONS[i], ACCESSIONS[j],  ACC_TO_SEQ[ACCESSIONS[i]], ACC_TO_SEQ[ACCESSIONS[j]], 0)  test2[i][j] = temp.score  temp.printAlignment()  System.print()  } } for (i in 0...test2.count) System.print(test2[i]) System.print() // Part 3 of the test cases. var comp1 = SmithWaterman.new("P15172", "Q10574", ACC_TO_SEQ["P15172"], ACC_TO_SEQ["Q10574"], 999) var comp2 = SmithWaterman.new("P15172", "Q10574", ACC_TO_SEQ["P15172"], ACC_TO_SEQ["O95363"], 999) System.print("p-value for P15172 versus Q10574: %(comp1.pval)") System.print() System.print("p-value for P15172 versus O95363: %(comp2.pval)")

Output:

As there are a lot of results it's best to just dump them to a file. The results for test 3 are very slow to emerge. Only the result of the first test is shown below.

COMPARISON OF str001 AND str002 Score: 13 Alignment: str001:	0	d-eadly	d ea ly str002:	1	dgearly Score Matrix: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [0, 6, 6, 2, 2, 0, 0, 0, 0, 0] [0, 2, 8, 4, 7, 3, 0, 0, 0, 1] [0, 0, 4, 8, 4, 11, 7, 3, 0, 0] [0, 6, 6, 4, 10, 7, 9, 5, 1, 0] [0, 2, 2, 2, 6, 9, 5, 13, 9, 5] [0, 0, 0, 0, 2, 5, 7, 9, 20, 16] p-value: 0.204