
Releases: ppillot/biomsalign

v0.3.3

07 Jan 04:13

Fixes

  • fix the traceback matrix size, which was causing spurious matches at the end of sequences
  • fix the DEL->MATCH transition in MSA vs MSA and MSA vs sequence alignments. The value used to compute the deletion score could be corrupted in some cases.

v0.3.2

06 Jan 22:53

Fixes

  • fix weighting schemes (sequence vs profile alignment, profile vs profile alignment)
  • fix profile merging

Improvement

  • Kmer distance (now correct for identical sequences)

v0.3.1

06 Jan 03:33

Fixes

  • Optional arguments were no longer honored because of the new minimization process.
  • Kmer-based distances are now computed from both the Kmers shared by two sequences and the Kmers absent from both, following the Simple Matching Distance (as opposed to the Tanimoto Distance used previously). Kmer-based distances do not seem suitable for comparing unrelated sequences of differing lengths. Counting the Kmers that are absent from both sequences relatively penalizes such sequences, instead of making them appear more closely related because of their size.
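The difference between the two distances can be illustrated with a minimal Python sketch. It is not the library's code: the function names, the use of Kmer presence/absence sets, and the fixed Kmer universe size (alphabet size to the power k) are assumptions made for the illustration.

```python
def kmer_set(seq, k):
    """Set of distinct Kmers present in seq (presence/absence fingerprint)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B|: Kmers absent from both sets are ignored."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def simple_matching_distance(a, b, universe_size):
    """Fraction of all possible Kmers present in exactly one of the two sets.

    Kmers present in both sets and Kmers absent from both sets count as
    matches. universe_size is the number of possible Kmers (assumed here to
    be alphabet_size ** k).
    """
    return len(a ^ b) / universe_size


k = 2
universe = 4 ** k  # 16 possible 2-mers over the alphabet A, C, G, T
a = kmer_set("ACGT", k)  # {"AC", "CG", "GT"}
b = kmer_set("ACGA", k)  # {"AC", "CG", "GA"}
print(tanimoto_distance(a, b))                   # 0.5
print(simple_matching_distance(a, b, universe))  # 0.125
```

In this toy case the two sequences differ by two Kmers out of four observed (Tanimoto) but out of sixteen possible (Simple Matching), which shows how taking absent Kmers into account changes the scale of the distance.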

v0.3.0

01 Jan 21:11

What's Changed

  • The distance matrix for global multiple sequence alignment, based on Kmer fingerprinting, has been improved. The distance computation uses a Jaccard similarity score, corrected for sequence length by estimating a background matching probability.

Fixes

  • Kmer computation from nucleic sequences in the distance matrix function

Full Changelog: v0.2.0-beta...v0.3.0

Diagonal based alignment improvements

12 Sep 00:07
Pre-release

Diagonal filtering improvements

Correctness and overall speed have been improved:

  • A special case of longest path was ignored
  • A better definition of the optimal path allows additional filtering, reducing the search space and the time complexity
  • A better implementation of path storage, using a traceback vector, has reduced the memory footprint
  • A garbage-collection-like method has been added to reduce the number of iterations spent maintaining path collections

Other fixes

  • A common case for stale kmers in the minimizing window was not taken into account
  • In the regular Multiple Sequence Alignment procedure, a bug was preventing the detection of gap openings. The formula for computing the gap opening/closing penalty has been fixed.

Diagonal based alignments fixes

07 Sep 01:34
Pre-release

BioMSA v0.1.4

Fixes and improvements to diagonal based alignment heuristic

  • Previously, only high-confidence seeds for diagonals were retained, where a seed is a window common to both sequences. For sequences with low homology this proved insufficient. Now all common kmers are evaluated, and an optimal list of diagonal seeds is built in an extra step.
  • Low-quality seeds (kmers that are replicated in both sequences and convey less information) are discarded, which avoids combinatorial explosions
  • Some fixes have been made to the diagonal extension mechanism, where some boundary rules were not symmetrical between the two aligned sequences
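The seed collection described above can be sketched in a few lines of Python. This is only an illustration, not the library's implementation: the function names are hypothetical, a diagonal is represented here as the difference between the positions in the two sequences, and "low quality" is read literally as "the kmer occurs more than once in both sequences".

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each kmer of seq to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def diagonal_seeds(seq_a, seq_b, k):
    """Return sorted (pos_a, pos_b, diagonal) tuples for kmers common to both sequences.

    Kmers repeated in both sequences are skipped: each would generate
    len(occ_a) * len(occ_b) candidate seeds while carrying little information.
    """
    pos_a = kmer_positions(seq_a, k)
    pos_b = kmer_positions(seq_b, k)
    seeds = []
    for kmer, occ_a in pos_a.items():
        occ_b = pos_b.get(kmer, [])
        if len(occ_a) > 1 and len(occ_b) > 1:
            continue  # low-quality seed: replicated in both sequences
        for i in occ_a:
            for j in occ_b:
                seeds.append((i, j, i - j))
    seeds.sort()
    return seeds


print(diagonal_seeds("TTACGGA", "ACGGATT", 3))
# [(2, 0, 2), (3, 1, 2), (4, 2, 2)]
```

Seeds that share the same diagonal value, as in the output above, are candidates for being chained into a single diagonal; how the optimal list is built from them is not covered by this sketch.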

Initial pre-release

30 Aug 01:34
Pre-release

BioMSA v0.1.3

Initial release.

The library is functional. Protein and nucleic sequences can be aligned. Alignment of long sequences (>1600 residues) relies on a diagonal-finding strategy based on minimizers, which speeds up the process by a factor of 100.
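For readers unfamiliar with minimizers, here is a minimal Python sketch of the general idea: in every window of w consecutive kmers, only the smallest kmer is kept, so that similar sequences tend to select the same kmers. It is a naive illustration under stated assumptions (lexicographic ordering, leftmost tie-break, hypothetical function name), not the library's implementation, which may use a different ordering and an incremental window update.

```python
def minimizers(seq, k, w):
    """Return (position, kmer) pairs selected as window minimizers.

    Each window covers w consecutive kmers; the lexicographically smallest
    kmer is selected (leftmost on ties). A position selected by several
    consecutive windows is reported only once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal element, i.e. the leftmost one
        offset = min(range(w), key=lambda o: window[o])
        pos = start + offset
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


print(minimizers("ACGTACGT", 3, 3))
# [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
```

Only three of the six kmers are retained in this example, which is what reduces the number of candidate matches to examine between two long sequences.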

Known issues

In multiple sequence alignments where the diagonal-finding method is used, the center-star procedure used to merge the pairwise alignments can produce unrealistic results in regions where the center sequence differs notably from its siblings.