Cenote Taker 2 version 2.1.5

@mtisza1 mtisza1 released this 09 May 17:47

NOTE: Downloading the binaries will not help you to set up Cenote-Taker 2. If you haven't already installed Cenote-Taker 2, please follow installation/update instructions in README, including the database updates.

Update notes:

  1. Major changes make the installation faster and easier, with a smaller data footprint (previously ~130GB, now ~8GB to ~75GB depending on your database choices). Details:
  • The following tools (either tricky to install or out of date) were removed from the dependencies: krona, emboss suite, circlator, mummer.
  • The following tools were added to the dependencies: seqkit
  • The following tools were changed from stand-alone git clones to packages in the conda environment: lastal/lastdb, hhblits/hhsearch, phanotate.
  • The protein BLAST database of RefSeq etc sequences was updated to include ~3000 new RefSeq virus entries
  • The hhsuite databases (PDB, PFAM, CDD) are now optional.
  2. The tool now checks that your run_title is appropriately formatted.
  3. For contigs with DTRs (direct terminal repeats), the --wrap option lets users choose either to clip the repeat region and rotate the contig to an appropriate position, or to forgo rotating and clipping, in which case the DTRs are still reported in the genome map. #29
  4. Certain rm commands were fixed. #21
  5. The taxonomy calling framework has been updated. NCBI Taxdump files are now used for TaxIDs instead of the krona database. "tax_guide.blastx.out" files now show the taxid of the best hit and include tab-separated hierarchical taxonomy info for that reference. Example:
example_ct1_1	gi|849254117|ref|YP_009150201.1| terminase [Propionibacterium phage PHL085N00]	45.575	9.81e-119	452
taxid: 1500812
10239	Viruses	superkingdom
2731341	Duplodnaviria	clade
2731360	Heunggongvirae	kingdom
2731618	Uroviricota	phylum
2731619	Caudoviricetes	class
28883	Caudovirales	order
10699	Siphoviridae	family
1982251	Pahexavirus	genus
1982275	Pahexavirus PHL037M02	species
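For downstream scripts, a record like the one above can be read into a small structure. The following is a minimal sketch, not part of Cenote-Taker 2: the names `parse_tax_guide` and `Lineage` are hypothetical, and it assumes one record per file with tab-separated fields laid out exactly as in the example. The release notes do not name the three numeric columns on the hit line, so they are kept as raw strings.

```python
from typing import NamedTuple


class Lineage(NamedTuple):
    """One row of the hierarchical taxonomy block (hypothetical helper type)."""
    taxid: int
    name: str
    rank: str


def parse_tax_guide(text: str) -> dict:
    """Parse a single tax_guide.blastx.out record.

    Assumed layout (taken from the example in these notes):
      line 1: tab-separated best-hit fields
      line 2: "taxid: <number>"
      rest:   "<taxid>\t<name>\t<rank>" rows, from broadest to narrowest
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    hit_fields = lines[0].split("\t")  # kept raw; column meanings are not documented here
    best_taxid = None
    lineage = []
    for ln in lines[1:]:
        if ln.startswith("taxid:"):
            best_taxid = int(ln.split(":", 1)[1])
        else:
            taxid, name, rank = ln.split("\t")
            lineage.append(Lineage(int(taxid), name, rank))
    return {"hit": hit_fields, "taxid": best_taxid, "lineage": lineage}


# The example record from these release notes, with explicit tab characters.
SAMPLE = (
    "example_ct1_1\tgi|849254117|ref|YP_009150201.1| terminase "
    "[Propionibacterium phage PHL085N00]\t45.575\t9.81e-119\t452\n"
    "taxid: 1500812\n"
    "10239\tViruses\tsuperkingdom\n"
    "2731341\tDuplodnaviria\tclade\n"
    "2731360\tHeunggongvirae\tkingdom\n"
    "2731618\tUroviricota\tphylum\n"
    "2731619\tCaudoviricetes\tclass\n"
    "28883\tCaudovirales\torder\n"
    "10699\tSiphoviridae\tfamily\n"
    "1982251\tPahexavirus\tgenus\n"
    "1982275\tPahexavirus PHL037M02\tspecies\n"
)

record = parse_tax_guide(SAMPLE)
# Map each rank to its name for quick lookups, e.g. ranks["family"].
ranks = {entry.rank: entry.name for entry in record["lineage"]}
print(record["taxid"], ranks["family"], ranks["genus"])
```

Keeping the lineage as an ordered list preserves the broad-to-narrow order of the file, while the `ranks` dictionary is a convenience for pulling out a single level.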
  6. Protein-sequence-based taxonomy is now more flexible, with the following thresholds for genome taxon assignment:
| Hallmark AAI to reference | Taxonomic granularity from CT2 |
| --- | --- |
| >90% | Genus, e.g. "Ilzatvirus" |
| >40% | Family, e.g. "Siphoviridae" |
| >25% | Order, e.g. "Caudovirales" |
| <=25% | Generic name, e.g. "phage" |
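The thresholds above amount to a simple cascade of comparisons. Here is a rough sketch of that mapping, assuming AAI is expressed as a percentage between 0 and 100; the function name `taxonomic_granularity` is hypothetical and this is not the code Cenote-Taker 2 itself runs.

```python
def taxonomic_granularity(hallmark_aai: float) -> str:
    """Return the finest rank that would be assigned for a given hallmark AAI (%).

    Thresholds follow the release notes: >90 genus, >40 family,
    >25 order, otherwise a generic name such as "phage".
    """
    if not 0 <= hallmark_aai <= 100:
        raise ValueError("AAI must be a percentage between 0 and 100")
    if hallmark_aai > 90:
        return "genus"
    if hallmark_aai > 40:
        return "family"
    if hallmark_aai > 25:
        return "order"
    return "generic"


# Boundary values fall into the coarser bin because the comparisons are strict.
for aai in (95.0, 90.0, 45.575, 25.0):
    print(aai, taxonomic_granularity(aai))
```

Under these thresholds, the 45.575% hit in the earlier example would be resolved at the family level.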
  7. The --hallmark_taxonomy option allows users to get hierarchical taxonomy information for all identified hallmark genes. This could be useful for more sophisticated downstream taxonomy assignments.
  8. -db virion is now the default setting. I think most people are inputting contigs assembled from WGS data, and this is the correct option for this data type.

Good luck with all of your Cenotes :neckbeard: 💖

Mike