[Feature request] give a header to each fasta #7

Open
Louis-MG opened this issue Apr 5, 2023 · 0 comments

Louis-MG commented Apr 5, 2023

I have been using your tool to gain more insight into k-mer presence and absence in genomes, specifically for k-mers unique to certain genomes. Because the output of UniqueKmer is in the following form:

```
>sequence1
kmer1
kmer2
kmer3
```

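most FASTA tools will read all the k-mers under one header as a single continuous sequence, kmer1kmer2kmer3. This is unintended, and splitting that sequence back into k-mers creates k-mers that span the junctions between listed k-mers and may not exist in the original genome.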
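To make the problem concrete, here is a minimal Python sketch, assuming a generic FASTA reader that joins every line under a header into one sequence (as most readers do). The three 4-mers and the `read_fasta` and `kmers` helpers are hypothetical illustrations, not UniqueKmer's actual output or code; the point is that re-splitting the joined sequence yields junction k-mers that were never in the list.

```python
def read_fasta(text):
    """Minimal FASTA parser (hypothetical helper): like most readers, it joins
    every line under a header into one sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical output: three unrelated 4-mers listed under a single header.
output = """>sequence1
ACGT
TTGA
CCAG
"""

k = 4
listed = {"ACGT", "TTGA", "CCAG"}
seq = read_fasta(output)["sequence1"]   # "ACGTTTGACCAG"
spurious = kmers(seq, k) - listed       # k-mers that straddle two listed k-mers
print(seq)
print(sorted(spurious))
```

Six of the nine 4-mers in the joined sequence (for example `CGTT` and `TGAC`) straddle two listed k-mers and were never reported as unique.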

Could you add an option to give each k-mer its own header? The output would then look like this:

```
>sequence1:kmer:1
kmer1
>sequence1:kmer:2
kmer2
>sequence1:kmer:3
kmer3
```

I wrote an awk one-liner for this, but it would be more convenient to have it as a built-in option:

```sh
# -i inplace overwrites the input file and requires GNU awk (gawk 4.1 or later)
awk -i inplace '{if (/^>/) {line=$0; sum=0} else {sum+=1; KMER=$0; print line ":kmer:" sum "\n" KMER} }' unique_kmers.fasta
```
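For reference, the same transformation as the awk one-liner, sketched in Python. `add_kmer_headers` and `rewrite_fasta` are hypothetical names for illustration and are not part of UniqueKmer.

```python
def add_kmer_headers(lines):
    """Yield FASTA lines in which each k-mer gets its own header,
    '<original header>:kmer:<n>', with n restarting at 1 for every record."""
    header = None
    count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines instead of emitting empty records
        if line.startswith(">"):
            header = line
            count = 0
        else:
            if header is None:
                raise ValueError("k-mer line found before any FASTA header")
            count += 1
            yield f"{header}:kmer:{count}"
            yield line


def rewrite_fasta(src_path, dst_path):
    """Write a headered copy of src_path to dst_path; the input is left untouched."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in add_kmer_headers(src):
            dst.write(out_line + "\n")


example = [">sequence1", "kmer1", "kmer2", "kmer3"]
print("\n".join(add_kmer_headers(example)))
```

Unlike `awk -i inplace`, `rewrite_fasta` writes to a separate file, so the original output survives if something goes wrong, and it does not depend on GNU awk. It also skips blank lines, which the awk version would turn into records with an empty sequence.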