Question: Suggestions for handling . wildcards within query patterns #123

Answered by BurntSushi
tfwillems asked this question in Q&A

Enumerating all cases is what I would suggest if the total number of patterns doesn't get too crazy. If enumerating all of them is infeasible (i.e., would result in more than low millions), then the next suggestion I have would be to search for a common prefix of the set of all enumerated patterns. So for example, if you have ATCNG, then you'd search for ATC and then run another search to confirm whether a match actually exists at that location. This strategy only works if your prefix leads to a low false positive rate of candidates. ATC, for example, is probably short enough that if you're searching DNA, you'll probably have a very high false positive rate. A long prefix doesn't guarante…
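To make the two suggestions above concrete, here is a minimal sketch in plain Python (standard library only, not the API of any particular search library). It assumes a DNA alphabet of `ACGT` and treats both `.` and `N` as "any base"; the helper names `expand`, `literal_prefix`, `matches_at`, and `find_all` are hypothetical and exist only for illustration. `expand` enumerates every concrete pattern, while `find_all` searches for the wildcard-free prefix and then confirms each candidate position.

```python
from itertools import product

ALPHABET = "ACGT"        # assumption: DNA alphabet
WILDCARDS = {".", "N"}   # assumption: characters meaning "any base"


def expand(pattern):
    """Enumerate every concrete string a wildcard pattern can match."""
    choices = [ALPHABET if ch in WILDCARDS else ch for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


def literal_prefix(pattern):
    """Return the longest wildcard-free prefix of the pattern."""
    for i, ch in enumerate(pattern):
        if ch in WILDCARDS:
            return pattern[:i]
    return pattern


def matches_at(pattern, text, start):
    """Confirm that the full pattern matches text beginning at start."""
    window = text[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(
        (t in ALPHABET) if p in WILDCARDS else (p == t)
        for p, t in zip(pattern, window)
    )


def find_all(pattern, text):
    """Search for the literal prefix, then verify each candidate."""
    prefix = literal_prefix(pattern)
    hits = []
    pos = text.find(prefix)
    while pos != -1:
        if matches_at(pattern, text, pos):
            hits.append(pos)
        pos = text.find(prefix, pos + 1)  # step by one to allow overlaps
    return hits


if __name__ == "__main__":
    print(expand("ATCNG"))
    print(find_all("ATCNG", "GGATCAGTTATCTTATCGG"))
```

The enumeration grows as 4^k for k wildcards under these assumptions, which is why it only works while the total stays manageable. In the prefix approach, every occurrence of `ATC` that fails verification is a false positive, so a short prefix means most of the time is spent in `matches_at` rather than in the fast literal search.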


This discussion was converted from issue #122 on August 15, 2023 19:00.