This repository has been archived by the owner on Mar 17, 2023. It is now read-only.

add prebuilt index for genome.fa #70

Open
lincoln-harris opened this issue Jun 8, 2020 · 2 comments
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@lincoln-harris
Collaborator

lincoln-harris commented Jun 8, 2020

Can we add a prebuilt index for the human genome .gtf / .fa files so that they load much faster?

hg38.fa -> 3 GB
hg38.gtf -> 144 MB

@lincoln-harris lincoln-harris added the enhancement New feature or request label Jun 8, 2020
@lincoln-harris lincoln-harris changed the title add prebuilt indexes add prebuilt index for genome.fa Jun 8, 2020
@lincoln-harris
Collaborator Author

The rate-limiting step here is constructing the genome interval tree, not building the genome.fa index. Not sure what to do about this.

@lincoln-harris lincoln-harris added the wontfix This will not be worked on label Jun 9, 2020
@rvanheusden
Contributor

Have you confirmed that building the interval tree is the main contributor to startup time? If so, it may be worth checking whether most of that time is spent in Python calls or in the low-level C code that NCLS uses for its underlying interval tree implementation. If the latter accounts for most of the time, then it may be worth optimizing the low-level code. Because the interval tree only needs to be built once and can then be used from multiple threads, a custom low-level interval tree implementation that lets multiple threads build it cooperatively might speed up this process.
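
One way to check the Python-versus-native split could look like the sketch below. It uses only the standard library's `cProfile` and `pstats`. `build_index` is a hypothetical stand-in for the real tree construction step (it is not the project's code and does not call NCLS), and `profile_split` is likewise a helper invented for illustration. `cProfile` records built-in functions under the filename `"~"`, so summing self-time by that marker gives a rough split. Whether calls into a compiled extension such as NCLS show up as separate entries depends on how the extension was built, so treat the numbers as a first approximation.

```python
import cProfile
import pstats
import random


def build_index(intervals):
    """Hypothetical stand-in for the real interval-tree build step."""
    # Interpreted work: normalise each interval in a Python-level loop.
    cleaned = []
    for start, end in intervals:
        cleaned.append((min(start, end), max(start, end)))
    # Native work: list.sort runs in C inside the interpreter.
    cleaned.sort()
    return cleaned


def profile_split(func, *args):
    """Run func under cProfile and split self-time into Python vs. native."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    python_time = 0.0
    native_time = 0.0
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, self-time, cumulative time, callers).
    for (filename, _lineno, _name), entry in stats.stats.items():
        tottime = entry[2]
        # Built-in functions are recorded under the filename "~".
        if filename == "~":
            native_time += tottime
        else:
            python_time += tottime
    return result, python_time, native_time


if __name__ == "__main__":
    random.seed(0)
    intervals = [
        (random.randrange(10**6), random.randrange(10**6))
        for _ in range(100_000)
    ]
    _, py_t, c_t = profile_split(build_index, intervals)
    print(f"python self-time: {py_t:.3f}s  native self-time: {c_t:.3f}s")
```

Swapping `build_index` for the actual function that builds the tree should show which side dominates. If nearly all the self-time lands on a single Python function that wraps the extension call, a native-level profiler would be the next step.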
