Uploading a custom genome #1195

Open
br1215 opened this issue Feb 16, 2024 · 19 comments

Comments

@br1215

br1215 commented Feb 16, 2024

Hello,

I am trying to upload a custom genome to HiGlass. I've been looking at the data preparation docs for gene annotation uploads, but I am confused about how to set up the .db file that is produced at the end.

Right now I have a .gff3 file with all the annotations for the genome I need to upload. Is there an easier way to get from .gff3 to the .db file that is shown in https://docs.higlass.io/data_preparation.html#gene-annotation-tracks?

Thank you

@pkerpedjiev
Member

Yeah, it's a massive pain according to that documentation.

I can offer two alternative suggestions:

  1. Try the experimental gene_annotations repo and follow the instructions there.
  2. Post a link to the gff file here and I can try to help you get it converted.

@br1215
Author

br1215 commented Feb 19, 2024

Thank you,

I will try out the gene_annotations repo first and then get back to you about the link.

@br1215
Author

br1215 commented Feb 20, 2024

Hi,

I'm having issues with some of the scripts you've linked. Would you be able to give it a try on your end with this .gff3?
https://drive.google.com/file/d/1SkvbMh1dMjSGWo0EAfllssEJtOMRnQ0s/view?usp=drive_link

Thank you very much.

@pkerpedjiev
Member

pkerpedjiev commented Feb 21, 2024 via email

@br1215
Author

br1215 commented Feb 21, 2024 via email

@pkerpedjiev
Member

pkerpedjiev commented Feb 21, 2024 via email

@pkerpedjiev
Member

Nvm, I was able to create one like this:

grep "sequence-region" data/GFP_synHoxA.gencode.vM20.annotation.gff3 | awk '{ print $2 "\t" $4 }' > data/GRCm38.chromsizes
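For anyone who prefers to do this step in Python, here is a minimal sketch of the same idea: read the `##sequence-region` pragma lines of a GFF3 file and emit `name<TAB>size` rows. The function name `chromsizes_from_gff3` and the sample lines are illustrative, not taken from the file in this thread. Like the awk command, it uses the region's end coordinate as the chromosome size, which assumes each region starts at 1.

```python
import io


def chromsizes_from_gff3(lines):
    """Yield (name, size) for each '##sequence-region' pragma line.

    A pragma line looks like: ##sequence-region <seqid> <start> <end>
    The end coordinate is used as the size (assumes start == 1).
    """
    for line in lines:
        if not line.startswith("##sequence-region"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed pragma, skip it
        yield fields[1], int(fields[3])


# Illustrative input only; a real run would pass an open file handle.
example = io.StringIO(
    "##gff-version 3\n"
    "##sequence-region chr1 1 195471971\n"
    "##sequence-region chr6 1 149736546\n"
    "chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\tID=example_gene\n"
)

sizes = list(chromsizes_from_gff3(example))
for name, size in sizes:
    print(f"{name}\t{size}")
```

One difference from the grep version: this only matches lines that begin with `##sequence-region`, so a feature line that merely contains that text would not be picked up.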

@br1215
Author

br1215 commented Feb 21, 2024 via email

@pkerpedjiev
Member

Here's a link to download the file. It will expire 12 hours from now but I'll be happy to regenerate it.

You can preview what the file looks like on resgen:

https://resgen.io/l/?d=ZoOsGb0WSEGhYvHFHWhbTA

@br1215
Author

br1215 commented Feb 21, 2024 via email

@br1215
Author

br1215 commented Feb 21, 2024

Is there a specific way to ingest the files? It's been stuck on "loading" for a really long time, but nothing is appearing on the top axis.

@br1215
Author

br1215 commented Feb 21, 2024

docker exec higlass-container python higlass-server/manage.py ingest_tileset --filename /tmp/GFP_synHoxA.gencode.vM20.annotation.gff3.hgbed.beddb --filetype gene-annotations --datatype gene-annotation

This was the command I was using to ingest the file.

@br1215
Author

br1215 commented Feb 21, 2024

Screenshot 2024-02-21 at 2.25.11 PM.pdf
Sorry to spam you with multiple messages. I was able to view it, but because the chromosome sizes file is different, there is a discrepancy between the chromosome axis and the gene annotation. I've attached a screenshot where the HoxA cluster should be on chr6, not chr15.
Here is the link to the chromosome sizes file I used previously: https://drive.google.com/file/d/1Q9fGocBPD6x1NRQSCKIKVADqLfeAlkwv/view?usp=drive_link
Would you please be able to generate the gene annotation file using this chromosome sizes file instead?
Also, would it be possible to share how you made the gene annotation file? I may need to make more custom gene annotations down the line.
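To illustrate why a different chromosome sizes file can move a gene onto the wrong chromosome, here is a small sketch. It assumes (my understanding, not stated in this thread) that the aggregated annotation file stores genome-wide positions computed by stacking chromosomes in the order given by the chromosome sizes file, while the chromosome axis maps those positions back using its own file. The toy sizes and helper names are hypothetical.

```python
from itertools import accumulate


def offsets(chromsizes):
    """Map each chromosome name to its cumulative start offset."""
    names = [name for name, _ in chromsizes]
    ends = list(accumulate(size for _, size in chromsizes))
    starts = [0] + ends[:-1]
    return dict(zip(names, starts))


def to_absolute(chrom, pos, chromsizes):
    """Convert (chrom, pos) to a genome-wide position."""
    return offsets(chromsizes)[chrom] + pos


def to_chrom(abs_pos, chromsizes):
    """Convert a genome-wide position back to (chrom, pos)."""
    for name, size in chromsizes:
        if abs_pos < size:
            return name, abs_pos
        abs_pos -= size
    raise ValueError("position is beyond the end of the genome")


# Toy sizes: same chromosomes, different order in the two files.
annotation_order = [("chr1", 100), ("chr10", 50), ("chr2", 80)]
axis_order = [("chr1", 100), ("chr2", 80), ("chr10", 50)]

abs_pos = to_absolute("chr2", 30, annotation_order)
print(abs_pos)                        # 180
print(to_chrom(abs_pos, axis_order))  # ('chr10', 0)
```

In this toy case a gene at chr2:30 is drawn at chr10:0 on the axis, which is the same kind of mismatch as HoxA appearing on chr15 instead of chr6. Using one chromosome sizes file for both the axis and the annotation avoids it.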

@pkerpedjiev
Member

No problem, here's the updated file:

GFP_synHoxA.gencode.vM20.annotation.gff3.hgbed.beddb.gz

And here are the exact commands I ran to create it, using the gene_annotations repo.

FILE=data/gencode.vM32.annotation.gff3.gz
FILE=data/GFP_synHoxA.gencode.vM20.annotation.gff3
python scripts/gff_to_jsonl.py $FILE > $FILE.gjsonl
python scripts/gjsonl_to_hgbed.py --name-attribute gene_name $FILE.gjsonl > $FILE.hgbed
clodius aggregate bedfile --chromsizes-filename data/mm10_synHoxA.chrom.sizes $FILE.hgbed
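Since more custom annotations may be needed later, the three steps above could be wrapped in a small helper. This is only a sketch: `annotation_commands` and `run_all` are hypothetical names, and the script paths and flags are copied from the commands in this comment rather than checked against the repo. It builds the argument lists and prints them; nothing is executed unless `run_all` is called.

```python
import shlex
import subprocess


def annotation_commands(gff3_path, chromsizes_path, name_attribute="gene_name"):
    """Return (argv, stdout_path) pairs mirroring the three steps above."""
    gjsonl = f"{gff3_path}.gjsonl"
    hgbed = f"{gff3_path}.hgbed"
    return [
        (["python", "scripts/gff_to_jsonl.py", gff3_path], gjsonl),
        (["python", "scripts/gjsonl_to_hgbed.py",
          "--name-attribute", name_attribute, gjsonl], hgbed),
        (["clodius", "aggregate", "bedfile",
          "--chromsizes-filename", chromsizes_path, hgbed], None),
    ]


def run_all(steps):
    """Run each step, redirecting stdout to a file where one is given."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


steps = annotation_commands(
    "data/GFP_synHoxA.gencode.vM20.annotation.gff3",
    "data/mm10_synHoxA.chrom.sizes",
)
for argv, stdout_path in steps:
    redirect = f" > {shlex.quote(stdout_path)}" if stdout_path else ""
    print(shlex.join(argv) + redirect)
```

Calling `run_all(steps)` from the root of the gene_annotations checkout would execute the same pipeline, assuming the scripts and clodius are installed there.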

@br1215
Author

br1215 commented Feb 22, 2024 via email

@pkerpedjiev
Member

Lol, that line is an artifact of me copying one line above where I should have started copying. It's for another file I converted previously.

@br1215
Author

br1215 commented Feb 23, 2024

Haha thank you for the clarification!
On a side note, is it possible to load arc plots onto HiGlass?

@pkerpedjiev
Member

Yes, I do believe you can load arc plots onto HiGlass. You have to convert the bedfile into beddb using that same clodius aggregate ... command and then ingest using --filetype beddb --datatype bedlike.

Here's an example of what they should look like:

https://resgen.io/l/?d=FDtY1WmpRiGiEab7hfpFkA

@br1215
Author

br1215 commented Apr 5, 2024

I think there's a disconnect between the files I have and the files that are required for arc plots.
I have been using the ARIMA Capture Hi-C pipeline to analyze my data, and the output arc plots are in this format:
chr6 51148745 51153809 chr6:52196961-52202014,-1.62924053973028 70 .
chr6 51153810 51158843 chr6:52175795-52180921,-1.72455071953461 46 .
chr6 51164005 51169469 chr6:52222766-52227908,-1.99877363861238 160 .
chr6 51169470 51174478 chr6:52008494-52013501,-2.02287119019144 2 .

Each of these files also has a corresponding .tbi file, and I can upload both to the WashU genome browser.
Is there a way to convert this file into a beddb file? I tried using the aggregate command you suggested, but when I ingest the result and load it into HiGlass, nothing is displayed.
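One possible direction, sketched here under assumptions that are not confirmed in this thread: each row above has the first anchor in columns 1 to 3 and the second anchor plus a score packed into column 4 as `chrom:start-end,score`. If the arc track draws one arc per BED interval from its start to its end, then a plain BED row spanning both anchors might be what the aggregate step needs. The function names and the output column layout are hypothetical.

```python
def parse_interaction(line):
    """Split one whitespace-separated row into its two anchors and score."""
    fields = line.split()
    target, _, score_text = fields[3].rpartition(",")
    chrom2, _, span = target.rpartition(":")
    start2, _, end2 = span.partition("-")
    return {
        "chrom1": fields[0], "start1": int(fields[1]), "end1": int(fields[2]),
        "chrom2": chrom2, "start2": int(start2), "end2": int(end2),
        "target": target, "score_text": score_text, "score": float(score_text),
    }


def to_spanning_bed(rec):
    """Return a BED row covering both anchors, or None if they are on
    different chromosomes (a single interval cannot represent that)."""
    if rec["chrom1"] != rec["chrom2"]:
        return None
    start = min(rec["start1"], rec["start2"])
    end = max(rec["end1"], rec["end2"])
    # Columns 4 and 5 (target region, score) are an assumed layout.
    return "\t".join(
        [rec["chrom1"], str(start), str(end), rec["target"], rec["score_text"]]
    )


rows = [
    "chr6 51148745 51153809 chr6:52196961-52202014,-1.62924053973028 70 .",
    "chr6 51153810 51158843 chr6:52175795-52180921,-1.72455071953461 46 .",
]
bed_rows = [to_spanning_bed(parse_interaction(row)) for row in rows]
for bed_row in bed_rows:
    print(bed_row)
```

The first row becomes an interval from 51148745 to 52202014 on chr6. Writing these rows to a file and passing it to the same clodius aggregate command would be the next thing to try, though whether HiGlass then renders the arcs is untested here.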


2 participants