Modules: adding test dataset for testing the hificnv module #1763
…cnv module tesing, genome/pacbio/vcv/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz.
inemesb
left a comment
Thank you for this addition! Is it necessary to add a new reference genome, or could you use one that already exists in the test datasets? The one you added is quite big; if it works with a smaller one, it might be better to align to e.g. 'genome.fasta'.
Very nice descriptions!
Thank you so much for reviewing it. I really appreciate it.
The aligned, phased BAM and MAF VCF files span the entire chr22, so the FASTA file also covers the whole chromosome and is indeed large, as you mentioned. I might try using homo_sapiens/genome/genome.fasta (chr22:16,570,000–16,610,000) to limit the reads to that region, but I’m not sure whether it is based on GRCh38. Do you know? I’ll continue working on it this weekend. Thanks again.
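For illustration, here is a rough sketch of how the reads could be limited to that region. It assumes samtools is installed and the BAM is coordinate-sorted and indexed; the file names and the two helper functions are hypothetical and are not part of this PR. It only builds the `samtools view` command and does not address whether the coordinates in the BAM match the contig in genome.fasta.

```python
import re
import shlex


def parse_region(text):
    """Parse a region such as 'chr22:16,570,000-16,610,000' into (contig, start, end)."""
    # Accept either a plain hyphen or an en dash between the two coordinates.
    m = re.fullmatch(r"\s*([^:\s]+):([\d,]+)\s*[-\u2013]\s*([\d,]+)\s*", text)
    if m is None:
        raise ValueError(f"not a region: {text!r}")
    contig = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {text!r}")
    return contig, start, end


def samtools_view_command(in_bam, out_bam, region):
    """Build (but do not run) a samtools command that keeps reads overlapping the region."""
    contig, start, end = parse_region(region)
    # samtools expects the region without thousands separators: contig:start-end
    return ["samtools", "view", "-b", "-o", out_bam, in_bam, f"{contig}:{start}-{end}"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = samtools_view_command("input.bam", "subset.bam", "chr22:16,570,000\u201316,610,000")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` if one actually wants to execute it; the subset BAM would still need to be re-indexed afterwards.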
Yes, I think so based on what I see: https://github.com/nf-core/test-datasets/blob/modules/data/genomics/homo_sapiens/README.md#reference-files
@inemesb Thanks for reviewing! I found that the genomics/homo_sapiens/pacbio directory already includes a suitable genome3.fasta, BAM, and VCF files for the hificnv module I submitted. I wasn’t very familiar with nf-core’s test datasets before, but I’m getting the hang of it now. Since the hificnv module no longer needs the test datasets I made, I will close this PR. Thank you again for your help.
I would like to add the following dataset for testing the hificnv module: (1) /data/genomics/homo_sapiens/pacbio/bam/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam, (2) /data/genomics/homo_sapiens/genome/GRCh38_chr22.fa, and (3) /data/genomics/homo_sapiens/pacbio/vcf/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz, along with their index files. These were made specifically for testing the hificnv module and a future pb-cpg-tools module. The existing test datasets do not have sufficient reads across a chromosome to run hificnv, so I included a downsized, aligned (against GRCh38), and SNV-phased (hiphased) BAM with 1000 reads across chr22 to test hificnv, together with the minor allele files produced by DeepVariant for this specific BAM file.