@chaochaowong chaochaowong commented Oct 23, 2025

I would like to add the following datasets for testing the hificnv module: (1) /data/genomics/homo_sapiens/pacbio/bam/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam, (2) /data/genomics/homo_sapiens/genome/GRCh38_chr22.fa, and (3) /data/genomics/homo_sapiens/pacbio/vcf/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz, along with their index files. These were made specifically for testing the hificnv module and a future pb-cpg-tools module. The existing test datasets do not have sufficient reads across a chromosome to run hificnv, so I included a BAM that is downsampled to 1000 reads across chr22, aligned against GRCh38, and SNV-phased with HiPhase, together with the minor allele VCF produced by DeepVariant for this specific BAM file.

…cnv module

testing,
genome/pacbio/vcv/SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz.
@inemesb inemesb left a comment

Thank you for this addition! Is it necessary to add a new reference genome, or could you use one that already exists in the test datasets? The one you added is quite big. If a smaller one works, it might be better to align to an existing file such as 'genome.fasta'.

Very nice descriptions!

Author

Thank you so much for reviewing it. I really appreciate it.
The aligned, phased BAM and the MAF VCF files span the entire chr22, so the FASTA file also covers the whole chromosome and is indeed large, as you mentioned. I might try using homo_sapiens/genome/genome.fasta (chr22:16,570,000–16,610,000) to limit the reads to that region, but I’m not sure whether it is based on GRCh38. Do you know? I’ll continue working on it this weekend. Thanks again.

Author

@chaochaowong chaochaowong Nov 9, 2025

@inemesb Thanks for reviewing! I found that the genomics/homo_sapiens/pacbio directory already includes a suitable genome3.fasta, BAM, and VCF files for the hificnv module I submitted. I wasn’t very familiar with nf-core’s test datasets before, but I’m getting the hang of them now. Since the hificnv module no longer needs the test datasets I made, I will close this PR. Thank you again for your help.
