Closed
6 changes: 6 additions & 0 deletions README.md

Very nice descriptions!

Author

Thank you so much for reviewing it. I really appreciate it.
The aligned and phased BAM and MAF VCF files span the entire chr22, so the FASTA file also covers the whole chromosome and is indeed large, as you mentioned. I might try using homo_sapiens/genome/genome.fasta (chr22:16,570,000–16,610,000) instead and restrict the reads to that region, but I’m not sure whether it is based on GRCh38. Do you know? I’ll continue working on it this weekend. Thanks again.

@chaochaowong (Author), Nov 9, 2025

@inemesb Thanks for reviewing! I found that the genomics/homo_sapiens/pacbio directory includes suitable genome3.fasta, BAM, and VCF files for the hificnv module I submitted. I wasn’t very familiar with nf-core’s test datasets before, but I’m getting the hang of it now. Since the hificnv module no longer needs the test datasets I made, I will close this PR. Thank you again for your help.

@@ -388,6 +388,8 @@ The earth sciences folder contain subfolders for different data formats encounte
- genome3.fasta: Reference fasta based on chr19:45760000-45770300
- genome_motifs.txt: TF motifs used for cellranger-atac
- genome.NC_012920_1.gb: Contains mtDNA reference genome in Genbank format
- GRCh38_chr22.fasta: GRCh38 reference fasta based on chr22
- GRCh38_chr22.fasta.fai: index file for 'GRCh38_chr22.fasta'
- transcriptome.fasta: Reference transcriptome based on `genome.fasta`
- gff3: Encode GFF3 file downsampled based on reference position
- gtf: Encode GTF file downsampled based on reference position, `genome_minimal.gtf` is a minimal version containing only the standard fields
@@ -609,6 +611,8 @@ The earth sciences folder contain subfolders for different data formats encounte
- NA03697B2_downsampled.pbmm2.repeats.bai: associated index to NA03697B2_downsampled.pbmm2.repeats.bam
- NA037562_downsampled.pbmm2.repeats.bam: subsample of puretarget pacbio reads from the [public pacbio dataset](https://downloads.pacbcloud.com/public/dataset/PureTargetRE/Coriell/PBMM2-BAM-Input-For-IGV-And-TRGT/) aligned to genome3.fasta
- NA037562_downsampled.pbmm2.repeats.bai: associated index to NA037562_downsampled.pbmm2.repeats.bam
- SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam: AML cell line sample downsized to 1000 pbmm2-aligned, SNV-phased, sorted reads scattered across chr22; made for testing the 'hificnv' and 'pb-cpg-tools' modules.
- SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam.csi: index file of 'SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam'
- bed:
- alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned_tc.bed: first set of gene models generated by TAMA collapse
- alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned_tc.2.bed: first set of gene models generated by TAMA collapse
@@ -628,6 +632,8 @@ The earth sciences folder contain subfolders for different data formats encounte
- FAM_snvs_annotated_ranked.vcf.gz: VCF file from HG002, only with ch16 generated from deepvariant and GLnexus
- FAM.ped: ped file associated with HG002
- peddy.sites: peddy standard hg38 sites downsampled to only chr16
- SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz: VCF file associated with 'SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.bam'
- SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz.tbi: Index file associated with 'SCRI_KT5028_GRCh38_downsampled_on_chr22_pbmm2_snv_hiphased.vcf.gz'

- popgen:
- plink_simulated.bed: case-control simulated variants dataset in PLINK binary format
508,186 changes: 508,186 additions & 0 deletions data/genomics/homo_sapiens/genome/GRCh38_chr22.fasta

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/genomics/homo_sapiens/genome/GRCh38_chr22.fasta.fai
@@ -0,0 +1 @@
chr22 50818468 7 100 101
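The .fai line above uses the samtools faidx index layout: sequence name, length in bases, byte offset of the first base, bases per line, and bytes per line (including the newline). As a minimal sketch (Python, not part of this PR; the helper names parse_fai_line, byte_offset, fetch and expected_fasta_lines are hypothetical), these five fields are enough to seek straight to a window such as the chr22:16,570,000 to 16,610,000 region mentioned in the discussion. They also account for the 508,186 lines added for GRCh38_chr22.fasta: one header line plus 508,185 sequence lines of up to 100 bases.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class FaiRecord:
    """First five columns of one line of a samtools-style .fai index."""
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases on each full sequence line
    line_width: int  # bytes on each full line, including the newline


def parse_fai_line(line: str) -> FaiRecord:
    # .fai files are tab-separated; split() also accepts the spaces seen in the rendered diff.
    name, length, offset, line_bases, line_width = line.split()[:5]
    return FaiRecord(name, int(length), int(offset), int(line_bases), int(line_width))


def byte_offset(rec: FaiRecord, pos0: int) -> int:
    """Byte offset in the FASTA file of the base at 0-based position pos0."""
    if not 0 <= pos0 < rec.length:
        raise ValueError(f"position {pos0} is outside {rec.name} (length {rec.length})")
    full_lines, column = divmod(pos0, rec.line_bases)
    return rec.offset + full_lines * rec.line_width + column


def fetch(fh: BinaryIO, rec: FaiRecord, start: int, end: int) -> str:
    """Bases of the 1-based, inclusive region start-end, read by seeking (no full scan)."""
    if start > end:
        raise ValueError("start must not exceed end")
    first = byte_offset(rec, start - 1)
    last = byte_offset(rec, end - 1)
    fh.seek(first)
    raw = fh.read(last - first + 1)
    # Drop the line terminators that fall inside the region.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def expected_fasta_lines(rec: FaiRecord) -> int:
    """One header line plus ceil(length / line_bases) sequence lines."""
    return 1 + (rec.length + rec.line_bases - 1) // rec.line_bases


# The index line from GRCh38_chr22.fasta.fai in this PR.
chr22 = parse_fai_line("chr22\t50818468\t7\t100\t101")
print(byte_offset(chr22, 0))        # 7 (">chr22\n" occupies bytes 0-6)
print(expected_fasta_lines(chr22))  # 508186

# A toy FASTA with 4 bases per line, to show a region spanning a line break.
toy = io.BytesIO(b">t\nACGT\nTTGA\nCC\n")
toy_rec = FaiRecord("t", length=10, offset=3, line_bases=4, line_width=5)
print(fetch(toy, toy_rec, 3, 6))    # GTTT
```

With the real FASTA opened in binary mode, fetch(fh, chr22, 16570000, 16610000) would return the 40,001 bases of that window without reading the rest of the chromosome; in practice, samtools faidx GRCh38_chr22.fasta chr22:16570000-16610000 does the same job from the command line.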
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.