Skip to content

Commit cc0969f

Browse files
committed
Add flag worked-example
1 parent f901494 commit cc0969f

File tree

1 file changed

+45
-21
lines changed

1 file changed

+45
-21
lines changed

Day1/ngs-intro.Rmd

Lines changed: 45 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -254,7 +254,8 @@ GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
254254
+ Q = 20, p=0.01
255255
+ Q = 10, p=0.1
256256
- These numeric quanties are *encoded* as **ASCII** code
257-
+ Sometimes an offset is used before encoding
257+
+ An offset needs to be used before encoding
258+
+ At least 33 to get to meaningful characters
258259

259260
## Fastq quality scores
260261

@@ -276,6 +277,9 @@ GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
276277
- **S**equence **A**lignment/**M**ap (sam) http://samtools.github.io/hts-specs/SAMv1.pdf
277278
- *Header* lines followed by tab-delimited lines
278279
+ Header gives information about the alignment and references sequences used
280+
- Same format regardless of sequencing protocol (i.e. RNA-seq, ChIP-seq, DNA-seq etc)
281+
- May contain un-mapped reads
282+
279283
```
280284
@HD VN:1.0 SO:coordinate
281285
@SQ SN:chr1 LN:249250621
@@ -325,25 +329,27 @@ CC:Z:chr15 CP:i:102518319 XS:A:+ HI:i:0
325329
+ Read is unmapped
326330
+ Read is paired / unpaired
327331
+ Read failed QC
328-
+ Read is a PCR duplicate
329-
- https://broadinstitute.github.io/picard/explain-flags.html
332+
+ Read is a PCR duplicate (see later)
330333

331-
![sam-flags](../images/sam-flags.png)
332334

333-
## Aligned reads - bam
334335

335-
- *Exactly* the same information as a sam file
336-
- ..except that it is *binary* version of sam
337-
- compressed around x4
338-
- Attempting to read will print garbage to the screen
339-
- bam files can be indexed
340-
+ Produces an index file with the same name as the bam file, but with **.bai** extension
336+
## Derivation
341337

342-
```
343-
samtools view mysequences.bam | head
338+
```{r echo=FALSE,cache=TRUE}
339+
suppressPackageStartupMessages(library(GenomicAlignments))
340+
mybam <- "HG00096.chr22.bam"
341+
bam.extra <- readGAlignments(file=mybam,param=ScanBamParam(what=c("flag")))
342+
flags <- mcols(bam.extra)$flag
343+
flagMat <- bamFlagAsBitMatrix(flags)
344+
df <- data.frame(ReadHasProperty = as.logical(flagMat[1,]),Binary=flagMat[1,] ,MultiplyBy=2^(0:10))
345+
knitr::kable(df)
344346
```
345347

346-
- N.B The sequences can be extracted by various tools to give *fastq*
348+
Value of flag is given by `r paste(df$Binary,df$MultiplyBy,sep="x",collapse=" + ")` = `r sum(df$Binary * t(df$MultiplyBy))`
349+
350+
See also
351+
352+
- https://broadinstitute.github.io/picard/explain-flags.html
347353

348354
## samtools flagstat
349355

@@ -367,15 +373,20 @@ $ samtools flagstat NA19914.chr22.bam
367373
368374
```
369375

370-
## Aligned files in IGV
371-
372-
- Once our bam files have been *indexed* we can view them in IGV
373-
- This is **highly recommended**
374-
- Check-out [our colleagues' course](http://mrccsc.github.io/IGV_course/) for more details
376+
## Aligned reads - bam
375377

376-
![igv](../images/igv_screenshot1.png)
378+
- *Exactly* the same information as a sam file
379+
- ..except that it is *binary* version of sam
380+
- compressed around x4
381+
- Attempting to read will print garbage to the screen
382+
- bam files can be indexed
383+
+ Produces an index file with the same name as the bam file, but with **.bai** extension
377384

385+
```
386+
samtools view mysequences.bam | head
387+
```
378388

389+
- N.B The sequences can be extracted by various tools to give *fastq*
379390

380391

381392

@@ -416,6 +427,17 @@ autoplot(dupReads,aes(fill=pcrDuplicate)) + scale_fill_manual(values = c("black"
416427
+ Allow efficient access
417428
+ [***samtools***](http://www.htslib.org/)
418429

430+
## Aligned files in IGV
431+
432+
- Once our bam files have been *indexed* we can view them in IGV
433+
- This is **highly recommended**
434+
- Check-out [our colleagues' course](http://mrccsc.github.io/IGV_course/) for more details
435+
436+
![igv](../images/igv_screenshot1.png)
437+
438+
439+
440+
419441

420442

421443

@@ -448,6 +470,8 @@ variableStep chrom=chr2
448470
300705 12.5
449471
```
450472

473+
# Crash-course in R
474+
451475
## Advantages of R
452476

453477
![NYT](../images/NYTimes_R_Article.png)
@@ -465,7 +489,7 @@ The R programming language is now recognised beyond the academic community as an
465489
- Facilitating ***Reproducible Research***
466490

467491

468-
# Crash-course in R
492+
469493

470494

471495

0 commit comments

Comments
 (0)