@@ -254,7 +254,8 @@ GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
254254 + Q = 20, p=0.01
255255 + Q = 10, p=0.1
256256- These numeric quanties are * encoded* as ** ASCII** code
257- + Sometimes an offset is used before encoding
257+ + An offset needs to be used before encoding
258+ + At least 33 to get to meaningful characters
258259
259260## Fastq quality scores
260261
@@ -276,6 +277,9 @@ GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
276277- ** S** equence ** A** lignment/** M** ap (sam) http://samtools.github.io/hts-specs/SAMv1.pdf
277278- * Header* lines followed by tab-delimited lines
278279 + Header gives information about the alignment and references sequences used
280+ - Same format regardless of sequencing protocol (i.e. RNA-seq, ChIP-seq, DNA-seq etc)
281+ - May contain un-mapped reads
282+
279283```
280284@HD VN:1.0 SO:coordinate
281285@SQ SN:chr1 LN:249250621
@@ -325,25 +329,27 @@ CC:Z:chr15 CP:i:102518319 XS:A:+ HI:i:0
325329 + Read is unmapped
326330 + Read is paired / unpaired
327331 + Read failed QC
328- + Read is a PCR duplicate
329- - https://broadinstitute.github.io/picard/explain-flags.html
332+ + Read is a PCR duplicate (see later)
330333
331- ![ sam-flags] ( ../images/sam-flags.png )
332334
333- ## Aligned reads - bam
334335
335- - * Exactly* the same information as a sam file
336- - ..except that it is * binary* version of sam
337- - compressed around x4
338- - Attempting to read will print garbage to the screen
339- - bam files can be indexed
340- + Produces an index file with the same name as the bam file, but with ** .bai** extension
336+ ## Derivation
341337
342- ```
343- samtools view mysequences.bam | head
338+ ``` {r echo=FALSE,cache=TRUE}
339+ suppressPackageStartupMessages(library(GenomicAlignments))
340+ mybam <- "HG00096.chr22.bam"
341+ bam.extra <- readGAlignments(file=mybam,param=ScanBamParam(what=c("flag")))
342+ flags <- mcols(bam.extra)$flag
343+ flagMat <- bamFlagAsBitMatrix(flags)
344+ df <- data.frame(ReadHasProperty = as.logical(flagMat[1,]),Binary=flagMat[1,] ,MultiplyBy=2^(0:10))
345+ knitr::kable(df)
344346```
345347
346- - N.B The sequences can be extracted by various tools to give * fastq*
348+ Value of flag is given by ` r paste(df$Binary,df$MultiplyBy,sep="x",collapse=" + ") ` = ` r sum(df$Binary * t(df$MultiplyBy)) `
349+
350+ See also
351+
352+ - https://broadinstitute.github.io/picard/explain-flags.html
347353
348354## samtools flagstat
349355
@@ -367,15 +373,20 @@ $ samtools flagstat NA19914.chr22.bam
367373
368374```
369375
370- ## Aligned files in IGV
371-
372- - Once our bam files have been * indexed* we can view them in IGV
373- - This is ** highly recommended**
374- - Check-out [ our colleagues' course] ( http://mrccsc.github.io/IGV_course/ ) for more details
376+ ## Aligned reads - bam
375377
376- ![ igv] ( ../images/igv_screenshot1.png )
378+ - * Exactly* the same information as a sam file
379+ - ..except that it is * binary* version of sam
380+ - compressed around x4
381+ - Attempting to read will print garbage to the screen
382+ - bam files can be indexed
383+ + Produces an index file with the same name as the bam file, but with ** .bai** extension
377384
385+ ```
386+ samtools view mysequences.bam | head
387+ ```
378388
389+ - N.B The sequences can be extracted by various tools to give * fastq*
379390
380391
381392
@@ -416,6 +427,17 @@ autoplot(dupReads,aes(fill=pcrDuplicate)) + scale_fill_manual(values = c("black"
416427 + Allow efficient access
417428 + [ *** samtools*** ] ( http://www.htslib.org/ )
418429
430+ ## Aligned files in IGV
431+
432+ - Once our bam files have been * indexed* we can view them in IGV
433+ - This is ** highly recommended**
434+ - Check-out [ our colleagues' course] ( http://mrccsc.github.io/IGV_course/ ) for more details
435+
436+ ![ igv] ( ../images/igv_screenshot1.png )
437+
438+
439+
440+
419441
420442
421443
@@ -448,6 +470,8 @@ variableStep chrom=chr2
448470300705 12.5
449471```
450472
473+ # Crash-course in R
474+
451475## Advantages of R
452476
453477![ NYT] ( ../images/NYTimes_R_Article.png )
@@ -465,7 +489,7 @@ The R programming language is now recognised beyond the academic community as an
465489- Facilitating *** Reproducible Research***
466490
467491
468- # Crash-course in R
492+
469493
470494
471495
0 commit comments