Introduction to DESeq2

Posted May 12, 2020 2020-05-12T15:30:00+08:00 by Zefeng Zhu

Updated May 21, 2020 2020-05-21T12:03:54+08:00

Goal

Want their normalization to handle:

The goal is calculate a scaling factor for each factor. The scaling factor has to take read depth and library composition into account.

Take the log of all the values
- (default) log base e: $\log_{e}$
Average each row
- average of log values (after raised by e, this is geometric average)
- $\cfrac{1+\text{Infity}}{2}=\text{Infity}$
Filter Out Genes with Infinity
- filter out genes with zero read counts in one or more samples
- help focus the scaling factors on the house keeping genes - genes that transcribed at similar levels regardless of tissue type
Subtract the average log value from the log(counts)
- $\log(\text{reads for gene X})-\log(\text{average for gene X})=\log(\cfrac{\text{reads for gene X}}{\text{average for gene X}})$
Calculate the median of the ratios for each sample
- avoid extreme genes from swaying the value too much in one direction
- genes with huge differences in expression have no more influence on the median than genes with minor differences
Convert the medians to “normal numbers” to get the final scaling factors for each sample
- raise e to the median value for each sample
Divide the original read counts by the scaling factors

Personal Note: 本质上就是$\cfrac{x\cdot\text{Geometric Average}}{\text{Median}}$

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8
StatQuest: DESeq2, part 1, Library Normalization