Introduction to DESeq2


DESeq2's normalization is designed to handle:

  1. Differences in library sizes
  2. Differences in library composition

The goal is to calculate a scaling factor for each sample. The scaling factor has to take read depth and library composition into account.


  • Take the log of all the values
    • (default) log base e: $\log_{e}$
  • Average each row
    • the average of the log values (raising e to this average gives the geometric average)
    • a zero count in any sample makes the average infinite, since $\log(0)=-\infty$: $\cfrac{1+(-\infty)}{2}=-\infty$
  • Filter Out Genes with Infinity
    • filter out genes with zero read counts in one or more samples
    • this helps focus the scaling factors on the housekeeping genes: genes that are transcribed at similar levels regardless of tissue type
  • Subtract the average log value from the log(counts)
    • $\log(\text{reads for gene X})-\log(\text{average for gene X})=\log(\cfrac{\text{reads for gene X}}{\text{average for gene X}})$
  • Calculate the median of the ratios for each sample
    • using the median keeps extreme genes from swaying the value too much in one direction
    • genes with huge differences in expression have no more influence on the median than genes with minor differences
  • Convert the medians to “normal numbers” to get the final scaling factors for each sample
    • raise e to the median value for each sample
  • Divide the original read counts by the scaling factors
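
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the median-of-ratios idea as described here, not DESeq2's actual code; the function names `size_factors` and `normalize` and the toy count table are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Return one scaling factor per sample.

    counts: list of rows, one row per gene, one raw count per sample.
    (Hypothetical helper for illustration; not part of DESeq2.)
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # log(0) is -infinity, so genes with a zero count in any sample are skipped
        if any(c == 0 for c in row):
            continue
        logs = [math.log(c) for c in row]
        mean_log = sum(logs) / n_samples  # log of the geometric average
        for j, value in enumerate(logs):
            # log(count) - log(average) = log(count / average)
            log_ratios[j].append(value - mean_log)
    # Median of the log ratios per sample, converted back with e^x
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the scaling factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (assumed): 4 genes x 2 samples, sample 2 sequenced twice as deep
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # dropped when computing factors because of the zero
]
factors = size_factors(toy)
normalized = normalize(toy)
```

For this toy table the factors come out near 0.707 and 1.414, so the first three genes end up with matching normalized counts in both samples. If every gene contains a zero, no ratios remain and `median` raises an error, which this sketch does not handle.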

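Personal Note: essentially, each normalized count is $\cfrac{x}{\text{Median}\left(x/\text{Geometric Average}\right)}$, where the median is taken over the genes within a sample.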


  1. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8
  2. StatQuest: DESeq2, part 1, Library Normalization
This post is licensed under CC BY 4.0 by the author.