Syllabus: Applied Statistics for High-Throughput Biology

Instructor

Levi Waldron, PhD
Associate Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.

Email: lwaldron.research@gmail.com
Hangouts: lwaldron.research
Skype: levi.waldron

Preparation

Please come to the first class with the following installed:

Please create an account at www.github.com and use it to introduce yourself at https://github.com/waldronlab/AppStatBio/issues.

Summary

This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis, and batch effects.

Textbook

Related Resources

Labs

Each day will include a hands-on lab session that students should attempt in full.

Session detail by day

All course materials will be available from https://github.com/waldronlab/AppStatBio/.

  1. introduction
    • random variables
    • distributions
    • hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
    • hypothesis testing for categorical variables (Fisher's exact test, chi-square test)
    • data manipulation using dplyr
  2. linear modeling
    • linear and generalized linear modeling
    • model matrix and model formulae
    • multiple testing
  3. unsupervised analysis
    • graphics for exploratory data analysis
    • distance in high dimensions
    • principal components analysis and multidimensional scaling
    • unsupervised clustering
    • batch effects
  4. multi'omic data analysis lab session
    • core data classes in Bioconductor: GRanges, SummarizedExperiment, RaggedExperiment, MultiAssayExperiment
    • creating a MultiAssayExperiment
    • subsetting, reshaping, growing, and extraction of a MultiAssayExperiment
    • plotting, correlation, and other statistical analyses
    • multi'omics lab code and html
