Levi Waldron, PhD
Associate Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.
Email: lwaldron.research@gmail.com
Hangouts: lwaldron.research
Skype: levi.waldron
Please come to the first class with the following installed:
- Bioconductor www.bioconductor.org/install
- RStudio: https://www.rstudio.com/products/rstudio/download3/
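As a rough guide, Bioconductor is usually installed from within R using the BiocManager package. The lines below are a minimal sketch that assumes a recent version of R and an internet connection. The instructions at www.bioconductor.org/install are authoritative if they differ.

```r
# Install the BiocManager helper package from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install (or update) the core Bioconductor packages for this version of R
BiocManager::install()
```

Run these lines in the R console, for example inside RStudio. Additional packages needed for the labs can be installed later with BiocManager::install("PackageName"), where PackageName is a placeholder.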
Please create an account at www.github.com, and use it to introduce yourself at https://github.com/waldronlab/AppStatBio/issues.
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis, and batch effects.
- Biomedical Data Science by Irizarry and Love (ePub version)
- Source repository
Each day will include a hands-on lab session that students should attempt in full.
All course materials will be available from https://github.com/waldronlab/AppStatBio/.
- introduction
- random variables
- distributions
- hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
- hypothesis testing for categorical variables (Fisher's exact test, chi-square test)
- data manipulation using dplyr
- linear modeling
- linear and generalized linear modeling
- model matrix and model formulae
- multiple testing
- unsupervised analysis
- graphics for exploratory data analysis
- distance in high dimensions
- principal components analysis and multidimensional scaling
- unsupervised clustering
- batch effects
- multi'omic data analysis lab session