Levi Waldron, PhD
Associate Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.
Email: lwaldron.research@gmail.com
Hangouts: lwaldron.research
Skype: levi.waldron
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis, and batch effects.
- Biomedical Data Science by Irizarry and Love (ePub version)
- Source repository
Each day will include a hands-on lab session that students should attempt in full.
All course materials will be available from https://github.com/waldronlab/AppStatBio/.
- introduction
- random variables
- distributions
- hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
- data manipulation using dplyr
- non-parametric approaches
- hypothesis tests for categorical variables (chi-square test, Fisher's exact test)
- Monte Carlo simulation
- permutation tests
- bootstrap simulation
- exploratory data analysis
- linear modeling
- linear and generalized linear modeling
- model matrix and model formulae
- multiple testing
- unsupervised analysis
- graphics for exploratory data analysis
- distance in high dimensions
- principal components analysis and multidimensional scaling
- unsupervised clustering
- batch effects