Levi Waldron, PhD
Assistant Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.
In-person at CIBIO: Orange 1
Email: lwaldron.research@gmail.com
Hangouts: lwaldron.research
Skype: levi.waldron
Book appointments at: http://www.calendly.com/lwaldron
Classes will take place in Room Molveno at the following times. You can add the AppStatHTB calendar to your own from the HTML or iCal.
- Wed. 10 Feb, 10-13
- Thu. 18 Feb, 10-13
- Wed. 24 Feb, 10:30-13:30
- Fri. 04 Mar, 10-13
- Thu. 10 Mar, 10-13
- Thu. 17 Mar, 10-13
- Thu. 24 Mar, 10-13
- Thu. 31 Mar, 10-13 (presentations)
- YouTube channel (lectures will be recorded here)
- Private Google Group for discussion and announcements.
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, hypothesis testing, linear modeling, principal components analysis, unsupervised clustering, cross-validation and bootstrap resampling methods.
- Biomedical Data Science by Irizarry and Love (ePub version)
- Source repository
- GitHub resources at http://waldronlab.github.io/github/
- Resources for learning R at http://waldronlab.github.io/learnr/
- Other resources at http://waldronlab.github.io/
Three components are required to complete the class: 1) completion of lab exercises each week, 2) an oral presentation of the final project, and 3) a written report on the final project. See details below.
Each week will include a hands-on lab session. Students are required to hand in the lab exercises before the following class by committing them to this GitHub repository. You can work together on lab exercises, but everyone must hand in their own individual lab.
A class project involving the analysis of a genomics dataset will be handed out after the third week of class. Each student will analyse their own dataset and prepare an individual report. Students will present their projects in two ways:
- A 20-minute project presentation on March 31
- A written report on the same project, prepared using R Markdown, due April 4
Project presentations and reports will be assessed for quality of analysis, and also for presentation style and clarity.
- Week 1
  - introduction to R
  - random variables
  - distributions
- Week 2
  - populations and samples
  - Central Limit Theorem
  - t-distribution
- Week 3
  - hypothesis testing
  - type I and II error and power
  - confidence intervals
- Week 4
  - hypothesis tests for categorical variables (chi-square, Fisher's Exact)
  - Monte Carlo simulation
  - permutation tests
  - bootstrap simulation
  - exploratory data analysis
- Week 5
  - distance in high dimensions
  - singular value decomposition
  - principal components analysis and multidimensional scaling
- Week 6
  - unsupervised clustering
  - batch effects
- Week 7
  - linear modeling
  - model matrix and model formulae
- Week 8: project presentations