mstack-space/AppStatBio

Syllabus: Applied Statistics for High-Throughput Biology

Instructor

Levi Waldron, PhD
Assistant Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.

In-person at CIBIO: Orange 1
Email: lwaldron.research@gmail.com
Hangouts: lwaldron.research
Skype: levi.waldron

Book appointments at: http://www.calendly.com/lwaldron

Times and Places

Classes will take place in Room Molveno at the following times. You can add the AppStatHTB calendar to your own using the HTML or iCal link.

  1. Wed. 10 Feb, 10-13
  2. Thu. 18 Feb, 10-13
  3. Wed. 24 Feb, 10:30-13:30
  4. Fri. 04 Mar, 10-13
  5. Thu. 10 Mar, 10-13
  6. Thu. 17 Mar, 10-13
  7. Thu. 24 Mar, 10-13
  8. Thu. 31 Mar, 10-13 (presentations)

Important Class Links

Summary

This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, hypothesis testing, linear modeling, principal components analysis, unsupervised clustering, cross-validation and bootstrap resampling methods.

Textbook

Related Resources

Evaluation

Three components are required for class completion: 1) completion of lab exercises each week, 2) oral presentation of the final project, 3) written report of the final project. See details below.

Labs

Each week will include a hands-on lab session. Students are required to hand in the lab exercises before the following class by committing them to this GitHub repository. You can work together on lab exercises, but everyone must hand in their own individual lab.

Projects

A class project involving the analysis of a genomics dataset will be handed out after the third week of class. Each student will analyse their own dataset and prepare an individual report. Students will present their projects in two ways:

  1. A 20-minute project presentation on March 31
  2. A written report on the same project, prepared using R Markdown, due April 4

Project presentations and reports will be assessed for quality of analysis, and also for presentation style and clarity.

Session detail by week

  • Week 1

    • introduction to R
    • random variables
    • distributions
  • Week 2

    • populations and samples
    • Central Limit Theorem
    • t-distribution
  • Week 3

    • hypothesis testing
    • type I and II error and power
    • confidence intervals
  • Week 4

    • hypothesis tests for categorical variables (chi-square, Fisher's Exact)
    • Monte Carlo simulation
    • permutation tests
    • bootstrap simulation
    • exploratory data analysis
  • Week 5

    • distance in high dimensions
    • singular value decomposition
    • principal components analysis and multidimensional scaling
  • Week 6

    • unsupervised clustering
    • batch effects
  • Week 7

    • linear modeling
    • model matrix and model formulae
  • Week 8: project presentations
