2010 Federal STEM Education Inventory Data Set
Stage 1:
- Calculate the % growth in funding between FY2008 and FY2009.
- If the growth is positive, tag the row as 1; if the growth is negative, tag it as 0. This is the target variable.
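
A minimal sketch of the Stage 1 calculation, written on plain numbers so the functions are easy to unit test. The function names are hypothetical. Two choices below are assumptions, since the brief does not cover them: growth is left undefined when FY2008 funding is zero, and rows with zero or undefined growth are left untagged.

```python
from typing import Optional


def pct_growth(fy2008: float, fy2009: float) -> Optional[float]:
    """Percentage growth in funding from FY2008 to FY2009.

    Returns None when FY2008 funding is zero, because the ratio is
    undefined there (assumption: such rows are handled separately).
    """
    if fy2008 == 0:
        return None
    return (fy2009 - fy2008) / fy2008 * 100.0


def growth_target(growth: Optional[float]) -> Optional[int]:
    """Tag positive growth as 1 and negative growth as 0.

    Zero or undefined growth returns None (assumption: the brief does
    not say how to label those cases, so they stay untagged here).
    """
    if growth is None or growth == 0:
        return None
    return 1 if growth > 0 else 0


# Example with made-up (FY2008, FY2009) funding pairs.
rows = [(50.0, 75.0), (80.0, 60.0), (0.0, 40.0)]
targets = [growth_target(pct_growth(a, b)) for a, b in rows]
```

In a notebook these functions could be applied row by row to the two funding columns of the data set; the exact column names depend on the file as loaded.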
Stage 2:
- Create graphs of the univariate distribution of every non-funding variable and share them in a Jupyter notebook. For reference, the "funding variables" are Funding FY2008, FY2009 and FY2010.
- Calculate the mutual_info_score between the target variable created in Stage 1 and every non-funding variable, and share the results in a Jupyter notebook.
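
To make the second Stage 2 step concrete, here is a small pure-Python sketch of the quantity that `sklearn.metrics.mutual_info_score` reports: the mutual information, in nats, between two vectors of discrete labels. It is an illustration only, and the helper names and the column names in the example are hypothetical. In the notebook itself the scikit-learn function would be the one to call. Note that both inputs are treated as categorical labels, so a continuous variable would need to be binned first.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, Mapping, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two equally long label sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts.
        mi += (c / n) * log(c * n / (count_x[xv] * count_y[yv]))
    return mi


def score_features(
    features: Mapping[str, Sequence[Hashable]], target: Sequence[Hashable]
) -> Dict[str, float]:
    """Mutual information of each feature column with the target."""
    return {name: mutual_information(values, target) for name, values in features.items()}


# Example with made-up data; the column names are placeholders.
target = [1, 1, 0, 0]
features = {
    "agency": ["A", "A", "B", "B"],        # perfectly aligned with the target
    "program_type": ["x", "y", "x", "y"],  # independent of the target
}
scores = score_features(features, target)
```

A higher score means the variable carries more information about the target; a score of zero means the two look independent in the sample.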
Stage 3:
- Divide the data into train and test samples (70-30 split).
- Select features and build an xgboost model. You will be judged on the roc_auc_score on the test sample.
- Write test cases for all user-defined functions using the pytest framework.
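
As an illustration of the testing requirement, the sketch below pairs one user-defined function with pytest-style tests. The helper `train_test_split_indices` is hypothetical and exists here only so there is something to test; in practice `sklearn.model_selection.train_test_split` with `test_size=0.3` would normally do the 70-30 split. The model-building step is not shown.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n_rows: int, test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# pytest collects functions named test_* and reports a failed assert as a failure.
def test_split_sizes_are_70_30():
    train, test = train_test_split_indices(100)
    assert len(train) == 70
    assert len(test) == 30


def test_split_is_a_partition():
    # Every row index lands in exactly one of the two samples.
    train, test = train_test_split_indices(50, seed=1)
    assert sorted(train + test) == list(range(50))


def test_split_is_reproducible():
    assert train_test_split_indices(20, seed=7) == train_test_split_indices(20, seed=7)
```

Saved in a file whose name starts with `test_`, these functions would be picked up by running `pytest` from the project directory. The same pattern applies to any other user-defined function in the submission.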