Code repository for the paper:
Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.
The technical appendix of the paper is provided in TechnicalAppendix.pdf.
- Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
- Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
- Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)
- To compute row leverage scores of a matrix: leverage_scores.py
- To perform leverage score, l2s, or uniform sampling: row_sampling.py
The code for l2s sampling has been sourced from here.
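The NumPy sketch below is a rough, hypothetical illustration of what row leverage scores and leverage-score or uniform row sampling involve. It is not the code in leverage_scores.py or row_sampling.py. The function names `leverage_scores_sketch` and `sample_rows_sketch` and the choice of an SVD with a `matrix_rank`-style tolerance are assumptions made for illustration. l2s sampling is omitted because the repository takes that code from an external source. The sketch samples rows i.i.d. with replacement and returns importance weights 1/(s·p_i), so a weighted sum of per-row losses over the sample is an unbiased estimate of the full sum.

```python
import numpy as np


def leverage_scores_sketch(X):
    """Row leverage scores of an n x d matrix X (hypothetical helper).

    The i-th leverage score is the squared Euclidean norm of row i of U,
    where U holds the left singular vectors spanning the column space of X.
    The scores lie in [0, 1] and sum to rank(X).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Numerical rank, using the same tolerance rule as np.linalg.matrix_rank.
    tol = s.max() * max(X.shape) * np.finfo(s.dtype).eps
    rank = int(np.sum(s > tol))
    return np.sum(U[:, :rank] ** 2, axis=1)


def sample_rows_sketch(X, y, num_samples, method="leverage", seed=None):
    """Sample rows i.i.d. with replacement and return importance weights.

    Hypothetical helper:
    method="leverage": p_i proportional to the leverage score of row i.
    method="uniform":  p_i = 1 / n.
    Each sampled row i gets weight 1 / (num_samples * p_i).
    """
    n = X.shape[0]
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if method == "leverage":
        lev = leverage_scores_sketch(X)
        p = lev / lev.sum()
    elif method == "uniform":
        p = np.full(n, 1.0 / n)
    else:
        raise ValueError(f"unknown method: {method!r}")
    rng = np.random.default_rng(seed)
    # Rows with p_i = 0 are never drawn, so the weights below stay finite.
    idx = rng.choice(n, size=num_samples, replace=True, p=p)
    weights = 1.0 / (num_samples * p[idx])
    return X[idx], y[idx], weights
```

To fit a logistic regression on the sample, the sampled rows and weights could be passed to any solver that accepts per-sample weights, for example `LogisticRegression().fit(Xs, ys, sample_weight=w)` in scikit-learn. This is only one possible setup, and it is not necessarily how the notebooks fit their models.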
To reproduce the experiments in the paper, run the following Jupyter Notebooks:
- For Cardiovascular disease dataset: cardio_train.ipynb
- For Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
- For Default of credit card clients dataset: default_of_credit_card_clients.ipynb
@article{Chowdhury_Ramuhalli_2024,
  title={A Provably Accurate Randomized Sampling Algorithm for Logistic Regression},
  author={Chowdhury, Agniva and Ramuhalli, Pradeep},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={10},
  year={2024},
  pages={11597-11605},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/29042},
  doi={10.1609/aaai.v38i10.29042}
}

Please contact Agniva Chowdhury with questions or comments.