Evaluating Machine Learning Algorithms for Materials Science using the Matbench Protocol Anubhav Jain Staff Scientist, Lawrence Berkeley National Laboratory Deputy Director, Materials Project materialsproject.org The Materials Project Slides (already) uploaded to https://hackingmaterials.lbl.gov
Outline of talk 1. A quick introduction to the Materials Project 2. Engaging the community: The MPContribs data platform 3. Benchmarking machine learning algorithms using the Matbench protocol
A quick introduction to the Materials Project
The core of Materials Project is a free database of calculated materials properties and crystal structures Free, public resource • www.materialsproject.org Data on ~150,000 materials, including information on: • electronic structure • phonon and thermal properties • elastic / mechanical properties • magnetic properties • ferroelectric properties • piezoelectric properties • dielectric properties Powered by hundreds of millions of CPU-hours invested into high- quality calculations 4
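For readers who want to pull this data programmatically, here is a minimal sketch (not from the slides) using the pymatgen MPRester client; it assumes you have registered at materialsproject.org and replaced "YOUR_API_KEY" with your own key, and mp-149 (silicon) is only an illustrative material ID.

from pymatgen.ext.matproj import MPRester

# Query the Materials Project for a calculated crystal structure
with MPRester("YOUR_API_KEY") as mpr:
    structure = mpr.get_structure_by_material_id("mp-149")  # silicon
    print(structure.composition.reduced_formula, structure.get_space_group_info())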
The core data set keeps growing with time … 5
Apps give insight into data Materials Explorer Phase Stability Diagrams Pourbaix Diagrams (Aqueous Stability) Battery Explorer 6
The code powering the Materials Project is available open source (BSD/MIT licenses): just-in-time error correction, fixing your calculations so you don't have to; 'recipes' for common materials science simulation tasks; making materials science web apps easy; workflow management software for high-throughput computing; materials science analysis code to make, transform and analyze crystals, phase diagrams and more; & more … MP team members also contribute to several other non-MP codes, e.g. matminer for machine learning featurization 7
Example: calculation workflows implemented by dozens of collaborators: Phonons, Elasticity, Defects, Magnetism, Band Structures, Stability, Grain Boundaries, Equations of State, X-ray Absorption Spectra, Piezoelectric, Dielectric, Surfaces & more … Requirements: VASP license and a big computer; ABINIT planned in future w/ G.-M. Rignanese 8
Example 2: matminer allows researchers to generate diverse feature sets for machine learning. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature, e.g. feat = EwaldEnergy([options]); y = feat.featurize([input_data]) • compatible with scikit-learn pipelining • automatically deploy multiprocessing to parallelize over data • include citations to methodology papers 9
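As an illustration of the featurizer interface, below is a minimal sketch (not from the talk) using matminer's ElementProperty featurizer with the "magpie" preset; the composition Fe2O3 is just an example input.

from pymatgen.core import Composition
from matminer.featurizers.composition import ElementProperty

feat = ElementProperty.from_preset("magpie")       # MagPie elemental descriptors
x = feat.featurize(Composition("Fe2O3"))           # list of descriptor values
labels = feat.feature_labels()                     # names of the descriptors
refs = feat.citations()                            # BibTeX entries for the methodology papers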
The Materials Project is used heavily by the research community: > 180,000 registered users; > 40,000 new users last year; ~100 new registrations/day; ~5,000-10,000 users log on every day; > 2M records downloaded through the API each day; 1.8 TB of data served per month 10
A large fraction of users are from industry. User breakdown: Student 44%, Academia 36%, Industry 10%, Government 5%, Other 5%. Schrodinger: "Many of our customers are active users of the Materials Project and use MP databases for their projects. Enabling direct access to MP databases from within Schrödinger software is a powerful addition that will be appreciated by our users." Toyota: "Materials Project is a wonderful project. Please accept my appreciation to you to release it free and easy to access." Hazen Research: "Amazing and well done data base. I still remember searching Landolt-Börnstein series during my PhD for similar things." 11
Engaging the community: the MPContribs data platform
How can we use Materials Project to build a community of materials researchers? Materials Project now has high visibility (e.g., by search engines) How can we use this platform to help add value to the community of materials researchers? 13
Beyond calculations: MPContribs allows the research community to contribute their own data. "MPContribs" bridges the gap between a "materials detail page" (containing all the information MP has calculated about a specific material) and experimental data on a material (either a specific phase, composition, or chemical system). 14
From Google search to your data and your research, via MP: 1. Google links to the Materials Project page 2. Materials Project links to your contribution 3. Your data set and paper are linked 15
MPContribs is open for contributions. You can now apply to contribute your data set and we will work with you to disseminate it via MP. Designed for: • smaller data sets (e.g., MBs to GBs); for large data files see NOMAD or other repos • linking to MP compositions. Available via mpcontribs.org 16
Benchmarking machine learning methods using the Matbench protocol
MP is now involved in an effort to benchmark various machine learning algorithms 18
Without standardized benchmarks, ML models can be difficult to compare. Figure: Model 1 + Dataset 1, Model 2 + Dataset 2, and Model 3 + Dataset 3; the datasets differ (e.g., 4k samples with no structures and no AB2C3 compositions vs. 100k samples with structures available and E_above_hull < 0.050 eV) and each model reports a different metric (RMSE on a test set = 0.05 eV, MAE from 5-fold CV = 0.021 eV, validation loss = 0.005), so head-to-head comparison is impossible.
What’s needed – an “ImageNet” for materials science https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ 20
Can we make the same advancements in materials as in computer vision? One of the reasons computer science / machine learning seems to advance so quickly is that the field decouples data generation from algorithm development. This allows groups to focus on algorithm development without all the data generation, data cleaning, etc. that is often the majority of an end-to-end data science project. Clear comparisons also move the field forward and measure progress. 21
The ingredients of the Matbench benchmark ☐ Standard data sets ☐ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results 22
Matbench includes 13 different ML tasks 23 Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
The tasks encompass a variety of problems: 13 ready-to-use ML tasks ranging in training size, target property, inputs, and task type. • Pre-cleaned datasets from literature and online repositories (such as Materials Project) • Wide range of practical solid-state ML tasks • Experimental and computed properties • Standardized error evaluation (nested CV)
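As a rough sketch of how a task's dataset can be pulled into a pandas DataFrame, the snippet below uses matminer's dataset loader; matbench_dielectric is one of the 13 tasks, and its target column is assumed here to be the refractive index n.

from matminer.datasets import load_dataset

# Load one of the pre-cleaned Matbench datasets as a pandas DataFrame
df = load_dataset("matbench_dielectric")  # expected columns: "structure" and the target "n"
print(len(df), df.columns.tolist())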
Browse datasets and tasks with Materials Project MPContribsML https://ml.materialsproject.org
The ingredients of the Matbench benchmark ✓ Standard data sets ☐ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results 26
Most commonly used test split procedure • Training/validation is used for model selection • Test / hold-out is used only for error estimation (the test set should not inform model selection, i.e. it is the "final answer") 27
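For concreteness, here is an illustrative (non-Matbench) hold-out split with scikit-learn on synthetic data; the held-out portion is reserved solely for the final error estimate.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

# 80% for training/validation (model selection), 20% held out for the final error estimate only
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)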
Nested CV – like hold-out, but varies the hold-out set. Think of it as N different "universes": in each universe we train the model on a different training set and evaluate it on a different hold-out set. "A nested CV procedure provides an almost unbiased estimate of the true error." (Varma and Simon, Bias in error estimation when using cross-validation for model selection, 2006) 29
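The sketch below (again scikit-learn on synthetic data, not Matbench code) shows the nested-CV idea: an inner CV loop selects hyperparameters while an outer CV loop estimates the error on varying hold-out folds.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)   # model selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # error estimation

model = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100]},
    cv=inner_cv,
)
scores = cross_val_score(model, X, y, cv=outer_cv, scoring="neg_mean_absolute_error")
print("Nested-CV MAE: %.3f +/- %.3f" % (-scores.mean(), scores.std()))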
The ingredients of the Matbench benchmark ✓ Standard data sets ✓ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results 30
Matbench has an online leaderboard – matbench.materialsproject.org
Complete and reproducible results on standardized ML tasks: sample-by-sample predictions of all algorithms on all tasks; notebooks and scripts for reproduction; aggregate scores across nested CV folds; complete model metadata, hyperparameters, required compute, and academic references (provided as .json, .ipynb, and .py files)
Algorithm comparison across individual tasks OR complete benchmark Example: matbench_dielectric Compare both specialized and general-purpose algorithms across multiple error metrics
Evaluation of ML paradigms drives research and development. Traditional paradigms: • Traditional Models (e.g., RF + MagPie [1] features) • AutoML inside the "traditional ML" space (Automatminer). Advancements in deep neural networks: • Attention Networks (e.g., CRABNet [2]) • Optimal Descriptor Networks (e.g., MODNet [3]) • Crystal Graph Networks (e.g., CGCNN, MEGNet [4]). 1. doi.org/10.1038/npjcompumats.2016.28 2. doi.org/10.1038/s41524-021-00545-1 3. doi.org/10.1038/s41524-021-00552-2 4. doi.org/10.1021/acs.chemmater.9b01294
Matbench compares these ML model paradigms. Status: Traditional Models (RF + MagPie): in Matbench; Automatminer: in Matbench; CRABNet: in Matbench; CGCNN: in Matbench; MEGNet: in progress; MODNet: PR in review.
Contribute your model to the body of knowledge. Matbench Python package: evaluate an entire benchmark with ~10 lines of code.

$ pip install matbench

from matbench.bench import MatbenchBenchmark

mb = MatbenchBenchmark(autoload=False)
for task in mb.tasks:
    task.load()
    for fold in task.folds:
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        my_model.train_and_validate(train_inputs, train_outputs)
        test_inputs = task.get_test_data(fold, include_target=False)
        predictions = my_model.predict(test_inputs)
        task.record(fold, predictions)
mb.to_file("my_models_benchmark.json.gz")

Your model needs to have: • a function that trains it on training data • a function that makes predictions with the trained model
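As a hypothetical example of a model object that satisfies this interface, the sketch below always predicts the mean of the training targets; it is only a placeholder for "my_model" in the loop above (regression tasks only).

import numpy as np

class MeanBaseline:
    def train_and_validate(self, train_inputs, train_outputs):
        # Ignore the inputs; remember the mean of the training targets
        self.mean_ = np.mean(train_outputs)

    def predict(self, test_inputs):
        # Predict the stored mean for every test sample
        return np.full(len(test_inputs), self.mean_)

my_model = MeanBaseline()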
Contribute your model to the body of knowledge (continued): after running the same loop as above, submit the generated results file (e.g., "my_models_benchmark.json.gz") along with your desired model metadata via a GitHub PR.
The ingredients of the Matbench benchmark ✓ Standard data sets ✓ Standard test splits according to a nested cross-validation procedure ✓ An online leaderboard that encourages reproducible results 38
Results so far: graph NN for large data sets, conventional ML for small Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3. 39
Overall and upcoming goals for Matbench • We have introduced a method that allows researchers to evaluate their machine learning models on a standard benchmark, if they so choose • The "Matbench" resource also provides metadata and code examples that allow others to reproduce and use community ML models more easily, as well as discover new ML models • In the future, we hope to expand the types of tasks, perform meta-analyses on what kinds of algorithms work best for certain problems, and plot progress on these tasks over time 40
Concluding thoughts The Materials Project is a free resource providing data and tools to help perform research and development of new materials Even more can be accomplished as a unified community to push forward data dissemination as well as the capabilities of machine learning 41 We encourage you to give Matbench a try, and look forward to seeing your algorithm on the leaderboard!
Thank you! The team: Kristin Persson, MP Director; Patrick Huck, Staff Scientist (MPContribs); Alex Dunn, Grad Student (Matbench / matminer). Slides (already) uploaded to https://hackingmaterials.lbl.gov