How to use data to design and optimize reactions?
A quick introduction to work from the Sigman lab
Speaker: Kosuke Higashida (Sawamura-G)
Slides: Ichigaku Takigawa (Takigawa-G)
ICReDD Fusion Research Seminar, Apr 27, 2020
Today's talk
Introduce papers from Matthew S. Sigman's group that consider how to use data to design and optimize catalysts.
Matthew S. Sigman, Dept. of Chemistry, The University of Utah (Organic Synthesis & Asymmetric Catalysis)
The following studies from the lab are summarized in: MS Sigman, KC Harper, EN Bess, A Milo. The development of multidimensional analysis tools for asymmetric catalysis and beyond. Accounts of Chemical Research 2016, 49 (6), 1292-1301.
1. Harper & Sigman, PNAS 2011
2. Harper & Sigman, Science 2011
 • Werner, Mei, Burckle & Sigman, Science 2012
3. Milo, Bess & Sigman, Nature 2014
 • Harper, Vilardi & Sigman, JACS 2013
 • Bess, Bischoff & Sigman, PNAS 2014
4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016; Milo, Neel, Toste & Sigman, Science 2015
Our consistent theme today
• Chemically, it's about how to rationally design and improve asymmetric catalysis (asymmetric synthesis).
• Chemoinformatically, it's about QSAR (Quantitative Structure-Activity Relationship) and descriptors.
• Statistically/mathematically, it's about an application of linear regression (design of experiments).
How can we grasp "empirical" trends in experiments, and how can we effectively use them to develop new catalytic reactions?
Outline
1. Harper & Sigman, PNAS 2011
 Proof of Concept (PoC): consider "Design of Experiments (DoE)" (Information Science)
2. Harper & Sigman, Science 2011
 De novo applications of 1
 • Werner, Mei, Burckle & Sigman, Science 2012
3. Milo, Bess & Sigman, Nature 2014
 "More sophisticated descriptors" (Computational Science)
 • Harper, Vilardi & Sigman, JACS 2013
 • Bess, Bischoff & Sigman, PNAS 2014
4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016; Milo, Neel, Toste & Sigman, Science 2015
 Establish a data-intensive workflow for "the study of reaction mechanisms"
0. Quick Introduction to Asymmetric Synthesis
Asymmetric synthesis (Enantioselective synthesis)
A reaction that produces the stereoisomeric (enantiomeric or diastereoisomeric) products in unequal amounts. A key process in modern chemistry, particularly for pharmaceuticals!
• A molecule is called "chiral" if it cannot be superposed on its mirror image.
• A chiral molecule exists as two stereoisomers that are mirror images of each other, called "enantiomers".
• They have the same chemical properties, except when reacting with other chiral compounds.
• They also have the same physical properties, except that they have opposite optical activities.
• A chiral molecule typically has at least one "chiral center" or "stereocenter".
A starting example: Nozaki-Hiyama-Kishi (NHK) allylation reaction
An activated allyl group is added to a carbonyl group. The carbon that bears the phenyl group becomes an asymmetric center, so the product is formed as the R or the S enantiomer.
How to read this?
• Substrates (reactants): the starting material and its reaction partner
• Catalyst: 0.1 equiv. CrCl3
• Ligand (binds to Cr): 0.1 equiv. chiral ligand
• Reagents: 2 equiv. Mn(0), 0.1 equiv. triethylamine, 2 equiv. SiMe3Cl
• Solvent: THF
• Reaction temperature: RT (RT = room temperature)
• Reaction time: 20 h
• Product: the R or S enantiomer
Selectivity is reported as:
• Enantiomer ratio (e.r.): A_R : A_S, where A_R and A_S are the amounts of the R and S products
• Enantiomeric excess (e.e.)
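The two selectivity measures named on this slide carry the same information, and a small sketch can make the conversion concrete. The function names below are hypothetical, and the e.e. formula is the standard textbook definition rather than something shown on the slide.

```python
def enantiomer_ratio(amount_r: float, amount_s: float) -> float:
    """Enantiomer ratio A_R : A_S expressed as a single number A_R / A_S."""
    return amount_r / amount_s


def enantiomeric_excess(amount_r: float, amount_s: float) -> float:
    """Enantiomeric excess in percent: |A_R - A_S| / (A_R + A_S) * 100."""
    return abs(amount_r - amount_s) / (amount_r + amount_s) * 100.0


def er_from_ee(ee_percent: float) -> float:
    """Major:minor ratio implied by a given e.e. (in percent, below 100)."""
    return (100.0 + ee_percent) / (100.0 - ee_percent)


# A 75:25 mixture of R and S corresponds to an e.r. of 3 and an e.e. of 50%.
print(enantiomer_ratio(75.0, 25.0), enantiomeric_excess(75.0, 25.0))
```

A 50% ee, the figure quoted later for the propargylation reaction, therefore still means that a quarter of the product is the undesired enantiomer.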
Our problem for now
What "substituent" on the ligand can improve the enantioselectivity, preferably for a wide range of substrates?
"Ligands": [structure figure; the substituent positions shown carry groups such as iPr and Bn]
Candidate "substituents": Ph, tBu, CF3, CH(CH3)2, OiPr, Cl, CH3, CHO, CO2Me, COMe, COOH, OEt, NMe2, OMe, NH2, NO2, OH, F, H
Abbreviations: iPr = isopropyl group, Bn = benzyl group, tBu = tertiary butyl group, Me = methyl group, Et = ethyl group
Sigman's approach
Two common tactics of many labs (including Sigman's lab):
1. Use high-throughput assays to rapidly evaluate many reaction conditions and catalyst structures.
2. Investigate the reaction mechanism to inform the design of better-performing catalysts (by computing various transition-state structures).
The newly proposed approach of Sigman's lab:
Simultaneously optimize reaction performance and interrogate mechanistic features of reactions by relating empirical results of carefully designed datasets to physical organic descriptors.
Backstory before the new approach
Sigman & Miller, J. Org. Chem. 2009, 74 (20), 7633-7643
They observed:
1. A linear free energy relationship (LFER): the enantioselectivity and the size of the proline substituent G (measured by Taft/Charton steric parameters) were positively correlated.
2. A break in the LFER when larger G groups were employed.
Note: [ ]‡ denotes a transition state. The reaction is multistep and its details are unclear.
Their two assumptions for the break of LFER
Their focus on outliers: "results that appear at first glance to be peculiar or poor are considerably more interesting than ones that follow obvious or intuitive trends."
1. The poor extrapolation to larger substituents may be due to a change in the mechanism of asymmetric catalysis.
 → Design of experiments? Data-intensive methods to probe it?
2. The Taft/Charton parameters implemented do not precisely describe the observed steric effect.
 → More sophisticated parameters? Computational parameters?
These two assumptions for the break of LFER are the starting point of their newly developed approach.
First study 1/4
Harper & Sigman, PNAS 2011.
Initially, 25 ligands were synthesized according to commercial availability and ease of synthesis.
LFER development by fitting a linear regression model, with the enantioselectivity expressed as a free-energy difference:
$\Delta\Delta G^{\ddagger} = -RT \ln(\mathrm{e.r.})$
where $\Delta\Delta G^{\ddagger}$ is the enantioselectivity, e.r. is the enantiomeric ratio, $R$ is the gas constant, and $T$ is the reaction temperature.
This resulted in only moderate predictive power and did little to inform catalyst design.
"How many and what distribution of data points would provide a meaningful model for prediction?" (from the initial review of the work)
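The regression response on this slide is a free-energy difference computed from the measured enantiomeric ratio. A minimal sketch of that conversion follows; the function name is hypothetical, and the default temperature of 298.15 K is an assumption standing in for "RT" (room temperature).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_er(er: float, temperature_k: float = 298.15) -> float:
    """Return ddG (kcal/mol) = -R * T * ln(e.r.) for a given enantiomeric ratio.

    The 298.15 K default is an assumed value for room temperature.
    """
    return -R_KCAL * temperature_k * math.log(er)


# A racemic result (e.r. = 1) maps to zero; higher selectivity gives a larger magnitude.
for er in (1.0, 3.0, 19.0):
    print(er, round(ddg_from_er(er), 3))
```

Working on this logarithmic scale is what lets enantioselectivity be treated as a linear response in the regression models that follow.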
First study 2/4
Harper & Sigman, PNAS 2011.
Apply the principles of "Design of Experiments (DoE)" using physical organic parameters.
"Briefly, the precepts of DoE involve evenly spreading experiments to the sensitivity limits of a system using continuous variables (e.g. temperature and concentration)."
Technical note 1/3: Design of experiments (DoE)
There are many variations, but in short:
(1) Define the coordinates with great care.
(2) Examine/consider the limits of the possible ranges.
(3) Cover the region evenly with interacting points.
Example: a "3^2 factorial design" (full factorial design with 3 levels of 2 factors).
Consult a statistician when descriptors have strong correlation or clear interdependency.
If you have more variables, you can also consider a "fractional" design (a carefully chosen subset of experiments) or one of the other designs:
• simplex lattice design
• central composite design
• Box-Behnken design
• face-centered central composite design
• Latin hypercube design
• sphere packing design
• uniform design
• maximum entropy design
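As a rough illustration of the full factorial idea, the sketch below enumerates every combination of levels for each factor. The coded levels -1, 0, 1 (low, middle, high) are a common DoE convention used here as an assumption; they are not values from the papers, and `full_factorial` is a hypothetical helper name.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return every combination of the given levels, one tuple per experiment."""
    return list(product(*levels_per_factor))


# Coded levels (low, middle, high) for two factors: the 3^2 design has 9 runs.
levels = [-1, 0, 1]
design = full_factorial([levels, levels])
for run in design:
    print(run)
```

With k factors at 3 levels the run count grows as 3^k, which is why fractional and space-filling designs become attractive for more variables.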
Technical note 2/3: Least-squares regression
• "Predictors" (descriptors) and a "response" (target).
• The regression equation for prediction is the one that minimizes the sum of (error-bar height)^2.
• Its coefficients are obtained by simple matrix multiplication.
• With two predictors, this amounts to fitting a plane!
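The "simple matrix multiplication" on this slide can be sketched with the normal equations. This is a minimal example on synthetic, noise-free data invented for illustration; the function name `fit_plane` is hypothetical, and the normal-equation form is a standard textbook route that the slide does not spell out.

```python
import numpy as np


def fit_plane(x1, x2, y):
    """Least-squares plane y ~ b0 + b1*x1 + b2*x2 via the normal equations."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    X = np.column_stack([np.ones_like(y), x1, x2])  # design matrix with intercept
    # Solve (X^T X) beta = X^T y; this beta minimizes the sum of squared residuals.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic demonstration data (not taken from the papers).
x1 = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
x2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2
beta = fit_plane(x1, x2, y)
print(beta)
```

For ill-conditioned descriptor sets, a solver such as `np.linalg.lstsq` is usually a safer choice than forming X^T X explicitly.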
Technical note 3/3: Least-squares regression
• If we add composite variables, all calculated from X and Y, as extra predictors, the regression equation for prediction comes from exactly the same calculation.
• This time we are fitting a surface!
• Such models are called "response surface" models (RSMs) in DoE.
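A response surface fit can be sketched by building the composite variables and reusing ordinary least squares. The particular composite terms below (X*Y, X^2, Y^2) are an assumption, since the slide's own terms did not survive extraction; the data are synthetic and the function names are hypothetical.

```python
import numpy as np
from itertools import product


def quadratic_features(x, y):
    """Columns 1, x, y, x*y, x^2, y^2: composite variables computed from x and y."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])


def fit_response_surface(x, y, z):
    """Ordinary least squares on the expanded feature matrix."""
    coef, *_ = np.linalg.lstsq(quadratic_features(x, y), np.asarray(z, dtype=float), rcond=None)
    return coef


def predict(coef, x, y):
    """Evaluate the fitted surface at new (x, y) points."""
    return quadratic_features(x, y) @ coef


# Synthetic 3 x 3 design in coded levels, with a known surface for checking.
levels = [-1.0, 0.0, 1.0]
grid = np.array(list(product(levels, levels)))
gx, gy = grid[:, 0], grid[:, 1]
true_coef = np.array([0.5, 1.0, -0.4, 0.3, -0.8, 0.2])
gz = quadratic_features(gx, gy) @ true_coef
coef = fit_response_surface(gx, gy, gz)
print(coef)
print(predict(coef, 0.5, -0.5))
```

Nine design points are enough to determine these six coefficients, which is one reason a 3 x 3 library pairs naturally with a quadratic surface.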
First study 3/4
Harper & Sigman, PNAS 2011.
Their two key tenets:
1. Extrapolation is subject to greater uncertainty and error than interpolation.
2. The predictive power of any statistical model is dependent upon the variation in the distribution of data points used to develop the model.
A 3 x 3 library was defined: a carefully designed and evenly spaced ligand space, according to the range of substituent Charton values that are synthetically accessible!
First study 3/4
Harper & Sigman, PNAS 2011.
The response surface model was fitted to the 3 x 3 design by linear regression, with 2 replicates for each point in the 3 x 3 library.
[Figure: fitted response surface over the ligand library. Axis substituent labels: H, Me, iPr, tBu and Me, tBu, CEt3; an additional point is labeled Et; one region is marked as a not reproducible area.]
Second study 1/4
Harper & Sigman, Science 2011.
They admitted that they were aware of the likely best catalyst structure through extensive trial-and-error screening in the development of these systems, and that the identity of the optimal predicted catalyst(s) was not surprising.
De novo application of this approach:
They applied it to an NHK-type propargylation reaction that had been abandoned years earlier in the group because of their inability, for more than 2 years, to achieve effective asymmetric catalysis.
It turned out that the best predicted ligand was limited to 50% ee.
Second study 2/4
Harper & Sigman, Science 2011.
By screening only 9 ligands, they could confidently reach an informed decision that a change in catalyst structure was necessary to proceed with catalyst optimization.
Several observations for redesigning the catalyst:
1. The oxazoline unit did not bias the reaction to a great extent.
2. A Hammett correlation between the enantioselectivity and the substrate structure was observed (a significant electronic contribution from the substrate impacts the enantioselectivity).
Second study 3/4
Harper & Sigman, Science 2011.
These two observations implied that the new catalyst design should incorporate a readily modifiable group whose electronics could be systematically perturbed to override the innate electronic bias of the substrate.
A 3 x 3 library was built, replacing the oxazoline with a quinoline, with one dimension manipulated electronically (E) and the other sterically (S).
[Figure: ligand units labeled oxazoline, quinoline, and pyridine.]
Side note 1/2: LFERs and Hammett equation
"The Hammett equation (and its extended forms) has been one of the most widely used means for the study and interpretation of organic reactions and their mechanisms." (Hansch, Leo & Taft, Chem Rev 1991)
For any two reactions with two aromatic reactants differing only in the type of substituent R, the change in free energy of activation is proportional to the change in Gibbs free energy (i.e. a "linear" relationship):
$\log \frac{K}{K_0} = \sigma \rho$
• $K$: equilibrium constant with substituent R
• $K_0$: equilibrium constant with R = H (reference)
• $\sigma$: substituent constant (depends only on R)
• $\rho$: reaction constant (depends on the reaction but is independent of R)
[Table: "parameter" values of σ for various R]
• This is an empirical trend that does not follow from any chemical theory (and has thus perhaps long been criticized from the theoretical side).
• Surprisingly, this equation also holds for reaction rates.
• It triggered many other useful variations, both linear free-energy relationships (LFERs) and nonlinear ones such as quadratic relationships.
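To show how a reaction constant is obtained in practice, the sketch below fits ρ as the slope of log(K/K0) against σ, with the line forced through the origin because the reference substituent H has σ = 0. The σ values are commonly tabulated para constants quoted for illustration only, the response values are synthetic, and `fit_rho` is a hypothetical helper name.

```python
import numpy as np


def fit_rho(sigma, log_k_ratio):
    """Least-squares slope through the origin for log(K/K0) = sigma * rho."""
    sigma = np.asarray(sigma, dtype=float)
    log_k_ratio = np.asarray(log_k_ratio, dtype=float)
    return float(sigma @ log_k_ratio / (sigma @ sigma))


# Illustrative para substituent constants: OMe, CH3, H, Cl, NO2.
sigma_para = np.array([-0.27, -0.17, 0.00, 0.23, 0.78])

# Synthetic responses generated from an assumed rho of 1.5 (not measured data).
assumed_rho = 1.5
log_k_ratio = assumed_rho * sigma_para

rho = fit_rho(sigma_para, log_k_ratio)
print(rho)
```

The sign and size of the fitted ρ are what carry the mechanistic meaning: how strongly, and in which direction, the reaction responds to electron-withdrawing or electron-donating groups.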
Side note 2/2: LFER is generalized to QSAR/QSPR
LFERs, or prediction from empirical trends in data, also gave rise to the long history of QSAR/QSPR in chemoinformatics, from Taft/Hammett through Hansch-Fujita and beyond.
Source: Hansch (2014)
Second study 4/4
Harper & Sigman, Science 2011.
This allowed them to confidently identify a catalyst that performed well for a wide range of substrates with different electronic features.
• Electronic effect (E): Hammett parameters (σ)
• Steric effect (S): Charton parameters (ν)
The same approach was also applied to identify highly enantioselective catalysts for the Pd-catalyzed redox-relay Heck reactions of alkenols (Werner, Mei, Burckle & Sigman, Science 2012, 338, 1455-1458).
Toward parameter (re-)considerations
It is critical to use appropriate parameters (descriptors) that characteristically describe the properties of substituents.
For example, Taft/Charton values have historically been controversial because of their limited generality (they arose from a particular mechanism).
Charton values
Harper, Bess & Sigman, Nat Chem 2012.
• NHK allylation: worked.
• Desymmetrization of bisphenols: failed.
Sterimol
Harper, Bess & Sigman, Nat Chem 2012.
(Rediscovery of) Sterimol parameters
• Developed in the 1970s (Verloop & Tipker, 1977)
• Three parameters (L, B1, B5)
• Computationally derived
• NHK allylation: worked.
• Desymmetrization of bisphenols: worked.
Third study 1/2
Milo, Bess & Sigman, Nature 2014.
Up to this point, a key problem in the history of LFERs had been avoided: the inability to analyze substituents with changes to both size and electronic nature.
New computationally derived parameters (use computational science!): use IR vibrations as parameters to describe simultaneous steric and electronic perturbations to structure.
Third study 2/2
Milo, Bess & Sigman, Nature 2014.
Up to this point, a key problem in the history of LFERs had been avoided: the inability to analyze substituents with changes to both size and electronic nature.
New computationally derived parameters: IR (infrared) spectroscopy-based parameters that describe simultaneous steric and electronic perturbations to structure.
1. Perform a geometry minimization and frequency calculation of the starting materials.
2. Compute vibrations as well as other parameters, including Sterimol values, NBO charges, and geometric factors such as torsion angles.
3. Abridge the parameter set to a statistically manageable number.
4. Perform and evaluate a systematic iterative process to establish and improve statistical predictive models.
Applications of this approach to evaluate the substrate scope
The approach is intended for use in catalyst optimization, but it can also be applied to different challenges such as "substrate scope".
1. Harper, Vilardi & Sigman, JACS 2013
 The catalyst and substrate structures were jointly analyzed to predict the "best" catalyst/substrate combinations.
2. Bess, Bischoff & Sigman, PNAS 2014
 In a study of a Rh-catalyzed asymmetric transfer hydrogenation of ketones, a modest-sized library of substrates was designed to build a model for predicting the performance of even unreported substrates.
Returned to their original pursuit: reaction mechanisms
They hypothesized that their predictive models (multidimensional correlations) would encompass mechanism-rich information.
However, inferring mechanistic information becomes a daunting task because
1. the models become increasingly complex, and
2. the models contain compounded parameters that are not representative of one distinct structural characteristic.
A data-intensive approach to mechanistic elucidation requires philosophical changes:
One has to overcome the entrenched dogma that low-enantioselectivity measurements are "bad results". Every value obtained in the course of data collection is useful and can serve to reveal structural information regarding the mechanism!
4th/5th studies
1. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016
 Parameter selection is not intuitive. An effective parameter set was obtained statistically for phosphine ligands, and it served to distinguish between two limiting mechanistic regimes in the site-selective oxidative addition in a Suzuki reaction.
2. Milo, Neel, Toste & Sigman, Science 2015
 Synthetically reasonable and mechanistically informative catalysts, substrates, or reagents were virtually predicted, and then prepared and tested experimentally, in the context of chiral phase-transfer catalysis.
This tactic requires the design and preparation of a systematic and synthetically modular library of catalysts, substrates, and reagents. The experimental design defines the effects that can be explained by the produced models.
Collaborative and further studies
1. Neel, Milo, Sigman & Toste, JACS 2016
 Chiral phosphoric acid-catalyzed fluorination of allylic alcohols
2. Bess, DeLuca, Tindall, Oderinde, Roizen, Du Bois & Sigman, JACS 2014
 Rh-catalyzed C-H amination reaction
3. Bess, Guptill, Davies & Sigman, Chem Sci 2015
 A carbene C-H insertion reaction
4. Mougel, Santiago, Zhizhko, Bess, Varga, Frater, Sigman & Copéret, JACS 2015
 Turnover frequency (TOF) of solid-supported W-based catalysts for olefin metathesis
5. Zhang, Santiago, Crawford & Sigman, JACS 2015
 Ligand optimization for Pd-catalyzed enantioselective redox-relay Heck reactions
6. Zhang, Santiago, Kou & Sigman, JACS 2015
 Evaluation of solvent effects for Pd-catalyzed enantioselective redox-relay Heck reactions
Lessons we learned
1. The design of experiments (DoE) is crucial to obtain causal insights on the target reactions.
2. We can computationally derive the physical organic chemical parameters to describe the mechanistic features of reactions.
3. "In brief, one needs a reliable and accurate assay for the output of interest, a reasonable spread of such outputs to facilitate statistical analysis, and a reasonable level of modularity in the process under study."
Deconstructing rather complex relationships still requires profound physical organic chemistry expertise, the use of various classical tools in this field, and creativity and intuition to understand sophisticated processes.
After 2016: The 2019 Nature paper and beyond
One-page summary of the 2019 Nature paper
Method: forward stepwise linear regression from the MATLAB statistics toolbox (since Nature 2014?).
The work gives up DoE and moves toward a more "ML/AI"-like approach for out-of-sample prediction.
Reaction: nucleophile + imine starting material → amine product, with a chiral phosphoric acid catalyst.
• 367 reactions (manually curated from 17 papers)
• 313 parameters from every component (steric, DFT-derived, 2D chemoinformatics descriptors, conditions)
Proof of concept with a reaction class that has different catalyst chemotypes and substrate structural types.
[Figure: out-of-sample prediction. Note: this is a predicted vs. measured plot (not a fitted line).]
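The idea behind forward stepwise regression can be sketched as a greedy loop that adds one descriptor at a time. This is a simplified illustration, not the MATLAB routine named on the slide: the stopping rule (a minimum relative drop in the residual sum of squares) is an assumption chosen for brevity, the data are synthetic, and all names are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, max_features=5, min_rel_gain=0.1):
    """Greedily add the descriptor that lowers the RSS most; stop on small gains."""
    n, p = X.shape
    selected = []
    current = rss(np.empty((n, 0)), y)  # intercept-only model
    while len(selected) < max_features:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        scores = {j: rss(X[:, selected + [j]], y) for j in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_rel_gain * current:
            break  # assumed stopping rule: improvement too small to keep going
        selected.append(best)
        current = scores[best]
    return selected


# Synthetic data: 200 "reactions", 20 descriptors, only columns 2 and 7 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)
selected = forward_stepwise(X, y)
print(selected)
```

Because selection and fitting use the same data, the chosen descriptors should be checked on held-out reactions, which is the role of the out-of-sample, predicted vs. measured plot described on the slide.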
References

How to use data to design and optimize reaction? A quick introduction to work from Sigman lab

  • 1.
    How to usedata to design and optimize reactions? ⎯⎯⎯⎯⎯⎯ A quick introduction to work from Sigman lab Speaker: Kosuke Higashida (Sawamura-G) Slides: Ichigaku Takigawa (Takigawa-G) ICReDD Fusion Research Seminar, Apr 27 2020.
  • 2.
    Today's talk Introduce papersfrom Matthew S. Sigman's group that 
 considers how to use data to design and optimize catalysts. Matthew S Sigman Dept. Chemistry
 The University of Utah Organic Synthesis & Asymmetric Catalysis MS Sigman, KC Harper, EN Bess, A Milo
 The development of multidimensional analysis tools for asymmetric catalysis and beyond.
 Accounts of Chemical Research 2016 49 (6), 1292-1301. summarized the following studies from the lab. 1. Harper & Sigman, PNAS 2011 2. Harper & Sigman, Science 2011 • Werner, Mei, Burckle & Sigman, Science 2012 3. Milo, Bess & Sigman, Nature 2014 • Harper, Vilardi & Sigman, JACS 2013 • Bess, Bischoff, Sigman, PNAS 2014 4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016;
 Milo, Neel, Toste & Sigman, Science 2015
  • 3.
    Our consistent themetoday • Chemically, it's about how to rationally design and improve asymmetric catalysis ( , 不对称合成). • Chemoinformatically, it's about QSAR (Quantitative Structure- Activity Relationship) and descriptors. • Statistically/Mathematically, it's about an application of linear regression (design of experiments). How to grasp "empirical" trends in experiments, and how to effectively use them to develop new catalytic reactions?
  • 4.
    Outline 1. Harper &Sigman, PNAS. 2011
 Proof of Concept (PoC)
 Consider "Design of Experiments (DoE)" (Information Science) 2. Harper & Sigman, Science. 2011
 De novo Applications of 1 • Werner, Mei, Burckle & Sigman, Science 2012 3. Milo, Bess & Sigman, Nature. 2014
 "More sophisticated descriptors" (Computational Science) • Harper, Vilardi & Sigman, JACS, 2013 • Bess, Bischoff, Sigman, PNAS, 2014 4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016;
 Milo, Neel, Toste & Sigman, Science 2015
 Establish a data-intensive workflow for "the study of reaction mechanisms" 0. Quick Introduction to Asymmetric Synthesis
  • 5.
    Asymmetric synthesis (Enantioselectivesynthesis) • A molecule is called "chiral" if it cannot be superposed on its mirror. • A chiral molecule exists in two stereoisomers that are mirror images of each other, called "enantiomers". • They have the same chemical properties, except when reacting with other chiral compounds. • They also have the same physical properties, except that they often have opposite optical activities. • A chiral molecule must have at least one "chiral center" or "stereocenter". A reaction that produces the stereoisomeric (enantiomeric or diastereoisomeric) products in unequal amounts. A key process in modern chemistry, particularly for pharmaceuticals! "enantiomers"
  • 6.
    A starting example 0.1equiv. CrCl3 0.1 equiv. Chiral Ligand 2 equiv. Mn(0) 0.1 equiv. Triethylamine 2 equiv. SiMe3Cl Nozaki-Hiyama-Kishi (NHK) allylation reaction THF, RT, 20h R
  • 7.
    How to readthis? 0.1 equiv. CrCl3 0.1 equiv. Chiral Ligand Nozaki-Hiyama-Kishi (NHK) allylation reaction Substrates (Reactants) Product THF, RT, 20h Catalyst Ligand (binds to Cr) Reagents Solvent Reaction Temperature (RT = Room Temperature) Reaction Time Starting material ( ) 2 equiv. Mn(0) 0.1 equiv. Triethylamine 2 equiv. SiMe3Cl Reaction partner
  • 8.
    A starting example 0.1equiv. CrCl3 0.1 equiv. Chiral Ligand Nozaki-Hiyama-Kishi (NHK) allylation reaction THF, RT, 20h an activated allyl group is added to carbonyl group Phenyl-group Asymmetric center R 2 equiv. Mn(0) 0.1 equiv. Triethylamine 2 equiv. SiMe3Cl
  • 9.
    A starting example 0.1equiv. CrCl3 0.1 equiv. Chiral Ligand Nozaki-Hiyama-Kishi (NHK) allylation reaction THF, RT, 20h an activated allyl group is added to carbonyl group Phenyl-group or SR Enantiomer ratio: e.r. Enantiomeric Excess: e.e. AR : AS R 2 equiv. Mn(0) 0.1 equiv. Triethylamine 2 equiv. SiMe3Cl
  • 10.
    Our problem fornow What "substituent" for can improve the enantioselectivity, preferably for a wide range of substrates? "Ligands" ( ) iPr Bn ( ) Groups for ?
 "Substituents" • Ph • tBu • CF3 • CH(CH3)2 • OiPr • Cl • CH3 • CHO • CO2Me • COMe • COOH • OEt • NMe2 • OMe • NH2 • NO2 • OH • F • H= isopropyl group = benzyl group tBu = tertiary butyl 
 group Me = methyl group Et = ethyl group
  • 11.
    Sigman's approach 1. Usehigh-throughput assays to rapidly evaluate many reaction conditions and catalyst structures. 2. Investigate the reaction mechanism to inform the design of better-performing catalysts.
 (by computing various transition-state structures) Two common tactics of many labs (including Sigman's lab) The newly proposed approach of Sigman's lab Simultaneously optimize reaction performance and interrogate mechanistic features of reactions by relating empirical results of carefully designed datasets to physical organic descriptors.
  • 12.
    Backstory before thenew approach Sigman & Miller, J. Org. Chem. 2009, 74, 20, 7633-7643 They observed 1. Linear Free Energy Relationship (LFER):
 the enantioselectivity and the size of the proline substituent G (measured by Taft/Charton steric parameters) were positively correlated. 2. A break in the LFER when larger G groups were employed. [ ]‡ means 
 a transition state unclear multistep reactions
  • 13.
    Their two assumptionsfor the break of LFER Their focus on outliers: "results that appear at first glance to be peculiar or poor are considerably more interesting than ones that follow obvious or intuitive trends."
  • 14.
    Their two assumptionsfor the break of LFER Their focus on outliers: "results that appear at first glance to be peculiar or poor are considerably more interesting than ones that follow obvious or intuitive trends." 1. The poor extrapolation to larger substituents may be due to any change in the mechanisms of asymmetric catalysis 2. The Taft/Charton parameters implemented do not precisely describe the observed steric effect. These two assumptions for the break of LFER are the starting point of their newly developed approach.
  • 15.
    Their two assumptionsfor the break of LFER Their focus on outliers: "results that appear at first glance to be peculiar or poor are considerably more interesting than ones that follow obvious or intuitive trends." 1. The poor extrapolation to larger substituents may be due to any change in the mechanisms of asymmetric catalysis 2. The Taft/Charton parameters implemented do not precisely describe the observed steric effect. These two assumptions for the break of LFER are the starting point of their newly developed approach. Design of experiments? Data-intensive methods to probe it? More sophisticated parameters? Computational parameters?
  • 16.
    First study 1/4Harper & Sigman, PNAS 2011. Initially 25 ligands synthesized
 acc. the commercial availability and the ease of synthesis.
  • 17.
    First study 1/4Harper & Sigman, PNAS 2011. Initially 25 ligands synthesized
 acc. the commercial availability and the ease of synthesis. LFER development by fitting a linear regression model. enantioselectivity enantiomeric ratio gas constant reaction temperature
  • 18.
    First study 1/4Harper & Sigman, PNAS 2011. Initially 25 ligands synthesized
 acc. the commercial availability and the ease of synthesis. ... resulted in only moderate predictive power and did little to inform catalyst design. "How many and what distribution of data points would provide a meaningful model for prediction?" (from the initial review of the work) LFER development by fitting a linear regression model. enantioselectivity enantiomeric ratio gas constant reaction temperature
  • 19.
    First study 2/4 Applythe principles of "Design of Experiments (DoE)" using physical organic parameters. "Briefly, the precepts of DoE involve evenly spreading experiments to the sensitivity limits of a system using continuous variables (e.g. temperature and concentration)." Harper & Sigman, PNAS 2011.
  • 20.
    Technical note 1/3:Design of experiments (DoE) We have many variations, but in short... (1) Define coordinates
 with great care. (2) Examine/consider
 the limit of possible ranges. (3) Cover the region evenly 
 with interacting points. "32 factorial design"
 (full factorial design
 with 3 levels of 2 factors) Consult a statistician when descriptors have strong correlation or clear interdependency If you have more variables, you can also consider a "fractional" design...
 (a carefully chosen subset of experiments) better simplex lattice design central composite design Box-Behnken design face-centered central composite design Latin hypercube design Sphere packing design Uniform design Maximum entropy design
  • 21.
    Technical note 2/3:Least-squares regression "Predictors"
 (descriptors) "Response" (target) Regression equation for prediction that minimizes the sum of (error-bar height) simple matrix multiplication 2 error-bar fitting a plane!
  • 22.
    Technical note 3/3:Least-squares regression Regression equation for prediction Exactly same calculation! fitting a surface! called "response surface" models (RSMs) in DoE If we set like this... all calculated from X and Y
 (composite variables)
  • 23.
    First study 3/4 1.Extrapolation is subject to greater uncertainty and error than interpolation. 2. The predictive power of any statistical model is dependent upon the variation in the distribution of data points used to develop the model. Harper & Sigman, PNAS 2011. Their two key tenets
  • 24.
    First study 3/4 1.Extrapolation is subject to greater uncertainty and error than interpolation. 2. The predictive power of any statistical model is dependent upon the variation in the distribution of data points used to develop the model. Harper & Sigman, PNAS 2011. Their two key tenets Carefully designed and evenly spaced ligand space according to the range of substituent Charton values that are synthetically accessible! A 3 x 3 library defined
  • 25.
    First study 3/4 Harper & Sigman, PNAS 2011. The response surface model fitted to the 3 x 3 design by linear regression, with 2 replicates for each point in the 3 x 3. [Figure: response surface over substituents H, Me, iPr, tBu and Me, tBu, CEt3, with a "not reproducible area" marked.]
  • 26.
    First study 3/4 Harper & Sigman, PNAS 2011. The response surface model fitted to the 3 x 3 design by linear regression, with 2 replicates for each point in the 3 x 3. [Figure: response surface over substituents H, Me, iPr, tBu and Me, tBu, CEt3, with Et marked.]
  • 29.
    Second study 1/4 They admitted that ... they were aware of the likely best catalyst structure through extensive trial-and-error screening in the development of these systems, and the identity of the optimal predicted catalyst(s) was not surprising. De novo application of this approach: They applied this to an NHK-type propargylation reaction that had been abandoned years earlier in the group because of their inability, for more than 2 years, to achieve effective asymmetric catalysis.... It turned out that the best predicted one was limited to 50% ee... Harper & Sigman, Science 2011.
  • 31.
    Second study 2/4 By screening only 9 ligands, they could confidently reach an informed decision that a change in catalyst structure was necessary to proceed with catalyst optimization. Several observations for redesigning the catalyst: 1. The oxazoline unit did not bias the reaction to a great extent. 2. A Hammett correlation between the enantioselectivity and the substrate structure was observed. (A significant electronic contribution from the substrate impacts the enantioselectivity.) Harper & Sigman, Science 2011.
  • 33.
    Second study 3/4 These two observations implied ... The new catalyst design should incorporate a readily modifiable group whose electronics could be systematically perturbed to override the innate electronic bias of the substrate. A 3 x 3 library, replacing the oxazoline with a quinoline, with one dimension manipulated electronically (E) and the other sterically (S). [Figure labels: oxazoline, quinoline, pyridine.] Harper & Sigman, Science 2011.
  • 36.
    Side note 1/2: LFERs and Hammett equation "The Hammett equation (and its extended forms) has been one of the most widely used means for the study and interpretation of organic reactions and their mechanisms." (Hansch, Leo & Taft, Chem Rev 1991)
    For any two reactions with two aromatic reactants differing only in the type of substituent R, the change in free energy of activation is proportional to the change in Gibbs free energy (i.e. a "linear" relationship):
    log(K_R / K_H) = σ_R ρ
    K_R: equilibrium constant with substituent R; K_H: that with R = H (reference); σ_R: substituent constant (depends only on R; the "parameter" values for R); ρ: reaction constant (depends on the reaction but is independent of R).
    • This is an empirical trend that does not follow from any chemical theory (and thus has also long been criticized from the theoretical side?).
    • A surprise was that this equation also holds for reaction rates.
    • This triggered many other useful variations as linear free-energy relationships (LFERs), as well as nonlinear ones such as quadratic.
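A hedged sketch of how a Hammett reaction constant ρ can be estimated from data, using log(k_R / k_H) = σ_R ρ. The σ values below are approximate para-substituent constants as commonly tabulated and should be checked against a primary compilation before use; the rate ratios are synthetic, generated from an assumed ρ of 1.5, so nothing here is experimental data.

```python
import numpy as np

# Approximate sigma_para values as commonly tabulated (verify before use).
sigma = {
    "OMe": -0.27,
    "Me": -0.17,
    "H": 0.00,
    "Cl": 0.23,
    "NO2": 0.78,
}

# Synthetic log10(k_R / k_H) values generated from an assumed rho.
ASSUMED_RHO = 1.5
log_rate_ratio = {r: ASSUMED_RHO * s for r, s in sigma.items()}


def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log(k_R/k_H) = rho * sigma."""
    s = np.asarray(sigmas, dtype=float)
    y = np.asarray(log_ratios, dtype=float)
    return float(s @ y / (s @ s))


substituents = list(sigma)
rho = fit_rho(
    [sigma[r] for r in substituents],
    [log_rate_ratio[r] for r in substituents],
)
print(f"fitted rho = {rho:.3f}")
```

The fit is forced through the origin because the reference substituent (R = H) has σ = 0 and log(k_H / k_H) = 0 by definition.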
  • 37.
    Side note 2/2: LFER is generalized to QSAR/QSPR LFERs, or prediction by empirical trends in data, also gave rise to a long history of QSAR/QSPR in chemoinformatics, from Taft/Hammett through Hansch-Fujita, etc. (Hansch, 2014)
  • 38.
    Second study 4/4 This allowed them to confidently identify a catalyst that performed well for a wide range of substrates with different electronic features. The same approach was also applied to identify highly enantioselective catalysts for the Pd-catalyzed redox-relay Heck reactions of alkenols (Werner, Mei, Burckle & Sigman, Science 2012, 338, 1455-1458). Electronic effect (E): Hammett parameters (σ). Steric effect (S): Charton parameters (ν). Harper & Sigman, Science 2011.
  • 39.
    Toward parameter (re-)considerations It is critical to use appropriate parameters (descriptors) that characteristically describe the properties of substituents... For example, Taft/Charton values have been historically controversial because of their limited generality... (They arose from a particular mechanism.)
  • 40.
    Charton values NHK allylation: worked. Desymmetrization of bisphenols: failed. Harper, Bess & Sigman, Nat Chem 2012.
  • 41.
    Sterimol Harper, Bess & Sigman, Nat Chem 2012. (Rediscovery of) Sterimol parameters • Developed in the 1970s (Verloop & Tipker, 1977) • Three parameters (L, B1, B5) • Computationally derived. Worked. Worked.
  • 42.
    Third study 1/2 Milo, Bess & Sigman, Nature 2014. New computationally derived parameters: use IR vibrations as parameters to describe simultaneous steric and electronic perturbations to structure. To this point, a key problem in the history of LFERs has been avoided: the inability to analyze substituents with changes to both size and electronic nature. (Use computational science!)
  • 43.
    Third study 2/2 New computationally derived parameters: IR (infrared) spectroscopy-based parameters to describe simultaneous steric and electronic perturbations to structure. To this point, a key problem in the history of LFERs has been avoided: the inability to analyze substituents with changes to both size and electronic nature.
    1. Perform a geometry minimization and frequency calculation of starting materials.
    2. Compute vibrations as well as other parameters, including Sterimol values, NBO charges, and geometric factors such as torsion angles.
    3. Abridge the parameter set to a statistically manageable number.
    4. Perform and evaluate a systematic iterative process to establish and improve statistical predictive models.
    Milo, Bess & Sigman, Nature 2014.
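The slide does not say how models were evaluated in the iterative step. One common choice, shown here purely as an assumption, is leave-one-out cross-validation of a linear model, summarized as Q² (a cross-validated analogue of R²). The descriptor matrix is synthetic and the function names are illustrative.

```python
import numpy as np


def loo_predictions(X, y):
    """Predict each sample from a linear model fitted to all other samples."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept + descriptors
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds


def q_squared(y, preds):
    """1 - PRESS / total sum of squares; close to 1 means good prediction."""
    press = np.sum((y - preds) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press / total)


# Synthetic descriptors and a noise-free linear response (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
y = 0.3 + X @ np.array([1.0, -2.0])

q2 = q_squared(y, loo_predictions(X, y))
print(f"Q^2 = {q2:.4f}")
```

Comparing Q² across candidate parameter subsets is one way to decide which abridged set to keep; with real data the value will fall below 1.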
  • 44.
    Applications of this to evaluate the substrate scope This approach is intended for use in catalyst optimization, but it can also be applied to different challenges such as "substrate scope". 1. Harper, Vilardi & Sigman, JACS 2013: The catalyst and substrate structures were jointly analyzed to predict the "best" catalyst/substrate combinations. 2. Bess, Bischoff, Sigman, PNAS 2014: A modest-sized library of substrates was designed to build a model for predicting the performance of even unreported substrates, in a study of a Rh-catalyzed asymmetric transfer hydrogenation of ketones.
  • 45.
    Returned to their original pursuit: reaction mechanisms They hypothesized that their predictive models (multidimensional correlations) would encompass mechanism-rich information. However, inferring mechanistic information becomes a daunting task because 1. the models become increasingly complex, and 2. the models contain compounded parameters that are not representative of one distinct structural characteristic. A data-intensive approach to mechanistic elucidation requires philosophical changes: One has to overcome the entrenched dogma that low-enantioselectivity measurements are "bad results". Every value obtained in the course of data collection is useful and can serve to reveal structural information regarding the mechanism!
  • 46.
    4th/5th studies Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016; Milo, Neel, Toste & Sigman, Science 2015
    1. Niemeyer, Milo, Hickey & Sigman, Nat Chem, 2016: Parameter selection is not intuitive, and an effective parameter set was obtained statistically for phosphine ligands; it served to distinguish between two limiting mechanistic regimes in the site-selective oxidative addition in a Suzuki reaction.
    2. Milo, Neel, Toste & Sigman, Science 2015: Synthetically reasonable and mechanistically informative catalysts, substrates, or reagents are virtually predicted, and then prepared and tested experimentally in the context of chiral phase-transfer catalysis.
    This tactic requires the design and preparation of a systematic and synthetically modular library of catalysts, substrates, and reagents. The experimental design defines the effects that can be explained by the produced models.
  • 47.
    Collaborative and further studies
    1. Neel, Milo, Sigman, Toste, JACS, 2016: chiral phosphoric acid-catalyzed fluorination of allylic alcohols
    2. Bess, DeLuca, Tindall, Oderinde, Roizen, Du Bois & Sigman, JACS, 2014: Rh-catalyzed C-H amination reaction
    3. Bess, Guptill, Davies & Sigman, Chem Sci, 2015: a carbene C-H insertion reaction
    4. Mougel, Santiago, Zhizhko, Bess, Varga, Frater, Sigman, Copéret, JACS, 2015: turnover frequency (TOF) of solid-supported W-based catalysts for olefin metathesis
    5. Zhang, Santiago, Crawford & Sigman, JACS, 2015: ligand optimization for Pd-catalyzed enantioselective redox-relay Heck reactions
    6. Zhang, Santiago, Kou & Sigman, JACS, 2015: evaluation of solvent effects for Pd-catalyzed enantioselective redox-relay Heck reactions
  • 48.
    Lessons we learned 1. The design of experiments (DoE) is crucial to obtain causal insights on the target reactions. 2. We can computationally derive the physical organic chemical parameters to describe the mechanistic features of reactions. 3. "In brief, one needs a reliable and accurate assay for the output of interest, a reasonable spread of such outputs to facilitate statistical analysis, and a reasonable level of modularity in the process under study." Deconstructing rather complex relationships still requires profound physical organic chemistry expertise, the use of various classical tools in this field, and creativity and intuition to understand sophisticated processes.
  • 49.
    After 2016: The 2019 Nature paper and beyond
  • 50.
    1 page summary of the 2019 Nature paper Method: forward stepwise linear regression from the MATLAB Statistics Toolbox (since Nature 2014?). Give up DoE and go more towards an "ML/AI"-like approach for out-of-sample prediction. [Reaction scheme labels: nucleophile, imine starting material, amine product, chiral phosphoric acid catalyst.] • 367 reactions (manually curated from 17 papers) • 313 parameters from every component (steric, DFT-derived, 2D chemoinformatics descriptors, conditions) Proof of concept with a reaction class with different catalyst chemotypes and substrate structural types. Note: the out-of-sample prediction figure is a predicted vs measured plot (not a fitted line).
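A minimal sketch of forward stepwise selection for linear regression, to illustrate the kind of method named on this slide. This is not the MATLAB routine: MATLAB's stepwise tools add and remove terms based on statistical entry and exit criteria, whereas this simplified version greedily adds the descriptor that most reduces the residual sum of squares until a fixed number of terms is reached. All data and names are synthetic and illustrative.

```python
import numpy as np


def residual_sum_of_squares(X, y, cols):
    """RSS of a least-squares fit using an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, max_terms):
    """Greedily pick up to max_terms descriptor columns, best-first."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_terms:
        best = min(
            remaining,
            key=lambda c: residual_sum_of_squares(X, y, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: 6 candidate descriptors, only columns 2 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.01 * rng.normal(size=40)

selected = forward_stepwise(X, y, max_terms=2)
print("selected descriptor columns:", selected)
```

With hundreds of candidate parameters, as on this slide, a selection like this should be paired with held-out validation, since greedy selection on the training data alone tends to overfit.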