Henry Rzepa's blog

Tag: simulation

Reaction coordinates vs Dynamic trajectories as illustrated by an example reaction mechanism.

The example a few posts back of how methane might invert its configuration by transposing two hydrogen atoms illustrated the reaction mechanism by locating a transition state and following it down in energy using an intrinsic reaction coordinate (IRC). Here I explore an alternative method based instead on computing a molecular dynamics trajectory (MD).

I have used ethane instead of methane, since it is now possible to envisage two outcomes:

An animation of the IRC starting from the located transition state is shown below (DOI: 10.14469/hpc/2331). This is based purely on the computed potential energy surface of the molecule. The IRC is computed from the forces experienced on the atoms as they are displaced from an initial set of coordinates corresponding to the located transition state and then following the direction indicated by the eigenvectors of the negative force constant required of a transition state. Importantly, there is no time component; the path is based entirely on energies and forces.

Next, a molecular dynamics simulation (ωB97XD/6-31G(d,p), DOI: 10.14469/hpc/2330). This uses the ADMP method, which requests a classical trajectory calculation using the “atom-centered density matrix propagation molecular dynamics model”. This integrates kinetic energy contributions from the molecular vibrations and so explicitly now includes a time component. In this example the evolution of the system from the transition state is charted over a period of 100 femtoseconds (1000 integrated steps). As it happens this is a relatively short period of evolution; sometimes periods of picoseconds may be required to get a realistic model, especially if one is also dealing with explicit solvent molecules (of which perhaps 500 might be required).

Such explicit inclusion of the kinetic energy from molecular vibrations in effect allows the molecule to “jump” over shallow barriers. In this case, the barrier for a [1,2] hydrogen shift from the methyl group to the adjacent carbene (watch atom 8). Simultaneously, the path taken by two hydrogens no longer corresponds to their transposition but to their elimination as a hydrogen molecule. So this quite different outcome from the IRC is very probably also a much more realistic one.

If the MD method is so much more realistic than the IRC, then why is it not always used? The simple answer is computational time! For this very small molecule and using quite a modest basis set (6-31G(d,p)), the relatively short 1000 time steps took about three times as long to compute as the IRC. The factor gets worse as the size of the molecule increases and the number of time steps for a realistic result increases. Perhaps, as technology gets better and new computer architectures emerge, MD simulations of ever increasingly complex reactions will become far more common. In ten years time, I expect most of the examples on this blog will use this method!

March 20, 2017
The 2016 Bradley-Mason prize for open chemistry.

Peter Murray-Rust and I are delighted to announce that the 2016 award of the Bradley-Mason prize for open chemistry goes to Jan Szopinski (UG) and Clyde Fare (PG).

Jan’s open chemistry derives from a final year project looking at why atom charges derived from quantum chemical calculation of the electronic density represent chemical information well, but the electrostatic potential (ESP) generated from these charges is very poor and conversely charges derived from the computed electrostatic potential are incommensurate with chemical information (such as the electronegativity of atoms). He has developed a Python program called ‘repESP’ in which ‘compromise’ charges are generated which attempt to reconcile the physical world-view (fitting the ESP) with chemical insight provided by NPA (Natural Population Analysis). Jan was the main driver to making his code open source, “opening his supervisor’s eyes” to the various flavours of open source licences. To ensure that all subsequent improvements to the program remain available to anyone, the source code has been released under a ‘copyleft’ licence (GPL v3) and is maintained by Jan on GitHub, where Jan looks forward to helping new users and collaborating with contributors.

Clyde has made various contributions to opensource chemistry over the period of his PhD, with the focus mainly on utilities to improve quantum chemical research and the enhancement of a popular machine learning library with a method that has been successful in chemometrics, creation of an opensource channel for teaching chemists programming and data analysis and creation of a tool to help encourage open sourcing software development. Cclib is the most popular library for parsing quantum chemical data from output files and Clyde has contributed patches for the Atomic simulation environment which enables control of quantum chemical codes from a unified python interface. He was responsible for the construction of a computational chemistry electronic notebook published to github and which is now under active development by others as well. This aims to encapsulate computation chemical research projects, both for the sake of reproducibility and for the sake of organising and keeping track of quantum chemical research. Alongside this platform he created an enhanced Gaussian calculator for the Atomic Simulation Environment that enables automatic construction of ONIOM input files, also now under active development. He also made contributions to scikit learn, the most popular python machine learning framework, implementing a kernel for Kernel Ridge Regression that has become the most successful kernel for regression over molecular properties. He was part of the team that won the 2014 sustainable software conference prize for creation of the opensource healthchecker software as part of Sustain. He has argued for opensource as a platform for teaching resources and created the Imperial Chemistry github user account, which is now run by the department. Materials for the Imperial Chemistry Data Analysis and Programming workshops implemented as Python Notebooks are now available through this account and continue under active development.

Criteria for the award will include judging the submission on its immediate accessibility via public web sites, what is visible and re-usable in this way and of evidence of either community formation/engagement or re-use of materials by people other than the proposer.

October 4, 2016
Chiroptical spectroscopy of the natural product Steganone.
Steganone is an unusual natural product, known for about 40 years now. The assignment of its absolute configurations makes for an interesting, on occasion rather confusing, and perhaps not entirely atypical story. I will start with the modern accepted stereochemical structure of this molecule, which comes in the form of two separately isolable atropisomers.

The first reported synthesis of this system in 1977 was racemic, and no stereochemistry is shown in the article (structure 2).[cite]10.1039/P19770001674[/cite] Three years later an “Asymmetric total synthesis of (-)steganone and revision of its absolute configuration” shows how the then accepted configuration (structure 1 in this article) needs to be revised to the enantiomer shown as structure 12 in the article[cite]10.1016/S0040-4039(00)78586-8[/cite] and matching the above representation. The system has continued to attract interest ever since[cite]10.1039/P19820000521[/cite],[cite]10.1039/A900743A[/cite],[cite]10.1039/C39950001943[/cite],[cite]10.1002/ejoc.201402761[/cite], not least because of the presence of axial chirality in the form of atropisomerism. Thus early on it was shown that the alternative atropisomer, the (aS,R,R) configuration initially emerges out of several syntheses, and has to be converted to the (aR,R,R) configuration by heating[cite]10.1039/P19820000521[/cite]. One could easily be fooled by such isomerism!

Absolute configurations can be established in several ways.
1. From precursors of known absolute configuration. This was the most common method until relatively recently, but it is very expensive since asymmetric syntheses are often much more complex and longer than racemic ones. There is always a small residual doubt that any transformation in the synthesis might have altered the configuration in an unexpected manner.
2. From an X-ray of the final configuration (Bijvoet). Very often the structure is determined on a derivative of the target compound (the original may not form suitable crystals). There is also the doubt that the selected crystals may in fact be a minor form and do not represent the bulk of the system in solution. This is especially true where atropisomerism is concerned, since the solid state structure may not represent the same atropisomer present in solution.
3. In the last decade or so, it has become more common to make use of the computation of measured chiroptical spectroscopies to see if they match. It turns out that this method appears never to have been applied to Steganone, and here I attempt to rectify this.
First, let us compute the optical rotation. The (aR,R,R) stereoisomer is also known as (-)-Steganone, because the measured specific rotation is [α]₅₈₉ -170° ± 30.[cite]10.1039/P19820000521[/cite] It is computed (MN12L/6-311++G(d,p)/SCRF=chloroform) as -240°, [α]365 -2251[cite]10.14469/ch/189647[/cite]. The other atropisomer (aS,R,R) is computed to be 4.5 kcal/mol higher in free energy with [α]₅₈₉ +408°[cite]10.14469/ch/189646[/cite], and measured as +150.[cite]10.1039/P19820000521[/cite] There is some uncertainty in the computed values, since the rotations can be dependent on the conformation not only of the rings, but the substituents. You might imagine that the conformation of eg a -OMe group is unimportant, but this is not so. In this case, I have used a crystal structure of a related species to serve as the start point for optimising the MeO conformations. The greater mismatch between computation and experiment for the (aS,R,R) stereoisomer probably needs an exploration of more conformations of the -OMe groups. At least in both cases the signs match between computation and measurement.

Next, the electronic circular dichroism (ECD), which has also been measured[cite]10.1039/P19820000521[/cite] for the (aR,R,R) isomer as Δε 201nm (-ve Cotton effect), 218 (+ve), 244 (-ve), 276 (+ve) 304 (-ve) and 337 (-ve). Bearing in mind that the baselines in ECD spectra are notoriously difficult to define (moving it up or down can easily invert a Cotton effect), the agreement with the calculated spectrum MN12L/6-311++G(d,p)/SCRF=chloroform, nstates=200)[cite]10.14469/ch/189649[/cite] might seem reasonable, although the calculated version has more peaks in the region 225-265 than are reported (e.g. 235, +ve, 265, -ve).

The (aS,R,R) isomer seems a less good fit. The +ve peak at 218 is missing, the +ve 276 peak matches better than the other isomer, but the 337nm peak is again the wrong sign.

Of course, in such a game it may be the DFT functional used for the simulation that itself might be misleading, MN12L in this case. Just to check, I also include the results using M062X[cite]10.14469/ch/189657[/cite] to see how variable these simulations might be. The measured peaks at 201, 218, 244 and 337nm match, but the ones at 276 and 304nm do not.

Although matching computed with measured ECD spectra is commonly used to assign absolute configurations of molecules, you can see from these results that the technique is not a cast iron one! Even scanning through myriad DFT procedures to find the one that fits best is probably not a complete solution either. Can anything be done to further increase confidence?

How about Vibrational Circular Dichroism (VCD) predictions?[cite][/cite],[cite]10.14469/ch/189651[/cite]. Like ECD, VCD is also sensitive to conformation, which is why some modern instruments have low temperature probes operating at close to 0K which strive to capture only a single lowest energy conformation (although of course in any simulation, you have to identify that conformation reliably!). At some stage in the future, the VCD spectra of steganone might indeed be measured, and hence compared with the below. It might serve to increase confidence in the chiroptical methods as a means of assigning configuration.

We might conclude from this short exploration of chiroptical spectroscopy that no one single measured or computed value can be absolutely definitive; rather it is the accumulation from various sources that builds up the case for a particular configuration. But at least the above simulations do serve to add some useful additional data for the record.
February 10, 2015
A convincing example of the need for data repositories. FAIR Data.
Derek Lowe in his In the Pipeline blog is famed for spotting unusual claims in the literature and subjecting them to analysis. This one is entitled Odd Structures, Subjected to Powerful Computations. He looks at this image below, and finds the structures represented there might be a mistake, based on his considerable experience of these kinds of molecules. I expect he had a gut feeling within seconds of seeing the diagram.

Indeed, so, you will now find that the authors have apparently acknowledged a mistake[cite]10.1038/nchem.1821[/cite]. My interest piqued, I went to the article, and immediately tracked down the supplementary information. Surely, if these molecules had been subjected to powerful computation, this supporting information should contain coordinates of some kind that would allow a correlation with the 2D structural representation shown above. I have just returned from FORCE2015, a three-day event in Oxford. From the detailed agenda, you can see that a lot of the conference centered around what is called FAIR Data. FAIR stands for:
1. Findable
2. Accessible
3. Interoperable
4. Re-usable
So I then set out to find if the supplementary information WAS FAIR. Well, check for yourself (unlike the narrative article, the data should be accessible outside of the paywall, i.e. you should not need a subscription to access it). It is certainly big, running out to 45 pages, in the form of a paginated PDF file (the norm). The table of contents does not refer to data as such, but it does quote 25 figures, from which you might just be able to extract some data. But no molecules as such! So:
1. No data is findable, although the PDF which might contain it is reasonably so.
2. The data is not easily accessible,
3. let alone interoperable (thus many of the charts were probably created using spreadsheet software, but the source files for these are not available),
4. and not-reusable (certainly not without loss and possible error in any attempt at capture).
I think it fair to say that the data for these powerful computations are not FAIR. Had we had at least some coordinates (the computations involved molecular mechanics based dynamics simulations, which certainly involve manipulating atom coordinates in some form) then the structures shown in the figure above could be checked, and perhaps even the apparent error would have been quickly spotted.

Derek does not make the point about FAIR data (to be fair, he was not at FORCE2015) and so I will make the case. If you are reporting a computational model or simulation, there is no excuse for not supplying FAIR data to accompany it. If the data is FAIR it will be inter-operable and re-usable. And this will instantly allow anyone to check e.g. the structures above. You would not need to have Derek’s vast experience and instinct (although having it is also helps). And of course we might presume that there were 2-3 referees that also looked at the article, and presumably none of them requested FAIR data.

Oh, if you are interested in my take on FAIR data, I gave a talk about that at FORCE2015, which you are welcome to view; I hope it constitutes a FAIR talk!

Acknowledgments

This post has been cross-posted in PDF format at Authorea.
January 15, 2015
Computationally directed synthesis: 2,3-dimethyl-2-butene + NO(+).

In the previous posts, I explored reactions which can be flipped between two potential (stereochemical) outcomes. This triggered a memory from Alex, who pointed out this article from 1999[cite]10.1070/MC1999v009n02ABEH000995[/cite] in which the nitrosonium cation as an electrophile can have two outcomes A or B when interacting with the electron-rich 2,3-dimethyl-2-butene. NMR evidence clearly pointed to the π-complex A as being formed, and not the cyclic nitrosonium species B (X=Al₄^–). If you are wondering where you have seen an analogy for the latter, it would be the species formed when bromine reacts with an alkene (≡ Br⁺, X=Br^– or Br₃^–). The two structures are shown below[cite]10.1070/MC1999v009n02ABEH000995[/cite] Since the topic that sparked this concerned pericyclic reactions, it seemed possible that if it had been formed, species B would immediately undergo a pericyclic electrocyclic reaction to form the rather odd-looking cation C, which might then be trapped by eg X(-) to form the nitrone D. So this post is an exploration of what happens when X-NO (X= CF₃COO, trifluoracetate) interacts with 2,3-dimethyl-2-butene, as an illustration of what can be achieved nowadays from about 2 days worth of dry-lab computation as a prelude to e.g. an experiment in the wet-lab (it would take a little more than two days to achieve the latter I suspect). Hence computationally directed synthesis. The model is set up as ωB97XD/6-311G(d,p)/SCRF=chloroform. A transition state is located[cite]10.14469/ch/24979[/cite] and the resulting IRC (below) [cite]10.6084/m9.figshare.1162797[/cite] does not quite have the outcome the above scheme would suggest. Neither A nor B is formed; instead it is the tetrahedral species E, which is ~15 kcal/mol endothermic. I should immediately point out that this is not inconsistent with the formation of A as previously characterised[cite]10.1070/MC1999v009n02ABEH000995[/cite]. That is because this experiment was conducted with a non-nucleophilic counter-anion (X=Al₄^–), whereas in the computational simulation above, we have a nucleophilic anion (X= CF₃CO₂^–). What a difference the inclusion of a counter-ion in the calculation can have! The barrier however (~35 kcal/mol) is a little too high for a facile thermal reaction. In the second of this two-stage reaction, E now ring-opens to form the anticipated D[cite]10.6084/m9.figshare.1162676[/cite] with quite a small barrier of ~6 kcal/mol, but a highly exothermic outcome. I ask this question about it; can this still be described as a pericyclic process? (there is some analogy to the electrocyclic ring opening of a cyclopropyl tosylate). So what are the conclusions? Well, because of the rather high initial barrier, the alkene will need activation (by electron donating substituents, perhaps OMe) for the reaction to become more viable. But if it works, it could be an interesting synthesis of nitrones (I have not yet searched to find out if the reaction is actually known).

September 6, 2014
Thalidomide. The role of water in the mechanism of its aqueous racemisation.
Thalidomide is a chiral molecule, which was sold in the 1960s as a sedative in its (S,R)-racemic form. The tragedy was that the (S)-isomer was tetragenic, and only the (R) enantiomer acts as a sedative. What was not appreciated at the time is that interconversion of the (S)- and (R) forms takes place quite quickly in aqueous media. Nowadays, quantum modelling can provide good in-silico estimates of the (free) energy barriers for such processes, which in this case is a simple keto-enol tautomerism. In a recently published article[cite]10.1002/chem.201202651[/cite], just such a simulation is reported. By involving two explicit water molecules in the transition state, an (~enthalpic) barrier of 27.7 kcal/mol was obtained. The simulation was conducted just with two water molecules acting as solvent, and without any additional continuum solvation applied. So I thought I would re-evaluate this result by computing it at the ωB97XD/6-311G(d,p)/SCRF=water level (a triple-ζ basis set rather than the double-ζ used before[cite]10.1002/chem.201202651[/cite]), and employing a dispersion-corrected DFT method rather than B3LYP.

Keto-enol tautomerisation occupies a unique position in the history of mechanistic chemistry[cite]10.1002/jlac.18892500306[/cite]. In 1889, Beckmann got the whole field rolling by proposing an inferred enol intermediate (which he did not observe) to explain the isomerism of menthone to iso-menthone in conc. sulfuric acid. In modelling the enolisation of thalidomide, I have used both implicit and explicit solvents acting in a self-consistent manner. This approach is not yet much adopted in the wider literature[cite]10.1021/jo100920e[/cite]. I have deployed it extensively in this blog as an encouragement to others (selected examples are listed at the bottom of this post). It is worth noting at the outset that the transition state reported previously[cite]10.1002/chem.201202651[/cite] has a computed dipole moment of ~10D. My experience[cite]10.1021/jo100920e[/cite] suggests that any geometry with a dipole moment of this magnitude (or greater) is likely to relax when placed into a continuum field, and this relaxation becomes an important perturbation of both the computed geometry of the transition state and the intrinsic reaction coordinate profile computed from that starting point.

The (re)computed geometry of the aqueous transition state for enolisation of thalidomide is shown below, and for which the entropy-corrected ΔG^†₂₉₈ is 31.0 kcal/mol (the barrier for the prototypical enolisation of propanone is computed as 34.4 kcal/mol). The value in the literature[cite]10.1002/chem.201202651[/cite] is given as 27.7 kcal/mol for the zero-point-energy corrected total energy barrier, but this value notably does NOT include any entropic corrections. The measured literature value for ΔG^†₂₉₈is reported as 24.3 kcal/mol at pH 8, a value which probably also includes contributions from both the pure water catalysed route and those from hydroxide anion catalysis (see below). At this point, I should remind that the free energy of activation for a bi- or termolecular reaction in solution must be obtained by correcting the value obtained for a standard state of 1 atmosphere (the state used for the value quoted above). According to Alvarez-Idaboy and co-workers[cite]10.1021/ol060261z[/cite], this amounts to a total correction of -4.5 kcal/mol for a bimolecular reaction, and -8.73 kcal/mol for a termolecular reaction. Where one of the components of a termolecular reaction is also the solvent, these corrections probably need to be themselves reduced. But this does achieve a reduction in the computed value of 31.0 kcal/mol to something quite close to the experimental value!

Aqueous transition state for enolisation of thalidomide. Click for 3D.

Next, I want to consider the base-catalysed enolisation pathway. As with the reaction of dichlorobuteneone with tolyl-thiolate about which I wrote in another post, the authors of the thalidomide study[cite]10.1002/chem.201202651[/cite] modelled this route by introducing a solvated hydroxide anion, “OH^–·H₂O” into the structure without any accompanying counter-ion. In other words, their total system has an overall negative charge. I argued before, and I argue again here, that there is no real need to have to do this. Why not for example introduce NaOH•H₂O instead? One might argue that the cationic counter-ion so introduced cannot be properly modelled, but the combination of explicit first-sphere water molecules coupled with a continuum model actually handles these counter-ions reasonably well. So may I introduce you to my version of the base-catalysed reaction, involving a contact-ion-pair:

Base-catalysed (NaOH) enolisation of thalidomide. Click for 3D.

This has ΔG^†₂₉₈ 4.7 kcal/mol, much lower than the neutral water catalysed reaction. This value is of course for a standard state for [Na⁺OH^–] (1 atm). At pH 8, [OH^–] is at least six orders of magnitude less, which may rationalise why the experimental rate is so much slower than this barrier might imply. The IRC corresponds to proton transfer.

I would like to end by noting that many mechanisms which would otherwise involve the development of charge-separation may well borrow a protic solvent molecule in the manner shown here to reduce the degree of charge-separation needed. Further examples of this are listed below.
November 10, 2012
Secrets revealed for conjugate addition to cyclohexenone using a Cu-alkyl reagent.

The text books say that cyclohexenone A will react with a Grignard reagent by delivery of an alkyl (anion) to the carbon of the carbonyl (1,2-addition) but if dimethyl lithium cuprate is used, a conjugate 1,4-addition proceeds, to give the product B shown below. The standard explanation is that the alkyl copper is a “soft” nucleophile attacking the soft conjugate carbon, whereas the alkyl magnesium is a “hard” nucleophile attacking the hard carbonyl carbon. Is this the best explanation?

In 2007, one of those wonderfully simple experiments was done[cite]10.1021/ja067533d[/cite]. The dimethyl lithium cuprate reagent (dissolved in THF) was injected into an NMR sample tube at -100°C containing A, and the spectrum measured immediately. The species identified as 4 (the numbering as used in the reference) has two ¹H methyl resonances[cite]10.1021/ja027744s[/cite] at ~ -0.04 to – 0.23 ppm (assigned to Me_β) and -1.08 to -1.11ppm (assigned to Me_α), and the copper coordinates to the alkene as a π-complex. If TMS cyanide is added, 4 is immediately converted to complex 1, in which the π-complex is replaced by a simple C-Cu σ-bond. Compound 4 upon heating gives B, whilst 1 gives the silyl enol ether of B.

How does this match quantum simulation[cite]10.1021/ja0675346[/cite]? First, the ¹H NMR result for 4 (at the wB97XD/6-31G(d,p)/SCRF=THF level and with the lithium coordinated to an ether solvent) comes out as -1.4 ppm (Me_α) and -0.31 ppm (Me_β). The ¹³C is 76.4 and 60.6 ppm for the vinyl carbons (positions 3 and 4, obs) and 64.5/56.7 (calc). These latter values are affected by spin-orbital coupling to the metal, which can shift the values by up to about 10 ppm[cite]10.1021/np0705918[/cite], but the relative values are also in good agreement. So the reaction must proceed starting from this π-copper complex.

The IRC reveals a concerted transfer of Me_β to the conjugate 4-position of B, with a reasonable barrier to reaction which indicates that on warming to room temperatures, the complex 4 will readily react. Formally at least, this corresponds to reductive elimination from the Cu(III) species to form a Cu(I) complex (in which however the metal now coordinates to the enol double bond rather than the alkene).

IRC for methyl transfer. Click for 3D transition state.

I will deal with the case of methyl transfer from 1 in a later post. With 4, we can directly see that the origins of conjugate 1,4-addition an α,β-unsaturated ketone are that the Cu reagent forms a π-complex to the alkene, which positions one of the alkyl groups on the metal in the ideal position to attack in conjugate manner. Regarding the different behaviour of the magnesium Grignard reagent, it boils down to asking why it does NOT form a π-complex in this situation (I would note here that Mg-π-complexes are indeed known).

November 4, 2012
Dynamic effects in nucleophilic substitution at trigonal carbon (with Na+).

In the preceding post, I described a fascinating experiment and calculation by Bogle and Singleton, in which the trajectory distribution of molecules emerging from a single transition state was used to rationalise the formation of two isomeric products 2 and 3. In the present post, I explore possible consequences of including a sodium cation (X=Na⁺ below) in the computational model.

Sitting down to construct such a model, one is immediately faced with important decisions. Na⁺ comes with baggage, namely groupies in the form of solvent molecules and ionic bonding. The latter means less certainty regarding where to place the ion (covalent bonds have that nice attribute that their orientation and length is pretty predictable most of the time). I decided to construct the model shown below, using not one Na⁺ but two (such structures are known from the Cambridge crystal data base), the second Na⁺ being charge balanced by hydroxide anion.

The resulting transition state (B3LYP/6-31+G(d,p)/CPCM=ethanol) is shown below, and the free energy activation barrier, ΔG^† is 11.7 kcal/mol, well down on the value obtained using X=H⁺, and entirely reasonable for a reaction occurring at room temperature. This suggests that the model is not unreasonable (but of course does not prove it is the best).

The geometry of this transition state is significant. Of the two C-Cl bond lengths, the shorter (click the image above to inspect the model) is the one cis to the carbonyl (subsequent elimination of which would result in formation of the major product 2). But an IRC reveals what happens next. Recollect that when X=H⁺ a tetrahedral intermediate is formed that then collapses with elimination of H₃O⁺Cl^–. This time, no intermediate is seen on the IRC, and the requisite C-Cl bond is broken to form 2 in a concerted (but very asynchronous) manner, and in the manner reported by Bogle and Singleton for a model without counterion and explicit solvent.

Notice how preparation for eviction of the C-Cl bond only starts after the transition state is passed. The forces on the departing chloride start to grow after the dihedral angle of the Ar-S-C-Cl system has become antiperiplanar (IRC -3), resulting in the anion shooting out towards one of the two Na⁺ cations to form solvated NaCl.

So we now have a rather more complete model. But is it yet complete enough? How would one go about evicting the other chloride, resulting in formation of 3? I think it is fairly clear that the model will have to be enlarged yet again, this time to include at least one more Na⁺ located on the other side of the carbonyl, and ready to receive the anion. Possibly at least another two water molecules and one hydroxide anion would be required to surround this cation. Clearly, such a model would have grown substantially compared to the original one (Occam might not be happy), and that we are gradually edging towards having two quite separate transition state models to account for each of 2 and 3. At this stage, it would be interesting to apply Bogle and Singleton‘s direct dynamics model to try to establish if each transition state leads to only one product, or whether either of these transition states could result in cross-over to the other product.

I have no feel for whether the transition state presented here can be treated using direct dynamics; if it could, that would indeed be an interesting simulation.

July 19, 2012
Computers 1967-2011: a personal perspective. Part 4. Moore’s Law and Molecules.

Moore’s law describes a long-term trend in the evolution of computing hardware, and it is often interpreted in terms of processing speed. Here I chart this rise in terms of the size of computable molecules. By computable I mean specifically how long it takes to predict the geometry of a given molecule using a quantum mechanical procedure.

LSD, the 1975 benchmark for computable molecules.

The geometry (shape) of a molecule is defined by 3N-6 variables, where N is the number of atoms it contains. Optimising the value of variables in order to obtain the minimum value of a function was first conducted by chemical engineers, who needed to improve the function of chemical reactor plants. The mathematical techniques they developed were adopted to molecules in the 1970s, and in 1975 a milestone was reached with the molecule above. Here, N=49, and 3N-6=141. The function used was one describing its computed enthalpy of formation, using a quantum mechanical procedure known as MINDO/3. The computer used was what passed then for a supercomputer, a CDC 6600 (of which a large well endowed university could probably afford one of). It was almost impossible to get exclusive access to such a beast (its computing power was shared amongst the entire university, in this case of about 50,000 people), but during a slack period over a long weekend, the optimised geometry of LSD was obtained (it’s difficult to know how many hours the CDC 6600 took to perform this feat, but I suspect it might have been around 72). The result was announced by Paul Weiner to the group I was then part of (the Dewar research group), and Michael immediately announced that this deserved an unusual Monday night sojourn to the Texas Tavern, where double pitchers of beer would be available. You might be tempted to ask what the reason for the celebration was. Well, LSD was a “real molecule” (and not a hallucination). It meant one could predict for the first time the geometry of realistic molecules such as drugs and hence be taken seriously by people who dealt with molecules of this size for a living. And if you could predict the energy of its equilibrium geometry, you could then quickly move on to predicting the barriers to its reaction. A clear tipping point had been reached in computational simulation.

In 1975, MINDO/3 was thought to compute an energy function around 1000 to 10,000 faster than the supposedly more accurate ab initio codes then available (in fact you could not then routinely optimise geometries with the common codes of this type). With this in mind, one can subject the same molecule to a modern ωB97XD/6-311G(d,p) optimisation. This level of theory is probably closer to 10⁴ to 10⁵ times slower to compute than MINDO/3. On a modest “high performance” resource (which nowadays runs in parallel, in fact on 32 cores in this case), the calculation takes about an hour (starting from a 1973 X-ray structure, which turns out to be quite a poor place to start from, and almost certainly poorer than the 1975 point). In (very) round numbers, the modern calculation is about a million times faster. Which (coincidentally) is approximately the factor predicted by Moore’s law.

I will give one more example, this time for an example dating from around 2003, 28 years on from the original benchmark.

Transition state for lactide polymerisation.

This example has 114 atoms, and hence 3N-6 =336, or 2.42 times the 1975 size. It is a transition state, which is a far slower calculation then an equilibrium geometry. It is also typical of the polymerisation chemistry of the naughties. Each run on the computer (B3LYP/6-31G(d), with the alkyl groups treated at STO-3G) now took about 8-10 days (on a machine with 4 cores), and probably 2-4 runs in total would have been required per system (of which four needed to be studied to derive meaningful conclusions). Let us say 1000 hours per transition state. Together with false starts etc, the project took about 18 months to complete. Move on to 2010; added to the model was a significantly better (= slower) basis set and a solvation correction, and a single calculation now took 67 hours. In 2011, it would be reduced to ~10 hours (by now we are up to 64-core computers).

In 2011, calculations involving ~250 atoms are now regarded as almost routine, and molecules with up to this number of atoms cover most of the discrete (i.e. non repeating) molecular systems of interest nowadays. But the 1975 LSD calculation still stands as the day that realistic computational chemistry came of age.

October 28, 2011

► Necessary Cookies Always Active

Necessary cookies enable essential site features like secure log-ins and consent preference adjustments. They do not store personal data.

► Functional Cookies Remark

Functional cookies support features like content sharing on social media, collecting feedback, and enabling third-party tools.

► Analytical Cookies Remark

Analytical cookies track visitor interactions, providing insights on metrics like visitor count, bounce rate, and traffic sources.

► Advertisement Cookies Remark

Advertisement cookies deliver personalized ads based on your previous visits and analyze the effectiveness of ad campaigns.