Posts Tagged ‘3g’

Computers 1967-2011: a personal perspective. Part 4. Moore’s Law and Molecules.

Friday, October 28th, 2011

Moore’s law describes a long-term trend in the evolution of computing hardware, and it is often interpreted in terms of processing speed. Here I chart this rise in terms of the size of computable molecules. By computable I mean specifically how long it takes to predict the geometry of a given molecule using a quantum mechanical procedure.

LSD, the 1975 benchmark for computable molecules.

The geometry (shape) of a molecule is defined by 3N-6 variables, where N is the number of atoms it contains. Optimising the value of variables in order to obtain the minimum value of a function was first conducted by chemical engineers, who needed to improve the function of chemical reactor plants. The mathematical techniques they developed were adopted to molecules in the 1970s, and in 1975 a milestone was reached with the molecule above. Here, N=49, and 3N-6=141. The function used was one describing its computed enthalpy of formation, using a quantum mechanical procedure known as MINDO/3. The computer used was what passed then for a supercomputer, a CDC 6600 (of which a large well endowed university could probably afford one of). It was almost impossible to get exclusive access to such a beast (its computing power was shared amongst the entire university, in this case of about 50,000 people), but during a slack period over a long weekend, the optimised geometry of LSD was obtained (it’s difficult to know how many hours the CDC 6600 took to perform this feat, but I suspect it might have been around 72). The result was announced by Paul Weiner to the group I was then part of (the Dewar research group), and Michael immediately announced that this deserved an unusual Monday night sojourn to the Texas Tavern, where double pitchers of beer would be available. You might be tempted to ask what the reason for the celebration was. Well, LSD was a “real molecule” (and not a hallucination). It meant one could predict for the first time the geometry of realistic molecules such as drugs and hence be taken seriously by people who dealt with molecules of this size for a living. And if you could predict the energy of its equilibrium geometry, you could then quickly move on to predicting the barriers to its reaction. A clear tipping point had been reached in computational simulation.

In 1975, MINDO/3 was thought to compute an energy function around 1000 to 10,000 faster than the supposedly more accurate ab initio codes then available (in fact you could not then routinely optimise geometries with the common codes of this type). With this in mind, one can subject the same molecule to a modern ωB97XD/6-311G(d,p) optimisation. This level of theory is probably closer to 104 to 105 times slower to compute than MINDO/3. On a modest “high performance” resource (which nowadays runs in parallel, in fact on 32 cores in this case), the calculation takes about an hour (starting from a 1973 X-ray structure, which turns out to be quite a poor place to start from, and almost certainly poorer than the 1975 point). In (very) round numbers, the modern calculation is about a million times faster. Which (coincidentally) is approximately the factor predicted by Moore’s law.

I will give one more example, this time for an example dating from around 2003, 28 years on from the original benchmark.

Transition state for lactide polymerisation.

This example has 114 atoms, and hence 3N-6 =336, or 2.42 times the 1975 size. It is a transition state, which is a far slower calculation then an equilibrium geometry. It is also typical of the polymerisation chemistry of the naughties. Each run on the computer (B3LYP/6-31G(d), with the alkyl groups treated at STO-3G) now took about 8-10 days (on a machine with 4 cores), and probably 2-4 runs in total would have been required per system (of which four needed to be studied to derive meaningful conclusions). Let us say 1000 hours per transition state. Together with false starts etc, the project took about 18 months to complete. Move on to 2010; added to the model was a significantly better (= slower) basis set and a solvation correction, and a single calculation now took 67 hours. In 2011, it would be reduced to ~10 hours (by now we are up to 64-core computers).

In 2011, calculations involving ~250 atoms are now regarded as almost routine, and molecules with up to this number of atoms cover most of the discrete (i.e. non repeating) molecular systems of interest nowadays. But the 1975 LSD calculation still stands as the day that realistic computational chemistry came of age.

Computers 1967-2011: a personal perspective. Part 4. Moore's Law and Molecules.

Friday, October 28th, 2011

Moore’s law describes a long-term trend in the evolution of computing hardware, and it is often interpreted in terms of processing speed. Here I chart this rise in terms of the size of computable molecules. By computable I mean specifically how long it takes to predict the geometry of a given molecule using a quantum mechanical procedure.

LSD, the 1975 benchmark for computable molecules.

The geometry (shape) of a molecule is defined by 3N-6 variables, where N is the number of atoms it contains. Optimising the value of variables in order to obtain the minimum value of a function was first conducted by chemical engineers, who needed to improve the function of chemical reactor plants. The mathematical techniques they developed were adopted to molecules in the 1970s, and in 1975 a milestone was reached with the molecule above. Here, N=49, and 3N-6=141. The function used was one describing its computed enthalpy of formation, using a quantum mechanical procedure known as MINDO/3. The computer used was what passed then for a supercomputer, a CDC 6600 (of which a large well endowed university could probably afford one of). It was almost impossible to get exclusive access to such a beast (its computing power was shared amongst the entire university, in this case of about 50,000 people), but during a slack period over a long weekend, the optimised geometry of LSD was obtained (it’s difficult to know how many hours the CDC 6600 took to perform this feat, but I suspect it might have been around 72). The result was announced by Paul Weiner to the group I was then part of (the Dewar research group), and Michael immediately announced that this deserved an unusual Monday night sojourn to the Texas Tavern, where double pitchers of beer would be available. You might be tempted to ask what the reason for the celebration was. Well, LSD was a “real molecule” (and not a hallucination). It meant one could predict for the first time the geometry of realistic molecules such as drugs and hence be taken seriously by people who dealt with molecules of this size for a living. And if you could predict the energy of its equilibrium geometry, you could then quickly move on to predicting the barriers to its reaction. A clear tipping point had been reached in computational simulation.

In 1975, MINDO/3 was thought to compute an energy function around 1000 to 10,000 faster than the supposedly more accurate ab initio codes then available (in fact you could not then routinely optimise geometries with the common codes of this type). With this in mind, one can subject the same molecule to a modern ωB97XD/6-311G(d,p) optimisation. This level of theory is probably closer to 104 to 105 times slower to compute than MINDO/3. On a modest “high performance” resource (which nowadays runs in parallel, in fact on 32 cores in this case), the calculation takes about an hour (starting from a 1973 X-ray structure, which turns out to be quite a poor place to start from, and almost certainly poorer than the 1975 point). In (very) round numbers, the modern calculation is about a million times faster. Which (coincidentally) is approximately the factor predicted by Moore’s law.

I will give one more example, this time for an example dating from around 2003, 28 years on from the original benchmark.

Transition state for lactide polymerisation.

This example has 114 atoms, and hence 3N-6 =336, or 2.42 times the 1975 size. It is a transition state, which is a far slower calculation then an equilibrium geometry. It is also typical of the polymerisation chemistry of the naughties. Each run on the computer (B3LYP/6-31G(d), with the alkyl groups treated at STO-3G) now took about 8-10 days (on a machine with 4 cores), and probably 2-4 runs in total would have been required per system (of which four needed to be studied to derive meaningful conclusions). Let us say 1000 hours per transition state. Together with false starts etc, the project took about 18 months to complete. Move on to 2010; added to the model was a significantly better (= slower) basis set and a solvation correction, and a single calculation now took 67 hours. In 2011, it would be reduced to ~10 hours (by now we are up to 64-core computers).

In 2011, calculations involving ~250 atoms are now regarded as almost routine, and molecules with up to this number of atoms cover most of the discrete (i.e. non repeating) molecular systems of interest nowadays. But the 1975 LSD calculation still stands as the day that realistic computational chemistry came of age.

Bio-renewable green polymers: Stereoinduction in poly(lactic acid)

Saturday, July 24th, 2010

Lactide is a small molecule made from lactic acid, which is itself available in large quantities by harvesting plants rather than drilling for oil. Lactide can be turned into polymers with remarkable properties, which in turn degrade down easily back to lactic acid. A perfect bio-renewable material!

Lactide

The starting point for ring opening polymerisation is racemic lactide, or rac-LA. This is an equal mixture of the R,R and S,S enantiomers, and it is now treated with a catalyst based on a metal M. If M=Mg, there is a rather remarkable stereochemical outcome for the resulting polymer. The catalyst selects alternating enantiomers for the assembly, resulting in a chain (R,R),(S,S),(R,R),(S,S), etc, the name for which is a heterotactic polymer. It could instead have created a blend of equal proportions of (R,R),(R,R),(R,R) and (S,S),(S,S),(S,S) which is an isotactic polymer. Needless to say, these two polymers have quite different properties, and it very much matters which is formed. Without such a catalyst, a random atactic polymer is created rather than a stereoregular arrangement.

Poly (lactic acid)

The question is how does the catalyst manage to assemble the polymer with such stereoinduction? The origins of this depend on a detailed understanding of the mechanism of the reaction, and in 2005 we suggested one which offered an explanation for the stereospecificity (see E. L. Marshall, V. C. Gibson, and H. S. Rzepa, DOI: 10.1021/ja043819b and an interactive storyboard).

Mechanism for stereoregular polymerisation

The key features of this rational were:

  1. Two possible transition states may control the reaction, TS1 and TS2. Which one depends on which is the higher in energy.
  2. The smallest model for this process involves loading two molecules of lactide onto the catalyst. The first has already been ring opened, and will control the stereochemistry of the second, which is the one suffering the ring opening bond formations/breakings shown above (the first is lurking in the group R).
  3. This leads to four different possibilities, (R,R)-(R,R)*, (S,S)-(S,S)*, (R,R)-(S,S)*, and (S,S)-(R,R)* (where the * denotes the reacting lactide, as in the diagram above). These are all diastereomers, and hence will be different in energy. If one of the first two is the lowest, then isotactic polymer will result; if the latter two then a heterotactic polymer.

Back in 2004, we had constructed a model based on B3LYP and of necessity a mixed basis set, being 6-311G(3d) on the Mg, 6-31G on the lactide and only STO-3G on the catalyst. This was done because the complete system was actually rather large. Even so, a transition state calculation would regularly take at least 10 days to find using the fastest computers available to us at that time. Using this procedure, we found that the rate limiting kinetic step  was in fact TS2 for all four possibilities noted above. Of these, the (R,R)-(S,S) transition state turned out to represent the lowest energy pathway, thus confirming the observed heterotacticity for this particular catalyst.

Well, times have moved on:

  1. Six years later, computers are around 20 times faster! We can now afford to improve the basis set to 6-31G(d,p) on all the atoms, including the catalyst (the Mg stays at 6-311G(3d) however; improving it to 6-311G(3d,2f) makes little difference).
  2. We can now include the solvent (thf) as a continuum field.
  3. In the last five years the B3LYP functional has been shown to underestimate the energies of globular molecules. A modern functional such as ωB97XD, which includes dispersion energy corrections, should be expected to do much better.

It is the purpose of this blog to report an update to the modelling. Quoting relative free energies (including the solvation correction), the results come out as;

  1. (R,R)-(S,S) 0.0 kcal/mol for the TS1 geometry (see DOI: 10042/to-4950)
  2. (S,S)-(S,S) 1.8 for the TS2 geometry
  3. (S,S)-(R,R) 5.5 for the TS1 geometry
  4. (R,R)-(R,R) 9.1 for the TS1 geometry.

Well, there are surprises! Using the gas phase B3LYP model the key transition state was TS2; now its TS1 (for in fact three of the four possible transition states). The bottom line (almost) is that the same stereoisomer as before comes out the winner! The take home lesson is that in six years of progress, modelling can now encompass solvent and dispersion corrections. Many mechanisms with > ~100 atoms investigated in the past without inclusion of these effects could probably do with a re-investigation, especially if the transition states are “globular” in nature. Any by now you are probably wondering what the transition state looks like. Well, here it is (and see it in all its glory by clicking on the diagram below).

(R,R)-(S,S) Transition state for stereoregular lactide polymerisation. Click for animation

And if you are also wondering how one might proceed to analyse the origins of the stereoinduction, the NCI interaction surfaces (as described in this post) are shown below. Note how the extensive degree of green interaction surface is associated with the globular nature referred to above.

Non-covalent interaction (NCI) surfaces for the (R,R)-(S,S) transition state. Click for 3D

The conformational analysis of cyclo-octane

Sunday, January 31st, 2010

In the previous post, I suggested that inspecting the imaginary modes of planar cyclohexane might be a fruitful and systematic way in which at least parts of the conformational surface of this ring might be probed. Here, the same process is conducted for cyclo-octane. The ring starts with planar D8h symmetry, and at this geometry (B3LYP/6-311G(d,p), DOI: 10042/to-3742) five negative force constants (corresponding to imaginary modes) are calculated. The most negative is non-degenerate, and gives directly the crown conformation of D4d symmetry (DOI: 10042/to-3738). The remaining four modes comprise two degenerate pairs. Following either of the E2u eigenvectors downhill leads to another conformation, D2d (DOI: 10042/to-3741), with a geometry which is noteworthy for exhibiting a pair of unusually close non-bonded H…H contacts (1.908Å). This value is about  0.3Å shorter than the sum of the Wan der Waals radii (DOI: 10.1021/jp8111556). We can debate whether such a close approach or inter-penetration of two hydrogens is a bond or not (an AIM analysis appears at the bottom of this post).

D8h, +82.8 kcal/mol
Follow B2u 467i Follow E3g 404i Follow E2u 230i
to D4d +0.8 to Ci 131i (Au), +7.5 to D2d +3.6

B2u

E3g

E2u

Cs 0.0 C2 +1.6

Following the remaining E3g mode leads to a stationary point of Ci symmetry (DOI: 10042/to-3743). This is a valley-ridge potential, since this point turns out to be a transition state itself, and following the Au imaginary mode at this point results in another, this time stable conformation, of chiral C2 symmetry (DOI: 10042/to-3744). This has a calculated optical rotation [α]D of 72° (at 589nm in chloroform).

Are these three conformations all there are? Well, a thorough analysis of the conformational space has in fact identified six minima (DOI: 10.1002/(SICI)1096-987X(19980415)19:5<524::AID-JCC5>3.0.CO;2-O), of which the most stable has Cs symmetry (the so-called chair-boat conformation, and the one most frequently found in crystal structures of cyclo-octanes). Where is that one in the above analysis? It arrives by a distortion of the D4d form (DOI: 10042/to-3747) via a transition state of no symmetry (DOI: 10042/to-3752)

Whilst the full potential surface clearly has many more features, following the modes of the planar conformation of cyclo-octane is a simple and rapid way of establishing four of the six limiting stable conformations (the two remaining forms have  D2 and S4 symmetry, see DOI 10.1016/0166-1280(88)80008-3).

AIM analysis of D2d cyclo-octane.

Finally as promised, the AIM analysis of the D2d conformer (above). The ρ(r) value at the interesting H…H critical point is 0.015, which is pretty high in comparison to most normal hydrogen bonds, and would be conventionally taken to indicate attraction. The Laplacian ∇2ρ(r) is +0.05. The “bond” ellipticity ε has a value of 0.29. Single bonds are close to zero, and C=C double bonds are ~0.4, so this is pretty high (see also DOI: 10.1002/anie.200805751).

The two highest C-H stretching vibrations for this conformation are well separated from all the others (ν 3095, 3103 cm-1 for the symmetric A1 and antisymmetric B2 combinations, below for animations). These vibrations serve to both decrease and increase the H…H distances as part of the atomic (harmonic) displacements, and clearly doing so takes more energy than vibrating any of the other C-H bonds. It seems unlikely that the C-H bonds are themselves stronger, so does that mean that the H…H interaction is attractive or is it repulsive? In this context, it is worth noting that the symmetric vibration (both H…H distances decrease/increase at the same time) is lower in wavenumber than the mode which decreases one and increases the other.

A1

B2