Bond length alternation (BLA) in large aromatic rings: an experimental reality check.

September 30th, 2019

The theme of the last three posts derives from the recently claimed experimental observation of bond length alternation (BLA) in cyclo[18]carbon, a ring of just 18 carbon atoms.[1] Different forms of quantum calculation seem to find this property particularly difficult to agree upon, not only for cyclocarbon but also for twisted lemniscular annulenes (which contain CH rather than just C units). So I thought it might be time to look at some experimental data instead. My chosen system is a class of molecules called the hexaphyrins, for which a number of experimental crystal structures are available.

The general form these molecules take is shown below. Here, QA can be N or S and the hashed bonds are defined as any type. T3 indicates an atom designated as having just three substituents. For each of the six meso carbons to which a non-metal (NM) is attached, a pair of C-C distances is specified as the search variable. These will define any bond length alternation.

A search of the current crystal structure database, specifying no errors and no disorder (see FAIR DOI: 10.1021/ja801983d), gives 64 hits for any temperature. Specifying a temperature of < 95 K and an R factor of < 7.5% reduces the number to 18. The six pairs of distances for each molecule are then processed as follows:

  1. Using the Abs() operator together with the minus operator, the absolute value of the difference between the two distances in each of the six pairs is calculated.
  2. The Least() and Greatest() operators are then used to find the smallest and the largest of these six differences.
  3. Finally, a heat plot of the least vs the greatest value is constructed, with one point per molecule.
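The three processing steps can be mimicked outside the search program. The following is a minimal Python sketch, assuming the six (r1, r2) meso distance pairs for each structure have already been exported, e.g. as a dict keyed by refcode; the function names and the 0.005 Å bin width are illustrative choices, not features of the CSD software. The usage example borrows the three symmetry-unique EGIHUY crystal pairs discussed later in this post, duplicated because of Ci symmetry.

```python
from collections import Counter

def bla_summary(pairs):
    """Least and greatest |r1 - r2| (Å) over the meso C-C distance pairs of one structure."""
    diffs = [abs(r1 - r2) for r1, r2 in pairs]  # step 1: absolute difference for each pair
    return min(diffs), max(diffs)               # step 2: Least() and Greatest()

def heat_plot_points(structures):
    """Step 3 input: one (least, greatest) point per structure (refcode -> list of pairs)."""
    return {refcode: bla_summary(pairs) for refcode, pairs in structures.items()}

def bin_points(points, width=0.005):
    """Count points in square bins of `width` Å; the most populated bin is the 'hot spot'."""
    return Counter((int(lo // width), int(hi // width)) for lo, hi in points)

# Three symmetry-unique EGIHUY pairs, each present twice because of Ci symmetry.
egihuy = [(1.39930, 1.40168), (1.41029, 1.39960), (1.40874, 1.39902)] * 2
points = heat_plot_points({"EGIHUY": egihuy})
print(points["EGIHUY"])             # least ~0.00238 Å, greatest ~0.01069 Å
print(bin_points(points.values()))
```

A real heat plot would simply feed all the (least, greatest) points into a 2D histogram; the binning function only shows where a hot spot comes from.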

The result for the first search is shown below.

There are two main clusters, together with a third, more diffuse one.

  1. The cluster containing the (red) hot spot shows that both the minimum and the maximum BLA (bond length alternation) at the six meso carbon atoms are < 0.01 Å.
  2. A second, more diffuse cluster has a larger minimum BLA at any meso atom of ~0.045 Å and a maximum of ~0.10 Å.
  3. The most diffuse cluster has a minimum BLA of ~0.01 Å but again a maximum of ~0.10 Å.

This result suggests that BLA may be sensitive to the nature of the substituents on the ring. Cluster 1 above, with the least BLA, might also arise because the crystal structure is actually the average of two BLA forms. Restricting the search to measurements made at e.g. < 95 K should show whether any such dynamic averaging affects the distances. A very similar distribution is seen (below).

Next, an equivalent search for octaphyrins. Although there are fewer examples, the same clustering is seen.

The compound with the smallest BLA, in the red hot spot of the top diagram (code EGIHUY, a [26]phyrin, where 26 = 4n+2 = aromatic, DOI: 10.5517/ccrts0b[2]), was chosen for computation (FAIR data DOI: 10.14469/hpc/6179). It has Ci symmetry, so only three pairs of distances need to be considered.


The results show that both B3LYP and ωB97XD DFT methods actually get pretty good geometries, reproducing the non-bond-alternating characteristics of this hexaphyrin. This in turn suggests that the absence of BLA may be real rather than a crystallographic averaging artefact.

Meso C-C distances r1, r2 (Å) and abs(Δr) = |r1 - r2| (Å)
EGIHUY crystal, Ci symmetry
1.39930 1.40168 0.00238
1.41029 1.39960 0.01069
1.40874 1.39902 0.00972
B3LYP+GD3BJ/Def2-TZVPP
1.40842 1.40058 0.00784
1.39972 1.40963 0.00991
1.40848 1.40614 0.00234
ωB97XD/Def2-TZVPP
1.39636 1.40513 0.00877
1.40117 1.40680 0.00563
1.40730 1.39510 0.0122
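As a cross-check, the abs(Δr) column can be recomputed directly from the tabulated distances and summarised per method. This small sketch uses only the values in the table above (method names written in ASCII); for every method the largest |Δr| stays at around 0.01 Å, consistent with the absence of significant BLA.

```python
# Symmetry-unique meso C-C distance pairs (Å) for EGIHUY, copied from the table above.
data = {
    "crystal": [(1.39930, 1.40168), (1.41029, 1.39960), (1.40874, 1.39902)],
    "B3LYP+GD3BJ/Def2-TZVPP": [(1.40842, 1.40058), (1.39972, 1.40963), (1.40848, 1.40614)],
    "wB97XD/Def2-TZVPP": [(1.39636, 1.40513), (1.40117, 1.40680), (1.40730, 1.39510)],
}

def abs_delta(pairs):
    """|r1 - r2| for each pair, rounded to 5 decimal places as in the table."""
    return [round(abs(r1 - r2), 5) for r1, r2 in pairs]

for method, pairs in data.items():
    deltas = abs_delta(pairs)
    mean = sum(deltas) / len(deltas)
    print(f"{method:24s} |dr| = {deltas}  max = {max(deltas):.5f}  mean = {mean:.5f}")
```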

What has been achieved? Well, the crystal structures show that whilst some hexaphyrin molecules have no bond alternation, most in fact do. DFT calculations reproduce the lack of bond alternation in one molecule. The next step is to show whether they also reproduce those which do have BLA. The story is not ended yet!

References

  1. K. Kaiser, L.M. Scriven, F. Schulz, P. Gawel, L. Gross, and H.L. Anderson, "An sp-hybridized molecular carbon allotrope, cyclo[18]carbon", Science, vol. 365, pp. 1299-1301, 2019. https://doi.org/10.1126/science.aay1914
  2. J. Sankar, S. Mori, S. Saito, H. Rath, M. Suzuki, Y. Inokuma, H. Shinokubo, K. Suk Kim, Z.S. Yoon, J. Shin, J.M. Lim, Y. Matsuzaki, O. Matsushita, A. Muranaka, N. Kobayashi, D. Kim, and A. Osuka, "Unambiguous Identification of Möbius Aromaticity for meso-Aryl-Substituted [28]Hexaphyrins(1.1.1.1.1.1)", Journal of the American Chemical Society, vol. 130, pp. 13568-13579, 2008. https://doi.org/10.1021/ja801983d

The Kekulé vibration as a function of aromatic ring size. A different perspective using lemniscular rings.

September 27th, 2019

In the previous posts, I tried to track down the onset of bond length alternation (BLA) as a function of ring size in aromatic cyclocarbons, finding that the answer varied dramatically depending on the type of method used to calculate it. So here I change the system to an unusual kind of aromatic ring, the lemniscular or figure-eight annulene series, and explore the Kekulé vibration for species whose 4n+2 π-electron count means they are cyclically Möbius aromatic.[1]

The advantage of using a lemniscular motif compared to the untwisted annulene is that perturbations due to trans-annular steric interactions between inward-facing substituents are minimised. Before introducing the results for this type of molecule, I should also explain why the series CnFn is used rather than CnHn. The C-C-H bending vibrations are very similar in energy to the Kekulé C-C stretches, so the two mix significantly and obscure the results; substitution with F produces “clean” Kekulé modes. You can see this below:

One consequence of introducing two half-twists into the π-system is that this topology gets partitioned into true twist (Tw) and a different property known as writhe (Wr),[2] the overall effect of which is to reduce the twisting of the p-π/p-π overlaps between adjacent carbon atoms from that of two twists to about half of this. This in turn may affect the distortive tendency of the π-electrons to induce BLA in the ring.[3] Let us now see what this change of molecule does to the value (and sign) of the Kekulé vibration. Included are conformations of the larger rings that vary in the total number of trans F-CC-F units in the ring. All the FAIR data for these calculations are at DOI: 10.14469/hpc/6139

Two density functional methods have been used, at opposite ends of the spectrum revealed in the previous posts, together with a reasonable Def2-TZVPP basis set. Each ring size can have different isomers, depending on the total number of transoid motifs present. For smaller rings (n=6, 10), the B3LYP+GD3BJ and ωB97XD functionals give very similar results. By n=18 however, a clear divergence has occurred, with the Kekulé modes being real (+ve force constant) for the former and almost equally imaginary (-ve force constant) for the latter. 

Kekulé vibration of CnFn, cm-1 (negative values denote imaginary modes)
n (isomer) B3LYP+GD3BJ ωB97XD
6 (0 trans) 1305 1293

10 (2 trans) 1270 1067
10 (4 trans) 1279 1222

14 (2 trans) 1235 1012
14 (4 trans) 1128 -513
14 (6 trans) 1197 -592

18 (2 trans) 974 -987
18 (4 trans) 914 -1091
18 (6 trans) 933 -1232

22 (4 trans) 888 -1309
22 (6 trans) 757 -1811

26 (6 trans) † -2140
26 (8 trans) 756 -2229
26 (10 trans) -437 -3296

Overall, the B3LYP+GD3BJ series shows a slow and reasonably regular decline in the value of the Kekulé modes. As the ring size gets larger, the double-bond configurations of the rings start to become unstable, with C=C bond rotations occurring during geometry optimisations. At n=26 using B3LYP, we see a sudden change from a real Kekulé mode for the 8-trans isomer to an imaginary one for the 10-trans isomer. As noted above, this abrupt change occurs much earlier with the ωB97XD functional, at n=14, with a discontinuity between the conformation with two transoid motifs and those with four and six. I can offer no explanation (yet) for this strange, abrupt onset of an imaginary Kekulé mode, except perhaps to speculate that it might be related to the Tw and Wr partitioning noted above, given that Tw and BLA might themselves be related. As with the cyclocarbons, the BLA phenomenon seems to be peculiarly sensitive to the method used to compute it, more so than most other molecular properties.
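The onset of an imaginary Kekulé mode for each functional can be read off the table mechanically. A minimal sketch follows, using the tabulated values with imaginary modes entered as negative numbers and the missing B3LYP entry for the 6-trans n=26 isomer stored as None. Taking "onset" to mean the first isomer in table order (by ring size, then by number of trans units) is an assumption made for illustration.

```python
# (n, number of trans units, B3LYP+GD3BJ, wB97XD) Kekulé wavenumbers in cm-1 for CnFn;
# negative values denote imaginary modes, None marks a missing entry.
TABLE = [
    (6, 0, 1305, 1293),
    (10, 2, 1270, 1067), (10, 4, 1279, 1222),
    (14, 2, 1235, 1012), (14, 4, 1128, -513), (14, 6, 1197, -592),
    (18, 2, 974, -987), (18, 4, 914, -1091), (18, 6, 933, -1232),
    (22, 4, 888, -1309), (22, 6, 757, -1811),
    (26, 6, None, -2140), (26, 8, 756, -2229), (26, 10, -437, -3296),
]

def first_imaginary(column):
    """Return (n, trans) of the first isomer, in table order, whose Kekulé mode is imaginary."""
    for row in TABLE:
        value = row[column]
        if value is not None and value < 0:
            return row[0], row[1]
    return None

print("B3LYP+GD3BJ:", first_imaginary(2))  # (26, 10)
print("wB97XD:     ", first_imaginary(3))  # (14, 4)
```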

Something deep and important is clearly happening, and the underlying cause does need to be identified. In one sense it is also reminiscent of the discontinuous transition between planar aromatic (bond-equal) and non-planar non-aromatic (BLA) isomers of 10π-dihetero[8]annulenes.[4] Who might have imagined that simple aromatic rings could be so tantalisingly complex!


The first example of this was identified in 2005[5] and is characterised by a topological chiral property known as the linking number, Lk. For lemniscular molecules Lk = 2π, or in plainer English, the molecule contains a double half-twist in the π-system around the ring. This investigation is a perfect example of the benefits of using a data repository. Many of these species were originally included in an article published in 2009,[6] with the calculations having been deposited in 2007, so all the starting geometries for the present investigation were quickly obtained from that source. †Bonds rotate to the 10-trans isomer.

References

  1. P.L. Ayers, R.J. Boyd, P. Bultinck, M. Caffarel, R. Carbó-Dorca, M. Causá, J. Cioslowski, J. Contreras-Garcia, D.L. Cooper, P. Coppens, C. Gatti, S. Grabowsky, P. Lazzeretti, P. Macchi, Á. Martín Pendás, P.L. Popelier, K. Ruedenberg, H. Rzepa, A. Savin, A. Sax, W.E. Schwarz, S. Shahbazian, B. Silvi, M. Solà, and V. Tsirelson, "Six questions on topology in theoretical chemistry", Computational and Theoretical Chemistry, vol. 1053, pp. 2-16, 2015. https://doi.org/10.1016/j.comptc.2014.09.028
  2. S.M. Rappaport, and H.S. Rzepa, "Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes", Journal of the American Chemical Society, vol. 130, pp. 7613-7619, 2008. https://doi.org/10.1021/ja710438j
  3. S. Shaik, A. Shurki, D. Danovich, and P.C. Hiberty, "A Different Story of π-Delocalization: The Distortivity of π-Electrons and Its Chemical Manifestations", Chemical Reviews, vol. 101, pp. 1501-1540, 2001. https://doi.org/10.1021/cr990363l
  4. H.S. Rzepa, and N. Sanderson, "Aromaticity on the edge of chaos: An ab initio study of the bimodal balance between aromatic and non-aromatic structures for 10π-dihetero[8]annulenes", Phys. Chem. Chem. Phys., vol. 6, pp. 310-313, 2004. https://doi.org/10.1039/b312724a
  5. H.S. Rzepa, "A Double-Twist Möbius-Aromatic Conformation of [14]Annulene", Organic Letters, vol. 7, pp. 4637-4639, 2005. https://doi.org/10.1021/ol0518333
  6. C.S. Wannere, H.S. Rzepa, B.C. Rinderspacher, A. Paul, C.S.M. Allan, H.F. Schaefer, and P.V.R. Schleyer, "The Geometry and Electronic Topology of Higher-Order Charged Möbius Annulenes", The Journal of Physical Chemistry A, vol. 113, pp. 11619-11629, 2009. https://doi.org/10.1021/jp902176a

Cyclo[6] and [10]carbon. The Kekulé vibrations compared.

September 3rd, 2019

In the previous post, I looked at the so-called Kekulé vibration of cyclo[18]carbon using various quantum methods and basis sets. Because some of these procedures can take a very long time, I could not compare them all using the same consistent, high-quality basis set for carbon (Def2-TZVPP). Here I make a start on doing this, using the smaller six- and ten-carbon rings to see what trends might emerge. FAIR data are at DOI: 10.14469/hpc/6069

Method C-C bond length, Å Kekulé mode, cm-1 Number of -ve force constants
Cyclo[6]carbon
B3LYP+GD3BJ 1.302 1300 2
wB97XD 1.298 1262 2
PBEQIDH 1.302 1332 1
MP2 1.318 1428 0
MP3 1.303 923 0
MP4(SDQ) 1.308 943 0
CCSD 1.308 1020 2
CCSD(T) 1.320 1189 2
Cyclo[10]carbon
B3LYP+GD3BJ 1.282 1334 1
wB97XD 1.279 975 1
PBEQIDH 1.282 1578 0
MP2 1.295 2829 0
MP3 1.283 -1946 1
MP4(SDQ) 1.285 -1003 2
CCSD 1.286 -781 3
CCSD(T) 1.295 1266 2

The conclusions can be summarised as:

  1. For six carbons, all the methods agree that the Kekulé vibration is real (+ve force constant), but there are distinct signs that the MP expansion may not be fully converged, with MP2 and MP3 differing significantly.
  2. For ten carbons, we already see that MP3 and MP4 differ from MP2 in predicting a -ve force constant.
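One crude way to quantify the behaviour of the MP series is to look at the successive increments MP2 to MP3 to MP4(SDQ) in the Kekulé wavenumber, and at whether the mode flips between real and imaginary along the series. The sketch below does only that, using the values from the table; it is a diagnostic illustration rather than a formal convergence test, and MP4(SDQ) is not the complete fourth-order term.

```python
# Kekulé wavenumbers (cm-1) from the table above; negative = imaginary mode.
MP_SERIES = {
    "C6": {"MP2": 1428, "MP3": 923, "MP4(SDQ)": 943},
    "C10": {"MP2": 2829, "MP3": -1946, "MP4(SDQ)": -1003},
}

def increments(values):
    """Differences between successive orders, e.g. MP3 - MP2, then MP4(SDQ) - MP3."""
    return [b - a for a, b in zip(values, values[1:])]

def sign_changes(values):
    """Number of times the mode switches between real (+) and imaginary (-)."""
    return sum(1 for a, b in zip(values, values[1:]) if (a < 0) != (b < 0))

for ring, series in MP_SERIES.items():
    vals = list(series.values())  # dicts preserve insertion order (Python 3.7+)
    print(ring, "increments:", increments(vals), "sign changes:", sign_changes(vals))
```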

Multi-configuration calculations are problematic for these species. For C10, for example, 20 electrons (10 π and 10 σ) in an active space of 20 orbitals are required; a CASSCF(20,20) calculation is beyond the scope of most quantum programs. Moreover, the second derivatives of such a wavefunction would have to be evaluated in order to obtain the force constant for the required vibration.
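To put a number on why CASSCF(20,20) is so demanding, the size of the full CI expansion within the active space can be counted. The number of determinants with equal numbers of α and β electrons is C(20,10)², and the number of singlet configuration state functions (CSFs) follows from the Weyl-Paldus formula. This short sketch ignores spatial symmetry, which would reduce these counts somewhat.

```python
from math import comb

def n_determinants(n_orb, n_alpha, n_beta):
    """Number of Slater determinants with fixed numbers of alpha and beta electrons."""
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)

def n_csfs(n_orb, n_elec, two_s):
    """Weyl-Paldus count of spin-adapted CSFs: n_elec electrons in n_orb orbitals, spin S = two_s / 2."""
    numerator = ((two_s + 1)
                 * comb(n_orb + 1, (n_elec - two_s) // 2)
                 * comb(n_orb + 1, (n_elec + two_s) // 2 + 1))
    return numerator // (n_orb + 1)  # the division is always exact

print(f"{n_determinants(20, 10, 10):,} determinants")  # 34,134,779,536
print(f"{n_csfs(20, 20, 0):,} singlet CSFs")           # 5,924,217,936
```

The same functions give the familiar small cases, e.g. three singlet CSFs and one triplet CSF for two electrons in two orbitals.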

So the onset of bond length alternation in these cyclo-carbons could be as early as 10 atoms. But until higher level calculations can be performed in a systematic manner on these rings, the jury should perhaps still remain out as to when bond length alternation starts in terms of ring size.

Cyclo[18]carbon: The Kekulé vibration calculated and hence a mystery!

August 30th, 2019

I have discussed the vibration in benzene known as the Kekulé mode in other posts, the first of which was all of ten years ago. It is a stretching mode that lengthens three of the bonds in benzene (a [6]-annulene) and shortens the other three, thus leading to a cyclohexatriene motif (see below). This vibration is real (+ve force constant) in benzene itself, which indicates that distorting the structure from six- to three-fold symmetry increases the energy. Benzene therefore has a symmetrising influence, and it comes as a surprise to most to learn that this is actually due to the σ- rather than the π-electrons! But there are good reasons to believe that as the ring size of the annulene increases, the Kekulé vibration will evolve from a real mode into an imaginary (-ve force constant) one, representing a transition state for interchanging the single and double bonds. At some point, therefore, the more symmetrical geometry of the annulene, in which all the bonds are of equal length, will give way to one of lower symmetry in which BLA (bond length alternation) occurs, and the symmetrical form becomes the transition state for this process.
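The link between the sign of the force constant and a "real" or "imaginary" vibration comes from the harmonic relation ν̃ = √(k/μ)/(2πc): a negative k gives an imaginary wavenumber, which quantum chemistry programs conventionally print as a negative number. Here is a one-dimensional sketch; the HCl-like force constant and reduced mass in the usage lines are illustrative textbook-style values, not taken from this post, and a real Kekulé mode would take its k and μ from the full Hessian.

```python
import math

SPEED_OF_LIGHT_CM_S = 2.99792458e10  # cm/s
AMU_KG = 1.66053906660e-27           # kg per atomic mass unit

def harmonic_wavenumber(k, mu_amu):
    """Harmonic wavenumber (cm-1) for force constant k (N/m) and reduced mass mu_amu (amu).

    A negative force constant gives an imaginary frequency; following the usual
    output convention it is returned as a negative number.
    """
    mu = mu_amu * AMU_KG
    nu = math.sqrt(abs(k) / mu) / (2 * math.pi * SPEED_OF_LIGHT_CM_S)
    return math.copysign(nu, k)

# Illustrative HCl-like values: k ~ 516 N/m, mu ~ 0.98 amu.
print(round(harmonic_wavenumber(516, 0.9796)))   # about 2990
print(round(harmonic_wavenumber(-516, 0.9796)))  # about -2990
```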

With this background, I noticed that a form of [18]annulene in which all the hydrogens have been removed (and which is therefore another allotrope of carbon) has recently been synthesized, and individual molecules have been studied on a metal surface using STM (a scanning tunnelling microscope).[1] This allotrope is also of interest as a “double aromatic” molecule, with 4n+2 electron aromaticity arising from both the π and the σ system.

The Kekulé mode in benzene

Cyclo[18]carbon, as this form is known, could have either 18-fold symmetry with no bond alternation, or 9-fold symmetry in which alternating short and long bonds occur. The STM conclusions pointed to the structure with alternating bonds; these are attributed in the article to triple and single bonds, but in fact no accurate bond lengths were (or can be) measured to support this directly.

I thought it might be interesting to see how various forms of computational quantum calculation might reflect this new experiment. In order to exploit the higher 18-fold symmetry to reduce calculation time, only the geometry with 18 equal bond lengths is computed here, along with the calculated value of the Kekulé mode. All the calculations are presented as FAIR data at DOI: 10.14469/hpc/6038

Method Kekulé mode, cm-1 CC bond length, Å Number of -ve force constants
Density functional methods
B3LYP+GD3+BJ/6-31G(d) +766 1.284 5, all in-plane bucklings
B3LYP+GD3+BJ/6-311G(d) +699 1.278 1, in-plane buckling
B3LYP+GD3+BJ/Def2-TZVPP +673 1.276 0
B3LYP+GD3+BJ/Def2-QZVPP +649 1.275 0
ωB97X-D/Def2-TZVPP -2058 1.273 2, Kekulé + in-plane buckling
Double hybrid methods
B2PLYPD3/Def2-TZVPP +2598 1.281 0
DSDPBEP86/Def2-TZVPP +3470 1.283 0
PBEQIDH/Def2-TZVPP +3447 1.277 0
Møller-Plesset methods
MP2/6-31G(d) +20444 1.295 8, in/out-of-plane buckling
MP2/6-311G(d) +19885 1.293 8, in/out-of-plane buckling
MP2/Def2-TZVPP +19466 1.289 0
MP3/6-311G(d) -15637 1.284 11, Kekulé + in/out-of-plane buckling
MP4(SDQ)/6-31G(d) -9496 1.287 11, Kekulé + in/out-of-plane buckling
Coupled Cluster methods
CCSD/6-31G(d) -4564 1.288 11, Kekulé + in/out-of-plane buckling
CCSD/6-311G(d) -4310 1.286 11, Kekulé + in/out-of-plane buckling
CCSD(T)/6-31G(d) N/A 1.276 N/A

The conclusions include:

  1. The quality of the basis set is crucial. For lower quality bases, in-plane deformations of the ring occur, but the Kekulé mode itself is relatively unaffected. The minimum converged basis is Def2-TZVPP, which rather restricts the ability to perform higher level calculations.
  2. Even within the DFT methods, two popular functionals give diametrically opposite results. The B3LYP procedure predicts no BLA, whereas the alternative ωB97X-D functional suggests strong BLA. I could have proceeded through the functional zoo and added hundreds more functionals to this list, but the point I wanted to make is already established!
  3. Double hybrid methods, which combine exact HF exchange with an MP2-like correlation and are sometimes regarded as the next step up Jacob’s ladder, predict no BLA. Their very high real Kekulé modes (2600-3500 cm-1) may provide a first warning sign that something is not quite right.
  4. Møller-Plesset expansions such as MP2 itself give an unrealistically large positive force constant for the Kekulé mode. Perturbation expansions are notorious in this respect: for a physically realistic result the expansion has to converge, and moreover it is often assumed to converge rapidly. If the expansion is proven non-convergent, then neither the MP method itself, nor other methods that make use of it such as the double hybrid and coupled cluster methods, can be trusted. For MP2 itself, the value of the Kekulé mode indicates serious issues with the single-reference wavefunction used. The MP3 and MP4 results nicely illustrate the oscillation of the series.
  5. Coupled cluster methods, which also make use of MP expansions, are regarded as the next step up Jacob’s ladder. The computational cost of these scales so quickly that the larger basis sets are not feasible and so the (already proven as deficient) 6-31G(d) basis must be used. Again, we see the CCSD method mirroring the MP2-4 expansions.

So, are the MP and CCSD methods converging to a reliable solution, or are they oscillating too much to make any conclusions? If we think the convergence is approaching, then the Kekulé (transition state) mode for C18 (shown below) would indeed correspond to an interpretation of the STM observations as BLA.

But there might be an alternative explanation, that instead the molecule buckles in the manner illustrated below. This would also lead to 9-fold symmetry.

I conclude by pondering why the convergence of these methods is so strange. They are all single-reference methods; perhaps C18 depends instead on multi-reference states? This would need an MCSCF (multi-configuration) or VB (valence bond) approach. Given the need for accurate basis sets, this is probably a big ask, but perhaps some group out there can do this and compare with the results here?

References

  1. K. Kaiser, L.M. Scriven, F. Schulz, P. Gawel, L. Gross, and H.L. Anderson, "An sp-hybridized molecular carbon allotrope, cyclo[18]carbon", Science, vol. 365, pp. 1299-1301, 2019. https://doi.org/10.1126/science.aay1914

A Non-nitrogen Containing Morpholine Isostere; an application of FAIR data principles.

August 4th, 2019

The blog In the Pipeline reports on an intriguing new ring system acting as an isostere for morpholine. I was interested in how the conformation of this ring system might be rationalised electronically, and so I delved into the article.[1] Here I recount what I found.

The basis for the isosteric claim can be found in the conformational analysis reported in Figure 4. The N-diazine ring in A is found to be co-planar with the morpholine ring as shown in the diagram (the dihedral measured is indicated using boldened bonds). Compound D contains the cyclopropanated variation, which is postulated as isosteric at least in part on the basis that it too is co-planar, with a dihedral angle of ~170° (with a second slightly higher minimum having a value of ~10°) and hence might be capable of acting as an isostere to the morpholine.
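Since the argument hinges on a single dihedral angle, here is a minimal sketch of how such a signed torsion is computed from four atomic positions, returning a value between -180° and 180°. The helper names are illustrative, and the coordinates in the usage lines are made-up test points, not atoms of compound A or D.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / norm for t in b1)                   # unit vector along the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))  # b0 with its component along b1 removed
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))  # b2 with its component along b1 removed
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (anti)
```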

Figure 4. Dihedral scanning plots for various pyrimidine fragments using DFT/6-31G**.

I was intrigued as to why a saturated sp3-carbon would exhibit the same behaviour as a nitrogen centre. The latter has a lone pair oriented at 90° to the aryl ring, and the resulting conjugation favours co-planarity. But how would an sp3-carbon centre do the same? Time to do some calculations, and hence on to the supporting information (SI) for the article in an effort to get a starting base, initially to replicate the calculated results shown above. I start by focusing on the value quoted above, ~170°. Note the ~, since I obtained that value visually from the figure. In fact it must remain “~”, since no further geometrical information is available from the SI.

I also quickly realised that replication must remain elusive, since the caption to the figure is the only information given on the calculations used to produce Figure 4. DFT, you see, is a generic term, standing for density functional theory, and within that theory the functional has to be defined; possibly about 500 different functionals have been used in the literature. We do get a citation to the method (ref 25 in the article), which is to the commercial Jaguar program system. Herein lies a problem. Programs implement what might be described as default calculation options, and quite possibly it is the default option that has been invoked here. A licensed user of Jaguar can probably find out what that default is and hence expand DFT to the actual functional used. But unfortunately I am not a licensed user, and even if the default option could be tracked down to an online manual somewhere, there is no certainty that it was actually used to produce Figure 4.

So here I make my first plea. The SI for this article is not fully FAIR! In this instance it contains no accessible data that can be used to replicate the results reported. At a minimum, if DFT based results are going to be reported, then FAIR data containing the input(s) used for the calculation and one or more outputs should be made available. Perhaps then if one is lucky, those outputs might declare any default assumptions, such as the precise DFT method used.

I therefore went ahead with my own calculations, using B3LYP (being my declared DFT functional) with the 6-311++G(d,p) basis (an improvement on the 6-31G** (≡ 6-31G(d,p)) basis set declared as used for Figure 4). I did two variations, one without and one with a D3+BJ dispersion correction. It is now recognised that such corrections can be important, even for small molecules. Because we do not know the nature of the DFT method used in the article itself, we do not know whether it incorporates such corrections or not. The results are shown below, with a FAIR data location of DOI: 10.14469/hpc/5990

The two minima from the new B3LYP+D3BJ/6-311++G(d,p) calculation have dihedral values of -139.4° and +55.6° with dispersion included, and essentially the same without, indicating that dispersion has only a small effect on the conformational geometry (the top trace above is without dispersion). These values differ from those inferred from Figure 4, being closer to gauche than to co-planar. They can be rationalised as allowing good overlap between a C-C bond of the cyclopropane and the π-system of the aromatic ring (dihedrals of 75 and 88° for the two minima, vs 90° for the overlap of the N lone pair). The corresponding values for the conformation implied in Figure 4 are 41 and 55°, giving less favourable hyperconjugative overlap. The rotational barriers are ~18 and 25 kJ/mol, rather higher than those obtained visually from Figure 4, but still indicating a relatively flexible molecule that can probably adopt a relatively low-energy co-planar isosteric conformation in the correct environment.
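Minima and barriers such as these are read off a relaxed dihedral scan. The sketch below shows one simple way of doing this for a periodic scan sampled on a regular grid; the function names are hypothetical, and the cosine-shaped profile in the usage lines is synthetic, not the B3LYP+D3BJ data (which are at the DOI above).

```python
import math

def stationary_points(angles, energies):
    """Local minima and maxima, as (angle, energy), of a periodic scan on a regular grid."""
    n = len(energies)
    minima, maxima = [], []
    for i in range(n):
        prev_e, next_e = energies[i - 1], energies[(i + 1) % n]  # wrap around 360°
        if energies[i] < prev_e and energies[i] < next_e:
            minima.append((angles[i], energies[i]))
        elif energies[i] > prev_e and energies[i] > next_e:
            maxima.append((angles[i], energies[i]))
    return minima, maxima

def barriers(minima, maxima):
    """Height of each maximum above the global minimum (same units as the energies)."""
    e_min = min(e for _, e in minima)
    return [e - e_min for _, e in maxima]

# Synthetic two-fold profile sampled every 10°, energies in kJ/mol.
angles = list(range(0, 360, 10))
energies = [10 * (1 + math.cos(math.radians(2 * a))) for a in angles]
minima, maxima = stationary_points(angles, energies)
print([a for a, _ in minima], [a for a, _ in maxima], barriers(minima, maxima))
```

With a real scan, the two maxima generally differ in height, which is how two distinct barriers (such as the ~18 and 25 kJ/mol quoted above) emerge.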

There is, however, some more information about these molecules in the article,[1] namely a small-molecule crystal structure of a related compound 12b (quoted for Figure 5 as CCDC 1864315). To quote, “the small molecule crystal structure of 12b confirms coplanarity in the solid phase”. However, the dihedral angle for this crystal structure is given neither in the text nor in the SI. A search of the CCDC database reveals no entry in the May 2019 release (the data are clearly too new to have been indexed there), and unfortunately the article SI contains no atom coordinates. The calculations reported in Figure 4 and those in the plot above are, of course, for an isolated molecule. Once I manage to acquire the crystal coordinates, it should be possible to see whether any intermolecular interactions help explain why the geometries of the isolated molecule and its crystal form might differ in co-planarity.

Until then, I conclude that including FAIR data pertaining to this co-planarity in the article itself would certainly have helped to resolve the origins of the difference between the geometries reported in the article and my own calculations reported here; it may still be, of course, that functionals other than B3LYP+D3BJ reproduce the crystal structure better. Nonetheless, I think the dihedral angles reported here provide a more rational electronic basis for the conformation of the N-aryl ring in the isolated molecule, whilst an attempt to replicate the values reported in the article itself,[1] based on further information, would also be useful.


Reprinted with permission from [1]. Copyright 2019 American Chemical Society.

In this article,[2] which is making quite some waves, you can find a fascinating discussion of the perils of using “packaged” programs in which many “defaults” are allowed to persist by the user. In this particular case, the default was the size of the integration grid in the DFT calculation. The article makes the very alarming case that for many years the default grid size in at least one popular DFT program was not good enough to ensure that the resulting calculated free energies were accurate enough to sustain many conclusions about regio- and stereoselectivity out there in the wild. An awful lot of computationally derived results might be wrong! You may be at least slightly reassured that the grid sizes used for the calculations reported in this blog, at least for the last five years or so, have been suitably larger than the one critiqued in this article.

In the calculation inputs you will find the keyword integral=(acc2e=14,grid=ultrafine) defined, which not only declares the integration grid explicitly but also raises the integral accuracy beyond the program default of 12. We have found this to be very often helpful for the calculation of frequencies.

New entries, not yet available in the distributed database, can be accessed as e.g. https://www.ccdc.cam.ac.uk/structures/search?pid=ccdc:1864315 The dihedral is 167.7° for one conformation of compound 12b. This has now also been assigned  DOI: 10.5517/ccdc.csd.cc20kz6s The coordinates obtained from this source correspond to an absolute stereochemistry of 1R,6S, or 12a in the article.

 

References

  1. H. Hobbs, G. Bravi, I. Campbell, M. Convery, H. Davies, G. Inglis, S. Pal, S. Peace, J. Redmond, and D. Summers, "Discovery of 3-Oxabicyclo[4.1.0]heptane, a Non-nitrogen Containing Morpholine Isostere, and Its Application in Novel Inhibitors of the PI3K-AKT-mTOR Pathway", Journal of Medicinal Chemistry, vol. 62, pp. 6972-6984, 2019. https://doi.org/10.1021/acs.jmedchem.9b00348
  2. A.N. Bootsma, and S. Wheeler, "Popular Integration Grids Can Result in Large Errors in DFT-Computed Free Energies", 2019. https://doi.org/10.26434/chemrxiv.8864204.v1

CH…O hydrogen bonding competing with layered dispersion attractions.

July 19th, 2019

I have previously looked at the topic of hydrogen bonding interactions from the hydrogen of chloroform. Here I generalise C-H…O interactions by conducting searches of the CSD (Cambridge Structural Database) as a function of the carbon hybridisation. I am going to jump straight to a specific molecule, XEVJIR (DOI: 10.5517/cc5fgpq), identified from the searches appended to this post as interesting for further inspection.[1]

The distances from the carbonyl oxygen to CH groups of an adjacent molecule are shown, revealing a bifurcated strong + weaker CH…O interaction. Note that these CH…O distances are un-normalised, in the sense that a C-H distance obtained from X-ray diffraction data is normally about 0.1 Å too short; a corrected value for the H…O distance is probably closer to 1.994 Å. Next comes a B3LYP+GD3BJ/Def2-TZVPP calculation of just this dimeric interaction (FAIR data DOI: 10.14469/hpc/5943), which shows a somewhat different pattern, particularly from the carbonyl to the sp3-C-H, with one distance being shorter and one longer.
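The normalisation mentioned here simply moves the H atom along the C→H vector to a standard, neutron-derived C-H length before measuring H…O. A minimal sketch follows; the 1.089 Å target is an assumed typical value (the exact standard length used by CSD software may differ), the function name is hypothetical, and the coordinates in the usage lines are invented.

```python
import math

C_H_STANDARD = 1.089  # Å; assumed neutron-derived C-H length used for normalisation

def normalise_h(c, h, target=C_H_STANDARD):
    """Return a new H position on the C->H line, at distance `target` from C."""
    vec = [hi - ci for ci, hi in zip(c, h)]
    length = math.sqrt(sum(x * x for x in vec))
    return tuple(ci + target * x / length for ci, x in zip(c, vec))

# Invented coordinates (Å): an X-ray C-H of 0.98 Å pointing at a carbonyl O.
c, h_xray, o = (0.0, 0.0, 0.0), (0.98, 0.0, 0.0), (3.0, 0.0, 0.0)
h_norm = normalise_h(c, h_xray)
print(round(math.dist(h_xray, o), 3), "->", round(math.dist(h_norm, o), 3))  # 2.02 -> 1.911
```

In this collinear example the H…O distance shrinks by exactly the amount the C-H bond is lengthened, which is why un-normalised X-ray H…O contacts come out roughly 0.1 Å too long.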


A QTAIM analysis reveals an electron density ρ(r) of 0.021 au, a fairly high value indicating a relatively strong interaction.

Side views reveal a possible reason why the calculation does not match the crystal structure. In the crystal, the sp3-CH2 group adopts a different conformation from that computed for just two interacting molecules, since this shape allows more efficient stacking of the layers, letting the stabilising dispersion energy between the layers overcome some loss of hydrogen-bonding energy within the plane of each layer. If this packing constraint is removed, as in the pure dimer, one sp3-CH moves into the plane, allowing a shorter interaction to the carbonyl oxygen, and the other sp3-CH adopts a purely axial position, unconstrained by any packed layer above it. The absence of layered dispersion attractions is hence compensated by forming strong CH…O interactions.

A calculation using six molecules arranged in three layers of two is an attempt to add back at least some of the layering dispersion terms (a full periodic boundary lattice calculation is the proper way of doing this calculation, but at the level chosen here would take far too much computer time!). The new CH…O distances are now 2.018 and 2.384Å (compared to 2.036 and 2.199Å for a model with just two molecules). Probably, more layers would be needed to replicate the crystal structure more accurately.


And now for the searches. The first is for sp-hybridised carbon, as an intermolecular interaction (R < 0.05, no errors, no disorder, T ≤ 150 K, H-positions normalised, distances shorter than the sum of the van der Waals radii minus 0.4 Å), for which a clear hot spot occurs at an H…O distance of ~2.1 Å.
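The distance criterion used in these searches (contacts shorter than the sum of the van der Waals radii minus 0.4 Å) is easy to reproduce. A sketch follows, assuming Bondi radii (H 1.20, O 1.52 Å); some programs use a smaller radius for H, which would tighten the cut-off, so the radii table and function names here are assumptions rather than the search software's own settings.

```python
# Assumed Bondi van der Waals radii (Å); swap in the values used by your search software.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}

def contact_cutoff(elem_a, elem_b, offset=0.4, radii=VDW_RADII):
    """Sum of the two vdW radii minus `offset` (Å)."""
    return radii[elem_a] + radii[elem_b] - offset

def short_contacts(distances, elem_a="H", elem_b="O", offset=0.4):
    """Keep only the distances (Å) that fall below the contact cut-off."""
    cutoff = contact_cutoff(elem_a, elem_b, offset)
    return [d for d in distances if d < cutoff]

print(round(contact_cutoff("H", "O"), 2))        # 2.32
print(short_contacts([2.05, 2.10, 2.30, 2.45]))  # [2.05, 2.1, 2.3]
```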

Intermolecular to sp carbon

Next, sp2-C as an intermolecular interaction (T=<90K), where the hot spot is less distinct, being at the distance cut-off specified for the search. The shortest distance is ~2.0Å. I will return to this example shortly.

Intermolecular to sp2 carbon

An intramolecular version of this search shows a clearer hotspot, again at ~2.15Å

Intramolecular to sp2 carbon

Next, intramolecular sp3 hybridisation, for which there are few examples and no clear hot spot.

Intramolecular to sp3 carbon

Finally, intermolecular sp3 hybridisation. The H…O distance hotspot is very slightly longer, as might be expected for a less acidic hydrogen. Nonetheless, the variation in the H…O distances with hybridisation is perhaps unexpectedly small.


To summarise, by performing a general search of the crystal structure database, one can identify general trends and then go on to inspect outliers. In this case, this brought the focus onto a (dare I say otherwise unremarkable) molecule in which layers of aromatic molecules set up a competition between intra-layer CH…O hydrogen bonding and inter-layer dispersion stabilisation. I suspect this competition between these two types of weak interaction is far more common than is generally recognised.

 

References

  1. K.S. Huang, M.J. Haddadin, M.M. Olmstead, and M.J. Kurth, "Synthesis and Reactions of Some Heterocyclic Azacyanines", The Journal of Organic Chemistry, vol. 66, pp. 1310-1315, 2001. https://doi.org/10.1021/jo001484k

Metadata. Why?

July 2nd, 2019

I have had some interesting discussions recently regarding metadata. What emerges is that it can be quite a broadly defined concept and it is clear that a variety of answers might be obtained when asking the simple question “what is it useful for?” Here I set out some of my answers to that question.

  1. Metadata vs data. Where along the continuum does data end and metadata begin, and is the metadata fine-grained or more broad-grained?
  2. What is its ultimate destination? Should metadata reside inside a complete package or container of data, serving the purpose of succinctly describing what to expect in that package? Or should it reside entirely separately from the data package in some sort of metadata store (MDS)?
  3. Are there issues of trust or provenance? How was the metadata created, by a person or a process, and when? Has it been changed since it was created? If so, what are the revisions? Does the metadata adhere to a specified structure, and has it been validated against that structure?

Some context needs to be applied before answering such questions (context is perhaps a synonym for metadata!).

  1. Firstly, I am going to use metadata here in the context of describing data itself (i.e. rather than other research objects such as journal articles). This would include answers to questions such as:
    1. who created both the data and its metadata.
    2. when were both created and perhaps modified.
    3. where the data is stored
    4. what are its defined internal structures (sometimes also called media or MIME types).
    5. who its “publisher” is (the organisation where the data was produced or is curated).
    6. what are the access and re-use rights associated with the data.

    These are broad-grained provenance metadata, if you like.

  2. Next, metadata describing the specific context of the data, e.g. in my case the chemistry associated with it.
    1. Is it about a molecule?
    2. if so what is the nature of the molecule?
    3. Is it computational data about a molecule?
    4. If so, what software was used for the computations, and what were its parameters, inputs and outputs?
    5. Might it be instrumental data recorded for a molecule?
    6. If the latter, does it record the instrument and its settings?

    We are now moving into fine-grained metadata, and perhaps even crossing the boundary into data itself, since the parameters for either software or instruments can be large and complex and are often so heavily mixed into the data itself that their extrication may be a challenge.

  3. Finally, what is the purpose of creating and storing such metadata?
    1. Here the context is of “discoverability” (of the data itself) and perhaps also
    2. “Reusability” and/or “Interoperability” (of the data itself).
    3. These attributes are nicely summarised by the acronym FAIR, where discoverability is specified by both Findability and Accessibility.
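The distinction drawn above between broad-grained and fine-grained metadata can be sketched as a data structure. The Python below is a hypothetical illustration, not the DataCite schema: the class and field names are assumptions chosen to mirror the questions listed, with the fine-grained part held as scheme/value pairs in the spirit of the subjectScheme/subject fields used in the queries later in this post. The `missing_fields` function is a toy stand-in for validating metadata against a specified structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    """One fine-grained descriptor, e.g. scheme "inchikey" or "NMR_Nucleus"."""
    scheme: str
    value: str


@dataclass
class DataMetadata:
    # Broad-grained, provenance-like fields (who, when, where, what, publisher, rights).
    creators: List[str]
    created: str             # ISO 8601 date of creation
    location: str            # where the data is stored, e.g. a DOI
    media_types: List[str]   # MIME types of the files in the package
    publisher: str           # organisation producing or curating the data
    rights: str              # access and re-use licence
    # Fine-grained, chemistry-specific descriptors.
    subjects: List[Subject] = field(default_factory=list)


REQUIRED = ("creators", "created", "location", "media_types", "publisher", "rights")


def missing_fields(record: DataMetadata) -> List[str]:
    """Toy structural check: names of required fields that are empty."""
    return [name for name in REQUIRED if not getattr(record, name)]


def subjects_in_scheme(record: DataMetadata, scheme: str) -> List[str]:
    """Values recorded under one subject scheme, matched case-insensitively."""
    return [s.value for s in record.subjects if s.scheme.lower() == scheme.lower()]


# Hypothetical record: every value is a placeholder.
record = DataMetadata(
    creators=["A. N. Author"],
    created="2019-07-02",
    location="10.xxxx/example",
    media_types=["chemical/x-mnpub"],
    publisher="Example University",
    rights="CC0 1.0",
    subjects=[Subject("inchikey", "XZYDALXOGPZGNV-UHFFFAOYSA-M"), Subject("NMR_Nucleus", "1H")],
)
print(missing_fields(record))                  # []
print(subjects_in_scheme(record, "InChIKey"))  # ['XZYDALXOGPZGNV-UHFFFAOYSA-M']
```

Case-insensitive matching of the scheme name is a design choice of this sketch only; it says nothing about how any registration agency actually compares subject schemes.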

Before introducing examples based on metadata with the focus on discoverability, I want to distinguish between locally packaged metadata and separated metadata (Qu. 2 above). The examples below relate purely to the latter, which has been created as a separate entity by registration with an agency such as DataCite. Such registration also addresses Qu. 3 above about trust. The external agency adds trust by recording the identity of the person (or a process or workflow initiated by a person) registering the metadata, together with the registration date (the Datestamp), and it also monitors any changes to the metadata (which are allowed) by keeping its version history. Interestingly, there seems to be no mechanism to record any processes or workflows used to create the metadata, so as to learn how the metadata itself was assembled. Nor have I seen much discussion of this aspect; one for the future, I fancy.

I now introduce some examples of discoverability. The descriptions are quite short and are meant to be used in conjunction with a “reverse-engineering” of the (somewhat) human-readable search query. These queries are also deposited as “data” at DOI: 10.14469/hpc/5920.

Entry Description Elasticsearch query
1 Media (MIME) type https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*
2 Combining Media with the DataCite Subject https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:inchikey+AND+subjects.subject:XZYDALXOGPZGNV-UHFFFAOYSA-M+AND+media.media_type:chemical/x-gaussian*
3 Combining ORCID with Media https://search.datacite.org/works?query=contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+AND+media.media_type:chemical/x-mnpub*
4 Exploiting Subject https://search.datacite.org/works?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:"-39.946176"
5 Exploiting Subject with range query https://search.datacite.org/works?query=subjects.subjectScheme:Gibbs_energy+AND+subjects.subject:[\-649.1 TO \-649.8]
6 Nested search with two Subjects https://search.datacite.org/works?query=(subjects.subjectScheme:inchikey+AND+subjects.subject:"-1082.980914")+AND+(subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:KTOSDSJYNBIDCN-UHFFFAOYSA-N)
Nested search with two Subjects transposed https://search.datacite.org/works?query=(subjects.subjectScheme:inchikey+AND+subjects.subject:KTOSDSJYNBIDCN-UHFFFAOYSA-N)+AND+(subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:"-1082.980914")
7 Two different Media types https://search.datacite.org/works?query=media.media_type:chemical/x-gaussian*+AND+media.media_type:chemical/x-mnpub*
8 License type https://search.datacite.org/works?query=rightsList.rights:"Creative Commons Public Domain Dedication (CC0 1.0)"
9 Exploiting subjectScheme https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:NMR_Nucleus+AND+subjects.subject:1H
10 Exploiting subjectScheme https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:NMR_Pulse+AND+subjects.subject:1D
11 Simple PID query https://search.datacite.org/works?query=identifier:*10.14469/hpc*
12 Combining ORCID with PID query https://search.datacite.org/works?query=(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(identifier:*10.14469/hpc*)
13 Combining researcher name with PID query https://search.datacite.org/works?query=(identifier:*10.14469/hpc*)+AND+(contributors.contributor.contributorName:Henry+Rzepa)
14 Entries in specific repository (Imperial) referencing specific Journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+AND+(identifier:*10.14469/hpc*)
15 Entries in specific repository (Cambridge) referencing specific Journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+AND+(identifier:*10.17863/cam*)
18 Entries in specific repository (Cambridge) referencing all publisher journals https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs*)+AND+(identifier:*10.17863/cam*)
16 Entries in all repositories except one referencing specific Journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+NOT+(identifier:*10.5517*)
17 Entries in specific repository referencing one publisher https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+AND+(identifier:*10.5517*)
19 Entries in all publisher journals, excluding one data repository https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+NOT+(identifier:*10.5517*)
20 Entries in Institutional repository referencing datasets https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:*10.14469/spiral*)+AND+(identifier:*)+AND+(types.resourceTypeGeneral:Dataset)

The examples above reveal a syntax that is not entirely human-friendly; each of them needed some effort at “de-bugging” to make it work. I gather from the PIDForum that a more friendly GUI to achieve this is on their radar. As I develop or discover more examples of such searches I will add them to the list above at DOI: 10.14469/hpc/5920. Meanwhile, if you want to use any of the above as a template for your own searches, do please explore.
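Since each query needed some de-bugging, one way to reduce typing errors is to assemble the query strings programmatically rather than by hand. The Python below is a minimal sketch using only the standard library (`urllib.parse.quote_plus`); the helper names are hypothetical, and the base URL is simply the search.datacite.org front end used in the table, which may since have changed, so it is left as a parameter. The sketch only builds URLs in the query-string syntax shown in the table and makes no network requests.

```python
from urllib.parse import quote_plus

# Characters left unescaped so the URL stays readable, as in the table above.
READABLE = ':/*()[]"'


def clause(field_name: str, value: str) -> str:
    """A single field:value term in Elasticsearch query-string syntax."""
    return f"{field_name}:{value}"


def group(query: str) -> str:
    """Parenthesise a sub-query so that it can be nested."""
    return f"({query})"


def combine(*terms: str, op: str = "AND") -> str:
    """Join terms with a boolean operator such as AND, OR or NOT."""
    return f" {op} ".join(terms)


def search_url(query: str, base: str = "https://search.datacite.org/works") -> str:
    """Encode the query as the `query` parameter; spaces become '+'."""
    return f"{base}?query={quote_plus(query, safe=READABLE)}"


# Rebuilds entry 9 of the table.
q9 = combine(
    clause("media.media_type", "chemical/x-mnpub*"),
    clause("subjects.subjectScheme", "NMR_Nucleus"),
    clause("subjects.subject", "1H"),
)
print(search_url(q9))

# Rebuilds entry 12, with each part grouped in parentheses.
q12 = combine(
    group(clause("contributors.nameIdentifiers.nameIdentifier", "*0000-0002-8635-8390")),
    group(clause("identifier", "*10.14469/hpc*")),
)
print(search_url(q12))
```

Leaving `:`, `/`, `*`, parentheses, brackets and quotes unescaped keeps the output easy to compare by eye with the table entries; a browser or HTTP client would percent-encode them anyway, and the server decodes the parameter before parsing it.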

Anniversaries: The World-Wide-Web at 30 and 25 (+ CERN's LHC as a bonus).

June 15th, 2019

 

The World-Wide-Web is currently celebrating its 30th anniversary; you can get the T-shirt in the CERN visitor centre! Five years on, in May 1994, the first Web conference (WWW94) took place at CERN, and it is now celebrating its own 25th anniversary. That 1994 conference also had various break-out sessions, one of which summarised the state of chemistry on the web at the time. You can see my general but entirely personal impressions written after the workshop (DOI: 10.14469/hpc/5850), with a chemistry-specific version at DOI: 10.14469/hpc/5851. A real trip down memory lane and an indication of how much has happened in 25 years!

I was lucky enough to be able to visit CERN a few days ago, primarily to view the CMS (Compact Muon Solenoid) detector, and I also took the opportunity to remember WWW94. Here are a few photos of the visit. Note the chemistry seen in the final photo, taken in the control room (hint: look on the shelf at the back).

You can find a full report of the visit at https://www.imperial.ac.uk/news/191569/imperial-celebrates-towering-intellectual-achievement-cern/ with more photos. It is noteworthy that as well as celebrating the achievements of the LHC, we also find that “CERN’s work often leads to paradigm-shifting technologies, … The most famous example is the World Wide Web, which was invented by British scientist Tim Berners-Lee while working at CERN in 1989.” Hence the link between WWW94 and the CMS.

I end with some chemistry. The design of the CMS detector (one of two that detected the Higgs boson) took 27 years to complete! It started in the mid-1980s, at a point when some of the science required for the detector was still unknown! It was a stupendous act of faith that, during the time it took to design the detector, science would provide a solution. One such solution took the form of lead tungstate (PbWO4), which scintillates when struck by high-energy particles. Some 75,848 single crystals of lead tungstate were used in the detector, each 23 cm long and together weighing 100 tonnes! The technology to purify and then crystallise this material to optical transparency was developed in Russia and China.

The CMS is now being used in experiments to find the missing mass in the universe, the so-called dark matter. Maybe another visit in 25 years' time is called for to hear where that missing mass is!

ChemRxiv. Why?

June 5th, 2019

In August 2016, the launch of a chemistry pre-print service, ChemRxiv, was announced. I was phoned a day or so later by a staff journalist at C&E News for my opinion. The only comment retained for their report was my instantaneous feeling that “the community needed a chemistry pre-print server like one needed a hole in the head”. I had been there before, you see: I recalled a pre-print server launched by the ChemWeb service around 1996 or 1997, which lasted only about two years before being withdrawn due to the low quality of the preprints. So what do I think of ChemRxiv now, in 2019?

Let me set the scene first. Nowadays, many journals offer open access options, most upon payment of an APC (article processing charge). One can sometimes get a grant for this fee from institutional libraries. Mine, for example, requires that to apply for an APC, one has to deposit a “final author version” (FAV) of the manuscript in our local institutional repository (Spiral). The final outcome is thus two open access versions of an article, the FAV and the version-of-record (VOR) held by the publisher. ChemRxiv now adds a third version to the process, since the expectation is that after some life as a pre-print, the manuscript is then submitted to a peer-reviewed journal. Because the pre-print is allocated a persistent identifier (a DOI), the expectation is that it will indeed be persistent, with no expiry. Three versions of any given article are therefore likely to be around, in effect permanently (or what passes for permanence nowadays). Importantly, there is no clear protocol for indicating how these three versions might differ, if they do. Even the FAV and the VOR may differ, for example when errors found in galley-proofing are corrected in the VOR but not propagated to the FAV. The congruence between the pre-print and any VOR is even less certain.

All this came to a head as a result of the pre-print I noted in my previous two posts.[1] Unlike the topic of an earlier post of mine, where the VOR article[2] (not a pre-print) allows readers to comment (see e.g. https://www.nature.com/articles/s41586-019-1059-9#article-comments), I have not been able to identify any mechanism for posting comments about pre-prints. Yet that seemed to me a primary reason for exposing a pre-print: to invite insights from the community, perchance to improve the science or make suggestions related to it. What I have spotted, however, is an Altmetric index. Hover over it and you get social media metrics. For this pre-print,[1] these put it in the top 5% of all outputs, so it is clearly attracting much interest. This interest includes (currently) 1955 views, 539 downloads, commentary via two blog posts (www.altmetric.com/details/59250193/blogs) and 40 tweets (www.altmetric.com/details/59250193/twitter). You would have to work quite hard to visit all the blog posts and read all the tweets to assess how the community as a whole was responding to any specific pre-print.

So what is the purpose of posting (or should I use the term publishing?) a ChemRxiv pre-print? Is it primarily to gather commentary via social media such as blogs and Twitter, and to use this feedback to improve the final VOR? A colleague I discussed this with suggested that in some very competitive areas of science/chemistry, it might also serve to acquire a date-stamp for the research (part of the metadata associated with a DOI) and hence to claim priority, a stamp which would pre-date that obtained from VOR publication by a few months. This might be perceived as making all the difference in a competitive area in terms of gathering evidence of esteem, inclusion in grant proposals etc., especially for early career researchers. There may be other reasons which I have not thought of; comments suggesting these are most welcome.

I will end by noting the following project: en.wikiversity.org/wiki/WikiJournal_of_Science,[3] part of Wikiversity. Here, the APC is dispensed with (no publication costs, at least to the authors), a DOI is again allocated, and each article is subjected to public peer review (en.wikiversity.org/wiki/WikiJournal_of_Science/Peer_reviewers) and can also carry post-publication review comments and even direct edits in the manner of Wikipedia. The rest of the Wiki infrastructure is also available, including access to Wikidata, a source of high-quality reference data.

So I think there is going to be an interesting debate about how the publication of primary research articles will evolve. Is a triad of articles (the pre-print, the FAV and the VOR) the future? Or could e.g. the WikiJournal of Science (perchance extended in the future to a WikiJournal of Chemistry?) show an interesting alternative way? Or is it all just getting too fragmented and confusing?

References

  1. K. Miyamoto, S. Narita, Y. Masumoto, T. Hashishin, M. Kimura, M. Ochiai, and M. Uchiyama, "Room-Temperature Chemical Synthesis of C2", 2019. https://doi.org/10.26434/chemrxiv.8009633.v1
  2. J. Lee, K.T. Crampton, N. Tallarida, and V.A. Apkarian, "Visualizing vibrational normal modes of a single molecule with atomically confined light", Nature, vol. 568, pp. 78-82, 2019. https://doi.org/10.1038/s41586-019-1059-9
  3. T. Shafee, "The aims and scope of WikiJournal of Science", WikiJournal of Science, vol. 1, pp. 1, 2018. https://doi.org/10.15347/wjs/2018.001