Mechanism of the Masamune-Bergman reaction. Part 3: The transition state for Calicheamicin models.

September 11th, 2024

Calicheamicin was noted in the previous post as a natural product with antitumour properties and many weird structural features, such as an unusual “enediyne” motif. The representation is shown below.

The partial structure for Calicheamicin shown below replaces the -(CH2)4- substructure with a four-carbon chain that includes two sp2 centres instead of two sp3 centres. The purpose is to find out how these structural modifications to the classic Bergman system affect the mechanism.

TS1 for this model is shown below; the computed free energy barrier for this cyclisation is 42.5 kcal/mol at the uωB97XD/Def2-TZVPP level, with <S2> = 0.345 (FAIR data DOI: 10.14469/hpc/14583[1]). This compares with 33.0 kcal/mol calculated for the -(CH2)4- version, for which <S2> = 0.266. To prepare for modelling the full Calicheamicin molecule, the basis set for this model was reduced to Def2-SVPP; at this level ΔG was 43.0 kcal/mol (<S2> = 0.368), a difference small enough that the reduction in basis set seems unlikely to affect the results. The lengths of the forming C-C bond are 1.957 Å (Def2-TZVPP) and 1.989 Å (Def2-SVPP).

Now for a larger model containing the entire Calicheamicin molecule. Two possibilities were explored. In the first, the geometry of the system was fully optimised in isolation, yielding a conformation in which Calicheamicin folds in upon itself; for this, ΔG (Def2-SVPP) = 40.1 kcal/mol, <S2> = 0.368.

The second model used as its starting geometry that of Calicheamicin taken from a crystal structure of the ligand folded into the minor groove of a DNA fragment, in which it adopts a much more linear form. The reactant in this model was 6.1 kcal/mol higher in energy than in the folded model and TS1 was 4.6 kcal/mol higher, leading to ΔG = 38.6 kcal/mol, <S2> = 0.367.
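The 38.6 kcal/mol figure follows from simple bookkeeping: raising the reactant by 6.1 kcal/mol lowers the barrier by that amount, while raising TS1 by 4.6 kcal/mol raises it again. A minimal Python sketch of that arithmetic, using only the values quoted above (the function name is illustrative, not part of any published workflow):

```python
# Relative free energies quoted above for the full Calicheamicin models (kcal/mol).
DG_FOLDED = 40.1   # barrier for the folded (isolated) conformation
D_REACTANT = 6.1   # linear reactant relative to the folded reactant
D_TS1 = 4.6        # linear TS1 relative to the folded TS1


def shifted_barrier(barrier_ref: float, d_reactant: float, d_ts: float) -> float:
    """Barrier of a second conformer, given how far its reactant and its
    transition state lie above those of the reference conformer."""
    return barrier_ref - d_reactant + d_ts


print(f"{shifted_barrier(DG_FOLDED, D_REACTANT, D_TS1):.1f} kcal/mol")  # 38.6 kcal/mol
```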

So what conclusions can we draw from these extended models of the Bergman cyclisation? The activation free energies for all three models lie in the range 38.6–42.5 kcal/mol, which is a great deal higher than a value commensurate with a facile room-temperature reaction (~22±3 kcal/mol). The observation that Calicheamicin can in fact be characterised as a crystal structure when bound to DNA suggests that the cyclisation barrier cannot be too low, but conversely barriers of 38.6–42.5 kcal/mol appear too high for Calicheamicin to activate easily into a biradical that can abstract a hydrogen atom and so end up causing strand scission. Might the simplistic model of a split UHF wavefunction, resulting in values of <S2> ≈ 0.37, be the problem? Well, a similar approach was taken to modelling the Stevens rearrangement.[2] Using a plain closed-shell (non-biradical) wavefunction, a barrier of ~48 kcal/mol was obtained, but this reduced to 14 kcal/mol when the UHF method was applied (<S2> = 0.421), so the approach appears to work well in those circumstances. The jury must still be out on whether the Bergman cyclisation mechanism is being correctly modelled here or whether something more complex is going on.
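To put these barriers in perspective, the Eyring equation, k = (k_B·T/h)·exp(−ΔG‡/RT), converts a free energy barrier into a first-order rate constant and hence a half-life. The sketch below is an illustration only; it assumes a transmission coefficient of 1 and ignores tunnelling, and the helper names are hypothetical.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 1.987204e-3      # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal: float, temp_k: float = 298.15) -> float:
    """First-order rate constant (1/s) from the Eyring equation,
    assuming a transmission coefficient of 1."""
    return (KB * temp_k / H) * math.exp(-dg_kcal / (R * temp_k))


def half_life_s(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Half-life (s) of a first-order process with barrier dg_kcal."""
    return math.log(2) / eyring_rate(dg_kcal, temp_k)


# ~22 kcal/mol (facile at room temperature) versus the computed range above
for dg in (22.0, 38.6, 42.5):
    print(f"dG = {dg:4.1f} kcal/mol: t1/2 = {half_life_s(dg):.2e} s")
```

At 298 K a 22 kcal/mol barrier corresponds to a half-life of roughly 25 minutes, whereas 38.6 kcal/mol gives a half-life of the order of 10^15 s, which is why the computed values look far too high.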

References

  1. H. Rzepa, "Mechanism of the Masamune-Bergman reaction. Calicheamicin", 2024. https://doi.org/10.14469/hpc/14583
  2. H. Rzepa, "The Stevens rearrangement: how history gives us new insights.", 2021. https://doi.org/10.59350/4010f-fvr26

Mechanism of the Masamune-Bergman reaction. Part 2: a possible 3D Model for Calicheamicin revealing the non-covalent-interactions (NCI) present.

August 26th, 2024

Calicheamicin is a natural product with antitumour properties discovered in the 1980s, with the structure shown below. As noted elsewhere, this structure has many weird properties, including amongst other features an unusual “enediyne” motif and the presence of an iodo group on an aromatic ring. Its isolated 3D structure is quite difficult to get hold of (although structures embedded in a DNA fragment are available); the 3D model associated with the Wikipedia entry is essentially only 2D. The representation shown below, including the absolute stereochemistry, was obtained from the SciFinder entry.

As a prelude to modelling the mechanism of the Bergman cyclisation of the enediyne ring in this actual molecule (for Part 1, in which a simple cycloenediyne is explored, see DOI: 10.59350/jczra-f0r90 [1]), a 3D model was constructed. One possible such model is shown below, built to maximise wherever possible interactions such as hydrogen bonds and weak dispersion attractions from e.g. methyl groups. A side benefit of doing this is the natural emergence of a “cavity” in which the very large iodine atom snuggles, as it happens adjacent to the enediyne component, something you would not naturally infer from the structure representation shown above! A spacefill model of this conformation, emerging from an ωB97XD/Def2-SVPP energy minimisation (DOI: 10.14469/hpc/14586),[2] is shown below.

The crystal structure shown below (2pik) of Calicheamicin embedded in a DNA duplex reveals a stretched, linear conformation of Calicheamicin rather than the compact form more appropriate for an isolated molecule.

The next step was to use the ωB97XD/Def2-SVPP wavefunction[2] to calculate the full electron density for the molecule, and to use this to evaluate the NCI (non-covalent interaction) isosurfaces. These are shown below, and the eye is immediately drawn to the regions surrounding the iodine atom, which are replete with attractive green surfaces. Blue and cyan surfaces derive from hydrogen bonds formed within the 3D structure.

The next stage, using the model to evaluate the energetics of the Masamune-Bergman cyclisation for Calicheamicin itself, will be reported in Part 3.


For those interested, the model was constructed in stages. The structure representation had been drawn in ChemDraw, saved as a pseudo-3D molfile and then loaded into GaussView. There it was subjected to several cycles of energy minimisation using the MMFF94 molecular mechanics force field. The stereochemistry of all the centres was carefully checked at each stage and, if necessary, corrected and re-optimised. The next stage was a PM7 semiempirical SCF minimisation, a method which includes dispersion attraction terms and which tends to give geometries quite close to those obtained using dispersion-corrected DFT methods, in this example ωB97XD/Def2-SVPP.


References

  1. H. Rzepa, "Mechanism of the Masamune-Bergman reaction. Part 1.", 2024. https://doi.org/10.59350/jczra-f0r90
  2. H. Rzepa, "Calicheamicin, full system, reactant, wB97XD/Def2-svpp G = -5768.785232", 2024. https://doi.org/10.14469/hpc/14586

Mechanism of the Masamune-Bergman reaction. Part 1.

August 24th, 2024

The Masamune-Bergman reaction[1],[2] is an example of a highly unusual class of chemical mechanism[3] involving the presumed formation of the biradical species shown as Int1 below by cyclisation of a cycloenediyne reactant. Such a species is so reactive that it is quickly trapped, for example by dihydrobenzene, to form the final product. This cycloenediyne is not just an obscure chemical curiosity; the motif is incorporated into the natural product Calicheamicin, a potent antitumor antibiotic discovered in the 1980s. This drug owes its activity to the cyclisation TS1 shown below, which for n=2 occurs at the low temperature of 310 K. The resulting biradical Int1 is a potent hydrogen abstractor, acting in this way on hydrogen atoms of the deoxyribose units of DNA and ultimately leading to strand scission. Although I have explored many a mechanism on this blog using computational methods, I have never included any biradical examples. Here I explore the computational aspects of this reaction, and also include a pathway proceeding via TS2 – Int2 – TS3 in which hydrogen abstraction precedes cyclisation, in order to see how competitive such an alternative might be as a function of the ring size (n in the scheme below).

The computational procedure was ωB97XD/Def2-TZVPP and the FAIR data is collected at DOI: 10.14469/hpc/14546.[4] A spin-unrestricted procedure is adopted using an approximation to allow for biradicaloid species, namely an initial guess at the wavefunction using the keyword guess(mixed). This mixes what would be the HOMO and the LUMO of the molecule in a closed-shell sense, allowing a combination which includes an open-shell singlet with one electron in the HOMO and one in the LUMO (a biradical). Part of the purpose of this approach is to find out whether it gives reasonable results for such a mechanism. I will use the expectation value of the spin operator, <S2>, to help identify biradicals. For closed-shell singlets it has the value 0.0; for a pure biradical it has the value 1.0. Thus for species Int1 the values are typically ~0.995, and for the preceding TS1 ~0.3 to 0.57. IRC (intrinsic reaction coordinate) calculations for TS1 show a smooth transition from values of <S2> = 0.0 (reactant) through to 1.0 (Int1).

The results are shown below for three values of n, revealing that as the ring size increases (ending with the acyclic system Et2), the free energy barrier increases significantly, as indeed is reported.[1],[2] The alternative pathway proceeding via TS2 is always higher in free energy and varies much less with ring size. This route can therefore be firmly excluded from contention.

Table. Free energies for two mechanistic routes. Each entry gives G (Hartree), with ΔG relative to the reactant (kcal/mol) in parentheses.

System | Reactant | TS1 | TS2 | Int2 | TS3 | Int1
n=1 | -580.843968 (0.0) | -580.806441 (23.6) | -580.771042 (45.8) | -580.797829 (29.0) | -580.795875 (30.2) | -580.838524 (3.5)
n=2 | -620.142895 (0.0) | -620.090298 (33.0)† | -620.068740 (46.5) | -620.094239 (30.5) | -620.088955 (33.8) | -620.131731 (7.0)
n=3 | -659.434065 (0.0) | -659.370635 (39.8) | -659.356146 (48.9) | -659.384469 (31.1) | -659.375211 (36.9) | -659.413969 (12.6)
Et2 | -621.348992 (0.0) | -621.278904 (44.0) | -621.265041 (52.7) | -621.292104 (35.7) | -621.280607 (42.9) | -621.319521 (18.5)

† <S2> = 0.27. Final product (n=2): G = -620.327868 Hartree (ΔG = -116.1 kcal/mol).
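The ΔG values in parentheses are the differences from the reactant free energy, converted from Hartree using 627.5095 kcal/mol per Hartree (the standard conversion factor, not stated in the post). A short Python check for the n=2 row, where the dictionary simply transcribes the table:

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per Hartree

# Free energies (Hartree) for n=2, transcribed from the table above.
n2 = {
    "Reactant": -620.142895,
    "TS1": -620.090298,
    "TS2": -620.068740,
    "Int2": -620.094239,
    "TS3": -620.088955,
    "Int1": -620.131731,
    "Product": -620.327868,
}


def relative_kcal(energies: dict[str, float], ref: str = "Reactant") -> dict[str, float]:
    """Free energies relative to `ref`, in kcal/mol, rounded to 0.1."""
    g0 = energies[ref]
    return {name: round((g - g0) * HARTREE_TO_KCAL, 1) for name, g in energies.items()}


print(relative_kcal(n2))
```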


This computational modelling largely agrees with the observations made for this reaction, with just one inconsistency. For n=2, the reaction is reported as taking place at 37°C, for which a typical free energy barrier would be in the region of ~24±2 kcal/mol,[5] around 9 kcal/mol lower than the value of 33.0 kcal/mol computed at this level of theory. This could originate either from a deficiency in the computational model, possibly in the handling of the open-shell biradicaloid character by a simple spin-unrestricted model,[6] or from the incursion of some lower-energy process into the mechanism (free radical involvement?). I will continue probing this issue to see whether its origins can be identified.
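A 9 kcal/mol discrepancy is large in kinetic terms. Assuming both barriers share the same pre-exponential factor (simple transition state theory), the ratio of the two rate constants is exp(ΔΔG‡/RT). A minimal sketch at 310 K, with a hypothetical helper name:

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)


def rate_ratio(ddg_kcal: float, temp_k: float) -> float:
    """Factor by which lowering a barrier by ddg_kcal speeds up a reaction,
    assuming identical pre-exponential factors."""
    return math.exp(ddg_kcal / (R * temp_k))


# computed 33.0 kcal/mol versus ~24 kcal/mol expected for reaction at 37 °C
print(f"{rate_ratio(33.0 - 24.0, 310.15):.1e}")
```

The factor comes out at roughly two million, so the computed barrier implies a reaction far slower than that observed at 37°C.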

In the next part of this blog, I will investigate the mechanism as applied to Calicheamicin to see how the more complex bicycloenediyne nature of this natural product affects it.

References

  1. N. Darby, C.U. Kim, J.A. Salaün, K.W. Shelton, S. Takada, and S. Masamune, "Concerning the 1,5-didehydro[10]annulene system", J. Chem. Soc. D, vol. 0, pp. 1516-1517, 1971. https://doi.org/10.1039/c29710001516
  2. R.R. Jones, and R.G. Bergman, "p-Benzyne. Generation as an intermediate in a thermal isomerization reaction and trapping evidence for the 1,4-benzenediyl structure", Journal of the American Chemical Society, vol. 94, pp. 660-661, 1972. https://doi.org/10.1021/ja00757a071
  3. R.K. Mohamed, P.W. Peterson, and I.V. Alabugin, "Concerted Reactions That Produce Diradicals and Zwitterions: Electronic, Steric, Conformational, and Kinetic Control of Cycloaromatization Processes", Chemical Reviews, vol. 113, pp. 7089-7129, 2013. https://doi.org/10.1021/cr4000682
  4. H. Rzepa, "Mechanism of the Masamune-Bergman reaction", 2024. https://doi.org/10.14469/hpc/14546
  5. K.C. Nicolaou, G. Zuccarello, C. Riemer, V.A. Estevez, and W.M. Dai, "Design, synthesis, and study of simple monocyclic conjugated enediynes. The 10-membered ring enediyne moiety of the enediyne anticancer antibiotics", Journal of the American Chemical Society, vol. 114, pp. 7360-7371, 1992. https://doi.org/10.1021/ja00045a005
  6. E.M. Greer, C.V. Cosgriff, and C. Doubleday, "Computational Evidence for Heavy-Atom Tunneling in the Bergman Cyclization of a 10-Membered-Ring Enediyne", Journal of the American Chemical Society, vol. 135, pp. 10194-10197, 2013. https://doi.org/10.1021/ja402445a

Revisiting open/transparent peer review.

July 31st, 2024

Back in 2017, I was asked to peer review an article, and its author asked if I would like the review to be “open”, that is, with my name shown as a reviewer.[1] Replication in science was, and still is, a hot topic, and I had taken the opportunity with this article to try to (successfully, I might add) replicate its main (computational) findings. This is relatively easy to do with computation, but of course far more of a challenge for experimental work, for obvious reasons. I still regularly attempt some level of replication when I review articles nowadays.

Fast forward to 2024, when I was asked, this time as an author, whether I would like the reviews of our own article to be published alongside it,[2] an option now called transparent peer review.

Now the open aspects have been inverted! Whereas the identity of the reviewers continues to be withheld, their actual reviews are now available to be read, along with the authors’ responses. There is still no way in which any attempt at “replication” can be indicated; the reviews themselves are in free-text form and readers have to judge for themselves what they might mean and whether replication was part of the process. I also wonder whether replication can be achieved whilst preserving reviewer anonymity.

Not all journals published by the Royal Society of Chemistry offer transparent review, and it is of course optional. But a search for the string “To support increased transparency, we offer authors the option to publish the peer review history alongside their article” suggests that around 73 articles across several journals have such reviews. What is more difficult to establish is what proportion of published articles expose their reviews: is it a high or a low percentage? Time will probably reveal this.

It is also worth noting another experiment along these lines, the so-called Octopus publishing[3] model, where a scholarly article can have up to eight distinct components, each in theory written by different authors, and where any one section could have several contributions, including a replication study. Each set of authors gets credit, in the form of one or more publication DOIs. This publishing experiment has now been running for almost four years, although I note there are few if any submissions in the physical sciences and chemistry.

It might be fair to suggest that with innovations such as these, scholarly publishing is likely to evolve significantly over the next few years.

References

  1. D.C. Braddock, S. Lee, and H.S. Rzepa, "Modelling kinetic isotope effects for Swern oxidation using DFT-based transition state theory", Digital Discovery, vol. 3, pp. 1496-1508, 2024. https://doi.org/10.1039/d3dd00246b
  2. H. Rzepa, "Octopus publishing: dis-assembling the research article into eight components.", 2021. https://doi.org/10.59350/qxjaz-a2298

How should data be cited in journal articles? A Crossref request for public comment!

July 18th, 2024

Metadata is something that goes on behind the scenes and is rarely of concern to either authors or readers of scientific articles. Here I tell a story where it has rather greater exposure. For journals in science and chemistry, each published article has a corresponding metadata record, associated with the persistent identifier of the article, known to most as its DOI. The metadata contains information about the article such as its authors and their affiliations, its title and its abstract, and is submitted to and registered with Crossref, an organisation set up in 1999 on behalf of publishers, libraries, research institutions and funders. A relatively recent addition to Crossref metadata is the list of citations included in the article, so-called Open Citations. Their inclusion has helped to create the new area of article metrics, used by e.g. Altmetrics or Dimensions to help identify the impact that science publications have. Basically, if one article is cited by another, it is making an impact, and many citations of a given article by other articles mean a larger impact. Most researchers love to have a high (and of course positive) impact and, perhaps for better or worse, academic careers depend to some extent on such impacts.

With that as the background, I now move to a recent article of ours.[1] The metadata record for this article can be obtained using the query:
https://api.crossref.org/works/10.1039/D3DD00246B/transform/application/vnd.crossref.unixsd+xml (retrieved 14/07/2024).
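The record can also be inspected programmatically. The sketch below, using only the Python standard library, fetches the record from the query above and tabulates each <citation> element. The function names are illustrative, and XML namespaces are stripped rather than assuming a particular Crossref schema version.

```python
import urllib.request
import xml.etree.ElementTree as ET

URL = ("https://api.crossref.org/works/10.1039/D3DD00246B/transform/"
       "application/vnd.crossref.unixsd+xml")


def local(tag: str) -> str:
    """Strip any XML namespace, e.g. '{ns}citation' -> 'citation'."""
    return tag.rsplit("}", 1)[-1]


def summarise_citations(xml_data) -> list[dict]:
    """For each <citation> in a Crossref record (str or bytes), return its key,
    its DOI (if any) and whether it carries an <unstructured_citation>."""
    root = ET.fromstring(xml_data)
    rows = []
    for el in root.iter():
        if local(el.tag) != "citation":
            continue
        children = {local(child.tag): (child.text or "").strip() for child in el}
        rows.append({
            "key": el.get("key"),
            "doi": children.get("doi"),
            "unstructured": "unstructured_citation" in children,
        })
    return rows


def fetch(url: str = URL) -> bytes:
    """Download the metadata record (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()
```

Running summarise_citations(fetch()) and counting the rows whose "unstructured" flag is set gives a tally that can be compared with the figures discussed below, bearing in mind that metadata records can be updated after retrieval.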

The article has 63 citations in its body, with the unusual but pertinent aspect that 30 of these relate not to other articles or to web links, but to data, specifically FAIR data. We even comment on this in our conclusions: “The citations noted here are included in the metadata record for the article, which is registered with Crossref, albeit with one significant current limitation in that there is currently no formal declaration of these citations as specific pointers to a FAIR data collection.” This statement was made on the premise that the article citations would show a 1:1 match with the metadata entries (which they do, see below; but see also here[2]).

Before I take a look at this, I note that CrossRef metadata does not treat all citations equally. The traditional form of citation appears as such for reference 25 (there are 29 of these in total).

<citation key="D3DD00246B/cit25/1">
<journal_title>J. Chem. Phys.</journal_title>
<author>Scalmani</author>
<cYear>2010</cYear>
<first_page>114110</first_page>
<doi>10.1063/1.3359469</doi>
</citation>

A variant of this is used for publications other than journal articles, such as preprints, where an “unstructured” component is added to the citation. This is often used for a short commentary added by the authors relating to the citation, in this case indicating that it relates to a preprint of the article itself. The term “unstructured” also means that the commentary may not have any predictable pattern or use any terms from a specified dictionary, and may need the special expertise of a human to process it. In other words, “unstructured” components may not be “machine friendly”; at the very least, a machine may have to work quite hard to decide what to do with the commentary.

<citation key="D3DD00246B/cit10/1">
<volume_title>ChemRxiv</volume_title>
<author>Braddock</author>
<cYear>2024</cYear>
<doi>10.26434/chemrxiv-2023-vcmcl</doi>
<unstructured_citation>For a preprint, see, D. C.Braddock, S.Lee and H. S.Rzepa, SWERN Oxidation. 
transition structure Theory is OK, ChemRxiv, 2023, preprint, 10.26434/chemrxiv-2023-vcmcl
</unstructured_citation>
</citation>

A third variation is also present, this time apparently relating to data itself. Note again the use of an “unstructured” commentary, which effectively adds the information that the citation might relate to data. To be fair, the volume title also does that, but this should not be its job!

<citation key="D3DD00246B/cit19/1">
<volume_title>Imperial College Research Data Repository</volume_title>
<author>Braddock</author>
<cYear>2023</cYear>
<doi>10.14469/hpc/13108</doi>
<unstructured_citation>
D. C.Braddock , H. S.Rzepa and S.Lee, Imperial College Research Data Repository, 2023, 
10.14469/hpc/13108</unstructured_citation>
</citation>

Why might this be important? Well, the mantra nowadays is that information has to be processable not only by humans but also by machines undertaking machine learning or “artificial intelligence”. Such ML/AI is at least in part about finding predictable patterns in data, and unstructured citations imply a certain lack of predictability! A machine can “read” a journal article, and that should also be possible for the data on which the inferences reported in the article are based. So that data has to be findable and accessible in the first instance, and then interoperable and re-usable in the second. These attributes are known as FAIR. So it would be great if the metadata for the article could indicate to a machine when associated data might be available, and even better if it could suggest that this data might have FAIR attributes.
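To illustrate why unstructured citations are awkward for machines: without an explicit type, software has to guess which citations point to data. The following Python sketch does so with keyword and DOI-prefix heuristics. The keywords and the 10.14469 prefix are assumptions chosen to fit the examples above, and they would misfire on data held in other repositories.

```python
from typing import Optional

# Hypothetical heuristics: assumed keywords and DOI prefixes, not a standard.
DATA_HINTS = ("data repository", "dataset", "data collection")
DATA_DOI_PREFIXES = ("10.14469/",)  # the repository prefix seen in the examples above


def looks_like_data(volume_title: Optional[str], doi: Optional[str],
                    unstructured: Optional[str]) -> bool:
    """Guess whether a citation points at data, using free text and DOI prefixes.
    Fragile by design: it encodes one human's reading of the record."""
    text = " ".join(t for t in (volume_title, unstructured) if t).lower()
    if any(hint in text for hint in DATA_HINTS):
        return True
    return bool(doi) and doi.lower().startswith(DATA_DOI_PREFIXES)


print(looks_like_data("Imperial College Research Data Repository",
                      "10.14469/hpc/13108", None))   # True
print(looks_like_data("ChemRxiv", "10.26434/chemrxiv-2023-vcmcl",
                      "For a preprint, see ..."))     # False
```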

So we now understand that there does need to be a formally agreed way of expressing a data citation in the Crossref metadata, rather than just carrying an unstructured commentary in the citation. The good news is that this is on the way! A public discussion document requests comments by August 15th, 2024 and introduces two new Crossref additions to the metadata, which are interpreted below in terms of the article we are discussing.

  1. <citation type="dataset" key="D3DD00246B/cit19/1">
    <volume_title>Imperial College Research Data Repository</volume_title>
    <author>Braddock</author>
    <cYear>2023</cYear>
    <doi>10.14469/hpc/13108</doi>
    </citation>

A more formal statement is also now added, and I quote Crossref’s reasons for its inclusion “we’d like to support several types of free-text statements in our metadata records as we’ve had feedback that they can be useful for downstream metadata users who are able to parse out and refine chunks of text in ways that may be useful. The statements are also useful for re-use in some situations.” In some ways, it replaces the unstructured citation from the example above, but now using a controlled dictionary term to specifically relate to data.

  2. <statement type="data availability">Data Availability and Discovery Statement</statement>
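With an explicit type attribute the guesswork disappears: a machine simply reads the attribute. A Python sketch that builds a citation in the form proposed in the Crossref discussion document, as interpreted above, and classifies it. This mirrors the example only; it is not a validated Crossref schema, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_dataset_citation(key: str, volume_title: str, author: str,
                          year: str, doi: str) -> ET.Element:
    """Build a <citation type="dataset"> element following the proposed form
    shown above (an interpretation of a discussion document, not a final schema)."""
    cit = ET.Element("citation", {"type": "dataset", "key": key})
    for tag, value in (("volume_title", volume_title), ("author", author),
                       ("cYear", year), ("doi", doi)):
        ET.SubElement(cit, tag).text = value
    return cit


def is_dataset(cit: ET.Element) -> bool:
    # With an explicit type attribute, no inference from free text is needed.
    return cit.get("type") == "dataset"


c = make_dataset_citation("D3DD00246B/cit19/1",
                          "Imperial College Research Data Repository",
                          "Braddock", "2023", "10.14469/hpc/13108")
print(ET.tostring(c, encoding="unicode"))
print(is_dataset(c))
```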

Let us now see how all this is handled for the article we are discussing.[1]

  • The data itself, found as a collection with its own metadata record,[3] can and does cite the article.[1]
  • The Crossref metadata record for the article as of 17.07.2024 has 38 entries which include an <unstructured_citation>, 30 of which relate to data (currently identifiable as such only by human inference).
  • If the metadata changes noted above are implemented, the 30 data citations will be clearly identified as such, as in the example shown in item 1 above, and no human inference would be needed.

The CrossRef public discussion document will remain available for another four weeks or so; meanwhile, public comments are requested! Once these enhancements have been implemented, we hope that the article metadata record we are analysing here can in turn be updated to reflect the FAIR data richness of the article. And then perhaps Altmetrics or Dimensions can start producing metrics relating to the impact of cited data. Watch this space!


One difference between the article itself and its metadata record is that the former does not change (unless a corrigendum is issued); it is a so-called Version of Record or VOR, whereas the metadata record itself can be responsibly updated when deemed necessary. So it is important to note the date associated with any given version of a metadata record.

References

  1. D.C. Braddock, S. Lee, and H.S. Rzepa, "Modelling kinetic isotope effects for Swern oxidation using DFT-based transition state theory", Digital Discovery, vol. 3, pp. 1496-1508, 2024. https://doi.org/10.1039/d3dd00246b
  2. L. Besançon, G. Cabanac, C. Labbé, and A. Magazinov, "Sneaked references: Cooked reference metadata inflate citation counts", arXiv, 2023. https://doi.org/10.48550/arxiv.2310.02192
  3. H. Rzepa, "Modelling kinetic isotope effects for SWERN Oxidation. DFT-based transition structure Theory is OK.", 2023. https://doi.org/10.14469/hpc/13058

A peek behind the (hosting) scenes of this blog.

June 15th, 2024

I should start by saying that the server on which this blog is posted was set up in June 1993. Although the physical object has been replaced a few times, and had been “virtualised” about 15 years ago, a small number of the underlying software base components may well date way back, perhaps even to 1993. This system had begun to become unreliable in recent years, and it was decided about 6 months ago to build an entirely new virtual server and then migrate stuff to it.

This switchover from old to new servers happened on June 14th 2024, and the DNS host tables were switched to point the URL of the server to the new version rather than the old one. A significant change was the move from PHP 7 to PHP 8, which broke a couple of the installed WordPress plugins. The most important was KCite, which handles the referencing of each post. The input is merely the DOI of the reference, and KCite expands this to a full citation and inserts it at the foot of the post. Whilst we wait for a new version of KCite to appear (which will in fact do a great deal more than the old one), references here will appear merely as [1] for the time being. The other breakage was the ORCID plugin, which inserts the author's ORCID into the post.
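KCite's own internals are not described here. As an illustration of the general idea of expanding a bare DOI into a full citation, DOI resolvers support content negotiation: a request to doi.org with an Accept header of text/x-bibliography returns a formatted reference string. A minimal Python sketch of that idea (not KCite's implementation; the function names are hypothetical):

```python
import urllib.request


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build a DOI content-negotiation request asking doi.org for a
    formatted reference rather than the landing page."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def format_reference(doi: str, style: str = "apa") -> str:
    """Fetch the formatted reference (requires network access)."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8").strip()
```

For example, format_reference("10.1039/d3dd00246b") would return a one-line reference for the Swern oxidation article, provided the resolver and network are available.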

There are other broken things which will be worked on over the next few days or so. Initially, images were broken, but this now seems to have been fixed, specifically for the invocation https://www.ch.ic.ac.uk/rzepa/blog/?p=27133 rather than https://www.ch.imperial.ac.uk/rzepa/blog/?p=27133. The error was caused by internal use of two different host names, one of which had been set up as an alias of the other when Imperial College was “rebranded” about 20 years ago from ic.ac.uk to imperial.ac.uk. The new server does not currently like this mixing! Another misbehaving feature is the invocation of Jmol 3D molecular models by clicking on an image, which produces “Failed to load resource: the server responded with a status of 412 (Precondition Failed)”. Whilst we ponder this, bear with us. Meanwhile, the server is much more responsive, and the previous three-minute pauses in response have hopefully now been fixed.

A 31-year-old web server and its underlying services is a positive dinosaur in this information age (if anyone has one that has been broadcasting from the same URL for longer, please let me know). Hopefully, now that it has been given a transplant, it will go on for a few years longer.

References

  1. https://doi.org/

The 100th Anniversary year of Curly Arrows.

June 14th, 2024

Chemists now use the term “curly arrows” for a language describing the electronic rearrangements that occur when a (predominantly organic) molecule transforms into another, the so-called chemical reaction. The language is also used to infer, via valence bond or resonance theory, what the mechanistic implications of that reaction are. It was in this latter context that the very first such usage occurred in 1924,[1] in the form of a letter by Robert Robinson to the secretary of the Chemical Society, “read” on December 18th 1924. The following diagram was included:

First curly arrows

I have commented previously on this diagram (DOI: 10.59350/qqwk3-dgj13) and will not discuss it further here. To commemorate the 100th anniversary of their invention, I include shots of two “modern” sets of curly arrows, taken from a lecture I give to university students at the end of their first year.

  1. The first was a new take on the peracid epoxidation of an alkene,[2] in which quantum mechanical calculations have revealed that the classic curly arrow mechanism for this reaction can be split into two sets: five arrows for the first stage of the reaction up to the transition state, and two for the final stage.

Four becomes seven

  2. The second was also discussed here[3] and involves what is arguably a new type of arrow to join the existing stable, the dashed arrow (in red below). This electron transfer arrow can act over long distances (15 Å or more) and adds the concept that an arrow can have the property of an (approximate) length as well as direction, start and end points, and perhaps even “curlyness”.

Proton coupled electron transfers

As a “language” describing mechanism and reactivity in molecules, curly arrows are still in common use, but as chemistry itself evolves into new areas, will curly arrows themselves morph into new forms, or will their use gradually decline?

References

  1. "Forthcoming events", Journal of the Society of Chemical Industry, vol. 43, pp. 1295-1298, 1924. https://doi.org/10.1002/jctb.5000435208
  2. H. Rzepa, "Experimental evidence for "hidden intermediates"? Epoxidation of ethene by peracid.", 2013. https://doi.org/10.59350/fdy9j-9fp48
  3. H. Rzepa, "Curly arrows in the 21st Century. Proton-coupled electron transfers.", 2020. https://doi.org/10.59350/rj90z-mxh96

Data Discoverability as a feature of Journal Articles.

June 11th, 2024

I can remember a time when journal articles carried selected data within their body as e.g. tables, figures or experimental procedures, with the rest consigned to a box of paper deposited (for UK journals) at the British Library. Then came ESI, or electronic supporting information. Most recently, many journals have started to include what is called a “Data availability” statement at the end of an article, which often just cites the ESI but can increasingly point to so-called FAIR data. The latter is especially important in the new AI age (“FAIR is AI-Ready”). One attribute of FAIR data is that it can be associated with a DOI in addition to that assigned to the article itself, and we have been promoting the inclusion of that data DOI in the citation list of the article.[1] Since the data can also cite the article, a bidirectional link between data and article is established. ESI itself can exceed 1000 “pages” of a PDF document, and examples of chemical FAIR data exceeding 62 Gbytes[2] (also see DOI: 10.14469/hpc/10386) are known. Finding the chemical needle in that data haystack can become a serious problem. So here I illustrate a recent suggestion for moving to the next stage, namely the inclusion of a “Data Availability and Discovery” statement. Below is the text of such a statement in a recently published article.[3]


Data availability and discovery statement. Available as a FAIR and AI-ready data collection accessible via doi: 10.14469/hpc/13058 for the overall collection [ref. 18 of the article] and Findable by following the hierarchy of data collections identified there. The data discovery and accessibility aspects are further enabled by using one of the following methods.


Many variations on the above search can be constructed.[4] It is also useful to note that the above syntax presents the results of the search in “human-readable” form. For a machine version, either of the two forms below should be used.

  1. https://api.datacite.org/dois/?query=media.media_type:chemical/x-gaussian-log+AND+media.media_type:text/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)
  2. curl "https://api.datacite.org/dois/?query=media.media_type:chemical/x-gaussian-log+AND+media.media_type:text/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)"

These last forms emphasise that data discovery is aimed at machine automation as well as humans.
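The machine form can equally be consumed from a script. A minimal standard-library Python sketch that rebuilds the first URL above from a plain query string and extracts DOIs and titles from the JSON response. The field names assume DataCite's JSON:API layout (data[].attributes.doi and titles), and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "https://api.datacite.org/dois/"

QUERY = ("media.media_type:chemical/x-gaussian-log AND media.media_type:text/plain "
         "AND (titles.title:*Exo* OR titles.title:*Endo*)")


def datacite_query_url(query: str) -> str:
    # Keep ':', '/', '*', '(' and ')' readable; spaces become '+'.
    return BASE + "?" + urlencode({"query": query}, safe=":/*()")


def dois_and_titles(payload: dict) -> list[tuple[str, str]]:
    """Pull (DOI, first title) pairs from a DataCite JSON:API response."""
    out = []
    for item in payload.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        out.append((attrs.get("doi", ""), titles[0].get("title", "")))
    return out


def search(query: str) -> list[tuple[str, str]]:
    """Run the query against DataCite (requires network access)."""
    with urllib.request.urlopen(datacite_query_url(query), timeout=30) as resp:
        return dois_and_titles(json.load(resp))
```

datacite_query_url(QUERY) reproduces the first machine URL listed above, so search(QUERY) returns the matching records as (DOI, title) pairs.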

Finally, I ponder how machines will respond to articles containing references to such discoverability. Ideally, the machine actionable information should itself be included in the (CrossRef) metadata describing the article. At the moment that aspect is perhaps the weakest point of machine discoverability associated with journals.

References

  1. H. Rzepa, "The journey from Journal "ESI" to FAIR data objects: An eighteen year old (continuing) experiment.", 2023. https://doi.org/10.59350/g2p77-78m14
  2. T. Mies, A.J.P. White, H.S. Rzepa, L. Barluzzi, M. Devgan, R.A. Layfield, and A.G.M. Barrett, "Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, and Actinide Complexes of Bidentate Acylpyrazolone Ligands", Inorganic Chemistry, vol. 62, pp. 13253-13276, 2023. https://doi.org/10.1021/acs.inorgchem.3c01506
  3. D.C. Braddock, S. Lee, and H.S. Rzepa, "Modelling kinetic isotope effects for Swern oxidation using DFT-based transition state theory", Digital Discovery, vol. 3, pp. 1496-1508, 2024. https://doi.org/10.1039/d3dd00246b
  4. H. Rzepa, "A cascading tutorial in finding rich NMR data using the Datacite datasearch engine.", 2020. https://doi.org/10.59350/7jq8v-z4p56

Possible Formation of an Impossible Molecule?

May 20th, 2024

In the previous post, I explored the so-called “impossible” molecule methanetriol. It is regarded as such because the equilibrium resulting in loss of water is very facile, being exoergic by ~14 kcal/mol in free energy. Here I explore whether changing the substituent R could suppress the loss of water and stabilise the triol.

I started (as I usually do) with a search for crystal structures, in this case containing the motif shown below (trisubstituted carbon, disubstituted oxygen, R = H or C, and any type of connecting bond), which corresponds to the species resulting from loss of R to form a trihydroxycarbenium cation.

This produces six hits, of which HIWQEJ[1] (DOI: 10.5517/cc3k560) and UYOYUD[2] (DOI: 10.5517/ccvrghj) are both salts of the trihydroxycarbenium cation (or protonated carbonic acid) itself, the counter-ion being e.g. AsF6− or an iron system. So R needs to form a stable anion, and two obvious groups are triflate (trifluoromethanesulfonate) or bis(trifluoromethanesulfonyl)azanide.

The triflate (R = CF3SO2-O) shown below has an unusually long predicted C-O bond (1.620 Å), which suggests the system is already partially ionised, as shown in the top diagram. An ωB97X-D calculation ([3], DOI: 10.14469/hpc/14280) reveals that the species shown below is +6.6 kcal/mol higher in free energy than the one corresponding to loss of water.


Bis-triflamide (bis(trifluoromethanesulfonyl)azanide) goes further, helped no doubt by the formation of a second strong hydrogen bond between the two ions. It is now 11.8 kcal/mol lower in free energy than the species resulting from loss of water.

So that is my candidate for a “possible” impossible molecule. Any takers for its synthesis?


Postscript: The next higher homologue, tris(trifluoromethanesulfonyl)methanide anion + trihydroxycarbenium cation, is similar to the bis-triflamide in being 12.1 kcal/mol lower in free energy than the species resulting from loss of water.
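These free energies can be turned into rough equilibrium ratios with K = exp(−ΔG/RT). A minimal Python sketch, assuming a simple two-state equilibrium at 298.15 K between the triol (or ion-pair) form and the species formed by loss of water, with ΔG defined as G(triol form) − G(water-loss form). The parent value of ~14 kcal/mol is the approximate figure quoted at the start of this post; the dictionary labels and the function name are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

# G(triol or ion-pair form) - G(form after loss of water), kcal/mol, as quoted in the post.
DELTA_G = {
    "parent triol (R = H, approx.)": 14.0,
    "triflate": 6.6,
    "bis-triflamide": -11.8,
    "tris(triflyl)methanide": -12.1,
}


def triol_ratio(delta_g, temperature=298.15):
    """Ratio [triol form]/[water-loss form] = exp(-dG/RT) for a two-state equilibrium."""
    return math.exp(-delta_g / (R_KCAL * temperature))


for name, dg in DELTA_G.items():
    print(f"{name:30s} dG = {dg:+6.1f} kcal/mol   ratio = {triol_ratio(dg):.1e}")
```

At 298 K every 1.36 kcal/mol (RT ln 10) shifts the ratio tenfold, so the change from +6.6 to −11.8 kcal/mol moves the balance by more than thirteen orders of magnitude in favour of the triol form.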


References

  1. R. Minkwitz, and S. Schneider, "Trihydroxycarbenium Hexafluorometalates: Salts of Protonated Carbonic Acid", Angewandte Chemie International Edition, vol. 38, pp. 714-715, 1999. https://doi.org/10.1002/(sici)1521-3773(19990301)38:5<714::aid-anie714>3.0.co;2-k
  2. S. Guo, J. Lin, W. Chen, X. Wei, J. Wang, and W. Dong, "CCDC 797118: Experimental Crystal Structure Determination", 2011. https://doi.org/10.5517/ccvrghj
  3. H. Rzepa, "Possible Formation of an Impossible Molecule?", 2024. https://doi.org/10.14469/hpc/14280

Exploring Methanetriol – “the Formation of an Impossible Molecule”

May 16th, 2024

What constitutes an “impossible molecule”? Well, here are two, the first being the topic of a recent article[1]. The second is a favourite of organic chemistry tutors, to see if their students recognise it as an unusual (= impossible) form of a much better known molecule.

Perhaps we could divide impossible molecules into two slightly different classes.

  1. The first class is a molecule which is entirely normal in terms of its structure and bonding, but just happens to be thermodynamically less stable than an isomeric form. If all mechanistic possibilities for converting it to the more stable form are eliminated, then there is no reason it should not be detected, even though it is “impossible”. By the way, quite a number of impossible molecules have been prepared using sterics (t-butyl groups and the like, a strategy first used perhaps 40 or so years ago) to prevent the molecule from reacting either with itself or with other molecules.
  2. The second class is a molecule whose bonding or structure deviates so far from accepted theories of molecular structure that its energy is so high that either it simply cannot be prepared in the first place, or nothing can be done to prevent its rearrangement to a much more stable form.

The first of the examples below falls clearly into the first category: methane triol. As reported,[1] this impossible molecule has now been detected both at low temperatures and in the gas phase at low pressure using time-of-flight mass spectrometry and other elegant experiments. The key is to ensure either a very low temperature or the absence of any acid catalyst that would decompose it to water and formic acid.

As is my usual practice in discussing any interesting molecule, I first conduct a search of the CSD (Cambridge Structural Database), in this case, it has to be said, with little hope of finding any examples. I was therefore very surprised to find one example reported, COLRUT.[2] The crystal structure of COLRUT is available[3] (DOI: 10.5517/ccdc.csd.cc22yztvv). Clearly, given the discussion at the top, alarm bells should be ringing about this result. When any such alarms sound, my second practice is to turn to calculations for verification, in this case FAIR data calculations[4] (DOI: 10.14469/hpc/14236).

The article[1] also reports such calculations, but it's good to have independent verification (of some of them), so I list the essential conclusions from my own calculations here.

  1. At the CCSD(T)/Def2-TZVPP level, methane triol is higher in free energy than formic acid and water by ΔG298 = 14.49 kcal/mol. This is not really an impossibly high energy, and the molecule is “impossible” only because there is a very facile reaction for it to undergo (acid-catalysed disproportionation, for example).
  2. At the much faster ωB97X-D/Def2-TZVPP level, the value is 14.48 kcal/mol, which agrees well enough with the previous value to use this method to explore further.
  3. If the C-H is replaced by C-CF3 (again a good tutorial question on how to stabilize the diol form of e.g. acetone), the energy of the triol is reduced to +9.4 kcal/mol. Still positive, but much smaller than the original.
  4. If the C-H is replaced by C-(CF3)3 it is still unstable by 13.6 kcal/mol. Not much chance of using substituents to create a “possible” triol then.
  5. Next, the transition state for unimolecular decomposition to water and formic acid. An IRC for this is shown below, and the free energy of activation is +36.6 kcal/mol. This proceeds via a very non-linear hydrogen transfer, a geometry known to be unfavourable, and indeed the barrier is too high for this rearrangement to occur (in a mass spectrometer? What is the temperature of molecules under these conditions?). Note how a nice hydrogen-bonded form of the products forms at the end.


    I could not resist showing the dipole moment response along the IRC. Lovely!
  6. What about an intermolecular rearrangement, which would occur at higher pressures or perhaps higher temperatures? Now ΔG = 26.7 kcal/mol, a more viable thermal reaction. The barrier is lower because the 6-ring transition state now allows a less bent hydrogen transfer.

  7. This is the reaction of a trimer, ΔG = 24.2 kcal/mol. The 8-ring transition state now allows almost linear hydrogen transfers. Note that all three transferring hydrogens move more or less in synchrony.
  8. The tetramer: ΔG = 24.1 kcal/mol, now via a 10-ring transition state. If you look carefully at the animation, you can now see that the hydrogen transfers have become very non-synchronous (and the transition state more ionic), although they remain almost linear.
  9. But wait, there is another isomer of the tetramer reaction, instead proceeding via an 8-ring TS, with the fourth triol molecule bonding to the transition state via four hydrogen bonds. This is very much like a stabilised protein transition state and overcomes the extra entropy of adding that fourth molecule and then some; ΔG = 18.9 kcal/mol. So at high concentrations the disproportionation of methane triol is predicted to become a facile reaction and now can only be prevented at low temperatures!

An NCI (non-covalent-interaction) analysis of the hydrogen bonds in this TS structure is shown below. The blue regions are hydrogen bonds. The ones labelled 1-4 are the four such interactions resulting from addition of a fourth molecule to the hydrogen transfer structure of the trimer.
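To put the barriers listed in items 5 to 9 on a time scale, the Eyring equation k = (k_B T/h) exp(−ΔG‡/RT) can be applied. The sketch below is a rough illustration under stated assumptions: a transmission coefficient of 1, T = 298.15 K, and every barrier treated as if it were a first-order process at the standard-state concentration. That last assumption ignores the concentration dependence of the dimer, trimer and tetramer routes, which is precisely the effect discussed above, so the numbers are indicative only. The labels are my own shorthand.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

# Free energies of activation quoted in the post, kcal/mol.
BARRIERS = {
    "unimolecular": 36.6,
    "dimer, 6-ring TS": 26.7,
    "trimer, 8-ring TS": 24.2,
    "tetramer, 10-ring TS": 24.1,
    "tetramer, 8-ring TS + 4 H-bonds": 18.9,
}


def eyring_rate(dg_act, temperature=298.15):
    """Eyring rate constant k = (kB*T/h) * exp(-dG_act/RT), in s^-1 (kappa = 1)."""
    return KB * temperature / H * math.exp(-dg_act / (R_KCAL * temperature))


def half_life(k):
    """Half-life of a first-order process with rate constant k, in seconds."""
    return math.log(2) / k


for label, dg in BARRIERS.items():
    k = eyring_rate(dg)
    print(f"{label:32s} {dg:5.1f} kcal/mol  k = {k:.1e} s^-1  t1/2 = {half_life(k):.1e} s")
```

Under these assumptions the 36.6 kcal/mol route corresponds to a half-life of millions of years at room temperature, whereas 18.9 kcal/mol corresponds to seconds, which is the sense in which the hydrogen-bond-stabilised tetramer route becomes facile.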

So I hope this extended analysis of what makes an “impossible molecule” actually possible adds another dimension to the original report.[1] As for that crystal structure, I will report to CCDC that it may in fact be an artefact and that they should take another look at the crystal structure data and correct it if needed. It is also interesting to explore the properties of cyclic hydrogen transfer reactions. The conclusion here is that an 8-ring transfer may be optimum, especially if it can be stabilized with four or more hydrogen bonds!

References

  1. J.H. Marks, X. Bai, A.A. Nikolayev, Q. Gong, C. Zhu, N.F. Kleimeier, A.M. Turner, S.K. Singh, J. Wang, J. Yang, Y. Pan, T. Yang, A.M. Mebel, and R.I. Kaiser, "Methanetriol─Formation of an Impossible Molecule", Journal of the American Chemical Society, vol. 146, pp. 12174-12184, 2024. https://doi.org/10.1021/jacs.4c02637
  2. P. Mi, L. He, T. Shen, J.Z. Sun, and H. Zhao, "A Novel Fluorescent Skeleton from Disubstituted Thiochromenones via Nickel-Catalyzed Cycloaddition of Sulfobenzoic Anhydrides with Alkynes", Organic Letters, vol. 21, pp. 6280-6284, 2019. https://doi.org/10.1021/acs.orglett.9b02161
  3. https://doi.org/
  4. H. Rzepa, "Exploring Methanetriol - "the Formation of an Impossible Molecule"", 2024. https://doi.org/10.14469/hpc/14236