Finding and Discovery Aids as part of data availability statements for research articles.

February 19th, 2025

Around 2016, journal publishers began including mandatory “Data Availability” statements as part of research articles; a typical (dated) example is linked here, including guidelines for how to cite the data itself. I wrote about these aspects last year in a blog post for the RSC journal Digital Discovery[1] and here I follow up with more news.

In a recently published article about direct amidation reactions[2], the following version of a data availability statement appears: “An IUPAC FAIRSpec Finding Aid for the NMR spectroscopic data is available at DOI: 10.14469/hpc/14884. A selection of data discovery searches can be found at DOI: 10.14469/hpc/14822”. This introduces the concept of a Finding Aid. Put simply, knowing where the data supporting a piece of research is available will not necessarily lead you to the particular datum you might be looking for, especially if there is a lot of data. Data is still frequently made available in the form of a supporting document called ESI (electronic supporting information), and such documents can contain many tens of compounds and possibly hundreds of associated spectra. The aim of a Finding Aid is to help you find the ones you are interested in.

If you are interested in how this works, go and explore either of the two links given above. The Finding Aid tool was created by Bob Hanson as part of an IUPAC working party on how to create spectroscopic data in so-called FAIR form (the F of FAIR and the F of Finding Aid are one and the same, of course!). This represents its first deployment for a newly published article. The creation tool itself is still at the α-stage and further tools are being developed, of which more later.

References

  1. H. Rzepa, "The evolving roles of data and citations in journal articles", 2024. https://doi.org/10.26434/chemrxiv-2024-dz2dv
  2. R.J. Procter, C. Alamillo-Ferrer, U. Shabbir, P. Britton, D. Bučar, A.S. Dumon, H.S. Rzepa, J. Burés, A. Whiting, and T.D. Sheppard, "Borate-catalysed direct amidation reactions of coordinating substrates", Chemical Science, vol. 16, pp. 4718-4724, 2025. https://doi.org/10.1039/d4sc07744j

Au-pseudocarbyne – an unusual example of twelve-coordination by carbon.

February 1st, 2025

Derek Lowe tells the story of “carbyne”, a potential further allotrope of carbon, comprising linear chains of carbon atoms, C-C≡C-C≡C-C. Whether such a molecule can exist on its own has long been the topic of speculation. Now a report has appeared of a “pseudocarbyne”, stabilised by gold atoms.[1]

The now thankfully almost ubiquitous data availability statement includes the DOI: https://doi.org/10.48349/ASU/3TWEI0 [2] as a data repository source of replication data and one of the files found there is a CIF containing the crystal data. Playing with this, I noticed one unusual feature of this structure, which oddly is not apparently mentioned in the article itself and so I thought I would tease it out here – 12 coordination.

The simplest unit comprises three eight-membered carbon rings, each connected by 4-membered rings to form a local structure with D3h symmetry and hence revealing twelve C-Au bonds of the same length, 2.415Å. Click on the image above to view a 3D model.

A larger section of the (polymeric) structure is shown below, now with D2h symmetry and again with twelve identical C-Au bond lengths.

Is such coordination unusual? Well, not for metal clusters, including Au clusters. There are in fact 2014 hits in the Cambridge crystal structure database for the general search X12Y, where X and Y can be any atom (1985 examples where Y is constrained to be a metal, hence 29 where the central atom is NOT a metal), with 244 for X=Au and 576 for X=O but none yet for X=C (the current example has not yet appeared in the distributed database). So Au-pseudocarbyne is certainly an unusual molecule, and so far unique in this respect. This also shows that 3D coordinates are a useful adjunct to articles, allowing perhaps unexpected features to be spotted with just a single click!
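A coordination count of this kind is easy to check from any set of Cartesian coordinates, such as those obtained from a CIF. The sketch below is only an illustration of the idea, not the procedure used for the search above: it counts the atoms lying within a tolerance of a target bond length from a central atom. The coordinates are a hypothetical cuboctahedral arrangement scaled to the 2.415Å quoted above, not the deposited structure, and the function name and tolerance are my own assumptions.

```python
import math
from itertools import permutations

BOND = 2.415       # C-Au distance in Å, as quoted in the text
TOLERANCE = 0.05   # Å; an assumed acceptance window


def coordination_number(centre, atoms, bond=BOND, tol=TOLERANCE):
    """Count atoms whose distance from `centre` is within `tol` of `bond`."""
    return sum(1 for atom in atoms if abs(math.dist(centre, atom) - bond) <= tol)


# Hypothetical toy geometry: the 12 vertices of a cuboctahedron,
# i.e. all distinct arrangements of (±1, ±1, 0), scaled so that each
# vertex sits exactly BOND away from the origin.
scale = BOND / math.sqrt(2)
vertices = {
    tuple(scale * c for c in p)
    for a in (1, -1)
    for b in (1, -1)
    for p in permutations((a, b, 0))
}

# One extra atom placed well outside the first coordination sphere.
atoms = list(vertices) + [(5.0, 0.0, 0.0)]

print(len(vertices), coordination_number((0.0, 0.0, 0.0), atoms))
```

With real data, a range of bond lengths rather than a single value would normally be used, since the neighbours are only strictly equivalent in a high-symmetry environment.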


You might be surprised that a similar search finds 138 hits for X14Y and 16 for X16Y.

References

  1. J. Wu, P. Tarakeshwar, S.G. Sayres, M. Meneghetti, H. Kim, J. Barreto, and P.R. Buseck, "Crystal structure of Au-pseudocarbyne(C6)", Scientific Reports, vol. 15, 2025. https://doi.org/10.1038/s41598-024-80359-5
  2. J. Wu, P. Tarakeshwar, S.G. Sayres, M. Meneghetti, H. Kim, J. Barreto, and P.R. Buseck, "Replication Data for: Crystal structure of Au-pseudocarbyne(C6)", 2024. https://doi.org/10.48349/asu/3twei0

Molecules of the Year 2024: Molecular shuttle in a box.

January 25th, 2025

This post looks at another entry in the C&E News list of candidates for the Molecule of the Year: Molecular shuttle in a box.[1]

  1. Mirror-image cyclodextrin [2]
  2. Molecular shuttle in a box [1]
  3. Rule-bending strained alkene [3]
  4. First soluble promethium complex [4]
  5. Single-electron carbon-carbon bond [5]
  6. Hot MOF for capturing carbon[6]

The molecule shown below inside the cavity is coronene. A free energy barrier of ~13 kcal/mol was determined using NMR peak coalescence temperatures, and inferred to correspond to the energy required to move the coronene from one end of the cavity to the other. Here I perform a simple reality check on this result using ωB97XD/Def2-SVP DFT calculations.[7],[8],[9] This functional includes a second generation dispersion correction, and dispersion is the primary effect controlling the position of the coronene inside the cavity.

Firstly, the fully optimised geometry of the complex.

A spacefill representation shows the coronene is a perfect fit inside the cavity!

An NCI analysis (non-covalent-interaction) shows the NCI region around the coronene providing the dispersion stabilisation of the complex. The red regions by the way are related to the Ir, which has very different NCI cut-offs compared to C,N,O and shows up as an artefact.

The barrier is induced by steric interactions between the coronene and the t-butyl groups attached to the edge of the cavitand, shown with a red arrow in the spacefill representation below.

Here is the crunch: the calculated ωB97XD/Def2-SVP barrier is ΔG 5.7 kcal/mol, significantly less than the value of ~13 kcal/mol measured for this dynamic process. But wait, another intermediate was located, shown below, now only 3.8 kcal/mol above the structure shown above. So the energy potential inside the cavity is more complex than just two minima and one transition state!

What are we to make of the disparity between the measured NMR barrier for the shuttle to move from one end of the cavitand to the other and the calculated value? Well, the barrier is likely to arise mostly from dispersion interactions, thus making this molecule a very sensitive test of how accurately the dispersion interactions are being calculated. It is known that the ωB97XD method does rather over-estimate these, and perhaps this is resulting in a barrier which is considerably too low? So this makes this molecule a useful test of potentially more accurate dispersion-corrected methods! The B3LYP+GD3+BJ value for this barrier is 6.6 kcal/mol.[10],[11] When new dispersion methods become available, I might add these as well to see if a trend develops.
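The relative free energies quoted above follow directly from the absolute free energies (in Hartree) recorded in the titles of the deposited datasets.[7],[8],[9],[10],[11] A minimal sketch of that arithmetic is given here; the dictionary layout, the labels and the function name are my own choices for illustration, and the conversion factor is the standard 627.5095 kcal/mol per Hartree.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per Hartree

# Free energies in Hartree, copied from the dataset titles in the references.
free_energies = {
    "wB97XD/Def2-SVP": {
        "minimum": -8172.276752,
        "half way point": -8172.267664,
        "second structure": -8172.270606,
    },
    "B3LYP+GD3+BJ/Def2-SVP": {
        "minimum": -8175.526700,
        "half way point": -8175.516132,
    },
}


def relative_free_energy(g_state, g_reference):
    """Free energy of a state relative to a reference, in kcal/mol."""
    return (g_state - g_reference) * HARTREE_TO_KCAL


for method, states in free_energies.items():
    reference = states["minimum"]
    for label, g in states.items():
        if label != "minimum":
            delta = relative_free_energy(g, reference)
            print(f"{method:<22} {label:<17} {delta:5.2f} kcal/mol")
```

Any new dispersion-corrected method can be added as a further dictionary entry and compared in the same way.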

References

  1. S. Ibáñez, P. Salvà, L.N. Dawe, and E. Peris, "Guest‐Shuttling in a Nanosized Metallobox", Angewandte Chemie International Edition, vol. 63, 2024. https://doi.org/10.1002/anie.202318829
  2. Y. Wu, S. Aslani, H. Han, C. Tang, G. Wu, X. Li, H. Wu, C.L. Stern, Q. Guo, Y. Qiu, A.X. Chen, Y. Jiao, R. Zhang, A.H.G. David, D.W. Armstrong, and J. Fraser Stoddart, "Mirror-image cyclodextrins", Nature Synthesis, vol. 3, pp. 698-706, 2024. https://doi.org/10.1038/s44160-024-00495-8
  3. L. McDermott, Z.G. Walters, S.A. French, A.M. Clark, J. Ding, A.V. Kelleghan, K.N. Houk, and N.K. Garg, "A solution to the anti-Bredt olefin synthesis problem", Science, vol. 386, 2024. https://doi.org/10.1126/science.adq3519
  4. D.M. Driscoll, F.D. White, S. Pramanik, J.D. Einkauf, B. Ravel, D. Bykov, S. Roy, R.T. Mayes, L.H. Delmau, S.K. Cary, T. Dyke, A. Miller, M. Silveira, S.M. VanCleve, S.M. Davern, S. Jansone-Popova, I. Popovs, and A.S. Ivanov, "Observation of a promethium complex in solution", Nature, vol. 629, pp. 819-823, 2024. https://doi.org/10.1038/s41586-024-07267-6
  5. T. Shimajiri, S. Kawaguchi, T. Suzuki, and Y. Ishigaki, "Direct evidence for a carbon–carbon one-electron σ-bond", Nature, vol. 634, pp. 347-351, 2024. https://doi.org/10.1038/s41586-024-07965-1
  6. R.C. Rohde, K.M. Carsch, M.N. Dods, H.Z.H. Jiang, A.R. McIsaac, R.A. Klein, H. Kwon, S.L. Karstens, Y. Wang, A.J. Huang, J.W. Taylor, Y. Yabuuchi, N.V. Tkachenko, K.R. Meihaus, H. Furukawa, D.R. Yahne, K.E. Engler, K.C. Bustillo, A.M. Minor, J.A. Reimer, M. Head-Gordon, C.M. Brown, and J.R. Long, "High-temperature carbon dioxide capture in a porous material with terminal zinc hydride sites", Science, vol. 386, pp. 814-819, 2024. https://doi.org/10.1126/science.adk5697
  7. H. Rzepa, "Molecular shuttle, wB97XD/Def2-svp, G = -8172.276752", 2025. https://doi.org/10.5281/zenodo.14746877
  8. H. Rzepa, "Molecular shuttle, wB97XD/def2-svp TS half way point, E = -8175.3727 G = -8172.267664 DG = 5.7", 2025. https://doi.org/10.5281/zenodo.14746910
  9. H. Rzepa, "Molecular shuttle, wB97XD/def2-svp TS half way point, (calcfc) optimisation G =-8172.270606 DG = 3.8", 2025. https://doi.org/10.5281/zenodo.14746936
  10. H. Rzepa, "Molecular shuttle, B3LYP+GD3+BJ/Def2-svp, G = -8175.526700", 2025. https://doi.org/10.5281/zenodo.14748031
  11. H. Rzepa, "Molecular shuttle, B3LYP+GD3+BJ/def2-svp TS half way point, -8175.516132 DG = 6.6", 2025. https://doi.org/10.5281/zenodo.14748035

Molecules of the Year 2024: A crystal structure perspective on anti-Bredt olefins.

January 8th, 2025

Each year C&E News publishes a list of candidates for the Molecule of the Year. For 2024 the list is (in order of votes cast for each)

  1. Mirror-image cyclodextrin [1]
  2. Molecular shuttle in a box [2]
  3. Rule-bending strained alkene [3]
  4. First soluble promethium complex [4]
  5. Single-electron carbon-carbon bond [5]
  6. Hot MOF for capturing carbon[6]

I dealt at length with entry 5 (single-electron carbon-carbon bond) last year, my conclusions rather negating the statement made about it being an example of such a bond. Here I take a look at number 3, A solution to the anti-Bredt olefin synthesis problem.[3] Four molecules below (1-4) were identified as examples of anti-Bredt rule compounds from trapping experiments (their properties such as NMR or indeed structures are not reported). These are compounds of the type that Julius Bredt had predicted 100 years ago would be particularly unstable.[7]

One way of putting these molecules into context is to search for any similarly strained alkenes in the Cambridge crystal database. The search query defined the centroid of the three carbon atoms attached to the bridgehead carbon atom, and then the distance from that centroid to the bridgehead carbon atom itself. For entirely planar coordination of that atom, the distance would be ~zero, and the deviation from zero is one way of measuring how strained the alkene is.
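The geometric measure itself is simple to reproduce from Cartesian coordinates. The following is a minimal sketch under my own assumptions (idealised geometries, a C-C length of 1.54Å and a hypothetical function name); it is not the database query itself. It shows the two limiting cases: a perfectly planar centre gives zero, while an ideal tetrahedral centre gives one third of the bond length, about 0.51Å.

```python
import math


def centroid_distance(centre, substituents):
    """Distance (Å) from an atom to the centroid of its attached atoms."""
    centroid = [sum(axis) / len(substituents) for axis in zip(*substituents)]
    return math.dist(centre, centroid)


CC = 1.54  # Å; assumed bond length for the idealised models below
origin = (0.0, 0.0, 0.0)

# Idealised planar (sp2) centre: three neighbours at 120° in the xy plane.
planar = [
    (CC * math.cos(math.radians(a)), CC * math.sin(math.radians(a)), 0.0)
    for a in (0, 120, 240)
]

# Idealised tetrahedral centre: three of the four tetrahedral directions
# (the fourth, not used here, would point along (-1, -1, -1)).
s = CC / math.sqrt(3)
tetrahedral = [(s, s, -s), (s, -s, s), (-s, s, s)]

print(f"planar:      {centroid_distance(origin, planar):.3f} Å")
print(f"tetrahedral: {centroid_distance(origin, tetrahedral):.3f} Å")
```

A strained bridgehead alkene carbon falls somewhere between these two limits, which is what makes the distance a convenient single-number descriptor.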

The results of the search (for which fullerenes are excluded as special cases) are shown below. The upper limit of the centroid distance lies between ~0.30 and 0.34Å; the outlier at 0.47Å appears to be an error, since the corresponding C=C distance is 1.565Å, far too long for a double bond.


For comparison, the centroid distance to four-coordinate carbon (a central carbon with four attached carbon ligands) is shown below – the most probable value being ~0.51Å.


Since compounds 1-4 were not actually isolated, no crystal structures or NMR data are available. ωB97XD/Def2-TZVPP calculations were performed to establish trends in these properties (FAIR Data [8]).

Molecule       Centroid distance, Å     C=C length, Å   ν(C=C), cm-1   δ 1H, ppm   δ 13C, ppm
1 (“ABO 12”)   0.510 (adduct 62 [3])    1.346           1611           6.76        189.8
2              0.505 (adduct 58 [3])    1.349           1594           6.76        196.8
3              0.341 (adduct 72 [3])    1.341           1684           5.84        170.3
4              0.357 (adduct 68 [3])    1.336           1694           6.26        173.3

For compounds 1-2, the largest ring of the three associated with the bridgehead carbon is six-membered, whereas for compounds 3-4 it is seven-membered. This is reflected in the values shown in the table above. The centroid distance for the six-ring examples is close to 0.5Å, for which no examples exist in the crystal structure database. The centroid distance for the seven-ring examples is 0.34-0.36Å, for which a number of crystalline examples are evident. It seems likely then that compounds 3-4 stand a better chance of being isolated as such, rather than having their existence inferred from the cycloadducts they form. Perhaps a modification to the experimental procedures might accomplish this? The predicted 1H and 13C chemical shifts are shown in the table to aid identification if this is ever achieved. Also noteworthy are the C=C stretching vibrations, which are lowered significantly for 1-2 compared to 3-4.

It's good to have experimental evidence for compounds that 100 years ago were predicted to be unusually unstable. Perhaps the next step is to isolate them as pure compounds and study their properties.

References

  1. Y. Wu, S. Aslani, H. Han, C. Tang, G. Wu, X. Li, H. Wu, C.L. Stern, Q. Guo, Y. Qiu, A.X. Chen, Y. Jiao, R. Zhang, A.H.G. David, D.W. Armstrong, and J. Fraser Stoddart, "Mirror-image cyclodextrins", Nature Synthesis, vol. 3, pp. 698-706, 2024. https://doi.org/10.1038/s44160-024-00495-8
  2. S. Ibáñez, P. Salvà, L.N. Dawe, and E. Peris, "Guest‐Shuttling in a Nanosized Metallobox", Angewandte Chemie International Edition, vol. 63, 2024. https://doi.org/10.1002/anie.202318829
  3. L. McDermott, Z.G. Walters, S.A. French, A.M. Clark, J. Ding, A.V. Kelleghan, K.N. Houk, and N.K. Garg, "A solution to the anti-Bredt olefin synthesis problem", Science, vol. 386, 2024. https://doi.org/10.1126/science.adq3519
  4. D.M. Driscoll, F.D. White, S. Pramanik, J.D. Einkauf, B. Ravel, D. Bykov, S. Roy, R.T. Mayes, L.H. Delmau, S.K. Cary, T. Dyke, A. Miller, M. Silveira, S.M. VanCleve, S.M. Davern, S. Jansone-Popova, I. Popovs, and A.S. Ivanov, "Observation of a promethium complex in solution", Nature, vol. 629, pp. 819-823, 2024. https://doi.org/10.1038/s41586-024-07267-6
  5. T. Shimajiri, S. Kawaguchi, T. Suzuki, and Y. Ishigaki, "Direct evidence for a carbon–carbon one-electron σ-bond", Nature, vol. 634, pp. 347-351, 2024. https://doi.org/10.1038/s41586-024-07965-1
  6. R.C. Rohde, K.M. Carsch, M.N. Dods, H.Z.H. Jiang, A.R. McIsaac, R.A. Klein, H. Kwon, S.L. Karstens, Y. Wang, A.J. Huang, J.W. Taylor, Y. Yabuuchi, N.V. Tkachenko, K.R. Meihaus, H. Furukawa, D.R. Yahne, K.E. Engler, K.C. Bustillo, A.M. Minor, J.A. Reimer, M. Head-Gordon, C.M. Brown, and J.R. Long, "High-temperature carbon dioxide capture in a porous material with terminal zinc hydride sites", Science, vol. 386, pp. 814-819, 2024. https://doi.org/10.1126/science.adk5697
  7. J. Bredt, "Über sterische Hinderung in Brückenringen (Bredtsche Regel) und über die meso‐trans‐Stellung in kondensierten Ringsystemen des Hexamethylens", Justus Liebigs Annalen der Chemie, vol. 437, pp. 1-13, 1924. https://doi.org/10.1002/jlac.19244370102
  8. H. Rzepa, "Molecules of the Year - 2024. A crystal structure perspective on anti-Bredt olefins.", 2025. https://doi.org/10.14469/hpc/14898

The secrets of FAIR Metadata: optimisation for Chemical Compounds.

December 11th, 2024

The idea of so-called FAIR (Findable, Accessible, Interoperable and Reusable) data is that each object has an associated metadata record which serves to enable the four aspects of FAIR. Each such record is itself identified by a persistent identifier known as a DOI. The trick in producing useful FAIR data is defining what might be termed the “granularity” of the data objects, so that they are the most readily findable and most usefully enable the other three attributes of FAIR.

To set the scene for how to do this optimally, I first set out two extreme examples of FAIR objects relating to chemical spectroscopy such as NMR. These will be directly associated with a journal article describing, for argument's sake, say 50 compounds new to science, with the existence of these data objects identified via a data availability statement appended to the article. Each compound might be characterised by, say, spectroscopic and crystallographic information and perhaps some computational analysis. For the spectroscopic analysis, perhaps 5 types of NMR experiments might be included, giving a total of around 10 separate types of datasets for each compound, or in round numbers let's say 500 data sets for the 50 compounds reported in such an article.

  • Method A: The data associated with an article takes the form of a ZIP (or other type of compressed) archive containing all 500 of the intended FAIR data sets. The resulting ZIP file is then described with a single metadata record and assigned a single DOI using e.g. the tools of a data repository. That one metadata record has the (mammoth) task of describing all of these datasets, across perhaps ten different kinds of experiment. This type of monolithic object is in fact not unusual, for several reasons. Some repositories impose a significant charge for each deposition, and so the temptation to reduce costs would be to adopt this expedient.
  • Method B: The other extreme is to literally deposit all 500 data sets separately and assign 500 DOIs, each with a separate metadata record. The issue now is less how well the metadata record can describe each dataset, but more how to establish the relationships between these 501 objects (the journal article and each dataset). Such relationships could include:
    • that between the compound molecular structure and the dataset
    • that between say the dataset and the type of spectroscopic experiment (e.g. IR, MS, NMR, XRD, Comp)
    • that between different e.g. NMR experiments for the same compound (the nucleus, the pulse sequence, the solvent, etc).
    • These could in total represent a great many individual relationships between both the 500 data sets and the article itself (formally around 501²/2, or more than 125,000!)

Before setting out a solution, I show below how a typical repository such as Zenodo handles the relationships between data objects noted above.


The relation type is selected from a controlled list of about 30, and is entered for each individual metadata record associated with a DOI. So clearly, the relationships needed for Method B would have to be individually entered, hardly feasible for around 501²/2 entries. And for Method A, only one relationship between the single large archive of data and the journal DOI can be added. Among the more important relationships in this context are the “Has part” and “Is part of” ones (diagram above).

The use of these two relation types now constitutes Method C.

  1. One starts by creating what could be called a top or level 1 entry, which will contain important core metadata information such as the contributing authors, the institute where the data was obtained, the title and overall description of the datasets to come, a license, a date, a declaration of the published article associated with the data and finally the DOI of this metadata record. This top-level entry would also list all the compounds on level 2 for which data is available, each being referenced by a “Has part” declaration via a DOI for each compound.
  2. Each compound on level 2 would in turn point back to level 1 by an “Is part of” metadata declaration. Each compound on level 2 would also list the spectroscopic experiments available for that compound, for example the NMR method, as part of level 3. Each level 3 entry would have an “Is part of” declaration pointing back to the compound level 2 entry.
  3. The list of the different NMR experiments on level 3 also has “Has part” declarations pointing to the individual NMR experiments on level 4.
  4. Each NMR experiment conducted on level 4 would contain an “Is part of” declaration back to level 3 and a list of “Has part” entries which describe the individual data files available for that experiment in the metadata record for level 4.
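To get a feel for the numbers, the four levels above can be modelled as a simple tree in which every parent-child link generates one “Has part” and one “Is part of” declaration. This is only a sketch under stated assumptions: the DOIs are placeholders, the split of ten data sets per compound across techniques is invented for illustration, and the individual data files listed at level 4 are left out. The relation type names HasPart and IsPartOf are those of the DataCite controlled list.

```python
import math

# Assumed split giving ten data sets per compound (illustrative only).
TECHNIQUES = {"NMR": 5, "IR": 1, "MS": 1, "XRD": 1, "Comp": 2}


def build_links(n_compounds=50, techniques=TECHNIQUES, prefix="10.5555/example"):
    """Return every HasPart/IsPartOf declaration for a four-level collection.

    `prefix` is a placeholder DOI prefix, not a real repository.
    """
    links = []

    def relate(parent, child):
        # Each parent-child link is declared once in each direction.
        links.append({"doi": parent, "relationType": "HasPart",
                      "relatedIdentifier": child, "relatedIdentifierType": "DOI"})
        links.append({"doi": child, "relationType": "IsPartOf",
                      "relatedIdentifier": parent, "relatedIdentifierType": "DOI"})

    top = f"{prefix}/collection"                       # level 1
    for c in range(1, n_compounds + 1):
        compound = f"{prefix}/compound-{c}"            # level 2
        relate(top, compound)
        for technique, n_experiments in techniques.items():
            method = f"{compound}/{technique}"         # level 3
            relate(compound, method)
            for e in range(1, n_experiments + 1):
                relate(method, f"{method}/experiment-{e}")  # level 4
    return links


links = build_links()
pairwise = math.comb(501, 2)  # every pair among 500 data sets plus the article
print(f"Method C declarations: {len(links)}")
print(f"Method B pairwise relationships: {pairwise}")
```

Because each declaration involves only a parent and its child, all of them can be generated automatically by the repository at the time of deposition.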

If you wish, you can inspect all “Has part”/“Is part of” declarations in the metadata records for these various levels by invoking e.g. https://data.datacite.org/application/vnd.datacite.datacite+xml/10.14469/hpc/11446 (replacing e.g. 11446 by any of the DOI suffixes shown in red in the diagram below). They are all associated with this published article.[1]
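For anyone wanting to do this inspection by program rather than by eye, the sketch below builds the URL in the form shown above and extracts the part-related declarations from a DataCite XML record using only the Python standard library. The embedded sample record is invented for illustration (its DOIs are placeholders, not real entries), and the function names are my own; the element and attribute names follow the DataCite kernel-4 schema.

```python
import xml.etree.ElementTree as ET

BASE = "https://data.datacite.org/application/vnd.datacite.datacite+xml/"
NS = {"d": "http://datacite.org/schema/kernel-4"}

# Invented sample record with placeholder DOIs, for illustration only.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.5555/example/level2</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.5555/example/level1</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="HasPart">10.5555/example/level3</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="References">10.5555/example/other</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def metadata_url(doi):
    """URL of the DataCite XML metadata record for a DOI."""
    return BASE + doi


def part_relations(xml_text):
    """Collect the HasPart and IsPartOf targets declared in one record."""
    root = ET.fromstring(xml_text)
    found = {"HasPart": [], "IsPartOf": []}
    for element in root.findall("d:relatedIdentifiers/d:relatedIdentifier", NS):
        relation = element.get("relationType")
        if relation in found:
            found[relation].append(element.text.strip())
    return found


print(metadata_url("10.14469/hpc/11446"))
print(part_relations(SAMPLE))
```

Fetching the URL and passing the returned text to part_relations would allow the whole hierarchy to be walked from the top-level entry downwards.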

What does this use of relational parts declarations achieve? Well, compared to Method A, where everything had to be achieved within a single metadata record (and in practice never is), or Method B, where a very large number of relationships would have to be declared (and again never are), Method C achieves a good balance between the two. By collecting the metadata information into groups, one can achieve a more readily navigable structure for the information and also allow sub-groups to effectively inherit properties from the higher group.

I end by noting that far too few FAIR data collections associated with published journal articles adopt such procedures, in large part because there is very little current exploitation of relationships between the data such as the ones used above (“Has part”/“Is part of”). The repository itself has to be carefully designed to do this as automatically as possible and not require the human depositor to invoke each instance by hand (as shown for e.g. Zenodo above). An example of just such a repository is described here.[2]


The data sets themselves might be made available in more than one form (for NMR, a Bruker ZIP archive, an Mnova file, a JCAMP-DX format or just a PDF spectrum), thus increasing the number even further.
Method C reminds me of when I used to teach molecular orbital theory using the Hückel method, which requires a secular matrix to be diagonalised. For e.g. naphthalene, this operation would have to be conducted on a 10×10 matrix, something almost impossible by hand. However, one could use group theory to block diagonalise this matrix into much smaller matrices, with the off-diagonal elements between them equal to zero, thus considerably reducing the task at hand.
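The naphthalene analogy can be made concrete with a short sketch. The atom numbering, the two in-plane reflections written as permutations and the function names below are my own choices for illustration; energies are in units of β with α set to zero. Symmetry-adapted combinations of the ten p-orbitals are generated by projection, and the Hückel matrix elements between combinations of different symmetry are then confirmed to vanish, so the 10×10 problem separates into two 3×3 and two 2×2 blocks.

```python
import math
from itertools import product

N_ATOMS = 10
# Indices 0-7 are C1-C8; index 8 is C4a and index 9 is C8a.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 8), (8, 4), (4, 5),
         (5, 6), (6, 7), (7, 9), (9, 0), (8, 9)]
# Image of each atom under reflection through the long and short in-plane axes.
LONG_AXIS = [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
SHORT_AXIS = [7, 6, 5, 4, 3, 2, 1, 0, 8, 9]

# Hückel matrix: 1 (i.e. β) between bonded atoms, 0 elsewhere.
huckel = [[0.0] * N_ATOMS for _ in range(N_ATOMS)]
for i, j in BONDS:
    huckel[i][j] = huckel[j][i] = 1.0


def symmetry_adapted_vectors(char_long, char_short):
    """Project each atom onto the symmetry labelled by the two characters (±1)."""
    vectors = []
    for seed in range(N_ATOMS):
        vector = [0.0] * N_ATOMS
        orbit = set()
        for use_long, use_short in product((False, True), repeat=2):
            image, character = seed, 1
            if use_long:
                image, character = LONG_AXIS[image], character * char_long
            if use_short:
                image, character = SHORT_AXIS[image], character * char_short
            vector[image] += character
            orbit.add(image)
        norm = math.sqrt(sum(c * c for c in vector))
        # Keep one normalised vector per set of equivalent atoms.
        if seed == min(orbit) and norm > 0:
            vectors.append([c / norm for c in vector])
    return vectors


def matrix_element(u, v):
    """Hückel matrix element between two orbital combinations."""
    return sum(u[i] * huckel[i][j] * v[j]
               for i in range(N_ATOMS) for j in range(N_ATOMS))


blocks = {chars: symmetry_adapted_vectors(*chars)
          for chars in product((1, -1), repeat=2)}
block_sizes = {chars: len(vectors) for chars, vectors in blocks.items()}
largest_coupling = max(
    abs(matrix_element(u, v))
    for a, b in product(blocks, repeat=2) if a != b
    for u in blocks[a] for v in blocks[b]
)
print(block_sizes, largest_coupling)
```

Grouping metadata by level plays much the same role as the symmetry labels do here: most of the possible cross-terms never need to be considered at all.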

References

  1. T. Mies, A.J.P. White, H.S. Rzepa, L. Barluzzi, M. Devgan, R.A. Layfield, and A.G.M. Barrett, "Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, and Actinide Complexes of Bidentate Acylpyrazolone Ligands", Inorganic Chemistry, vol. 62, pp. 13253-13276, 2023. https://doi.org/10.1021/acs.inorgchem.3c01506
  2. M.J. Harvey, A. McLean, and H.S. Rzepa, "A metadata-driven approach to data repository design", Journal of Cheminformatics, vol. 9, 2017. https://doi.org/10.1186/s13321-017-0190-6

Data Discovery: A pick-n-mix library of useful FAIR Data searches – and a call for new search suggestions.

November 25th, 2024

With AI and machine learning needing data in abundance, interest in data discovery is intense. However, this type of discovery is somewhat different from more traditional database searches, in that it is particularly suited to discovery by machines as well as by humans. The discovery searches are conducted using an aggregated and federated metadata store, such as that curated by DataCite. How to construct a suitable search is, however, still not entirely human-friendly. The starting point for understanding how to search is this resource: XML to JSON mappings, and the XML referred to can be found here.[1] Since the learning curve to construct such data searches can be quite steep, I thought I would share as a library some recent searches I constructed for a talk I am giving. This post is essentially an extension and update of an earlier challenge I was set along these lines and which appeared here.[2]

You can see that the searches come as components linked by Boolean operators, separated by strings such as +AND+, +OR+ or +NOT+. Essentially like a Lego constructor set, you can create your own searches by combining these components to suit your own needs. No doubt some AI-based procedure will come along that will convert natural language expressions of the intended search into the JSON-friendly strings you see below – at least that is the hope.
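The Lego-like construction can be sketched in a few lines. This is an illustration only: the endpoint and the field names (written as paths into the DataCite JSON) are my assumptions about how such a search might be phrased, not a copy of the query strings behind the searches listed in this post, and the helper names are hypothetical.

```python
API = "https://api.datacite.org/dois?query="  # assumed search endpoint


def term(field, value):
    """One search component in field:value form."""
    return f"{field}:{value}"


def all_of(*components):
    """Combine components so that every one of them must match."""
    return "(" + "+AND+".join(components) + ")"


def any_of(*components):
    """Combine components so that at least one of them must match."""
    return "(" + "+OR+".join(components) + ")"


def excluding(wanted, unwanted):
    """Keep matches for `wanted` that do not also match `unwanted`."""
    return f"({wanted}+NOT+{unwanted})"


# Assumed field names, shown for illustration.
data_types = any_of(term("types.resourceTypeGeneral", "Dataset"),
                    term("types.resourceTypeGeneral", "Collection"))
about_pyrazoles = any_of(term("titles.title", "*Pyrazol*"),
                         term("descriptions.description", "*Pyrazol*"))

query = all_of(data_types, about_pyrazoles)
print(API + query)
```

Each helper returns a bracketed string, so components can be nested to any depth without worrying about operator precedence.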

Part 1: Data discovery based on general properties such as the reporting Institution, the publisher or the Researcher

  1. Find all Data-related Works associated with Cambridge University and the American Chemical Society Publisher
  2. Find all Data-related Works associated with Imperial College and the American Chemical Society Publisher
  3. Find all Datasets OR Collections associated with Imperial College and the American Chemical Society Publisher and the term Pyrazol in the Title or Description
  4. Find all Datasets OR Collections associated with Imperial College and the American Chemical Society Publisher and the term Pyrazol in the Title or Description and a specified Researcher
  5. Find Datasets only associated with Imperial College and the term Pyrazol in the Title or Description
  6. Find just Datasets associated with a specific researcher
  7. Find Data-related Works associated with Cambridge University, the SubjectScheme FOS (Field of Science) and the Subject term *Chemical*
  8. Establish if a specified publication with a specified author has an associated FAIR Dataset or FAIR Collection
  9. Establish how many journal publications by a specified author have an associated FAIR Dataset or FAIR Collection

Part 2: Data discovery based on chemical properties such as NMR, IR or X-ray spectroscopy

  1. Find all Datasets associated with Chemical structure representation and NMR Media types, NMR as a Subject and the title or description term “Pyrazol”
  2. Find all Datasets associated with Chemical structure representation and NMR Media types, NMR Nuclei as a Subject, for 13C and the title or description term “Pyrazol”
  3. Find all Datasets associated with Chemical structure representation and NMR Media types, NMR as a Subject, for HMBC Experiments and the title or description term “Pyrazol”
  4. Find all Datasets associated with Chemical structure representation and NMR Media types, NMR as a Subject, using solvent “CD3OD” and the title or description term “Pyrazol”
  5. Find all Datasets associated with NMR Media types, NMR as a Subject and InChIKey : OZEYXLXJQKVGCZ-UHFFFAOYSA-L
  6. Find all Datasets associated with NMR Media types, NMR as a Subject and the molecular formula component of the full InChI : InChI=1S/2C18H16N2O3.2C2H6O.Ca/c2*1-23-15-9-7-13 etc
  7. Find all Datasets associated with Chemical structure representation Media types, IR as a Subject and the title or description term “Pyrazol”
  8. Find all Datasets associated with a Chemical structure representation and Crystal structure Media types, XRAY as a Subject and the title or description term “Pyrazol”

Part 3: Data discovery based on chemical properties such as Computational modelling

  1. Find all Datasets associated with Chemical structure representation and Computation Media types, COMP as a Subject and the title or description term “Pyrazol”
  2. Find all Datasets associated with Computation Media types and the subject KIE for Hydrogen isotopes.

One feature of this approach is that the results of the searches, which run across a globally aggregated metadata store, can change with time. So repeating some of the searches at defined time intervals can also give a dynamic indication of how a particular area of data is growing. Other searches are of course designed to give a single hit which probably will not change with time.

The above is based on an interpretation and implementation of the DataCite Schema, one which will eventually need to be agreed by the communities and sub-communities that might wish to use it. So beware, there may be other implementations covering similar data that would not e.g. be found by the above searches, particularly in the way the subject terms above are used. The searches are therefore included here purely to raise awareness of the potential that such an approach has, along with my observation that I have never attended any presentation where they have been discussed or shown. In the future, it seems likely that these JSON-based searches will themselves get automated and generated by software rather than by a human as here. When that comes, searching will never be the same again!


I also welcome suggestions for new search queries. This might either be accommodated using the existing metadata, or might require new additions to the metadata record. Please send them here as comments.


 

References

  1. DataCite Metadata Working Group., "DataCite Metadata Schema Documentation for the Publication and Citation of Research Data and Other Research Outputs v4.5", DataCite, 2024. https://doi.org/10.14454/g8e5-6293
  2. H. Rzepa, and T. Davies, "Open publishing FAIR spectra for and by students", Spectroscopy Europe, pp. 22, 2022. https://doi.org/10.1255/sew.2022.a10

Mechanism of the Masamune-Bergman reaction. Part 4. Why was the DFT energy barrier too high for the Calicheamicin reaction?

October 29th, 2024

Michael, in a comment here on the mechanism of the Masamune-Bergman reaction, notes that when it occurs as part of the mechanism of action of Calicheamicin (used as the payload in antibody-drug conjugates, or ADCs), a pre-step is first necessary. As discussed in this review article,[1] the trisulfide linkage is reduced and the resulting thiolate undergoes a facile 1,4-addition to the adjacent enone.

DFT calculations on the new form (FAIR Data DOI: 10.14469/hpc/14632 [2]) show that the free energy barrier is reduced from 38.6 kcal/mol to 26.2 kcal/mol.

This is now a reasonable value for a thermal reaction, being a 12.4 kcal/mol reduction from the unactivated species. We can conclude that Michael’s suggestion was spot on, and this suggests in turn that a DFT-biradicaloid calculation is in fact a reasonable procedure for modelling this type of system.
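To see what a 12.4 kcal/mol reduction means in kinetic terms, the two barriers can be put through the Eyring equation, k = (kB·T/h)·exp(−ΔG‡/RT). The sketch below assumes a transmission coefficient of one, a simple first-order process and a temperature of 298.15 K; the function names are my own. At this temperature the reduction corresponds to a rate acceleration of roughly nine orders of magnitude.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
PLANCK = 6.62607015e-34  # Planck constant, J s
R = 1.987204e-3        # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature=298.15):
    """First-order rate constant (s^-1) for a free energy barrier in kcal/mol.

    Assumes a transmission coefficient of one.
    """
    prefactor = K_B * temperature / PLANCK
    return prefactor * math.exp(-barrier_kcal / (R * temperature))


def half_life(barrier_kcal, temperature=298.15):
    """Half-life in seconds of a first-order process with this barrier."""
    return math.log(2) / eyring_rate(barrier_kcal, temperature)


for barrier in (38.6, 26.2):
    days = half_life(barrier) / 86400
    print(f"barrier {barrier} kcal/mol: half-life {days:.3g} days")

acceleration = eyring_rate(26.2) / eyring_rate(38.6)
print(f"rate acceleration: 10^{math.log10(acceleration):.1f}")
```

The temperature argument can be changed to explore, for example, how the half-life shortens at physiological temperature.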

References

  1. V. Kostova, P. Désos, J. Starck, and A. Kotschy, "The Chemistry Behind ADCs", Pharmaceuticals, vol. 14, pp. 442, 2021. https://doi.org/10.3390/ph14050442
  2. H. Rzepa, "Mechanism of the Masamune-Bergman reaction. Part 4. Why is the DFT energy barrier too high?", 2024. https://doi.org/10.14469/hpc/14632

A one-electron bond in methyl-λ1-borane.

October 9th, 2024

In exploring one-electron carbon-carbon bonds, I had noted previously[1] that both hexafluoroethane and ethane itself could each lose an electron to produce such species. A discussion developed in which a molecule isoelectronic with ethane, namely the methyl-λ1-borane radical (H3B-CH3), was proposed by Jacob. The optimised structure at the ωB97XD/6-31G(d) level exhibited a B-C bond length of 1.57Å, with two of the B-H hydrogens forming a 3c-3e bond with boron, and so a one-electron B-C bond was discounted. Here I take a closer look at this system.

At the ωB97XD/Def2-TZVPP level, I located an alternative structure with a longer B-C bond of 1.737Å[2] and an “agostic” like interaction between C and one B-H bond.

The electron density difference map between methyl-λ1-borane and its mono cation is shown below, followed by the density difference map between the corresponding anion and the methyl-λ1-borane radical. These are very similar to the maps obtained previously for hexafluoroethane and ethane and support the hypothesis that the differences between the two-electron/zero-electron species and the one-electron radical originate at least in part in the B-C bond.


A contour map of the negative region of the electron density Laplacian (-0.04 au) again shows that it lies along the B-C bond, suggesting covalency. Note the -ve Laplacian in the region of the agostic interaction! The NCI (non-covalent-interaction) plot is featureless.

The computed methyl-λ1-borane radical has a B-C stretching vibration corresponding to 494 cm-1, a Wiberg bond order of 0.660 and Wiberg bond index totals of 3.51 for carbon and 3.28 for boron. These can all be reasonably interpreted as a one-electron “half” bond between C and B. With a computed bond length of 1.737Å, it represents the shortest “one electron” bond thus far identified, and hence extends the length range of such bonds to around 1.16Å.
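The figure of around 1.16Å can be checked with a line or two of arithmetic. The following is only a minimal sketch: the bond lengths are the ones quoted in these posts (with the hexa-arylethane value taken as the approximate ~2.9Å), and the dictionary labels are my own shorthand rather than any standard nomenclature.

```python
# Bond lengths (in Å) as quoted in these posts; the labels are illustrative only.
one_electron_bonds = {
    "H3B-CH3 radical (B-C)": 1.737,
    "C2H6 radical cation (C-C)": 1.933,
    "C2F6 radical cation (C-C)": 2.149,
    "hexa-arylethane radical cation (C-C, approx.)": 2.9,
}

shortest = min(one_electron_bonds.values())
longest = max(one_electron_bonds.values())
length_range = longest - shortest

print(f"shortest = {shortest:.3f} Å, longest = {longest:.1f} Å")
print(f"range    = {length_range:.2f} Å")
```

With these inputs the range is 2.9 - 1.737, i.e. about 1.16Å.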


Postscript 1
I also looked at the radical anion of H3B-BH3, which is isoelectronic with methyl-λ1-borane. It has a B-B distance (rB-B) of 2.124Å and a classic “ethane”-like D3d structure. The electron density difference map between the H3B-BH3 radical anion and neutral H3B-BH3 is shown below, revealing a considerable reorganisation of the electron density, only one aspect of which involves the B-B region, and which differs from the reorganisation seen for the radical cation of ethane itself. This reveals that simply talking about a two-atom region for this sort of system is very simplistic and misleading. The Wiberg B-B bond index is 0.383 and the B-B stretching vibration is 384 cm-1.

The electron density Laplacian of H3B-BH3, contoured for a -ve value of -0.04 au, again implies a covalent B-B bond.

Postscript 2

Here I add hexamethylethane radical cation to the list. Firstly the density difference map. Note the longer C-C bond (2.31Å) than for ethane radical cation (1.933Å). In this sense, the hexamethyl radical cation has a weaker C-C bond than does the unsubstituted version (191 cm-1 vs 477 cm-1).

The Laplacian shows no -ve value in the C-C region (isosurface value -0.01), again placing it in the weak bond category.

Finally some NCI plots. Here the density cut-off threshold is crucial. Typically a second period element covalent density is taken as 0.05 au, and this is removed from the NCI analysis. The feature seen along the C-C bond at this level is typical of weak covalent interactions however.

Reducing the density cut-off to 0.023 (typical of density in which one atom is of the third period, i.e. Si) removes the central C-C feature, leaving only NCI effects between the hydrogen atoms of the methyl groups. These in fact form a continuous weakly stabilizing surface between the two halves.

So with hexamethylethane radical cation, we get the message that the interaction between the two carbons is weak, but also that it is not a non-covalent interaction. So this is perhaps a very weak covalent bond, but in this strange region it is difficult to ascribe a single description to it.

References

  1. H. Rzepa, "A one-electron bond in methyl-λ1-borane.", 2024. https://doi.org/10.14469/hpc/14662

The one-electron carbon-carbon bond: Hexafluoroethane and ethane radical cations.

October 3rd, 2024

In the previous post, I looked[1] at the recently reported[2] hexa-arylethane containing a carbon-carbon one-electron bond, its structure having been determined by x-ray diffraction (XRD). The measured C-C bond length was ~2.9Å and my conclusion was that the C…C region represented more of a weak “interaction” than a bond as such. How about a much simpler system, hexafluoroethane? Here, the two-electron C-F bonds are much lower in energy than the C-C bond, so when the molecule is ionised, the electron is removed from the C-C bond rather than from any of the C-F bonds. Below is the structure computed at the ωB97XD/Def2-TZVPP level, revealing a much shorter C-C bond of 2.149Å. The computed C-C stretching vibrational frequency is 179 cm-1.

An electron density difference map, obtained by subtracting the computed density of the dication from that of the radical cation at the geometry of the former is shown below, confirming that the electron has been removed from the C-C region, with a smaller removal from the C-F bonds.

The Laplacian of the electron density is shown below contoured for negative values of this function. Unlike the previous molecule, this now has a (small) negative value along the C-C region (contour -0.001).

A calculation of the NCI surface gave a null result! The parameters for computing a non-covalent analysis are thus: [0.5 1 0.0005 0.05 0.95 1.00], being the ones used in the previous analysis. The value of 0.05 is the density cutoff used to remove covalent density and using this value, no non-covalent features are detected. Or, put another way, only covalent features are present, as supported by the -ve Laplacian noted above.
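To make the role of the density cutoff concrete, here is a minimal sketch of the filtering idea behind an NCI analysis. It uses the standard reduced density gradient, s = |∇ρ| / (2(3π²)^(1/3) ρ^(4/3)), and keeps only grid points of low s and low density. The function names, the value `s_cutoff=0.5` and the two sample points are assumptions made purely for illustration; they are not the output of the calculation described above, and no mapping onto the parameter list quoted above is implied.

```python
import math

# Prefactor of the reduced density gradient, 1 / (2 (3 pi^2)^(1/3)).
RDG_PREFACTOR = 1.0 / (2.0 * (3.0 * math.pi ** 2) ** (1.0 / 3.0))


def reduced_density_gradient(rho, grad_norm):
    """Reduced density gradient s for density rho and gradient norm |grad rho| (au)."""
    return RDG_PREFACTOR * grad_norm / rho ** (4.0 / 3.0)


def nci_points(points, rho_cutoff=0.05, s_cutoff=0.5):
    """Keep (rho, |grad rho|) points with density below rho_cutoff and low s.

    Points denser than rho_cutoff are treated as covalent and discarded.
    """
    kept = []
    for rho, grad_norm in points:
        if rho <= 0.0 or rho > rho_cutoff:
            continue
        if reduced_density_gradient(rho, grad_norm) <= s_cutoff:
            kept.append((rho, grad_norm))
    return kept


# Hypothetical sample: a single low-gradient point whose density is 0.08 au.
bond_like = [(0.08, 0.0)]
print(nci_points(bond_like, rho_cutoff=0.05))  # discarded as covalent density
print(nci_points(bond_like, rho_cutoff=0.10))  # retained with a higher cutoff
```

In this toy picture, a null NCI result simply means that every low-gradient point has a density above the chosen cutoff.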

Whilst C2F6+. cannot be claimed to be typical of a molecule with a hypothetical “pure” one-electron C-C bond, it is certainly very different from the previous example.[1],[2] Time to go all the way and try ethane itself, C2H6+.. Again the same behaviour is seen, whilst the calculated C-C length reduces to 1.933Å. The C-C stretching vibrational frequency is elevated to 477 cm-1. We might take these last values as the natural ones for a one-electron C-C bond?

This alternative subtraction involves the density difference between neutral ethane and its radical cation. The result is essentially the same.

So these two ethane derivatives add some further context to the properties of a one-electron C-C bond. We have seen them range from a low of ~1.9Å to a high of ~2.9Å. This variation of around 1Å as a function of the substituents on the two carbons must be the largest ever seen for any kind of bond!

References

  1. H. Rzepa, "A carbon-carbon one-electron bond! Or a weak carbon-carbon interaction?", 2024. https://doi.org/10.59350/xp5a3-zsa24
  2. T. Shimajiri, S. Kawaguchi, T. Suzuki, and Y. Ishigaki, "Direct evidence for a carbon–carbon one-electron σ-bond", Nature, vol. 634, pp. 347-351, 2024. https://doi.org/10.1038/s41586-024-07965-1

A carbon-carbon one-electron bond! Or a weak carbon-carbon interaction?

October 1st, 2024

More than 100 years ago, before the quantum mechanical treatment of molecules had been formulated, G. N. Lewis proposed[1] a simple model for chemical bonding that is still taught today. This is the idea of the three categories of bond we know as single, double and triple, comprising respectively two, four and six shared electrons each, at least for the very common carbon-carbon bond. A little more than a decade ago, this was extended upwards to the eight-electron quadruple bond.[2] Now, at the other extreme, a molecule has been characterised in the solid state with a one-electron C-C bond.[3] In this sub-two-electron region, bonds such as hydrogen bonds have long been recognised and they form part of a class of “weak” bonding known instead as exhibiting “non-covalent-interactions” or NCI. But specifically a one-electron carbon-carbon bond stands apart from these weaker types and so it is certainly news when one such is reported and characterised in the crystalline state by x-ray diffraction.

To start the investigation, a search of the crystal structure database was performed using the following more general query of the structure above. The central C-C bond (in green below) was not added, leaving the two carbons as 3-coordinate.

This resulted in 10 hits, all revealed as dications, with the central C-C distance ranging from 2.8Å to 3.0Å. So the unique feature of this new report is that they were able to find a system where oxidation did not proceed directly to the dication, but stopped at the 1-electron level to give a radical cation instead. This new structure poses a bit of a quandary for the curators of the CSD. The index for this database is built on the basis of whether any two atoms in a molecule are connected by a “bond”, and the allowed values for bonds range from single to quadruple, with various intermediate descriptions (such as aromatic) and finally “any”. This latter basically means any of the previous, but what I am pretty certain of is that it does not mean “one-electron”, or “half”. The new compound has not yet been indexed in my current version of the CSD, so this presumption is not yet tested.

The authors[3] did also make the dication and they report a length of 3.03Å for this species, broadly in accord with the range shown above and a reduced value of 2.92Å for the radical cation (Δr 0.11Å). This is quite a small contraction induced by the formation of the one-electron bond, which is already hinting that it might actually be a weak bond.

Next, I proceeded by performing my own DFT calculations on these species, at the ωB97XD/Def2-TZVPP level. At this level the di- and monocationic C-C bond lengths came out as 3.075Å and 2.867Å (Δr 0.21Å), a slightly larger contraction than that reported, but still representing a weak bond.
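The two contractions quoted above are simple differences, and as a quick check they can be reproduced as follows. This is just a sketch of the arithmetic using the four bond lengths given in the text; the function and variable names are my own.

```python
def contraction(dication_length, radical_cation_length):
    """C-C contraction (Å) on going from the dication to the radical cation."""
    return dication_length - radical_cation_length


# Bond lengths in Å as quoted in the text.
xrd = contraction(3.03, 2.92)    # reported crystallographic values
dft = contraction(3.075, 2.867)  # ωB97XD/Def2-TZVPP values

print(f"XRD contraction: {xrd:.2f} Å ({100 * xrd / 3.03:.1f}% of the dication length)")
print(f"DFT contraction: {dft:.2f} Å ({100 * dft / 3.075:.1f}% of the dication length)")
```

Either way the contraction is only a few percent of the C-C separation, which is the basis for describing the bond as weak.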

With wavefunctions now available for the species, I decided to inspect the electron densities. The density was calculated at the geometry of the radical cation, and then the dication was calculated at the same geometry and the two electron densities subtracted. The resulting density surface, representing one electron, is shown below. As expected, the most significant feature occurs in the C-C region, but quite a lot of this one electron is distributed around the aromatic rings (I must find out how to integrate regions!). So already we see that this “1-electron” bond is in fact only a fraction of one electron. Again an indication that it is a weak bond.
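The subtraction, and the kind of regional integration I have in mind, can be sketched in a few lines. This is a toy illustration only: it assumes both densities have been evaluated on the same grid of points, and the four-voxel "grid", the voxel volume and the region mask below are hypothetical placeholders rather than real data. Deciding which voxels belong to the "C-C region" is the hard part, and it is not addressed here.

```python
def density_difference(rho_a, rho_b):
    """Point-by-point difference of two densities sampled on the same grid."""
    if len(rho_a) != len(rho_b):
        raise ValueError("both densities must be sampled on the same grid")
    return [a - b for a, b in zip(rho_a, rho_b)]


def integrate(values, voxel_volume, mask=None):
    """Sum grid values times the voxel volume, optionally over a masked region."""
    if mask is None:
        mask = [True] * len(values)
    return voxel_volume * sum(v for v, keep in zip(values, mask) if keep)


# Hypothetical four-voxel grid (not real data); voxel volume of 1.0 bohr^3.
rho_radical_cation = [0.2, 0.9, 0.7, 0.2]
rho_dication = [0.1, 0.4, 0.3, 0.2]
in_cc_region = [False, True, False, False]  # assumed mask for the C-C region

diff = density_difference(rho_radical_cation, rho_dication)
total_electrons = integrate(diff, 1.0)
cc_electrons = integrate(diff, 1.0, in_cc_region)

print(f"total: {total_electrons:.2f} e, C-C region: {cc_electrons:.2f} e")
```

The total should integrate to one electron, and the masked sum then gives the fraction of that electron found in the chosen region.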

A procedure often used to identify weak bonds is called NCI, or noncovalent-interactions.[4] These are by definition interactions weaker than the single bonds, often being hydrogen bonds and other unusual interactions such as a π-π stacking region (rather than a bond). So here, we see that below the single bond type, we get a continuum of interactions rather than bonds as such. The resulting NCI analysis is shown below for firstly the radical cation and then the di-cation at the same geometry.

The colour coding in the NCI surface analysis above means that dark blue regions are strong non-covalent interactions such as hydrogen bonds, paler blue or cyan areas are weaker ones and green is weaker still and typical of π-π stacking regions rather than bonds between two atoms. These are all deemed stabilising, whereas orange and red regions are destabilising. Click on the image above to inspect the full three dimensional surface of this NCI function and you will find the π-π stacking features, but also three cyan regions. Enclosed by two of the cyan regions are dark blue ones, whilst the third cyan region contains only a small blue part. This third cyan region is indeed in the C-C one-electron bond region, but using this analysis it emerges as only a “weak” interaction.

But a surprise! The two dark blue regions, deemed strong “interactions” are between a C-H of an aryl group and the two carbon atoms shown with blue dots in the diagram above and these are apparently more stabilizing than the one-electron C-C “bond”. Should they not also be bonds then?

The plot above is for the di-cation at the radical cation geometry. It emerges as very similar to the radical cation itself, although the C-C cyan NCI region is less intense than that for the latter and now contains little trace of the dark blue inner core.

We might conclude from this inspection of the newly reported molecule containing a one-electron C-C bond that it probably belongs to the class known as an “interaction” rather than an actual bond. Even as an interaction, it is not particularly strong – in part this is probably because only a proportion of that one electron is actually located in the C-C region, with the rest being distributed around the aromatic rings. However, I rather suspect that despite it resembling an interaction, it will no doubt become known as a bond!

Added in response to comment

Below is shown the Laplacian of the electron density (a definition can be found at eg [5]). Negative values of the Laplacian appear here in purple and positive values in orange (contour value 0.125 a.u). The regular C-C bonds are all enclosed in a negative region of the Laplacian, whilst the one-electron C-C bond lies in the orange region.


At the Def2-SVPP level. A Def2-TZVPP NCI calculation is under way, somewhat delayed by technical issues.

References

  1. G.N. Lewis, "THE ATOM AND THE MOLECULE.", Journal of the American Chemical Society, vol. 38, pp. 762-785, 1916. https://doi.org/10.1021/ja02261a002
  2. S. Shaik, D. Danovich, W. Wu, P. Su, H.S. Rzepa, and P.C. Hiberty, "Quadruple bonding in C2 and analogous eight-valence electron species", Nature Chemistry, vol. 4, pp. 195-200, 2012. https://doi.org/10.1038/nchem.1263
  3. T. Shimajiri, S. Kawaguchi, T. Suzuki, and Y. Ishigaki, "Direct evidence for a carbon–carbon one-electron σ-bond", Nature, vol. 634, pp. 347-351, 2024. https://doi.org/10.1038/s41586-024-07965-1
  4. E.R. Johnson, S. Keinan, P. Mori-Sánchez, J. Contreras-García, A.J. Cohen, and W. Yang, "Revealing Noncovalent Interactions", Journal of the American Chemical Society, vol. 132, pp. 6498-6506, 2010. https://doi.org/10.1021/ja100936w
  5. H. Rzepa, "Looking at bonds in a different way: the Laplacian.", 2010. https://doi.org/10.59350/bk5zm-6rk67