Physical Sample identifiers – the future?

July 12th, 2023

I have variously talked about persistent identifiers on this blog. These largely take the form of DOIs (Digital object identifiers), and here they relate to either journal articles or datasets associated with either the article or the blog post or both. Other disciplines, particularly the earth sciences, have long used persistent identifiers (PIDs) to identify physical objects rather than digital ones. One of my ambitions is to assign such identifiers to a small but highly historical collection of physical objects in my possession, as described at this post. As a prelude to this project, here I describe some ways of searching for physical objects that have been assigned a PID. Thanks Rorie for providing these! 

  1. Here is a general search for physical objects with associated metadata describing them as registered with DataCite. https://commons.datacite.org/doi.org?query=types.resourceTypeGeneral:PhysicalObject (11,269,090 items)
  2. The search can be slightly constrained to find only identifiers that originate from the earlier IGSN ID (International Generic Sample Number; see here for details and https://www.igsn.org/about/ for the organisation set up) using the syntax query=client.client_type:igsnCatalog types.resourceTypeGeneral:PhysicalObject (9,642,030 items)
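
As a minimal sketch, the two searches above can be assembled programmatically. The helper name `commons_search_url` is hypothetical; only the base URL and the field names come from the queries quoted above, and terms are joined with `+`, the URL encoding of the space that separates them in the second query.

```python
BASE = "https://commons.datacite.org/doi.org"  # search page quoted in the text

PHYSICAL_OBJECT = "types.resourceTypeGeneral:PhysicalObject"
IGSN_CLIENT = "client.client_type:igsnCatalog"


def commons_search_url(*terms: str) -> str:
    """Join field:value terms into one DataCite Commons search URL (hypothetical helper)."""
    if not terms:
        raise ValueError("at least one search term is required")
    return f"{BASE}?query=" + "+".join(terms)


all_objects = commons_search_url(PHYSICAL_OBJECT)
igsn_only = commons_search_url(IGSN_CLIENT, PHYSICAL_OBJECT)
print(all_objects)
print(igsn_only)
```

Keeping the field names as constants makes it easy to add further constraints later without retyping them.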

The exciting prospect is that in due time, such searches could be constrained by adding specifically chemical properties, most obviously eg an InChI identifier. At the moment, it is unlikely any existing samples have even been registered with such a term.

  1. Thus combining two queries would give the following:
    query=client.client_type:igsnCatalog types.resourceTypeGeneral:PhysicalObject+AND+subjects.subjectScheme:inchikey+AND+subjects.subject:*
  2. Removing the PhysicalObject constraint gives a different response:
    query=(subjects.subjectScheme:inchikey+AND+subjects.subject:*+OR+subjects.subjectScheme:inchi+AND+subjects.subject:*)
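
The second query above mixes AND and OR without inner parentheses, so its meaning depends on how the search back end ranks the two operators. A minimal sketch of building such queries with every group explicitly bracketed follows. The helper names `all_of`, `any_of` and `has_subject` are hypothetical, only the field names are taken from the queries above, and I have not verified that the bracketed form returns the same hits as the unbracketed one.

```python
def all_of(*clauses: str) -> str:
    """Bracketed conjunction of clauses."""
    return "(" + "+AND+".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Bracketed disjunction of clauses."""
    return "(" + "+OR+".join(clauses) + ")"


def has_subject(scheme: str) -> str:
    """Match any record carrying a subject term in the given scheme."""
    return all_of(f"subjects.subjectScheme:{scheme}", "subjects.subject:*")


# Any record with either an InChIKey or an InChI subject term.
chemical = any_of(has_subject("inchikey"), has_subject("inchi"))

# IGSN-registered physical objects that also carry an InChIKey.
physical_and_chemical = all_of(
    "client.client_type:igsnCatalog",
    "types.resourceTypeGeneral:PhysicalObject",
    has_subject("inchikey"),
)

print("query=" + chemical)
print("query=" + physical_and_chemical)
```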

When this becomes possible (see project above!), it would enable, for example, journal articles (or the FAIR data associated with them) to reference information about a physical sample associated with eg the preparation of a molecule new to science.

Diberyllocene — and Lithioborocene?

June 18th, 2023

Sometimes, the properties of a molecule are predicted long before it is synthesised. One such is diberyllocene. I first encountered a related molecule, beryllocene itself, many moons ago.[1] This was unusual because unlike the original metallocenes, the metal atom was not symmetrically disposed between the two cyclopentadienyl faces. Now diberyllocene is finally reported in which replacing one Be by Be-Be induces (according to calculation, D2) symmetry[2]. I will not repeat the excellent analysis of the wavefunction reported in this article, but confine myself to showing two molecular orbitals which exemplify its bonding.

Highest occupied molecular orbital
Lowest occupied π-molecular orbital

The HOMO (FAIR data 10.14469/hpc/12702) essentially shows a Be-Be single bond, originating formally from the central Be₂²⁺ dication, balanced by the two cyclopentadienyl anion ligands. Click on the images to see this orbital in 3D.

Oddly, an excited state of Be₂ on its own actually carries a Be=Be double bond, a property again predicted a long time ago by theory. The most stable π-MO in diberyllocene originates in the six electron aromatic cyclopentadienyl rings. By acquiring a share of six electrons from one Cp ring, and a share of the two electrons from the Be-Be bond, each Be atom achieves the octet of electrons known by generations of students. This is the sort of molecule that could be taught in schools at an early stage to illustrate the octet rule. And it's good to know that simple new molecules illustrating this are still being discovered by chemists.


Since this started with experimental realisation of a predicted molecule, can I suggest as a new prediction, lithioborocene, in which a B and an Li replace two Be atoms? Individually, lithiocenes, boracenes and Li-B bonds are known from crystal structures. So it's not a way-out prediction to combine these observations. Any friendly synthetic chemist up for the challenge?



This post has DOI: 10.14469/hpc/12704

References

  1. M.J.S. Dewar, and H.S. Rzepa, "Ground states of molecules. 45. MNDO results for molecules containing beryllium", Journal of the American Chemical Society, vol. 100, pp. 777-784, 1978. https://doi.org/10.1021/ja00471a020
  2. J.T. Boronski, A.E. Crumpton, L.L. Wales, and S. Aldridge, "Diberyllocene, a stable compound of Be(I) with a Be–Be bond", Science, vol. 380, pp. 1147-1149, 2023. https://doi.org/10.1126/science.adh4419

The Pinacol rearrangement.

June 13th, 2023

This is a venerable organic reaction, which curiously I have not previously covered here. First described in 1859, its nature was only properly elucidated in 1873. It is a member of a class of reaction I have previously named “solvolytically assisted pericyclic”, or “perisolvolytic“. Here I explore some of the subtle stereoelectronic effects observed for this apparently simple reaction.

It applies to a class of molecule known as 1,2-diols. Protonation is quickly followed by migration of a (in this example) methyl group, followed by deprotonation of the carbonyl group formed by this process. There are two mechanistic stages, the first being the departure of the now protonated “ol” unit, and the second the migration of the methyl. In most textbooks and of course Wikipedia, these are shown as very distinct steps. But they could also occur in one concerted step, albeit probably asynchronously.

A B3LYP+GD3+BJ/Def2-TZVPP/SCRF=ethanol calculation provides mechanistic detail (FAIR Data 10.14469/hpc/1769)

  1. To start with, we note the H-bond formed between O22-H21. Between IRC = -10 and -6, this lengthens from 1.625Å to 1.843Å, destabilising the protonated alcohol group.
  2. Between IRC -6 to -1, the C1-O19 bond breaks, from a starting length of 1.556Å to ~2.787Å.
  3. When IRC 0.0 is reached (the transition state), the C11 methyl starts to migrate across, a process mostly complete by IRC +2. 
  4. The final stage is formation of a weak interaction between C2 and O19 to reach IRC 7.
  5. Several more minor effects can also be discerned. Firstly methyl C3 rotates, to set up a better hyperconjugative interaction with the temporary carbocation forming at C1. This rotamer forms the first of several “hidden intermediates” in the reaction, intermediates which almost form before being consumed, at IRC -6.5 (see the plot above labelled RMS gradient norm, for the minimum in the function at this IRC value).
  6. Another hidden intermediate appears at IRC -2, being the transient carbocation, as shown in stepwise versions of this mechanism, such as the Wikipedia page. But it's not real, merely hidden! As it approaches, methyl C7 rotates to maximise the hyperconjugative interactions.
  7. At IRC ~+3, methyl C15 rotates to again maximise hyperconjugation with the newly formed C=O bond.
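
The hidden intermediates above were read off the RMS gradient norm plot by eye. A minimal sketch of doing the same numerically is given below: it looks for interior local minima of the gradient norm along the IRC and skips the region around the transition state, where the gradient vanishes by definition. The function name, the `ts_window` parameter and the profile values are all hypothetical illustrations, not data from the calculation reported here.

```python
from typing import List, Sequence


def hidden_intermediates(
    irc: Sequence[float],
    grad_norm: Sequence[float],
    ts_window: float = 0.5,
) -> List[float]:
    """Return IRC values at interior local minima of the gradient norm,
    ignoring points within ts_window of the transition state at IRC = 0."""
    if len(irc) != len(grad_norm):
        raise ValueError("irc and grad_norm must have the same length")
    found = []
    for i in range(1, len(irc) - 1):
        is_minimum = grad_norm[i] < grad_norm[i - 1] and grad_norm[i] < grad_norm[i + 1]
        if is_minimum and abs(irc[i]) > ts_window:
            found.append(irc[i])
    return found


# Made-up profile for illustration only.
irc = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
grad_norm = [0.010, 0.014, 0.006, 0.012, 0.000, 0.011, 0.009, 0.004]
candidates = hidden_intermediates(irc, grad_norm)
print(candidates)
```

End points are never reported, since a minimum there is simply the reactant or product.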

Can we quantify some of these effects? This can be done by computing localised orbitals (NBOs) and pairwise interactions between a donor NBO (a bond or a lone pair) and an acceptor NBO (an antibonding orbital).

  1. The E(2) interaction between donor bond C2-C11 and acceptor C1-O19 is 3.3 kcal/mol (above the noise, but not especially strong). It corresponds to an antiperiplanar alignment of the C2-C11 σ orbital and the C1-O19 σ* orbital and results in the breaking of bond C2-C11 (and reformation as C1-C11).
  2. The E(2) value between donor lone pair O22 and acceptor C2-C11 σ* is 6.9 kcal/mol and corresponds to antiperiplanar alignment of these two orbitals, resulting in formation of the C=O carbonyl π-bond, whilst simultaneously increasing the antibonding character of the C-C bond to encourage it to break.
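
For reference, the E(2) values quoted above are second-order perturbation energies. In the usual NBO formulation (stated here from the general NBO literature rather than from the output of this particular calculation), the stabilisation arising from donor orbital i and acceptor orbital j is:

```latex
E(2) = \Delta E_{ij} = q_i \, \frac{F(i,j)^2}{\varepsilon_j - \varepsilon_i}
```

Here q_i is the occupancy of the donor NBO, F(i,j) is the off-diagonal Fock matrix element between the two NBOs, and ε_i and ε_j are their orbital energies. A larger F(i,j), as favoured by antiperiplanar alignment, or a smaller energy gap both increase E(2).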

Models of these two interactions can be seen below. Click on the image to load them. The colour blue overlaps positively with the colour purple, and red with orange.

By the time the transition state is reached, these two interactions have evolved to the following:

So this venerable reaction has some nice subtle stereoelectronic behaviour. Those methyl rotations have been skipped over here, but a deeper look into them might also be worthwhile. There is much more to this reaction, but I will leave this analysis here.


This post has DOI https://doi.org/10.14469/hpc/12684


“For chemists, the AI revolution has yet to happen”.

May 25th, 2023

This editorial from Nature[1] is a timely reminder of the importance of data. But also, not just any data, but “accurate and accessible training data“. Accessible of course is one of the attributes of FAIR (Findable, Accessible, Interoperable and Re-usable). The editorial also states “data need to be recorded in agreed and consistent formats, which they are not at present“. That is covered by the I and R of FAIR, often applied in conjunction with metadata recording the Media type that the data is held in (See DOI https://doi.org/jvk9 for examples of the use of Media types in chemical computation and chemical NMR). Again, “The best possible training sets would also include data on negative outcomes“. This relates to the separation of the two publication processes, namely the article itself (or the story behind the data) and the data itself as a first class scientific object. Thus when we publish FAIR data in association with articles, the data archive will often contain data that is not used in the article itself (perhaps because it led to a negative outcome), but is nevertheless part of the FAIR data collection for that topic. Even if the data does not lead to journal publication, publishing it in a data repository means it will not be lost. Somebody (or AI software) may still find it useful.

Whilst the acronym AI is increasingly used and hyped up, I would argue that FAIR should accompany the use of the term AI in most cases (as indeed it is at eg.[2]). Amongst other benefits, FAIR implies a metadata descriptor record is present, which if richly populated, would help address the “accurate” of “accurate and accessible” by adding context. As we show here[2], FAIR is also “AI-Ready“. Indeed an often used alternative expansion of the acronym is “FAIR is AI-Ready”. It is indeed designed to be so if the metadata is sufficiently rich. I would also note that an IUPAC working party is working to produce recommendations to help with this aspect.[3]

My final comment adds to the requirement of “accurate and accessible training data“. I would reformulate this as “accurate, accessible and complete training data“. Much data in chemical science is recorded on an instrument, or computed using modelling software. As it emerges from the instrument or the software package, it can be said to be “complete”. Nothing has been thrown away at this stage. But think of eg NMR data. This is acquired as a FID, and then subjected to analysis (a Fourier Transform, after weighting, which does introduce potential artefacts into the data!). It is the latter data type that is invariably published, often in a visual (PDF) form which may lack numerical accuracy and which is machine processable only with difficulty. Or think of crystallography, where data emerges as diffraction images and is then transformed into structure factors and coordinates. Often only the last form is published (as a CIF file), and the original data almost never is (see[4] for an example where complete crystallographic data is published). Then again, chemical computations. The full record of the computation is often produced as a “checkpoint” or “interoperability format” (see eg DOI: 10.14469/hpc/10043) which contains the computed wavefunction and which can be re-used to compute a wide variety of new properties. But most articles currently record computational data simply as a set of atom coordinates. If you are really lucky, you might get some keywords used to run the calculation. But nothing which would eg allow an AI-algorithm to easily compute a property it might need. We cannot be sure that a machine learning/AI procedure might not benefit from such complete data.
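
To make the point about transformed data concrete, here is a minimal sketch using a made-up single-line FID (not real NMR data). It applies an exponential weighting followed by a plain discrete Fourier transform. All function names and parameter values are illustrative assumptions; a real workflow would use a dedicated NMR processing package.

```python
import cmath
import math
from typing import List, Tuple


def synthetic_fid(n: int, freq_bin: int, decay: float) -> List[complex]:
    """One decaying complex oscillation sampled at n points (made-up signal)."""
    return [
        cmath.exp(2j * math.pi * freq_bin * k / n) * math.exp(-k / decay)
        for k in range(n)
    ]


def apodize(fid: List[complex], line_broadening: float) -> List[complex]:
    """Exponential weighting applied to the FID before the transform."""
    return [x * math.exp(-line_broadening * k) for k, x in enumerate(fid)]


def dft(signal: List[complex]) -> List[complex]:
    """Direct (slow) discrete Fourier transform, adequate for a short signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * m * k / n) for k, x in enumerate(signal))
        for m in range(n)
    ]


def peak(spectrum: List[complex]) -> Tuple[int, float]:
    """Return (bin index, magnitude) of the tallest point in the spectrum."""
    m = max(range(len(spectrum)), key=lambda i: abs(spectrum[i]))
    return m, abs(spectrum[m])


fid = synthetic_fid(n=64, freq_bin=10, decay=20.0)
raw_bin, raw_height = peak(dft(fid))
weighted_bin, weighted_height = peak(dft(apodize(fid, line_broadening=0.1)))
print(raw_bin, weighted_bin, weighted_height < raw_height)
```

The peak stays in the same place, but its height (and width) depends on the weighting chosen, and a published spectrum need not record which weighting was used. Keeping the FID avoids that loss.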

So, FAIR and AI are conjoined, they each need the other and should not be separated. And to repeat, where data is transformed before being published, please also add the complete dataset, not just any reduced form.


Post DOI: 10.14469/hpc/12586


References

  1. "For chemists, the AI revolution has yet to happen", Nature, vol. 617, pp. 438-438, 2023. https://doi.org/10.1038/d41586-023-01612-x
  2. H.S. Rzepa, and S. Kuhn, "A data‐oriented approach to making new molecules as a student experiment: artificial intelligence‐enabling FAIR publication of NMR data for organic esters", Magnetic Resonance in Chemistry, vol. 60, pp. 93-103, 2021. https://doi.org/10.1002/mrc.5186
  3. R.M. Hanson, D. Jeannerat, M. Archibald, I.J. Bruno, S.J. Chalk, A.N. Davies, R.J. Lancashire, J. Lang, and H.S. Rzepa, "IUPAC specification for the FAIR management of spectroscopic data in chemistry (IUPAC FAIRSpec) – guiding principles", Pure and Applied Chemistry, vol. 94, pp. 623-636, 2022. https://doi.org/10.1515/pac-2021-2009
  4. J. Almond-Thynne, A.J.P. White, A. Polyzos, H.S. Rzepa, P.J. Parsons, and A.G.M. Barrett, "Synthesis and Reactions of Benzannulated Spiroaminals: Tetrahydrospirobiquinolines", ACS Omega, vol. 2, pp. 3241-3249, 2017. https://doi.org/10.1021/acsomega.7b00482

Tunable aromaticity? An unrecognized new aromatic molecule?

May 21st, 2023

Some time ago in 2010, I showed a chemical problem I used to set during university entrance interviews. It was all about pattern recognition and how one can develop a hypothesis based on this. In that instance, it involved recognising that a cyclic molecule which appeared to have the cyclohexatriene benzene-aromatic pattern 1 was in fact a trimer of carbon dioxide. Perhaps small amounts of this aromatic molecule exist in solutions of fizzy drinks? Analysing these patterns occupied about 10-20 minutes of an interview, and although you might think I was posing a difficult challenge, many students successfully rose to it! Now I revisit, but with a slightly better reality check on a related molecule 2 (cyanuric acid).

As many as 58 examples of crystal structures of 1,3,5-triazinane-2,4,6-trione 2 (cyanuric acid) are known, often with a co-adduct. Cyanuric acid is in effect a cyclic trimer of isocyanic acid rather than of carbon dioxide. These examples tend to be planar, with a mean C-N ring distance of ~1.37Å and a C-O distance of 1.22Å. 

Two outliers stand out, both from a very recently published article, being a co-adduct with melamine (1,3,5-triazine-2,4,6-triamine).[1] QACSUI02 exhibits a shorter C-N distance of ~1.33Å but a longer C-O distance of 1.32Å and has a symmetrical pattern of hydrogen bonds to the six receptors of the central unit. Could this correspond more closely to the cyclohexatriene resonance structures shown to the left of the diagram at the top? The first task is to see if these bond lengths can be replicated using calculation (often a useful procedure to check that the crystal structure is correct). For this purpose, the structure below was chosen as the starting point for various models, using an ωB97XD/Def2-TZVPP model.

Model                                C-N distance (Å)   C-O distance (Å)
QACSUI02 (crystal structure)         1.331              1.318
ωB97XD/Def2-TZVPP as single layer    1.3678             1.2185
ωB97XD/Def2-TZVPP three layers       1.365              1.218
ωB97XD/Def2-TZVPP no H-bonds         1.3816             1.2002

XAKSOU (crystal structure)           1.367              1.208
ωB97XD/Def2-TZVPP                    1.3670             1.2213

This creates a mystery. The calculated bond lengths show that whilst H-bonding to the central ring decreases the C-N length by 0.014Å and increases the C-O length by 0.017Å, this effect is nowhere near large enough to match the apparent lengths in the crystal structure, where a C-N effect of ~0.037Å would be needed.
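
The comparison can be made explicit with a few lines of arithmetic on the tabulated values. This is only a sketch: the bond lengths are copied from the table above, and the dictionary and function names are my own.

```python
# Bond lengths in Å, copied from the table above.
crystal = {"C-N": 1.331, "C-O": 1.318}      # QACSUI02 crystal structure
h_bonded = {"C-N": 1.3678, "C-O": 1.2185}   # computed, single layer
isolated = {"C-N": 1.3816, "C-O": 1.2002}   # computed, no H-bonds


def difference(a: dict, b: dict) -> dict:
    """Per-bond difference a - b, rounded to 4 decimal places."""
    return {bond: round(a[bond] - b[bond], 4) for bond in a}


# Change caused by H-bonding in the calculation.
h_bond_effect = difference(h_bonded, isolated)
# What remains between the H-bonded model and the crystal structure.
unexplained = difference(crystal, h_bonded)

for bond in crystal:
    print(bond, h_bond_effect[bond], unexplained[bond])
```

With the single-layer values the computed C-O shift comes out at about 0.018Å, marginally different from the rounded figure in the text, and the remaining gap to the crystal structure is several times larger than the H-bond effect for both bonds, so the conclusion is unchanged.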

Another system, XAKSOU, has been reported where discrete LiCl units replace the H-bonds formed to melamine above.[2] A Li is coordinated to the carbonyl oxygen instead of a hydrogen bond, and a chloride anion from another molecule in the unit cell replaces the H-bond to nitrogen.

In the computed model, an intramolecular Cl-H hydrogen bond is used as the model, resulting in C-N lengths similar to those of the crystal structure (a structure which does not match the lengths in the outlying crystal structure QACSUI02).

So the final question to ask is whether this latter structure is aromatic. NICS(0)/(1) values of -2.8/-1.1ppm are computed, which suggests very little aromaticity (aromatic values would be -10 to -20 ppm). So it does not seem as if aromaticity can be tuned into cyanuric acid 2 by polarising both the NH and CO units with ionic/H-bond interactions so that the aromatic cyclohexatriene motif is better favoured over the 1,3,5-triazinane-2,4,6-trione non-aromatic resonance form. Are there any other examples where aromatically tunable molecules might be possible?

References

  1. K. Song, H. Yang, B. Chen, X. Lin, Y. Liu, Y. Liu, H. Li, S. Zheng, and Z. Chen, "The facile implementing ternary resistive memory in graphite-like melamine-cyanuric acid hydrogen-bonded organic framework with high ternary yield and environmental tolerance", Applied Surface Science, vol. 608, pp. 155161, 2023. https://doi.org/10.1016/j.apsusc.2022.155161
  2. O. Shemchuk, D. Braga, L. Maini, and F. Grepioni, "Anhydrous ionic co-crystals of cyanuric acid with LiCl and NaCl", CrystEngComm, vol. 19, pp. 1366-1369, 2017. https://doi.org/10.1039/c7ce00037e

One vs two bond rotation – An example using Acyl amides

April 3rd, 2023

One of the important aspects of chemical reaction mechanisms is the order in which things happen. More specifically, the order in which bonds make or break when there are more than two involved in undertaking a reaction. So we have:

  1. concerted mechanisms, when all bonds in any particular stage of a mechanism are changing in concert via a unique transition state,
  2. asynchronous concerted mechanisms, when all the bonds are changing, but not necessarily all at the same rate and which may involve so called “hidden intermediates”, but which nevertheless still involve only one transition state,
  3. stepwise mechanisms, in which more than one transition state is involved, connected by a discrete intermediate along the pathway.

Here I consider an example of another type of (isomerisation) mechanism, involving bond rotations rather than bond formations or breakages. The two bonds in this case have a higher bond order than 1, and so are starting to verge on a type of isomerism known as atropisomerism, where the rotation takes place on a relatively slow time scale (unlike single bonds themselves, where rotation about them is normally relatively fast). Do two such bonds rotate in a stepwise or a concerted manner? In the structure below, we have two rotatable bonds, shown in red and blue, which due to conjugation of the lone electron pair on the nitrogen atoms with the carbonyl group have bond orders >1. Do these bonds rotate in concert or in a stepwise manner?

The calculations of the rotations are done at the B3LYP+GD3+BJ/Def2-SVPP/SCRF=DCM level, Data DOI: 10.14469/hpc/12299

  1. Firstly, for the system R=R’ = Me. The reaction coordinate is specified around the red bond.

    The animation along the IRC (Intrinsic reaction coordinate) appears below, where you can see the red bond rotating and the blue bond spectating.

  2. The response of the dihedral angles about both bonds is shown below, which reinforces the conclusion that whilst one dihedral changes by about 180°, the other hardly changes. The overall dipole moment changes significantly as a result of the relative orientation of the two carbonyl groups changing. The two bonds can be said to rotate in a stepwise mechanism, involving an intermediate where one has rotated and the other has not.


  3. When the bulk of the central group is increased, different behaviour is now observed.

  4. Both dihedral angles now change by ~180°, in concert but not in synchrony! The first more or less transforms evenly by ~180°, but the second changes direction at ~IRC=-5 to rejoin the other.

When the steric bulk means that the rotating substituents start to interfere with each other, so-called “gearing” starts to take place where the motions of the two become coupled by the gears. The rotations are now a concerted asynchronous process.
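
The dihedral responses discussed above can be extracted from the IRC geometries with a short routine. The sketch below computes a signed dihedral from four atom positions and then the net rotation along a series of such angles, unwrapping the jump at ±180°. The function names are my own and the example series in the usage note are invented; no coordinates from the actual calculation are reproduced here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = _sub(b0, (s0 * b1[0], s0 * b1[1], s0 * b1[2]))
    w = _sub(b2, (s2 * b1[0], s2 * b1[1], s2 * b1[2]))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def net_rotation(angles: Sequence[float]) -> float:
    """Total signed change along a series of dihedrals, unwrapping ±180° jumps."""
    total = 0.0
    for previous, current in zip(angles, angles[1:]):
        total += (current - previous + 180.0) % 360.0 - 180.0
    return total
```

As a usage example, `net_rotation([0, 60, 120, 180])` gives 180, while a series that barely moves gives a value near zero. Applied to the red and blue dihedrals along one IRC, two values near 180° would indicate that both bonds have turned within a single concerted step, whereas 180° and ~0° corresponds to the stepwise behaviour seen for R=R’=Me.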

So now to my concluding thought. The above is a simple example of gearing involving rotation about two coupled bonds. So how many bonds can be simultaneously geared so that when one rotates, the others do as well? I am now hunting for an example of three such bonds geared together. And is there a limit to how many can do so in concert? Here we enter into analogy with bond cleavage, where there are numerous examples of bonds breaking in concert, if not in synchrony. Most pericyclic processes are of this type. Is there a similar pattern in bond rotations?

A ROR Persistent Identifier for the WATOC organisation – helping to make scientific connections.

March 9th, 2023

Science frequently works by people making connections between related (or even apparently unrelated) concepts or data. There are many ways of helping people make these connections – attending a conference or seminar, searching journals for published articles and nowadays also searching for data are just a few examples. For about 20 years now, one technology which has been helping to enable such discoveries is what are called “Persistent IDentifiers” or PIDs. These are unique labels which can be attached to a (scientific) object such as a journal article, a dataset or a researcher. The PIDs for the first two examples have become better known as DOIs (digital object identifier), the last is known as an ORCID. The PID is registered with a registration authority. Two of the oldest and best known authorities are CrossRef, for journal articles, funders (etc), and DataCite, who specialise in citable identifiers for data. The registration process includes creating and adding a metadata record to the PID; the record is then indexed and can be used for searching for the objects. The terms of these metadata records are carefully controlled to use specified and standardised vocabularies to describe the objects (one current initiative in chemistry in this area is described here[1]).

The PID “ecosystem” is constantly expanding and a recent addition is the ROR registration authority. This issues PIDs for research organisations, so that one can then easily associate a scientific object with the organisation where the research was conducted. The initial focus for ROR PIDs was the traditional forms of organisation such as a university and company research labs. Here I tell about how a rather different type of organisation came to have its own ROR, the “World Association of Theoretical and Computational Chemists” or WATOC. The aims of WATOC are primarily to hold triennial congresses to promote scientific exchange and to help researchers make those connections through presentations, posters and numerous coffee breaks!

Last July, the proposal for creating a ROR for WATOC was accepted by its decision making body and can now be announced as https://ror.org/04rp40h82, where 04rp40h82 is the unique WATOC identifier. The prefix https://ror.org/ is called the “resolver”, which in turn allows access to the associated metadata record via an API. That record in turn includes a link to the organisation, similar to links to journal articles as specified by a DOI.
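
The split between resolver prefix and identifier can be sketched in a few lines. The function names are hypothetical, and the pattern used to sanity-check the identifier (a leading 0, six further lower-case alphanumeric characters, then two digits) is my assumption about the general shape of a ROR ID rather than something taken from the ROR specification.

```python
import re

RESOLVER = "https://ror.org/"  # resolver prefix quoted in the text

# Assumed shape of a ROR identifier; treat as illustrative only.
ROR_PATTERN = re.compile(r"^0[0-9a-z]{6}[0-9]{2}$")


def ror_id_from_url(url: str) -> str:
    """Strip the resolver prefix and return the bare identifier."""
    if not url.startswith(RESOLVER):
        raise ValueError(f"not a ROR URL: {url}")
    ror_id = url[len(RESOLVER):].strip("/")
    if not ROR_PATTERN.match(ror_id):
        raise ValueError(f"unexpected identifier shape: {ror_id}")
    return ror_id


def ror_url(ror_id: str) -> str:
    """Rebuild the resolvable URL from a bare identifier."""
    return RESOLVER + ror_id


watoc = ror_id_from_url("https://ror.org/04rp40h82")
print(watoc, ror_url(watoc))
```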

It is now time to show some examples of how the WATOC ROR can actually be used.

  1. One outcome of the last WATOC Congress held in 2022 in Vancouver is the production of a themed peer-reviewed issue of the Canadian Journal of Chemistry, created by inviting speakers to submit an article corresponding to their presentation. Armed with the WATOC ROR, the publisher was approached to ask if this identifier could be included in the metadata record for each accepted article. This was agreed and in due course will be added to the Crossref metadata record for each article in this special issue. When this happens, it can be searched using e.g. https://api.crossref.org/works?filter=ror-id:04rp40h82 Because creation of a metadata record is actually part of the complex journal production workflow, this will not occur until the journal has updated its procedures to do this, which may take a little while yet. Invoking that search would then allow all published articles associated (at least in part) with WATOC activities to be found.
  2. The link https://api.crossref.org/works?filter=ror-id:04rp40h82 is actually part of the CrossRef API (application programming interface) and so can now be used to construct complex programmatic queries which include the WATOC ROR and for deployment in e.g. AI applications.[2] Although not derived from the CrossRef API, I can show here some similar uses of metadata for the construction of so-called Knowledge Graphs [2], which can be thought of as visual representation of connections between scientific objects, organisations and other types of entity to which a registered PID has been assigned.
    1. This knowledge graph was created using SciFinder by specifying a person (myself in this case) and any conferences they have been associated with. However, in the past the capture of conference attendance was a rather hit and miss process and so the record is very incomplete. It is the expectation that metadata associated with ROR PIDs will help make these records more complete and hence useful. ROR is also fully open and hence its use is less restricted than the proprietary SciFinder system.
    2. I cannot resist also adding this one. The metadata record now contains named concepts, this one being “transition states” which I have been associated with in the past.
    3. As of today, the WATOC ROR has not propagated to any CrossRef metadata records and so I cannot yet show any knowledge graphs with nodes based on WATOC.
  3. The ROR PID can also be used for inclusion in metadata records describing datasets. This is one such search, now of the DataCite metadata store:
    https://commons.datacite.org/doi.org?query=((contributors.affiliation.affiliationIdentifier:*04rp40h82)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*04rp40h82)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))
    Note the somewhat more complex logic being used, in part because a dataset can be “created” by a named person but also can be “contributed to” and one should really search for both possibilities.

  4. One can also combine two different identifiers, namely an organisational ROR and a researcher ORCID into a single query:
    https://commons.datacite.org/?query=((creators.affiliation.affiliationIdentifier:*04rp40h82)+OR+(contributors.affiliation.affiliationIdentifier:*04rp40h82))+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)
    There are many more combinations of searches that can be constructed using other types of identifiers.[3]

  5. Further in the future, one might expect that metadata records from e.g. both CrossRef and DataCite could be combined to create knowledge graphs by combining information based on both journal articles and published FAIR datasets. Currently, CrossRef does not identify PIDs for datasets that might be cited in an article bibliography as explicit data, but that too may be coming in the near future.[4]
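
The Crossref filter and the first DataCite search in the list above can be generated from the bare ROR identifier. This is a minimal sketch: the function names are hypothetical, while the base URLs, field names and operators are copied from the links quoted above. Nothing is fetched; the code only builds the URLs.

```python
CROSSREF_WORKS = "https://api.crossref.org/works"
DATACITE_COMMONS = "https://commons.datacite.org/doi.org"


def crossref_by_ror(ror_id: str) -> str:
    """Crossref works filtered on a ROR identifier."""
    return f"{CROSSREF_WORKS}?filter=ror-id:{ror_id}"


def affiliation_clause(role: str, ror_id: str) -> str:
    """One bracketed clause matching a ROR affiliation for creators or contributors."""
    return (
        f"(({role}.affiliation.affiliationIdentifier:*{ror_id})"
        f"+AND+({role}.affiliation.affiliationIdentifierScheme:ROR))"
    )


def datacite_by_ror(ror_id: str) -> str:
    """DataCite Commons search covering both contributors and creators."""
    clauses = [affiliation_clause(role, ror_id) for role in ("contributors", "creators")]
    return f"{DATACITE_COMMONS}?query=" + "+OR+".join(clauses)


WATOC = "04rp40h82"
print(crossref_by_ror(WATOC))
print(datacite_by_ror(WATOC))
```

Generating the clause once per role keeps the two halves of the OR identical apart from the role name, which is easy to get wrong when typing the query by hand.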

Way back in January 1994, WATOC was one of the very first chemical-science based organisations to have its own web page. Now it is leading the way in acquiring and deploying its very own persistent identifier in the form of a ROR. One might hope that many more such organisations acquire one soon.


The DOI for this post is 10.14469/hpc/12363


References

  1. R.M. Hanson, D. Jeannerat, M. Archibald, I.J. Bruno, S.J. Chalk, A.N. Davies, R.J. Lancashire, J. Lang, and H.S. Rzepa, "IUPAC specification for the FAIR management of spectroscopic data in chemistry (IUPAC FAIRSpec) – guiding principles", Pure and Applied Chemistry, vol. 94, pp. 623-636, 2022. https://doi.org/10.1515/pac-2021-2009
  2. A. Hogan, E. Blomqvist, M. Cochez, C. D’amato, G.D. Melo, C. Gutierrez, S. Kirrane, J.E.L. Gayo, R. Navigli, S. Neumaier, A.N. Ngomo, A. Polleres, S.M. Rashid, A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, and A. Zimmermann, "Knowledge Graphs", ACM Computing Surveys, vol. 54, pp. 1-37, 2021. https://doi.org/10.1145/3447772
  3. H.S. Rzepa, and S. Kuhn, "A data‐oriented approach to making new molecules as a student experiment: artificial intelligence‐enabling FAIR publication of NMR data for organic esters", Magnetic Resonance in Chemistry, vol. 60, pp. 93-103, 2021. https://doi.org/10.1002/mrc.5186
  4. "NISO JATS4R Data Citations Recommendation v2.0". https://doi.org/10.3789/niso-rp-36-2020

Determining absolute configuration: Cylindricine.

February 1st, 2023

Nature has produced most natural molecules as chiral objects, which means the molecule can come in two enantiomeric forms, each being the mirror image of the other. When a natural product is synthesised in a laboratory, a chiral synthesis means just one form is made, and then is compared with the natural product to see if it matches. Just such a process was followed in the recent synthesis of cylindricine, a marine alkaloid[1] featured on the ACS molecule-of-the-week site. The authors noted that the absolute configuration of cylindricine as isolated naturally had remained unassigned, and as it happens one way of measuring the properties of the individual enantiomer – its optical rotation – had not been determined. So in part, the purpose of this synthesis was to determine the absolute configuration of this molecule. Here I explore this process.

There are several different procedures for finding the absolute configuration of a molecule.

  1. By synthesis from a starting material, itself presumed of known absolute configuration – in this example, a molecule[2] which had been previously assigned an absolute configuration. The presumption then is that all the transformations made to this molecule have stereochemically certain and predictable outcomes and of course that the configuration of the starting material in this process was not in any doubt. Ultimately, this chain of inferences should be traceable all the way back to D-(+)-glyceraldehyde. These inference chains can involve multiple groups working at different times.
  2. Alternative methods can be used as an independent check on the first method above, which depend only on the properties of the target molecule itself and not on any inference chain. One such is the “gold standard”, introduced in 1951[3] and using X-ray crystallography. This method is quite common nowadays, but it does require a suitable crystal for measurement.
  3. The so-called chiroptical properties of the target molecule can also be subjected to computational prediction, a method first introduced in 1937[4] using the optical rotation as the measure and based on linearly polarized light at a specific wavelength (normally corresponding to 589nm). As was found in 1937, this can be quite a fragile method, depending very much on the actual conformations of the molecule. Rigid molecules are more predictable than flexible ones. Cylindricine itself has a number of conformations or orientations of the various substituents and it then becomes a question of finding the most stable of these, in terms of their overall contributing populations.
  4. A more recent method is the use of a technique known as Electronic circular dichroism (ECD), which uses circularly rather than linearly polarised light, across a range of wavelengths from ~200 up to ~800nm.
  5. An even more recent chiroptical method is VCD or Vibrational Circular Dichroism. This spectroscopic technique detects differences in attenuation of left and right circularly polarized light passing through a sample. It is an extension of circular dichroism spectroscopy into the infrared ranges.

Any or all of methods 3-5 could be used to independently check on the results inferred in procedure 1. Here I report the results of such an attempted verification.

The start point is an attempt to find the most stable conformation of cylindricine. Here I am using a conformational search tool called GMMX, part of the GaussView suite. Loading the molecule as drawn above, six rotatable bonds are automatically identified and the program systematically rotates about all of these in turn, using a molecular mechanics force field to compute an energy. This field includes so-called dispersion or van der Waals attractions. I used the MMFF94 force field, which has its origins in the pharmaceutical industry and is reasonably suitable for a natural product. The lowest energy conformation obtained is shown below, but it should be noted that there are 36 further conformations within 3.5 kcal/mol of the lowest. This lowest energy conformer was chosen for the chiroptical calculations described in 3-5 above. A more comprehensive analysis would of course include all the conformers with a population of at least 1%.
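The 1% population criterion can be illustrated with a simple Boltzmann weighting. What follows is only a minimal sketch: the relative energies are hypothetical placeholders rather than the GMMX output for cylindricine, the function name is my own, and force field energies are assumed (loosely) to stand in for free energies at 298.15 K.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_populations(rel_energies_kcal, temperature=298.15):
    """Return fractional populations for conformers, given energies in kcal/mol."""
    e_min = min(rel_energies_kcal)
    rt = R_KCAL * temperature
    # Boltzmann factor of each conformer relative to the lowest one
    weights = [math.exp(-(e - e_min) / rt) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical relative energies (kcal/mol), NOT the real conformer list
energies = [0.0, 0.5, 1.2, 2.0, 3.5]
pops = boltzmann_populations(energies)

# Keep only conformers populated at 1% or more
significant = [(e, p) for e, p in zip(energies, pops) if p >= 0.01]
for e, p in significant:
    print(f"{e:4.1f} kcal/mol  {100 * p:5.1f} %")
```

With these placeholder values, the conformer at 3.5 kcal/mol falls below the 1% threshold and would be dropped, whilst the other four would be retained.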

To get an inkling of why this conformer might be the lowest in energy, inspect the model below (click on the image to get a 3D rotatable model). It shows the so-called NCI (non-covalent interaction) surface, which is mostly composed of hydrogen bond and dispersion stabilisations. Each little blue/green isosurface is one of these, and the more of them there are, the more stable the conformer.

For this conformer, the calculated optical rotation emerges as -34° at 589 nm (FAIR Data DOI: 10.14469/hpc/12231). The reported value is -8.5°. You might think that the agreement is poor, but such calculations are only reasonably clear-cut when the rotation is large! Even so, this calculation provides some supporting evidence that the assignment of absolute configuration is correct. The take-home message is not the value of the rotation but its sign, where calculation and measurement agree. The next step would be to perform a full conformational average over all 37 conformations!
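A conformational average of this kind is normally taken as a Boltzmann-weighted sum of the rotations computed for each conformer. The sketch below assumes that scheme; only the -34° value for the lowest conformer comes from the calculation above, whilst the other energies and rotations are invented purely for illustration.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def weighted_rotation(rel_energies_kcal, rotations_deg, temperature=298.15):
    """Boltzmann-weighted average of per-conformer optical rotations."""
    rt = R_KCAL * temperature
    weights = [math.exp(-e / rt) for e in rel_energies_kcal]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, rotations_deg)) / total


# Hypothetical data: only the first rotation (-34 deg) is from the post
energies = [0.0, 0.5, 1.2]        # kcal/mol, relative to the lowest conformer
rotations = [-34.0, 10.0, -20.0]  # degrees at 589 nm

avg = weighted_rotation(energies, rotations)
print(f"Averaged rotation: {avg:.1f} deg")
```

The point of the exercise is that higher energy conformers with rotations of opposite sign can pull the average towards zero, which is one way in which a small measured value might be reconciled with a larger single-conformer prediction.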

The calculated ECD spectrum is shown below. It only shows a weak negative feature at ~220 nm, whereas strong evidence requires clear-cut features at >280 nm. This result suggests that recording this spectrum would not be worthwhile.

The VCD spectrum is shown below. This does show strong features in both the C-H stretching region and the 1500-800 wavenumber region and would be a good diagnostic. Recording it would indirectly also reveal whether the conformer chosen above is likely to be correct or not.

So the above provides a start point for a more comprehensive and independent method for verifying the absolute configuration. It has to be said that a total synthesis using a starting material of known configuration is normally pretty reliable, but there are rare examples where the configuration of such a precursor was wrongly assigned and was subsequently corrected by VCD assignment.[5]


This blog has DOI: 10.14469/hpc/12233

References

  1. M. Piccichè, A. Pinto, R. Griera, J. Bosch, and M. Amat, "Total Synthesis of (−)-Cylindricine H", Organic Letters, vol. 24, pp. 5356-5360, 2022. https://doi.org/10.1021/acs.orglett.2c02004
  2. M. Amat, O. Bassas, N. Llor, M. Cantó, M. Pérez, E. Molins, and J. Bosch, "Dynamic Kinetic Resolution and Desymmetrization Processes: A Straightforward Methodology for the Enantioselective Synthesis of Piperidines", Chemistry – A European Journal, vol. 12, pp. 7872-7881, 2006. https://doi.org/10.1002/chem.200600420
  3. J.M. Bijvoet, A.F. Peerdeman, and A.J. van Bommel, "Determination of the Absolute Configuration of Optically Active Compounds by Means of X-Rays", Nature, vol. 168, pp. 271-272, 1951. https://doi.org/10.1038/168271a0
  4. J.G. Kirkwood, "On the Theory of Optical Rotatory Power", The Journal of Chemical Physics, vol. 5, pp. 479-491, 1937. https://doi.org/10.1063/1.1750060
  5. J.L. Arbour, H.S. Rzepa, A.J.P. White, and K.K. Hii, "Unusual regiodivergence in metal-catalysed intramolecular cyclisation of γ-allenols", Chem. Commun., pp. 7125-7127, 2009. https://doi.org/10.1039/b913295c

A look at (one of) the dyes used in the Bayeux tapestry.

January 3rd, 2023

I have previously looked at the pigments used to colour the Book of Kells, which dates from around 800 AD and which contained arsenic sulfide as the yellow colourant. The Bayeux tapestry is a later embroidery dating probably from around 1077 and here the colours are based entirely on mordanted natural dyes. These are generally acknowledged to be blue woad (principal component indigo), red madder (principal component alizarin) and the less well-known yellow weld, which comes from the plant Reseda luteola and the principal component of which is luteolin.

Luteolin has an interesting chemical history. It was first purified in 1829, at the dawn of organic chemistry, and its formula C15H10O6 was established by 1864. A. G. Perkin, the son of the William Perkin who discovered the dye mauveine, then provided the chemical structure[1] in 1896. This latter article is well worth a modern read, since it beautifully illustrates how the art of structure determination was conducted in the days before crystallography and NMR.

Perkin obtained his structure by comparing luteolin to the then-known quercetin, concluding that the former must also contain an aromatic hydroxy group “ortho” to the carbonyl group, as in quercetin. The key experimental evidence was that alkylation of luteolin with iodoethane produces only a triethoxy derivative of luteolin, with “one hydroxy group resisting ethylation”. It was by then established, by four different sets of researchers, that hydroxy groups adjacent to the carbonyl in e.g. quercetin or alizarin resisted alkylation. The structure of luteolin was established (see e.g. 10.5517/cc798yq) by combining various such observations, a method (and skill) that has largely lapsed nowadays.

A modern take on this selective alkylation might be to compute e.g. the wavefunction (ωB97XD/Def2-TZVPP/SCRF=water) of luteolin to inspect the energies of the orbitals associated with alkylation of the hydroxyl group, using the energy of the nucleophilic lone pair oxygen orbital (FAIR DOI: 10.14469/hpc/12185) as an indicator. The least stable such orbital (highest energy) is normally an indicator of the most nucleophilic electron pair. In this case, the highest (most reactive) such orbital is the one adjacent to the carbonyl group, which thereby reveals a mystery, since it is this very hydroxyl that resists alkylation! A transition state approach to this might be needed to resolve the mystery, factoring in perhaps steric effects etc.

[Images: the oxygen lone pair orbitals of the four hydroxyl groups, with energies -0.6951 au, -0.7132 au, -0.7169 au and -0.7205 au]
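To get a feel for how far apart these four lone pair orbitals are, the energies quoted above can be ranked and converted to more familiar units. This is a minimal sketch using standard conversion factors; which value belongs to which hydroxyl group (other than the highest being the one adjacent to the carbonyl) is not assumed here.

```python
HARTREE_TO_EV = 27.211386    # 1 hartree in eV
HARTREE_TO_KCAL = 627.5095   # 1 hartree in kcal/mol

# Oxygen lone pair orbital energies quoted in the post (atomic units)
energies_au = [-0.6951, -0.7132, -0.7169, -0.7205]

# Rank from least stable (most nucleophilic) to most stable
ranked = sorted(energies_au, reverse=True)
highest = ranked[0]

# Gap of each orbital below the highest one, in kcal/mol
gaps_kcal = [(highest - e) * HARTREE_TO_KCAL for e in ranked]

for e, gap in zip(ranked, gaps_kcal):
    print(f"{e:8.4f} au  {e * HARTREE_TO_EV:8.2f} eV  {gap:6.2f} kcal/mol below highest")
```

On these numbers the highest orbital sits about 11 kcal/mol above the next one, so the orbital-based prediction is not a marginal one, which makes the observed resistance of that hydroxyl to alkylation all the more striking.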

The calculated UV-Vis spectrum is shown below, showing the peak at ~300 nm responsible for the intense yellow colour (300-400 nm).
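For comparing such a spectrum with computed excitation energies, which are usually reported in eV, it helps to convert the wavelengths. The following is a small sketch of the standard conversions; the function names are my own.

```python
H_C_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm


def nm_to_ev(wavelength_nm):
    """Photon energy in eV for a wavelength in nm."""
    return H_C_EV_NM / wavelength_nm


def nm_to_wavenumber(wavelength_nm):
    """Wavenumber in cm^-1 for a wavelength in nm."""
    return 1.0e7 / wavelength_nm


# The two ends of the 300-400 nm range mentioned above
for wavelength in (300.0, 400.0):
    print(f"{wavelength:.0f} nm = {nm_to_ev(wavelength):.3f} eV "
          f"= {nm_to_wavenumber(wavelength):.0f} cm^-1")
```

The 300-400 nm range thus corresponds to excitation energies between roughly 4.1 and 3.1 eV.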


The strongest oscillator contribution to the transition is shown below.

[Images: the LUMO and HOMO]

So here I have cast a little more light on this relatively unknown natural yellow dye, which was used for many centuries to colour woollen materials.

References

  1. A.G. Perkin, "XLIX.—Luteolin. Part II", J. Chem. Soc., Trans., vol. 69, pp. 799-803, 1896. https://doi.org/10.1039/ct8966900799

Molecules of the year -2022. A closer look at the Megalo-Cavitands.

December 15th, 2022

In the previous post, I discussed how data associated with two of the candidates for molecules of the year – 2022 could be retrieved and then used to inspect their three dimensional structures. Here I focus on the ultra large cavitands recently reported[1]. As I noted, these have an associated data coordinate archive published on Zenodo (DOI: 10.5281/zenodo.6953961) although this is not cited in the article itself.

Shown below are the coordinates of the A4-T molecule containing C70, the first being optimized at the PM6 level and the second at the PM7 level. The most obvious difference is that the close C-H…H-C contacts of the host molecule, which lie between ~4 Å and 2.6 Å at the PM6 geometry, shrink to about 2.1 Å for PM7, a contraction of at least 0.5 Å. Also, the gap between the host and the guest reduces from around 4.2 Å to 3.45 Å (a distance typical of π-π stacking by the way), a significant reduction of ~0.75 Å. Click on the two images below to view this model.

The difference in the dispersion terms for these two geometries emerges as 36.6 kcal/mol lower for the PM7 optimised geometry compared to the original PM6 geometry, a significant stabilisation. FAIR data is at DOI: 10.14469/hpc/12022 if you want to analyse the cavity sizes further.

Shown below is the NCI (non-covalent interaction) surface, computed at the PM7 geometry and using the MNDO wavefunction. This illustrates the stabilisations occurring from the non-covalent density (it takes a little while to load).


This post has DOI: 10.14469/hpc/12027


References

  1. J. Pfeuffer‐Rooschüz, S. Heim, A. Prescimone, and K. Tiefenbacher, "Megalo‐Cavitands: Synthesis of Acridane[4]arenes and Formation of Large, Deep Cavitands for Selective C70 Uptake", Angewandte Chemie International Edition, vol. 61, 2022. https://doi.org/10.1002/anie.202209885