Geometries of proton transfers: modelled using total energy or free energy?

April 18th, 2022

Proton transfers are amongst the most common of all chemical reactions. They are often thought of as “trivial” and may not even feature in many mechanistic schemes, other than perhaps as the notation “PT”. The types with the lowest energy barriers for transfer often involve heteroatoms such as N and O, and the conventional transition state might be supposed to occur when the proton is located about halfway between the two heteroatoms. This should be the energy high point between the two positions for the proton. But what if a crystal structure is determined with the proton in exactly this position? Well, the first hypothesis is that using X-rays as the diffracting radiation is unreliable, because protons scatter X-rays very poorly. Then a more arduous neutron diffraction study is sometimes undertaken, which is generally assumed to be more reliable in determining the position of the proton. Just such a study was undertaken for the structure shown below (RAKQOJ)[1], data DOI: 10.5517/cc57db3 for the 80K determination. The substituents had been selected to try to maximise the symmetry of the O…H…N motif via pKa tuning (for another tuning attempt, see this blog). The more general landscape this molecule fits into[2] is shown below:

The results obtained for the position of the proton for RAKQOJ were fascinating. They were very dependent on the temperature of the crystal! At room temperature (using X-rays), the proton was measured as 1.09Å from the oxygen and 1.47Å from the nitrogen (neutral form above). At 20K, the OH distance was 1.309Å and the HN 1.206Å (~ionic form above). Indeed, the very title of this article is “First O-H-N Hydrogen Bond with a Centered Proton Obtained by Thermally Induced Proton Migration”. The authors give a number of reasons for this behaviour (their ref 17[1] and also[2]), but one they do not mention is a temperature-induced change in the dielectric constant of the crystal, given that in one position for the proton the molecule is ionic and in the other neutral. So I decided to model the system as a function of solvent. In this model, the solvent dielectric is used to approximate the crystal dielectric. My first choice of energy function is to compute geometries using the B3LYP+GD3BJ/Def2-TZVPP/SCRF=solvent method to see what might emerge and as a possible prelude to trying other functionals. FAIR data for these calculations are collected at DOI: 10.14469/hpc/10368.
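
The method shorthand above reads like a Gaussian route section. As a minimal sketch, assuming the Gaussian program was used (the post does not say so explicitly) and assuming its default continuum model, a route line for each solvent could be assembled as follows. The helper name `route_line` and the choice of `Opt Freq` are illustrative assumptions; locating the transition states would need different optimisation options that are not shown here.

```python
# Hypothetical helper: assembles a Gaussian-style route line per solvent.
# Assumption: Gaussian keywords; the post only gives the method shorthand.
SOLVENTS = ["Water", "Dichloromethane", "Chloroform", "DiButylEther", "Toluene"]


def route_line(solvent=None):
    """Return a route line; solvent=None corresponds to the gas phase."""
    parts = ["#", "B3LYP/Def2TZVPP", "EmpiricalDispersion=GD3BJ", "Opt", "Freq"]
    if solvent is not None:
        # Continuum solvent used here as a stand-in for the crystal dielectric
        parts.append(f"SCRF=(Solvent={solvent})")
    return " ".join(parts)


if __name__ == "__main__":
    for name in [None] + SOLVENTS:
        print(route_line(name))
```

The `Freq` keyword is included because the free energies discussed below depend on the vibrational frequencies computed at each optimised geometry.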

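| Solvent | ε | ΔG298 for O…HN (Hartree) | rO…H (Å) | rHN (Å) | ΔG298 for OH…N (Hartree) | rOH (Å) | rH…N (Å) | ΔG298 for TS (PT) (Hartree) | rOH (Å) | rHN (Å) |
|---|---|---|---|---|---|---|---|---|---|---|
| Water | 78.4 | -2893.387188 / -2893.334325 | 1.4913 | 1.0827 | -2893.386705 / -2893.334333 | 1.0364 | 1.5696 | -2893.387668 / -2893.336183 | 1.1852 | 1.2899 |
| Dichloromethane | 8.9 | -2893.385173 | 1.4566 | 1.0945 | -2893.385662 | 1.0309 | 1.5878 | -2893.386022 | 1.2072 | 1.2642 |
| Chloroform | 4.7 | -2893.382254 | 1.4227 | 1.1082 | -2893.384514 | 1.0261 | 1.6049 | -2893.384773 | 1.2321 | 1.2388 |
| Dibutyl ether | 3.1 | -2893.380813 | 1.3778 | 1.1302 | -2893.383511 | 1.0213 | 1.6235 | -2893.382918 | 1.2667 | 1.2078 |
| Toluene | 2.4 | -2893.379752 | 1.3248 | 1.1635 | -2893.382915 | 1.0178 | 1.6385 | -2893.379773 | 1.2851 | 1.1934 |
| Gas phase | 0 | n/a | n/a | n/a | -2893.377949 | 1.0009 | 1.7387 | n/a | n/a | n/a |
| Expt (RT)[1] | ? | n/a | n/a | n/a | | 1.09 | 1.47 | n/a | n/a | n/a |
| Expt (20K)[1] | ? | n/a | n/a | n/a | | 1.309 | 1.206 | n/a | n/a | n/a |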

At 20K

Results:

  1. The geometries for each model are obtained by minimising the total energy of the system as a function of the 3N-6 geometric variables (coordinates). 
  2. The geometries show that for all solvents, TWO minima in the total energy are obtained, one for the ionic and one for the neutral form. This is called a double-well energy potential. Even a non-polar solvent such as toluene produces a solvation energy of ~3.1 kcal/mol compared to the gas phase, which is sufficient to induce a double-well potential.
  3. Without solvent (gas phase), only the neutral geometry is obtained. 
  4. In the most polar solvent water, the double well potential looks like this:

    The ionic well is about 0.4 kcal/mol lower in total energy (and ~0.3 kcal/mol in free energy, see table above) than the neutral form, with a barrier connecting neutral to ionic only 1.0 kcal/mol. A transition state + intrinsic reaction coordinate (IRC) can be easily located on this total energy potential, confirming the double-well form.
  5. When free energies ΔG are computed, which include thermal effects such as entropy and zero-point energy, the transition state emerges as 0.3 kcal/mol lower in free energy than the ionic form (red entries, Table). In effect, the free energy potential surface is INVERTED compared to the total energy surface and the “transition state” becomes the lowest point on the energy surface. So this point is a minimum in the free energy but a maximum in the total energy, the result of adding thermal effects to the total energy.
  6. In dichloromethane, the free energy of the neutral form is now lower by 0.3 kcal/mol than that of the ionic form. The OH bond is starting to get shorter and the NH one longer. The transition state is now 0.22 kcal/mol lower than the neutral form. With chloroform, the OH and HN bonds have become ~equal in length, so the proton is symmetrically disposed.
  7. By the time dibutyl ether as solvent is reached, the transition state is no longer lower in ΔG than the neutral form, moving on to being 2.0 kcal/mol higher for toluene. So as the solvent polarity decreases, we see a change in the potential from a single well in ΔG, in which the proton is centred, to a very asymmetric well in which the proton is attached to the oxygen.
  8. Can we match the observed neutron diffraction results to the calculations? As the temperature decreases, the neutron diffraction shows the start of proton transfer from oxygen to nitrogen to form an ionic species. The calculations show that this can be modelled by an increase in the effective dielectric constant of the medium. The computed “transition state” for proton transfer somewhere between dibutyl ether and toluene (as the dielectric medium) emerges as approximately the best model for the structure of this species. At this dielectric, the calculated ΔG is no longer quite the lowest free energy point in the potential. This might be due to the many approximations used in this model such as minimisation of total energy, the partition function method used to calculate entropy, the nature of the DFT functional, the continuum solvation model, the basis set, etc.
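
The relative free energies quoted in the list above can be re-derived from the table. The sketch below is a minimal check, assuming the tabulated ΔG298 values are in Hartree and using the standard conversion of 627.5095 kcal/mol per Hartree; for water, where the table lists two values per cell, the first one is used. The dictionary and function names are illustrative only.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per Hartree

# ΔG298 (Hartree) copied from the table: (ionic O…HN, neutral OH…N, TS)
DG298 = {
    "Water": (-2893.387188, -2893.386705, -2893.387668),
    "Dichloromethane": (-2893.385173, -2893.385662, -2893.386022),
    "Chloroform": (-2893.382254, -2893.384514, -2893.384773),
    "Dibutyl ether": (-2893.380813, -2893.383511, -2893.382918),
    "Toluene": (-2893.379752, -2893.382915, -2893.379773),
}


def relative_kcal(solvent):
    """Free energy differences in kcal/mol for one solvent."""
    ionic, neutral, ts = DG298[solvent]
    lowest_minimum = min(ionic, neutral)
    return {
        "ionic - neutral": (ionic - neutral) * HARTREE_TO_KCAL,
        "TS - lowest minimum": (ts - lowest_minimum) * HARTREE_TO_KCAL,
    }


if __name__ == "__main__":
    for name in DG298:
        values = relative_kcal(name)
        print(name, {key: round(value, 2) for key, value in values.items()})
```

A negative "TS - lowest minimum" entry marks a solvent in which the free energy surface is inverted, so that the centred-proton geometry lies below both total-energy minima.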

Conclusions:

These results were obtained with the approximation that minimising the total molecular energy produces a computed geometry that can be compared to the experimental neutron diffraction structures. But can one do better? Obtaining molecular geometries by minimising the computed free energies would be non-trivial. Firstly, minimisation would depend on the availability of first derivatives of the energy function, in this case ΔG, with respect to coordinates. These are not available in any DFT codes. The result would itself be temperature dependent (as indeed are the experimental results shown above). Furthermore, ΔG is computed from normal vibrational modes and these are only appropriate when the first derivatives of the function are zero, at which point the so-called six rotations and translations of the molecule in free space also have zero energy. So we need vibrations to compute derivatives, but we need derivatives to compute vibrations in this classical approach.
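
To make the role of the normal modes concrete, here is a minimal sketch of the standard harmonic-oscillator expression for the vibrational contribution to the free energy, zero-point energy plus the thermal term RT·ln(1 − exp(−hν/kT)), summed over modes. It is not the procedure of any particular program, and the wavenumbers in the example are hypothetical, not taken from the calculations in this post.

```python
import math

H = 6.62607015e-34     # Planck constant, J s
C_CM = 2.99792458e10   # speed of light, cm/s
KB = 1.380649e-23      # Boltzmann constant, J/K
NA = 6.02214076e23     # Avogadro constant, 1/mol
J_PER_KCAL = 4184.0


def vibrational_free_energy(wavenumbers_cm, temperature_k):
    """Harmonic vibrational free energy (kcal/mol) for real modes given in cm^-1."""
    total = 0.0
    for wavenumber in wavenumbers_cm:
        quantum = H * C_CM * wavenumber        # energy of one quantum, J
        kt = KB * temperature_k
        zero_point = 0.5 * quantum
        thermal = kt * math.log1p(-math.exp(-quantum / kt))
        total += zero_point + thermal
    return total * NA / J_PER_KCAL


if __name__ == "__main__":
    modes = [300.0, 1000.0, 3000.0]  # hypothetical wavenumbers
    for temperature in (20.0, 298.15):
        print(temperature, round(vibrational_free_energy(modes, temperature), 3))
```

In the usual harmonic treatment the imaginary mode of a transition state is left out of this sum, which is one way the thermal correction for a “transition state” geometry can differ from that of the adjacent minima.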

It would be valuable, for example, if the approximate model of the potential for a hydrogen transfer used above, based on geometries obtained by minimising total energies, could be checked against a model based on geometries optimised using free energies instead. Such procedures do exist,[3] using molecular dynamics trajectory methods.


This post has DOI: 10.14469/hpc/10382 [4]

References

  1. T. Steiner, I. Majerz, and C.C. Wilson, "First O−H−N Hydrogen Bond with a Centered Proton Obtained by Thermally Induced Proton Migration", Angewandte Chemie International Edition, vol. 40, pp. 2651-2654, 2001. https://doi.org/10.1002/1521-3773(20010716)40:14<2651::aid-anie2651>3.0.co;2-2
  2. I. Majerz, and M.J. Gutmann, "Mechanism of proton transfer in the strong OHN intermolecular hydrogen bond", RSC Advances, vol. 1, pp. 219, 2011. https://doi.org/10.1039/c1ra00219h
  3. M. Higashi, S. Hayashi, and S. Kato, "Geometry optimization based on linear response free energy with quantum mechanical/molecular mechanical method: Applications to Menshutkin-type and Claisen rearrangement reactions in aqueous solution", The Journal of Chemical Physics, vol. 126, 2007. https://doi.org/10.1063/1.2715941
  4. H. Rzepa, "Geometries of proton transfers: modelled using total energy or free energy?", 2022. https://doi.org/10.14469/hpc/10368

C2N2: a 10-electron four-atom molecule displaying both Hückel 4n+2 and Baird 4n selection rules for ring aromaticity.

April 7th, 2022

The previous examples of four atom systems displaying two layers of aromaticity illustrated how 4 (B4), 8 (C4) and 12 (N4) valence electrons were partitioned into 4n+2 manifolds (respectively 2+2, 6+2 and 6+6). The triplet state molecule B2C2 has 6 electrons partitioned into 2π and 4σ electrons, with the latter following Baird’s aromaticity rule.[1],[2] Now for the final missing entry; as a triplet C2N2 has 10 electrons, which now partition into 4 + 6. But would that be 4π + 6σ or 4σ + 6π? Well, in a way neither! Read on.

Bonding MOs for C2N2.
π3, 1 electron | π2, 1 electron
σ3, 2 electrons | σ2, 2 electrons
π1, 2 electrons | σ1, 2 electrons

The calculations (ωB97XD/Def2-TZVPP and CCSD(T)/Def2-TZVPP) are collected at FAIR DOI: 10.14469/hpc/10346. These show a partitioning into 5σ + 5π, a species that is not a minimum but undergoes a non-planar distortion.

However, the first excited state (the triplet) IS planar and is only 12.5 kcal/mol above the planar 5+5 precursor. It is now partitioned into 6σ and 4π, with the latter conforming to Baird’s rule for open shell triplets.[1],[2] So this is unlike C2B2, which showed 2π + 4σ partitioning with the σ series following Baird’s rule. Now we have two examples in which one of the σ- or π-manifolds follows Baird’s rule and the other follows Hückel’s rule. The systems themselves are somewhat contrived, but they show the simple fun and games that can be had with these aromaticity rules.


This post has DOI: 10.14469/hpc/10350

References

  1. N.C. Baird, "Quantum organic photochemistry. II. Resonance and aromaticity in the lowest 3.pi..pi.* state of cyclic hydrocarbons", Journal of the American Chemical Society, vol. 94, pp. 4941-4948, 1972. https://doi.org/10.1021/ja00769a025
  2. M. Rosenberg, C. Dahlstrand, K. Kilså, and H. Ottosson, "Excited State Aromaticity and Antiaromaticity: Opportunities for Photophysical and Photochemical Rationalizations", Chemical Reviews, vol. 114, pp. 5379-5425, 2014. https://doi.org/10.1021/cr300471v

Raw data and the evolution of crystallographic FAIR data. Journals, processed and raw structure data.

March 28th, 2022

In my previous post on the topic, I introduced the concept that data can come in several forms, most commonly as “raw” or primary data and as a “processed” version of this data that has added value. In crystallography, the chemist is interested in this processed version, carried by a CIF file. However on rare occasions when a query arises about the processed component, this can in principle at least be resolved by taking a look at the original raw data, expressed as diffraction images. I established with much appreciated help from CCDC that since 2016, around 65 datasets in the CSD (Cambridge structural database) have appeared with such associated raw data. The problem is easily reconciling the two sets of data (the raw data is not stored on CSD) and one way of doing this is via the metadata associated with the datasets. In turn, if this metadata is suitably registered, one can query the metadata store for such associations, as was illustrated in the previous post on the topic. Here I explore the metadata records for five of these 65 sets to find out their properties, selected to illustrate the five data repositories thus far that host such data for compounds in the CSD database.

| Raw data repository | Raw data DOI | Raw data → CSD? | CSD → Raw data? | ⇐ Journal ⇒ |
|---|---|---|---|---|
| Zenodo | 10.5281/zenodo.4271549 | No | No | 10.1039/C6RA28567H |
| Imperial College research data repository | 10.14469/hpc/2298 | Yes | Yes | 10.1021/acsomega.7b00482 |
| RepoD, a Harvard Dataverse instance | 10.18150/repod.6628285 | No | No | 10.1021/acs.cgd.0c01252 |
| Cambridge university repository | 10.17863/CAM.21968 | No | No | 10.1016/j.inoche.2018.08.024 |
| Isis neutron and muon source data journal | 10.5286/ISIS.E.RB1620465 | No | No | 10.1039/D0CC02418J |

Ideally, one is looking for links between the raw and the processed data that are expressed in the metadata in both directions. As you can see from the above, these links are present in only one of the five sets. More common is that both the raw and the processed data will contain links to the journal article where the data is discussed. Links from the journal article to the raw data are very much less common, although such links are slightly more likely to exist from the journal to the processed data. If you click on the link in any of the last three columns, a copy of the metadata will download for you to inspect. There you can verify if the assertions made above are correct.

What the metadata records demonstrate above is a very small scale so-called PID graph (DOI: 10.5438/jwvf-8a66)[1] where each DOI is a node in that graph and if a connection exists, it is shown by a line connecting the nodes. The PID graph can be extended to include a third type of node, the journal article and then it starts to get interesting! I will investigate if I can generate the PID graph for the above, although be prepared, it will not (yet) contain very many lines between nodes!
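
As a minimal sketch of such a graph, the table rows can be turned into nodes and directed links. The CSD DOIs are not listed in the table, so each CSD node below carries a placeholder label derived from the journal DOI, and the journal nodes are left unconnected because the table does not record those links row by row. All function names are illustrative.

```python
# Rows copied from the table: (raw data DOI, raw→CSD link?, CSD→raw link?, journal DOI)
ROWS = [
    ("10.5281/zenodo.4271549", False, False, "10.1039/C6RA28567H"),
    ("10.14469/hpc/2298", True, True, "10.1021/acsomega.7b00482"),
    ("10.18150/repod.6628285", False, False, "10.1021/acs.cgd.0c01252"),
    ("10.17863/CAM.21968", False, False, "10.1016/j.inoche.2018.08.024"),
    ("10.5286/ISIS.E.RB1620465", False, False, "10.1039/D0CC02418J"),
]


def build_pid_graph(rows):
    """Map each node to the set of nodes its metadata links to."""
    graph = {}
    for raw, raw_to_csd, csd_to_raw, journal in rows:
        csd = f"CSD entry for {journal}"  # placeholder label, not a real DOI
        for node in (raw, csd, journal):
            graph.setdefault(node, set())
        if raw_to_csd:
            graph[raw].add(csd)
        if csd_to_raw:
            graph[csd].add(raw)
    return graph


def bidirectional_pairs(graph):
    """Pairs of nodes that link to each other in both directions."""
    return sorted((a, b) for a in graph for b in graph[a] if a < b and a in graph[b])


if __name__ == "__main__":
    pid_graph = build_pid_graph(ROWS)
    print(len(pid_graph), "nodes")
    print(bidirectional_pairs(pid_graph))
```

With the five rows above, only the Imperial College entry produces a bidirectional pair, matching the single "Yes/Yes" row of the table.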

References

  1. M. Fenner, and A. Aryani, "Introducing the PID Graph", 2019. https://doi.org/10.5438/jwvf-8a66

Sir Geoffrey Wilkinson: An anniversary celebration. 23 March, 2022, Burlington House, London.

March 24th, 2022

The meeting covered the scientific life of Professor Sir Geoffrey Wilkinson from the perspective of collaborators, friends and family and celebrated three anniversaries, the centenary of his birth (2021), the half-century anniversary of the Nobel prize (2023) and 70 years almost to the day (1 April) since the publication of the seminal article on Ferrocene (2022).[1]


The meeting was organised as “inverse hybrid” (to use the new terminology), with a maximum capacity in-person audience attending along with fourteen speakers, three of whom were remote and one who could not attend on the day but whose presentation was given on their behalf. I will not give abstracts for the talks here, but note two common themes that I thought emerged during the day.

  1. All the speakers found themes in either their memories of Wilkinson and their time in his laboratories or their current research work that show how he continues to influence, along with the famous text book that he co-wrote, the modern world of chemistry. He truly left a remarkable legacy.
  2. This is a personal observation, but in his day, Wilkinson was famously sceptical of the ability of molecular modelling to cast profound insights into the molecules his group were studying. Yesterday I think with only one or two exceptions, the talks were accompanied by “DFT modelling” helping to provide such insights, either into the reaction mechanisms via energy profiles or into the properties of the molecules themselves, including their spectroscopy.

A small exhibition of artefacts included his famous portrait, all the editions of the text book and other items from his desk.

Finally, I thought I might explore the famous controversy surrounding the model of ferrocene which is shown in the photos below. It is shown with the two cyclopentadienyl rings in a so-called “eclipsed” conformation. To cast light on this, I show a search of the Cambridge crystal database of all molecules with this sub-structure. There are 24,868 of them.

The histogram plot of the dihedral angle is shown below. The staggered geometry has a dihedral of 36° and you can see a small maximum at this point in the distribution below. But this is dwarfed by 0°, the value for the eclipsed orientation. The barrier to rotation is known to be very small, and this is reflected in the almost continuous distribution amongst those 24,868 molecules.

References

  1. G. Wilkinson, M. Rosenblum, M.C. Whiting, and R.B. Woodward, "THE STRUCTURE OF IRON BIS-CYCLOPENTADIENYL", Journal of the American Chemical Society, vol. 74, pp. 2125-2126, 1952. https://doi.org/10.1021/ja01128a527

A four-atom molecule exhibiting simultaneous compliance with Hückel 4n+2 and Baird 4n selection rules for ring aromaticity.

March 22nd, 2022

Normally, aromaticity is qualitatively assessed using an electron counting rule for cyclic conjugated rings. The best known is the Hückel 4n+2 rule (n=0,1, etc) for inferring diatropic aromatic ring currents in singlet-state π-conjugated cyclic molecules and a counter 4n rule which infers an antiaromatic paratropic ring current for the system. Some complex rings can sustain both types of ring currents in concentric rings or regions within the molecule, i.e. both diatropic and paratropic regions. Open shell (triplet state) molecules have their own rule; this time the molecule has a diatropic ring current if it follows a 4n rule, often called Baird’s rule. But has a molecule which simultaneously follows both Hückel’s AND Baird’s rule ever been suggested? Well, here is one, as indeed I promised in the previous post.
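
The two counting rules can be written down as a small function. This is a sketch of the rules as stated above, applied to one set of cyclically delocalised electrons at a time; the triplet 4n+2 case is filled in as antiaromatic, which is how Baird's rule is usually stated but is an addition to the text above. The function name and the `triplet_set` flag are illustrative.

```python
def ring_current(n_electrons, triplet_set=False):
    """Classify one cyclic electron set by Hückel's rule (closed shell) or Baird's rule (triplet)."""
    if n_electrons <= 0 or n_electrons % 2:
        raise ValueError("expected a positive, even electron count")
    is_4n_plus_2 = n_electrons % 4 == 2
    # Hückel: 4n+2 is aromatic; Baird: the pattern reverses for an open-shell triplet set
    aromatic = is_4n_plus_2 != triplet_set
    return "aromatic (diatropic)" if aromatic else "antiaromatic (paratropic)"


if __name__ == "__main__":
    print("6 electrons, closed shell:", ring_current(6))
    print("4 electrons, closed shell:", ring_current(4))
    print("4 electrons, triplet set:", ring_current(4, triplet_set=True))
```

Treating each electron set separately is what allows a single molecule to satisfy both rules at once: a closed-shell set is judged by Hückel's rule and an open-shell triplet set by Baird's.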

The species shown above has two carbons and two borons in a ring. These have a total of 14 valence electrons, eight of which occupy the C-B bonds, leaving six contributing to circulating ring currents. These partition into two π-electrons which then form a Hückel 4n+2 aromatic (n=0) and four σ-electrons which then form a Baird 4n aromatic (n=1) as a triplet. The triplet for this molecule is indeed its lowest state, lying respectively 38.9 and 45.4 kcal/mol in free energy below the two lowest energy singlet states. These singlets arise by placing two electrons in either of the two orbitals σ2 or σ3, each of which is singly occupied in the triplet state (FAIR Data collection: 10.14469/hpc/10267).


Bonding MOs for C2B2.
σ3 σ2
σ1
π1

So here we see a different sort of doubly aromatic molecule, to add to C4, B4 and N4. With two electrons less than C4, it is now doubly aromatic as a triplet state, this time conforming to two different electron counting rules. It would be good to know if any other examples showing this pattern are known.

Hückel’s rule originally applied to p-π electrons in a cycle, such as benzene. Nowadays it is also used for σ in-plane electrons in a cycle.


This post has DOI: 10.14469/hpc/10271.

More aromatic species with four atoms. B4 and N4.

March 19th, 2022

I discussed in the previous post the small molecule C4 and how of the sixteen valence electrons, eight were left over after forming C-C σ-bonds which partitioned into six σ and two π. So now to consider B4. This has four electrons less, and now the partitioning is two σ and two π (CCSD(T)/Def2-TZVPPD calculation, FAIR DOI: 10.14469/hpc/10157). Again both these sets fit the Hückel 4n+2 rule (n=0).

Since B4 has only two rather than six delocalized σ-electrons, the contributions to the central B-B bond are weaker and so the B-B bond is much longer.

Bonding MOs for B4.
σ1, -0.335 au
π1, -0.372 au

Next, N4.

π-Bonding MOs for N4.
π3 π2
π1
σ-Bonding MOs for N4.
σ3 σ2
σ1

The pattern for N4 is different in several aspects. Firstly, the π-system has six bonding electrons distributed over only four atoms. This makes the electron repulsions too high and the species is no longer stable, having one large imaginary force constant corresponding to an out-of-plane distortion. Secondly, the lowest energy σ orbital is highly localised onto two nitrogens rather than being delocalised around the ring periphery. So all those electrons crammed into a small space have taken their toll.

Thus far we have identified three species, B4, C4 and N4, with interesting sets of respectively 4, 8 and 12 electrons, all partitioned into 4n+2 collections. But what happens if one cannot do that; let's say with 6 and 10 electrons? Hang around to find out!
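
A short enumeration shows why 6 and 10 electrons are the awkward cases. The sketch below lists every way of splitting a total into two non-empty sets that both satisfy 4n+2; the function names are illustrative.

```python
def is_huckel(count):
    """True when count equals 4n+2 for some n >= 0."""
    return count >= 2 and count % 4 == 2


def huckel_partitions(total):
    """All (sigma, pi) splits of `total` electrons into two non-empty 4n+2 sets."""
    return [
        (sigma, total - sigma)
        for sigma in range(2, total - 1, 2)
        if is_huckel(sigma) and is_huckel(total - sigma)
    ]


if __name__ == "__main__":
    for total in (4, 6, 8, 10, 12):
        print(total, huckel_partitions(total))
```

For 4, 8 and 12 electrons the list includes the 2+2, 6+2 and 6+6 partitions described above, whereas for 6 and 10 it comes back empty.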

An unusually small (doubly) aromatic molecule: C4.

March 15th, 2022

When you talk π-aromaticity, benzene is the first molecule that springs to mind. But there are smaller molecules that can carry this property; cyclopropenylidene (five atoms) is the smallest in terms of atom count I could think of until now, apart that is from H3+ which is the smallest possible molecule that carries σ-aromaticity. So here I have found what I think is an even smaller aromatic molecule containing only four carbon atoms. And it is not only π-aromatic but σ-aromatic.

Let me go through the analysis (using a CCSD(T)/Def2-TZVPPD calculation, DOI: 10.14469/hpc/10226).

  1. Four carbons contain 16 valence electrons for bonding.
  2. Eight of these are conventional, forming four C-C single bonds around the 4-ring.
  3. Eight are left over, and these partition into a set of six and a set of two.
  4. The set of two are in p-π atomic orbitals and form a 4n+2 (n=0) aromatic system
  5. The set of six are in σ-sp AOs and form a 4n+2 (n=1) aromatic system.
  6. The three σ-MOs all contribute to the central C-C bond, particularly σ3 and σ2 in different ways.
  7. σ2 is also reminiscent of [1.1.1]-propellane, where the two σ-electrons are in effect external to the central C-C bond, but spin coupled to form what might be called a σ exo-bond. There is also similarity to the exo bond in C2.
  8. The dissociation energy of the central bond can be estimated at 28 kcal/mol from the triplet state energy.
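
The electron bookkeeping in the first three steps of the list above can be sketched as follows, assuming four two-electron bonds around the ring in every case. The same count reproduces the totals quoted in the related posts for B4, N4, C2B2 and C2N2; the function name is illustrative.

```python
VALENCE_ELECTRONS = {"B": 3, "C": 4, "N": 5}


def ring_electron_budget(atoms, ring_bonds=4):
    """Return (total valence electrons, electrons in ring bonds, electrons left over)."""
    total = sum(VALENCE_ELECTRONS[atom] for atom in atoms)
    in_bonds = 2 * ring_bonds  # one conventional two-electron bond per ring edge
    return total, in_bonds, total - in_bonds


if __name__ == "__main__":
    for label, atoms in [("C4", "CCCC"), ("B4", "BBBB"), ("N4", "NNNN"),
                         ("C2B2", "CBCB"), ("C2N2", "CNCN")]:
        print(label, ring_electron_budget(atoms))
```

For C4 this gives 16 valence electrons, 8 in the ring bonds and 8 left over, the set that then partitions into six σ and two π electrons.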
Bonding MOs for C4.
π1
σ3 σ2
σ1

So this little molecule carries a lot of diversity in its chemical bonding; an ideal candidate perhaps for a tutorial in bonding theory of organic molecules?


The post has DOI: 10.14469/hpc/10252

Raw data: the evolution of FAIR data and crystallography.

March 1st, 2022

Scientific data in chemistry has come a long way in the last few decades. Originally entangled into scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s. The next phase was the introduction of data repositories in the early noughties. Now associated with innovative commercial companies such as Figshare and later the non-commercial Zenodo, such repositories are also gradually spreading to institutional form, such as the earlier SPECTRa project of 2006[1], and are still evolving.[2] Perhaps the best known, and certainly the oldest, example of curated data in chemistry is the CCDC (Cambridge crystallographic data centre) CSD (Cambridge structural database), which has been operating for more than 55 years now. Curation here is the important context, since there you will find crystal diffraction data which has been refined into a structural model, firstly by the authors reporting the structure and then by CSD who, amongst other operations, validate the associated data using a utility called CheckCIF.[3] What perhaps is not realised by most users of this data source is that the original or “raw” data, as obtained from an X-ray diffractometer and from which the CSD data is derived, is not actually available from the CSD. This primary form of crystallographic data is the topic of this post.

Most chemical data now emerges from an instrument, where it is already partially processed internally before being offered. Such raw/primary data is perhaps best known in the form of NMR information, where it is offered by the instrument in the form of an FID or free induction decay. Its transformation from this form into what all chemists know as a spectrum requires further software processing, including other operations such as peak integration. It is this processed spectrum that had traditionally been offered as part of a scientific article (often only in visual or peak-listed form) and rarely has the FID form been made available to anyone interested. It is important to state that the transformation to a spectrum also incurs significant loss of data. An interesting project led by the editors of two organic chemistry journals[4],[5] had the aim of encouraging the submission of FAIR data to the journal, although in fact the project actually concentrated on the submission of raw NMR data. As it turned out, only a very small proportion of all the submissions to these journals over the period of a year actually provided such data (~113 datasets) in the form of ZIP archives containing anywhere between one and ~100 actual sets of raw NMR data per archive. One should make the point that raw data is not necessarily FAIR data. The latter requires rich metadata describing the data to become findable, accessible, interoperable and reusable (FAIR), and such metadata was not actually generated as part of this project.

Here I will take a closer look at potentially FAIR raw data in the area of crystallography. This project is perhaps less well known than the previous one,[4],[5] hence the present post strives to make it better known. As with NMR, a useful starting point is to describe the various stages in the lifecycle of crystal data.

  1. A crystal is mounted in the diffractometer and x-ray diffraction images are recorded. These are considered the raw data, and as with most instruments, their form is determined both by the instrument itself and the software used to start the refinement process into a molecular structure
  2. This refinement then assigns a space group to the data and derives so-called structure factors or hkl data. This data can now be captured in a much more standard form known as a CIF (crystallographic information file) and is nowadays the format that is deposited with CSD.
  3. A reduced form of the CIF file, containing a sub-set of the information but lacking the hkl data is much the more common, and was the form originally sent to CSD until a few years ago.
  4. Very often an image of the resulting model for the molecular structure is also included. Whilst it is based on the data in the CIF file, it does not contain reusable data as such and is considered as being made available only for human use and perception

It is form 1 that is missing from the CSD datasets. Because it can be quite large (~0.5-9 Gbyte), the current recommendation is that it is not stored on the CSD but on local data repositories. So now we see a need to establish if possible bidirectional links between type 1 and types 2-4 and to identify what characteristics of FAIR each has. Primarily, the F (findable) of FAIR will be explored here. This is done by illustrating some searches for this data, based on the metadata registered for it with DataCite.

  1. https://commons.datacite.org/?query=relatedIdentifiers.relatedIdentifier:10.5517/ccdc.csd*  (72 works)
    This simple search identifies any entry in any repository which cites in its metadata record the DOI for an entry in CSD, taking the form 10.5517/ccdc.csd* which is common to all entries.
  2. https://commons.datacite.org/?query=relatedIdentifiers.relatedIdentifier:*10.5517/ccdc.csd*+AND+(media.media_type:chemical/x-cif+OR+media.media_type:application/x-7z-compressed+OR+media.media_type:application/gzip+OR+media.media_type:application/zip) (8 works).
    This specifies that search 1 is further constrained by requiring one of four media types to ALSO be present in the repository metadata record. Three of these types are standard compressed archives, in which the raw crystal data is likely to be stored, and the fourth is a CIF entry that is clearly associated with crystal structure data. The Boolean OR indicates that any one of them can be present! One can now be a little more certain that these entries contain crystal structure data. That we cannot be absolutely certain is clearly a current deficiency of the metadata present for the entries!
  3. https://commons.datacite.org/?query=identifier:*10.5517/ccdc.csd*+AND+(relatedIdentifiers.relatedIdentifier:*10.14469/hpc/*) (7 works)
    The 8 works from search 2 originate from a repository with the prefix 10.14469/hpc/* and so now one can reverse the direction and ask how many are referenced in the metadata for each published item in the CSD. Around 327,064 entries in the CSD currently have a persistent DOI identifier associated with them, all starting with 10.5517/ccdc.csd (this is only around 25% of the total depositions there however), and so now one can search for how many of these also reference a related identifier at 10.14469/hpc/*. Seven of them show up there.
  4. Also in the CSD metadata records is an item with the attribute relationType=”IsDerivedFrom” carrying the meaning that the CSD data is itself derived from (raw) data held elsewhere. This information is captured during the deposition process with CCDC as per below.

    It should be possible to incorporate this property into a search as above, but it's currently not working. When that is sorted, I will add that search to item 4 here. This will give more idea of how many datasets in the CSD are actually associated with additional raw data (CCDC tell me it's around 65).
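
The search URLs in the list above all follow one pattern: a query string of field:value clauses joined by AND/OR and passed as the `query` parameter to DataCite Commons. As a minimal sketch, such URLs can be assembled as below. The function names are illustrative, the field names are copied from the searches above, and no request is sent; the standard-library encoder also percent-encodes more characters than the hand-written URLs do, which should be equivalent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

COMMONS = "https://commons.datacite.org/"
ARCHIVE_TYPES = [
    "chemical/x-cif",
    "application/x-7z-compressed",
    "application/gzip",
    "application/zip",
]


def csd_citing_query(media_types=()):
    """Query for records whose related identifiers include a CSD DOI."""
    query = "relatedIdentifiers.relatedIdentifier:*10.5517/ccdc.csd*"
    if media_types:
        clause = " OR ".join(f"media.media_type:{kind}" for kind in media_types)
        query += f" AND ({clause})"
    return query


def commons_search_url(query):
    """Encode a query string into a DataCite Commons search URL."""
    return COMMONS + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(commons_search_url(csd_citing_query()))
    print(commons_search_url(csd_citing_query(ARCHIVE_TYPES)))
```

Keeping the query as a plain string and encoding it only at the end makes it easy to add a further clause later, for example one for the relation type described in item 4 once that search works.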

So these projects aiming to capture data from chemical instrumentation are just starting to reveal the potential of this modern system for storing data in two or more locations and reconciling various forms of this data, from raw form to derived or processed data. The interested user can then use whichever form is most relevant to their needs, and having found one form can then trace back to the other form(s). We might anticipate many developments in this area in the near future.


One has to expand the archive to find out how many actual raw datasets are inside, which is not ideal. 


This post has DOI: 10.14469/hpc/10177


References

  1. J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, "SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories", Journal of Chemical Information and Modeling, vol. 48, pp. 1571-1581, 2008. https://doi.org/10.1021/ci7004737
  2. M.J. Harvey, A. McLean, and H.S. Rzepa, "A metadata-driven approach to data repository design", Journal of Cheminformatics, vol. 9, 2017. https://doi.org/10.1186/s13321-017-0190-6
  3. A.L. Spek, "Structure validation in chemical crystallography", Acta Crystallographica Section D Biological Crystallography, vol. 65, pp. 148-155, 2009. https://doi.org/10.1107/s090744490804362x
  4. A.M. Hunter, E.M. Carreira, and S.J. Miller, "Encouraging Submission of FAIR Data at The Journal of Organic Chemistry and Organic Letters", The Journal of Organic Chemistry, vol. 85, pp. 1773-1774, 2020. https://doi.org/10.1021/acs.joc.0c00248
  5. A.M. Hunter, E.M. Carreira, and S.J. Miller, "Encouraging Submission of FAIR Data at The Journal of Organic Chemistry and Organic Letters", Organic Letters, vol. 22, pp. 1231-1232, 2020. https://doi.org/10.1021/acs.orglett.0c00383

Chasing ever higher bond orders; the strange case of beryllium.

February 7th, 2022

Ever since the concept of a shared two-electron bond was conjured by Gilbert N. Lewis in 1916,[1] chemists have been fascinated by the related concept of a bond order (the number of such bonds that two atoms can participate in, however a bond is defined) and pushing it ever higher for pairs of like-atoms. Lewis first showed in 1916[1] how two carbon atoms could share two, four or six electrons to achieve a bond order of up to three. It took quite a few decades for this to be extended to four for carbon (and nitrogen) and that only with some measure of controversy and dispute (for one recent brief summary, see[2]).

For the transition elements over the last forty years or so, bond orders of four, five and even six between like atom pairs have been mooted and many characterised.[3] Moving to the left of the transition elements in the periodic table, this hunt has looked at elements such as beryllium. Eleven years back, I explored here how a Be=Be double bond could be formed, strangely enough as an electronically excited state of the dispersion-bound weak Be2 dimer.[4] This species had a calculated Be-Be distance of 1.78Å, resulting from double excitation from the 2s σ*-antibonding orbital into the degenerate π-bonding orbital above it, giving four electrons in bonding valence orbitals. In 2019, three articles appeared which showed how this bond order might be extended to the lofty heights of three as in Be≡Be[5],[6],[7] for (hypothetical) molecules in their ground electronic state. Here I discuss one example from these articles and compare it to the excited state observations made previously.

A useful starting point is the standard molecular orbital diagram for Be2, illustrating why the ground state singlet actually has a bond order of zero.
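The electron bookkeeping behind that diagram can be sketched in a few lines of Python. This is only a minimal illustration using the simple textbook formula (bonding minus antibonding electrons, halved). The valence occupations are assumed from the description above: the 2s σ and σ* orbitals are both doubly occupied in the ground state, and both σ* electrons are promoted into the π-bonding pair in the doubly excited state. The function name is purely illustrative.

```python
def bond_order(bonding_electrons: int, antibonding_electrons: int) -> float:
    """Simple MO bond order: half the excess of bonding over antibonding electrons."""
    return (bonding_electrons - antibonding_electrons) / 2


# Ground-state Be2 (valence electrons only): sigma(2s)^2 sigma*(2s)^2
ground = bond_order(bonding_electrons=2, antibonding_electrons=2)

# Doubly excited Be2: sigma(2s)^2 pi^2, with sigma*(2s) emptied
excited = bond_order(bonding_electrons=4, antibonding_electrons=0)

print(f"ground state: {ground}, doubly excited state: {excited}")
```

Such a count says nothing about how strong or short the resulting bond is, which is the point taken up further below.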

The three 2019 suggestions[5],[6],[7] modified this to surround the Be2 core with e.g. six Li atoms, resulting in a stable singlet species with a Be-Be distance (calculated at e.g. the CCSD/Def2-TZVP level) of 1.99Å and exhibiting C2h symmetry. The role of the Li is to polarise and repopulate Be orbitals by delocalization of e.g. a 2c-2e bond in Be2 dimer into a 6c-2e bond in Be2Li6. The reported calculations (as successfully replicated here, FAIR DOI: 10.14469/hpc/10106) show the resulting molecular orbitals for Be2Li6 comprise an (accidentally) degenerate π-pair and a higher energy weak σ-orbital, together forming the proposed triple bond. This of course inverts the normal ordering of such bonds, for which the σ-orbital is lower in energy (more stable) than the π-orbitals. The form of the σ-orbital is also reminiscent to some extent of the fourth σ-bond in C⩸C.

MOs for Be2Li6

  HOMO, σ orbital: -0.158 au
  HOMO-1, π-pair: -0.175 au
  HOMO-2, π-pair: -0.176 au

Because the static 2D projections shown in the articles cited above do not always make for easy interpretation, if you click on the orbital thumbnails, you will get dynamic 3D isosurfaces to rotate and inspect. These were generated using the tool at https://www.ch.ic.ac.uk/rzepa/cub2jvxl/

The two lower energy 2s σ-orbitals, which taken together do not apparently contribute to the overall bond order in Be2Li6, are shown below.

Lower energy MOs for Be2Li6

  σ: -0.235 au
  σ: -0.496 au

ELF (electron localisation function) integrations for Be2Li6 show each beryllium has two basins in the Be-Be region of about 2.5e each (red arrows) typical of triple bonds and two terminal Li-Be basins of 2.3e.

One aspect arising from my earlier post on the excited state Be=Be double bond relates to the reported calculated Be-Be bond length of 1.99Å and ν 718 cm-1 for ground state Be2Li6. To quote one article[5], “the Be≡Be triple bond in Li6Be2 may also be considered as another example of an ultraweak but ultrashort triple bond.” I had noted earlier that the electronically excited state of the Be2 dimer has a computed bond length of 1.78Å and ν 917 cm-1 for a double bond order, this being significantly shorter than the suggested ultrashort triple bond. We learn from this that the relationship between a bond order and a bond length may not always be linear. In other words, a longer bond may in fact have a higher bond order than a shorter bond between the same two atoms. The same was true as it happens with C⩸C; the mooted quadruple bond had a longer bond length than the triple bond in HC≡CH. That observation was controversial at the time; I suspect a similar phenomenon for Be has become less controversial.

To go back to the Be=Be dimer which started things off, consider again that excited state with one electron in each of the degenerate π-orbitals (actually a triplet state). What would happen if two electrons were to be added, making an excited state of [Be2]2-? This species (CCSD/Def2-TZVPPD) has a calculated bond length of 1.885Å and ν 766 cm-1. If this di-anion is stabilised with a continuum water field (a milder version of surrounding the dimer with Li atoms), the Be-Be length contracts to 1.74Å, the Be-Be stretch increases to 949 cm-1 and the σ-orbital becomes more stable than the π-orbitals. At the higher CCSD(T)/Def2-TZVPPD/SCRF=water level, the bond length still has the ultrashort value of 1.761Å, which might be assumed as the natural value for Be≡Be, a classical triple bond. From that perspective, the “ultraweak but ultrashort triple bond” predicted for Be2Li6 actually emerges as a relatively long triple bond!

Our final exploration is to add two lithium atoms to Be2 to form the neutral LiBe≡BeLi. This was done in stages (see FAIR DOI: 10.14469/hpc/10106), starting with a linear arrangement of atoms which revealed two negative force constants, then a C2h shape with one negative force constant and ending with a C2 (chiral!) geometry with no negative force constants. This has a Be≡Be length of 1.705Å (ωB97XD/Def2-TZVPPD/SCRF=water), ν 1129 cm-1, a Wiberg bond index of 2.98 and a Li-Be bond index of 0.0065, indicating an entirely ionic lithium and again a central [Be2]2- unit. As an excited state, it is 49.8 kcal/mol higher than the ground state of Be2Li2.
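To see the non-monotonic relationship between bond order and bond length at a glance, the values quoted in this post can be collected and sorted. The sketch below is only a tabulation of numbers already given above. The "nominal bond order" column is the order proposed in the text for each species (not something computed here), and the entries come from different levels of theory, so the comparison is qualitative at best.

```python
from typing import NamedTuple


class Species(NamedTuple):
    label: str
    nominal_bond_order: int  # as proposed in the text, not computed here
    be_be_length: float      # Å
    stretch: int             # Be-Be stretch, cm-1


data = [
    Species("Be2, doubly excited state", 2, 1.78, 917),
    Species("Be2Li6, ground state", 3, 1.99, 718),
    Species("[Be2]2-, excited state, gas phase", 3, 1.885, 766),
    Species("[Be2]2-, excited state, SCRF=water", 3, 1.74, 949),
    Species("LiBe≡BeLi, SCRF=water", 3, 1.705, 1129),
]

# List the species from shortest to longest Be-Be distance.
for s in sorted(data, key=lambda s: s.be_be_length):
    print(f"{s.be_be_length:.3f} Å  {s.stretch:>5} cm-1  "
          f"order {s.nominal_bond_order}  {s.label}")

# Which nominally triple-bonded species are longer than the double bond?
double = data[0]
longer_triples = [
    s.label
    for s in data
    if s.nominal_bond_order == 3 and s.be_be_length > double.be_be_length
]
print("Triple bonds longer than the double bond:", longer_triples)
```

In this small set the shorter bonds also have the higher stretching wavenumbers, regardless of the nominal bond order assigned to them.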

NBOs for LiBe≡BeLi

  HOMO, π-pair: -0.175 au
  HOMO, π-pair: -0.176 au
  HOMO-2, σ orbital: -0.158 au

So to conclude, we have seen two different motifs for constructing a model of a Be≡Be triple bond, one recently reported in the literature for a ground state species with six lithium atoms surrounding the Be2 dimer and a simpler one with just two lithiums exhibiting a much shorter Be≡Be bond but which requires electronic excitation to achieve. So these two motifs are not equivalent. But hopefully this exercise shows how playing around with atoms and electrons can achieve very unusual bonding states and elevated bond orders from which one can learn a lot, although with the caveat that one does not always produce molecules capable of facile synthesis!


On a slightly different theme, Cs can be shown to sustain three bonds, albeit all to different atoms. See DOI: 10.6084/m9.figshare.861030

[Li≡Li]4- can also be calculated as the tetra-anion, showing almost identical properties to [Be≡Be]2-, with a Li≡Li triple bond distance of 2.11Å. See DOI: 10.14469/hpc/10122.

Replication was necessary because the appropriate wavefunction files for analysis were not included in the supporting information. Only the coordinates were available for interoperation, and due to a quirk in the way Adobe Acrobat works, even those could not be easily transferred by a simple copy/paste operation to create a job input file. See e.g. here or DOI: 10.14469/hpc/10043 for more discussion. All the wavefunction files for this replication are available at the FAIR DOI noted above.

The Be-Be distance in catena(dimethylberyllium), a polymer comprising two bridging Me units connecting Be atoms, is only slightly longer at 2.09Å.[8] This fascinating transannular Be-Be interaction is one to be explored elsewhere.


The post has DOI: 10.14469/hpc/10125


References

  1. G.N. Lewis, "THE ATOM AND THE MOLECULE.", Journal of the American Chemical Society, vol. 38, pp. 762-785, 1916. https://doi.org/10.1021/ja02261a002
  2. H.S. Rzepa, "Routes involving no free C2 in a DFT-computed mechanistic model for the reported room-temperature chemical synthesis of C2", Physical Chemistry Chemical Physics, vol. 23, pp. 12630-12636, 2021. https://doi.org/10.1039/d1cp02056k
  3. D. Lu, P.P. Chen, T. Kuo, and Y. Tsai, "The MoMo Quintuple Bond as a Ligand to Stabilize Transition‐Metal Complexes", Angewandte Chemie International Edition, vol. 54, pp. 9106-9110, 2015. https://doi.org/10.1002/anie.201504414
  4. P.J. Bruna, and J.S. Wright, "Strongly bound doubly excited states of Be2", Canadian Journal of Chemistry, vol. 74, pp. 998-1004, 1996. https://doi.org/10.1139/v96-111
  5. S.S. Rohman, C. Kashyap, S.S. Ullah, A.K. Guha, L.J. Mazumder, and P.K. Sharma, "Ultra‐Weak Metal−Metal Bonding: Is There a Beryllium‐Beryllium Triple Bond?", ChemPhysChem, vol. 20, pp. 516-518, 2019. https://doi.org/10.1002/cphc.201900051
  6. X. Liu, R. Zhong, M. Zhang, S. Wu, Y. Geng, and Z. Su, "BeBe triple bond in Be2X4Y2 clusters (X = Li, Na and Y = Li, Na, K) and a perfect classical BeBe triple bond presented in Be2Na4K2", Dalton Transactions, vol. 48, pp. 14590-14594, 2019. https://doi.org/10.1039/c9dt03321a
  7. S.S. Rohman, C. Kashyap, S.S. Ullah, L.J. Mazumder, P.P. Sahu, A. Kalita, S. Reza, P.P. Hazarika, B. Borah, and A.K. Guha, "Revisiting ultra-weak metal-metal bonding", Chemical Physics Letters, vol. 730, pp. 411-415, 2019. https://doi.org/10.1016/j.cplett.2019.06.023
  8. A.I. Snow, and R.E. Rundle, "The structure of dimethylberyllium", Acta Crystallographica, vol. 4, pp. 348-352, 1951. https://doi.org/10.1107/s0365110x51001100

Data base or Data repository? – A brief and very selective history of data management in chemistry.

January 26th, 2022

Way back in the late 1980s or so, research groups in chemistry started to replace the filing of their paper-based research data by storing it in an easily retrievable digital form. This required a computer database and initially these were accessible only on specific dedicated computers in the laboratory. These gradually changed from the 1990s onwards into being accessible online, so that more than one person could use them in different locations. At least where I worked, the infrastructures to set up such databases were mostly not then available as part of the standard research provisions and so had to be installed and maintained by the group itself. The database software took many different forms and it was not uncommon for each group in a department to come up with a different solution that suited its needs best. The result was a proliferation of largely non-interoperable solutions which did not communicate with each other. Each database had to be searched locally and there could be ten or more such resources in a department. Knowledge of how the system operated also often resided in just one person, and tended to evaporate when this guru left the group.

After the millennium, two newcomers started to appear, one being called an ELN (electronic laboratory notebook) and the second a data repository. The first was a heavily customised database containing research data as obtained from instruments, computers, images/video, chemical structure drawings etc. ELNs, even to this day, have limitations of interoperability with other ELNs and the contents of an ELN are often closed, requiring authentication credentials to access. The data repository also started to appear in chemistry around this period. Even in its early incarnations, it could be associated with an ELN “front end” as part of the data pipeline; an early example of this coupling is described here.[1] Another key phrase that became associated with repositories starting around 2014 was the concept of FAIR, including ideas such as the Findability (discoverability) and Interoperability of data, a theme often explored and illustrated on this blog.

These last seventeen years have seen organisations such as funding agencies and publishers increasingly mandating the use of such data management methods, using either a repository on its own or a combination of an ELN and repository as routine operations in research activity and publication processes. The close coupling of an ELN and repository is still however uncommon.

A colleague recently alerted me to a computational chemistry repository first launched in 2014: www.iochem-bd.org. Reading the "about" text, I found these statements:

  • Chem-BD is a digital repository aimed to manage and store Computational Chemistry files.
  • Goals: Build a distributed database of computational chemistry results: reduce size and increase value.
  • Set a common data standard among all quantum chemistry legacy formats (XML – CML[2])

So this is both a database and a data repository, as well as espousing a commendable common data standard![2] I decided to explore the first two aspects here using this resource as an example.

  • Whilst the absolute distinction between the two types can be blurry, the crucial difference between the two is that a database functions on curation via a structured index of the data, whilst a repository aspires to having FAIR attributes primarily through its metadata as exposed by registration (metadata is data describing the data).
  • A database holds this data index locally and the Findability of the data is associated purely with the functionality of  the database. The data structures are defined by a database schema, describing in detail all the terms indexed (a key and its value) and searched using the values of these key pairs. This schema is unlikely to be exactly the same as e.g. databases on related topics, largely because the database is self-contained and self-consistent.
  • A data repository also uses a schema (DOI: 10.14454/3w3z-sa82 and[3]) to express the key pairs, but this time it is expressed as metadata. Now, this metadata is registered externally to the repository using a registration agency.[3] The metadata for each deposited object is assigned a persistent identifier known as a DOI. Although it might be indexed and searchable locally, it must be capable of also being searched in aggregated/federated form using services provided by registration or other agencies. This independence of metadata is part of those FAIR criteria.
  • Whereas a database can be very finely grained in order to describe individual properties of an object, repository metadata tends to be more coarsely grained to describe the object as a whole, to place it in context and to impart provenance.
  • Both databases and repositories can have what is called an API (application programming interface) to allow machine access (the A of FAIR) to the contents. Accessing the former would normally require bespoke code to be written and possibly authentication credentials, whereas the information needed to access repository-held data is provided via the registered metadata (which does not normally require credentials). Access to the repository may also require code, but if the metadata is carefully standardised by adherence to the schema, the code can be made more general than that required for a database.
  • A typical entry in the www.iochem-bd.org repository has a DOI of 10.19061/iochem-bd-4-36
  • This DOI is registered with the CrossRef agency, one normally used for registering journal articles, rather than DataCite which is used for registering data and other research objects. The metadata for this DOI can be viewed using the resolution service https://api.crossref.org/works/10.19061/iochem-bd-4-36/transform/application/vnd.crossref.unixsd+xml and shows that it largely contains the bibliographic information typical of a journal article. So in this sense it is certainly a repository, but using a metadata schema that is more frequently used for journal articles than for data sets.
  • The CrossRef metadata record also has an item <resource>https://www.iochem-bd.org/handle/10/235025</resource> which points to the so-called landing page for that item, but information about the properties of the actual data itself must be instead obtained directly from the repository. 
  • Because the metadata describing the data is only held at this repository and not elsewhere (a local metadata record), it can only be queried locally and the query cannot be upon aggregated metadata  provided by the registration agency. A machine query would have to be constructed by coding a suitable request using the API provided for the database aspect of this repository. 
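As a rough illustration of the machine access discussed in the list above, the following Python sketch builds the CrossRef metadata URL for a DOI, following the URL pattern quoted earlier, and pulls the landing page out of a metadata record. The function names are my own, and the XML used here is a cut-down, made-up stand-in for the real record (only the <resource> element is taken from the text), so that the example runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works"
UNIXSD = "application/vnd.crossref.unixsd+xml"


def crossref_metadata_url(doi: str, media_type: str = UNIXSD) -> str:
    """Build the metadata URL for a DOI, following the pattern quoted in the text."""
    return f"{CROSSREF_WORKS}/{quote(doi, safe='/')}/transform/{media_type}"


def landing_pages(metadata_xml: str) -> List[str]:
    """Return the text of every <resource> element, ignoring XML namespaces."""
    root = ET.fromstring(metadata_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "resource" and el.text
    ]


url = crossref_metadata_url("10.19061/iochem-bd-4-36")

# Hypothetical, heavily simplified record; the real one contains much more.
sample = (
    "<doi_record><doi_data>"
    "<resource>https://www.iochem-bd.org/handle/10/235025</resource>"
    "</doi_data></doi_record>"
)

print(url)
print(landing_pages(sample))
```

In real use the XML would be fetched from the constructed URL; that step is omitted here to keep the sketch self-contained.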

This example has served to highlight just a few of the often quite subtle distinctions between e.g. a database and a data repository, and that some examples can indeed be both. It also highlights that repositories can have the attributes of FAIR, which in themselves are driven by asking “what could a machine do to obtain data?” rather than what could a human achieve by browsing. So another question that arises when evaluating the characteristics of a repository is whether each item held there has a FAIR-enabling metadata record describing the data, a record which is registered in a manner that can be aggregated and hence used to find and access content across multiple independent repositories.


This post has DOI 10.14469/hpc/10043


Indeed in that era, few online/Internet infrastructures were available as part of departmental resources.

See also here.

In this last regard, I note a workshop devoted largely to such interoperability and machine access in chemistry coming up soon; https://www.cecam.org/workshop-details/1165

The CrossRef schema is not referenced using an assigned DOI: data.crossref.org/reports/help/schema_doc/5.3.1/.

An example can be seen at doi: 10.14469/hpc/10059 Here, invoking a hyperlink based purely on the data DOI and the data media type required in turn calls code (Javascript) which retrieves the metadata held for that DOI and parses it to identify whether it indicates the presence of a file manifest. If it does, it identifies the type of manifest (ORE in this case) and the media types the manifest points to and finally uses that manifest to then retrieve data filtered by media type and pipes it into a visualiser (JSmol). In this case the endpoint is visualisation, but it could also be e.g. piped into an AI/ML program for analysis. In this case only one instance of data is machine retrieved, but in principle it could be a multitude of data files obtained from a multitude of different locations and based on a multitude of criteria as filtered by suitable searches of registered metadata.[4]
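The retrieval steps described in the last note above (metadata, then manifest, then files filtered by media type) can be sketched in Python. This is not the Javascript actually used. The dictionary keys, the example URLs and the helper names are all hypothetical, chosen only to show the shape of the logic on in-memory data; a real implementation would parse the registered metadata and an ORE manifest instead.

```python
from typing import Dict, List, Optional


def find_manifest(metadata: Dict) -> Optional[Dict]:
    """Return the first related item flagged as a manifest, or None if absent."""
    for item in metadata.get("related_items", []):
        if item.get("role") == "manifest":
            return item
    return None


def select_files(entries: List[Dict], media_type: str) -> List[str]:
    """Keep only the URLs of manifest entries with the requested media type."""
    return [e["url"] for e in entries if e.get("media_type") == media_type]


# Hypothetical metadata record for a data DOI (structure invented for this sketch).
metadata = {
    "doi": "10.14469/hpc/10059",
    "related_items": [
        {
            "role": "manifest",
            "manifest_type": "ORE",
            "entries": [
                {"url": "https://example.org/data/a.mol",
                 "media_type": "chemical/x-mdl-molfile"},
                {"url": "https://example.org/data/a.pdf",
                 "media_type": "application/pdf"},
            ],
        }
    ],
}

manifest = find_manifest(metadata)
wanted = select_files(manifest["entries"], "chemical/x-mdl-molfile") if manifest else []
print(manifest["manifest_type"] if manifest else "no manifest", wanted)
```

The selected URLs could then be handed to a visualiser or to an analysis program, and the same filter could be run over records gathered from many repositories.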


References

  1. M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014. https://doi.org/10.1021/ci500302p
  2. P. Murray-Rust, and H.S. Rzepa, "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", Journal of Chemical Information and Computer Sciences, vol. 39, pp. 928-942, 1999. https://doi.org/10.1021/ci990052b
  3. H. Cousijn, T. Habermann, E. Krznarich, and A. Meadows, "Beyond data: Sharing related research outputs to make data reusable", Learned Publishing, vol. 35, pp. 75-80, 2022. https://doi.org/10.1002/leap.1429
  4. H.S. Rzepa, and S. Kuhn, "A data‐oriented approach to making new molecules as a student experiment: artificial intelligence‐enabling FAIR publication of NMR data for organic esters", Magnetic Resonance in Chemistry, vol. 60, pp. 93-103, 2021. https://doi.org/10.1002/mrc.5186