Category: crystal_structure_mining

  • Quantum crystallography: The structure and C-C bond length alternation of [18]-annulene.

    In my story about one of the molecules of the year, cyclo[48]carbon,[cite]10.1126/science.ady6054[/cite] I noted that the DFT method used in the literature to model the C-C bond length alternation around the ring (OX B3LYP30[cite]10.1021/acsnano.4c14100[/cite]) had been re-calibrated against a remeasured crystal structure[cite]10.5517/ccdc.csd.cc2gzmz2[/cite] of C18H18 or [18]-annulene (below) in order to reproduce the observed values for this molecule.


    [18]-annulene

    A noteworthy aspect of this structure is the six hydrogen atoms pointing into the centre of the ring, which come into very close contact with each other. The conventional method of refining the crystal structure (which assumes that the electron density surrounding H, and indeed the other atoms, is spherical) results in C-H distances which are too short by about 0.1Å, with the knock-on effect that the H…H separations are too long. The recently introduced refinement method NoSpherA2, which uses DFT-calculated non-spherical atomic electron density distributions rather than spherical ones, produces more sensible values for e.g. C-H distances[cite]10.59350/5dy8w-0zs92[/cite] and so, by implication, much shorter inner H…H distances for [18]-annulene. The question now is: do these shorter H…H distances in turn have any effect on the C-C ring distances, and hence affect the alternation of these distances around the ring and the resulting outcome of the calibration process used to develop the OX B3LYP30 method?

    Method: We decided to re-refine the structure of [18]annulene (CCDC refcode ANULEN03[cite]10.5517/ccdc.csd.cc2gzmz2[/cite]) using modern quantum crystallography (NoSpherA2[cite]10.1107/S0021889808042726[/cite],[cite]10.1039/d0sc05526c[/cite]). To do this, we used Def2-SVP as the basis set and wB97X-V as the method, with a multiplicity of 2 in the settings for the OLEX2 program.

    The published structure has the molecule sitting across a centre of symmetry so only half of it is unique, and it was found to be disordered with a second orientation of the complete molecule (effectively the macrocycle rotated in plane by ca. 30°) in a ca. 84:16 ratio. This caused trouble with the quantum crystallography refinements as allowing all of the hydrogen atoms to be positionally free (i.e. removing the AFIX commands) and anisotropic at the same time caused 6 of the 8 hydrogen atoms of the minor occupancy component to “wander off” into chemically nonsensical positions, and 4 of the major occupancy plus all 8 of the minor occupancy hydrogen atoms went non positive definite (one of the thermal ellipsoid radii refined to a negative length).

    However, we discovered that doing the refinement in stages allowed a more settled structure. Starting with the published structure and allowing all of the hydrogen atoms to be positionally free gave a nice stable result. Allowing the hydrogens to go anisotropic afterwards did result in 1 of the major occupancy and all 8 of the minor occupancy hydrogen atoms going non positive definite (n.p.d.), but the positions of the hydrogen atoms remained sensible. Subsequently reverting all 8 hydrogen atoms of the minor occupancy component to be isotropic resulted in a stable and sensible refinement where the sole non positive definite atom of the major occupancy component corrected itself into being normal (i.e. no longer non positive definite). This is the re-refined version of the structure we used for further analysis below.[cite]10.14469/hpc/15681[/cite]

    Analysis: The closest H···H separations for the “inner” hydrogen atoms of the major occupancy orientation emerge as 1.8276(9), 1.8791(9) and 1.9022(8) Å (Figure 1, mean 1.870Å). This compares with the values extracted from the published structure of 1.99252(4), 2.02490(3) and 2.05217(3) Å (mean 2.0232Å) for the major occupancy orientation, a difference of Δ = -0.153Å.
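
    The means and the shift quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the distances as printed (standard uncertainties dropped); the variable names are my own. Note that the unrounded shift is about -0.1536Å, whereas the quoted -0.153Å follows from subtracting the two rounded means.

```python
# H...H separations in Å for the major occupancy orientation, copied from the text.
nosphera2 = [1.8276, 1.8791, 1.9022]      # re-refined with NoSpherA2
published = [1.99252, 2.02490, 2.05217]   # original published refinement

def mean(values):
    """Arithmetic mean of a list of distances."""
    return sum(values) / len(values)

mean_new = mean(nosphera2)
mean_old = mean(published)
shift = mean_new - mean_old                            # unrounded difference
shift_rounded = round(mean_new, 3) - round(mean_old, 4)  # from the rounded means

print(f"NoSpherA2 mean H...H: {mean_new:.3f} Å")
print(f"Published mean H...H: {mean_old:.4f} Å")
print(f"Shift (unrounded):    {shift:+.4f} Å")
print(f"Shift (rounded means): {shift_rounded:+.4f} Å")
```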

    Figure 1.

    A search for other close H…H contacts: A search of the crystal structure database for close intramolecular H…H distances of <1.9 Å (< 100K, R < 0.05, no errors, excluding H-C-H substructures) reveals the following distribution (Figure 2). Although examples of distances <1.9 Å are relatively sparse (95, February 2026), they are not that unusual. It is highly probable that all these examples were determined using the classical method of spherical atoms. It is to be expected that in the future, examples refined using non-spherical atoms will start appearing – and that one will be specifically able to search for such analyses.

    Figure 2.

    We focussed on just one of the entries in the figure: DOCKEO[cite]10.1021/acs.orglett.9b00717[/cite],[cite]10.5517/ccdc.csd.cc21qkbp[/cite] (a masked [14]annulene), an example (see Figure 3) of a compound having an even shorter apparent H…H contact of ~1.65Å (after C-H distance correction). This was also subjected to NoSpherA2[cite]10.1107/S0021889808042726[/cite],[cite]10.1039/d0sc05526c[/cite] analysis. The structure is in the chiral space group P212121 with no firm indication of the correct enantiomer (the Flack parameter of 0.3(4) is indeterminate, with an error large enough to encompass the whole range). It was initially refined as a 2-component racemic twin (using TWIN/BASF) to no real effect. Although this is the standard approach when the Flack parameter is far from zero, it was not really surprising that it had no effect, given the large error (σ = 0.4). Next it was noticed that the original authors had not modelled some evident disorder in one of the CF3 groups. Since the fluorine thermal parameters were reasonable, ignoring it is understandable, but the largest residual electron density peaks were around this group in obvious disorder positions and, with the extra precision desired in quantum crystallography refinements, it was best to model this. A quick rough and ready approach was adopted, one not to be used in a structure of “publication quality”, but enough to “soak up” the electron density. Next, NoSpherA2 was used in a refinement that relaxed the H atom positions (no AFIXes). This worked sensibly, though it had fairly little effect on the R-factor. However, refining the hydrogen atoms anisotropically went poorly; of the 15 hydrogen atoms, 5 went n.p.d., another 5 went nearly n.p.d., and only 2 of them could be described as approaching reasonable. Ultimately, the hydrogen atoms were handled isotropically. Finally, adding an extinction parameter caused a final 0.2% drop in the R-factor and the H…H distance of closest approach emerged as 1.600Å (Figure 3).

    Figure 3.

    It is worth noting that this distance is not what it might seem. Thus the calculated DFT H-H distance (using r2SCAN-3c) is 1.8975Å, or Δ0.2975Å. It corresponds to a calculated double minimum potential energy well. However, a Cs-symmetric form with the hydrogen located at the centre of this double well turns out to be a transition state with a shorter H…H separation of 1.7393Å. The imaginary calculated transition mode of νi 61 cm-1 is associated with a tiny free energy barrier of ~0.03 kcal/mol, well below a quantum of vibrational energy and hence the observed hydrogen will in fact correspond to that of a single minimum potential well. The lesson learnt from this analysis is that measured distances (for a single potential well) and calculated distances (for a double potential well) may not always correspond and care must be taken in interpreting such distances.
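
    To put the size of that barrier in context, the wavenumber of the imaginary mode can be converted to kcal/mol. The sketch below is my own rough comparison, not part of the original calculations: it assumes the magnitude of the 61 cm-1 mode can be used as an energy scale for motion along that coordinate, and it uses the standard equivalence 1 kcal/mol ≈ 349.755 cm-1.

```python
# Rough energy-scale comparison (an assumption-laden sketch, not a rigorous treatment).
CM1_PER_KCAL_MOL = 349.755   # wavenumbers equivalent to 1 kcal/mol

def wavenumber_to_kcal_mol(wavenumber_cm1):
    """Convert an energy expressed in cm^-1 to kcal/mol."""
    return wavenumber_cm1 / CM1_PER_KCAL_MOL

mode_cm1 = 61.0          # magnitude of the imaginary mode quoted in the text
barrier_kcal = 0.03      # free energy barrier quoted in the text, kcal/mol

quantum_kcal = wavenumber_to_kcal_mol(mode_cm1)   # one vibrational quantum
zero_point_kcal = 0.5 * quantum_kcal              # harmonic zero-point level

print(f"One quantum at {mode_cm1:.0f} cm-1: {quantum_kcal:.3f} kcal/mol")
print(f"Harmonic zero-point level:  {zero_point_kcal:.3f} kcal/mol")
print(f"Barrier below zero-point level: {barrier_kcal < zero_point_kcal}")
```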

    C-C Distances in [18]-annulene. The nine unique pairs of C-C distances in the measured structure of [18]annulene derive from Figure 4 and the atom numbering shown there.

    Figure 4.

    Table 1. Crystallographic C-C bond lengths and Δ differences for [18]-annulene, Å

    Original refinement[cite]10.5517/ccdc.csd.cc2gzmz2[/cite]
                                       old Δ
    C2 C3 1.403(2)   C2 C1 1.3883(17)  0.0147
    C3 C4 1.3913(14) C2 C3 1.403(2)    0.0117
    C5 C4 1.3926(15) C3 C4 1.3913(14)  0.0013
    C5 C6 1.4056(18) C5 C4 1.3926(15)  0.0130
    C7 C6 1.3870(15) C6 C5 1.4056(18)  0.0186
    C8 C7 1.3927(15) C7 C6 1.3870(15)  0.0057
    C8 C9 1.4032(19) C8 C7 1.3927(15)  0.0105
    C9 C1 1.3897(15) C8 C9 1.4032(19)  0.0135
    C2 C1 1.3883(17) C9 C1 1.3897(15)  0.0014
                                  Mean 0.0100Å
    NoSpherA2 refinement                New Δ     old Δ
    C2 C3 1.4036(12)  C2 C1 1.3948(11)  0.0088   0.0147
    C3 C4 1.3954(9)   C2 C3 1.4036(12)  0.0082   0.0117
    C5 C4 1.3939(10)  C3 C4 1.3954(9)   0.0015   0.0013
    C5 C6 1.4064(11)  C5 C4 1.3939(10)  0.0125   0.0130
    C7 C6 1.3928(10)  C5 C6 1.4064(11)  0.0136   0.0186
    C8 C7 1.3938(10)  C7 C6 1.3928(10)  0.0010   0.0057
    C8 C9 1.4126(12)  C8 C7 1.3938(10)  0.0188   0.0105
    C9 C1 1.3846(10)  C8 C9 1.4126(12)  0.0280   0.0135
    C2 C1 1.3948(11)  C9 C1 1.3846(10)  0.0102   0.0014
                                   Mean 0.0114Å  0.0100Å
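
    The Δ columns and means in Table 1 can be reproduced with a short script. This is a minimal sketch: the bond lengths are copied from the table (standard uncertainties dropped) and listed in ring order C1-C2, C2-C3 ... C9-C1, and the function name is my own. Each Δ is the absolute difference between a bond and its neighbour around the ring.

```python
# C-C bond lengths in Å, in ring order C1-C2, C2-C3, ..., C8-C9, C9-C1 (from Table 1).
original = [1.3883, 1.4030, 1.3913, 1.3926, 1.4056, 1.3870, 1.3927, 1.4032, 1.3897]
nosphera2 = [1.3948, 1.4036, 1.3954, 1.3939, 1.4064, 1.3928, 1.3938, 1.4126, 1.3846]

def bond_length_alternation(bonds):
    """Return (list of |difference| between neighbouring bonds, mean difference).

    The list is treated as cyclic, so the first bond is compared with the last.
    """
    diffs = [abs(bonds[i] - bonds[i - 1]) for i in range(len(bonds))]
    return diffs, sum(diffs) / len(diffs)

old_diffs, old_mean = bond_length_alternation(original)
new_diffs, new_mean = bond_length_alternation(nosphera2)

print("old Δ:", [f"{d:.4f}" for d in old_diffs], f"mean {old_mean:.4f} Å")
print("new Δ:", [f"{d:.4f}" for d in new_diffs], f"mean {new_mean:.4f} Å")
```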
    

    Shown below is the atom numbering used in the r2-SCAN-3C DFT geometry optimisation[cite]10.14469/hpc/15615[/cite] (Figure 5).

    Figure 5

    Table 2. Computed r2-SCAN-3c C-C bond lengths and Δ differences for [18]-annulene, Å

                                     Δ
    1 5   1.40751  5 11  1.39010  0.01741
    5 11  1.39010 11 3   1.39020  0.00010
    11 3  1.39020  3 15  1.40755  0.01735
    3 15  1.40755 15 13  1.39017  0.01738
    15 13 1.39017 13 7   1.38998  0.00019
    13 7  1.38998  7 9   1.40748  0.01750
    7 9   1.40748  9 35  1.39009  0.01739
    9 35  1.39009 35 19  1.39009  0.00000
    35 19 1.39009 19 23  1.40752  0.01743
                             Mean 0.0116
    

    Conclusions: We set out to study the extent to which the C-C distances in the [18]-annulene molecule, as used to calibrate a modified DFT method,[cite]10.1126/science.ady6054[/cite] could be affected by steric compressions in the centre of the ring caused by close approaches of the inward pointing hydrogens. The NoSpherA2 method of crystal structure refinement results in only a slight increase in the C-C bond length alternation around the ring, from 0.0100 to 0.0114Å. Nevertheless, given that this analysis is quick and easy to perform, there is no reason not to use it as a standard method for structures used for calibration purposes. The newly re-refined bond length alternation compares with 0.0116Å calculated using the r2-SCAN-3c DFT procedure and 0.0112Å calculated using the original literature[cite]10.1126/science.ady6054[/cite] OX B3LYP30 method, which had been calibrated against this distance. Both DFT methods are thus seen to perform very well against the measured bond length alternation. Clearly, however, there is a need to undertake more such studies for a clearer understanding of the performance of DFT methods in this area.


    As the macrocycle sits across a centre of symmetry there are only 3 unique H…H distances and 9 unique C-C differences.

  • Cyclo-Heptasulfur, S7 – a classic anomeric effect discovered during a pub lunch!

    Way back in 1977, the crystal structure of the sulfur ring S7 was reported.[cite]10.1002/anie.197707151[/cite] The authors noted that “The δ modification of S7 contains bonds of widely differing length: this has never been observed before in an unsubstituted molecule.” No explanation was offered, although they note that similar effects have been observed in S8O, S7I+ and S7O. The S7 molecule was brought to my attention yesterday (thanks Derek!) over a pub lunch and, in the time-honoured manner of scientists, sketched out on a napkin – with a pen obtained from the waitress! As an “organic chemist”, I immediately thought “anomeric effects”. And so indeed it has proven. A calculation using the MN15L/Def2-TZVPP DFT method and analysis using the Weinhold NBO7 procedure[cite]10.14469/hpc/15228[/cite] reveals the following structure (with Cs symmetry) and indeed the four unique S-S distances are all different (experimental values in parentheses). So how does this arise?

    Effect 1 is the donation of a lone pair from sulfur S4 or S2 into the antibonding orbital of the long S3-S7 bond labelled 2.174Å. The NBO E(2) perturbation energy is 12.35 kcal/mol, a fairly large effect when you consider that the more conventional value involving oxygen instead of sulfur is ~16 kcal/mol. There are two such donations (black and red) and so this long bond is doubly lengthened. Simultaneously the S4-S7 or S2-S3 bonds associated with the donor sulfur are shortened to 1.982Å.

    You can see the orbitals involved below (click on the image to obtain a 3D rotatable model) and consider that the blue phase overlaps positively with the purple and also the red with orange. These overlaps conspire to move electrons from the S4 lone pair into the S4-S7 bond and to move electrons from the S3-S7 bond into an S3 lone pair and hence to shorten the first to give it some π-bond character (Wiberg bond index 1.1796) and to lengthen the second bond (Wiberg bond index 0.8295).

    Effect 2 is the donation of a lone pair from sulfur S3 or S7 into the antibonding orbital of the S1-S2 bond with length 2.087Å. Only one donation – E(2) is now 10.12 kcal/mol – for each of the two S-S antibonding orbitals occurs (S1-S2 and S4-S5) and hence the lengthening of these is less than before. This again serves to shorten the S2-S3 and S4-S7 bonds labelled with the distance of 1.982Å

    A smaller effect (E(2) 4.6 kcal/mol) occurs between S2/S4 and S1-S6/S5-S6.

    So this adds a nice stereoelectronic explanation to an observation made almost 50 years ago. Perhaps this example should be included in all taught inorganic curricula?


    Postscript: The S-S stretching frequencies vary a great deal. The symmetric and antisymmetric S2-S3 and S4-S7 modes are respectively ν 564 and 557 cm-1, whilst the S3-S7 mode is much lower at 370 cm-1.


    Postscript 1: The smaller S5 ring also shows this effect, but to a smaller extent (E(2) = 6.1 kcal/mol), with νS1-S2 = 382 vs 546 and 540 cm-1.

    Also for fun, how about singlet state cyclo-O7 (heptaoxolane)? Unsurprisingly, the anomeric effects noted for S7 itself are amplified to the point that the molecule dissociates to O3 and 2O2 (singlet).

    Finally, singlet state cyclo-O5 (pentaoxolane)

    Here the νO-O values cover the remarkable range 1519, 1101, 953, 227 and 200 cm-1 (purple values in diagram above).

    These vibrations are associated with the following NBO E(2) energies (kcal/mol): O2Lp-O3O4σ* 34.1, O2Lp-O1O5σ* 23.3, O4Lp-O1O5σ* 20.7, O5Lp-O3O4σ* 19.9, O1Lp-O2O3σ* 12.1, O3Lp-O1O2σ* 10.6.

    In addition to these lone pair to σ* interactions, there are two very high σ to σ* interactions (O1O5 to O3O4* 39.9 and O3O4 to O1O5* 33.5 kcal/mol) which strongly suggest very high so-called multi-reference character to the wavefunction.

    Although not a molecule that is ever likely to be isolated in a laboratory, cyclo-O5 still has a lot to teach us.


    Note added April 2026: Anomeric effects in linear polysulfide anions such as S82-[cite]https://doi.org/10.5517/ccykj88[/cite] were previously noted on this blog[cite]10.59350/ae8gx-pqy35[/cite].

  • Blue blood.

    Respiratory pigments are metalloproteins that transport O2, the best known being the bright red/crimson coloured hemoglobin in human blood. The colour derives from Fe2+ at the core of a porphyrin ring. But less well known is blue blood, and here the colour derives from an oxyhemocyanin unit based on Cu1+ (the de-oxy form is colourless) rather than iron. See below for the carapace of a red rock crab.

    Here I take a look at this very unusual structure, the core of which is an imidazole ring coordinated via nitrogen to the metal Cu.
    A search of the crystal structure database for the following sub-structure reveals 12 hits, with O-O distances ranging from 1.37 to 1.54Å. A histogram of the O-O lengths in the Cu(O-O)Cu sub-structure shown below reveals quite a distribution amongst the 12 known examples.

    Of these, one (UTETEU[cite]10.1039/D2FD00162D[/cite], DOI: [cite]10.5517/ccdc.csd.cc1l9d7j[/cite]) is perhaps the closest to the oxyhemocyanin core, albeit with the imidazole heterocycle replaced by the isomeric pyrazole ring (no Ag or Au examples are known). The overall 2+ charge deriving from two Cu1+ units is internally balanced with two 4-coordinate B1- end caps, and this system was chosen as the starting model for some computational studies.[cite]10.1021/ja00030a025[/cite]

    Firstly, the crystal structure reveals an O-O distance of 1.531Å; the O=O distance (from crystal structures where it is present) is ~1.24Å (DOI: 10.5517/cct597h) for neutral (triplet?) oxygen, ~1.50Å for the dianion O22- and 1.32Å for the monoanion O21-[cite]10.1039/A800952J[/cite].

    Computational models were constructed at the ωB97XD/Def2-SVPP level, FAIR Data DOI: 10.14469/hpc/12584.

    The computed O-O distance for a singlet state of the complex is shorter than that measured in the crystal structure (1.368 vs 1.531Å). At the better Def2-TZVPP basis set level, the O-O bond length is 1.379Å, still shorter. A model of singlet state oxyhemocyanin itself (Def2-TZVPP) as a di-cation (these charges are balanced by carboxylate anions from the surrounding protein) shows a very similar O-O bond length (1.361Å).

    How about the oxyhemocyanin as a triplet state, the same state as isolated oxygen itself? Oxyhemocyanin now has an O-O distance of 1.477Å (Def2-TZVPP) and a Cu-O distance of 1.972Å (1.934Å from the crystal structure of UTETEU). The UTETEU analogue has a calculated distance of 1.483Å (crystal structure 1.531Å), which strongly suggests that this system exists as a triplet rather than as a singlet spin state (click on image below to view as a 3D model).
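
    The spin-state argument is a comparison of computed against measured O-O distances, which can be sketched as below. The numbers are those quoted above for the UTETEU model; the dictionary layout and names are mine, and a single bond length is of course only one indicator of the spin state.

```python
# O-O distances in Å for the UTETEU model system, as quoted in the text.
crystal_o_o = 1.531
computed_o_o = {
    "singlet": 1.379,   # Def2-TZVPP singlet
    "triplet": 1.483,   # calculated triplet value for the UTETEU analogue
}

# Signed deviation of each computed value from the crystal structure.
deviation = {state: value - crystal_o_o for state, value in computed_o_o.items()}

# The state whose O-O distance lies closest to experiment.
best_state = min(deviation, key=lambda state: abs(deviation[state]))

for state, delta in deviation.items():
    print(f"{state:8s} {computed_o_o[state]:.3f} Å  (calc - crystal = {delta:+.3f} Å)")
print("Closest to crystal structure:", best_state)
```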

    The spin density in UTETEU is shown above, which indicates that the two unpaired electrons are delocalised on Cu, nitrogen and O atoms, compared to only the oxygen in O2 itself.

    So we may conclude from this brief investigation that this component of “blue blood” captures oxygen as a sandwich between two copper atoms (a mode very unlike the iron equivalent in hemoglobin), and moreover that the spin state in this capture retains the triplet motif of gaseous oxygen itself, whilst the spin density of the unpaired electrons is delocalised over copper, nitrogen and oxygen.


    This post has DOI: 10.14469/hpc/13111


  • Tunable aromaticity? An unrecognized new aromatic molecule?

    Some time ago in 2010, I showed a chemical problem I used to set during university entrance interviews. It was all about pattern recognition and how one can develop a hypothesis based on this. In that instance, it involved recognising that a cyclic molecule which appeared to have the cyclohexatriene benzene-aromatic pattern 1 was in fact a trimer of carbon dioxide. Perhaps small amounts of this aromatic molecule exist in solutions of fizzy drinks? Analysing these patterns occupied about 10-20 minutes of an interview, and although you might think I was posing a difficult challenge, many students successfully rose to it! Now I revisit, but with a slightly better reality check on a related molecule 2 (cyanuric acid).


    As many as 58 examples of crystal structures of 1,3,5-triazinane-2,4,6-trione 2 (cyanuric acid) are known, often with a co-adduct. Cyanuric acid is in effect a cyclic trimer of isocyanic acid rather than of carbon dioxide. These examples tend to be planar, with a mean C-N ring distance of ~1.37Å and a C-O distance of 1.22Å. 

    Two outliers stand out, both from a very recently published article, each being a co-adduct with melamine (1,3,5-triazine-2,4,6-triamine).[cite]10.1016/j.apsusc.2022.155161[/cite] QACSUI02 exhibits a shorter C-N distance of ~1.33Å but a longer C-O distance of 1.32Å and has a symmetrical pattern of hydrogen bonds to the six receptors of the central unit. Could this correspond more closely to the cyclohexatriene resonance structures shown to the left of the diagram at the top? The first task is to see if these bond lengths can be replicated using calculation (often a useful procedure to check that the crystal structure is correct). For this purpose, the structure below was chosen as the starting point for various models, using an ωB97XD/Def2-TZVPP model.

    Model                                C-N distance, Å   C-O distance, Å
    QACSUI02 (crystal structure)         1.331             1.318
    ωB97XD/Def2-TZVPP as single layer    1.3678            1.2185
    ωB97XD/Def2-TZVPP three layers       1.365             1.218
    ωB97XD/Def2-TZVPP no H-bonds         1.3816            1.2002

    XAKSOU (crystal structure)           1.367             1.208
    ωB97XD/Def2-TZVPP                    1.3670            1.2213

    This creates a mystery. The calculated bond lengths show that whilst H-bonding to the central ring decreases the C-N length by 0.014Å and increases the C-O length by 0.018Å, this effect is nowhere near large enough to match the apparent lengths in the crystal structure, where a further C-N shortening of ~0.037Å would be needed.
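
    The size of the H-bonding effect, and of the remaining gap to the QACSUI02 crystal structure, follows directly from the tabulated values. A minimal sketch, using the single-layer and no-H-bond models (names are my own):

```python
# (C-N, C-O) distances in Å, copied from the table above.
crystal      = (1.331, 1.318)     # QACSUI02 crystal structure
single_layer = (1.3678, 1.2185)   # ωB97XD/Def2-TZVPP, single H-bonded layer
no_h_bonds   = (1.3816, 1.2002)   # ωB97XD/Def2-TZVPP, H-bonds removed

# Change caused by switching on H-bonding in the calculation.
hbond_cn = single_layer[0] - no_h_bonds[0]
hbond_co = single_layer[1] - no_h_bonds[1]

# Further change that would be needed to reach the crystal structure values.
gap_cn = crystal[0] - single_layer[0]
gap_co = crystal[1] - single_layer[1]

print(f"H-bonding effect:   C-N {hbond_cn:+.4f} Å, C-O {hbond_co:+.4f} Å")
print(f"Remaining to crystal: C-N {gap_cn:+.4f} Å, C-O {gap_co:+.4f} Å")
```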

    Another system, XAKSOU, has been reported where discrete LiCl units replace the H-bonds formed to melamine above.[cite]10.1039/C7CE00037E[/cite] A Li is coordinated to the carbonyl oxygen instead of a hydrogen bond, and a chloride anion from another molecule in the unit cell replaces the H-bond to nitrogen.

    In the computed model, an intramolecular Cl-H hydrogen bond is used, resulting in C-N lengths similar to those in the XAKSOU crystal structure (and which do not match the lengths in the outlying crystal structure QACSUI02).

    So the final question to ask is whether this latter structure is aromatic. NICS(0)/(1) values of -2.8/-1.1 ppm are computed, which suggests very little aromaticity (aromatic values would be -10 to -20 ppm). So it does not seem as if aromaticity can be tuned into cyanuric acid 2 by polarising both the NH and CO units with ionic/H-bond interactions so that the aromatic cyclohexatriene motif is better favoured over the 1,3,5-triazinane-2,4,6-trione non-aromatic resonance form. Are there any other examples where aromatically tunable molecules might be possible?

  • Geometries of proton transfers: modelled using total energy or free energy?

    Proton transfers are amongst the most common of all chemical reactions. They are often thought of as “trivial” and even may not feature in many mechanistic schemes, other than perhaps the notation “PT”. The types with the lowest energy barriers for transfer often involve heteroatoms such as N and O, and the conventional transition state might be supposed to be when the proton is located at about the half way distance between the two heteroatoms. This should be the energy high point between the two positions for the proton. But what if a crystal structure is determined with the proton in exactly this position? Well, the first hypothesis is that using X-rays as the diffracting radiation is unreliable, because protons scatter x-rays very poorly. Then a more arduous neutron diffraction study is sometimes undertaken, which is generally assumed to be more reliable in determining the position of the proton. Just such a study was undertaken for the structure shown below (RAKQOJ)[cite]10/c3zxh2[/cite], dataDOI: 10.5517/cc57db3 for the 80K determination. The substituents had been selected to try to maximise the symmetry of the O…H…N motif via pKa tuning (for another tuning attempt, see this blog). The more general landscape this molecule fits into[cite]10.1039/C1RA00219H[/cite] is shown below:

    The results obtained for the position of the proton for RAKQOJ were fascinating. They were very dependent on the temperature of the crystal! At room temperature (using X-rays), the proton was measured as 1.09Å from the oxygen and 1.47Å from the nitrogen (neutral form above). At 20K, the OH distance was 1.309Å and the HN 1.206Å (~ionic form above). Indeed, the very title of this article is First O-H-N Hydrogen Bond with a Centered Proton Obtained by Thermally Induced Proton Migration. The authors give a number of reasons for this behaviour (their ref 17[cite]10/c3zxh2[/cite] and also[cite]10.1039/C1RA00219H[/cite]), but one they do not mention is thermally induced changes in the dielectric constant of the crystal, given that in one position for the proton the molecule is ionic and in the other neutral. So I decided to model the system as a function of solvent. In this model, the solvent dielectric is used to approximate the crystal dielectric. My first choice of energy function is to compute geometries using the B3LYP+GD3BJ/Def2-TZVPP/SCRF=solvent method to see what might emerge and as a possible prelude to trying other functionals. FAIR data for these calculations are collected at DOI: 10.14469/hpc/10368.

    (ΔG298 in Hartree, distances in Å)

                                            O…HN                           OH…N                           TS (PT)
    Solvent                           ε     ΔG298         rO…H    rHN      ΔG298         rOH     rH…N     ΔG298         rOH     rHN
    Water                             78.4  -2893.387188  1.4913  1.0827   -2893.386705  1.0364  1.5696   -2893.387668  1.1852  1.2899
                                            -2893.334325                   -2893.334333                   -2893.336183
    Dichloromethane                   8.9   -2893.385173  1.4566  1.0945   -2893.385662  1.0309  1.5878   -2893.386022  1.2072  1.2642
    Chloroform                        4.7   -2893.382254  1.4227  1.1082   -2893.384514  1.0261  1.6049   -2893.384773  1.2321  1.2388
    Dibutyl ether                     3.1   -2893.380813  1.3778  1.1302   -2893.383511  1.0213  1.6235   -2893.382918  1.2667  1.2078
    Toluene                           2.4   -2893.379752  1.3248  1.1635   -2893.382915  1.0178  1.6385   -2893.379773  1.2851  1.1934
    Gas phase                         0     n/a                            -2893.377949  1.0009  1.7387   n/a
    Expt (RT)[cite]10/c3zxh2[/cite]   ?     n/a                                          1.09    1.47     n/a
    Expt (20K)[cite]10/c3zxh2[/cite]  ?     n/a                                          1.309   1.206    n/a

    At 20K

    Results:

    1. The geometries for each model are obtained by minimising the total energy of the system as a function of the 3N-6 geometric variables (coordinates). 
    2. The geometries show that for all solvents, TWO minima in the total energy are obtained, one for the ionic and one for the neutral form. This is called a double-well energy potential. Even a non-polar solvent such as toluene produces a solvation energy of ~3.1 kcal/mol compared to the gas phase, which is sufficient to induce a double-well potential.
    3. Without solvent (gas phase), only the neutral geometry is obtained. 
    4. In the most polar solvent water, the double well potential looks like this:

      The ionic well is about 0.4 kcal/mol lower in total energy (and ~0.3 kcal/mol in free energy, see table above) than the neutral form, with a barrier connecting neutral to ionic only 1.0 kcal/mol. A transition state + intrinsic reaction coordinate (IRC) can be easily located on this total energy potential, confirming the double-well form.
    5. When free energies ΔG are computed, which include thermal effects such as entropy and zero-point energy, the transition state emerges as 0.3 kcal/mol lower in free energy than the ionic form (red entries, Table). In effect, the free energy potential surface is INVERTED compared to the total energy surface and the “transition state” becomes the lowest point on the energy surface. So this point is a minimum in the free energy but a maximum in the total energy, the result of adding thermal effects to the total energy.
    6. In dichloromethane, the free energy of the neutral form is now lower by 0.3 kcal/mol than the ionic form. The OH bond is starting to get shorter and the NH one longer. The transition state is now 0.22 kcal/mol lower than the neutral form. With chloroform, the OH and HN bonds have become ~equal in length, so that the proton is symmetrically disposed.
    7. By the time dibutyl ether as solvent is reached, the transition state is no longer lower in ΔG than the neutral form, moving on to being 2.0 kcal/mol higher for toluene. So as the solvent polarity decreases, we see a change in the potential from a single well in ΔG, in which the proton is centred, to a very asymmetric well in which the proton is attached to the oxygen.
    8. Can we match the observed neutron diffraction results to the calculations? As the temperature decreases, the neutron diffraction shows the start of proton transfer from oxygen to nitrogen to form an ionic species. The calculations show that this can be modelled by an increase in the effective dielectric constant of the medium. The computed “transition state” for proton transfer somewhere between dibutyl ether and toluene (as the dielectric medium) emerges as approximately the best model for the structure of this species. At this dielectric, the calculated ΔG is no longer quite the lowest free energy point in the potential. This might be due to the many approximations used in this model, such as minimisation of total energy, the partition function method used to calculate entropy, the nature of the DFT functional, the continuum solvation model, the basis set, etc.
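
    The kcal/mol differences quoted in the results above come from differences between the tabulated ΔG298 values, converted from Hartree. A minimal sketch of that conversion follows; the data layout and function name are my own, and the conversion factor 627.5095 kcal/mol per Hartree is the standard value.

```python
HARTREE_TO_KCAL_MOL = 627.5095   # standard conversion factor

# ΔG298 in Hartree, copied from the table: (O…HN ionic, OH…N neutral, TS for PT).
free_energies = {
    "Water":           (-2893.387188, -2893.386705, -2893.387668),
    "Dichloromethane": (-2893.385173, -2893.385662, -2893.386022),
    "Chloroform":      (-2893.382254, -2893.384514, -2893.384773),
    "Dibutyl ether":   (-2893.380813, -2893.383511, -2893.382918),
    "Toluene":         (-2893.379752, -2893.382915, -2893.379773),
}

def relative_to_neutral(ionic, neutral, ts):
    """Return (ionic, TS) free energies in kcal/mol relative to the neutral form."""
    return ((ionic - neutral) * HARTREE_TO_KCAL_MOL,
            (ts - neutral) * HARTREE_TO_KCAL_MOL)

relative = {solvent: relative_to_neutral(*values)
            for solvent, values in free_energies.items()}

for solvent, (d_ionic, d_ts) in relative.items():
    # Negative numbers mean lower in free energy than the neutral form.
    print(f"{solvent:16s} ionic {d_ionic:+.2f}   TS {d_ts:+.2f}  kcal/mol")
```

    A negative TS entry marks the solvents in which the “transition state” geometry lies below the neutral minimum in ΔG; the sign changes between chloroform and dibutyl ether.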

    Conclusions:

    These results were obtained with the approximation that minimising the total molecular energy produces a computed geometry that can be compared to the experimental neutron diffraction structures. But can one do better? Obtaining molecular geometries by minimising the computed free energies would be non-trivial. Firstly, minimisation would depend on availability of first derivatives of the energy function with respect to coordinates, in this case ΔG. These are not available for any DFT codes. The result would itself be temperature dependent (as indeed are the experimental results shown above). Furthermore, ΔG is computed from normal vibrational modes and these are only appropriate when the first derivatives of the function are zero, at which point the so-called six rotations and translations of the molecule in free space also have zero energy. So we need vibrations to compute derivatives, but we need derivatives to compute vibrations in this classical approach.

    It would be great for example if the approximate model of the potential for a hydrogen transfer used above as based on minimising total energies for derivatives could be checked against a model based on geometries optimised using free energies instead. Such procedures do exist,[cite]10.1063/1.2715941[/cite] using molecular dynamics trajectory methods.


    This post has DOI: 10.14469/hpc/10382 [cite]10/hqsm[/cite]

  • Protein-Biotin complexes. Crystal structure mining.

    In the previous post, I showed some of the diverse “non-classical” interactions between Biotin and a protein where it binds very strongly. Here I take a look at two of these interactions to discover how common they are in small molecule structures.

    The first search is of a CH hydrogen bond to the face of the aromatic ring in a tryptophan residue.

    The search is shown below, in which the distance of the hydrogen to the ring centroid is defined, as is the angle subtended at that centroid, constrained to lie within 20° of a vertical approach.

    The resulting heat plot shows 2772 entries (no disorder, no errors, R < 0.05), with a rather diffuse red spot at around 2.7-2.8Å (but which can be as short as 2.3Å) and an angle of approach of ~90±5°. This matches the concept of a region of interaction rather than a more focused “hydrogen bond”. It is seen as a relatively common motif!
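
    For readers who want to evaluate the same two geometric parameters for a structure of their own, here is a minimal pure-Python sketch. It is not the search tool's own definition: I assume the angle is measured as the elevation of the centroid-to-H vector above the ring plane (90° being a vertical approach), and the idealised planar six-membered ring and hydrogen position are hypothetical coordinates for illustration only.

```python
import math

def centroid(points):
    """Average position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def approach(h, ring):
    """Return (H...centroid distance, elevation above the ring plane in degrees)."""
    c = centroid(ring)
    # Ring normal: sum of cross products of successive centroid-to-atom vectors.
    normal = (0.0, 0.0, 0.0)
    for i in range(len(ring)):
        a = sub(ring[i], c)
        b = sub(ring[(i + 1) % len(ring)], c)
        n = cross(a, b)
        normal = tuple(normal[k] + n[k] for k in range(3))
    v = sub(h, c)
    distance = norm(v)
    sin_elev = min(1.0, abs(dot(v, normal)) / (distance * norm(normal)))
    return distance, math.degrees(math.asin(sin_elev))

# Hypothetical example: planar hexagon (radius 1.39 Å) with an H slightly off-axis.
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
hydrogen = (0.2, 0.0, 2.7)

distance, elevation = approach(hydrogen, ring)
within_20_of_vertical = elevation >= 70.0
print(f"H...centroid {distance:.3f} Å, elevation {elevation:.1f}°, "
      f"within 20° of vertical: {within_20_of_vertical}")
```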


    The next search is for “hydrogen bonding” between the sulfur of a C-S-C unit (as found in Biotin) and an OH group.
    This is less common, with 151 entries in the Cambridge small molecule database, the red spot having a relatively short S…H distance of 1.65Å and a slightly non linear angle.

    The NH analogue of this search, shown below (422 hits), reveals two clusters. The one with a large angle at H is more concentrated and shows a distance of ~2.9Å, whilst the second cluster has a smaller angle and a long tail out to ~2.5Å.

    So we conclude there is ample evidence in small molecule crystal structures for the types of interaction mooted for Biotin with proteins.

  • First came Molnupiravir – now there is Paxlovid as a SARS-CoV-2 protease inhibitor. An NCI analysis of the ligand.

    Earlier this year, Molnupiravir hit the headlines as a promising antiviral drug. This is now followed by Paxlovid, which is the first small molecule to be aimed by design at the SARS-CoV-2 protein and which is reported as reducing greatly the risk of hospitalization or death when given within three days of symptoms appearing in high risk patients.

    The Wikipedia page (first created in 2021) will display a pretty good JSmol 3D model of this; the coordinates being generated automatically on the fly from a SMILES string, which specifies only what atoms are connected in the structure by bonds. Given that the structure of this molecule as embedded in the SARS-CoV-2 main protease[cite]10.1007/s13238-021-00883-2[/cite] has been determined (and can be viewed here), I thought I might display those coordinates as an alternative to the Wikipedia/JSmol generated structure.

    Click to get 3D model

    I extracted the ligand from the PDB file and then added hydrogens manually to obtain the above result. There are three noteworthy points about these representations:

    1. A mystery concerns the nominal C≡N group on the top right, which displays an angle at the carbon of 117°. A cyano group is of course linear (180°). This is not a defect of the crystal structure determination, but an indication of a rather stronger interaction occurring (as indeed noted[cite]10.1007/s13238-021-00883-2[/cite]). The distance between the carbon of the cyano group and an adjacent sulfur is 1.814Å, which indicates a covalent bond has formed to the cyano group. The nitrogen of the erstwhile cyano group is 3.013Å away from an adjacent NH group, which suggests it is stabilised by a hydrogen bond.
    2. Crystal structure searching of units with S…C…N in which the N has only one bond reveals zero hits, but searches of S…C…NH reveal nine hits, with S…C distances in the range 1.74-1.80Å and C…N distances in the region 1.25-1.27Å. The reported CN distance is 1.251Å, confirming that when bound to the protein, the cyano group is replaced by an S-C=NH group and hence is clearly an important component of the mode of action of Paxlovid.
    3. The conformation of Paxlovid is in one respect not fully represented by the Wikipedia diagram, as shown below. This implies the t-butyl group (on the left) as being well separated from the pyrrolidinone ring system at the right of the molecule.

      In fact the two groups are adjacent, probably being held in that conformation by a combination of weak dispersion forces and a contribution from the surrounding protein in the crystal structure. This is more graphically shown by the NCI (non-covalent-interaction) diagram below (DOI: 10.14469/hpc/9964), where the green areas in the region between the two groups (ringed in red) represent stabilising interactions between them. You might also spot other green/cyan regions indicating additional weak hydrogen bonds between C-H groups and oxygen!
    PAXLOVID NCI analysis

    There are only a small number of crystal structures of small molecules containing the S-C=NH motif. I will try to find out how common this is in protein-ligand structures.


    There are many tools for performing this operation. I used the following procedure. I downloaded the PDB file (https://files.rcsb.org/download/7vh8.cif), opened it in CSD Mercury, selected the ligand (by identifying the CF3 group and clicking on one atom), inverted the selection so that everything but the ligand was then selected and using edit/structure, I deleted the selected atoms, leaving only the ligand.
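    The same select-and-delete operation can also be scripted. The sketch below is a minimal illustration and not the procedure used above: it assumes the structure is available as legacy fixed-column PDB text (the file named above is in mmCIF format, which needs a different parser), and the residue name "LIG" is a hypothetical placeholder for whatever three-letter code the ligand carries in the deposited entry.

```python
def extract_ligand(pdb_text, resname):
    """Keep only the HETATM records whose residue name matches resname.

    In the fixed-column PDB format the residue name occupies columns 18-20,
    i.e. the Python slice [17:20].
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            kept.append(line)
    return kept

# Tiny hand-made example with one protein atom, one ligand atom and one water.
# "LIG" is a placeholder residue name, not the code used in the real entry.
sample = "\n".join([
    "ATOM      1  N   SER A   1",
    "HETATM    2  C1  LIG A 401",
    "HETATM    3  O   HOH A 501",
])
for record in extract_ligand(sample, "LIG"):
    print(record)
```

    Hydrogens would still have to be added afterwards, as was done manually for the model shown above.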

    Postscript

    The cyanopyrrolidine group such as in Paxlovid is well known as a specific probe.[cite]10.1039/D1MD00218J[/cite],[cite]10.1021/jacs.0c04527[/cite],[cite]10.1021/acschembio.0c00031[/cite] CovalentInDB is a comprehensive database facilitating the discovery of such covalent inhibitors[cite]10.1093/nar/gkaa876[/cite] and is available here. There is also a program called DataWarrior that is potentially able to find such probes.

  • More examples of crystal structures containing embedded linear chains of iodines.

    The previous post described the fascinating 170-year history of a crystalline compound known as Herapathite and its connection to the mechanism of the Finkelstein reaction via the complex of Na+I2 (or Na22+I42-). Both compounds exhibit (approximately) linear chains of iodine atoms in their crystal structures, a connection which was discovered serendipitously. Here I pursue a rather more systematic way of tracking down similar compounds.

    Here is one search query which can be used in the CSD database of crystal structures. A chain of eight iodine atoms is defined, and the six angles subtended at iodine restricted to the range 150-180° (i.e. linear). The inner six iodines are also defined as having only two bonded atoms.

    This results in four hits (October 2021), three of which are shown below (the fourth, JOPLEH, contains chains of I82- anions which do not appear to be infinitely repeating).

    1. IQIVIP, containing the heterocyclic unit pyrroloperylene and connected chains of I29.[cite]10.1002/anie.201601585[/cite] See also DOI: 10.5517/ccdc.csd.cc1m1tj0
      Click to load 3D model of IQIVIP


      The truly remarkable feature is that the iodine chain appears to adopt a gentle right-handed helix in this isomer. One has to wonder how this might respond to light!
    2. IQIVOV, closely related to IQIVIP, this time containing connected chains of gently spiralling I10 groups.[cite]10.1002/anie.201601585[/cite] See also DOI: 10.5517/ccdc.csd.cc1m1tk1
      Click to load 3D model of IQIVOV
    3. WEVFAE, containing a tetramethylstibonium cation (an analogue of a tetramethylammonium cation) and this time infinite chains of I83- anions.[cite]10.1002/anie.199409871[/cite]
      Click to load 3D model of WEVFAE
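    The angle criterion in the search query above (every angle subtended at an inner iodine lying in the range 150-180°) is easy to express in code. The following is a minimal sketch, assuming the chain is supplied as an ordered list of Cartesian coordinates; the function names are hypothetical and this is not how the CSD search software itself works internally.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, subtended at the central atom b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))

def chain_angles(chain):
    """Angles at every inner atom of an ordered chain (n atoms give n-2 angles)."""
    return [angle_deg(chain[i - 1], chain[i], chain[i + 1])
            for i in range(1, len(chain) - 1)]

def is_near_linear(chain, lo=150.0, hi=180.0):
    """True if all inner angles fall within the search window."""
    return all(lo <= ang <= hi for ang in chain_angles(chain))

# An idealised straight chain of eight atoms, 3.0 Å apart (illustrative only).
straight = [(3.0 * k, 0.0, 0.0) for k in range(8)]
print(len(chain_angles(straight)), is_near_linear(straight))
```

    For a chain of eight atoms this yields the six inner angles that the query constrains.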

    The list is not long, but contains some fascinating examples of how iodine can catenate into infinitely long chains, sometimes linear (on the time-averaged scale at the temperature of the data recording), sometimes gently helical and, as with Herapathite, a rather more undulating motif. Again, how the crystals of these compounds respond to light remains to be established. However, it may be that since these three molecules are reported variously as being black-green, black and golden, some may be opaque to light in any orientation. I also note that linear chains of Ag, Ga, In and Tl have also been reported in inorganic metal nitrides.[cite]10.1002/anie.200601726[/cite]


    The same result is obtained if the specification of iodine in this search is replaced by “any” element. This post has DOI: 10.14469/hpc/9540. See also DOI: 10.1016/j.hm.2005.11.005 for a connection between coiled chains of iodine atoms and Einstein’s theory of teleparallel spacetime, invoking torsional geometries.

  • Herapathite: an example of (double?) serendipity.

    On October 13, 2021, the historical group of the Royal Society of Chemistry organised a symposium celebrating ~150 years of the history of (molecular) chirality. We met for the first time in person for more than 18 months and were treated to a splendid and diverse program about the subject. The first speaker was Professor John Steeds from Bristol, talking about the early history of light and the discovery of its polarisation. When a slide was shown about herapathite[cite]10.1126/science.1173605[/cite] my “antennae” started vibrating. This is a crystalline substance made by combining elemental iodine with quinine in acidic conditions and was first discovered by William Herapath as long ago as 1852[cite]10.1080/14786445208646983[/cite] in unusual circumstances. Now to the serendipity!

    Herapath was able to get small crystals of this substance and discovered that when he placed one crystal upon another at “right angles”, the combination went “black as midnight”. He recognised that it was functioning as an excellent linear light polarizer, absorbing virtually all the light polarized along the shorter axis of the best-developed facet of the crystal. A number of well known scientists investigated this substance at the time, but by about 1951 it had largely been forgotten. The person to rediscover it was Edwin Land, of Polaroid camera fame.[cite]10.1364/JOSA.41.000957[/cite] He oriented the microcrystals into an extruded polymer to stabilize them and hence produce the first large-aperture light polarizer, which enabled him to manufacture his first camera. The serendipity resulted from him spotting the by then forgotten properties of Herapathite (I wonder if he recorded how this actually came about) and recognising how to exploit it.

    In 2009 Bart Kahr had noticed that the crystal structure of this material had never been reported. It was a challenging structure to solve[cite]10.1126/science.1173605[/cite] but established that the polarizing property of the crystals was in large measure due to the presence of infinite chains of I3 units aligned in an almost linear channel in the crystal structure. And so it was that in October 2021, John Steeds showed the structure containing these iodine chains in his slide on the topic. The crystal structure is in the CCDC database as WEYDOV and can be seen here at DOI: 10.5517/ccsdg7v. I show below part of the extended lattice, showing that chain of iodines.

    Click to view 3D model of WEYDOV

    So the next (possible) instance of serendipity. From the audience, I immediately recognised this structural motif as being related to the crystal structure of both Na+I (NAIACE) and Na+I2 (GADMOO)[cite]10.1107/S0108270103006395[/cite] which I discussed in one of the very first posts on this blog in 2009 as part of a story about the Finkelstein reaction. Both these structures were obtained from acetone solution, and this solvent very much forms part of the crystal structures, serving to coordinate the sodium cations and playing the role of the quinine in herapathite. The iodine chains, comprising in GADMOO units of I3 and I, are almost exactly linear!

    Click to view 3D model of NAIACE
    Click to view 3D model of GADMOO

    So, the question arises as to whether crystals of Na+I2 have ever been examined for light polarisation? One might also ask whether e.g. the chiral quinine imparts a critical property to the herapathite crystal, or could the achiral acetone also serve the purpose? What would happen if substituted versions of acetone were used (halo, methyl etc)? Would they destroy those linear chains, or would they survive? Are repeating chains of I3 units essential, or can chains of alternating units of I3 and I also serve the purpose? All questions that can only be answered by experiments! Anyone up for trying?


    This post has DOI: 10.14469/hpc/9537


  • More record breakers for the anomeric effect involving C-N bonds.

    An earlier post investigated large anomeric effects involving two oxygen atoms attached to a common carbon atom.

    A variation is to replace one oxygen by a nitrogen atom, as in N-C-O. Shown below is a scatter plot of the two distances to the common carbon atom derived from crystal structures.

    You can see some entries for which the C-O bond length is shorter than normal and the C-N distance very much longer than normal; an example of a highly asymmetric anomeric effect operating in just one direction rather than the two shown in the top diagram (red/blue arrows).

    One example is LOFPON[cite]10.1039/C4CE00981A[/cite] (DOI: 10.5517/cc121rsn), with bond lengths shown as calculated at the ωB97XD/def2svpp level (Calculation DOI: 10.14469/hpc/8682). The asymmetry is rationalised by the nitrogen being a quaternary cation and hence an excellent leaving group, which biases the electron flow towards it. Anomeric effects can be quantified using a technique known as NBO analysis, which uses perturbation theory to estimate the interaction energy between a donor orbital (the oxygen lone pair in this case) and an acceptor orbital (the C-N σ* unoccupied orbital). Populating the C-N σ* antibonding orbital causes the C-N length to increase, and the interaction energy in this example is 36.4 kcal/mol. This is around twice the normal value for anomeric effects and so is unusually large.
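    For readers unfamiliar with the NBO estimate, the second-order perturbation energy takes the standard form E(2) = q F² / (ε(acceptor) − ε(donor)), where q is the donor orbital occupancy, F is the Fock matrix element coupling donor and acceptor, and the ε are the two orbital energies. The sketch below simply evaluates that expression; the input numbers are illustrative placeholders and are not taken from the calculations reported here.

```python
HARTREE_TO_KCAL = 627.5095  # conversion factor, kcal/mol per hartree

def e2_kcal(occupancy, fock_element, e_donor, e_acceptor):
    """Second-order donor-acceptor stabilisation energy in kcal/mol.

    occupancy    : donor orbital occupancy q (about 2 for a lone pair)
    fock_element : off-diagonal Fock matrix element F, in hartree
    e_donor, e_acceptor : orbital energies, in hartree
    """
    gap = e_acceptor - e_donor
    if gap <= 0:
        raise ValueError("acceptor must lie above donor in energy")
    return occupancy * fock_element ** 2 / gap * HARTREE_TO_KCAL

# Illustrative values only (not from the LOFPON calculation).
print(round(e2_kcal(2.0, 0.10, -0.30, 0.40), 2))
```

    The expression makes clear why a good leaving group matters: a lower-lying σ* acceptor shrinks the energy gap in the denominator and so increases E(2).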

    LOFPON

    The other prominent example is NAWNUV (Data DOI: 10.5517/cc93pkm), where the bond length asymmetry is slightly larger and so is the perturbation energy, E(2), at 41.0 kcal/mol (ωB97XD/def2svpp calculation DOI: 10.14469/hpc/8378).

    NAWNUV

    In the opposite direction, NUQKAM[cite]10.1107/S1600536810014248[/cite] is an example of a lengthened C-O bond and a shortened C-N bond, with the crystal structure (DOI: 10.5517/ccv3ln5) shown below.

    In this instance, a ωB97XD/def2svpp calculation (Data DOI: 10.14469/hpc/8806) does not bear this structure out, with computed CN and CO bond lengths of 1.422 (vs 1.369 in the crystal) and 1.434 (vs 1.529)Å and a final E(2) of 22.1 kcal/mol (which is close to normal). This is an example of how mining the crystal structure can yield results that can be checked by a different (quantum computational) technique, which in this instance suggests that an issue in the crystal structure refinement is probably responsible for the apparently large anomeric effect.

    Another entry is ANUVUD[cite]10.1016/j.bmc.2021.116113[/cite] with a crystal structure (data DOI: 10.5517/ccdc.csd.cc24zxdg) shown below and CN and CO lengths of 1.391 and 1.559Å, which in this case ARE reasonably replicated by calculation (1.402, 1.499). This effect is promoted by the good leaving group ability of the carboxylate anion and the antiperiplanar orientation of the nitrogen lone pair with respect to the C-O bond, E(2)=35.2 kcal/mol (DOI: 10.14469/hpc/8807).

    I end with FEHYOG, a relatively old structure[cite]10.1039/C39870000578[/cite] showing a very long C-N distance (1.673Å) but a normal associated C-O distance (1.423Å). This rings an alarm bell. Indeed, the respective computed distances are 1.482 and 1.425Å, a significant discrepancy (DOI: 10.14469/hpc/8769). The NBO interaction energy is an unremarkable 12.5 kcal/mol.

    Data mining of the crystal structure database has revealed a number of abnormally large bond length asymmetries around the N-C-O unit. Some of these are true record breakers, but two have been identified where calculations cannot reproduce the observed bond lengths. One might indeed ask whether a quantum computation of the structure might not be added to the curation checks made by the CCDC of their database. It might improve the quality of the data even further!
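    As a rough illustration of what such an automated check might look like, the sketch below compares the crystal and computed bond lengths quoted above for three of the entries and flags any structure where the two differ by more than a tolerance. The 0.08 Å tolerance is purely an assumption chosen for this example, not a recommended curation threshold.

```python
# Bond lengths in Å as quoted in the text: (crystal, computed).
ENTRIES = {
    "NUQKAM": {"C-N": (1.369, 1.422), "C-O": (1.529, 1.434)},
    "ANUVUD": {"C-N": (1.391, 1.402), "C-O": (1.559, 1.499)},
    "FEHYOG": {"C-N": (1.673, 1.482), "C-O": (1.423, 1.425)},
}

def flag_discrepancies(entries, tolerance=0.08):
    """Return {refcode: largest |crystal - computed| in Å} for entries above tolerance.

    The default tolerance is an assumed value for illustration only.
    """
    flagged = {}
    for refcode, bonds in entries.items():
        worst = max(abs(crystal - computed)
                    for crystal, computed in bonds.values())
        if worst > tolerance:
            flagged[refcode] = round(worst, 3)
    return flagged

for refcode, deviation in flag_discrepancies(ENTRIES).items():
    print(refcode, deviation)
```

    With this particular tolerance, NUQKAM and FEHYOG are flagged while ANUVUD passes, mirroring the conclusions reached above by inspection.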