Archive for the ‘Uncategorized’ Category

Possible Formation of an Impossible Molecule?

Monday, May 20th, 2024

In the previous post, I explored the so-called “impossible” molecule methanetriol. It is regarded as such because the equilibrium resulting in loss of water is very facile, being exoenergic by ~14 kcal/mol in free energy. Here I explore whether changing the substituent R could suppress the loss of water and stabilise the triol.

I started (as I usually do) with a search for crystal structures, in this case containing the motif shown below (trisubstituted carbon, disubstituted oxygen and  R = H or C and any type of connecting bond), which is the species resulting from loss of R to form a trihydroxycarbenium cation.

This produces six hits, of which HIWQEJ[1] (DOI: 10.5517/cc3k560) and UYOYUD[2] (DOI: 10.5517/ccvrghj) are both salts of the trihydroxycarbenium cation (or protonated carbonic acid) itself, the counter-ion being e.g. AsF6 or an iron system. So R needs to be a stable anion, and two obvious groups are triflate (trifluoromethanesulfonate) or bis(trifluoromethanesulfonyl)azanide.

The triflate (R=CF3SO2-O) shown below has an unusually long predicted C-O bond (1.620Å), which suggests the system is already partially ionised as shown in the top diagram. An ωB97X-D calculation ([3], DOI: 10.14469/hpc/14280) reveals that the species shown below is 6.6 kcal/mol higher in free energy than the one corresponding to loss of water.


Bis-triflamide (bis(trifluoromethanesulfonyl)azanide) goes further, helped no doubt by the formation of a second strong hydrogen bond between the two ions. It is now 11.8 kcal/mol lower in free energy (ΔG = -11.8 kcal/mol) than the species resulting from loss of water.

So that is my candidate for a “possible” impossible molecule. Any takers for its synthesis?


Postscript: The next higher homologue, tris(trifluoromethanesulfonyl)methanide anion + trihydroxycarbenium cation, is similar to the bis-triflamide in being 12.1 kcal/mol lower in free energy (ΔG = -12.1 kcal/mol) than the species resulting from loss of water.
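To get a feel for what these relative free energies mean, here is a minimal sketch that converts each one into an equilibrium constant using K = exp(-ΔG/RT). The temperature of 298.15 K is my assumption (the post does not state one), the dictionary labels are just illustrative names, and ΔG is taken as G(triol salt) minus G(species after loss of water), so K > 1 favours the triol salt.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def equilibrium_constant(delta_g_kcal, temperature=298.15):
    """Return K = exp(-dG/RT) for a free energy difference in kcal/mol.

    The default temperature of 298.15 K is an assumption, not a value
    taken from the post.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))


# Relative free energies quoted in the post (kcal/mol), measured against
# the species resulting from loss of water.
relative_free_energies = {
    "triflate": 6.6,
    "bis-triflamide": -11.8,
    "tris(triflyl)methanide": -12.1,
}

for name, dg in relative_free_energies.items():
    k = equilibrium_constant(dg)
    print(f"{name}: dG = {dg:+.1f} kcal/mol, K = {k:.2e}")
```

On this simple two-state picture, only the two negative values correspond to an equilibrium lying on the side of the triol salt.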


References

  1. R. Minkwitz, and S. Schneider, "Trihydroxycarbenium Hexafluorometalates: Salts of Protonated Carbonic Acid", Angewandte Chemie International Edition, vol. 38, pp. 714-715, 1999. https://doi.org/10.1002/(sici)1521-3773(19990301)38:5<714::aid-anie714>3.0.co;2-k
  2. S. Guo, J. Lin, W. Chen, X. Wei, J. Wang, and W. Dong, "CCDC 797118: Experimental Crystal Structure Determination", 2011. https://doi.org/10.5517/ccvrghj
  3. H. Rzepa, "Possible Formation of an Impossible Molecule?", 2024. https://doi.org/10.14469/hpc/14280

Exploring Methanetriol – “the Formation of an Impossible Molecule”

Thursday, May 16th, 2024

What constitutes an “impossible molecule”? Well, here are two, the first being the topic of a recent article[1]. The second is a favourite of organic chemistry tutors, to see if their students recognise it as an unusual (= impossible) form of a much better known molecule.

Perhaps we could divide impossible molecules into two slightly different classes.

  1. The first class is a molecule which is entirely normal in terms of its structure and bonding, but just happens to be thermodynamically less stable than an isomeric form. If all mechanistic possibilities for converting it to the more stable form are eliminated, then there is no reason it should not be detected, even though it is “impossible”. By the way, quite a number of impossible molecules have been prepared using sterics (t-butyl groups and the like, a strategy first used perhaps 40 or so years ago) to prevent the molecule from reacting either with itself or with other molecules.
  2. The second class is a molecule whose bonding or structure deviates so far from accepted theories of molecular structure that its energy is so high that either it simply cannot be prepared in the first place, or nothing can be done to prevent its rearrangement to a much more stable form.

The first of the examples below falls clearly into the first category: methanetriol. As reported[1], this impossible molecule has now been detected both at low temperatures and in the gas phase at low pressure using time-of-flight mass spectrometry and other elegant experiments. The key is to ensure either a very low temperature or the absence of any acid catalyst to decompose it to water and formic acid.

As is my usual practice in discussing any interesting molecule, I first conduct a search of the CSD (Cambridge Structural Database), in this case, it has to be said, with little hope of finding any examples. I was therefore very surprised to find one example reported, COLRUT.[2] The crystal structure of COLRUT can be viewed here.[3] (DOI: 10.5517/ccdc.csd.cc22yztvv). Clearly, given the discussion at the top, alarm bells should be ringing about this result. When any such alarms sound, it is my second practice to turn to calculations for verification, in this case to FAIR Data calculations[4] (DOI: 10.14469/hpc/14236).

The article[1] also reports such calculations, but it's good to have independent verification (of some of them), so I list the essential conclusions from my own calculations here.

  1. At the CCSD(T)/Def2-TZVPP level, methanetriol is higher in free energy than formic acid and water by ΔG298 = 14.49 kcal/mol. This is not really an impossibly high energy, and the molecule is “impossible” only because there is a very facile reaction for it to undergo (acid-catalysed disproportionation, for example).
  2. At the much faster ωB97X-D/Def2-TZVPP level, the value is 14.48 kcal/mol, which agrees well enough with the previous value to use this method to explore further.
  3. If the C-H is replaced by C-CF3 (again a good tutorial question for how to stabilize the diol form of e.g. acetone), the energy of the triol is reduced to +9.4 kcal/mol. Still positive, but much smaller than the original.
  4. If the C-H is replaced by C-(CF3)3 it is still unstable by 13.6 kcal/mol. Not much chance of using substituents to create a “possible” triol then.
  5. Next, the transition state for unimolecular decomposition to water and formic acid. An IRC for this is shown below and the free energy of activation is +36.6 kcal/mol. This proceeds via a very non-linear hydrogen transfer, a geometry known to be unfavourable and indeed an energy too high for this rearrangement to occur (in a mass spectrometer? What is the temperature of molecules under these conditions?). Note how a nice hydrogen-bonded form of the products forms at the end.


    I could not resist showing the dipole moment response along the IRC. Lovely!
  6. What about an intermolecular rearrangement, which would occur at either higher pressures or perhaps higher temperatures? Now, ΔG = 26.7 kcal/mol, a more viable thermal reaction. The lower barrier is because the 6-ring transition state now allows a less bent hydrogen transfer.

  7. This is the reaction of a trimer, ΔG = 24.2 kcal/mol. The 8-ring transition state now allows almost linear hydrogen transfers. Note that all three transferring hydrogens move more or less in synchrony.
  8. The tetramer: ΔG = 24.1 kcal/mol, now via a 10-ring transition state. If you look carefully at the animation, you can now see that the hydrogen transfers have become very non-synchronous (and the transition state more ionic), although they remain almost linear.
  9. But wait, there is another isomer of the tetramer reaction, instead proceeding via an 8-ring TS, with the fourth triol molecule bonding to the transition state via four hydrogen bonds. This is very much like a stabilised protein transition state and overcomes the extra entropy of adding that fourth molecule and then some; ΔG = 18.9 kcal/mol. So at high concentrations the disproportionation of methanetriol is predicted to become a facile reaction and now can only be prevented at low temperatures!
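To put the barriers listed above on a time scale, here is a rough sketch using the Eyring equation, k = (kB·T/h)·exp(-ΔG‡/RT). It rests on several assumptions of my own: a temperature of 298.15 K, a transmission coefficient of 1, and each barrier treated as a first-order activation free energy from a pre-formed complex, so that concentration effects are ignored. The labels are illustrative, and the numbers are order-of-magnitude guides only.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(dg_act_kcal, temperature=298.15):
    """First-order rate constant (1/s) from the Eyring equation.

    Assumes a transmission coefficient of 1 and an assumed temperature.
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_act_kcal / (R_KCAL * temperature))


def half_life(dg_act_kcal, temperature=298.15):
    """Half-life in seconds for a first-order process."""
    return math.log(2) / eyring_rate(dg_act_kcal, temperature)


# Free energies of activation quoted in the list above (kcal/mol).
barriers = {
    "unimolecular": 36.6,
    "intermolecular, 6-ring TS": 26.7,
    "trimer, 8-ring TS": 24.2,
    "tetramer, 10-ring TS": 24.1,
    "tetramer, 8-ring TS + 4 H-bonds": 18.9,
}

for label, dg in barriers.items():
    print(f"{label}: k = {eyring_rate(dg):.2e} 1/s, "
          f"half-life = {half_life(dg):.2e} s")
```

The exponential dependence is the point here: each drop of about 1.4 kcal/mol in the barrier speeds the reaction roughly tenfold at room temperature.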

An NCI (non-covalent-interaction) analysis of the hydrogen bonds in this TS structure is shown below. The blue regions are hydrogen bonds. The ones labelled 1-4 are the four such interactions resulting from addition of a fourth molecule to the hydrogen transfer structure of the trimer. Click for a 3D rotatable model.

So I hope this extended analysis of what makes an “impossible molecule” actually possible adds another dimension to the original report.[1] As for that crystal structure, I will report to CCDC that it may in fact be an artefact and that they should take another look at the crystal structure data and correct it if needed. It is also interesting to explore the properties of cyclic hydrogen transfer reactions. The conclusion here is that an 8-ring transfer may be optimum, especially if it can be stabilized with four or more hydrogen bonds!

References

  1. J.H. Marks, X. Bai, A.A. Nikolayev, Q. Gong, C. Zhu, N.F. Kleimeier, A.M. Turner, S.K. Singh, J. Wang, J. Yang, Y. Pan, T. Yang, A.M. Mebel, and R.I. Kaiser, "Methanetriol─Formation of an Impossible Molecule", Journal of the American Chemical Society, vol. 146, pp. 12174-12184, 2024. https://doi.org/10.1021/jacs.4c02637
  2. P. Mi, L. He, T. Shen, J.Z. Sun, and H. Zhao, "A Novel Fluorescent Skeleton from Disubstituted Thiochromenones via Nickel-Catalyzed Cycloaddition of Sulfobenzoic Anhydrides with Alkynes", Organic Letters, vol. 21, pp. 6280-6284, 2019. https://doi.org/10.1021/acs.orglett.9b02161
  3. H. Rzepa, "Exploring Methanetriol - "the Formation of an Impossible Molecule"", 2024. https://doi.org/10.14469/hpc/14236

Detecting anomeric effects in tetrahedral boron bearing four oxygen substituents.

Tuesday, April 30th, 2024

In an earlier post, I discussed[1] a phenomenon known as the “anomeric effect” exhibited by tetrahedral carbon compounds with four C-O bonds. Each oxygen itself bears two bonds and has two lone pairs, and either of these can align with one of three other C-O bonds to generate an anomeric effect. Here I change the central carbon to a boron to explore what happens, as indeed I promised earlier.

One can identify candidates for such molecules by a constrained search of the CSD (the Cambridge Structural Database), as shown below.

The four B-O distances for each compound matching the query are now subjected to further analysis: the greatest and least values are identified and the difference between them calculated.

The results are shown in the diagram below. Three outliers are identified for close inspection.

Each of the three candidates is also subjected to a Gaussian calculation (MN15L/Def2-TZVPP)[2] (see DOI: 10.14469/hpc/14092).

  1. QIXREW[3]. This molecule is overall neutral, with ΔrB-O = 0.193Å (MN15L/Def2-TZVPP ΔrB-O = 0.175Å). The Wiberg bond indices of the longest and shortest B-O bonds are 0.486 and 0.698, Δ = 0.212. This is significantly larger than the best example of the C-O series, for which the largest ΔrC-O = 0.074Å and the Wiberg index difference is 0.137.
  2. XOVZOY[4] is a tri-anion with intercalated Ir3+ counterion. ΔrB-O = 0.347Å. A calculation on the isolated tri-anion (with a continuum water field to help emulate the crystal environment) results in the maximum B-O bond length difference of only 0.004Å, which is dramatically different from the crystal structure. This may be an example where the counter-ion is especially important for modelling structure, or it may be simply an anomalous refinement of the crystal structure.
  3. KBDCTB, ΔrB-O measured = 0.451Å, Calculated 0.0314Å.
    This is another structure where all may not be what it seems. This again is an anionic structure and geometry optimisation of a single molecule results in a dramatic change in the internal hydrogen bonding of the species. In the crystal structure, the carboxylic acid groups all form intermolecular hydrogen bonds. Optimized as an isolated molecule, the former are no longer possible and a big conformational change occurs to allow all four carboxylic acid groups to instead form intramolecular H-bonds. In this conformation, all four B-O bonds are essentially the same length. So this might well be an example of a large change in anomeric effects due to changes in geometry induced by hydrogen bonding.

    Intermolecular H-bonds Intramolecular H-bonds

One lesson one always learns when comparing the lengths of bonds observed in crystal structures with those calculated using quantum mechanics is that they sometimes do not match well. These mis-matches can occur for various reasons; changes in hydrogen bonding, or the presence of unmodelled counterions or simply errors in the reported crystal structure. But we might suggest from this brief foray into B-O bonds that the anomeric effects found there may indeed be larger than those of their C-O counterparts.

References

  1. H. Rzepa, "Detecting anomeric effects in tetrahedral carbon bearing four oxygen substituents.", 2024. https://doi.org/10.59350/dfkt5-k2b20
  2. H. Rzepa, "Detecting anomeric effects in tetrahedral boron bearing four oxygen substituents.", 2024. https://doi.org/10.14469/hpc/14092
  3. S.I. Kalläne, T. Braun, B. Braun, and S. Mebs, "Versatile reactivity of a rhodium(i) boryl complex towards ketones and imines", Dalton Transactions, vol. 43, pp. 6786, 2014. https://doi.org/10.1039/c4dt00080c
  4. H. Danjo, K. Hirata, S. Yoshigai, I. Azumaya, and K. Yamaguchi, "Back to Back Twin Bowls of D3-Symmetric Tris(spiroborate)s for Supramolecular Chain Structures", Journal of the American Chemical Society, vol. 131, pp. 1638-1639, 2009. https://doi.org/10.1021/ja8071435

Internet Archeology: reviving a 2001 article published in the Internet Journal of Chemistry.

Thursday, March 21st, 2024

In the mid to late 1990s as the Web developed, it was becoming more obvious that one area it would revolutionise was that of scholarly journal publishing. Since the days of the very first scientific journals in the 1650s, the medium had been firmly rooted in paper. Even printed colour only became common (and affordable) from the 1980s. An opportunity to move away from these restrictions was provided by the Web. Early adopters of this medium in chemistry were the CLIC pilot project[1] in 1995 and the Internet Journal of Chemistry (IJC), the latter offering “enhanced chemical publication which permits the publication of materials which cannot be published on paper and end-use customization which permits the readers to read articles prepared for their specific needs”.[2] Publication of the latter started in January 1998, offering “authors the opportunity to enhance their articles by fully incorporating multimedia, large data sets, Java applets, color images and interactive tools.” The journal remained online for seven years, after which it was closed and the articles became inaccessible. By then many major chemistry journals had started evolving along some of the same lines, and it could be argued this journal had served its purpose of alerting both publishers and authors to these new opportunities. Here I describe how an IJC article published in 2001 was brought back to life in more or less the enhanced manner intended.[3]

An abstract of the article, entitled “The Mechanism and Design of Asymmetric Co-Arctate Br+ (Mobius) Atom Transfers Between Alkenes. A Computational Study”, is still visible via services such as SciFinder, but a more complete and open metadata description of the kind that can be provided from an assigned DOI (digital object identifier) is not available, since back in 2001 the adoption of DOIs by journals was still in its infancy. Fortunately, the original source was still available from the authors as a combination of HTML, image files and data, the latter two being hyperlinked into the body of the article. These files are in fact all that is needed to recreate the original IJC article (if not its style), using the mechanism of a data repository[4],[5] rather than that normally designed for a journal. The procedure adopted was as follows:

  1. All the data files were uploaded to the repository as a dataset[6] (DOI: 10.14469/hpc/13929).
  2. The metadata record generated and registered for these depositions (https://data.datacite.org/application/vnd.datacite.datacite+xml/10.14469/hpc/13929) has Access (the A of FAIR) identifiers in the form of e.g.
    1. <relatedIdentifier relatedIdentifierType="URL" relationType="HasPart">https://data.hpc.imperial.ac.uk/resolve/?doi=13929&file=1</relatedIdentifier>
    2. Descriptive metadata providing further properties if needed, such as file names and media types and file sizes can be obtained via
      • <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata">https://data.hpc.imperial.ac.uk/resolve/?ore=13929</relatedIdentifier>
    3. These access identifiers replaced the hyperlinks in the original article HTML
      1. Originally: <a href="supplemental/3-ts-rh3.pdb">-1.5</a>
      2. Becomes: <a href="https://data.hpc.imperial.ac.uk/resolve/?doi=13929&file=54">-1.5</a>
      3. It is worth noting that there are basically two methods of accessing a file. The first relies on its relative path in a hierarchical file system. Hard-coding such a location into a URL means it may not be persistent – the hyperlink is vulnerable to “link rot” when the file system is reorganised and the path to the file changes. The second method relies on a database query, which should be rather more persistent, since the database should always incorporate any reorganisation of the internal systems. A third option (not used here) is to assign a persistent identifier to every file, and to ensure that a properly persistent direct access mechanism is described in metadata for that file.
      4. The root document for the article, given the reserved filename index.html was edited to reflect the changes in the hyperlinks.
  3. The article document index.html was now itself uploaded to the repository. In a conventional data repository, such a file invokes no specific actions, but in the repository used for this purpose it does have the reserved meaning of invoking in effect a preview or “LiveView” using the syntax
    • <iframe name="liveview" src="https://data.hpc.imperial.ac.uk/resolve/?doi=13929&file=90"
      width="100%" height="600" id="liveview"></iframe>
  4. The article now functions in much the same way it would have done on IJC, albeit with one interesting difference. The regular style adopted in journals is to place the ESI or electronic supporting information files into a separate enclave, linked via the article landing page by parochial mechanisms. In this instance the article and its data files are visible on the same page (it is a data repository after all), thus elevating the data to the same status as the article. Such elevation is often referred to as making “Data a first class citizen of the publication processes”.
  5. The opportunity now arose to incorporate an interactive tool based on the use of the JSmol molecule viewer.
    • By adding an additional header to the HTML document containing a Javascript invocation of JSmol, selected data could be brought to life by creating a molecular model in a separate window.
    • This is invoked by a variation on the hyperlink shown above in section 3.2 by
      <a href="javascript:show_jmol();javascript:handle_jmol('10.14469/hpc/13933',%20';frame 1;font label 16;zoom 5;moveto 4 90 4 80 65 120;spin 3;set echo bottom left;font echo 20 serif bolditalic;color echo green;echo TS for 3 (C2 symmetry);')">Load 3D Model</a>
    • Additional tools are now provided, from activating a (molecular) vibration, calculating a chirality (if applicable) or others invoked from a pull-down menu.
    • In this example, the data is again accessed directly from a data repository, albeit by a different mechanism from that shown in 3.2 and here based only on the DOI of the data and its media type (in this case chemical/x-mdl-molfile).
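Since the manual process would need automating if it were to be used more than once, here is a minimal sketch of how the two mechanical parts of the procedure above might be scripted: listing the HasPart access URLs found in a DataCite XML record, and rewriting relative hyperlinks into resolver URLs. The function names and the mapping from file path to file number are hypothetical (I have no repository API to supply that mapping); only the resolver URL pattern and the single example link are taken from the steps above.

```python
import re
import xml.etree.ElementTree as ET

# URL pattern copied from the examples in the procedure above.
RESOLVER = "https://data.hpc.imperial.ac.uk/resolve/?doi={dataset}&file={index}"


def has_part_urls(datacite_xml):
    """Return the URLs of relatedIdentifier elements with relationType="HasPart"."""
    root = ET.fromstring(datacite_xml)
    urls = []
    for element in root.iter():
        # Compare local names so this works with or without an XML namespace.
        if element.tag.rsplit("}", 1)[-1] != "relatedIdentifier":
            continue
        if element.get("relationType") == "HasPart":
            urls.append((element.text or "").strip())
    return urls


def rewrite_links(html, file_index, dataset="13929"):
    """Replace each relative href found in file_index by its resolver URL."""
    def replace(match):
        path = match.group(2)
        if path not in file_index:
            return match.group(0)  # leave unknown links untouched
        url = RESOLVER.format(dataset=dataset, index=file_index[path])
        return f"{match.group(1)}{url}{match.group(3)}"

    return re.sub(r'(href=")([^"]+)(")', replace, html)


# Hypothetical mapping of original relative path to repository file number;
# in practice it would have to be built from the dataset's metadata record.
file_index = {"supplemental/3-ts-rh3.pdb": 54}

original = '<a href="supplemental/3-ts-rh3.pdb">-1.5</a>'
print(rewrite_links(original, file_index))
```

A regular expression is enough for the simple, regular anchors in this article; a proper HTML parser would be the safer choice for arbitrary documents.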

It was not the intention here to illustrate how a Journal infrastructure might work – merely to rescue an article published 23 years ago (a long time in the Internet era) from a journal that is no longer disseminating articles. In the process the article has acquired its own DOI (albeit as data and not journal article), something not available from the original journal and some level of interactivity of the type originally envisaged. The (manual) process took something around 2-3 hours to achieve, and would certainly need automating if it were to be used more than once. I take encouragement however that after so many years, it was still possible with relatively little effort to achieve this curation.

References

  1. D. James, B.J. Whitaker, C. Hildyard, H.S. Rzepa, O. Casher, J.M. Goodman, D. Riddick, and P. Murray‐Rust, "The case for content integrity in electronic chemistry journals: The CLIC project", New Review of Information Networking, vol. 1, pp. 61-69, 1995. https://doi.org/10.1080/13614579509516846
  2. S.M. Bachrach, "The 21st century chemistry journal", Química Nova, vol. 22, pp. 273-276, 1999. https://doi.org/10.1590/s0100-40421999000200020
  3. H. Rzepa, "Internet Archeology: an example of a revitalised molecular resource with a new activity now built in.", 2020. https://doi.org/10.59350/9c769-34y25
  4. Re3data.Org., "Imperial College Research Computing Service Data Repository", 2016. https://doi.org/10.17616/r3k64n
  5. FAIRsharing Team, "FAIRsharing record for: Imperial College Research Computing Service Data Repository", 2018. https://doi.org/10.25504/fairsharing.letkjt
  6. H. Rzepa, "The Mechanism and Design of Asymmetric Co-Arctate Br+ (Mobius) Atom Transfers Between Alkenes. A Computational Study", 2024. https://doi.org/10.14469/hpc/13929

Detecting anomeric effects in tetrahedral carbon bearing four oxygen substituents.

Monday, March 18th, 2024

I have written a few times about the so-called “anomeric effect”, which relates to stereoelectronic interactions in molecules such as sugars bearing a tetrahedral carbon atom with at least two oxygen substituents. The effect can be detected when the two C-O bond lengths in such molecules are inspected, most obviously when one of these bonds has a very different length from the other. The effect originates when one of the lone pairs of electrons on one oxygen atom uniquely overlaps with the C-O antibonding σ* orbital involving another oxygen, thus shortening the donating oxygen-carbon bond and lengthening the accepting C-O bond. Here I take a look at tetra-substituted versions of this (C(OR)4), where in theory there are up to eight lone pairs, each able to interact with any of three C-O bonds, giving a total of 24 possible anomeric effects in one molecule.
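The count of 24 can be checked by simple enumeration. In this small sketch the atom and lone-pair labels are purely illustrative: each of the four oxygens carries two lone pairs, and each lone pair can donate into the C-O bond of any of the three other oxygens.

```python
from itertools import product

oxygens = ["O1", "O2", "O3", "O4"]  # illustrative labels for C(OR)4
lone_pairs = ["Lp1", "Lp2"]         # two lone pairs per oxygen

# A donor lone pair can interact with the C-O bond of any *other* oxygen.
interactions = [
    (donor, lone_pair, f"C-{acceptor}")
    for donor, lone_pair, acceptor in product(oxygens, lone_pairs, oxygens)
    if donor != acceptor
]

print(len(interactions))  # 4 oxygens x 2 lone pairs x 3 acceptor bonds
```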


We start the process with a search of the Cambridge Structural Database, using the following search query:

This yields 25 hits. We now want to find out what the longest and shortest C-O bonds are, and how large the difference between them is. To do this, we have to resort to applying some functions, using the calculator tool built into the Mercury analysis software. The following functions were used:

  1. Greatest('search3'.'DIST1','search3'.'DIST2','search3'.'DIST3','search3'.'DIST4')
  2. Least('search3'.'DIST1','search3'.'DIST2','search3'.'DIST3','search3'.'DIST4')
  3. Greatest('search3'.'DIST1', 'search3'.'DIST2', 'search3'.'DIST3', 'search3'.'DIST4')-Least('search3'.'DIST1', 'search3'.'DIST2', 'search3'.'DIST3', 'search3'.'DIST4')
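For anyone working outside Mercury, the same three quantities are easy to compute once the four distances for each hit have been exported. This is only a sketch: the refcodes and distances below are hypothetical placeholders, not entries from the search, and the function name is my own.

```python
def bond_length_spread(distances):
    """Return (greatest, least, greatest - least) for one hit's four distances."""
    greatest = max(distances)
    least = min(distances)
    return greatest, least, greatest - least


# Hypothetical placeholder data standing in for exported DIST1..DIST4 values (Å).
hits = {
    "HYPO-A": (1.45, 1.40, 1.39, 1.36),
    "HYPO-B": (1.40, 1.40, 1.39, 1.39),
}

# Rank hits by the difference between longest and shortest bond, largest first.
ranked = sorted(
    hits,
    key=lambda refcode: bond_length_spread(hits[refcode])[2],
    reverse=True,
)

for refcode in ranked:
    greatest, least, spread = bond_length_spread(hits[refcode])
    print(f"{refcode}: longest {greatest:.3f}, shortest {least:.3f}, "
          f"difference {spread:.3f}")
```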

The results can be displayed as below, in which the difference between the two bond lengths is colour coded (red = greatest, blue = least).

  1. Here you can see that when the difference between the longest and shortest C-O bond lengths is small, the colour is blue.
  2. Green dots show a difference of about 0.04-0.05Å.
  3. The red dot has the greatest difference of 0.087Å and corresponds to the entry SILDOH ([1], data DOI: [2], 10.5517/ccq8lq8).

The next step is to apply a “reality check” using computation, here a MN15L/Def2-TZVPP calculation on the top eight entries as sorted by the largest C-O bond length differences (ΔrC-O > 0.05Å).[3] (data DOI: 10.14469/hpc/13925)

| CCDC refcode | Crystal longest (Å) | Crystal shortest (Å) | Crystal Δ (Å) | Computed longest (Å) | Computed shortest (Å) | Computed Δ (Å) |
|---|---|---|---|---|---|---|
| SILDOH | 1.451 | 1.364 | 0.087 | 1.441 | 1.367 | 0.074 |
| PILTOU | 1.432 | 1.361 | 0.071 | 1.418 | 1.378 | 0.040 |
| GISSAD | 1.435 | 1.367 | 0.068 | 1.422 | 1.375 | 0.047 |
| BODGEG | 1.507 | 1.442 | 0.065 | 1.424 | 1.370 | 0.054 |
| GINLOF | 1.425 | 1.364 | 0.061 | 1.418 | 1.377 | 0.041 |
| POCPOO | 1.419 | 1.361 | 0.058 | 1.421 | 1.371 | 0.050 |
| KEVFUM | 1.417 | 1.361 | 0.056 | 1.395 | 1.391 | 0.004 |
| AHEYAO | 1.423 | 1.370 | 0.053 | 1.422 | 1.372 | 0.050 |

  1. The largest effect occurs for SILDOH, and this is replicated by calculation.
  2. The largest discrepancy between measurement and calculation is for KEVFUM,  where calculation predicts almost no C-O bond differences. This will be discussed elsewhere.
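The two observations above can be reproduced directly from the tabulated Δ values. This short sketch simply subtracts the computed Δ from the crystal Δ for each entry and picks out the largest mismatch; the variable names are my own.

```python
# (refcode, crystal Δ, computed Δ) in Å, taken from the table above.
rows = [
    ("SILDOH", 0.087, 0.074),
    ("PILTOU", 0.071, 0.040),
    ("GISSAD", 0.068, 0.047),
    ("BODGEG", 0.065, 0.054),
    ("GINLOF", 0.061, 0.041),
    ("POCPOO", 0.058, 0.050),
    ("KEVFUM", 0.056, 0.004),
    ("AHEYAO", 0.053, 0.050),
]

# Crystal minus computed difference, rounded to the precision of the table.
discrepancies = {ref: round(xtal - calc, 3) for ref, xtal, calc in rows}

# Entry where measurement and calculation disagree the most.
worst = max(discrepancies, key=lambda ref: abs(discrepancies[ref]))

for ref, value in discrepancies.items():
    print(f"{ref}: crystal - computed = {value:+.3f} Å")
print("Largest discrepancy:", worst)
```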

Focusing on SILDOH, we look at the NBO E(2) energies for the donor-acceptor interactions of an oxygen lone pair donating into a C-O antibonding σ* orbital.

Click on the image below for a 3D model of the two interacting orbitals (positive overlap = blue + purple, red + orange)

The interaction energy for LpO1 donating into the long bond C5-O4 is 18.0 kcal/mol and for LpO2 into C5-O4 it is 16.3 kcal/mol, whereas in the reverse directions, LpO4 to C5-O1 is only 6.0 kcal/mol and LpO4 to C5-O2 is 10.7 kcal/mol. For a “normal” C-O bond such as C5-O3, however, LpO2 to C5-O3 = 3.1 and LpO1 to C5-O3 = 5.3 kcal/mol. In effect, two oxygens “gang up” on weakening the long C5-O4 bond, but leave the shorter C5-O3 bond alone. So the individual anomeric effects are no larger than normal, but the cooperative effect of two acting together is what produces the final geometric asymmetry.
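The “ganging up” is easiest to see by summing the quoted E(2) values for each acceptor bond. This sketch uses only the six interactions mentioned in the paragraph above; a full NBO output contains further terms that are not included here, so the totals are partial sums for illustration.

```python
from collections import defaultdict

# NBO E(2) energies (kcal/mol) quoted in the text: (donor lone pair, acceptor bond).
e2 = {
    ("LpO1", "C5-O4"): 18.0,
    ("LpO2", "C5-O4"): 16.3,
    ("LpO4", "C5-O1"): 6.0,
    ("LpO4", "C5-O2"): 10.7,
    ("LpO2", "C5-O3"): 3.1,
    ("LpO1", "C5-O3"): 5.3,
}

# Sum the donations received by each acceptor C-O bond.
totals = defaultdict(float)
for (donor, acceptor), energy in e2.items():
    totals[acceptor] += energy

for acceptor, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{acceptor}: {total:.1f} kcal/mol donated into its σ* orbital")
```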

The Wiberg bond index mirrors this effect. The bond indices are 0.9882 for O1-C5 and 0.8512 for C5-O4 (Δ = -0.137), which is a big difference in bond order and accounts for the large (record?) difference in bond length.

In the next post, I will analyse the equivalent molecules B(OR)4.

References

  1. R. Betz, and P. Klüfers, "Norbornane-2,7-diyl 1′,2′-phenylene orthocarbonate", Acta Crystallographica Section E Structure Reports Online, vol. 63, pp. o3933-o3933, 2007. https://doi.org/10.1107/s1600536807042298
  2. R. Betz, and P. Klüfers, "CCDC 663670: Experimental Crystal Structure Determination", 2007. https://doi.org/10.5517/ccq8lq8
  3. H. Rzepa, "Detecting anomeric effects in tetrahedral carbon bearing four oxygen substituents.", 2024. https://doi.org/10.14469/hpc/13925

Data Citation – a snapshot of the chemical landscape.

Monday, February 26th, 2024

The recent release of the DataCite Data Citation corpus, which has the stated aim of providing “a trusted central aggregate of all data citations to further our understanding of data usage and advance meaningful data metrics”, made me want to investigate what the current state of citing data in the area of chemistry might be. Chemistry is known to be a “data rich” science (as most of the physical sciences are) and here on this very blog I try to cite whenever possible the source(s) of the data that I often use when discussing a topic. Such citations are not necessarily the same as citing a journal source via e.g. its DOI, although of course one is very likely to find data associated with most articles nowadays, albeit almost entirely via any associated supporting information document. However the latter is often presented in a relatively unstructured (PDF) form, which does not adhere to what are called the “FAIR” guidelines of being findable, accessible, interoperable and reusable. Directly citing data is a way of improving its FAIR characteristics. So what insights does the Data Citation corpus reveal?

  1. This overview shows that by far the most common mechanism for citing data is via its Accession Number, used predominantly by Life Sciences (an example of this latter is linked here[1]), with the DOI (digital object identifier) being less common.
  2. Tunnelling down to citation counts in chemical sciences by publisher, an odd picture emerges with just a handful of citations.
  3. The more general physical sciences do not fare much better:
  4. Let's try a different approach, filtering by repository. Thus here are the statistics for the Cambridge Crystallographic Data Centre, which was citing data in large amounts a few years back, but where citation appears to have dropped off in the last few years. Given that the entries there continue to go up almost exponentially, we begin to suspect that the data citations there are not being properly recognised as such by the citation corpus.
  5. Let's try another repository, Zenodo, where the count is again dropping but where the totals are about 500 a year for the most recent years.
  6. OK, one more go, the RSC chemistry publisher.

I am not sure what to make of this; areas where you would expect very high levels of data citation in chemical sciences do not appear to exist – I think for some reason, the DataCite citation corpus is not yet capturing them.[2] But when things do start operating as perhaps expected, I think we will have a very valuable resource, which should firmly put data (whether FAIR or not) on the map.

References

  1. D. Batista, A. Gonzalez-Beltran, S. Sansone, and P. Rocca-Serra, "Machine actionable metadata models", Scientific Data, vol. 9, 2022. https://doi.org/10.1038/s41597-022-01707-6
  2. R. Page, "Problems with the DataCite Data Citation Corpus", 2024. https://doi.org/10.59350/t80g1-xys37

3D Molecular model visualisation: 3 Million atoms +

Saturday, January 27th, 2024

In the late 1980s, as I recollected here[1], the equipment needed for real-time molecular visualisation (as it became known) was still expensive, requiring custom systems such as Evans and Sutherland PS390 workstations. One major breakthrough in making such techniques generally available on less specialised equipment was achieved by Roger Sayle[2], then working at Imperial College around 1990 and using a Silicon Graphics workstation. He greatly optimised the rendering algorithms by creating a program called RasMol (after his initials), which meant such visualisations could very rapidly also be achieved even on a personal computer. Moving from vector display technology (the PS390) to raster/bitmap graphics had allowed spacefilling representations of molecules containing 100s if not 1000s of atoms, and in turn enabled the new World-Wide Web to exploit the technique.[3]

Whilst RasMol is very much still around, it also provided an inspiration for successor programs such as Jmol (based on Java) and JSmol (based on the JavaScript language built into all modern web browsers). There are now many articles in the literature describing this program. In 2008 the very first post on this blog described how to run it in a WordPress instance[4].

Now a new milestone in molecular visualisation has been reached – the ability to display 3 million atoms! Bob Hanson has just released Jmol/JSmol 16.1.51 which supports the BinaryCIF file format. An example of the power of both program and this new format is illustrated with the protein 8glv[5] which contains 3 million atoms (the bcif file itself is only 47.4 Mb).

The Jmol/JSmol script to load it is:

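    # Record the start time
    t = now();
    # Turn off automatic bond calculation on loading
    set autobond false;
    # Load the BinaryCIF file for entry 8glv with the "*.C" load filter
    load =8glv.bcif filter "*.C";
    spacefill on;
    color chain;
    # Print the time elapsed since t
    print now(t);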

and the actual rendering takes just 10-20 seconds. You can see from the screenshots below that when it is zoomed in, it really does show individual atoms! Who knows what the practical atom limit is, but it is almost certainly more than three million! And it may even be possible on a mobile phone!


OK, you are asking why I have not loaded 8glv into this page? Well, I need to update JSmol on this site first, and have encountered an issue that needs fixing.

References

  1. H. Rzepa, "Computers 1967-2011: a personal perspective. Part 2. 1985-1989.", 2011. https://doi.org/10.59350/g4j62-4xk50
  2. R. Sayle, "RASMOL: biomolecular graphics for all", Trends in Biochemical Sciences, vol. 20, pp. 374-376, 1995. https://doi.org/10.1016/s0968-0004(00)89080-5
  3. H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, pp. 1907, 1994. https://doi.org/10.1039/c39940001907
  4. H. Rzepa, "Jmol and WordPress: Loading 3D molecular models, molecular isosurfaces and molecular vibrations into a blog", 2008. https://doi.org/10.59350/pq7ds-gqr71
  5. T. Walton, and A. Brown, "96-nm repeat unit of doublet microtubules from Chlamydomonas reinhardtii flagella", 2023. https://doi.org/10.2210/pdb8glv/pdb

The Macintosh computer at 40.

Thursday, January 25th, 2024

On 24th January 1984, the Macintosh computer was released, as all the media are informing us. Apparently, some are still working. I thought I would give my own personal recollections of that period.

In fact, the Mac reached UK stores via a dealership only in 1985. What brought it to the attention of our university chemistry department was that also in 1985 the ChemDraw program was released, and visitors to e.g. ACS meetings that year (probably the spring meeting) brought news of it back. A third piece of the puzzle, the LaserWriter, also appeared that year. What difference would all this make? Well, take a look at the diagram in this 1983 article[1]. I drew that with stencils and transfer lettering, and the diagrams in this article took me ages! The article was submitted to what was called a “camera ready” journal, as part of the process of accelerating its publication, so it had to be as perfect as I could make it. I had to start from the beginning several times, since sometimes even Tipp-Ex could not fix the errors or rescue a diagram from being a bit too big to fit onto the journal-provided template.

After drafting these diagrams, I vowed never again! Fortunately, the Mac, ChemDraw and the LaserWriter appeared some 18 months later! I remember going around the (mostly organic) chemists in the department, asking if they would like to join in a bulk purchase, and we ended up with 10 Macs. By 1985, the model had moved on to the Mac 512K, which was the one actually purchased; photos of the front and rear of one are shown below (I still have it, hoping a collector might make me an offer one day).

The first year of use revealed an infamous quirk. The port on the rear of the Mac 512K did not support attachment of any hard drives (although in 1985 even a 10 MB drive was ferociously expensive!) and so most of one's time was spent not using e.g. ChemDraw but pushing floppy disks in and out of the machine. A year later, the Mac Plus with 1 MB of memory was introduced (third photo) and this had a SCSI port. I attached a 10 MB drive to this port and the bliss at no longer having to swap floppy disks was immense.

Back to the 512K model. After they were delivered, I gathered all 9 other users and introduced them to the mouse. In the first 15 minutes, there were rumblings that they would never get used to such a strange object, but at roughly the 45-minute mark, they were all converts. The program demonstrated was of course ChemDraw. Microsoft Word was not yet available but another simple word processor (WriteNow) was, and everyone practised constructing diagrams such as the above. What joy! And no Tipp-Ex, or starting the diagram from scratch – merely a simple 10-second edit.

By 1987, as I recollect, many 1 MB models were installed and we set about networking them all together and connecting them to the LaserWriter. We even managed to use the Mac to connect to STN International to search Chemical Abstracts[2], and the modern era was well under way.

So this is my tribute to the Mac on its 40th birthday. I still use them to this day.

References

  1. A.M. Lobo, S. Prabhakar, H.S. Rzepa, A.C. Skapski, M. Tavers, and D.A. Widdowson, "C-substitution reactions of c,n-diaryl nitrones", Tetrahedron, vol. 39, pp. 3833-3841, 1983. https://doi.org/10.1016/s0040-4020(01)88625-7
  2. H. Rzepa, "A trip down memory lane: An online departmental connection map from 1989.", 2023. https://doi.org/10.59350/85xp6-2sy65

Scholarly journals vs Scholarly Blogs.

Friday, January 12th, 2024

First, a very brief history of scholarly publishing, starting in 1665[1] when scientific journals started to be published by learned societies. This model continued until the 1950s, when commercial publishers such as Pergamon Press started with their USP (unique selling point) of rapid time to publication of ~3 months,[2] compared to typical times for many learned society publishers of 2 years or longer. Fast forward another 50 years or so, and the commercial publishers were now dominating the scene, but the business model was still based on institutional subscriptions, whereby the institution rather than authors paid the costs of publication. As the number of journals expanded, even well-off institutions had to make difficult decisions on which subscriptions to keep and which to cancel. By the late 1990s the delivery model was changing from print to online, but the overall issue was that many scientists around the world no longer had access to many journals.

Enter the APC, or article processing charge, whereby the authors themselves had to reimburse the journals for publishing their papers, although they could often still recover these costs from their institution. The cost of an APC depended on the reputation of the journal; those with the highest “impact factors” often charged the highest APCs, some of which could reach £5000+ for a single “paper” (still called that even in an electronic era). Also, some journals remained “hybrid”, where the costs were split between institutional subscriptions and APCs. At least the APC-funded articles could be accessed by anyone (including the “public”) without restriction. Such Open Access is often referred to as Gold; there are even Diamond (also known as Platinum) articles, which are Gold open access but without author fees. Diamond is typically used by publishers who are keen to emphasise that they do not charge authors to publish open access.

With many APCs ranging from £1000 up to £5000 or more, some started asking why it should cost so much to have this type of publishing infrastructure. Also in the early 2000s, “social media” started up, which at first tended to concentrate on instant publication and hence impact. These media were not considered capable of rivalling the longevity achieved by journal publishers, which after all had been around for 360 years or so, nor was that even thought desirable. Things have begun to change however. Enter as an example Rogue Scholar, and its associated blog Front Matter. The aim here is to exploit the underpinning technical infrastructure of a blog host by automatically adding features more commonly associated with learned society or commercial journal publishing.

I wrote[3] about some of the features available last September and now only four months later the functionality continues to expand. This includes:

  1. The ability to acquire a JATS XML version (Journal Article Tag Suite), the standard format for scholarly articles.
  2. I had previously noted that blog posts are assigned a DOI via the Crossref registration agency, and hence also acquire a metadata record which becomes useful for searching. All 800+ of the posts on this site have such a DOI, for example.
  3. One interesting recent use of blogs is to act as a science newsletter associated with a funded grant, as an adjunct to simply publishing the research results in a journal.
  4. Indexing is also making big strides with the introduction of an API (application programming interface), another service offered by scholarly publishers. As part of this, fields of science are being added to the metadata to enable filtering by e.g. Chemistry.
  5. Archiving, in theory for all of posterity, is also starting to be addressed. This requires transformation from HTML, typically used in blogs, to a medium more appropriate for long term archiving.
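
To make the DOI and metadata point in the list above a little more concrete, here is a minimal Python sketch that turns a DOI into two URLs: the usual https://doi.org/ resolver link, and a metadata lookup address. The second address assumes that Crossref's public REST API serves records at /works/{DOI}; that is my understanding of the service rather than anything described in this post, so check it against the Crossref documentation before relying on it. The function name doi_urls is hypothetical, and the sketch builds the URLs without contacting any server.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"
# Assumption: Crossref's public REST API serves metadata at /works/{DOI}.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def doi_urls(doi: str) -> dict:
    """Return the resolver URL and an assumed Crossref metadata URL for a DOI."""
    doi = doi.strip()
    # Very light validation: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode anything unusual but keep the prefix/suffix slash.
    encoded = quote(doi, safe="/")
    return {
        "landing_page": DOI_RESOLVER + encoded,
        "metadata": CROSSREF_WORKS + encoded,
    }


if __name__ == "__main__":
    # DOI of the earlier Rogue Scholar post cited in this article.
    for name, url in doi_urls("10.59350/8m2d8-47b52").items():
        print(name, url)
```

The metadata record returned by such a lookup is what makes filtering by field of science possible.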

The cost of the infrastructures described above is certainly very much less than e.g. the APCs noted above, in part because they are so highly automated. I expect things will move very rapidly on this front.


It is hoped to automatically include these in the post itself in the future. Meanwhile, it can easily be retrieved by a suitable search.

References

  1. "Epistle dedicatory", Philosophical Transactions of the Royal Society of London, vol. 1, 1665. https://doi.org/10.1098/rstl.1665.0001
  2. D. Ginsburg, and W.J. Rosenfelder, "Alicyclic studies—X", Tetrahedron, vol. 1, pp. 3-8, 1957. https://doi.org/10.1016/0040-4020(57)85003-0
  3. H. Rzepa, "Improving the Science blog – The Rogue Scholar service.", 2023. https://doi.org/10.59350/8m2d8-47b52

Macrocyclic peptide antibiotics – now Zosurabalpin – then antibacterial agents based on cyclic D,L-α-peptide architectures.

Monday, January 8th, 2024

Zosurabalpin[1],[2] is receiving a great deal of attention as a new class of antibiotic which can target infections for which current treatment options are inadequate. It is a cyclic peptide and seeing this triggered a memory of an earlier such species reported way back in 1995[3],[4]. This octapeptide (YIJDIE, DOI: 10.5517/cc58gxs) was presumed to function in a novel manner, having linear water channels wide enough to form a molecular nanoscale pipe for a stream of water molecules to flow along. When inserted into the bacterial cell membrane via its lipophilic sidechains, it drained the bacterium of its cell water within seconds, thus killing it. A 3D model shows the effect very clearly.

Zosurabalpin does not function in this manner. Its structure was devised by varying the substituents until optimal activity was obtained (see this patent WO202319441).

The ligand (VB6) is seen below. A program such as Chimera can tease out many more details.

Zosurabalpin embedded in the protein pdb8frn can be viewed below and the coordinates can be obtained via DOI: 10.2210/pdb8frn/pdb
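
The two PDB DOIs quoted in these posts (for 8glv and 8frn) share an obvious pattern, and a minimal Python sketch can generate such a DOI from an entry identifier. The pattern is inferred only from those two examples, and the check that an identifier is four alphanumeric characters is my assumption, not an official rule. The function name pdb_doi is hypothetical.

```python
import re


def pdb_doi(pdb_id: str) -> str:
    """Build a DOI of the form 10.2210/pdbXXXX/pdb from a PDB entry ID."""
    pdb_id = pdb_id.strip().lower()
    # Assumption: entry IDs are four alphanumeric characters (e.g. 8frn).
    if not re.fullmatch(r"[0-9a-z]{4}", pdb_id):
        raise ValueError(f"unexpected PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id}/pdb"


if __name__ == "__main__":
    for entry in ("8frn", "8glv"):
        print(entry, "https://doi.org/" + pdb_doi(entry))
```

Both DOIs produced in the usage example match the ones given in the text and reference lists here.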

The cyclic octapeptide in the original 1995 report[3] appears never to have been developed into a clinically useful antibiotic, but I wonder where this approach led.

References

  1. C. Zampaloni, P. Mattei, K. Bleicher, L. Winther, C. Thäte, C. Bucher, J. Adam, A. Alanine, K.E. Amrein, V. Baidin, C. Bieniossek, C. Bissantz, F. Boess, C. Cantrill, T. Clairfeuille, F. Dey, P. Di Giorgio, P. du Castel, D. Dylus, P. Dzygiel, A. Felici, F. García-Alcalde, A. Haldimann, M. Leipner, S. Leyn, S. Louvel, P. Misson, A. Osterman, K. Pahil, S. Rigo, A. Schäublin, S. Scharf, P. Schmitz, T. Stoll, A. Trauner, S. Zoffmann, D. Kahne, J.A.T. Young, M.A. Lobritz, and K.A. Bradley, "A novel antibiotic class targeting the lipopolysaccharide transporter", Nature, vol. 625, pp. 566-571, 2024. https://doi.org/10.1038/s41586-023-06873-0
  2. S. Hawser, N. Kothari, T. Valmont, S. Louvel, and C. Zampaloni, "2131. Activity of the Novel Antibiotic Zosurabalpin (RG6006) against Clinical Acinetobacter Isolates from China", Open Forum Infectious Diseases, vol. 10, 2023. https://doi.org/10.1093/ofid/ofad500.1754
  3. M.R. Ghadiri, K. Kobayashi, J.R. Granja, R.K. Chadha, and D.E. McRee, "The Structural and Thermodynamic Basis for the Formation of Self‐Assembled Peptide Nanotubes", Angewandte Chemie International Edition in English, vol. 34, pp. 93-95, 1995. https://doi.org/10.1002/anie.199500931
  4. S. Fernandez-Lopez, H. Kim, E.C. Choi, M. Delgado, J.R. Granja, A. Khasanov, K. Kraehenbuehl, G. Long, D.A. Weinberger, K.M. Wilcoxen, and M.R. Ghadiri, "Antibacterial agents based on the cyclic d,l-α-peptide architecture", Nature, vol. 412, pp. 452-455, 2001. https://doi.org/10.1038/35086601