Archive for the ‘Chemical IT’ Category

Some examples of open access publications citing managed research data (RDM).

Tuesday, January 5th, 2016

In May 2015, the EPSRC funding council in the UK began requiring researchers who publish the outcomes of funded work to include an OA (open access) version of the narrative and to cite the managed research data used to support the research with a DOI (digital object identifier). I was discussing these aspects with a senior manager (research outcomes) at the EPSRC and he asked me to provide some examples from my area of chemistry; here are some.

The basics are covered by three broad actions (items 1-3 below), followed by a fourth point on keeping them in step:

  1. The researcher should adopt a research data management plan. This can be quite brief, but it is important that it be updated as the strategies evolve with time and that it is consistent within the group (ideally the department).

    • It would include a general policy for the research group to access and, if appropriate, share a common, private storage area for the so-called "active data" (data still being analysed and processed). It could for example take the form of cloud storage, using commercial providers such as Box, Dropbox or GitHub. The data is accessible only to those who have been granted access.
    • It could be a software organizer in which cloud storage is implicit. For quantum calculations, we use a locally developed system which serves a storage function and which has one other, even more important, attribute: it generates and collects the metadata associated with the datasets being produced.[1],[2],[3]
  2. The narrative describing the research is then published as an OA article, conjointly with …
  3. … the datasets being published to a data repository, and assigned a DOI.
  4. There are some delicate aspects to synchronising actions 2 and 3, so that the article cites the data and the data cites the article. I will not detail the mechanisms for achieving this here.

What follows here are 11 examples of OA articles in which managed data is cited in the manner described at the start. You may notice a diversity of styles and procedures. At the time most of these examples were being worked upon, there were few examples or indeed guidelines, and so these really constitute an exploration of various ways in which it can be done.

Article DOI | Article short DOI | Representative dataset | Data is cited as:
[4] 9qg [5]
  • Additional file 1. Interactivity box 1. Data-based object illustrating various aspects of the interaction at the heart of Z-DNA.
  • A footnote in the preceding object: The original complete data set is also available at http://doi.org/10.14469/ch/13514 via a digital repository.
[6] 9qf [7] Interactive Table S1, using dataDOIs referencing a data repository, e.g. http://doi.org/10042/to-8576 and 11 other examples.
[8] 9p3 [7]
  • Full details of all calculations are available via the individual digital repository entries associated with Interactivity Boxes 1 and 2 (Web enhanced objects) available with this article (doi 10.6084/m9.figshare.797484, shortdoi: rns) or directly by the following doi resolvers: TS1, 10042/to-13699 and ~40 further entries
  • supplemental data: 10.6084/m9.figshare.777773, shortdoi: rnf.
[9] 9p2 [10] Ref 20 and 21 as an Interactivity box, datadoi: 10.6084/m9.figshare.785756, shortdoi: n6q and further references to individual datasets are available in this object.
[11] 9p4 [12]
  • Each data table or data Figure is assigned a doi in the Figshare repository (see footnotes), all retrievable as e.g. shortdoi: qd8.
  • Each figure or table contains further data citations (~20 per table).
[13] 9p5 [14]
  • footnotes to individual tables (Table 5, Table 7, Table 9)
  • and in the section Associated content at the end of the article, citing Interactive Tables 1, which themselves cite further datadois.
[1] vf4   This article discusses the technology behind five examples of articles which themselves contain citations to data.
[15] 9p6 [16] Refs 17 (doi: 10.6084/m9.figshare.988346, shortdoi: tb3) and 18 (doi: 10.6084/m9.figshare.1293562, shortdoi: znk)
[2] 73z [17] References 27 (10.6084/m9.figshare.1266197, shortdoi: xn3) and 28 (10.6084/m9.figshare.1342036, shortdoi: 2zb).
[18] 9p9 [19] Ref 15, in the form: An interactive table corresponding to the data for these calculations and the experimental details can be retrieved from doi:10.6084/m9.figshare.1181739, shortdoi: vz9. NCI surfaces were created using the resource doi:10.6084/m9.figshare.811862, shortdoi: n5b.
[3] 73x [20] Refs 36 (doi:10.6084/m9.figshare.1342036, shortdoi: 2zb) and ref 50 (10.6084/m9.figshare.1330063, shortdoi:6cq).

I hope this table adds to the open collection of pointers linking open access research articles to associated managed data. One really requires this association to be achieved using metadata and perhaps something along these lines might emerge quite soon from the fruits of the current collaborations between CrossRef and DataCite. Ideally, one should be able to pose search queries along the lines of identifying all research data associated with an article, and indeed vice versa.
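The kind of two-way query described above can be illustrated with a minimal Python sketch. It is not a CrossRef or DataCite client; it simply transcribes three rows of the table (the article DOIs come from references 15, 2 and 3) into a dictionary and inverts it, so that one can ask both "which data does this article cite?" and "which articles cite this dataset?". The function names are my own, chosen for illustration.

```python
from collections import defaultdict

# Article DOI -> dataset DOIs it cites, transcribed from three rows of the table.
ARTICLE_TO_DATA = {
    "10.1021/ed500398e": [
        "10.6084/m9.figshare.988346",
        "10.6084/m9.figshare.1293562",
    ],
    "10.1186/s13321-015-0081-7": [
        "10.6084/m9.figshare.1266197",
        "10.6084/m9.figshare.1342036",
    ],
    "10.1186/s13321-015-0093-3": [
        "10.6084/m9.figshare.1342036",
        "10.6084/m9.figshare.1330063",
    ],
}


def invert(article_to_data):
    """Build the reverse mapping: dataset DOI -> article DOIs citing it."""
    data_to_articles = defaultdict(list)
    for article, datasets in article_to_data.items():
        for dataset in datasets:
            data_to_articles[dataset].append(article)
    return dict(data_to_articles)


def doi_url(doi):
    """Turn a bare DOI into a resolvable URL."""
    return "https://doi.org/" + doi


DATA_TO_ARTICLES = invert(ARTICLE_TO_DATA)

if __name__ == "__main__":
    # Which articles cite the dataset that appears in two rows of the table?
    for article in DATA_TO_ARTICLES["10.6084/m9.figshare.1342036"]:
        print(doi_url(article))
```

In a metadata-driven world the dictionary would of course not be typed in by hand; it would be harvested from the related-identifier metadata registered with each DOI.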

When the scientific journal arose some 350 years ago, the format and presentation of the narrative evolved only relatively slowly, an evolution that has accelerated somewhat in the online era largely due to the author guidelines imposed by the publishers. I suspect most authors were happy to allow the publishers to take control of this aspect. There may now however be a similar expectation that the publishers specify how authors' data is managed and presented. I would however argue here that it is the authors themselves who know the attributes of their data best and the 11 examples above show one evolutionary process of the data publication process which in this instance was largely determined by the authors themselves. We should strive to allow the authors to retain these measures of creativity in the future, as RDM and its integration into journals matures and develops.


Interactive tables here were created as convenient collections of dataset DOIs, and have been presented in conjunction with visualisation software such as Jmol or JSmol. These tables can themselves be published in a repository and assigned a DOI. Most of the examples we prepared were published in the Figshare repository (the DOIs for some of which are shown in the last column of the table above). Special actions had to be taken at the Figshare end to allow the tables to be incorporated into the landing page presentation corresponding to the DOI. In December 2015, the site was refactored and this functionality is currently disabled, but should be restored in the near future.

 If anyone reading this post is aware of interesting chemistry examples illustrating formal data citation of managed research data using e.g. a DOI in published articles, do please let me know and if appropriate I will add them to the table above.

 
 
 

References

  1. M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014. https://doi.org/10.1021/ci500302p
  2. M.J. Harvey, N.J. Mason, A. McLean, and H.S. Rzepa, "Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers", Journal of Cheminformatics, vol. 7, 2015. https://doi.org/10.1186/s13321-015-0081-7
  3. M.J. Harvey, N.J. Mason, A. McLean, P. Murray-Rust, H.S. Rzepa, and J.J.P. Stewart, "Standards-based curation of a decade-old digital repository dataset of molecular information", Journal of Cheminformatics, vol. 7, 2015. https://doi.org/10.1186/s13321-015-0093-3
  4. H.S. Rzepa, "Chemical datuments as scientific enablers", Journal of Cheminformatics, vol. 5, 2013. https://doi.org/10.1186/1758-2946-5-6
  5. H.S. Rzepa, "C 19 H 28 N 9 O 10 P 1", 2012. https://doi.org/10.14469/ch/13514
  6. M.J. Gomes, L.F. Pinto, P.M. Glória, H.S. Rzepa, S. Prabhakar, and A.M. Lobo, "N-heteroatom substitution effect in 3-aza-cope rearrangements", Chemistry Central Journal, vol. 7, 2013. https://doi.org/10.1186/1752-153x-7-94
  7. H.S. Rzepa, "C 11 H 16 N 1 O 5 -1", 2011. https://doi.org/10.14469/ch/8551
  8. F.L. Cherblanc, Y. Lo, W.A. Herrebout, P. Bultinck, H.S. Rzepa, and M.J. Fuchter, "Mechanistic and Chiroptical Studies on the Desulfurization of Epidithiodioxopiperazines Reveal Universal Retention of Configuration at the Bridgehead Carbon Atoms", The Journal of Organic Chemistry, vol. 78, pp. 11646-11655, 2013. https://doi.org/10.1021/jo401316a
  9. D. Christopher Braddock, J. Clarke, and H.S. Rzepa, "Epoxidation of bromoallenes connects red algae metabolites by an intersecting bromoallene oxide – Favorskii manifold", Chemical Communications, vol. 49, pp. 11176, 2013. https://doi.org/10.1039/c3cc46720a
  10. H.S. Rzepa, "C 6 H 9 Br 1 O 2", 2013. https://doi.org/10.14469/ch/18928
  11. A. Armstrong, R.A. Boto, P. Dingwall, J. Contreras-García, M.J. Harvey, N.J. Mason, and H.S. Rzepa, "The Houk–List transition states for organocatalytic mechanisms revisited", Chem. Sci., vol. 5, pp. 2057-2071, 2014. https://doi.org/10.1039/c3sc53416b
  12. N. Mason, and N. Mason, "C 18 H 23 N 1 O 3", 2013. https://doi.org/10.14469/ch/18808
  13. S. Lal, H.S. Rzepa, and S. Díez-González, "Catalytic and Computational Studies of N-Heterocyclic Carbene or Phosphine-Containing Copper(I) Complexes for the Synthesis of 5-Iodo-1,2,3-Triazoles", ACS Catalysis, vol. 4, pp. 2274-2287, 2014. https://doi.org/10.1021/cs500326e
  14. H.S. Rzepa, "C 15 H 12 I 1 N 3", 2011. https://doi.org/10.14469/ch/10258
  15. K.K. (Mimi) Hii, H.S. Rzepa, and E.H. Smith, "Asymmetric Epoxidation: A Twinned Laboratory and Molecular Modeling Experiment for Upper-Level Organic Chemistry Students", Journal of Chemical Education, vol. 92, pp. 1385-1389, 2015. https://doi.org/10.1021/ed500398e
  16. H.S. Rzepa, "C 21 H 32 O 1 S 2", 2015. https://doi.org/10.14469/ch/177853
  17. H.S. Rzepa, N. Mason, A. Mclean, and M. Harvey, "Interoperability for Data Repositories. Machine Methods for Retrieving Data for Display or Mining Utilising Persistent (data-DOI) Identifiers", 2014. https://doi.org/10.6084/m9.figshare.1266197
  18. T. Lanyon-Hogg, M. Ritzefeld, N. Masumoto, A.I. Magee, H.S. Rzepa, and E.W. Tate, "Modulation of Amide Bond Rotamers in 5-Acyl-6,7-dihydrothieno[3,2-c]pyridines", The Journal of Organic Chemistry, vol. 80, pp. 4370-4377, 2015. https://doi.org/10.1021/acs.joc.5b00205
  19. H.S. Rzepa, "C 15 H 15 N 1 O 1 S 1", 2014. https://doi.org/10.14469/ch/25041
  20. H.S. Rzepa, M.J. Harvey, N.J. Mason, A. Mclean, P. Murray-Rust, and J.J.P. Stewart, "Standards-based curation of a decade-old digital repository dataset of molecular information.", 2015. https://doi.org/10.6084/m9.figshare.1330063

Could anyone comment on any recent calculated results on the planarity, or lack thereof, of azobenzene?

Sunday, December 20th, 2015

This question was posted on the CCL (computational chemistry list) by John McKelvey. Here, I give an answer in the form of a search of the CSD (Cambridge Structural Database).

I was not sure if the question related purely to the geometries obtained using computational methods or to comparisons with experimentally determined structures. Or indeed whether it related to azobenzene specifically or to azobenzenes in general. Here, I comment only on the second option in each case: experimentally determined structures, and azobenzenes in general. The search was defined as below, with the following specifications:

  1. The absolute value of the central torsion (TOR1) was constrained to 0-60° for cis azobenzenes and to 120-180° for trans azobenzenes.
  2. Two further torsions (TOR2, TOR3) specify the torsion angle about the aryl to N bond.
  3. The R factor is < 0.1, and there are no errors or disorder.
  4. The C-N bonds were specified as acyclic.
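The first criterion can be expressed as a short Python sketch. This is not how the CSD search software works internally; it is just an illustration, assuming one already has Cartesian coordinates for the four atoms C-N=N-C, of how a torsion is computed and then sorted into the cis (0-60°) or trans (120-180°) window. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    length = math.sqrt(_dot(b2, b2))
    unit_b2 = tuple(x / length for x in b2)
    m1 = _cross(n1, unit_b2)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def classify_azo(tor1_degrees):
    """Apply the TOR1 windows: |TOR1| 0-60 is cis, 120-180 is trans."""
    t = abs(tor1_degrees)
    if t <= 60.0:
        return "cis"
    if t >= 120.0:
        return "trans"
    return None  # falls outside both search windows


if __name__ == "__main__":
    # Idealised planar trans arrangement of four atoms.
    trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
    print(classify_azo(trans))
```

Only the absolute value of the torsion is used, so the sign convention of the dihedral routine does not affect the classification.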

search azobenzene

Trans Azobenzenes, 1111 examples
trans azobenzene
Cis Azobenzenes, 42 examples
cis azobenzene

The results show that, by and large, trans azobenzenes are co-planar to within ±30°, but there are some interesting points in the centre with dihedral angles of ~90°. Cis azobenzenes on the other hand are mostly NOT planar, with red hotspots at about 50 or 130° of twist.

These results took about 20 minutes to define, search, and plot as per above. I hope it provides John with an answer, even if it’s not the one he might have meant!

Deviations from planarity of trigonal carbon and from linearity of digonal carbon.

Sunday, September 13th, 2015

Previously, I explored deviation from ideal tetrahedral arrangements of four carbon ligands around a central (sp3) carbon using crystal structures. Now it is the turn of digonal (sp1) and trigonal (sp2) carbons. 

Firstly, the digonal C≡C case. Attached to each carbon of the C≡C unit are two saturated carbon ligands; this is to prevent conjugation from influencing our result. 

Scheme

The result of a search (R-factor < 5%, no errors, no disorder) shows the hotspot at the expected ~180°, but then a fascinating curve as the angle subtended at the digonal carbon decreases down to ~110°, with the C≡C bond length gradually increasing. This apparently non-linear behaviour would be interesting to replicate using quantum mechanics.

Scheme

Next, the trigonal case. Again, the substituents are 4-coordinate carbons to prevent complicating conjugations.

Scheme

A plot of the C=C distance vs the C-C=C angle brings a surprise. There are four clusters centered at angles of ~132°, 123°, 110° and 94° (cyclobutenes) and a small cluster at ~150°. The C=C distance stays constant at around 1.335Å or shorter, a clear difference with the sp-case. There is perhaps a small outlier collection where the angle is ~108° and the distance ~1.4Å.

Scheme

This plots the dihedral angle subtended at one of the trigonal carbon atoms and measures how non-planar that atom is. There is again no real evidence that the C=C bond length changes as the trigonal centre becomes bent.

Scheme

This dihedral angle measures the twist about the C=C bond; up to about 30° is tolerated, but again there is no clear indication of a systematic change in the C=C length.

Scheme

These analyses reveal general trends on bond lengths induced by distorting the normal coordination around trigonal and digonal carbon atoms. It is only the start of the story of course, since there are plenty of isolated outliers that really should be explored; some may be simply due to undetected crystallographic errors, whilst with others there may lurk interesting or even new chemical phenomena. 


Below, the crystal structure result (with the axes transposed) is compared to a closed shell single reference ωB97XD/6-311+G(2df) calculation. Whilst the trend is replicated, the agreement is not quantitative. This is probably because many of the crystal structures are perturbed by other effects, most probably by coordination of a metal and hence back-donation of π-electrons into vacant metal orbitals. The CSD indexing of the structures however retains the C≡C bond notation, even though the bond is no longer truly a triple one. This reinforces the observation I made in the previous post that when searching the CSD, one can stipulate a bond type to constrain the search. But that bond type may be purely nominal and bear little resemblance to the actual electronic structure of the species. There are other issues: the wave function was constrained to a closed shell single determinant. At low angles, the calculation itself is probably not accurate (as can be seen from a kink in the plot, indicating instability).

Scheme

Scheme


Deviations from tetrahedral four-coordinate carbon: a statistical exploration.

Sunday, September 6th, 2015

An article entitled “Four Decades of the Chemistry of Planar Hypercoordinate Compounds”[1] was recently reviewed by Steve Bachrach on his blog, where you can also see comments. Given the recent crystallographic themes here, I thought I might try a search of the CSD (Cambridge Structural Database) to see whether anything interesting might emerge for tetracoordinate carbon.

The search definition is shown below using a  simple carbon with four ligands, the ligands themselves also being tetracoordinate carbon. The search is restricted to data collected below temperatures of 140K, as well as R-factor <5%, no errors and no disorder. Cyclic species are allowed and a statistically reasonable 2773 hits emerged from the search.

Scheme

Recollect that the idealised angle subtended at the centre is 109.47°. I show below three separate heat plots of the search results. Why three? The way the search software (Conquest) works is that one could define four C-C distances and six angles, and then plot any combination of one distance and one angle. I show just three combinations here, but could have included many more.

There appear to be four distinct clusters of values for this angle that emerge from the three plots shown below (the “bin size” is 100, and the frequency colour code indicates how many hits there are in each bin).

  1. The hotspot is unsurprisingly ~109° with a corresponding C-C distance of ~1.54Å.
  2. There may be two clusters at angles of ~60° (cyclopropane), with C-C values ranging from ~1.47 to ~1.55Å.
  3. A collection at ~90° (mostly cyclobutane?), with C-C values up to 1.6Å.
  4. A collection at ~140° (again small rings), now with much shorter C-C values of ~1.46Å. This is reminiscent of the approximation that the hybridisation in e.g. cyclopropane is a combination of sp5 and sp3.

Scheme

Scheme

Scheme

Ideally, what one might want to plot would be sums of four angles; for a pure tetrahedral carbon the sum would always be 438° (4*109.47°) but for a pure planar carbon it could be as low as 360° (4*90°). One could then see how closely the distribution approaches the latter and hence reveal whether there are any true planar tetracoordinate carbon species known. Although the Conquest software cannot analyse in such terms, a Python-based API has recently been released that should allow this to be done, although I should state that this requires a commercial license and it is not open access code. If we manage to get it working, I will report!
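The arithmetic of such an angle sum can be sketched in plain Python, independently of the CSD API (which I do not use here). One assumption needs stating: a four-coordinate centre has six ligand-centre-ligand angles, and I take "the sum of four angles" to mean the four smallest of these, since that choice gives 360° for a square-planar centre and ~438° for an ideal tetrahedron. The function names and the test coordinates are illustrative only.

```python
import math
from itertools import combinations


def angle(u, v):
    """Angle in degrees between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def four_angle_sum(centre, ligands):
    """Sum of the four smallest of the six angles subtended at the centre.

    Assumption: 'four angles' is read as the four smallest ones.
    """
    vectors = [tuple(l - c for l, c in zip(lig, centre)) for lig in ligands]
    angles = sorted(angle(u, v) for u, v in combinations(vectors, 2))
    return sum(angles[:4])


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    # Ideal tetrahedron: ligands on alternate corners of a cube.
    tetrahedral = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    # Ideal square-planar arrangement in the xy plane.
    planar = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
    print(round(four_angle_sum(origin, tetrahedral), 2))
    print(round(four_angle_sum(origin, planar), 2))
```

Applied to every hit from a search, a histogram of this single number would show directly how far the known structures stray from ~438° towards 360°.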


As a teaser I also include a plot of six-coordinate carbon, in which the ligands can be any non-metal. Note the clusters at angles of 60, ~112 and ~120-130°. It is worth pointing out that the definition of the connection between a carbon and a ligand as a “bond” becomes increasingly arbitrary as the coordination becomes “hyper”. Because crystallography does not measure electron densities in “bonds”, we know nothing of its topology in this region. It is therefore quite possible that the appearance of the heat plot below might be related just as much to whatever convention is being used in creating the entry in the CSD as it would be to a quantum analysis of the bonding.

Scheme

References

  1. L. Yang, E. Ganz, Z. Chen, Z. Wang, and P.V.R. Schleyer, "Four Decades of the Chemistry of Planar Hypercoordinate Compounds", Angewandte Chemie International Edition, vol. 54, pp. 9468-9501, 2015. https://doi.org/10.1002/anie.201410407

π-Resonance in thioamides: a crystallographic “diff” with amides.

Saturday, September 5th, 2015

The previous post explored the structural features of amides. Here I compare the analysis with that for the closely related thioamides.

Scheme

Here is the torsional analysis around the C-N bond. The “diff” (difference) is that almost all the hits are concentrated into angles of 0° or 180°; the twist about the C-N bond from co-planarity is much less if S is present. This is normally explained in terms of Spπ-Cpπ overlaps being less favourable than Opπ-Cpπ ones owing to the mismatch in the size of the atomic orbital for S and C. Hence the resonance which reduces the C=S double bond character in favour of greater C=N character is enhanced compared to O.

Scheme

A consequence is that the nitrogen atom is less easily deformed from planarity in a thioamide. Notice also that at the hotspot, the C=N distance is ~1.32Å compared to 1.34Å for a regular amide.

Scheme

This emerges from the plot below as well; the range of values for the C-N bond is reduced compared to amides, but the diagonal trend that as the C=N bond gets longer so the C-S gets shorter is still seen.

Scheme

All these trends are described qualitatively in most text books of organic chemistry, but one never sees statistical evidence for them. And it truly only takes 5-10 minutes to produce.

π-Resonance in amides: a crystallographic reality check.

Saturday, September 5th, 2015

The π-resonance in amides famously helped Pauling to his proposal of a helical structure for proteins. Here I explore some geometric properties of amides related to the C-N bond and the torsions about it.

Scheme

The key aspect of amides is that a lone pair of electrons on the nitrogen can conjugate with the C=O carbonyl only if the lone pair orbital is parallel to the C-O π-system. We can define this with the O=C-N-R torsion angle (and equate 0 or 180° with the p-orbitals being parallel). In the above definition, each R can be either 4-coordinate C (to avoid alternative conjugations) or H and the C-N bond is specified as being cyclic. As usual the R-factor is < 5%, no errors, no disorder.

First, the C-N torsion, which adopts values of either 0 or 180°. Notice that whilst the anti R-group shows no more than about 20° deviation from 180°, it does have a small tail tending towards longer C-N distances of >1.4Å. The hotspot is for the syn R-group.  Here there is a strong trend that as the dihedral deviates from 0° the C-N bond very clearly elongates. As the π-π overlap decreases, the bond elongates from the hot spot value of ~1.34Å to 1.41Å at 50°. The greater propensity of the syn-R to twist may be because it incurs more steric hindrance or perhaps because we have defined the C-N bond to be part of a cycle.

Scheme

Next, we plot the C-N distance against the torsion R-N-C-R’, which defines how planar the nitrogen is. A value of 180° is planar and the hot-spot is here. But as the planarity decreases down to almost tetrahedral (110°) the C-N bond elongates to 1.41Å. Notice one rather intriguing aspect: from 180° to 160° or so, there is little response from the C-N bond, but the elongation really accelerates from 140° to 110°. A little twisting hardly affects the π-π overlap, but it really starts to matter for twists of >50°.

Scheme

Finally a plot of the C-N vs the C-O distances. As the C-N increases, the C-O contracts, this being a nice summary of the π resonance in amides. 

Scheme

We have not seen any surprises, but this statistical exploration of crystal structures at least puts some numbers on the changes in bond lengths as a result of conjugative resonance.

A sea-change in science citation? The Wikipedia Science conference.

Thursday, September 3rd, 2015

The first conference devoted to scientific uses of Wikipedia has just finished; there was lots of fascinating stuff but here I concentrate on one report that I thought was especially interesting. To introduce it, I need first to introduce WikiData. This is part of the WikiMedia ecosystem, and one of the newest. The basic concept is really simple.

  1. It is a repository for data objects; 14,757,419 of them as I write this to be precise. These are called items, and each has an ID, prefaced with the letter Q. An example might be mauveine, which is Q421898.
  2. Any item can have one or more properties which can only be selected from a controlled list, of which there are around 3000. An example here might be an individual’s ORCID identifier, which is P496. ORCID is also an item, Q51044.
  3. As with Wikipedia itself, WikiData also has a set of community rules which contributors have to follow. One of the rules in Wikipedia (the COI or conflict of interest rule) deprecates an interested party from making a significant contribution to any page. Items in Wikidata have a looser COI code; facts are basically facts rather than opinions, but their provenance still matters of course.
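To make the item/property structure concrete, here is a minimal Python sketch of how software might read a property value from an item. Wikidata serves each item as JSON at a Special:EntityData URL, with statements grouped by property ID under "claims". To keep the example self-contained no network request is made; instead a tiny hand-written record in the same layout is parsed. The item ID "Q0" and the ORCID value in it are placeholders invented for illustration, not real Wikidata content, and the function names are my own.

```python
def entity_url(item_id):
    """URL from which the JSON form of a Wikidata item can be fetched."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"


def property_values(entity_json, item_id, property_id):
    """Collect the values of one property (e.g. P496) for one item."""
    claims = entity_json["entities"][item_id].get("claims", {})
    values = []
    for statement in claims.get(property_id, []):
        snak = statement["mainsnak"]
        # Statements can also assert 'no value' or 'unknown value'.
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values


# Hypothetical record in the Wikidata JSON layout (placeholder IDs and values).
SAMPLE = {
    "entities": {
        "Q0": {
            "id": "Q0",
            "claims": {
                "P496": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P496",
                            "datavalue": {
                                "value": "0000-0000-0000-0000",
                                "type": "string",
                            },
                        }
                    }
                ]
            },
        }
    }
}

if __name__ == "__main__":
    print(entity_url("Q421898"))  # the mauveine item mentioned above
    print(property_values(SAMPLE, "Q0", "P496"))
```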

With the basic structure set out, I will now describe what I heard today.

  1. An item can be a citation (to the scientific published literature), of which there are around 76 million, although nothing like that number are currently in WikiData.
  2. One of the properties of a citation item is its DOI or digital object identifier (P356), which would nowadays be regarded as in effect mandatory. Citations can have other properties, which would be populated from CrossRef or DataCite, such as metadata associated with e.g. the DOI itself; the journal, the authors, the date, etc. Citations from DataCite can in fact have far richer metadata than usual; if you follow this link you can see an example of such data properties.
  3. But here is the new stuff. Citations as items can have more subtle properties. Thus a citation could be invoked with a property: A disagrees with B, where A and B are both items (or perhaps properties).

You can see from this that allowing a citation to have such properties can potentially revolutionise the way a scientific article can be constructed. When a citation is invoked, the context the authors wish that citation to have can be added. Contrast this with the context-free way in which articles currently cite other articles. And as with anything in WikiData, instances can be counted, the context in which instances occur can be identified and statistics accumulated.
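A toy Python sketch may help to show what "citations with context" and the counting of instances could look like to a machine. It has nothing to do with the actual Wikidata data model or API; the article labels are placeholders, and of the two relations only "disagrees with" comes from the talk, whilst "agrees with" is my own invented counterpart.

```python
from collections import Counter
from typing import NamedTuple


class CitationStatement(NamedTuple):
    """One citation together with the context in which it is invoked."""
    citing: str
    relation: str
    cited: str


# Placeholder statements; the labels stand in for citation items.
STATEMENTS = [
    CitationStatement("article-A", "disagrees with", "article-B"),
    CitationStatement("article-C", "agrees with", "article-B"),
    CitationStatement("article-D", "disagrees with", "article-B"),
    CitationStatement("article-D", "agrees with", "article-A"),
]


def relation_counts(statements, cited):
    """Count how often each kind of context is attached to one cited item."""
    return Counter(s.relation for s in statements if s.cited == cited)


if __name__ == "__main__":
    for relation, count in relation_counts(STATEMENTS, "article-B").items():
        print(relation, count)
```

A conventional reference list would record only that article-B had been cited three times; the typed version also records that two of those three citations were disagreements.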

The way it might work is not so much that any interested reader (a human) would browse through WikiData. Instead it is something that a machine (software) might invoke. In Wikipedia for example, one can transclude or subsume into the article an item from Wikidata. This could be a citation, which you could transclude with one or more associated properties. In chemistry at the moment, the most prominent objects that are constructed from such Wikidata transclusions are ChemBoxes, or tables of properties of molecules as items (Q52426). This is done dynamically at the time of reading the Wikipedia article and so you can imagine that such transclusions can respond as the values of properties are updated/corrected/extended. Unfortunately I do not (yet) know of a good example of all of this which can be linked to here. If any do come to light, I will try to remember to add them here.

As often happens, the concepts above are not entirely new; many were already present in a variation of the Wiki called the Semantic MediaWiki and experiments in chemistry were tried as early as 2007.‡ But WikiData is far easier to use and in symbiosis with a conventional Wiki it might just start to fly now.

The implications of all of this for the way in which a scientific article might work are deep from many different perspectives. I do wonder whether all this data-rich context in which a scientific article or narrative might be couched will be welcomed by either publishers or indeed authors. Perhaps the emotions that humans have but which machines do not will in fact dominate. But it does appear to have the potential for a sea-change in how scientists exchange information.


The number of itemised molecules recently reached 100 million, and there are a few thousand (>? <?) well defined properties that can be associated with molecules. So the whole of known molecular chemistry is actually not that different in scale from the current Wikidata.

Semantic wiki as a model for an intelligent chemistry journal, Rzepa, Henry S. Abstracts of Papers, 233rd ACS National Meeting, Chicago, IL, United States, March 25-29, 2007, CINF-053. Abstract and talk.

A visualization of the anomeric effect from crystal structures.

Thursday, August 27th, 2015

The anomeric effect is best known in sugars, occurring in sub-structures such as RO-C-OR. Its origins relate to how the lone pairs on each oxygen atom align with the adjacent C-O bonds. When the alignment is 180°, one oxygen lone pair can donate into the C-O σ* empty orbital and a stabilisation occurs. Here I explore whether crystal structures reflect this effect.

Scheme

The torsion angles along each O-C bond are specified, along with the two C-O distances. All the bonds are declared acyclic, and the usual R < 5%, no disorder and no errors specified.

  1. You can see from the plot below that the hotspot occurs when both RO-CO torsions are ~65°. From this we will assume that the two (unseen) lone pairs at any one of the oxygens are distributed approximately tetrahedrally around each oxygen, and if this is true then one of them must by definition be oriented at ~180° with respect to the same RO-CO bond (the other is therefore oriented at about -60°). This allows it to be antiperiplanar to the adjacent C-O bond and hence interact with its σ* empty orbital. So the hotspot corresponds to structures where BOTH oxygen atoms have lone pairs which interact with the adjacent O-C antibonding (σ*) orbital.
  2. There is a tiny cluster for which both RO-CO torsions are ~180° and hence neither oxygen has an antiperiplanar lone pair.
  3. Only slightly larger are clusters where one torsion is ~65° and the other ~180°, meaning that only one oxygen has an antiperiplanar lone pair.
  4. A plot of the two C-O lengths indeed shows an overall hotspot at ~1.40Å for both distances. If the search is filtered to include only torsions in the range 150-180°, the hotspot value increases to 1.415Å for both. If one torsion is restricted to 40-80° and the other to 150-180° the hotspot shows one C-O bond is about 0.012Å shorter than the other.
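The reasoning in point 1 about where the unseen lone pairs sit can be written out as a few lines of Python. This is only the idealised geometric argument, assuming that the two lone pairs and the R group are spaced exactly 120° apart when viewed down the O-C bond; the 30° tolerance used to decide what counts as antiperiplanar is my own arbitrary choice, as are the function names.

```python
def wrap(angle):
    """Map an angle in degrees onto the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def lone_pair_torsions(r_torsion):
    """Idealised lone-pair torsions about O-C, given the R-O-C-O torsion.

    Assumes the lone pairs sit exactly +120 and -120 degrees from R.
    """
    return (wrap(r_torsion + 120.0), wrap(r_torsion - 120.0))


def has_antiperiplanar_lone_pair(r_torsion, tolerance=30.0):
    """True if either lone pair lies within `tolerance` of 180 degrees."""
    return any(
        abs(lp) >= 180.0 - tolerance for lp in lone_pair_torsions(r_torsion)
    )


if __name__ == "__main__":
    for torsion in (65.0, 180.0):
        print(
            torsion,
            lone_pair_torsions(torsion),
            has_antiperiplanar_lone_pair(torsion),
        )
```

For the hotspot torsion of 65° the two lone pairs come out at -175° and -55°, matching the ~180° and -60° quoted in point 1, whereas for a torsion of 180° they sit at ±60° and neither is antiperiplanar, as in point 2.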

Scheme

Scheme

I also include a further constraint, that the diffraction data must be collected below 140K. The hotspot moves to ~ 55/60° indicating values free of some vibrational noise.

Scheme

Interestingly, replacing  oxygen with  nitrogen reveals relatively few examples of the effect (C(NR2)4 is an exception). Replacing  O by divalent S produces only 13 hits, with the surprising result (below) that in all of them only one S sets up an anomeric interaction. Arguably, the number of examples is too low to draw any firm conclusions from this observation.

Scheme


Most diffractometers measure low angle scattering of X-rays by high density electrons. These are the core electrons associated with a nucleus rather than the valence electrons associated with lone pairs. Hence very few positions of valence lone pairs have ever been crystallographically measured.

Mesomeric resonance in substituted benzenes: a crystallographic reality check.

Wednesday, August 26th, 2015

Previously, I showed how conjugation in dienes and diaryls can be visualised by inspecting bond lengths as a function of torsions. Here is another illustration, this time of the mesomeric resonance on a benzene ring induced by an electron donating substituent (an amino group) or an electron withdrawing substituent (cyano).

Scheme

In both cases, you can see this resonance showing as a lengthening of the C(ipso)-C(ortho) and C(meta)-C(para) bonds, and a contracting of the C(ortho)-C(meta) bonds. Is this reflected in the measured structures? The usual search is applied (R < 5%, no disorder, no errors) and qualified with the following:

  1. The amino has three bonds, and can bear either H, or 4-bonded carbon only.
  2. R on the ring can be either H or C.
  3. Three distances are defined.

Scheme

The results of a search are shown below; the hotspot shows the C-C(ortho) distance is close to 1.40Å, whilst the corresponding value for C(ortho)-C(meta) is 1.38Å, a contraction of ~0.02Å. The contraction is smaller for phenols (~0.01Å).

Scheme

The C(ortho)-C(meta) vs C(meta)-C(para) amino plot shows a cluster of hotspots for which the former (1.38Å) is  shorter than the latter (~1.39Å) but the effect is less clear cut as the distance from the substituent increases.

Scheme

For an electron withdrawing cyano substituent, C(ipso)-C(ortho) at 1.395Å is longer than C(ortho)-C(meta) at 1.385Å, although the difference seems smaller than for the amino substituent. The C(ortho)-C(meta) to C(meta)-C(para) comparison is similar.

Scheme

Scheme

These searches take but a few minutes to perform, and do serve as a reality check on the oft-seen mesomeric π-resonance shown in all organic text books.