Posts Tagged ‘Oscar’

Examples please of FAIR (data); good and bad.

Sunday, May 6th, 2018

The site fairsharing.org is a repository of information about FAIR (Findable, Accessible, Interoperable and Reusable) objects such as research data.

A project to inject chemical components into the above site, where they are rather sparse at the moment, is being promoted by workshops under the auspices of, e.g., IUPAC, CODATA and the GO-FAIR initiative. One aspect of this activity is to help identify examples of both good (FAIR) and indeed less good (unFAIR) research data associated with contemporary scientific journal publications.

Here is one example I came across in 2017.[1] The data associated with this article is certainly copious, 907 pages of it, not including data for 21 crystal structures! The latter is a good example of FAIR, being offered in a standard format (CIF) well adapted to the type of data contained therein, and for which there are numerous programs capable of visualising and inter-operating with (i.e. re-using) it. The former is in PDF, a format not originally developed for data, and one could argue it is closer to the unFAIR end of the spectrum. More so when you consider that this one 907-page paginated document contains diverse information, including spectra, on around 60 molecules. The spectra are all purely visual; they are obviously data, but in a form largely designed for human consumption rather than re-use by software. The text-based content of this PDF does have numerous patterns, which lends itself to pattern-recognition software such as OSCAR, but patterns are easily broken by errors or inexperience, so we cannot be certain what proportion of it can be recovered. The metadata associated with such a collection, if there is any at all, must be general and cannot easily be related to specific molecules in the collection. So I would argue that 907 pages of data wrapped in PDF is not a good example of FAIR. But it is how almost all of the data currently reported in chemistry journals is expressed. Indeed, many a journal data editor (a relatively new addition to editorial teams) exercises rigorous oversight of the data presented with article submissions to ensure it adheres to this monolithic PDF format.
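To illustrate how fragile such text patterns are, here is a minimal Python sketch. It is not how OSCAR works (OSCAR is a separate chemistry text-mining tool with its own grammar); the regular expression, the function `parse_peaks` and the sample strings are all hypothetical, assuming ¹H NMR peaks typeset as `7.26 (d, J = 8.4 Hz, 2H)`. A decimal comma and a stray space are enough to make every peak disappear.

```python
import re

# Hypothetical pattern for one 1H NMR peak as often typeset in SI text,
# e.g. "7.26 (d, J = 8.4 Hz, 2H)". An illustrative assumption, not OSCAR's grammar.
PEAK = re.compile(
    r"(?P<shift>\d+\.\d+)\s*"                   # chemical shift in ppm
    r"\((?P<mult>[a-z]+)"                       # multiplicity: s, d, t, dd, m ...
    r"(?:,\s*J\s*=\s*(?P<j>\d+\.\d+)\s*Hz)?"    # optional coupling constant
    r",\s*(?P<nh>\d+)H\)"                       # integration, e.g. "2H)"
)


def parse_peaks(text):
    """Return (shift, multiplicity, J or None, nH) for every peak the pattern matches."""
    peaks = []
    for m in PEAK.finditer(text):
        j = float(m.group("j")) if m.group("j") else None
        peaks.append((float(m.group("shift")), m.group("mult"), j, int(m.group("nh"))))
    return peaks


clean = "1H NMR (400 MHz, CDCl3) δ 7.26 (d, J = 8.4 Hz, 2H), 3.81 (s, 3H)"
# Same hypothetical data with two small slips: a decimal comma and "3 H" instead of "3H".
broken = "1H NMR (400 MHz, CDCl3) δ 7,26 (d, J = 8.4 Hz, 2H), 3.81 (s, 3 H)"

print(parse_peaks(clean))
print(parse_peaks(broken))
```

A more tolerant pattern could absorb these particular slips, but every variant has to be anticipated in advance, which is exactly the weakness of recovering data from PDF text.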

You can also visit this article in Chemistry World (rsc.li/2HG7lTk) for an alternative view of what could be regarded as rather more FAIR data. The article cites its FAIR components, which are not published as part of the article or indeed by the journal itself, but are held separately in a research data repository. You will find them at doi: 10.14469/hpc/3657, where examples of computational, crystallographic and spectroscopic data are available.

The workshop I allude to above will be held in July. May I ask anyone reading this blog who has a favourite FAIR, or indeed unFAIR, example of data they have come across to share it here? We also need to identify areas simply crying out for FAIRer data to be made available as part of the publishing process, beyond the types noted above. I hope to report back on both this feedback and the events at the workshop in due course.

References

  1. J.M. Lopchuk, K. Fjelbye, Y. Kawamata, L.R. Malins, C. Pan, R. Gianatassio, J. Wang, L. Prieto, J. Bradow, T.A. Brandt, M.R. Collins, J. Elleraas, J. Ewanicki, W. Farrell, O.O. Fadeyi, G.M. Gallego, J.J. Mousseau, R. Oliver, N.W. Sach, J.K. Smith, J.E. Spangler, H. Zhu, J. Zhu, and P.S. Baran, "Strain-Release Heteroatom Functionalization: Development, Scope, and Stereospecificity", Journal of the American Chemical Society, vol. 139, pp. 3209-3226, 2017. https://doi.org/10.1021/jacs.6b13229

The importance of being complete.

Monday, September 26th, 2011

To (mis)quote Oscar Wilde again, “To lose one methyl group may be regarded as a misfortune; to lose both looks like carelessness.” Here, I refer to the (past) tendency of molecular modellers to simplify molecular structures. Thus in 1977, quantum molecular modelling, even at the semi-empirical level, was beset by lost groups. One of my early efforts (DOI: 10.1021/ja00465a005) was selected for study because it had nothing left to lose: the mass spectrometric fragmentation of the radical cations of methane and ethane. Methyl, phenyl and other “large” groups were routinely replaced by hydrogen to make such calculations feasible. Cations indeed were always of interest to modellers; their relative lack of electrons almost always meant unusual or interesting structures and reactions (including this controversial species, DOI: 10.1021/ja00444a012). Inured to such functional loss, we modellers forgot that, outside a mass spectrometer, cations have to have a counter-anion. Here I explore one example of the model being complete(d).

The ion-pair complex of cyclobutadiene.

In the earlier post on this topic I had explored the possibility of a new isomer of cyclobutadiene, induced by the presence nearby of a strong acid in the form of a guanidinium cation. You might note there was no mention of any counter-ion! Well, here I add one to complete the model, using perchlorate. In a sense I was following my own advice on Steve Bachrach’s blog, where the NMR spectrum of the adamantyl cation was discussed; I had argued there that the anion (I chose SnCl5⁻) might actually have an effect on the NMR. For the cyclobutadiene complex without a counter-ion, this non-planar form of cyclobutadiene was calculated earlier to be ~8.5 kcal/mol higher in free energy than the conventional rectangular geometry. Add the perchlorate as above, and this energy difference drops to 4.1 kcal/mol (modelled with water as the solvent). So the counter-ion CAN make a difference!
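To put the two free-energy differences in perspective, a simple two-state Boltzmann estimate may help. This is a sketch assuming the equilibrium ratio of the non-planar to the rectangular form is exp(−ΔG/RT) at 298.15 K, and that the two values (obtained for different models, with and without the anion) can be compared directly; neither assumption comes from the calculations themselves. The function name `population_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K)


def population_ratio(delta_g_kcal, temperature_k=298.15):
    """Two-state estimate of [higher-energy form]/[lower-energy form] = exp(-dG/RT)."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


without_anion = population_ratio(8.5)  # ~8.5 kcal/mol, no counter-ion
with_anion = population_ratio(4.1)     # 4.1 kcal/mol, with perchlorate (water as solvent)

print(f"without anion: {without_anion:.1e}")
print(f"with anion:    {with_anion:.1e}")
print(f"enhancement:   x{with_anion / without_anion:.0f}")
```

Under these assumptions the proportion of the non-planar form rises by a factor of exp(4.4/RT), on the order of a thousand, although it remains a minor component of the equilibrium.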

What are the implications for a modeller of adding counter-ions? Well, when you start doing such calculations, you find that the practical matter of optimising the geometry is not quite as straightforward as it is for what I would call covalently bonded systems. The latter have fairly predictable geometries, and these geometries are fairly rigid. Ion-pairs, on the other hand, are less predictable. Note, for example, in the above diagram that the perchlorate counter-ion sits to one side of the molecule, and not symmetrically. The potential energy surface can be very flat indeed, which means that locating the optimal geometry can be quite a struggle. And unlike a covalent structure, where there is little further ambiguity once the location of the covalent bonds is decided, ion-pairs may adopt many different relative orientations. Thus the one above may not be unique!
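The multiple-orientation problem can be illustrated with a deliberately crude sketch. Everything here is a hypothetical toy, not the quantum-chemical optimisation used for the perchlorate complex: the "cation" is four +0.25 point charges on a square, the "anion" is a −1 point charge held 3 Å from the cation centre, the energy is bare Coulomb, and a greedy random-perturbation descent stands in for a real geometry optimiser. Starting from many random orientations (a multi-start search) and keeping the distinct end points exposes several minima; even this symmetric toy has four equivalent ones.

```python
import math
import random

COULOMB = 332.06  # kcal/mol * Angstrom / e^2

# Hypothetical toy cation: +1 charge spread over four sites on a square (Angstrom).
CATION = [((1.0, 1.0, 0.0), 0.25), ((1.0, -1.0, 0.0), 0.25),
          ((-1.0, 1.0, 0.0), 0.25), ((-1.0, -1.0, 0.0), 0.25)]
RADIUS = 3.0  # assumption: anion kept at a fixed distance from the cation centre


def unit(v):
    """Normalise a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def energy(direction):
    """Coulomb energy (kcal/mol) of a -1 point charge at RADIUS along `direction`."""
    pos = tuple(RADIUS * c for c in direction)
    return sum(-COULOMB * q / math.dist(pos, site) for site, q in CATION)


def local_search(start, rng, steps=3000):
    """Greedy descent on the sphere: accept random perturbations that lower the energy."""
    best, e_best = start, energy(start)
    step = 0.5
    for _ in range(steps):
        trial = unit(tuple(c + rng.gauss(0.0, step) for c in best))
        e_trial = energy(trial)
        if e_trial < e_best:
            best, e_best = trial, e_trial
        step = max(0.005, step * 0.999)  # shrink the step to refine the minimum
    return best, e_best


def multi_start(n_starts=20, seed=1, tol=0.2):
    """Run local searches from random orientations; keep end points more than tol apart."""
    rng = random.Random(seed)
    minima = []
    for _ in range(n_starts):
        start = unit(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))
        d, e = local_search(start, rng)
        if all(math.dist(d, other) > tol for other, _ in minima):
            minima.append((d, e))
    return sorted(minima, key=lambda m: m[1])


for direction, e in multi_start():
    print(f"direction {tuple(round(c, 2) for c in direction)}  E = {e:.1f} kcal/mol")
```

The same idea, restarting from several counter-ion placements and comparing the resulting energies, is one way to guard against trusting the first minimum found; for a real ion pair the minima need not be equivalent at all.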

But the last word of this post should be: do not forget counter-ions if you are looking at ionic species, and always strive to be complete!