Archive for the ‘Chemical IT’ Category

Raw data: the evolution of FAIR data and crystallography.

Tuesday, March 1st, 2022

Scientific data in chemistry has come a long way in the last few decades. Originally entangled in scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s. The next phase was the introduction of data repositories in the early noughties. Now associated with innovative commercial companies such as Figshare, and later the non-commercial Zenodo, such repositories are also gradually spreading in institutional form, as in the earlier SPECTRa project of 2006[1], and are still evolving.[2] Perhaps the best known, and certainly the oldest, example of curated data in chemistry is the CSD (Cambridge Structural Database) of the CCDC (Cambridge Crystallographic Data Centre), which has been operating for more than 55 years now. Curation here is the important context, since there you will find crystal diffraction data which has been refined into a structural model, first by the authors reporting the structure and then by the CSD, which amongst other operations validates the associated data using a utility called CheckCIF.[3] What perhaps is not realised by most users of this data source is that the original or “raw” data, as obtained from an X-ray diffractometer and from which the CSD data is derived, is not actually available from the CSD. This primary form of crystallographic data is the topic of this post.

(more…)

References

  1. J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, "SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories", Journal of Chemical Information and Modeling, vol. 48, pp. 1571-1581, 2008. http://dx.doi.org/10.1021/ci7004737
  2. M.J. Harvey, A. McLean, and H.S. Rzepa, "A metadata-driven approach to data repository design", Journal of Cheminformatics, vol. 9, 2017. http://dx.doi.org/10.1186/s13321-017-0190-6
  3. A.L. Spek, "Structure validation in chemical crystallography", Acta Crystallographica Section D Biological Crystallography, vol. 65, pp. 148-155, 2009. http://dx.doi.org/10.1107/s090744490804362x

Data base or Data repository? – A brief and very selective history of data management in chemistry.

Wednesday, January 26th, 2022

Way back in the late 1980s or so, research groups in chemistry started to replace the filing of their paper-based research data by storing it in an easily retrievable digital form. This required a computer database and initially these were accessible only on specific dedicated computers in the laboratory. From the 1990s onwards these gradually became accessible online, so that more than one person could use them in different locations. At least where I worked, the infrastructures to set up such databases were mostly not then available as part of the standard research provisions and so had to be installed and maintained by the group itself. The database software took many different forms and it was not uncommon for each group in a department to come up with a different solution that suited its needs best. The result was a proliferation of largely non-interoperable solutions which did not communicate with each other. Each database had to be searched locally and there could be ten or more such resources in a department. The knowledge of how the system operated also often resided in just one person, and tended to evaporate when this guru left the group.

(more…)

Quantum chemistry interoperability (library): another step towards FAIR data.

Saturday, January 1st, 2022

To be FAIR, data has to be not only Findable and Accessible, but straightforwardly Interoperable. One of the best examples of interoperability in chemistry comes from the domain of quantum chemistry. This strives to describe a molecule by its electron density distribution, from which many interesting properties can then be computed. The process is split into two parts:

(more…)

A comparison of searches based on metadata records from three (update: five) research repositories.

Tuesday, September 28th, 2021

In the previous blog post, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by comparing searches of the DataCite metadata store for three particular values of the metadata associated with this dataset.

(more…)

A comparison of descriptive metadata across different data repositories.

Tuesday, September 28th, 2021

The number of repositories which accept research data across a wide spectrum of disciplines is on the up. Here I report the results of an experiment in which chemical modelling data was deposited in three such repositories and the richness of the metadata describing the essential properties of the three depositions was compared.

(more…)

HPC Access and Metadata Portal (CHAMP).

Monday, September 13th, 2021

You might have noticed, if you have read any of my posts here, that many of them have been accompanied since 2006 by supporting calculations, normally based on density functional theory (DFT), and that these calculations are accompanied by a persistent identifier pointing to a data repository publication. I have hitherto not gone into the detail here of the infrastructures required to do this sort of thing, but recently one of the two components has been updated to V2, after being at V1 for some fourteen years[1], and this provides a timely opportunity to describe the system a little more.

(more…)

References

  1. M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014. http://dx.doi.org/10.1021/ci500302p

Octopus publishing: dis-assembling the research article into eight components.

Friday, August 13th, 2021

In 2011, I suggested that the standard monolith that is the conventional scientific article could be broken down into two separate, but interlinked components, being the story or narrative of the article and the data on which the story is based. Later, in 2018, the bibliography in the form of open citations was added as a distinct third component.[1] Here I discuss an approach that has taken this even further, breaking the article down into as many as eight components and described as “Octopus publishing” for obvious reasons. These are:

(more…)

References

  1. D. Shotton, "Funders should mandate open citations", Nature, vol. 553, pp. 129-129, 2018. http://dx.doi.org/10.1038/d41586-018-00104-7

Room-temperature superconductivity in a carbonaceous sulfur hydride!

Saturday, October 17th, 2020

The title of this post indicates the exciting prospect that a method of producing a room-temperature superconductor has finally been achieved.[1] This is only possible at enormous pressures, however: above 267 gigapascals (GPa), or roughly 2.6 million atmospheres.
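
To get a feel for the scale involved, the conversion from gigapascals to atmospheres can be sketched in a few lines of Python. This is only an illustration: the function name is my own choice, and the single physical input is the standard atmosphere, which is defined as exactly 101,325 Pa. The result is in line with the figure of roughly 2.6 million atmospheres quoted above.

```python
# Standard atmosphere in pascals (exact by definition).
PASCALS_PER_ATMOSPHERE = 101_325


def gigapascals_to_atmospheres(pressure_gpa: float) -> float:
    """Convert a pressure in gigapascals to standard atmospheres."""
    pressure_pa = pressure_gpa * 1e9  # 1 GPa = 10^9 Pa
    return pressure_pa / PASCALS_PER_ATMOSPHERE


if __name__ == "__main__":
    # 267 GPa is the threshold pressure mentioned in the post.
    print(f"{gigapascals_to_atmospheres(267):,.0f} atm")
```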

(more…)

References

  1. E. Snider, N. Dasenbrock-Gammon, R. McBride, M. Debessai, H. Vindana, K. Vencatasamy, K.V. Lawler, A. Salamat, and R.P. Dias, "RETRACTED ARTICLE: Room-temperature superconductivity in a carbonaceous sulfur hydride", Nature, vol. 586, pp. 373-377, 2020. http://dx.doi.org/10.1038/s41586-020-2801-z

Exploiting the power of persistent identifiers (PIDs) for locating all kinds of research object.

Saturday, August 29th, 2020

The folks at DataCite have announced a new research object discovery service which aims to give users a “comprehensive overview of connections between entities in the research landscape”. The portal https://commons.datacite.org acts as the entry point for three basic types of persistent identifiers (PIDs):

(more…)

A cascading tutorial in finding rich NMR data using the DataCite datasearch engine.

Saturday, April 11th, 2020

In the previous post, I introduced three of a new generation of search engines specialising in the discovery of data. Data has some special features which make its properties slightly different from the conceptual (or natural language) searches we are used to performing for general information and so a search engine specifically for data is invariably going to reflect this. At the simplest level, the data search can retain much of the generic simplicity of a regular search, but to exploit the unique features of data, one really does have to move on to an advanced mode. Here, by introducing a set of search definitions that gradually increase in specificity and power, I hope to convey some of the flavour of one way in which this could be done.
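
As a flavour of what such a cascade might look like, here is a minimal sketch that builds three progressively more specific search URLs for the public DataCite REST API. It is not the set of search definitions used in the tutorial itself. The helper name `build_search_url` is hypothetical, and the field names used in the queries (`titles.title`, `types.resourceTypeGeneral`) are my assumptions about the metadata fields one might target. The code only constructs the URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Public DataCite REST API endpoint for DOI metadata records.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Return a DataCite search URL for the given query string (hypothetical helper)."""
    params = {"query": query, "page[size]": page_size}
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


# Each query narrows the previous one; the field names are assumptions.
cascade = [
    "NMR",                                                     # free-text search
    "titles.title:NMR",                                        # restrict to the title field
    "titles.title:NMR AND types.resourceTypeGeneral:Dataset",  # datasets only
]

if __name__ == "__main__":
    for query in cascade:
        print(build_search_url(query))
```

Pasting any of the printed URLs into a browser should return a JSON list of matching records, which makes it easy to see how each added term changes the results.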

(more…)