Posts Tagged ‘Quotation’
Saturday, February 3rd, 2018
The topic of open citations was presented at the PIDapalooza conference and represents a third component in the increasing corpus of open scientific information.
David Shotton gave us an update on Citations as First Class data objects – Citation Identifiers and introduced me to the blog where he discusses this topic. The list of citations, or bibliography, has long been regarded as an essential, and until recently inseparable, component at the end of a scientific article. It is also a component easily susceptible to “game play”. Authors can be tempted to cite themselves, possibly to excess, and, perhaps worse, to cite their friends and colleagues for reasons other than purely scientific ones. There are other issues. Thus, to infer the context of any particular citation, one has to read the text where it is cited, and this too can be subjected to game play. One may have to “read between the lines” to judge whether a work is being cited favourably, as support for the case being made, or instead to indicate disagreement with the cited authors. An article that is cited because one disagrees with its conclusions may nevertheless go on to contribute to the cited author’s “h-index” of esteem. So there are various aspects of citations that deserve improvement, or certainly development and evolution.
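The h-index is simple to compute, and once citations become data in their own right, with a type attached, one can ask how much a given index owes to citations that actually disagree with the cited work. What follows is a minimal sketch under stated assumptions: the citation records, article labels and relation names (loosely modelled on the kinds of citation type found in vocabularies such as CiTO) are all hypothetical, and nothing here reflects how any real index is calculated.

```python
from collections import Counter

# Hypothetical relation labels treated as "disagreeing"; loosely modelled on
# CiTO-style citation types, but chosen here purely for illustration.
DISAGREEING = frozenset({"disputes", "refutes", "critiques"})


def h_index(counts):
    """Largest h such that at least h articles have h or more citations each."""
    h = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def citation_counts(citations, authored, exclude=frozenset()):
    """Incoming citations per authored article, ignoring relation types in `exclude`.

    `citations` is an iterable of (citing, relation, cited) triples.
    """
    tally = Counter(
        cited
        for _citing, relation, cited in citations
        if cited in authored and relation not in exclude
    )
    return [tally[article] for article in authored]


if __name__ == "__main__":
    authored = ["A1", "A2", "A3"]  # hypothetical articles by one author
    citations = [
        ("X1", "cites", "A1"), ("X2", "supports", "A1"), ("X3", "disputes", "A1"),
        ("X1", "cites", "A2"), ("X4", "refutes", "A2"),
        ("X5", "cites", "A3"),
    ]
    print(h_index(citation_counts(citations, authored)))               # 2
    print(h_index(citation_counts(citations, authored, DISAGREEING)))  # 1
```

In this toy case, dropping the two disagreeing citations lowers the index from 2 to 1, which is exactly the kind of distinction that an untyped citation count cannot make.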
Shotton told us that many publishers are now releasing article citations as open (CC0) data in their own right, as urged by the Initiative for Open Citations. A corpus of some 13 million of these is now available as RDF triples with a SPARQL end-point, which means that semantic searches of the corpus can be undertaken. So what are the benefits? The aspirations are worthy: to explore connections between knowledge fields, and to follow the evolution of ideas and scholarly disciplines (similar in fact to the new Dimensions product I discussed in the previous post). When I probed the various sites linked above, I had in mind to identify some clear scientific outcomes of making citations available in this manner, perchance even in the field of chemistry. When I succeed I will follow up on this post, but at the moment I am not yet in a position to illustrate these benefits with chemical stories. If anyone reading this post has such examples, please let us know!
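To give a flavour of what a semantic search of such a corpus involves, here is a minimal sketch that builds a SPARQL query counting the works that cite a given work, and encodes it as the `query` parameter of an HTTP GET, as the SPARQL 1.1 Protocol describes. Several assumptions apply: the endpoint URL and the work IRI are placeholders, and the query assumes citing and cited works are linked directly by the CiTO property `cito:cites`. The real corpus's data model may need a longer graph pattern.

```python
from urllib.parse import urlencode

# Placeholder: substitute the corpus's actual SPARQL end-point.
ENDPOINT = "https://example.org/sparql"
CITO = "http://purl.org/spar/cito/"


def incoming_citations_query(cited_iri):
    """SPARQL SELECT counting distinct works that cite `cited_iri` via cito:cites.

    Assumes citing and cited works are linked directly by cito:cites.
    """
    return (
        f"PREFIX cito: <{CITO}>\n"
        "SELECT (COUNT(DISTINCT ?citing) AS ?n) WHERE {\n"
        f"  ?citing cito:cites <{cited_iri}> .\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    """URL for an HTTP GET carrying the query in the `query` parameter
    (the GET binding of the SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = incoming_citations_query("https://example.org/work/1")  # hypothetical IRI
    print(q)
    print(sparql_get_url(ENDPOINT, q))
```

Swapping `cito:cites` for a more specific citation type, where a corpus records one, is what would let such a query separate supportive from critical citations.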
I will conclude here by noting that there is much discussion at universities about the future of the scientific article itself: whether it should increasingly be mandated as GOLD Open Access (made so by payment of an article processing charge, or APC, by its authors), or whether journals should retain hybrid publishing models, in which only a proportion of articles are GOLD and access to the remainder is paid for by subscription fees licensing the non-GOLD articles in the journal. Meanwhile, in what sometimes seems to be a separate conversation, the article itself is being disassembled into components such as open and/or FAIR data, open citations, infographics, social media and yes, even blogs. Are these two evolutions headed in different directions? Certainly, I think the future is not what it used to be!
Tags:Academic publishing, Applied linguistics, article processing charge, British National Corpus, chemical stories, cited author, Corpus linguistics, David Shotton, Entertainment/Culture, Linguistics, Open access, Quotation, RDF, social media, Texas A&M–Corpus Christi Islanders women's basketball
Posted in Chemical IT
Tuesday, January 23rd, 2018
Another occasional conference report (day 1). So why is a conference about “persistent identifiers” important, particularly to the chemistry domain?
The PID most familiar to chemists is the DOI (digital object identifier). In fact there are many PIDs; some 60 types have been collected by ORCID (themselves purveyors of researcher identifiers). They sometimes even go by different names; in the life sciences they tend to be known instead as accession numbers. One theme common to many (probably not all) is that they act as sources of metadata about the object being identified. This further information allows you (or a machine) to decide whether acquiring the full object is worthwhile. So, in no particular order, here are some of the things I learnt today.
- Mark Hahnel noted the recent launch of the Dimensions resource, which links research data with other research activities; I have not yet had a chance to learn its capabilities, but it seems an interesting alternative to other stalwarts such as Google Scholar.
You can try this example, which retrieves articles in which the data repository with prefix 10.6084 (Figshare) is cited: https://app.dimensions.ai/discover/publication?search_text=10.6084&search_type=kws&full_search=true
Try also the prefix 10.14469, which belongs to the Imperial College repository.
- Andy Mabbett talked about the deployment and use of persistent identifiers (the Q numbers) in Wikidata, which increasingly underpin the various flavours of Wikipedia. He also noted Wikidata’s use of some 50 different identifiers.
- Johanna McEntyre noted some 5M published articles in the life sciences which reference 1M+ ORCID identifiers, making this easily the domain with the fastest uptake of this type of identifier. Also noted was the new FREYA project, which aims to connect open identifiers for discovery, access and use of research resources.
- Tom Gillespie talked about RRIDs, or Research Resource Identifiers. These include hardware such as instruments, with around 6000 RRIDs systematized so far. The argument is that this area promotes both the A and the I of FAIR (accessible and interoperable). Of course, A and I mean many things to many people.
- Several other presentations discussed the finer detail of metadata, such as its sub-classification into descriptive, administrative and technical metadata, but I rather missed demos showing how search queries over such fine-grained metadata could be constructed.
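Some of the identifier points above can be made concrete in a few lines. This minimal sketch, assuming only the Python standard library, does three things: it splits a DOI into prefix and suffix and maps the two prefixes mentioned above to their repositories; it builds (without sending) a doi.org request asking for machine-readable CSL-JSON metadata rather than the landing page, using DOI content negotiation, which to my understanding Crossref and DataCite DOIs support; and it checks an ORCID iD's final character using the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical, and the sample iD is the test record used in ORCID's own documentation.

```python
from urllib.parse import quote
from urllib.request import Request

# Prefixes named in this report; anything else is reported as "unknown".
KNOWN_PREFIXES = {"10.6084": "Figshare", "10.14469": "Imperial College repository"}
CSL_JSON = "application/vnd.citationstyles.csl+json"


def split_doi(doi):
    """Split e.g. 'doi:10.14469/hpc/1975' into ('10.14469', 'hpc/1975')."""
    doi = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def repository_for(doi):
    """Name of the repository owning the DOI's prefix, if it is one we know."""
    return KNOWN_PREFIXES.get(split_doi(doi)[0], "unknown")


def metadata_request(doi):
    """Build (but do not send) a doi.org request for CSL-JSON metadata."""
    prefix, suffix = split_doi(doi)
    url = "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")
    return Request(url, headers={"Accept": CSL_JSON})


def orcid_is_valid(orcid):
    """Check the final character of an ORCID iD (ISO 7064 MOD 11-2 check digit)."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    check = (12 - total % 11) % 11
    return chars[15] == ("X" if check == 10 else str(check))


if __name__ == "__main__":
    print(repository_for("10.14469/hpc/1975"))    # Imperial College repository
    req = metadata_request("10.1351/goldbook.C01036")
    print(req.full_url, req.get_header("Accept"))
    print(orcid_is_valid("0000-0002-1825-0097"))  # True
```

Passing the request to `urllib.request.urlopen` would actually fetch the metadata; that step needs network access and is left out here.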
Apart from the presentations themselves, PIDapalooza is unusual for some other activities. Thus you could get your PIDnails done, with a selection of 8 or so tasteful logos to choose from. There will be tattoos tomorrow (this is a conference for younger people, after all). I may grab a photo or two to provide evidence!
Tags:Academic publishing, Andy Mabbett, Digital Object Identifier, Identifiers, Imperial College, Index, Information science, Johanna McEntyre, Knowledge, Mark Hahnel, ORCiD, Persistent identifier, Publishing, Quotation, researcher, Scholarly communication, SciCrunch, search engines, Technical communication, Technology/Internet, Tom Gillespie
Posted in Chemical IT
Tuesday, August 29th, 2017
Another selection (based on my interests, I have to repeat) from WATOC 2017 in Munich.
- Odile Eisenstein gave a talk about predicted 13C chemical shifts in transition metal (and often transient) complexes, focusing on metallacyclobutanes. These calculations include full spin-orbit/relativistic corrections, essential when the carbon is attached to an even slightly relativistic element. She noted that the 13C shifts of the carbons attached to the metal fall into two camps, those with δ ~+80 ppm and those with values around −8 ppm. These two groups are associated with quite different reactivities, and also seem to cluster according to the planarity or non-planarity of the 4-membered ring. There followed some very nice orbital explanations, including discussion of the anisotropy of the solid-state spectra, which I cannot reproduce here because my note-taking was incomplete. A fascinating story, to which I add here one minor aspect. Here is a plot of the geometries of the 52 metallacyclobutanes found in the Cambridge Structural Database. The 4-ring can be twisted by up to 60° around either of the C-C bonds in the ring, and rather less about the M-C bonds. There is a clear cluster (red spot) for entirely flat rings, and perhaps another at around 20° for bent ones, but interestingly the distribution also forms something of a continuum. What is needed is to correlate these geometries with the observed 13C chemical shifts, to see whether the two sets of clusters match. I include this partly because such a search can be done in “real time” whilst the speaker is presenting, and the result can then be offered as part of the discussion afterwards. It did not happen here because I was chairing the meeting, and hence concentrating entirely on the proceedings!

- Stefan Grimme introduced his tight-binding DFT method, an ultra-fast procedure for computing large molecules, and in passing noted the arrival of his D4 procedure for correcting for dispersion energies in molecules (almost everyone currently uses D3 methods for this, including many of the results reported on this blog), based on charge dependencies computed using the TBDFT methods. Thus dispersion is treated as a property based on the wavefunction of the molecule, while remaining fast enough to compute for accurate dispersion corrections. He followed this with his automated procedures, based on the TBDFT methods, for computing full spin-spin coupled 1H NMR spectra of organic molecules. The core of this method is to recognise conformational and rotational freedoms and to compute the NMR properties for all identified isomers. These parameters are then Boltzmann averaged prior to computation of the final spin-coupled simulated frequency-domain spectrum (rather than inverting this procedure by computing spin-coupled spectra of all rotamers and conformations and then averaging the spectral envelopes). This should revolutionise how synthetic chemists interpret 1H NMR spectra.
- Another automated tool for synthetic chemists was presented by Jan Jensen, and can be seen here. It uses MOPAC PM3 semi-empirical theory to compute relative proton affinities for a series of regioisomers as a prelude to predicting the position of electrophilic aromatic substitution in (hetero)aromatic molecules. Try it out by putting a SMILES string into the box provided (e.g. COC1=CC=CC=C1), waiting a bit and seeing what the prediction is (it should be p- for the preceding example). During Q&A, a question was asked about the canonical “purity” of the SMILES (the one used in this tool comes from the Chemdraw program, and might not be identical to a SMILES for the same molecule produced by a different program), and whether an InChI descriptor might be better (also produced by Chemdraw, but perhaps a bit more canonical). Also asked was whether the predictions would remain good for an electrophile rather larger than a proton. This could perhaps be tested by readers, who might report back here.
- Walter Thiel completed the semi-empirical theme by reporting the new ODM2 method, the D now indicating the inclusion of dispersion. It is implemented in a powerful program which includes, e.g., full CI (configuration interaction) with gradients, and is especially good for excited states, for dynamics simulations, and for combining these into dynamic photochemical simulations. This was applied to the chromophore of the famous “nanocar”, to study the dynamics of the photochemical rotation of the car’s motor (the thermally induced rotation was not studied). When the nanocar first caught my attention, I wondered how the four independent molecular motors synchronised their rotations to allow the car to drive in a straight line. No doubt the answer is known, and if anyone reading this knows, please tell! It is probably a dynamics problem involving four rotors (Walter reported on just one!).
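The ring twists described for the metallacyclobutanes are torsion angles, which can be computed directly from the Cartesian coordinates of four consecutive ring atoms, for example M-C-C-C for a twist about a C-C bond. Below is a minimal sketch of the standard signed-torsion formula. The coordinates are hypothetical, and I do not know how the plot of the 52 structures was actually generated.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, i.e. the twist about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals to the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical planar ring M, C1, C2, C3 (angstrom); the twist about C1-C2 is 0.
    ring = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.5, 0.0), (0.0, 1.5, 0.0)]
    print(torsion(*ring))
    # Lift C3 out of the plane to twist the ring about the C1-C2 bond.
    print(torsion(ring[0], ring[1], ring[2], (0.0, 1.5, 0.8)))
```

Using `atan2` rather than `acos` keeps the sign of the twist and avoids precision loss near 0° and 180°.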

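Two of the WATOC talks rest on relative energies: Grimme's procedure Boltzmann-averages NMR parameters over conformers before simulating a single spectrum, and the substitution-site prediction ranks protonated regioisomers by energy. The following is a minimal sketch of each step. It assumes relative free energies in kJ/mol at 298.15 K, all numbers are hypothetical, and neither function reproduces the actual programs described above.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def boltzmann_weights(rel_energies, temperature=298.15):
    """Normalised Boltzmann populations for conformers with energies in kJ/mol."""
    e_min = min(rel_energies)  # shift to the minimum to avoid overflow
    factors = [math.exp(-(e - e_min) / (R * temperature)) for e in rel_energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values, rel_energies, temperature=298.15):
    """Population-weighted mean of one NMR parameter (a shift or a J coupling)."""
    weights = boltzmann_weights(rel_energies, temperature)
    return sum(w * v for w, v in zip(weights, values))


def favoured_site(protonation_energies):
    """Site whose protonated isomer is lowest in energy, i.e. has the highest
    relative proton affinity."""
    return min(protonation_energies, key=protonation_energies.get)


if __name__ == "__main__":
    # Hypothetical: one 1H shift (ppm) in three conformers and their energies.
    shifts = [3.80, 3.65, 3.95]
    energies = [0.0, 2.5, 6.0]
    print(round(boltzmann_average(shifts, energies), 3))
    # Hypothetical protonated-isomer energies (kJ/mol) for a monosubstituted benzene.
    print(favoured_site({"ortho": 8.0, "meta": 30.0, "para": 0.0}))  # para
```

Averaging each shift and coupling first, and only then simulating one spin-coupled spectrum, mirrors the order Grimme described.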
Tags:chemical shifts, Chemistry, City: Munich, Jan Jensen, metal fall, Munich, Odile Eisenstein, Quotation, speaker, Stefan Grimme, Transition metal, Walter Thiel, World Association of Theoretical and cOmputational Chemists
Posted in Interesting chemistry, WATOC reports
Tuesday, May 23rd, 2017
This meeting is taking place in the idyllic surroundings of the Niederwald forest, Rüdesheim, Germany. Here I highlight only aspects of the first three talks.
Martin Hicks introduced the conference with concepts such as the global public good. In the area of open access, he reminded us of the terms Platinum and Diamond open access, denoting journals with no article processing charges (charges that can reach £5000 per article at some other OA journals). Such journals bring a challenge of their own: more gatekeepers of this global public good are needed if they are to avoid being overwhelmed. He ended by asking us all to consider what unit of knowledge needs to be shared.
The first talk was by Klaus Tochtermann, who (amongst other topics) brought to our attention the Dutch GoFAIR initiative in the European Open Science Cloud, sub-divided into Go-train (data experts, who will build, e.g., metadata tools) and Go-build (ecosystems: the Internet of FAIR data and FAIR services). I think the message is that all organisations with chemistry labs should consider this an essential part of their future infrastructure.
Jeremy Frey, whose talk was titled Reducing Uncertainty: The Raison d’Être for Open Science, defined the fundamental principles of open science as transparency, capability and obtainability, and encouraged data publication at source (as opposed to, e.g., during the PhD writing-up period) to ensure fidelity in the capture of metadata.
The team of Leah McEwen, Ian Bruno, Stuart Chalk and Richard Kidd told us about Global Data Initiatives and Chemistry, and the need for social and technical bridges to enable open data sharing. I learnt, for example, that the IUPAC Gold Book of chemical terms and definitions now has DOIs for each of the terms. Thus chemical shift (DOI: 10.1351/goldbook.C01036[1]), spectroscopy (DOI: 10.1351/goldbook.S05848[2]) and electron density function (DOI: 10.1351/goldbook.ET07024[3]) each have their own DOI. I will now try to associate such links with, e.g., deposited NMR data, to help enrich the semantics of the data (see e.g. DOI: 10.14469/hpc/1975).
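DOI names are case-insensitive, which is why the same Gold Book DOIs can appear in upper case in the text and in lower case in the reference list. Here is a minimal sketch of attaching such term DOIs to a dataset's metadata record, comparing DOIs case-insensitively so that a term is not added twice. The record layout and its `subjects` key are hypothetical and are not the schema of any particular repository.

```python
# Gold Book term DOIs quoted in the text.
GOLD_BOOK = {
    "chemical shift": "10.1351/goldbook.C01036",
    "spectroscopy": "10.1351/goldbook.S05848",
    "electron density function": "10.1351/goldbook.ET07024",
}


def normalise_doi(doi):
    """DOI names are case-insensitive, so compare them in lower case."""
    return doi.strip().lower()


def add_term_links(record, terms):
    """Return a copy of `record` with Gold Book links appended to a
    hypothetical 'subjects' list, skipping DOIs already present."""
    subjects = list(record.get("subjects", []))
    seen = {normalise_doi(s["doi"]) for s in subjects}
    for term in terms:
        doi = GOLD_BOOK[term]
        if normalise_doi(doi) not in seen:
            subjects.append({"term": term, "doi": doi, "url": "https://doi.org/" + doi})
            seen.add(normalise_doi(doi))
    return {**record, "subjects": subjects}


if __name__ == "__main__":
    nmr_record = {  # hypothetical record for the deposited NMR data
        "doi": "10.14469/hpc/1975",
        "subjects": [{"term": "chemical shift", "doi": "10.1351/goldbook.c01036"}],
    }
    updated = add_term_links(nmr_record, ["chemical shift", "spectroscopy"])
    for subject in updated["subjects"]:
        print(subject["term"], subject["doi"])
```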
Finally, a photo of the region, taken from the gondola that runs from beside the venue down to the small town on the banks of the Rhine.

References
- "chemical shift", The IUPAC Compendium of Chemical Terminology, 2014. https://doi.org/10.1351/goldbook.c01036
- "spectroscopy", The IUPAC Compendium of Chemical Terminology, 2014. https://doi.org/10.1351/goldbook.s05848
- "electron density function", The IUPAC Compendium of Chemical Terminology, 2014. https://doi.org/10.1351/goldbook.et07024
Tags:article processing charges, Bad Kreuznach, chemical shift, chemical terms, City: Rüdesheim, Country: Germany, Hesse, Hesse-Nassau, Ian Bruno, Jeremy Frey, Klaus Tochtermann, Leah McEwen, Martin Hicks, metadata tools, Niederwald, Niederwalddenkmal, Quotation, Rheingau-Taunus-Kreis, Rhine, Richard Kidd, Rüdesheim, Rüdesheim am Rhein, Rüdesheim an der Nahe, spectroscopy, States of Germany, Stuart Chalk, Technology/Internet
Posted in Chemical IT
Tuesday, October 4th, 2016
Peter Murray-Rust and I are delighted to announce that the 2016 award of the Bradley-Mason prize for open chemistry goes to Jan Szopinski (undergraduate) and Clyde Fare (postgraduate).
Jan’s open chemistry derives from a final-year project examining why atomic charges derived from a quantum chemical calculation of the electron density represent chemical information well, yet the electrostatic potential (ESP) generated from these charges is very poor, whereas, conversely, charges derived from the computed electrostatic potential are incommensurate with chemical information (such as the electronegativity of atoms). He has developed a Python program called ‘repESP’, which generates ‘compromise’ charges that attempt to reconcile the physical world-view (fitting the ESP) with the chemical insight provided by NPA (Natural Population Analysis). Jan was the main driver behind making his code open source, “opening his supervisor’s eyes” to the various flavours of open-source licence. To ensure that all subsequent improvements to the program remain available to anyone, the source code has been released under a ‘copyleft’ licence (GPL v3) and is maintained by Jan on GitHub, where he looks forward to helping new users and collaborating with contributors.
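The two kinds of charge Jan compared can be illustrated with a point-charge model. The ESP generated by charges q_i at positions r_i is V(r) = Σ q_i / |r − r_i| (atomic units), and ESP-fitted charges are those that minimise the squared error against a reference ESP on a grid of points, usually with the total charge constrained. The following is a minimal, unrestrained least-squares sketch that uses a Lagrange multiplier for that constraint. It assumes NumPy, and the grid and coordinates are hypothetical. It is not repESP's code and does not implement its compromise scheme.

```python
import numpy as np


def inverse_distances(atom_xyz, grid_xyz):
    """Matrix of 1/|r_k - r_i| for grid points k (rows) and atoms i (columns)."""
    diff = grid_xyz[:, None, :] - atom_xyz[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=-1)


def esp_from_charges(charges, atom_xyz, grid_xyz):
    """Electrostatic potential (atomic units) of point charges at each grid point."""
    return inverse_distances(atom_xyz, grid_xyz) @ charges


def fit_esp_charges(esp_ref, atom_xyz, grid_xyz, total_charge=0.0):
    """Least-squares charges reproducing esp_ref, constrained to sum to total_charge."""
    a = inverse_distances(atom_xyz, grid_xyz)
    n = a.shape[1]
    # Solve [[A^T A, 1], [1^T, 0]] @ [q, lambda] = [A^T V, Q] (Lagrange multiplier).
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = a.T @ a
    lhs[:n, n] = 1.0
    lhs[n, :n] = 1.0
    rhs = np.append(a.T @ esp_ref, total_charge)
    return np.linalg.solve(lhs, rhs)[:n]


def rms_error(esp_a, esp_b):
    """Root-mean-square difference between two ESPs on the same grid."""
    return float(np.sqrt(np.mean((esp_a - esp_b) ** 2)))


if __name__ == "__main__":
    atoms = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.2]])  # hypothetical diatomic (bohr)
    grid = np.array([[3.0, 0.0, 0.0], [0.0, 3.0, 1.1], [0.0, 0.0, 5.0],
                     [0.0, 0.0, -3.0], [2.0, 2.0, 2.0], [-2.0, 1.0, 0.7]])
    reference = esp_from_charges(np.array([0.35, -0.35]), atoms, grid)
    fitted = fit_esp_charges(reference, atoms, grid)
    print(fitted, rms_error(esp_from_charges(fitted, atoms, grid), reference))
```

Here the reference ESP is itself generated from point charges, so the fit simply recovers them. With a genuine quantum chemical ESP, the fitted charges and the population-analysis charges generally differ, and that difference is the tension the project explored.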
Clyde has made various contributions to open-source chemistry over the period of his PhD, focusing mainly on four areas: utilities to improve quantum chemical research; the enhancement of a popular machine-learning library with a method that has been successful in chemometrics; the creation of an open-source channel for teaching chemists programming and data analysis; and the creation of a tool to help encourage open-source software development. cclib is the most popular library for parsing quantum chemical data from output files, and Clyde has contributed patches for the Atomic Simulation Environment (ASE), which enables control of quantum chemical codes from a unified Python interface. He built a computational chemistry electronic notebook, published on GitHub, which is now also under active development by others. This aims to encapsulate computational chemistry research projects, both for the sake of reproducibility and for organising and keeping track of quantum chemical research. Alongside this platform he created an enhanced Gaussian calculator for ASE that enables automatic construction of ONIOM input files, also now under active development. He also contributed to scikit-learn, the most popular Python machine-learning framework, implementing a kernel for kernel ridge regression that has become the most successful kernel for regression over molecular properties. He was part of the team that won the 2014 sustainable software conference prize for the creation of the open-source healthchecker software as part of Sustain. He has argued for open source as a platform for teaching resources, and created the Imperial Chemistry GitHub user account, which is now run by the department. Materials for the Imperial Chemistry Data Analysis and Programming workshops, implemented as Python notebooks, are now available through this account and continue under active development.
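Kernel ridge regression is compact enough to sketch directly. Training solves (K + λI)α = y for the kernel matrix K of the training descriptors, and each prediction is a kernel-weighted sum over the training points. The paragraph does not say which kernel Clyde implemented, so the Laplacian kernel k(x, x′) = exp(−γ‖x − x′‖₁) used here is purely an assumption for illustration. The descriptors and targets are hypothetical, and this is plain NumPy, not scikit-learn's implementation; in practice one would reach for scikit-learn's `KernelRidge`.

```python
import numpy as np


def laplacian_kernel(a, b, gamma):
    """k(x, x') = exp(-gamma * ||x - x'||_1) for every row pair of a and b."""
    l1 = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return np.exp(-gamma * l1)


def krr_fit(x_train, y_train, gamma=0.1, lam=1e-6):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    k = laplacian_kernel(x_train, x_train, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_new, x_train, alpha, gamma=0.1):
    """Predictions are kernel-weighted sums of the dual coefficients."""
    return laplacian_kernel(x_new, x_train, gamma) @ alpha


if __name__ == "__main__":
    # Hypothetical 2-component molecular descriptors and target property values.
    x = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]])
    y = np.array([-1.2, -0.8, 0.3, 0.9])
    alpha = krr_fit(x, y, gamma=0.5, lam=1e-8)
    print(krr_predict(np.array([[1.5, 1.0]]), x, alpha, gamma=0.5))
```

The regularisation λ trades exact reproduction of the training data against smoothness; with the tiny value used here, the model essentially interpolates its training points.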
Criteria for the award will include the immediate accessibility of the submission via public websites, what is visible and reusable in this way, and evidence of either community formation and engagement or reuse of materials by people other than the proposer.
Tags:Analytical chemistry, chemical information, chemical insight, Cheminformatics, Chemistry, Chemometrics, Clyde Fare, Company: GitHub, computation chemical research projects, computational chemistry, computing, Cross-platform software, driver, GitHub, Jan Szopinski, machine learning, open sourcing software development, opensource healthchecker software, Peter Murray-Rust, public web sites, Python, quantum chemical calculation, quantum chemical codes, quantum chemical data, quantum chemical research, Quotation, Server & Database Software, simulation, Software, supervisor, sustainable software conference prize, Technology/Internet
Posted in Bradley-Mason Prize for Open Chemistry