Henry Rzepa's blog

Tag: chemical information

The 2016 Bradley-Mason prize for open chemistry.

Peter Murray-Rust and I are delighted to announce that the 2016 award of the Bradley-Mason prize for open chemistry goes to Jan Szopinski (UG) and Clyde Fare (PG).

Jan’s open chemistry derives from a final year project looking at why atom charges derived from quantum chemical calculation of the electronic density represent chemical information well, whereas the electrostatic potential (ESP) generated from these charges is very poor; conversely, charges derived from the computed electrostatic potential are incommensurate with chemical information (such as the electronegativity of atoms). He has developed a Python program called ‘repESP’ which generates ‘compromise’ charges that attempt to reconcile the physical world-view (fitting the ESP) with the chemical insight provided by NPA (Natural Population Analysis). Jan was the main driver in making his code open source, “opening his supervisor’s eyes” to the various flavours of open source licences. To ensure that all subsequent improvements to the program remain available to anyone, the source code has been released under a ‘copyleft’ licence (GPL v3) and is maintained by Jan on GitHub, where he looks forward to helping new users and collaborating with contributors.

Clyde has made various contributions to open source chemistry over the period of his PhD, focusing mainly on utilities to improve quantum chemical research, the enhancement of a popular machine learning library with a method that has been successful in chemometrics, the creation of an open source channel for teaching chemists programming and data analysis, and the creation of a tool to help encourage open sourcing of software development. Cclib is the most popular library for parsing quantum chemical data from output files, and Clyde has contributed patches for the Atomic Simulation Environment, which enables control of quantum chemical codes from a unified Python interface. He was responsible for the construction of a computational chemistry electronic notebook published to GitHub, which is now under active development by others as well. This aims to encapsulate computational chemistry research projects, both for the sake of reproducibility and for the sake of organising and keeping track of quantum chemical research. Alongside this platform he created an enhanced Gaussian calculator for the Atomic Simulation Environment that enables automatic construction of ONIOM input files, also now under active development. He also made contributions to scikit-learn, the most popular Python machine learning framework, implementing a kernel for Kernel Ridge Regression that has become the most successful kernel for regression over molecular properties. He was part of the team that won the 2014 sustainable software conference prize for creation of the open source healthchecker software as part of Sustain. He has argued for open source as a platform for teaching resources and created the Imperial Chemistry GitHub user account, which is now run by the department. Materials for the Imperial Chemistry Data Analysis and Programming workshops, implemented as Python Notebooks, are now available through this account and continue under active development.

Criteria for the award will include judging the submission on its immediate accessibility via public web sites, on what is visible and re-usable in this way, and on evidence of either community formation/engagement or re-use of materials by people other than the proposer.

October 4, 2016

The status of blogging as scientific communication.

Blogging in chemistry remains something of a niche activity, albeit with a variety of different styles. The most common is commentary or opinion on the scientific literature or conferencing, serving to highlight what the author considers interesting or important developments. There are even metajournals that aggregate such commentaries. The question therefore occasionally arises: should blogs aspire to any form of permanence, or are they simply creatures of their time?

In this blog, as you might have noticed, I take a slightly different tack. One focus is on exploring, perchance in more detail than might be found in the standard text-book, some of the dogmas of chemistry. It happens that occasionally, when writing a conventional scientific article, I find myself wishing to cite such sources. This of itself raises interesting issues (such as whether one should cite material that has not been peer-reviewed in the conventional manner), but the most important would be whether one should cite evanescent sources. So this brings me to the topic of this post: can a post be archived in a sense that achieves a greater perceived permanence? Nowadays, permanence tends to be associated with a digital object identifier, or DOI. So one can boil this question down to: can one assign a DOI to a blog post?

Well, if you came to this post via the main page, you may indeed have spotted that some do have a DOI. This is an experiment I have been running with an organisation known as The Winnower, who provide a WordPress extension to archive any individual post and assign it a (CrossRef) DOI. The archived version also includes metadata that points back to the original post.

This archival is not yet perfect. In its current state it does not (yet) capture:
1. Comments on any post (which could be considered a form of open peer review).
2. Enhancements such as the links to Jmol/JSmol that I associate with some of the posts.
3. The ORCID identifier, which adds a layer of additional provenance.

Nor, of course, do we yet know what life expectancy the archiving organisations will achieve (could it be 100 years, for example?).

It does capture the citation list when there is one, and since I include citations to my data sources (for the computations performed in support of many of my posts), the archive is, I think, rendered more valuable accordingly.

What brought this post on? Well, the Journal of Chemical Education has put out a call for articles on chemical information for a special issue. I decided to contribute by aggregating some of my teaching-related posts; indeed, individually these could perhaps only have appeared here, as opposed to a more traditional means of dissemination such as the JCE journal itself. And I wanted to cite them using the DOI rather than simply the URL of the post. It’s an experiment, and one which I do not yet know if anyone else will try. That in some ways is the point of a blog; it is an interesting experimental vehicle!

Acknowledgments

This post has been cross-posted in PDF format at Authorea.

May 10, 2015

Data-round-tripping: wherein the future?

Moving (chemical) data around in a manner which allows its (automated) use in whichever context it finds itself must be a holy grail for all scientists and chemists. I posted earlier on the fragile nature of molecular diagrams making the journey between the editing program used to create them (say ChemDraw) and the word processor used to place them into a context (say Microsoft Office), via an intermediate storage area known as the clipboard. The round trip between the Macintosh (OS X) versions of these programs had been broken for a little while, but it is now fixed! A small victory. This post reports what happens when such a Mac-created Word document is sent to someone using Microsoft Windows as an OS (or vice versa).

As you might have guessed, the molecular diagram arrives largely dead, and not re-usable. Opening the .docx archive (it is nothing more than a zip file) reveals only a JPEG file residing inside. Nothing that can be chemically repurposed. If the reverse process is undertaken, creating a ChemDraw diagram and pasting it into Word on Windows, one finds two components in the .docx: a bit-mapped image, linked to an active object containing the data. Only the first of these is recognised if the file makes its way to a Macintosh; i.e. the same story, the data is again lost. So the bottom line is that Mac users and Windows users cannot, after all, exchange repurposable molecular diagrams via Word documents using this combination of programs. This is not good.
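The inspection described above can be sketched in a few lines of Python, since a .docx is an ordinary zip archive. This is only an illustration under stated assumptions: the helper names are hypothetical, the demo archive is a stand-in built in memory rather than a real Word file, and the `word/media/` and `word/embeddings/` folders are the locations Word conventionally uses for bitmap previews and embedded objects, not something guaranteed for every document.

```python
import io
import zipfile


def list_docx_parts(docx):
    """Return the names of all parts stored inside a .docx (zip) archive."""
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()


def classify_parts(names):
    """Separate bitmap previews from embedded objects that may carry data.

    Assumption: previews live under word/media/ and embedded objects
    under word/embeddings/, as is conventional for Word documents.
    """
    media = [n for n in names if n.startswith("word/media/")]
    embedded = [n for n in names if n.startswith("word/embeddings/")]
    return media, embedded


def make_demo_docx(with_embedding):
    """Build a tiny in-memory stand-in for a .docx; all contents are placeholders."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.jpeg", b"not a real image")
        if with_embedding:
            # Hypothetical name for the active object holding the diagram data.
            archive.writestr("word/embeddings/oleObject1.bin", b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for label, flag in (("picture only", False), ("picture plus object", True)):
        media, embedded = classify_parts(list_docx_parts(make_demo_docx(flag)))
        print(label, "| media:", media, "| embedded:", embedded)
```

For a real document one would pass its file path to `list_docx_parts` instead of the demo buffer; an empty `embedded` list would correspond to the "only a JPEG inside" situation.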

But let me remind you what happened around 1993. The word processor was joined by a program called the Web browser. In 2000, the underlying content carrier, HTML, became XHTML (an application of XML). Right from day 1 almost, such XHTML could be, and frequently was, repurposed. A memorable example is that search engines could use it to index the Web. The XHTML easily survived trips to and from clipboards. Back in 1996, CML had joined HTML as a way of carrying chemical information capable of round-tripping without loss (if need be). There are other chemical XML languages in use nowadays, including CDXML used by the ChemDraw program. Word itself now uses XML (the x in .docx). So, after 14 years, why am I still describing the difficulties above? I am frankly at a loss to explain why there is still a need to write this post.
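To make "round-tripping without loss" concrete, here is a minimal sketch using Python's standard XML library. The two-atom fragment is invented purely for illustration, the function names are hypothetical, and the element and attribute names (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`, `atomRefs2`) follow my understanding of the CML schema rather than any particular program's output.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML documents (assumed here for the illustration).
CML_NS = "http://www.xml-cml.org/schema"

# A made-up carbonyl fragment, not taken from any real data set.
CML_TEXT = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""


def read_molecule(text):
    """Extract (id, element) pairs and (atom refs, order) pairs from CML text."""
    root = ET.fromstring(text)
    ns = {"cml": CML_NS}
    atoms = [(a.get("id"), a.get("elementType"))
             for a in root.findall("cml:atomArray/cml:atom", ns)]
    bonds = [(b.get("atomRefs2"), b.get("order"))
             for b in root.findall("cml:bondArray/cml:bond", ns)]
    return atoms, bonds


def round_trip(text):
    """Parse the XML and serialise it again, as a clipboard or file hop might."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")


if __name__ == "__main__":
    before = read_molecule(CML_TEXT)
    after = read_molecule(round_trip(CML_TEXT))
    print("chemistry preserved:", before == after)
```

The serialised text need not be byte-identical to the input (namespace prefixes and whitespace may differ), but the atoms and bonds recovered from it are the same, which is the sense of lossless intended here.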

All is not entirely lost. The CML4Word approach is designed to enable (chemical) data round-tripping from the outset. However, I do not yet know whether the CML created and stored in the Word document using this mechanism is recognised anywhere outside of Word 2007 on Windows. If anyone can let me know of examples where such a CML-enabled Word document can be used in other environments, I would be very grateful (but not on OS X, as I know already).

And as I might have mentioned in the previous post on this topic, things may not, however, be getting any better in that other carrier of information and data, the mobile phone or iPad, as exemplified by operating systems such as iOS or Android. Watch this space, as they say.

December 7, 2010
