Posts Tagged ‘Digital Object Identifier’

A search of some major chemistry publishers for FAIR data records.

Friday, April 12th, 2019

In recent years, findable data has become ever more important (the F in FAIR). Here I test that F using the DataCite search service.

First, an introduction to this service. It is a database of metadata about datasets and other research objects. One of the metadata properties is relatedIdentifier, which records other identifiers associated with the dataset: typically the DOI of a published article associated with the data, but possibly also pointers to related datasets.

One can query thus:

  1. https://search.datacite.org/works?query=relatedIdentifiers.relatedIdentifier:*
    which retrieves the very healthy looking 6,179,287 works.
  2. One can restrict this to a specific publisher by the DOI prefix assigned to that publisher:
    ?query=relatedIdentifiers.relatedIdentifier:10.1021*
    which returns a respectable 210,240 works.
  3. It turns out that the major contributor to FAIR data is currently the collection of crystal structures from the CCDC. One can remove them from the search to see what is left over:
    ?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+NOT+(identifier:*10.5517*) 
    and one is down to 14,213 works, of which many nevertheless still appear to be crystal structures. These may be links to other crystal datasets.
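The three queries above can be assembled programmatically. What follows is only a minimal sketch: the base URL and field names are copied from the queries quoted in this post, the function names are my own invention, and nothing here checks whether the service still answers at that address.

```python
from urllib.parse import urlencode

# Base URL as quoted in the post; not checked against the live service.
SEARCH_BASE = "https://search.datacite.org/works"


def search_url(query: str) -> str:
    """Return a search URL, leaving ':', '*', '(' and ')' unescaped as in the post."""
    return SEARCH_BASE + "?" + urlencode({"query": query}, safe=":*()")


def related_to_prefix(prefix: str) -> str:
    """Query for works whose relatedIdentifier starts with a DOI prefix."""
    return f"relatedIdentifiers.relatedIdentifier:{prefix}*"


def excluding_prefix(query: str, own_prefix: str) -> str:
    """Drop works whose own identifier contains the given DOI prefix."""
    return f"({query}) NOT (identifier:*{own_prefix}*)"


if __name__ == "__main__":
    print(search_url("relatedIdentifiers.relatedIdentifier:*"))
    print(search_url(related_to_prefix("10.1021")))
    print(search_url(excluding_prefix(related_to_prefix("10.1021"), "10.5517")))
```

Swapping 10.1021 for another publisher's DOI prefix gives the corresponding searches 2 and 3 for that publisher.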

I have performed searches 2 and 3 for some popular publishers of chemistry (the same set that were analysed here).

Publisher    Search 2    Search 3
ACS           210,240      14,213
RSC           138,147       1,279
Elsevier      185,351      56,373
Nature         12,316       8,104
Wiley         135,874       9,283
Science         3,384       2,343

These publishers all have significant numbers of datasets which at least accord with the F of FAIR. A lot of data sets may not have metadata which in fact points back to a published article, since this can be something that has to be done only when the DOI of that article appears, in other words AFTER the publication of the dataset. So these numbers are probably low rather than high.
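To compare publishers it helps to express search 3 as a fraction of search 2, i.e. the share of hits that remain once CCDC entries are excluded. The sketch below uses only the counts from the table; the names are my own.

```python
# Counts copied from the table above: (search 2, search 3) per publisher.
COUNTS = {
    "ACS": (210_240, 14_213),
    "RSC": (138_147, 1_279),
    "Elsevier": (185_351, 56_373),
    "Nature": (12_316, 8_104),
    "Wiley": (135_874, 9_283),
    "Science": (3_384, 2_343),
}


def non_ccdc_share(search2: int, search3: int) -> float:
    """Percentage of search-2 hits that survive the CCDC exclusion."""
    return 100.0 * search3 / search2


shares = {name: non_ccdc_share(*pair) for name, pair in COUNTS.items()}

# Print publishers from the smallest to the largest non-CCDC share.
for name, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{name:<10}{share:5.1f}%")
```

On these figures RSC has the smallest share remaining after the exclusion and Science the largest, although, as noted above, some of what remains may still be crystal structures.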

How about the other way around? Rather than datasets that have a journal article as a related identifier, we could search for articles that have a dataset as a related identifier?

  1. ?query=(identifier:*10.1039*)+AND+(relatedIdentifiers.relatedIdentifier:*)
    returns, rather mysteriously, nothing found. It might be that there is no mapping of this search between the CrossRef and DataCite metadata schemas.
  2. And just to show the searches are behaving as expected:
    ?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+AND+(identifier:*10.5517*)
    returns 196,027 works.
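The control search can also be checked by simple arithmetic: the NOT and AND variants should split the ACS total between them. A short sketch using only the counts quoted in this post:

```python
acs_all = 210_240        # search 2: relatedIdentifier 10.1021*
acs_not_ccdc = 14_213    # search 3: ... NOT identifier *10.5517*
acs_and_ccdc = 196_027   # control: ... AND identifier *10.5517*

# The two subsets should add up to the unrestricted search.
assert acs_not_ccdc + acs_and_ccdc == acs_all
print("NOT and AND subsets partition the ACS total:", acs_all)
```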

It will also be of interest to show how these numbers change over time. Is there an exponential increase? We shall see.

Finally, we have not really explored adherence to e.g. the AIR of FAIR. That is for another post.

PIDapalooza 2018. A conference like no other!

Tuesday, January 23rd, 2018

Another occasional conference report (day 1). So why is one about “persistent identifiers” important, and particularly to the chemistry domain?

The PID most familiar to most chemists is the DOI (digital object identifier). In fact there are many; some 60 types have been collected by ORCID (themselves purveyors of researcher identifiers). They sometimes even have different names; in life sciences they tend to be known instead as accession numbers. One theme common to many (probably not all) is that they represent sources of metadata about the object being identified: further information which allows you (or a machine) to decide if acquiring the full object is worthwhile. So in no particular order, here are some of the things I learnt today.

  1. Mark Hahnel noted the recent launch of the Dimensions resource which links research data with other research activities; I have not yet had a chance to learn its capabilities, but it seems an interesting alternative to stalwarts such as Google Scholar.

    You can try this example: https://app.dimensions.ai/discover/publication?search_text=10.6084&search_type=kws&full_search=true which retrieves articles in which the data repository with prefix 10.6084 (Figshare) is cited. Try also the prefix 10.14469 which is the Imperial College repository.

  2. Andy Mabbett talked about the deployment and use of persistent identifiers (the Q numbers) in Wikidata, which increasingly underpin the basis for the various flavours of Wikipedia. He also noted their use of some 50 different identifiers.
  3. Johanna McEntyre noted some 5M published articles in life sciences which reference 1M+ ORCID identifiers, easily the domain with the fastest uptake of this type. Also noted was the new FREYA project; aiming to connect open identifiers for discovery, access and use of research resources.
  4. Tom Gillespie talked about RRIDs, or Research Resource Identifiers. These cover hardware, including instruments, with around 6000 RRIDs systematized so far. They argue this area promotes both the A and I of FAIR (accessible and interoperable). Of course A and I mean many things to many people.
  5. Several other presentations talked about the finer detail of metadata, such as sub-classifications into e.g. descriptive/admin/technical, but I did rather miss demos showing how search queries of such fine-grained metadata could be constructed.

Apart from the presentations themselves, PIDapalooza is unusual for some other activities. Thus you could go get your PIDnails done, with a selection of 8 or so tasteful logos to choose from. There will be tattoos tomorrow (this is a conference for younger people after all). I may grab a photo or two to provide evidence!

 

Challenges in reliably representing the chemistry of crystal structures.

Monday, May 29th, 2017

The title here is taken from a presentation made by Ian Bruno from CCDC at the recent conference on Open Science. It also addresses the theme here of the issues that might arise in assigning identifiers for any given molecule.

The structure was represented as shown[1] by the original authors, in which the bonding from S to Sn is indicated with both solid lines (a bond) and dotted lines (an “interaction”).

Why would this matter? Well, to make any entry in the Cambridge Structural Database findable (the F of FAIR) it has to be given a unique identifier. There are in general three such identifiers assigned by the CCDC:

  1. The Refcode, in this case XONHIS. These six or seven letter codes are historically the oldest, and started off at least with an attempt if possible to assign some semantic inference from the name, even if only occasionally. 
  2. The CCDC deposition number, in this case 650011. This is the number that an author will receive immediately upon deposition, and you often find these identifiers quoted in supporting information files.
  3. The DOI (digital object identifier), in this case 10.5517/ccptd3z, which can be used to view the structure even if access to the full CSD is not available to the user. In that sense, the DOI is the FAIRest of the first three of these identifiers.
  4. However, CCDC reported that they are considering adding a 4th very common identifier, based on the InChI (International chemical identifier), which comes as a full string and with the structure of the molecule at least in part inferrable from it, together with a shortened (almost) unique string which has the advantage of being “Googlable”. Both are helpfully FAIR.
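The three existing identifiers for one entry can be held together in a small record, with the DOI turned into a resolvable link. This is only an illustrative sketch; the class and method names are hypothetical and are not part of any CCDC software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsdEntryIds:
    """The three CCDC-assigned identifiers for one crystal structure entry."""

    refcode: str
    deposition_number: int
    doi: str

    def doi_url(self) -> str:
        """Resolvable form of the DOI via the doi.org resolver."""
        return "https://doi.org/" + self.doi

    def citation_label(self) -> str:
        """Label in the 'CCDC <number>' form often quoted in supporting information."""
        return f"CCDC {self.deposition_number}"


# Values for the entry discussed in this post.
XONHIS = CsdEntryIds("XONHIS", 650011, "10.5517/ccptd3z")

if __name__ == "__main__":
    print(XONHIS.refcode, XONHIS.citation_label(), XONHIS.doi_url())
```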

It is this 4th identifier that is at issue here. InChIs are derived from atom connection tables; you need to define all bonds present in the molecule. And it is here that the dotted “bond”/“interaction” above becomes a problem. This is the representation shown in the CSD database, which reveals that all the Sn…S interactions are classified as “bonds”, along with some creative(!) representations of the C…S bonds.

So the InChI will very much depend on whether all the Sn…S contacts are termed as bonds or as interactions. To help clarify that, it is useful to show the typical range of lengths of such contacts. Below is a simple search for all Sn and S systems where the pair are either close in space (< 3.5Å) or have a bond specified between the two atoms.

The main cluster occurs at ~2.5Å, but there is some evidence of a second peak at about 3.0Å. The third distribution up to 3.5Å is probably a continuum of very weak dispersion interaction, which most molecules exhibit. The values for XONHIS are 2.521 and 2.996Å, which match the two clusters above.

So perhaps a quantum calculation can shed some light (DOI: 10.14469/hpc/2593)? The values on the right are the optimised bond lengths which are pretty similar to the crystal structure. On the left are the calculated Wiberg bond orders (B3LYP+D3BJ/Def2-TZVPP/chloroform calculation). These reveal both “bonds” have an order less than 1. The value of ~0.6 is probably not contentious, but it does graphically show that when a compound is indexed as having a “single bond” between two atoms, the quantitative bond order may be substantially less. What however would one make of a bond order of 0.214? Should it be classified as a bond, albeit a much weaker one than normal? Or should it instead simply be a rather strong “interaction” which is not classified as a bond? And perhaps one should have in mind the question “how sensitive is this result to the quantum mechanical procedure used?”

Why does this distinction matter? Well, the InChI algorithm is based on simple connectivity; are two atoms connected by a bond or not? There are no nuances here. At the moment, this decision can be made by an algorithm based on the distance between any atom pair (whether computed or measured), but more often I suspect it derives from a “molfile” which is often derived from a human-drawn representation using a structure drawing program. It does rather boil down to the individual preferences of the human drawing the molecule. Due in part to such uncertainties, it was estimated that only 22% of structures in the CSD can be used to generate a reliable InChI. Hydrogen bonds are almost always classified as non-bonds, which means their presence is rarely systematically flagged during the indexing of the structures. Organometallics often pose some of the greatest representational problems (there are many others).
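To see how a purely distance-based rule behaves for XONHIS, here is a minimal sketch of one common recipe: two atoms count as bonded if their separation is no more than the sum of their covalent radii plus a tolerance. The radii and tolerances below are illustrative assumptions of mine, not values taken from the CSD or from the InChI software.

```python
# Approximate covalent radii in Å; illustrative assumptions, not values from the post.
COVALENT_RADIUS = {"Sn": 1.39, "S": 1.05}


def is_bonded(elem_a: str, elem_b: str, distance: float, tolerance: float = 0.4) -> bool:
    """Distance test: bonded if distance <= r(a) + r(b) + tolerance (all in Å)."""
    cutoff = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b] + tolerance
    return distance <= cutoff


# The two Sn...S separations quoted above for XONHIS, in Å.
XONHIS_CONTACTS = [2.521, 2.996]

if __name__ == "__main__":
    for tol in (0.4, 0.6):
        flags = [is_bonded("Sn", "S", d, tol) for d in XONHIS_CONTACTS]
        print(f"tolerance {tol} Å:", dict(zip(XONHIS_CONTACTS, flags)))
```

With these assumed radii the shorter contact is a bond at either tolerance, while the longer one switches from non-bond to bond as the tolerance is raised from 0.4 to 0.6 Å. The resulting connection table, and hence the InChI, would change accordingly.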

I will end by observing another class of structure that I deal with, “reaction transition states”. As you might imagine these forms are full of pairs of atoms with ambiguous bond lengths and hence connectivity. We currently have no truly reliable method for assigning useful identifiers to them. So lots of challenges for the future then!

 

References

  1. R. Reyes-Martínez, R. Mejia-Huicochea, J.A. Guerrero-Alvarez, H. Höpfl, and H. Tlahuext, "Synthesis, heteronuclear NMR and X-ray crystallographic studies of two dinuclear diorganotin(IV) dithiocarbamate macrocycles", Arkivoc, vol. 2008, pp. 19-30, 2007. https://doi.org/10.3998/ark.5550190.0009.503

Revisiting (and maintaining) a twenty year old web page. Mauveine: The First Industrial Organic Fine-Chemical.

Thursday, February 2nd, 2017

Almost exactly 20 years ago, I started what can be regarded as the precursor to this blog. As part of a celebration of this anniversary, I revisited the page to see whether any of it had withstood the test of time. Here I recount what I discovered.

The site itself is at www.ch.ic.ac.uk/motm/perkin.html and has the title “Mauveine: The First Industrial Organic Fine-Chemical”. It was an application of an earlier experiment[1] to which we gave the title “Hyperactive Molecules and the World-Wide-Web Information System”. The term hyperactive was supposed to be a play on hyperlinking to the active 3D models of molecules built using their 3D coordinates. The word has another, more negative, association with food additives such as tartrazine – which can induce hyperactivity in children – and we soon discontinued the association. This page was cast as a story about a molecule local to me in two contexts; the first being that the discoverer of mauveine, W. H. Perkin, had been a student at what is now the chemistry department at Imperial College. The second was the realization that where we lived in west London was just down the road from Perkin’s manufacturing factory. Armed with (one of the first) digital cameras, a Kodak DC25, I took some pictures of the location and added them later to the web page. The page also included two sets of 3D coordinates for mauveine itself and alizarin, another dyestuff associated with the factory. These were “activated” using HTML to make use of the then very new Chime browser plugin; hence the term hyperactive molecule.

This first effort, written in December 1995, soon needed revision in several ways. I note that I had maintained the site in 1998, 2001, 2004 and 2006. This took the form of three postscripts to add further chemical context and more recent developments, and of replacing the original Chime code with Java code to support the new Jmol software (Chime itself had been discontinued, probably around 2001 or possibly 2004). With the passage of a further ten years, I now noticed that the hyperactive molecules were no longer working; the original Jmol applet was no longer considered secure by modern browsers and hence deactivated. So I replaced this old code with the latest version (14.7.5 as JmolAppletSigned.jar) and this simple fix has restored the functionality. The coordinates themselves were invoked using the HTML applet tag, which amazingly still works (the applet tag had replaced an earlier one, which I think might have been embed?). A modern invocation would use e.g. the JavaScript-based JSmol tool, and so perhaps at some stage this code will indeed need further revision when the Java-based applet is permanently disabled.

You may also notice that the 3D coordinates are obtained from an XML document, where they are encoded using CML (chemical markup language[2]), which is another expression from the family that HTML itself comes from. That form may well last rather longer than earlier formats – still commonly used now – such as .pdb or .mol (for an MDL molfile). 
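Because CML is XML, the coordinates can be read back with any standard XML parser. The sketch below parses a hand-written two-atom fragment (not the mauveine file itself), assuming the usual CML layout of atom elements carrying elementType, x3, y3 and z3 attributes in the http://www.xml-cml.org/schema namespace.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written two-atom fragment for illustration; not the mauveine file.
SAMPLE = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="N" x3="1.340" y3="0.000" z3="0.000"/>
  </atomArray>
</molecule>"""


def read_atoms(cml_text: str) -> list:
    """Return (element, (x, y, z)) tuples for every atom in a CML string."""
    root = ET.fromstring(cml_text)
    atoms = []
    for atom in root.iterfind(".//cml:atom", CML_NS):
        xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((atom.get("elementType"), xyz))
    return atoms


if __name__ == "__main__":
    for element, xyz in read_atoms(SAMPLE):
        print(element, xyz)
```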

Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.

The final part of the maintenance (which I had probably also done with the earlier versions) was to re-validate the HTML code. Checking that a web page has valid HTML was always a behind-the-scenes activity which I remember doing when constructing the ECTOC conferences also back in 1995, and doing so probably does prolong the longevity of a web page. This requires “tools-of-the-trade” and I use now (and indeed did also back in 1995 or so) an industrial strength HTML editor called BBedit. To this is added an HTML validation tool, the installation of which is described at https://wiki.ch.ic.ac.uk/wiki/index.php?title=It:html5. I re-ran this validation and so this 2017 version should be valid for a little while longer at least. The page itself now has not just a URL but a persistent version called a DOI (digital object identifier), which is 10.14469/hpc/2133[3]. In theory at least, even if the web server hosting the page itself becomes defunct, the page could – if moved – be found simply from its DOI. The present URL-based hyperlink of course is tied to the server and would not work if the server stopped serving.

To complete this revisitation, I can add here a recent result. Back in 1995, I had obtained the 3D coordinates of mauveine using molecular modelling software (MOPAC) together with a 2D structure drawing package (ChemDraw) because no crystal structure was available. Well, in 2015 such structures were finally published.[4] Twenty years on from the original “hyperactive” models, their crystal structures can be obtained from their assigned DOI, much in the same manner as is done for journal articles: Try DOI: 10.5517/CC1JLGK4[5] or DOI: 10.5517/CC1JLGL5[6].

At some stage, web archaeology might become a fashionable pursuit. Twenty year old Web pages are actually not that common and it would be of interest to chart their gradual decay as security becomes more important and standards evolve and mature. One might hope that at the age of 100, they could still be readable (or certainly rescuable). During this period, the technology used to display 3D models within a web page has certainly changed considerably and may well still do so in the future. Perhaps I will revisit this page in 2037 to see how things have changed!


The old code can still be seen at www.ch.ic.ac.uk/motm/perkin-old.html

It should really be postscript 4.

References

  1. O. Casher, G.K. Chandramohan, M.J. Hargreaves, C. Leach, P. Murray-Rust, H.S. Rzepa, R. Sayle, and B.J. Whitaker, "Hyperactive molecules and the World-Wide-Web information system", Journal of the Chemical Society, Perkin Transactions 2, pp. 7, 1995. https://doi.org/10.1039/p29950000007
  2. P. Murray-Rust, and H.S. Rzepa, "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", Journal of Chemical Information and Computer Sciences, vol. 39, pp. 928-942, 1999. https://doi.org/10.1021/ci990052b
  3. H. Rzepa, "Molecule of the month: Mauveine.", Imperial College London, 2017. https://doi.org/10.14469/hpc/2133
  4. M.J. Plater, W.T.A. Harrison, and H.S. Rzepa, "Syntheses and Structures of Pseudo-Mauveine Picrate and 3-Phenylamino-5-(2-Methylphenyl)-7-Amino-8-Methylphenazinium Picrate Ethanol Mono-Solvate: The First Crystal Structures of a Mauveine Chromophore and a Synthetic Derivative", Journal of Chemical Research, vol. 39, pp. 711-718, 2015. https://doi.org/10.3184/174751915x14474318419130
  5. Plater, M. John., Harrison, William T. A.., and Rzepa, Henry S.., "CCDC 1417926: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgk4
  6. Plater, M. John., Harrison, William T. A.., and Rzepa, Henry S.., "CCDC 1417927: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgl5

Single Figure (nano)publications, reddit AMAs and other new approaches to research reporting

Wednesday, August 5th, 2015

I recently received two emails each with a subject line new approaches to research reporting. The traditional 350 year-old model of the (scientific) journal is undergoing upheavals at the moment with the introduction of APCs (article processing charges), a refereeing crisis and much more. Some argue that brand new thinking is now required. Here are two such innovations (and I leave you to judge whether that last word should have an appended ?).

To set the scene for the first, I will quote the abstract: “The single figure publication is a novel, efficient format by which to communicate scholarly advances. It will serve as a forerunner of the nano-publication, a modular unit of information critical for machine-driven data aggregation and knowledge integration”.[1] The kernel of this suggestion is (again I quote) “We offer the idea of the micro-publication unit, the single figure publication (SFP), to provide scholars with a real-world, manageable method to inform research.” I was struck by the overlap between this suggestion and the one you may find on many of the posts on this blog, where what I refer to as FAIR Data is assigned a digital object identifier (DOI) and included in the citation lists at the end of the post. The key phrase in the above abstract is machine-driven data aggregation and knowledge, although the article does not really go into any mechanisms for easily achieving this. It is my argument that the act of assigning a DOI carries with it the association that there is machine searchable metadata which can be retrieved and used for the aggregation and knowledge mining. The authors of this article, Do and Mobley, advocate adoption of nanopublications defined by inclusion of just a single figure (notably, not a table of results!) and some accompanying context which they claim would reduce the unit of publication to a more tractable size. This does raise the question of whether science needs more publications (in chemistry alone there are said to be more than a million published each year) or whether we should instead be concentrating our efforts on improving the data side of things by increasing its semantic content and formalising its structures, its preservation and curation. I certainly argue that far too little effort has been poured into these latter activities.
You only have to look at the typical SI (supporting information) associated with many chemistry articles to realise that in many cases they are still hardly fit for purpose. There is one concept introduced by Do and Mobley that also deserves mention. Their nanopublications are structured to be read by machines, not people. They will therefore not be refereed by people (my inference). They do not really discuss how else the quality will be assessed, but of course if you treat their nanopublication as essentially FAIR data, then it does become possible to develop methods of machine refereeing.
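One concrete mechanism for a machine to retrieve the metadata behind a DOI is content negotiation: the DOI is requested from the resolver with an Accept header naming a metadata format instead of a web page. The sketch below only constructs such a request and does not send it. CSL JSON is one widely used media type, but which formats are returned for any particular DOI is not something I have checked here, and the function name is my own.

```python
from urllib.request import Request

# Media type for citation metadata in CSL JSON form.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) a request asking the DOI resolver for metadata."""
    return Request("https://doi.org/" + doi, headers={"Accept": media_type})


# The DOI of the Do and Mobley article cited in this post.
req = metadata_request("10.12688/f1000research.6742.1")

if __name__ == "__main__":
    print(req.full_url)
    print(req.get_header("Accept"))
    # To fetch for real: urllib.request.urlopen(req).read()
```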

The second email alerted me to an article[2] in the Winnower, a forum that offers a bridge between “traditional scholarly publishing tools to traditional and non-traditional scholarly outputs—because scholarly communication doesn’t just happen in scholarly journals”. Here, the concept of scholarly communication is extended to the New Reddit Journal of Science and introduces the concept pioneered by reddit of the AMA, or “ask me anything” environment. I occasionally publish some of the posts on this blog to the Winnower, receiving in return the increasingly ubiquitous DOI. I have also occasionally quoted these DOIs in articles submitted to conventional chemistry journals. What we see now is the propagation of a Winnower DOI on to e.g. https://www.reddit.com/r/science/ where anyone can post a question related to the original research reporting. I must state that I do have some reservations about this. Whilst the majority of traditional scholarly reporting is likely to receive no AMAs (just as a very high proportion of research articles attract few if any citations in other articles over a period of decades), it is also likely that the quality of posted AMAs may turn out to be very low. At which point the original researcher has to make a judgement as to whether to devote any of their increasingly precious and fragmented time to answering them. And if few if any answers are posted in response to an AMA, the system seems unlikely to flourish.

But what we see here are two serious attempts to develop new approaches to research reporting, and no doubt others will emerge. To quote Yogi Berra, the future is not what it used to be.


Anyone can also post to this blog to ask similar questions. But note that associating an ORCID with such comments is highly recommended. I do not think that reddit currently supports ORCID, but I would argue if the intent is serious, it certainly should.

References

  1. L. Do, and W. Mobley, "Single Figure Publications: Towards a novel alternative format for scholarly communication", F1000Research, vol. 4, pp. 268, 2015. https://doi.org/10.12688/f1000research.6742.1
  2. RobustTempComparison, and r/Science, "Science AMA Series: Climate models are more accurate than previous evaluations suggest. We are a bunch of scientists and graduate students who recently published a paper demonstrating this, Ask Us Anything!", The Winnower. https://doi.org/10.15200/winn.143871.12809

The status of blogging as scientific communication.

Sunday, May 10th, 2015

Blogging in chemistry remains something of a niche activity, albeit with a variety of different styles. The most common is commentary or opinion on the scientific literature or conferencing, serving to highlight what their author considers interesting or important developments. There are even metajournals that aggregate such commentaries. The question therefore occasionally arises: should blogs aspire to any form of permanence, or are they simply creatures of their time?

In this blog, as you might have noticed, I take a slightly different tack. One focus is on exploring, perchance in more detail than might be found in the standard text-book, some of the dogmas of chemistry.  It happens that occasionally when writing a conventional scientific article, I find myself wishing to cite such sources. This of itself raises interesting issues (such as should one cite what might be considered material that has not been peer-reviewed in the conventional manner) but the most important would be whether one should cite evanescent sources. So this brings me to the topic of this post; can a post be archived in a sense that achieves a greater perceived permanence? Nowadays, permanence tends to be associated with a digital object identifier, or DOI. So one can boil this question down to: can one assign a DOI to a blog post?

Well, if you came to this post via the main page, you may indeed have spotted that some do have a DOI. This is an experiment I have been running with an organisation known as The Winnower, who provide a WordPress extension to archive any individual post and assign it a (CrossRef) DOI. The archived version also includes metadata that points back to the original post.

This archival is not yet perfect. In its current state it does not (yet) capture:

  1. Comments on any post (which could be considered a form of open peer review)
  2. Enhancements such as the links to Jmol/JSmol that I associate with some of the posts
  3. The ORCID identifier, which adds a layer of additional provenance.
  4. We of course do not yet know what life expectancy the archiving organisations themselves will achieve (could it be 100 years, for example?).

It does capture the citation list when there is one, and since I include citations to my data sources (for the computations performed in support of many of my posts) the archive is I think accordingly rendered more valuable.

What brought this post on? Well, the Journal of Chemical Education has put out a call for articles on chemical information for a special issue. I decided to contribute by aggregating some of my teaching related posts; indeed, individually they could perhaps only have appeared here, as opposed to a more traditional means of dissemination such as the JCE journal itself. And I wanted to cite them using the DOI rather than simply the URL of the post. It’s an experiment, and one which I do not yet know if anyone else will try. That in some ways is the point of a blog; it is an interesting experimental vehicle!

One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards.

Monday, September 8th, 2014

In the beginning (taken here as prior to ~1980) libraries held five-year printed consolidated indices of molecules, organised by formula or name (Chemical Abstracts). These could occupy about 2m of shelf space for each five years, alongside an equivalent set of printed volumes from the Beilstein collection. Those of us who needed to track down information about molecules prior to ~1980 spent many an afternoon (or indeed a whole day) in the libraries thumbing through these weighty volumes. Fast forward to the present, when (closed) commercial databases such as SciFinder, Reaxys and CCDC offer information online for around 100 million molecules (CAS indicates it has 89,506,154 today for example). These have been joined by many open databases (e.g. PubChem). All these sources of molecular information have their own way of accessing individual entries, and the wonderful program Jmol (nowadays JSmol) has several of these custom interfaces programmed in. Here I describe some work we have recently done[1] on how one might generalise access to an individual molecule held in what is now called a digital data repository.

Such repositories are gradually becoming more common. Unlike most (all?) of the bespoke molecular repositories noted above, metadata (XML) resourcemap standards have been developed[2] for data repositories to enable rich and open searches and to help in the discoverability of individual entries (e.g. OAI-ORE). Each dataset is characterised by a DOI (digital object identifier), just like individual articles found in a conventional journal. However, there is an issue in quoting just a conventional DOI to describe a dataset. The DOI points to what is called the article landing page in the journal, a page which by and large is meant to be navigated by a human. To get a flavour for how this works (or more accurately does not work) for data, visit this DOI[3] for an entry in the CCDC crystal database noted above (and about which I have previously blogged). In essence, a human is needed to complete the requested information in order to proceed to retrieving the data. Data, I contend here, should not need a landing page. It can benefit from being passed straight on to e.g. a visualising program such as JSmol. So a mechanism is needed to encapsulate any bespoke (and potentially changeable) access path to the data by expressing it instead in standard metadata form.

In our first solution to this issue, and the one illustrated here, we used a standard known as 10320/loc[2]. A datafile need only be specified by its DOI (or more generically, its handle) to be recovered from the data repository; no landing page need be involved (and no human need ponder what next to do with the data).

  1. First, let me reference a molecule (as it happens the one described in the preceding post), using the normal invocation[4]. This will take you to a conventional landing page.
  2. The next example is the same dataset, but this time with the landing page replaced by a Javascript/JSmol wrapping. This is achieved using a utility which is itself packaged up and placed on a repository (shortdoi: vjj)[5], and which is embedded here for you to try out. If you want the technical detail, read about it here.[1]
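To give a flavour of what bypassing the landing page involves, here is a rough sketch of selecting a file location from a 10320/loc record. It assumes the record is an XML locations element whose location children each carry an href plus optional extra attributes. The sample record, its URLs and the filename attribute are hypothetical and are not copied from our repository, and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Hand-written record in the assumed 10320/loc shape; hrefs and extra attributes are hypothetical.
SAMPLE_LOC = """<locations>
  <location id="0" href="https://repository.example.org/item/1" weight="1"/>
  <location id="1" href="https://repository.example.org/item/1/output.log" filename="output.log" weight="0"/>
</locations>"""


def pick_location(loc_xml: str, **wanted: str):
    """Return the href of the first location matching all wanted attributes, else None."""
    for loc in ET.fromstring(loc_xml).iter("location"):
        if all(loc.get(key) == value for key, value in wanted.items()):
            return loc.get("href")
    return None


if __name__ == "__main__":
    # No criteria: the first location, here standing in for the landing page.
    print(pick_location(SAMPLE_LOC))
    # A named file: the address a viewer such as JSmol could load directly.
    print(pick_location(SAMPLE_LOC, filename="output.log"))
```

The point of the design is that the visualising program only needs the DOI and a selection criterion; where the file actually lives is resolved from the metadata and can change without breaking the link.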

There is more to come. But you will have to wait for part 2!

References

  1. M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014. https://doi.org/10.1021/ci500302p
  2. "DOI Name 10320/loc Values". http://doi.org/10320/loc
  3. Jana, Anukul., Omlor, Isabell., Huch, Volker., Rzepa, Henry S.., and Scheschkewitz, David., "CCDC 967887: Experimental Crystal Structure Determination", 2014. https://doi.org/10.5517/cc11h55w
  4. H.S. Rzepa, N. Mason, and M J Harvey., "Retrieval and display of Gaussian log files from a digital repository", 2014. https://doi.org/10.6084/m9.figshare.1164282

The blog post as a scientific article: citation management

Monday, February 27th, 2012

Sometimes, as a break from describing chemistry, I take to describing the (chemical/scientific) creations behind the (WordPress) blog system. It is fascinating that there seem to be increasing signs of convergence between the blog post and the journal article. Perhaps prompted by transclusion of tools such as Jmol and LaTeX into Wikis and blogs, I list the following interesting developments in both genres.

  1. Improved equation display for Chemistry Central articles using MathJax. This is a way of rendering equations in the pages of both a blog and a journal article. This blog is now so empowered, although in fact I employ few equations on these pages.
  2. Citation management and meta-data gathering. This blog plugin takes the form of a numbered citation[1] as here, and which converts the specified DOI to a listing at the bottom of the post in the manner of a conventional scientific article (conventional document citation managers such as EndNote do this as well). It is actually much more than that, since the plugin automatically uses the CrossRef API to retrieve metadata for the quoted Digital Object Identifier (DOI), thus enhancing the metadata associated with the post and its discoverability. Dublin-Core is already present in the post as well as FOAF output, and I occasionally trawl using the Calais archive tagger (although this is not very good at finding chemistry tags).
  3. I installed Chemicalize a year or so ago. This scans the blog text for chemical terms, and adds a hover/popup image of structures it identifies (it is also responsible for the occasional doubled Gravatar image you may see here! Apologies!).
  4. I noted the addition of ChemDoodle to this blog previously. There may be newcomers which I need to track down to this type of non-Java based molecular rendering.
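The citation plugin's behaviour in item 2 can be imitated in a few lines. This is a minimal sketch and not the plugin's actual code: it builds the CrossRef REST API address for a DOI and formats a reference from the message part of the JSON reply. The sample message is hand-written and trimmed, with values taken from the reference at the end of this post, and the function names are my own.

```python
from urllib.parse import quote


def crossref_url(doi: str) -> str:
    """Address of the CrossRef REST API record for a DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


# Hand-written, trimmed stand-in for the "message" object of a CrossRef reply.
SAMPLE_MESSAGE = {
    "author": [{"given": "H.S.", "family": "Rzepa"}],
    "title": ["The past, present and future of Scientific discourse"],
    "container-title": ["Journal of Cheminformatics"],
    "volume": "3",
    "issued": {"date-parts": [[2011]]},
    "DOI": "10.1186/1758-2946-3-46",
}


def format_citation(msg: dict) -> str:
    """Format a CrossRef-style message in the style of this blog's reference lists."""
    authors = ", ".join(f"{a['given']} {a['family']}" for a in msg["author"])
    title = msg["title"][0]
    journal = msg["container-title"][0]
    volume = msg["volume"]
    year = msg["issued"]["date-parts"][0][0]
    doi = msg["DOI"]
    return f'{authors}, "{title}", {journal}, vol. {volume}, {year}. https://doi.org/{doi}'


if __name__ == "__main__":
    print(crossref_url(SAMPLE_MESSAGE["DOI"]))
    print(format_citation(SAMPLE_MESSAGE))
```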

So you can see that building a chemical/science-savvy blog can be great fun! It is also significant that science/chemistry publishers are starting to do this. I bring only one example to your attention, although this introduces a host of other issues that perhaps I should leave for another post.

References

  1. H.S. Rzepa, "The past, present and future of Scientific discourse", Journal of Cheminformatics, vol. 3, 2011. https://doi.org/10.1186/1758-2946-3-46