Posts Tagged ‘Scientific Journal’

Re-inventing the anatomy of a research article.

Saturday, December 29th, 2018

The traditional structure of the research article has been honed and perfected for over 350 years by its custodians, the publishers of scientific journals. Nowadays, for some journals at least, it might be viewed as much as a profit centre as the perfected mechanism for scientific communication. Here I take a look at the components of such articles to try to envisage its future, with the focus on molecules and chemistry.

The formula which is mostly adopted by authors when they sit down to describe their chemical discoveries is more or less as follows:

  1. An introduction, setting the scene for the unfolding narrative
  2. Results. This is where much of the data from which the narrative is derived is introduced. Such data can be presented in the form of:
    • Tables
    • Figures and schemes
    • Numerical and logical data embedded in narrative text
  3. Discussion, where the models constructed from the data are illustrated and new inferences presented. Very often categories 2 and 3 are conflated into one single narrative.
  4. Conclusions, where everything is brought together to describe the essential aspects of the new science.
  5. Bibliography, where previous articles pertinent to the narrative are listed.

In the last decade or so, the management of research data has developed as a field of its own, with three phases:

  1. Setting out a data management plan at the start of the project, often a set of aspirations together with putative actions,
  2. the day-to-day management of the data as it emerges in the form of an electronic laboratory notebook (ELN),
  3. the publication of selected data from the ELN into a repository, together with the registration of metadata describing the properties of the data.

In the latter category, item 8 can be said to be a game-changer, a true disruptive influence on the entire process. The key aspect is that it constitutes independent publication of data to sit alongside the object constructed from 1-5. More disruption emerges from the open citations project, whereby category 5 above can be released by publishers to adopt its own separate existence. So now we see that of the five essential anatomic components of a research article, two are already starting to achieve their own independence. Clearly the re-invention of the anatomy of the research article is well under way already.

Next I take a look at what sorts of object might be found in category 8, drawing very much on our own experience of implementing 7 and 8 over the last twelve years or so. I start by observing that in 2 above, figures are perhaps the object most in need of disruptive re-invention. In the 1980s, authors were much taken by the introduction of colour as a means of conveying information within a figure more clearly; although the significant costs then had to be borne directly by these authors (and with a few journals this persists to this day). By the early 1990s, the introduction of the Web[1] offered new opportunities not only of colour but of an extra dimension (or at least the illusion of one) by means of introducing interactivity for three-dimensional models. Some examples resulting from combining figures from category 2 with 8 above are listed in the table below.

Examples of re-invented data objects from category 2
Example Object title Object DOI Article DOI
1 Figure 9. Catalytic cycle involving one amine …etc. 10.14469/hpc/1854 10.1039/C7SC03595K
2 FAIR Data Figure. Mechanistic insights into boron-catalysed direct amidation reactions 10.14469/hpc/4919 10.1039/C7SC03595K
3 FAIR Data table. Computed relative reaction free energies (kcal/mol-1) of Obtusallene derived oxonium and chloronium cations 10.14469/hpc/1248 10.1021/acs.joc.6b02008
4 (raw) NMR data for Epimeric Face-Selective Oxidations … 10.14469/hpc/1267 10.1021/acs.joc.6b02008
5 Bibliography 10.14469/hpc/1116 10.1021/acs.joc.6b02008

Example 1 illustrates how a figure from category 2 above can be augmented with active hyperlinks specifying the DOI of the data in category 8 from which the figure is derived, thus creating a direct and contextual connection between the research article and the research data it is based upon. These links are embedded only in the Acrobat (PDF) version of the article as part of the production process undertaken by the journal publisher. Download Figure 9 from the link here and try it for yourself or try the entire article from the journal, where more figures are so enhanced.

Example 2 takes this one stage further. The hyperlinks in the published figure in example 1 were embedded in software capable of resolving them, namely a PDF viewer. But that is all that this software allows. By relocating the hyperlink into a Web browser instead, one can add further functionality in the form of Javascripts perhaps better described as workflows (supported by browsers but not supported by Acrobat). There are three such workflows in example 2.

  • The first uses an image map to associate a region of the figure data object defined by a DOI.
  • The second interrogates the metadata specifically associated with the DOI (the same DOIs that are seen in the figure itself) to see if there is any so-called ORE metadata available (ORE= Object Re-use and Exchange). If there is, it uses this information to retrieve the data itself and pass it through to
  • the third workflow represented by a set of JavaScripts known as JSmol. These interpret the data received and construct an interactive visual 3D molecular model representing the retrieved data.

All this additional workflowed activity is implemented in a data repository. It is not impossible that it could also be implemented at the journal publisher end of things, but it is an action that would have to be supported by multiple publishers. Arguably this sort of enhancement is far better suited and more easily implemented by a specialised data publisher, i.e. a data repository.

Example 3 does the same thing for a table.

Example 4 enhances in a different manner. Conventionally NMR data is added to the supporting information file associated with a journal article, but such data is already heavily processed and interpreted. The raw instrumental data is never submitted to the journal and is pretty much always possibly only available by direct request from the original researchers (at least if the request is made whilst the original researchers are still contactable!). The data repository provides a new mechanism for making such raw instrumental (and indeed computational) data an integral part of the scientific process.

Example 5 shows how a bibliography can be linked to a secondary bibliography (citations 35 and 36 in this example in the narrative article) and perhaps in the future to Open Citations semantic searches for further cross references.

So by deconstructing the components of the standard scientific article, re-assembling some of them in a better-suited environment and then linking the two sets of components to each other, one can start to re-invent the genre and hopefully add more tools for researchers to use to benefit their basic research processes. The scope for innovation seems considerable. The issue of course is (a) whether publishers see this as a viable business model or whether they instead wish to protect their current model of the research article and whether (b) authors wish to undertake the learning curve and additional effort to go in this direction. As I have noted before, the current model is deficient in various ways; I do not think it can continue without significant reinvention for much longer. And I have to ask that if reinvention does emerge, will science be the prime beneficiary?

References

  1. H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, pp. 1907, 1994. https://doi.org/10.1039/c39940001907

Ten years on: Jmol and WordPress.

Wednesday, May 16th, 2018

Ten years are a long time when it comes to (recent) technologies. The first post on this blog was on the topic of how to present chemistry with three intact dimensions. I had in mind molecular models, molecular isosurfaces and molecular vibrations (arguably a further dimension). Here I reflect on how ten years of progress in technology has required changes and the challenge of how any necessary changes might be kept “under the hood” of this blog.

That first post described how the Java-based applet Jmol could be used to present 3D models and animations. Gradually over this decade, use of the Java technology has become more challenging, largely in an effort to make Web-page security higher. Java was implemented into web browsers via something called Netscape Plugin Application Programming Interface  or NPAPI, dating from around 1995. NPAPI has now been withdrawn from pretty much all modern browsers. Modern replacements are based on JavaScript, and the standard tool for presenting molecular models, Jmol has been totally refactored into JSmol. Now the challenge becomes how to replace Jmol by JSmol, whilst retaining the original Jmol Java-based syntax (as described in the original post). Modern JSmol uses its own improved syntax, but fortunately one can use a syntax converter script Jmol2.js which interprets the old syntax for you. Well, almost all syntax, but not in fact the variation I had used throughout this blog, which took the form:

<img onclick=”jmolApplet([450,450],’load a-data-file;spin 3;’);” src=”static-image-file” width=”450″ /> Click for 3D structure

This design was originally intended to allow browsers which did not have the Java plugin installed to default to a static image, but that clicking on the image would allow browsers that did support Java to replace (in a new window) the static image with a 3D model generated from the contents of a-data-file. The Jmol2.js converter script had not been coded to detect such invocations. Fortunately Angel came to my rescue and wrote a 39 line Javascript file that does just that (my Javascript coding skills do not extend that far!). Thanks Angel!!

In fact I did have to make one unavoidable change, to;

<img onclick=”jmolApplet([450,450],’load a-data-file;spin 3;’,’c1′);” src=”image-file” width=”450″ /> Click for 3D structure

to correct an error present in the original. It manifests when one has more than one such model present in the same document, and this necessitates that each instance has a unique name/identifier (e.g. c1). So now, in the WordPress header for the theme used here (in fact the default theme), the following script requests are added to the top of each page, the third of which is the new script.

<script type=”text/javascript” src=”JSmol.min.js”></script>
<script type=”text/javascript” src=”js/Jmol2.js”></script>
<script type=”text/javascript” src=”JmolAppletNew.js”></script>

The result is e.g.

Click for 3D

Click for 3D structure of GAVFIS

Click for 3D

Click for 3D interaction

This solution unfortunately is also likely to be unstable over the longer term. As standards (and security) evolve, so invocations such as onclick= have become considered “bad practice” (and may even become unsupported). Even more complex procedures will have to be devised to keep up with the changes in web browser behaviour and so I may have to again rescue the 3D models in this blog at some stage! Once upon a time, the expected usable lifetime of e.g. a Scientific Journal (print!) was a very long period (>300 years). Since ~1998 when most journals went online, that lifetime has considerably shortened (or at least requires periodic, very expensive, maintenance). For more ambitious types of content such as the 3D models discussed here, it might be judged to be <10 years, perhaps much less before the maintenance becomes again necessary. Sigh!


At the time of writing, WaterFox is one of the few browsers to still support it. An early issue with using Javascript instead of Java was performance. For some tasks, the former was often 10-50 times slower. Improvements in both hardware and software have now largely eliminated this issue. Thus using Jquery.