Posts Tagged ‘web technologies’

Curating a nine year old journal FAIR data table.

Monday, May 29th, 2017

As the Internet and its Web-components age, so early pages start to decay as technology moves on. A few posts ago, I talked about the maintenance of a relatively simple page first hosted some 21 years ago. In my notes on the curation, I wrote the phrase “Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.” Well, that time has now come, for a rather more crucial page associated with a journal article published more recently in 2009.[1]

The story started a few days ago when I was contacted by the learned society publisher of that article, noting they were “just checking our updated HTML view and wanted to test some of our old exceptions“. I should perhaps explain what this refers to. The standard journal production procedures involve receiving a Word document from authors and turning that into XML markup for the internal production processes. For some years now, I have found such passive (i.e. printable only) Word content unsatisfactory for expressing what is now called FAIR (Findable, accessible, inter-operable and re-usable) data. Instead, I would create another XML expression (using HTML), which I described as Interactive Tables and then ask the publisher to host it and add that as a further link to the final published article. I have found that learned society publishers have not been unwilling to create an “exception” to their standard production workflows (the purely commercial publishers rather less so!). That exceptional link is http://www.rsc.org/suppdata/cp/b8/b810301a/Table/Table1.html but it has now “fallen foul of the java deprecation“. 

Back in 2008 when the table was first created, I used the Java-based Jmol program to add the interactive component. That page, when loaded, now responds with the message:

This I must emphasise is nothing to do with the publisher, it is the Jmol certificate that has been revoked. That of itself requires explanation. Java is a powerful language which needs to be “sandboxed” to ensure system safety. But commands can be created which can access local file stores and write files out there (including potentially dangerous ones). So it started to become the practise to sign the Java code with the developer certificate to ensure provenance for the code. These certificates are time-expired and around 2015 the time came to renew it. Normally, when such a certificate is renewed, the old one is allowed to continue operation. On this occasion the agency renewing the certificate did not do this but revoked the old one instead (Certificate has been revoked, reason: CESSATION_OF_OPERATION, revocation date: Thu Oct 15 23:11:18 BST 2015). So all instances of Jmol with the old certificate now give the above error message. 

The solution in this case is easy; the old Jmol code (as JmolAppletSigned.jar) is simply replaced with the new version for which the certificate is again valid. But simply doing that alone would merely have postponed the problem; Java is now indeed deprecated for many publishers, which is a warning that it will be prohibited at some stage in the future.‡ So time to bite the bullet and remove the dependency on Java-Jmol, replacing it with JSmol which uses only JavaScript.

Changing published content is in general not allowed; one instead must publish a corrigendum. But in this instance, it is not the content that needs changing but the style of its presentation (following the principle of the Web of a clear-cut separation of style and content). So I set out to update the style of presentation, but I was keen to document the procedures used. I did this by commenting out non-functional parts of the style components of my original HTML document (as <!– comment –>) and adding new ones. I describe the changes I made below.

  1. The old HTML contained the following initialisation code: jmolInitialize(".","JmolAppletSigned.jar");jmolSetLogLevel('0'); which was commented out.
  2. New scripts to initialize instead JSmol were added, such as:
    <script src="JSmol.min.js" type="text/javascript"> </script>
  3. I added further scripts to set up controls to add interactivity.
  4. The now deprecated buttons had been invoked using a Jmol instance:  jmolButton('load "7-c2-h-020.jvxl";isosurface "" opaque; zoom 120;',"rho(r) H")
  5. which was replaced by the JSmol equivalent, but this time to produce a hyperlink rather than a button (to allow the greek ρ to appear, which it could not on a button): <a href="javascript:show_jmol_window();Jmol.script(jmolApplet0,'load 7-c2-020.jvxl;isosurface &quot;&quot; translucent;spin 3;')">ρ(r)</a>,
  6. Some more changes were made to another component of the table, the links to the data repository. Originally, these quoted a form of persistent identifier known as a Handle; 10042/to-800. Since the data was deposited in 2008, the data repository has licensed further functionality to add DataCite DOIs to each entry. For this entry,  10.14469/ch/775. Why? Well, the original Handle registration had very little (chemically) useful registered metadata, whereas DataCite allows far richer content. So an extra column was added to the table to indicate these alternate identifiers for the data.
  7. We are now at the stage of preparing to replace the Java applet at the publishers site with the Javascript version, along with the amended HTML file. The above link, as I write this post, still invokes the old Java, but hopefully it will shortly change to function again as a fully interactive table.
  8. I should say that the whole process, including finding a solution and implementing it took 3-4 hours work, of which the major part was the analysis rather than its implementation.

It might be interesting to speculate how long the curated table will last before it too needs further curation. There are some specifics in the files which might be a cause for worry, namely the so-called JVXL isosurfaces which are displayed. These are currently only supported by Jmol/JSmol. They were originally deployed because iso-surfaces tend to be quite large datafiles and JVXL used a remarkably efficient compression algorithm (“marching cubes”) which reduces their size ten-fold or more. Should JSmol itself become non-operational at some time in the (hopefully) far future (which we take to be ~10 years!) then a replacement for the display of JVXL will need to be found. But the chances are that the table itself will decay “gracefully”, with the HTML components likely to outlive most of the other features. The data repository quoted above has itself now been available for ~12 years and it too is expected to survive in some form for perhaps another 10. Beyond that period, no-one really knows what will still remain. 

You may well ask why the traditional journal model of using paper to print articles and which has survived some 350 years now, is being replaced by one which struggles to survive 10 years without expensive curation. Obviously, a 3D interactive display is not possible on paper. But one also hears that publishers are increasingly dropping printed versions entirely. One presumes that the XML content will be assiduously preserved, but re-working (transforming, as in XSLT) any particular flavour of XML into another publishers systems is also likely to be expensive. Perhaps in the future the preservation of 100% of all currently published journals will indeed become too expensive and we might see some of the less important ones vanishing for ever?


Nowadays it is necessary to configure your system or Web browser to allow even signed valid Java applets to operate. Thus in the Safari browser (which still allows Java to operate, other popular browsers such as Chrome and Firefox have recently removed this ability), one has to go to preferences/security/plugin-settings/Java, enter the URL of the site hosting the applet and set it to either “ask” (when a prompt will always appear asking if you want to accept the applet) or “on” when it will always do so. How much longer this option will remain in this browser is uncertain.

In the area of chemistry, an early pioneer was the Internet Journal of Chemistry, where the presentation of the content took full advantage of Web-technologies and was on-line only. It no longer operates and the articles it hosted are gone.

References

  1. H.S. Rzepa, "Wormholes in chemical space connecting torus knot and torus link π-electron density topologies", Phys. Chem. Chem. Phys., vol. 11, pp. 1340-1345, 2009. https://doi.org/10.1039/b810301a

Blasts from the past. A personal Web presence: 1993-1996.

Saturday, November 1st, 2014

Egon Willighagen recently gave a presentation at the RSC entitled “The Web – what is the issue” where he laments how little uptake of web technologies as a “channel for communication of scientific knowledge and data” there is in chemistry after twenty years or more. It caused me to ponder what we were doing with the web twenty years ago. Our HTTP server started in August 1993, and to my knowledge very little content there has been deleted (it’s mostly now just hidden). So here are some ancient pages which whilst certainly not examples of how it should be done nowadays, give an interesting historical perspective. In truth, there is not much stuff that is older out there!

  1. This page was written in May 1994 as a journal article, although it did have to be then converted into a Word document to actually be submitted.[1] Because it introduced hyperlinks to a chemical audience, we wanted to illustrate these in the article itself! Hence permission was obtained from the RSC for an HTML version to be “self-archived” on our own servers where the hyperlinks were supposed to work (an early example of Open Access publishing!). I say supposed because quite a few of them have now “decayed”. We were aware of course that this might happen, but back in 1994, no-one knew how quickly this would happen. What is interesting is that the HTML itself (written by hand then) has survived pretty well! I will leave you to decide how much the message itself has decayed.
  2. This HTML actually predates the above; it was written around November 1993 and represented the very first lecture notes I converted into this form (on the topic of NMR spectroscopy). A noteworthy aspect is the scarce use of colour images. At the start of 1994, the bandwidth available on our campus was pretty limited (the switches were 10 Mbps only) and a request went out to reduce the bit-depth of any colour images to 4-bits to help conserve that bandwidth! I rather doubt anyone took much notice however, and the policy was forgotten just a few months later.
  3. In 1996, I had two visitors to the group, Guillaume Cottenceau, a french undergraduate student, and Darek Bogdal, a Polish researcher who wanted to learn some HTML. Together they produced this, which was an interactive tutorial to accompany the NMR lecture notes previously mentioned. These pages introduce the Java applet (yes, it was very new in 1996), which Guillaume had written and which Darek then made use of. And hey, what do you know, the applet still works (although you might have to coerce your browser into accepting an unsigned applet).
  4. Here is a programming course that I had been running with Bryan Levitt for a few years, now recast into HTML web pages some time in 1994-5. This particular project I still hold dear, since it expanded upon the NMR lectures by getting the students to synthesize a FID (free induction decay) using the program they wrote, and then perform a Fourier Transform on it. I even encouraged students to present their results in HTML (I cannot now remember how many did). This link is to the computing facilities we offered students in 1994 for this project, ah those were the times! In 1996, the programming course was replaced by one on chemical information technologies, and here students were most certainly expected to write HTML. Some of the best examples are still available. And to illustrate how things happen in cycles, that course itself is now gone to be replaced by, yes, a programming course (but using Python, and not the original Fortran).
  5. In tracking down the materials for the programming course described above, I re-discovered something far older. It is linked here and is (some of) the Fortran source code I wrote as a PhD student in 1974 1972. So I will indulge in a short digression. My Ph.D. involved measuring rate constants, and the accepted method for analysing the raw kinetic data was using graph paper. For first order rate behaviour, this required one to measure a value at time=∞, which is supposed to be measured after ten half-lives. I was too impatient to wait that long, and worked out that a non-linear least squares analysis did not require the time=∞ value; indeed this value could be predicted accurately from the earlier measurements. So in 1974, I wrote this code to do this; no graph paper for me! Also for good measure is a least squares analysis of the Eyring equation. And you get proper standard deviations for your errors. In retrospect I should have commercialised this work, but in 1974, almost no-one paid money for software! What a change since then. I must try recompiling this code to see if it still works! And for good measure, here is a Huckel MO program I wrote in 1984 or earlier (I did compile this recently and found it works) and here is a little program for visualising atomic orbitals.
  6. In January 1994, I was asked to create a web page for the WATOC organisation. This certainly predated the web sites for e.g. the RSC, the ACS, indeed famous sites such as the BBC and Tesco (a large supermarket chain) which only started up in mid 1994. The WATOC site itself moved a few years ago.
  7. This is one of those wonderfully naive things I started in 1994, and which did not last long (in my hands). Nowadays, the concept lives on as MOOCs. Note again the almost complete expiry of the hyperlinks.
  8. This is a project we also started in 1994, Virtual reality[2],[3]. The idea was that if HTML was text-markup, VRML was going to be 3D markup. VRML itself never quite caught on, but it is having a new life as a 3D printing language!
  9. And by 1995, I felt confident enough in my ability to (edit) HTML, that we started a virtual conference in organic chemistry (we did four of them in the end). I remember the first one involved contributors sending me a Word version of their poster, and I did all the work in converting it into HTML. Such virtual conferences still run, but in truth most participants still prefer to travel long distances to go drink a beer with their chums, rather than hack HTML.

I am going to stop now, since this is far too much wallowing in the past. But at least all this stuff is not (yet) lost to posterity.

References

  1. H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, pp. 1907, 1994. https://doi.org/10.1039/c39940001907
  2. O. Casher, and H.S. Rzepa, "Chemical collaboratories using World-Wide Web servers and EyeChem-based viewers", Journal of Molecular Graphics, vol. 13, pp. 268-270, 1995. https://doi.org/10.1016/0263-7855(95)00053-4
  3. O. Casher, C. Leach, C.S. Page, and H.S. Rzepa, "Advanced VRML based chemistry applications: a 3D molecular hyperglossary", Journal of Molecular Structure: THEOCHEM, vol. 368, pp. 49-55, 1996. https://doi.org/10.1016/s0166-1280(96)90535-7