Posts Tagged ‘computing’

Ten years on: Jmol and WordPress.

Wednesday, May 16th, 2018

Ten years is a long time when it comes to (recent) technologies. The first post on this blog was on the topic of how to present chemistry with three intact dimensions. I had in mind molecular models, molecular isosurfaces and molecular vibrations (arguably a further dimension). Here I reflect on how ten years of progress in technology have required changes, and on the challenge of keeping any necessary changes “under the hood” of this blog.

That first post described how the Java-based applet Jmol could be used to present 3D models and animations. Over this decade, use of Java technology has gradually become more challenging, largely in an effort to improve web-page security. Java was embedded into web browsers via the Netscape Plugin Application Programming Interface, or NPAPI, dating from around 1995. NPAPI has now been withdrawn from pretty much all modern browsers. Modern replacements are based on JavaScript, and the standard tool for presenting molecular models, Jmol, has been totally refactored into JSmol. The challenge now becomes how to replace Jmol with JSmol whilst retaining the original Jmol Java-based syntax (as described in the original post). Modern JSmol uses its own improved syntax, but fortunately one can use a converter script, Jmol2.js, which interprets the old syntax for you. Well, almost all of the old syntax, but not in fact the variation I had used throughout this blog, which took the form:

<img onclick="jmolApplet([450,450],'load a-data-file;spin 3;');" src="static-image-file" width="450" /> Click for 3D structure

This design was originally intended to allow browsers without the Java plugin installed to default to a static image; in browsers that did support Java, clicking on the image would replace it (in a new window) with a 3D model generated from the contents of a-data-file. The Jmol2.js converter script had not been coded to detect such invocations. Fortunately Angel came to my rescue and wrote a 39-line JavaScript file that does just that (my JavaScript coding skills do not extend that far!). Thanks Angel!!

In fact I did have to make one unavoidable change, to:

<img onclick="jmolApplet([450,450],'load a-data-file;spin 3;','c1');" src="image-file" width="450" /> Click for 3D structure

to correct an error present in the original. It manifests when more than one such model is present in the same document, which necessitates that each instance has a unique name/identifier (e.g. c1). So now, in the WordPress header for the theme used here (in fact the default theme), the following script requests are added to the top of each page, the third of which is the new script:

<script type="text/javascript" src="JSmol.min.js"></script>
<script type="text/javascript" src="js/Jmol2.js"></script>
<script type="text/javascript" src="JmolAppletNew.js"></script>
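Angel's script essentially re-implements the legacy jmolApplet() global on top of the modern JSmol API. A minimal sketch of the idea follows; this is my own illustration, not Angel's actual 39-line file, and the element id scheme, applet name and j2sPath are assumptions:

```javascript
// Sketch of a shim that accepts the legacy jmolApplet() signature and
// forwards it to the modern JSmol API. Illustrative only: ids and paths
// are assumptions, not the values used on this blog.

// Translate the legacy arguments ([width, height] plus a Jmol script
// string) into the Info object that JSmol expects.
function legacyToJSmolInfo(size, script) {
  return {
    width: size[0],
    height: size[1],
    script: script,      // e.g. "load a-data-file;spin 3;"
    use: "HTML5",        // pure-JavaScript mode: no Java plugin needed
    j2sPath: "jsmol/j2s" // assumed location of the JSmol support files
  };
}

// Replacement for the old global jmolApplet(); the third argument is the
// unique identifier (e.g. "c1") that each instance now requires.
function jmolApplet(size, script, id) {
  var info = legacyToJSmolInfo(size, script);
  // Jmol.getAppletHtml (from JSmol.min.js) returns the markup for an
  // applet instance, which replaces the static image's container.
  document.getElementById(id).innerHTML =
    Jmol.getAppletHtml("jsmol_" + id, info);
}
```

With something of this shape in place, the onclick="jmolApplet(…)" attributes already embedded throughout the blog can continue to work unchanged.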

The result is, e.g.:

Click for 3D structure of GAVFIS

Click for 3D interaction

This solution unfortunately is also likely to be unstable over the longer term. As standards (and security) evolve, invocations such as onclick= have come to be considered “bad practice” (and may even become unsupported). Ever more complex procedures will have to be devised to keep up with changes in web-browser behaviour, and so I may have to rescue the 3D models in this blog once again at some stage! Once upon a time, the expected usable lifetime of e.g. a scientific journal (in print!) was a very long period (>300 years). Since ~1998, when most journals went online, that lifetime has shortened considerably (or at least requires periodic, very expensive, maintenance). For more ambitious types of content such as the 3D models discussed here, it might be judged to be <10 years, perhaps much less, before maintenance again becomes necessary. Sigh!


At the time of writing, Waterfox is one of the few browsers to still support NPAPI. An early issue with using JavaScript instead of Java was performance: for some tasks, the former was often 10-50 times slower. Improvements in both hardware and software have now largely eliminated this issue. JSmol itself makes use of the jQuery library.

FAIR data ⇌ Raw data.

Thursday, December 7th, 2017

FAIR data is increasingly accepted as a description of what research data should aspire to be: Findable, Accessible, Inter-operable and Re-usable, with Context added by rich metadata (and it should also be Open). But there are two sides to data: the raw data emerging from, say, an instrument or a software simulation, and the data to which some kind of model has been applied to produce a semi- or even fully processed/interpreted form. Here I illustrate a new example of how both kinds of data can be made to co-exist.

I will start with a recent publication[1] with the title Crystallographic Snapshot of an Arrested Intermediate in the Biomimetic Activation of CO2. The nature of this intermediate caught the eye of another research group, who responded with their own critique[2] along with the comment “However, since we have no access to the original crystallographic data …”. They might have been referring to the semi-processed data (containing the so-called hkl structure factors), but they may also have been alluding to the raw image data captured directly from the diffractometer cameras. That traditionally has not been available via the CSD (Cambridge Structural Database), but would be required for a complete re-analysis of the crystal structure. Now the first example of how both FAIR (processed) data and raw data can co-exist has appeared.

The latest version of the CSD database shows an entry resulting from the following publication[3] and the deposited data has its own DOI there (10.5517/ccdc.csd.cc1n9ppb). That entry in turn has a DOI pointer to the Raw data (10.14469/hpc/2300) held in a different location and the pointer is reciprocated (⇌) with the latter pointing back to the former. Both datasets point to the original article, thus completing a holy triangle.

There is more. The raw dataset (10.14469/hpc/2300) declares it is a member of a superset, called Crystal structure data for Synthesis and Reactions of Benzannulated Spiroaminals; Tetrahydrospirobiquinolines (10.14469/hpc/2297), where you can find information about six other related structures. That collection is in turn a member of a superset called Synthesis and Reactions of Benzannulated Spiroaminals; Tetrahydrospirobiquinolines (10.14469/hpc/2099), where DOIs to other types of data associated with this project can be found, such as computational data (10.14469/hpc/2098) and NMR data (10.14469/hpc/2294). Although a human can, with some determination, follow these associations up, down and across, the system is designed also to be followed by automated algorithms, which could traverse this web quickly and efficiently.
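To illustrate how such an automated traversal might work: each DataCite DOI record can be fetched from the public DataCite REST API (https://api.datacite.org/dois/&lt;doi&gt;), and the relatedIdentifiers field of its metadata carries the ⇌ pointers described above. A sketch of my own, purely illustrative:

```javascript
// Extract the DOI-typed relations (IsPartOf, HasPart, IsSupplementTo,
// etc.) from a DataCite JSON:API record, as returned by
// GET https://api.datacite.org/dois/<doi>. Illustrative sketch only.
function relatedDois(record) {
  const rels = record.data.attributes.relatedIdentifiers || [];
  return rels
    .filter(r => r.relatedIdentifierType === "DOI")
    .map(r => ({ relation: r.relationType, doi: r.relatedIdentifier }));
}

// A crawler would fetch a record, collect its related DOIs, and recurse:
// fetch("https://api.datacite.org/dois/10.14469/hpc/2300")
//   .then(resp => resp.json())
//   .then(rec => console.log(relatedDois(rec)));
```

An algorithm of this shape could walk up to the supersets and across to the computational and NMR collections without any human intervention.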

So you can now see that a crystal structure held in the CSD could be the starting point for a journey of FAIR data discovery, in a manner that has not hitherto been possible. How quickly the CSD will become populated by links to raw (and other) data remains to be seen. I have not yet discovered any mechanism for specifying a CSD query which stipulates that raw data must be available, but no doubt this will come.

To end, back to the Biomimetic Activation of CO2 referred to at the start. With no access to the original data, recourse was made to computational modelling.[2] Which is where I came in, since I wanted access to the original (computational) data. Sadly it did not appear to be available with the article,[2] in much the same manner as the original complaint. Perhaps, when FAIR data becomes fully accepted as part of how science is done nowadays, such complaints will become ever rarer!


In fact the original authors did respond[4] with an acknowledgement that their original conclusions were not correct.

Almost. The article [3] cites DOI: 10.14469/hpc/2099 (Ref 28), but it does not cite DOI: 10.5517/ccdc.csd.cc1n9ppb because the latter had not been minted yet at the time the final proofs were corrected, and there is no mechanism to add it at a later stage.

References

  1. S.L. Ackermann, D.J. Wolstenholme, C. Frazee, G. Deslongchamps, S.H.M. Riley, A. Decken, and G.S. McGrady, "Crystallographic Snapshot of an Arrested Intermediate in the Biomimetic Activation of CO<sub>2</sub>", Angewandte Chemie International Edition, vol. 54, pp. 164-168, 2014. https://doi.org/10.1002/anie.201407165
  2. J. Hurmalainen, M.A. Land, K.N. Robertson, C.J. Roberts, I.S. Morgan, H.M. Tuononen, and J.A.C. Clyburne, "Comment on “Crystallographic Snapshot of an Arrested Intermediate in the Biomimetic Activation of CO<sub>2</sub>”", Angewandte Chemie International Edition, vol. 54, pp. 7484-7487, 2015. https://doi.org/10.1002/anie.201411654
  3. J. Almond-Thynne, A.J.P. White, A. Polyzos, H.S. Rzepa, P.J. Parsons, and A.G.M. Barrett, "Synthesis and Reactions of Benzannulated Spiroaminals: Tetrahydrospirobiquinolines", ACS Omega, vol. 2, pp. 3241-3249, 2017. https://doi.org/10.1021/acsomega.7b00482
  4. S.L. Ackermann, D.J. Wolstenholme, C. Frazee, G. Deslongchamps, S.H.M. Riley, A. Decken, and G.S. McGrady, "Corrigendum: Crystallographic Snapshot of an Arrested Intermediate in the Biomimetic Activation of CO<sub>2</sub>", Angewandte Chemie International Edition, vol. 54, pp. 7470-7470, 2015. https://doi.org/10.1002/anie.201504197

PIDapalooza 2018: the open festival for persistent identifiers.

Tuesday, November 14th, 2017

PIDapalooza is a new forum concerned with discussing all things persistent identifier, hence PID. You might wonder what possible interest a chemist might have in such an apparently arcane subject, but think of it in terms of how to find the proverbial needle in a haystack at a time when all needles might look very similar. Even needles need descriptions; they are not all alike, and PIDs are a way of providing high-quality information (metadata) about a digital object.

The topics for discussion along with descriptions are now available at https://pidapalooza18.sched.com/list/descriptions/ and yes, before you ask, the event has its own PID (DOI: 10.5438/11.0002). Check out the speakers at https://pidapalooza18.sched.com/directory/speakers. I will be telling some stories from chemistry, and who knows, even some of the posts on this blog might feature. And if you do not brush up on the topic, no doubt your librarian, your funding body and your publisher will be telling you about it soon!

Twenty one years of chemistry-related Java apps: RIP Java?

Saturday, June 10th, 2017

In an earlier post, I lamented the modern difficulties in running old instances of Jmol, an example of an application program written in the Java programming language. When I wrote that, I had quite forgotten a treasure trove of links to old Java that I had collected in 1996-7 and then abandoned. Here I browse through a few of the things I found.

The collection is at DOI: 10.14469/hpc/2657. Here I track down how some of them are doing 20+ years on.

  1. Formula-To-Mass-To-Formula (f2m2f), which was started in August 1996 and was written by Guillaume Cottenceau, a French undergraduate student visiting London who wanted to learn to program. I suggested he try Java, and as I recollect sent him out to the business park west of London where Sun Microsystems had an office to learn how to do so (they had only released the development kit a few months earlier!). The applet he wrote still works; being unsigned, you have to jump through a few hoops to allow it to run (but be quick, not many browsers will still let you do so!). The applet also has a benchmark feature. Running the heavy bench now takes ~0.4s on a laptop. I cannot be sure, but I seem to remember that this one took ~20 seconds back in 1997.
  2. Guillaume then returned to Paris to finish the above off, but also managed to find the time a year later to produce Jspec, a visualiser for NMR and MS. Darek Bogdal was visiting from Poland in August 1997(8?) and he incorporated these tools into a general spectral display and problem solving resource, which also still mostly works (with no curation!). The bit that does not work depended on the Chime plugin, now long gone and of course replaced in large measure by Jmol and now JSmol. 
  3. Here is an equation setter. The original site has long gone, but I had copied the classes over and it also (mostly) works!
  4. This one dates from 1997, by Wyn Locke and Alan Tongue, and uses JavaScript plus the spectral viewer to communicate with Chime. All done much better by many others since, of course.

That said, many of the other links at DOI: 10.14469/hpc/2657 no longer work. In truth I am slightly surprised a few still do! 

Quite possibly these screen shots may be the only visual images that can be created in the very near future, as all but very specialised web browsers drop “plug-in” (aka Java) support. So perhaps it will be RIP Java, at least for the in-browser frame mode (but certainly not for the stand-alone application mode).

Curating a nine year old journal FAIR data table.

Monday, May 29th, 2017

As the Internet and its Web-components age, so early pages start to decay as technology moves on. A few posts ago, I talked about the maintenance of a relatively simple page first hosted some 21 years ago. In my notes on the curation, I wrote the phrase “Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.” Well, that time has now come, for a rather more crucial page associated with a journal article published more recently in 2009.[1]

The story started a few days ago when I was contacted by the learned society publisher of that article, noting they were “just checking our updated HTML view and wanted to test some of our old exceptions”. I should perhaps explain what this refers to. The standard journal production procedures involve receiving a Word document from authors and turning it into XML markup for the internal production processes. For some years now, I have found such passive (i.e. printable-only) Word content unsatisfactory for expressing what is now called FAIR (Findable, Accessible, Inter-operable and Re-usable) data. Instead, I would create another XML expression (using HTML), which I described as an Interactive Table, and then ask the publisher to host it and add it as a further link to the final published article. I have found that learned society publishers have not been unwilling to create an “exception” to their standard production workflows (the purely commercial publishers rather less so!). That exceptional link is http://www.rsc.org/suppdata/cp/b8/b810301a/Table/Table1.html but it has now “fallen foul of the Java deprecation”.

Back in 2008 when the table was first created, I used the Java-based Jmol program to add the interactive component. That page, when loaded, now responds with the message:

This I must emphasise is nothing to do with the publisher; it is the Jmol certificate that has been revoked. That of itself requires explanation. Java is a powerful language which needs to be “sandboxed” to ensure system safety, since commands can be created which access local file stores and write files out there (including potentially dangerous ones). So it became the practice to sign Java code with a developer certificate to ensure the provenance of the code. These certificates are time-limited, and around 2015 the time came to renew Jmol's. Normally, when such a certificate is renewed, the old one is allowed to continue in operation. On this occasion the agency renewing the certificate did not do this, but revoked the old one instead (Certificate has been revoked, reason: CESSATION_OF_OPERATION, revocation date: Thu Oct 15 23:11:18 BST 2015). So all instances of Jmol with the old certificate now give the above error message.

The solution in this case is easy; the old Jmol code (as JmolAppletSigned.jar) is simply replaced with the new version for which the certificate is again valid. But simply doing that alone would merely have postponed the problem; Java is now indeed deprecated for many publishers, which is a warning that it will be prohibited at some stage in the future.‡ So time to bite the bullet and remove the dependency on Java-Jmol, replacing it with JSmol which uses only JavaScript.

Changing published content is in general not allowed; one must instead publish a corrigendum. But in this instance, it is not the content that needs changing but the style of its presentation (following the Web principle of a clear-cut separation of style and content). So I set out to update the style of presentation, but I was keen to document the procedures used. I did this by commenting out non-functional parts of the style components of my original HTML document (as <!-- comment -->) and adding new ones. I describe the changes I made below.

  1. The old HTML contained the following initialisation code: jmolInitialize(".","JmolAppletSigned.jar");jmolSetLogLevel('0'); which was commented out.
  2. New scripts to initialize instead JSmol were added, such as:
    <script src="JSmol.min.js" type="text/javascript"> </script>
  3. I added further scripts to set up controls to add interactivity.
  4. The now deprecated buttons had been invoked using a Jmol instance:  jmolButton('load "7-c2-h-020.jvxl";isosurface "" opaque; zoom 120;',"rho(r) H")
  5. which was replaced by the JSmol equivalent, but this time to produce a hyperlink rather than a button (to allow the greek ρ to appear, which it could not on a button): <a href="javascript:show_jmol_window();Jmol.script(jmolApplet0,'load 7-c2-020.jvxl;isosurface &quot;&quot; translucent;spin 3;')">ρ(r)</a>,
  6. Some more changes were made to another component of the table, the links to the data repository. Originally, these quoted a form of persistent identifier known as a Handle: 10042/to-800. Since the data was deposited in 2008, the data repository has licensed further functionality to add DataCite DOIs to each entry; for this entry, 10.14469/ch/775. Why? Well, the original Handle registration had very little (chemically) useful registered metadata, whereas DataCite allows far richer content. So an extra column was added to the table to indicate these alternate identifiers for the data.
  7. We are now at the stage of preparing to replace the Java applet at the publishers site with the Javascript version, along with the amended HTML file. The above link, as I write this post, still invokes the old Java, but hopefully it will shortly change to function again as a fully interactive table.
  8. I should say that the whole process, including finding a solution and implementing it took 3-4 hours work, of which the major part was the analysis rather than its implementation.
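The mechanics of step 5 can be sketched as follows. This is a hypothetical reconstruction: the body of show_jmol_window and the container id are my assumptions, not the actual page code.

```javascript
// Hypothetical reconstruction of the helpers behind the ρ(r) hyperlinks.
// Assumes a hidden <div id="jmol_window"> holding the JSmol applet
// (jmolApplet0), created elsewhere with Jmol.getApplet.

// Reveal the applet container when a link is first clicked.
function show_jmol_window() {
  var div = document.getElementById("jmol_window");
  if (div) div.style.display = "block";
}

// Build the anchor markup used in the table: clicking shows the applet
// and sends it a load/isosurface script.
function isosurfaceLink(file, label) {
  var script = "load " + file + ";isosurface \"\" translucent;spin 3;";
  return "<a href=\"javascript:show_jmol_window();" +
         "Jmol.script(jmolApplet0,'" + script + "')\">" + label + "</a>";
}
```

Generating the anchors this way has the side benefit noted in step 5: the label is ordinary HTML text, so a Greek ρ displays correctly, which it could not on a button.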

It might be interesting to speculate how long the curated table will last before it too needs further curation. There are some specifics in the files which might be a cause for worry, namely the so-called JVXL isosurfaces which are displayed. These are currently only supported by Jmol/JSmol. They were originally deployed because isosurfaces tend to be quite large data files, and JVXL uses a remarkably efficient encoding (storing only the information needed to reconstruct the surface produced by the “marching cubes” algorithm), which reduces their size ten-fold or more. Should JSmol itself become non-operational at some time in the (hopefully) far future (which we take to be ~10 years!) then a replacement for the display of JVXL will need to be found. But the chances are that the table itself will decay “gracefully”, with the HTML components likely to outlive most of the other features. The data repository quoted above has itself now been available for ~12 years, and it too is expected to survive in some form for perhaps another 10. Beyond that period, no-one really knows what will remain.

You may well ask why the traditional journal model of printing articles on paper, which has survived some 350 years, is being replaced by one which struggles to survive 10 years without expensive curation. Obviously, a 3D interactive display is not possible on paper. But one also hears that publishers are increasingly dropping printed versions entirely. One presumes that the XML content will be assiduously preserved, but re-working (transforming, as in XSLT) any particular flavour of XML into another publisher's systems is also likely to be expensive. Perhaps in the future the preservation of 100% of all currently published journals will indeed become too expensive, and we might see some of the less important ones vanishing for ever?


Nowadays it is necessary to configure your system or web browser to allow even signed, valid Java applets to operate. Thus in the Safari browser (which still allows Java to operate; other popular browsers such as Chrome and Firefox have recently removed this ability), one has to go to preferences/security/plugin-settings/Java, enter the URL of the site hosting the applet and set it to either “ask” (a prompt will then always appear asking if you want to accept the applet) or “on” (it will then always do so). How much longer this option will remain in this browser is uncertain.

In the area of chemistry, an early pioneer was the Internet Journal of Chemistry, where the presentation of the content took full advantage of Web-technologies and was on-line only. It no longer operates and the articles it hosted are gone.

References

  1. H.S. Rzepa, "Wormholes in chemical space connecting torus knot and torus link π-electron density topologies", Phys. Chem. Chem. Phys., vol. 11, pp. 1340-1345, 2009. https://doi.org/10.1039/b810301a

Conference report: an example of collaborative open science (reaction IRCs).

Thursday, May 25th, 2017

It is a sign of the times that one travels to a conference well-connected. By which I mean email is on a constant drip-feed, with venue organisers ensuring each delegate receives their WiFi password even before their room key. So whilst I was at a conference espousing the benefits of open science, a nice example of open collaboration was initiated as a result of an incoming email.

Steven Kirk contacted me with the following query: Do you know of any open-access database of calculated IRCs with coverage of as broad a range of classes of chemical reactions as possible? I recollected that about six years ago, I was exploring the use of iTunesU as a system for delivering course content in a rich-media format. I produced animations for about 115 reactions (many of which, as it happens, were taken from this blog, though quite a number were unique to that project) and placed them into iTunesU; I now sent the URL https://itunes.apple.com/gb/course/id562191342 to Steven.

I should at this point explain something of the structure of such an iTunesU course.

  1. An essential feature is the course icon, seen below on the left. Since the course is hosted by Imperial College, it had to be an officially approved icon. I am sure you can believe me if I tell you that this took a month or so to obtain, with a fair bit of persistence required!
  2. I also had to get approval to place the iTunes app on all the teaching computers so that students could open the course. Believe me again when I tell you that I had to persuade the Apple lawyers in Cupertino to release a special license for this app to persuade our administrators here to install it on the Windows teaching clusters. Another few months had passed by.
  3. When creating an entry (using e.g. https://itunesu.itunes.apple.com/coursemanager/ ) one has to specify values for various descriptors, also often called metadata. Thus any one entry has fields for name and description, with the popularity added by Apple. Only a few words are visible in the description field, which can be expanded in iTunes using the i button.
  4. Steven meanwhile had replied asking if the original data that was used to generate the IRC might be available. Specifically his second question was “So the DOIs are only stamped into the animation’s bitmaps, or are they also somewhere in the metadata?“. That little i button is not easy to spot, and there is no indication, in the event, of what information it might actually contain.
  5. Here it is expanded. The contents are unstructured text, into which I have placed the required DOI.
  6. The lesson here is that I had fortunately had the foresight to include a link to the IRC data in anticipation of just such a question from someone in the future. But a black mark to Apple here; the text cannot be selected and copied into a clipboard! It is fairly unFAIR data, since it can only be inter-operated (the I of FAIR) by a human re-typing it by hand. And the human has also to recognise the pattern of a DOI; a machine could not obtain this information easily. Moreover, Steven is a Linux user; he does not readily have access to the iTunes app on that operating system!
  7. Also, there were 115 such entries, and now the prospect was rearing that each would have to be hand processed. Moreover, because the text was unstructured, there was no guarantee that I would have adopted the same pattern for all 115 entries.
  8. Fortunately Steven was on the ball. I quote again: it turns out iTunes isn’t needed at all. A service I found on the web http://picklemonkey.net/feedflipper-home/ takes an ITunes URL and converts it to an RSS feed. Opening this feed in Firefox and RSSOwl respectively let me save the feed as XML and HTML (both attached).
  9. This is currently where we stand (Steven’s first email was two days ago), but it’s not finished yet. Depending on how assiduous I was five years ago, some DOIs to the data may be acquired from the list. Sometimes I simply wrote e.g. See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816 knowing that the links to the data were there instead. I can already see that some descriptions have neither a DOI nor a link to the blog. More detective work will be needed, unfortunately.
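The pattern-recognition problem in point 6 is at least tractable in software: a DOI in free text can be picked out with a regular expression. This sketch uses the pattern Crossref recommends for matching modern DOIs; it is illustrative, not what was actually used here:

```javascript
// Extract DOIs from an unstructured description field, e.g. the text
// behind the iTunesU "i" button. Pattern per Crossref's recommendation
// for modern DOIs: "10." + a 4-9 digit registrant code + "/" + suffix.
function extractDois(text) {
  const doiPattern = /10\.\d{4,9}\/[-._;()/:A-Za-z0-9]+/g;
  return text.match(doiPattern) || [];
}
```

Applied to the XML feed Steven recovered, something like this would separate the entries carrying a DOI from those with only a blog link, or with nothing at all.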

How might the situation described above have been avoided? Well, Apple in iTunesU provided in effect only one metadata field, and an unstructured one at that; anything went in that field. Had they provided another field, entitled say “data source” (or had the course creator been able to configure one themselves), this could moreover have been made a mandatory and structured field. Thus it might have accepted only known types of persistent identifier, such as a DOI. Further, the system could have checked that the DOI was actually resolvable. Before you ask, I did log a “bug” with Apple asking that this be done, but nothing ever was. With such a tool to hand, I might have achieved data sources for all 115 entries. The resulting XML (as generated above) could have been used to automate the retrieval of all 115 datasets describing this course.

At this stage then, Steven can follow up his interest in building a reaction IRC library and analysing it. I will do all I can to encourage Steven not to make the mistakes I did, and to ensure that any further data required to augment the library does not suffer the problems above. On the other hand, I console myself that in two days, much of the data for the course I created five years ago was salvageable; I wonder how many other iTunesU courses there are for which that can be said!

I will let (with some blushing) the final word be Steven’s: You are one of the few chemists who has both pioneered and built the principles of ‘open chemistry’ into their actual scientific work. I visit your blog occasionally knowing that there is a very high probability I could download and tinker with the results of real calculations.


Might I assure all the speakers that I concentrated totally on their talks rather than incoming emails!

The 2016 Bradley-Mason prize for open chemistry.

Tuesday, October 4th, 2016

Peter Murray-Rust and I are delighted to announce that the 2016 award of the Bradley-Mason prize for open chemistry goes to Jan Szopinski (UG) and Clyde Fare (PG).

Jan's open chemistry derives from a final-year project looking at why atomic charges derived from quantum chemical calculation of the electronic density represent chemical information well, yet the electrostatic potential (ESP) generated from these charges is very poor; conversely, charges derived from the computed electrostatic potential are incommensurate with chemical information (such as the electronegativity of atoms). He has developed a Python program called ‘repESP’ which generates ‘compromise’ charges that attempt to reconcile the physical world-view (fitting the ESP) with the chemical insight provided by NPA (Natural Population Analysis). Jan was the main driver in making his code open source, “opening his supervisor's eyes” to the various flavours of open-source licences. To ensure that all subsequent improvements to the program remain available to anyone, the source code has been released under a ‘copyleft’ licence (GPL v3) and is maintained by Jan on GitHub, where he looks forward to helping new users and collaborating with contributors.

Clyde has made various contributions to open-source chemistry over the period of his PhD, focused mainly on utilities to improve quantum chemical research: the enhancement of a popular machine-learning library with a method that has been successful in chemometrics, the creation of an open-source channel for teaching chemists programming and data analysis, and the creation of a tool to help encourage open-source software development. cclib is the most popular library for parsing quantum chemical data from output files, and Clyde has contributed patches for the Atomic Simulation Environment, which enables control of quantum chemical codes from a unified Python interface. He was responsible for the construction of a computational chemistry electronic notebook, published to GitHub and now under active development by others as well. This aims to encapsulate computational chemistry research projects, both for the sake of reproducibility and for organising and keeping track of quantum chemical research. Alongside this platform he created an enhanced Gaussian calculator for the Atomic Simulation Environment that enables automatic construction of ONIOM input files, also now under active development. He also contributed to scikit-learn, the most popular Python machine-learning framework, implementing a kernel for Kernel Ridge Regression that has become the most successful kernel for regression over molecular properties. He was part of the team that won the 2014 sustainable software conference prize for creation of the open-source healthchecker software as part of Sustain. He has argued for open source as a platform for teaching resources and created the Imperial Chemistry GitHub user account, which is now run by the department. Materials for the Imperial Chemistry Data Analysis and Programming workshops, implemented as Python notebooks, are now available through this account and continue under active development.

Criteria for the award include judging the submission on its immediate accessibility via public web sites, on what is visible and re-usable in this way, and on evidence of either community formation/engagement or re-use of materials by people other than the proposer.

Computers 1967-2013: a personal perspective. Part 5. Network bandwidth.

Wednesday, June 5th, 2013

In a time of change, we often do not notice that Δ = ∫δ. Here I am thinking of network bandwidth, and my personal experience of it over a 46 year period.

I first encountered bandwidth in 1967 (although it was not called that then). I was writing Algol code to compute the value of π, using paper tape to send the code to the computer. Unfortunately, the paper tape punch was about 10 km from that computer. The round trip (by van) took about a week, the outcome being often merely to discover that the first line of the code contained a compilation error. I think I got to computing π after about six weeks. That is a bandwidth of about 18 characters (108 bits) in 3628800 seconds, or 0.00003 bits per second.
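The arithmetic behind that figure, using the numbers quoted above (18 characters of 6 bits each, six weeks of round trips), works out as:

```javascript
// Effective bandwidth of the 1967 paper-tape-by-van workflow,
// using the figures quoted in the text.
const bits = 18 * 6;                  // 18 characters at 6 bits each = 108 bits
const seconds = 6 * 7 * 24 * 60 * 60; // six weeks = 3,628,800 seconds
const bandwidth = bits / seconds;     // ≈ 0.00003 bits per second
```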

I did my undergraduate work in 1969, when the distance between the card punch and the computer had reduced to about 50m, and instant turnaround involved circulating in a loop between the punch and the line printer, hoping that neither suffered a paper-wreck. The bandwidth had certainly gone up. On a good day, you could make 20 or so circuits, which did leave one feeling faintly dizzy. 

The next improvement came in 1972, when I was solving non-linear equations for kinetic rate constants, using a 110 bits-per-second (baud) teletypewriter, or ~18 characters per second on the 6-bit computers of that era. This was about 50m from the lab where the kinetic measurements were made (using, if you are interested, a scintillation counter. Yes, I was mildly radioactive for most of my PhD, but I do not believe I glowed in the dark). This bandwidth was in fact fine for uploading kinetic data, and receiving the computed rate constant and its standard error. You might note however that this teletypewriter was the only one in the building I occupied, and yet demand for it was small (I was pretty much its only user).

The next increment occurred in Texas, 1974-1977, where I was now doing quantum chemical calculations. Back in time to the card punch and the line printer (Texas is big, and so now the distance between them was a 10-minute walk). But in my last year there, a state-of-the-art 300 baud teletypewriter was installed! This was now fast enough to play a computer game (something to do with Dragons and Dungeons, I think), and so now there was competition to use it. Particularly from one of my friends, who shall be called George, and who on one occasion spent about 48 virtually continuous hours trying to get to the last level. The rest of us returned to the card punch to submit our calculations. It was also during this period that the first emails started to be exchanged, but only really as a curiosity: “it would never catch on” was the opinion of most.

Back in the UK by 1977, I was overwhelmed by the speed of the 9.6 kbaud graphics terminal I now had access to, 32 times faster. And the rate continued to multiply, by a further factor of ~1000 to attain 10 Mbaud in 1987. But another change occurred during this period. The previous eras had involved transmitting the data no more than ~200m, from one point on the campus to another. But by 1986, if one tried hard enough, one could reach ARPANET. And that was 5000 km away! My first use of such distances was to reach California and download Apple’s System 5.0 for the Macs in the department (I have described elsewhere the role the Mac’s printer port played in this). From then on, we always had the latest operating system installed on most of the machines (although this subterfuge did not always address the intended issue, which was to stop the computers crashing quite so often).

These speeds however did not reach beyond the university. Back home, around 1983, I was back to using a 300 baud modem, with an acoustic coupler to the land line. Our young daughter, aged 3 at the time, joined in the data transmission with gusto. Her joyful shrieks were invariably picked up by the acoustic coupler, and translated into a jumble of characters, which were then interleaved into the numbers coming back from quantum calculations. It was sometimes difficult to tell them apart! These domestic modems gradually got faster, probably attaining 9.6 kbaud by about 1993 (during the course of which the acoustic component was replaced by electronics, and oddly, our daughter stopped shrieking in quite the same way). 

Back in the university in 1993, the first 100 megabits per second (100 Mbps ≅ 100 Mbaud) ethernet lines and switches were being installed, but the national and international backbones were still a lot slower. It was in this year that I was approached to be part of a SuperJanet project. We were going to do a molecular videoconference from London to Cambridge and Leeds; a three-way connection, and this needed ~20 Mbps to transmit the signal from the video camera as well as the 3D images of molecules in real time (compression techniques were not so advanced in those days). Because BT was sponsoring the project, they naturally wanted some publicity, and so we even got to appear on the national television news that night. But we came within about one minute of a disaster. Our 20 Mbps connection went through the SuperJanet national backbone, the capacity of which was, you guessed it, ~20 Mbps. The network operators (located at the Rutherford Appleton Laboratory), whom we had not had the foresight to forewarn, came within a minute of isolating Imperial College from the national network because of our bandwidth hogging. I met them a month or so later, and they told me this. I feel I was lucky to escape with my life and body intact from that meeting (or to put it another way, they were not happy bunnies).

By about 2000, I had achieved 1 Gbps to my desktop computer (and there it has stayed for the past 13 years). What about home? Well, to cut the story short, I recently benchmarked the domestic WiFi connection between a laptop and “the world” at about 65 Mbps (download) and 18 Mbps (upload): some 200,000 times greater than 30 years earlier, and 12 orders of magnitude greater than in 1967. I gather however that some lucky inhabitants of Austin, Texas (the scene of my 1974-1977 experiments), courtesy of Google, can get 1 Gbps!
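Taking the figures quoted in this post at face value, the overall progression can be tallied in a few lines of Python:

```python
# Bandwidth milestones quoted in this post, in bits per second.
rate_1967 = 108 / 3_628_800   # paper tape by van: ~3e-5 bps
rate_1983 = 300               # domestic 300 baud modem
rate_2013 = 65_000_000        # domestic WiFi download, 65 Mbps

print(f"vs 1983: {rate_2013 / rate_1983:,.0f}x")  # ~216,667x
print(f"vs 1967: {rate_2013 / rate_1967:.1e}x")   # ~2.2e+12, i.e. 12 orders of magnitude
```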

I will end by quoting Samuel Butler, writing in 1863: “I venture to suggest that … the general development of the human race to be well and effectually completed when all men, in all places, without any loss of time, at a low rate of charge, are cognizant through their senses, of all that they desire to be cognizant of in all other places. … This is the grand annihilation of time and place which we are all striving for, and which in one small part we have been permitted to see actually realised” (quoted in George Dyson, “Darwin Among the Machines: The Evolution of Global Intelligence”, Addison-Wesley, N.Y., 1997. ISBN 0-201-40649-7).


I just benchmarked my office computer (using only solid-state memory and that 1 Gbps connection) and got 58 Mbps (download) / 75 Mbps (upload).

The standard program was NCSA Telnet if I remember. You made a connection from the computer (using its printer port) to the ARPANET node at University College London (not a widely advertised service), and thence to an Apple FTP site where one could initiate an anonymous file transfer back to one’s computer. System 5 was about half a Mbyte then, and this took about 1-2 hours to retrieve (unless the connection went down, in which case one started again).

A comparison of left and right handed DNA double-helix models.

Saturday, January 1st, 2011

When Watson and Crick (WC) constructed their famous 3D model for DNA, they had to decide whether to make the double helix left- or right-handed. They chose a right-handed turn, on the grounds that their attempts at left-handed models all “violated permissible van der Waals contacts“. No details of what these contacts might have been were given in their original full article (nor the particular base pairs which led to the observation). This follow-up to my earlier post explores this aspect, using a computer model.

One half of a (CGCG) DNA strand

The DNA model used here is shown above; in shorthand it is d(CGCG)2. A crystal structure reveals it to form a (non-Watson-Crick) left-handed helix. If you open the 3D model below (based on an ωB97XD/6-31G(d)/SCRF=water optimisation), some of the short van der Waals contacts are measured. Most are around 2.25Å and the shortest is 2.1Å. It is worth noting that WC state in their article that a distance of 2.1Å for the B-form is acceptable (p92, bottom) and not a violation. All twelve hydrogen bond lengths H…O or H…N are normal, with lengths around 1.8Å. Given that a H…H distance is at its most attractive at ~2.4Å, and plenty of H…H distances of ~2.1Å are known from the crystal structures of organic molecules, one might conclude that, for the CG base pair, their hypothesis that the Z-form could be eliminated was wrong.
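The survey of short contacts described above is straightforward to reproduce computationally. Below is a minimal, hypothetical Python sketch that tallies H…H pairs closer than the ~2.4Å optimum; the coordinates are invented for illustration and are not taken from the actual d(CGCG)2 model:

```python
import math

# Hypothetical hydrogen-atom coordinates (Å); NOT the real d(CGCG)2 geometry.
h_atoms = {
    "H1": (0.00, 0.00, 0.00),
    "H2": (2.10, 0.00, 0.00),   # a "short" contact with H1
    "H3": (0.00, 3.50, 0.00),   # comfortably beyond the vdW optimum
}

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)

# Flag every H...H pair shorter than the ~2.4 Å most-attractive separation.
labels = list(h_atoms)
short_contacts = [
    (i, j, distance(h_atoms[i], h_atoms[j]))
    for n, i in enumerate(labels)
    for j in labels[n + 1:]
    if distance(h_atoms[i], h_atoms[j]) < 2.4
]
print(short_contacts)  # one short contact: H1...H2 at ~2.1 Å
```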

The DNA duplex d(CGCG) showing a left handed helix with short H...H contacts shown. Click for 3D

But might the original WC right-handed form for this system be at least competitive? There is one H…H contact of 2.05Å and quite a few at ~2.5Å (3D model below). The “violation” of van der Waals contacts is, if anything, slightly worse than with the left-handed helix. The total difference in the dispersion energy is a rather astonishing ~12 kcal/mol in favour of the Z-form. I will update this post (as a comment) when the relative free energies of the two forms are available (this calculation takes a while), but there is little doubt that the Z-form is indeed the more stable.

The DNA duplex d(CGCG) showing a right handed helix with short H...H contacts shown. Click for 3D

What can also be said about the Watson-Crick right-handed form is that its hydrogen bonding is not so optimal. One of the twelve interactions between a (terminal) CG pair shows some signs of being “unzipped“, with an N-H…O=C distance of ~1.9Å (there is no sign of similar unzipping in the Z-form). One must wonder whether this difference between the Z- and B-helices for the CG pair has been exploited in nature.

 

One crucial aspect of DNA is the local conformation about the bond connecting the base and the ribose, N9-C8 in the diagram below (green arrow).

Conformation of the base-ribose unit

An analysis of this bond can be expressed in terms of NBO theory. This clearly shows a strong interaction energy (E2) of 13.3 kcal/mol between the lone pair on N9 and the C8-O4 antibonding orbital, a classical anomeric effect in fact. In this case, it promotes the local conformation of this unit, which has a significant effect on the final model.
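As an aside, the local conformation about such a bond is characterised by a torsion (dihedral) angle, computed from the Cartesian positions of four consecutive atoms. The sketch below shows the standard construction in Python; the four points are arbitrary illustrative values, not the actual N9-C8 geometry:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond, from four 3D points."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    cross = lambda a, b: (a[1]*b[2] - a[2]*b[1],
                          a[2]*b[0] - a[0]*b[2],
                          a[0]*b[1] - a[1]*b[0])
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals to the two planes
    m1 = cross(n1, b1)
    x = dot(n1, n2)
    y = dot(m1, n2) / math.sqrt(dot(b1, b1))
    return math.degrees(math.atan2(y, x))

# Four points deliberately chosen to give a 90-degree torsion.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)), 1))  # 90.0
```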

What else can analysis of the wavefunction tell us? Well, curiously, the optical rotation of this particular small oligomer has never been reported in the literature, and an intriguing question is whether it might prove useful in distinguishing the B- and Z-forms of the duplex. To do this, one needs a reasonably reliable way of computing [α]D for both isomers. This is because optical rotations are not reliably additive, and it is difficult to estimate them accurately based purely on the fragments present in the molecule. In 2011, it is now perfectly possible to calculate this quantity quantum mechanically, even for 250 atoms, using a reasonable basis set and making allowance for solvation (which is known to affect the calculated rotation). The values (CAM-B3LYP/6-31G(d)/SCRF=water) are 66° for the Z-isomer and 32° for the B-isomer. Of course the model is not complete, lacking a counterion for the phosphate and explicit water molecules, but even so, it might appear that the reason optical rotations are not reported is that they truly are not useful!