Posts Tagged ‘operating system’

Conference report: an example of collaborative open science (reaction IRCs).

Thursday, May 25th, 2017

It is a sign of the times that one travels to a conference well-connected. By which I mean email is on a constant drip-feed, with venue organisers ensuring each delegate receives their WiFi password even before their room key. So whilst I was at a conference espousing the benefits of open science, a nice example of open collaboration was initiated as a result of a received email.

Steven Kirk contacted me with the following query: “Do you know of any open-access database of calculated IRCs with coverage of as broad a range of classes of chemical reactions as possible?” I recollected that about six years ago, I was exploring the use of iTunesU as a system for delivering course content in a rich-media format. I produced animations for about 115 reactions (many of which as it happens were taken from this blog, but quite a number were also unique to that project) and placed them into iTunesU, and so I sent the URL https://itunes.apple.com/gb/course/id562191342 to Steven.

I should at this point explain something of the structure of such an iTunesU course.

  1. An essential feature is the course icon, seen below on the left. Since the course is hosted by Imperial College, it had to be an officially approved icon. I am sure you can believe me if I tell you that this took a month or so to obtain, with a fair bit of persistence required!
  2. I also had to get approval to place the iTunes app on all the teaching computers so that students could open the course. Believe me again when I tell you that I had to persuade the Apple lawyers in Cupertino to release a special license for this app to persuade our administrators here to install it on the Windows teaching clusters. Another few months had passed by.
  3. When creating an entry (using e.g. https://itunesu.itunes.apple.com/coursemanager/ ) one has to specify values for various descriptors, also often called metadata. Thus any one entry has fields for name and description, with the popularity added by Apple. Only a few words are visible in the description field, which can be expanded in iTunes using the i button.
  4. Steven meanwhile had replied asking if the original data that was used to generate the IRC might be available. Specifically his second question was “So the DOIs are only stamped into the animation’s bitmaps, or are they also somewhere in the metadata?“. That little i button is not easy to spot, and there is no indication, in the event, of what information it might actually contain.
  5. Here it is expanded. The contents are unstructured text, into which I have placed the required DOI.
  6. The lesson here is that I had fortunately had the foresight to include a link to the IRC data in anticipation of just such a question from someone in the future. But black mark to Apple here; the text cannot be selected and copied into a clipboard! It is fairly unFAIR data, since it can only be inter-operated (the I of FAIR) by a human re-typing it by hand. And the human has also to recognise the pattern of a DOI; a machine could not obtain this information easily. Moreover Steven is a Linux user; he does not readily have access to the iTunes app on this operating system!
  7. Also, there were 115 such entries, and now the prospect was rearing that each would have to be hand processed. Moreover, because the text was unstructured, there was no guarantee that I would have adopted the same pattern for all 115 entries.
  8. Fortunately Steven was on the ball. I quote again: “it turns out iTunes isn’t needed at all. A service I found on the web http://picklemonkey.net/feedflipper-home/ takes an ITunes URL and converts it to an RSS feed. Opening this feed in Firefox and RSSOwl respectively let me save the feed as XML and HTML (both attached).”
  9. This is currently where we stand (Steven’s first email was two days ago), but it’s not finished yet. Depending on how assiduous I was five years ago, some DOIs to the data may be acquired from the list. Sometimes I simply wrote e.g. See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816 knowing that the links to the data were there instead. I can already see that some descriptions have neither a DOI nor a link to the blog. More detective work will be needed, unfortunately.

How might the situation described above have been avoided? Well, Apple in iTunesU provided in effect only one metadata field, and this was an unstructured one. Anything went in that field. Had they provided more (or had the course creator been able to configure fields themselves), there might have been another field entitled, say, “data source”. This could moreover have been made a mandatory field and a structured one. Thus it might have accepted only known types of persistent identifier, such as a DOI. Further, the system could have checked that the DOI was actually resolvable. Before you ask, I did log a “bug” with Apple asking this be done, but nothing ever was. With such a tool to hand, I might have recorded data sources for all the 115 entries. The resulting XML (as generated above) could have been used to automate the retrieval of all 115 datasets describing this course.
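
As a rough illustration of the detective work that unstructured descriptions force on a machine, here is a minimal sketch in JavaScript. It assumes the feed has already been parsed into objects with `name` and `description` fields; those field names, the helper functions and the sample entries (including the placeholder DOI) are all hypothetical, and the regular expression is a common heuristic rather than an official DOI grammar.

```javascript
// Heuristic pattern for DOI-like strings (an assumption, not a formal grammar).
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"<>]+/g;

// Return the unique DOI-like strings found in a piece of free text.
function extractDois(text) {
  const matches = text.match(DOI_PATTERN) || [];
  // Trailing punctuation usually belongs to the sentence, not to the DOI.
  const cleaned = matches.map((m) => m.replace(/[.,;:)\]]+$/, ""));
  return [...new Set(cleaned)];
}

// Split entries into those with a recoverable DOI and those needing a human.
function classifyEntries(entries) {
  const withDoi = [];
  const needsDetectiveWork = [];
  for (const entry of entries) {
    const dois = extractDois(entry.description);
    if (dois.length > 0) {
      withDoi.push({ name: entry.name, dois });
    } else {
      needsDetectiveWork.push(entry.name);
    }
  }
  return { withDoi, needsDetectiveWork };
}

// Hypothetical sample entries; the DOI below is a placeholder, not real data.
const sample = [
  { name: "Reaction A", description: "IRC animation. Data: doi:10.1234/example.5678." },
  { name: "Reaction B", description: "See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816" },
];
const report = classifyEntries(sample);
console.log(report);
```

Entries that only point at a blog post end up in `needsDetectiveWork`, which is exactly the manual step a structured, validated “data source” field would have made unnecessary.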

At this stage then, Steven can follow up his interest in building a reaction IRC library and analysing it. I will do all I can to encourage Steven not to make the mistakes I did and to ensure that any further data that is required to augment the library does not suffer the problems above. On the other hand, I console myself that in two days, much of the data for the course I created five years ago was salvageable; I wonder how many other iTunesU courses there are for which that can be said!

I will let (with some blushing) the final word be Steven’s: “You are one of the few chemists who has both pioneered and built the principles of ‘open chemistry’ into their actual scientific work. I visit your blog occasionally knowing that there is a very high probability I could download and tinker with the results of real calculations.”


Might I assure all the speakers that I concentrated totally on their talks rather than incoming emails!

Chemistry data round-tripping. Has there been ANY progress?

Monday, December 2nd, 2013

This is one of those topics that seems to crop up every three years or so. Since then, new versions of operating systems, new versions of programs, mobile devices and perhaps some progress? 

Right, I will briefly recapitulate. Chemical structure diagrams are special; they contain chemical semantics (what an atom is, what a bond is, stereochemistry, charges, etc). One needs special programs to represent this. Take two well-known ones. ChemBioDraw V 13 is the latest in a long line dating back to 1985 or so. A newcomer is ChemDoodle, just updated to version 6. The idea is you express your molecule, and capture some of its semantics using one of these programs. And then paste the data into a venerable word processor, Word (also dating back to around 1984). Then send the Word document to a colleague. Who might want to copy the structure back out, and put it back into ChemBioDraw/ChemDoodle. And put those semantics to good use, by editing it, or re-purposing the information. This is round-tripping the data. It’s been almost 30 years, surely the process should be seamless by now? Wrong!

One problem is that the “exchange-particle” is the clipboard, yet another ancient and presumed mature technology. It’s invisible of course; we rarely get to see it. And very operating system specific! So what is the current state of play? Round-tripping ChemBioDraw structures within a single operating system might work. Well, it currently does for just one of the two most common desktop operating systems (remember, Word is provided by the originator of one of these operating systems). The other program, ChemDoodle, round-trips within both operating systems.

But, and here is the key point, not across operating systems. Paste either a ChemBioDraw or a ChemDoodle structure into Word on one of these OSs, and try re-editing that diagram on the version of Word on the other OS. The data is lost unless you have the “right” operating system.

An experiment I have not tried, but regarding which I would welcome any feedback, is to factor in the two newest operating systems, this time for mobile devices such as tablets and phones. Let’s not even worry whether different flavours of one of these mobile OSs are compatible. Apps for drawing chemical structures are available for both of these. Here, the amazing clipboard still exists. One now has four OSs to consider, with four homogeneous permutations and a minimum of six heterogeneous round trips the data could try to take for any given app. We do not even consider app2app transfers not involving discrete intermediate documents. I would predict that only a few of these permutations preserve round-tripped data and its semantics.
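
The counting above can be checked with a short sketch. The four labels are placeholders of my own (the post deliberately leaves the operating systems unnamed), and a heterogeneous round trip is taken here to mean an unordered pair of distinct systems, which is what gives the minimum of six.

```javascript
// Enumerate the round trips for a list of operating systems.
// Homogeneous: the document leaves and returns to the same OS.
// Heterogeneous: the document passes through two different OSs (unordered pair).
function roundTrips(systems) {
  const homogeneous = systems.map((os) => [os, os]);
  const heterogeneous = [];
  for (let i = 0; i < systems.length; i++) {
    for (let j = i + 1; j < systems.length; j++) {
      heterogeneous.push([systems[i], systems[j]]);
    }
  }
  return { homogeneous, heterogeneous };
}

// Placeholder labels, not the names of real products.
const systems = ["desktop A", "desktop B", "mobile A", "mobile B"];
const trips = roundTrips(systems);
console.log(trips.homogeneous.length, trips.heterogeneous.length);
```

If the direction of travel matters (paste on one OS, re-edit on the other, or the reverse), the heterogeneous count doubles, which is why six is only a minimum.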

Perhaps we need to look at it in a different way? One simply avoids putting data from one program into another. Chemical data is kept in its own files, never mixed with data from other programs, but always kept/sent separately. Pre-1984 and the clipboard, this might have made sense. But in an era when XML was invented around 17 years ago to allow data to fully retain semantic information in any environment it finds itself in, it seems surprising that we still have this situation.

I mention all of this, since there is a current refocusing on the importance of data; “emancipating data” is now important. But the reality is that much current software destroys the semantics in data at almost every turn. Thirty years of no progress then. But what of Chem4Word, a combination of differently namespaced XML in which the chemistry is expressed in CML (it is only available for a single operating system!)? I will perhaps devote a separate post to that one; first I have to try a few experiments!

Computers 1967-2013: a personal perspective. Part 5. Network bandwidth.

Wednesday, June 5th, 2013

In a time of change, we often do not notice that Δ = ∫δ. Here I am thinking of network bandwidth, and my personal experience of it over a 46 year period.

I first encountered bandwidth in 1967 (although it was not called that then). I was writing Algol code to compute the value of π, using paper tape to send the code to the computer. Unfortunately, the paper tape punch was about 10 km from that computer. The round trip (by van) took about a week, the outcome being often merely to discover that the first line of the code contained a compilation error. I think I got to computing π after about six weeks. That is a bandwidth of about 18 characters (108 bits) in 3628800 seconds, or 0.00003 bits per second.

I did my undergraduate work in 1969, when the distance between the card punch and the computer had reduced to about 50m, and instant turnaround involved circulating in a loop between the punch and the line printer, hoping that neither suffered a paper-wreck. The bandwidth had certainly gone up. On a good day, you could make 20 or so circuits, which did leave one feeling faintly dizzy. 

The next improvement came in 1972, when I was solving non-linear equations for kinetic rate constants, using a 110 bits per second (baud) teletypewriter (or ~18 characters per second on the 6-bit computers of that era). This was about 50m from the lab where the kinetic measurements were made (using, if you are interested, a scintillation counter. Yes, I was mildly radioactive for most of my PhD, but I do not believe I glowed in the dark). This bandwidth was in fact fine for uploading kinetic data, and receiving the computed rate constant and its standard error. You might note however that this teletypewriter was the only one in the building I occupied, and yet demand for it was small (I was pretty much its only user).

The next increment occurred in Texas 1974-1977, where I was now doing quantum chemical calculations. Back in time to the card punch and the lineprinter (Texas is big, and so now the distance between them was a 10 minute walk). But in my last year there, a state-of-the-art 300 baud teletypewriter was installed! This was now fast enough to play a computer game (something to do with Dragons and Dungeons I think), and so now there was competition to use it. Particularly from one of my friends, who shall be called George, and who on one occasion spent about 48 virtually contiguous hours trying to get to the last level. The rest of us returned to the card punch to submit the calculations. It was also during this period that the first emails started to be exchanged, but only really as a curiosity: “it would never catch on” was the opinion of most.

Back in the UK by 1977, I was overwhelmed by the speed of the 9.6 kbaud graphics terminal I now had access to, 32 times faster. And the rate continued to multiply, by a further 1000 to attain 10 Mbaud in 1987. But another change occurred during this period. The previous eras had involved transmitting the data no more than ~200m, from one point in the campus to another. But by 1986, if one tried hard enough, one could reach ARPANET. And that was 5000 km away! My first use of such distances was to reach California and download Apple’s system 5.0 for the Macs in the department (I have described elsewhere the role the Mac’s printer port played in this). From then on, we always did have the latest operating system installed on most of the machines (although not always did this subterfuge address the intended issue, which was to stop the computer crashing as often).

These speeds however did not reach beyond the university. Back home, around 1983, I was back to using a 300 baud modem, with an acoustic coupler to the land line. Our young daughter, aged 3 at the time, joined in the data transmission with gusto. Her joyful shrieks were invariably picked up by the acoustic coupler, and translated into a jumble of characters, which were then interleaved into the numbers coming back from quantum calculations. It was sometimes difficult to tell them apart! These domestic modems gradually got faster, probably attaining 9.6 kbaud by about 1993 (during the course of which the acoustic component was replaced by electronics, and oddly, our daughter stopped shrieking in quite the same way). 

Back in the university in 1993, the first 100 megabits per second (100Mbps ≅100 Mbaud) ethernet lines and switches were being installed, but the national and international backbones were still a lot slower. It was in this year that I was approached to be part of a SuperJanet project. We were going to do a molecular videoconference from London to Cambridge and Leeds; a three-way connection, and this needed ~ 20Mbps to transmit the signal from the video camera as well as the 3D images of molecules in real-time (compression techniques were not so advanced in those days). Because BT was sponsoring the project, they naturally wanted some publicity, and so we even got to appear on the national television news that night. But we came within about 1 minute of a disaster. Our 20Mbps connection went through the SuperJanet national backbone, the capacity of which was, you guessed, ~ 20 Mbps. The network operators (located at the Rutherford-Appleton laboratories), who we had not had the foresight to pre-warn, came within 1 minute of isolating Imperial College from the national network because of our bandwidth hogging. I met them a month or so later, and they told me this. I feel I was lucky to escape with my life and body intact from that meeting (or to put it another way, they were not happy bunnies). 

By about 2000, I had achieved 1 Gbps to my desktop computer (and there it has stayed for the past 13 years). What about home? Well, to cut the story short, I recently benchmarked the domestic WiFi connection between a laptop and “the world” at about 65 Mbps (download) and 18 Mbps (upload), a little less than 1 million times greater than 30 years earlier and 12 orders of magnitude greater than in 1967. I gather however that some lucky inhabitants of Austin Texas (the scene of my 1974-1977 experiments), courtesy of Google, can get 1 Gbps!
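
For anyone who wants to retrace the arithmetic, here is a minimal sketch using only figures quoted in this post. It assumes, as the post does, that one baud can be treated as one bit per second; the variable and function names are mine.

```javascript
const SECONDS_PER_WEEK = 7 * 24 * 60 * 60;

// Average rate for a given number of bits delivered over a given time.
function bitsPerSecond(bits, seconds) {
  return bits / seconds;
}

// How many powers of ten separate two rates.
function ordersOfMagnitude(faster, slower) {
  return Math.log10(faster / slower);
}

// 1967: 18 six-bit characters (108 bits) over about six weeks of van trips.
const paperTape1967 = bitsPerSecond(18 * 6, 6 * SECONDS_PER_WEEK);
// Around 1983: 300 baud acoustic-coupler modem at home (baud taken as bits/s).
const homeModem1983 = 300;
// 2013: measured domestic WiFi download rate.
const homeWifi2013 = 65e6;

const wifiVsModem = homeWifi2013 / homeModem1983;
const wifiVsPaperTape = ordersOfMagnitude(homeWifi2013, paperTape1967);

console.log(paperTape1967.toExponential(2)); // bits per second in 1967
console.log(Math.round(wifiVsModem));        // ratio of 2013 WiFi to the 300 baud modem
console.log(wifiVsPaperTape.toFixed(1));     // orders of magnitude since 1967
```

The first figure reproduces the 0.00003 bits per second quoted for 1967, and the last comes out at a little over 12 orders of magnitude.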

I will end by quoting Samuel Butler, writing in 1863: “I venture to suggest that … the general development of the human race to be well and effectually completed when all men, in all places, without any loss of time, at a low rate of charge, are cognizant through their senses, of all that they desire to be cognizant of in all other places. … This is the grand annihilation of time and place which we are all striving for, and which in one small part we have been permitted to see actually realised” (Quoted in George Dyson, “Darwin amongst the Machines, The Evolution of Global Intelligence”, Addison-Wesley, N.Y., 1997. ISBN 0-201-400649-7).


I just benchmarked my office computer (using only solid-state memory and that 1Gbps connection) and got 58Mbps (download)/75Mbps (upload).

The standard program was NCSA Telnet if I remember. You made a connection from the computer (using its printer port) to the ARPANET node at University College London (not a widely advertised service), and thence to an Apple FTP site where one could initiate an anonymous file transfer back to one’s computer. System 5 was about half a Mbyte then, and this took about 1-2 hours to retrieve (unless the connection went down, in which case one started again).

Mobile-friendly solutions for viewing (WordPress) Blogs with embedded 3D molecular coordinates.

Sunday, December 11th, 2011

My very first post on this blog, in 2008, was to describe how Jmol could be used to illustrate chemical themes by adding 3D models to posts. Many of my subsequent efforts have indeed invoked Jmol. I thought I might review progress since then, with a particular focus on using the new generations of mobile device that have subsequently emerged.

  1. Jmol is based on Java, which has been adopted by Google’s Android mobile operating system, but not by Apple’s iOS.
    • An Android version of Jmol was recently released, to rave reviews! I do not know however whether the Jmol on these posts can be viewed via Android. Perhaps someone can post a comment here on that aspect?
    • HP has just announced it will open source WebOS, but it seems Java will not be supported so probably no Jmol there then.
    • Windows 8 Mobile (Metro) also seems unlikely to support it.
  2. Apple has been prominent in touting HTML5 as a Java replacement. In practice, this means that any molecular viewer would be based on a combination of JavaScript and WebGL technologies. Whereas Java is a compiled language, JavaScript is interpreted on-the-fly by the browser. Its viability has been greatly increased by very large improvements in the speed at which browsers interpret JavaScript nowadays, but this speed is unlikely ever to match that of Java. The real issue is whether that matters. The other difference is that whereas a signed Java applet allows data to escape from the security sandbox (and into e.g. a file system), JavaScript is likely to be much more restrictive. These two properties mean that JavaScript/HTML5 implementations make a lot of use of server-side functionality; in other words a lot of bytes may have to flow between server and mobile device to achieve a desired effect (and the user may have to pay for these bytes via their data plan).
    • One early adopter of the JavaScript/WebGL HTML5 model has been ChemDoodle, which I illustrated on this blog about a year ago. I have tidied up the recipe for invoking it since then, and this is given below for anyone interested in implementing it. As of this moment, one essential component, WebGL, is only available to developers of Apple’s iOS system, but I expect this to become generally available soon. When that happens, ChemDoodle components on this blog will start working.
    • A new entrant is GLmol, an open source molecular viewer for Apple’s iOS. A version is also available for Android. I may try embedding this into the blog.
It seems that the 3D molecular viewing options are certainly increasing, but at the moment there is some uncertainty in performance, compatibility and the ability to extract molecular data from the “sandboxes“. This last comment relates to the re-usability of data, which I particularly value.

Although this post has focussed on embedding and rendering molecular data into a blog post, the same principle in fact applies to other expressions. Perhaps the most interesting is the epub3 e-book format, which also supports JavaScript/HTML5, and which seems likely to be adopted for future interactive e-books. Indeed, it should be possible to fully convert an interactive blog created using this technology to an e-book with relatively little effort. I have also illustrated here how lecture notes can be so converted.

If you get the impression that the task of a modern communicator of science and chemistry is not merely that of penning well chosen words to describe their topic, but of having to program their effort, then you may not be mistaken.


Procedure for creating a 3D model in a WordPress blog post using ChemDoodle.

  1. As administrator, go to
    wp-content/themes/default

    (or whatever theme you use) and in the file header.php, paste the following

    <link rel="stylesheet" href="../ChemDoodle/ChemDoodleWeb.css" type="text/css">
    <script type="text/javascript" src="../ChemDoodle/ChemDoodleWeb-libs.js"></script>
    <script type="text/javascript" src="../ChemDoodle/ChemDoodleWeb.js"></script>
    <script type="text/javascript">
    // Fetch a file synchronously and return its contents as text.
    // The call blocks until the response arrives, so the molfile is
    // available to the single-line script that invokes the viewer.
    function httpGet(theUrl) {
      var xmlHttp = new XMLHttpRequest();
      xmlHttp.open("GET", theUrl, false); // false = synchronous request
      xmlHttp.send();
      return xmlHttp.responseText;
    }
    </script>
  2. From here, get the ChemDoodle components and put them into the directory immediately above the WordPress installation. They are there referenced by the path ../ChemDoodle as in the script above. You can put the folder elsewhere if you modify the path in the script accordingly.
  3. Invoke an instance of a molecule thus;
    <script type="text/javascript">// <![CDATA[
    var transformBallAndStick2 = new ChemDoodle.TransformCanvas3D('transformBallAndStick2', 190, 190);transformBallAndStick2.specs.set3DRepresentation('Ball and Stick');         transformBallAndStick2.specs.backgroundColor = 'white';var molFile = httpGet( 'wp-content/uploads/2011/12/85-trans.mol' );var molecule = ChemDoodle.readMOL(molFile, 2);         transformBallAndStick2.loadMolecule(molecule);
    // ]]></script>
  4. The key requirement is that the body of the script (starting with var) must not contain any line breaks; it must be a single wide line. So that you can see the whole line here, I show it in wrapped form (which you must not use);
    var transformBallAndStick2 = new
    ChemDoodle.TransformCanvas3D('
    transformBallAndStick2', 190,
    190);transformBallAndStick2.specs.
    set3DRepresentation('Ball and Stick');
    transformBallAndStick2.specs.
    backgroundColor = 'white';var molFile =
    httpGet('wp-content/uploads/2011/12/85-trans.mol');
    var molecule =ChemDoodle.readMOL(molFile, 2);
    transformBallAndStick2.loadMolecule(molecule);
  5. The key data will be located in the path wp-content/uploads/2011/12/85-trans.mol which you should upload. Note that only the MDL molfile is supported in this mode (which makes no server-side requests). One can use eg CML, but this must be as a server request.
  6. If you want multiple instances, then you must change each occurrence of the name of the variable, e.g. transformBallAndStick2 to be unique for each.
  7. If you want to annotate the resulting display, server-side requests are again needed. I do not illustrate these here, but there is an excellent tutorial.
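
Steps 3, 4 and 6 can be combined into a small generator, sketched below. The helper `buildViewerScript` is hypothetical and not part of ChemDoodle; it only assembles, as one unbroken line, the same ChemDoodle calls shown in the recipe above, and it relies on the `httpGet` function defined in header.php. The temporary variable holding the molfile is given a per-instance name, a small departure from the recipe, so that several instances do not share it.

```javascript
// Hypothetical helper (not part of ChemDoodle): build the single-line script
// body for one viewer instance, using a unique variable name per instance.
function buildViewerScript(name, molPath, size = 190) {
  if (!/^[A-Za-z_$][\w$]*$/.test(name)) {
    throw new Error("name must be a valid JavaScript identifier: " + name);
  }
  if (/['\r\n]/.test(molPath)) {
    throw new Error("molPath must not contain quotes or line breaks");
  }
  const statements = [
    `var ${name} = new ChemDoodle.TransformCanvas3D('${name}', ${size}, ${size});`,
    `${name}.specs.set3DRepresentation('Ball and Stick');`,
    `${name}.specs.backgroundColor = 'white';`,
    `var molFile_${name} = httpGet('${molPath}');`,
    `${name}.loadMolecule(ChemDoodle.readMOL(molFile_${name}, 2));`,
  ];
  // Join without separators so the result contains no line breaks.
  return statements.join("");
}

// Example: two instances on one page, each with its own variable name.
const first = buildViewerScript("viewer1", "wp-content/uploads/2011/12/85-trans.mol");
const second = buildViewerScript("viewer2", "wp-content/uploads/2011/12/85-trans.mol", 250);
console.log(first);
```

The output of each call would be pasted between the `<script>` tags of step 3; generating it rather than typing it avoids both stray line breaks and clashing variable names.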

Computers 1967-2011: a personal perspective. Part 2. 1985-1989.

Friday, July 8th, 2011

As a personal retrospective of my use of computers (in chemistry), the Macintosh plays a subtle role.

  1. 1985: In the previous part, I noted how the Corvus Concept computer introduced a network hard drive (these still being too expensive for any one individual to afford one); the same principle applied to the 1985 Macintosh but now relating to the remarkable introduction of the laser printer. Until then, we chemists had used french curves (see previous post for an explanation), stencils or transfer lettering. It could be really tedious preparing a complex manuscript. Indeed, in some published articles of the time, one often saw hand-drawn chemical diagrams! So when the Macs arrived in 1985 (and, it has to be said, with the associated rise of ChemDraw at that time), it became imperative to network them so that everyone could have access to that precious laser printer (I still remember its network name, selected using the aptly named Chooser utility). Fortunately, the Mac came with a network port (unless I am mistaken, this was not an invariable feature of the IBM PC of the period). The network was created using a router (the first time I had come across one of these) from the Webster corporation in Australia, and our local electrician and his colleagues suddenly found themselves putting in Appletalk cables everywhere. The poor chemists in the department not only had to get used to the mouse pointing device and unfloppy floppy disks, but also to the idea of selecting network devices.
  2. 1987: We also acquired a Microvax with an Evans and Sutherland PS390 stereographics device at this time (more of which later in another post), and this came with an interesting bonus. Haggling had managed to leave about £25K over, which I decided to spend on a “grown up proper network”. This took the form of a thickwire ethernet of about 400m length. This stretched from the Microvax to the main college hub and thence the outside world (the “Internet”) and also to the close-by new network distribution cabinet where one end of the fibre optic cable was terminated (a bonus of all this was a Pirelli calendar, yet another story that must wait to be told). The fibre was strung to a catenary connecting to our other building (the idea being that it should be immune to lightning strikes. I had earlier explored the idea of a copper cable routed through tunnels connecting the two chemistry buildings, and spent a most interesting day down in those tunnels exploring. Therein lies yet another story for another day). Anyway, we now had a 10 megabit network (1000 times faster than the old PADs, which were still around) and this was connected to the Webster multigate routers (there were two of them now, one for each building). Our Macs all had the Internet!

    Apple, bless their hearts, distributed a control panel called MacTCP, and after I figured out what it all meant (network masks, Class C subnets and the like) I let everyone know that another network device had been added to join the laserprinter. Few IBM PC owners could boast this. At this stage, in truth, there was not that much people could connect to. Using MacTelnet, we could indeed access CAS Online, and print the search to a laserprinter. Using MacFTP, we could get files remotely from other FTP servers, and we started to acquire coordinate files for our molecular modelling. This in turn brought the realisation that the existing formats (Brookhaven protein databank files were the most common at the time) were not ideally suited for the purpose, and this could be seen as another spark for the CML (XML) work that started about nine years later. I also remember discovering that Apple Computer ran their own FTP server, where I could download the latest operating system disk images (Systems 5-7 as I recollect were obtained from this site). Things were free (but not always that easy) in those days. Our Macs ended up having the latest OS on them (in other words, they tended to crash a little less) almost as soon as it was released (and the Mac app store™, with its impending 4.6 Gbyte of OS X Lion about to be downloaded, is merely the latest example of this).

  3. 1987: Armed with all this experience, I was also asked to serve a two year stint on the editorial advisory board of the Royal Society of Chemistry. At the time, what is now called supporting information was just starting, and of course it was going to be in print only. I suggested that perhaps the RSC should plan for the day when it could be online instead (the term online was not, I think, in that common use then, and electronic journals were also not yet common). I was still not happy that the only way to access that information would have to be FTP file transfers, but then little did I realise then that Tim Berners-Lee at CERN already had a glimmer in his eye.
  4. 1988: The network on the Macs became a little more useful in this year, when a Macintosh email client called Eudora was released (in truth, I had already sent my first email in 1976, from CMU in Pittsburgh whilst on a visit there, to the person standing next to me!). The Microvax alluded to above provided the mail relay, and a few brave individuals started sending email (not that many people had email addresses in those days mind you). The RSC was still grappling with this. I remember putting my email address at the top of an article submitted to them, and the copy-editor deleted it from the proofs as “unrecognised address form“. I re-instated it, they deleted it again. After some telephone negotiation, it remained (although the RSC assured me it would confuse the journal readers mightily). For the record, if you do manage to find it, it no longer works (being something like rzepa@vaxa.ch.ic.ac.uk. We were still learning how to do things properly then).
  5. 1989: I managed to convince the department that it would be useful to use computers for undergraduate teaching, and we opened a computer room with 12 Macs. I maintained them using a wonderful network utility called RevRDist for Mac, which cloned a master Mac onto the 12 clients, and made the task of adding new software very easy. There was always lots of good software for Macs in those early days. But to introduce students to how to use them, I did feel impelled to produce a 4-page printed handout explaining it all. And I only did this once a year. Clearly again, the need to manage this better must have been in my mind.

This post focuses on a very short period, because I wanted to get across how (in my mind at least) chemistry became globally networked for the (chemical) masses (or at least those with Apple Macintosh computers!), and the role the laserprinter Pippa played in this development.