Archive for the ‘Chemical IT’ Category

Validating the chemical literature heritage. Eudesma-1,3-dien-6,13-olide.

Thursday, December 8th, 2011

Previously, I had noted that Corey reported in 1963/65 the total synthesis of the sesquiterpene dihydrocostunolide. Compound 16, known as Eudesma-1,3-dien-6,13-olide was represented as shown below in black; the hydrogen shown in red was implicit in Corey’s representation, as was its stereochemistry. As of this instant, this compound is just one of 64,688,893 molecules recorded by Chemical Abstracts. How can we, in 2011, validate this particular entry, and resolve the stereochemical ambiguity? Here I discuss one approach (a vision if you like of the semantic web).

The following facts are asserted about 16:

  1. Its connection table, namely what atoms are connected by at least a single bond.
  2. The (presumed) absolute stereochemistry at four stereogenic centres, leaving the 5th (in red) either unknown or implicit. I say presumed because, when it is not known which of the two possible enantiomers a scalemic molecule exists in, just one is often drawn, in essence as a guess.
  3. The 1H NMR chemical shifts of 13 of the 20 hydrogen atoms present in the molecule (the solvent used is unreported, and may be implicitly chloroform).
  4. [α]D +375° (no solvent reported)
  5. m.p. 69.5-70.5° (note by the way that the units represented by the symbol ° are quite different for these two facts! A scientist of course can easily recognise the implicit difference)
  6. λmax (methanol) 265 mµ, ε 4800 (note again the ambiguity in the units, in fact 265 mµ is nowadays written 265 nm and the molar extinction coefficient ε is assumed to be expressed in units of L mol−1 cm−1).
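
The unit ambiguity in fact 6 can be made explicit with a small conversion. This is a minimal sketch; the function names are illustrative, and the only physics assumed is that one millimicron equals one nanometre and that a wavenumber in cm^-1 is 10^7 divided by the wavelength in nm.

```python
def millimicron_to_nm(value_mmu: float) -> float:
    """A millimicron (mµ) is 1e-9 m, i.e. numerically identical to a nanometre."""
    return value_mmu * 1.0


def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm


lambda_max_nm = millimicron_to_nm(265.0)       # reported as 265 mµ
wavenumber_cm1 = nm_to_wavenumber(lambda_max_nm)
print(f"{lambda_max_nm:.0f} nm corresponds to about {wavenumber_cm1:.0f} cm^-1")
```

Expressing the band position in cm^-1 is convenient when comparing against a calculated linewidth that is itself quoted in cm^-1.
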
Can we use these facts to validate the structure of 16 and to resolve its stereochemical ambiguity? Well, modern computational quantum chemistry can (inter alia)  supply the following:
  1. From a given connection table, an accurate prediction of the 3D coordinates of all the atoms for, in this case, either of the stereoisomers involving the hydrogen shown in red.
  2. The 1H NMR shifts relative to TMS, to an accuracy of better than 0.5ppm (often very much better).
  3. [α]D
  4. λmax (methanol) and an approximate estimate of ε.
How do things pan out? We model the more specific stereoisomer shown below, with complete stereochemical notation (CIP) now annotated in.

  1. The 1H NMR was calculated at a ωB97XD/6-311G(d,p) optimised geometry and a single point 6-311++G(d,p) wavefunction. I have linked the “DOI” identified for this calculation to this post so that the calculation itself can be verified by others. It comes out (in ppm) δ 1.02 [0.98, 3H,s], 1.17 [1.15, 3H, d], 2.11 [1.95, 3H, s], 3.85 [3.79, 1H, dd], 5.75, 6.13, 6.30 [5.2-6.0, vinyl], the reported experimental values being in square brackets […].
  2. The spin spin couplings were calculated using the NMR(spinspin,mixed) model implemented in Gaussian (a specification for which is found in the online documentation of the NMR keyword). For δ 3.79, two couplings of 10 Hz are reported. The calculation predicts 9.77 and 9.53 Hz (for assignments, click on the image above to get a 3D model).
  3. [α]D +391° (calculated for chloroform)
  4.  λmax 265 nm (calculated for methanol; ε ~4800 for a linewidth of 3600 cm-1).
  5. Strictly speaking, all of the above should be repeated for the other possible stereoisomer, and the results for the two together analysed statistically.
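
The comparison in item 1 can be summarised numerically. The sketch below uses only the four calculated/reported shift pairs quoted above (the vinyl protons are left out because only a range was reported) and the 0.5 ppm accuracy mentioned earlier as a flagging threshold; the variable names are illustrative.

```python
# (calculated, reported) 1H shifts in ppm, copied from item 1 above.
# The three vinyl protons are omitted: only a 5.2-6.0 ppm range was reported.
shift_pairs = [
    (1.02, 0.98),
    (1.17, 1.15),
    (2.11, 1.95),
    (3.85, 3.79),
]
TOLERANCE_PPM = 0.5  # the accuracy claimed for the calculation earlier in the post

deviations = [abs(calc - expt) for calc, expt in shift_pairs]
mean_abs_error = sum(deviations) / len(deviations)
max_abs_error = max(deviations)

# Pairs that would be flagged as suspicious under this (assumed) threshold.
flagged = [pair for pair, dev in zip(shift_pairs, deviations) if dev > TOLERANCE_PPM]

print(f"mean |error| = {mean_abs_error:.2f} ppm, max |error| = {max_abs_error:.2f} ppm")
print(f"flagged pairs: {flagged}")
```

Repeating the same summary for the other stereoisomer would give the two sets of numbers that item 5 suggests comparing statistically.
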
Can we add data to the original information (a process which might be called curation)? Well, we can use the above calculations to:
  1. provide estimated chemical shifts and coupling constants for ALL the protons in the molecule, not just the 13 reported by Corey, and for all the carbons (no 13C spectrum was reported). Advances in spectrometer sensitivity and resolution mean that if these spectra were ever to be (re)measured, the additional protons could probably be easily identified, and both homo and heteronuclear spin-spin couplings measured.
  2. predict the electronic circular dichroism spectrum for 16 (not previously measured) and in particular the Cotton effect on the λmax 265 nm absorption as being positive (Δε ~+20). This would allow the absolute configuration of this scalemic molecule to be independently validated. We could add to this a prediction of the vibrational circular dichroism spectrum if need be.
  3. What we cannot easily do is predict the melting point (or indeed the crystal packing), although no doubt this will become more reliable in the future.
So what is the big picture? In the earlier post, I had identified a key article in the development of the electronic theory of pericyclic reactions, and in particular how the inferred stereochemistry of 10, 13 and 16 could have been used as the spark that ignited that theory. It would have been essential to ensure that these stereochemical foundations were absolutely sound. In this case of course, the compounds were related to many others by synthetic transformations, and the very fabric of the connections between these molecules served as a validation of the nature of the molecules.

But think how many (millions) of such molecules have been discovered, and how the majority of these have probably not been subjected to such rigorous scrutiny. It is entirely possible that much of the chemical literature is sprinkled with errors in assignments (and many more have unresolved ambiguities, such as the stereochemistry of the hydrogen shown in red at the top of this post). However, for the first time in the history of chemistry, we can now (almost routinely) use quantum modelling to provide independent validation of the chemical literature, as illustrated above. Of course, the validation is not absolute, merely probable to some degree (the above example we might agree shows a very high level of probability that the structure shown is in fact correct). More importantly, in computational validation, we have the potential for automation. One might strive for an infra-structure where much of the validation can be performed automatically, by tireless machines that operate 24/7, and that only flag probable errors when they discover them. This is the vision of the chemical semantic web!

Spotting the unexpected. Anomeric effects involving alkenes?

Wednesday, November 2nd, 2011

How might one go about answering the question: do alkenes promote anomeric effects? A search of Chemical Abstracts does not appear to turn up any examples (I may have missed them of course, since it depends very much on the terminology you use, and new effects may not yet have any agreed terminology) and a recent excellent review of hyperconjugation does not mention it. Here I show how one might provide an answer.

First, what is an anomeric effect? The diagram below shows the classic anomeric effect in which a donor (an oxygen lone pair) interacts with an acceptor (a C-O bond). The orientation around the single bond shown with a green arrow is crucial; the effect only happens when the donating lone pair is aligned antiperiplanar to the accepting C-O bond, at which point the lengthening of the C-O bond should be maximal (shown as a dashed line below). The blue analogue is the corresponding effect using an alkene as the donor, but retaining the C-O bond as the acceptor.
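
Since the whole effect hinges on the torsion about one bond, it may help to see how such an angle is obtained from coordinates. This is a minimal sketch using the standard four-point dihedral formula; the function name and the test coordinates are illustrative and not taken from any real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for atoms p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)      # normals to the two planes
    norm_b2 = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(x / norm_b2 for x in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


# Idealised planar arrangement with the end atoms on opposite sides: antiperiplanar.
anti = abs(torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
# End atoms on the same side: synperiplanar.
syn = abs(torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(f"anti = {anti:.0f} degrees, syn = {syn:.0f} degrees")
```

Taking the absolute value, as done here, discards the sign convention, which is all that is needed to decide whether a donor and acceptor are aligned.
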

I had previously addressed this theme by discussing the molecule below. Switching the acceptor from a C-O to a C-cyano bond has the effect of inducing an axial orientation for both cyano groups, a “cyanomeric” effect! Whilst the stronger is undoubtedly the one shown in red, note the blue interaction, that involves an alkene rather than oxygen as donor.

One way of providing evidence is a crystallographic search. Here I am using Conquest, the program provided by the Cambridge crystallographic data centre, with the following specification (thanks to Andrew White for helping me frame this search!).

The search query

  1. The length of the C-O bond (blue arrow) is defined as a search parameter
  2. The absolute value of the torsion around the bond (red arrow) is also so defined
  3. I have restricted the acceptor to C-O bonds (this of course excludes C-CN).
  4. The C-O acceptor can be enhanced by bearing an electron withdrawing group, which can be e.g. carbonyl, phosphate, sulfate, perchlorate etc.
  5. The alkene donor can be enhanced with donating groups such as oxygen, nitrogen or carbon
  6. NOT Booleans are applied to restrict the substituents the alkene can carry  to only sp3 carbons (or H) by excluding sp2 or sp hybridised carbons. This is to prevent the substituents from delocalizing the alkene (in effect preventing competition from these substituents), but allowing them to stabilise any induced carbocation resonance by hyperconjugation.
  7. The C of the C-O is specified as acyclic (to allow the torsion to in theory have any allowed value).
  8. The search is also restricted to structures with no disorder or other errors, and an R factor of < 0.075.
These specifications can be seen in the first hit obtained:

A hit

A total of 215 structures are found, and a scatterplot of the C-O bond length versus the (abs)C=C-C-O torsion is shown below.

Scatterplot. Click to view a larger version.

There are two main clusters of hits, those with torsions close to zero, and those with torsions between ~90-120°. The latter cluster is very clearly shifted to the right of the former, indicating that on average these C-O bond lengths are longer. The red-orange-light green hits (1.46-1.50Å range) are to be found exclusively in the “antiperiplanar” cluster. One might conclude that statistically, the π-anomeric effect appears real. Of course, there may be many other reasons why the C-O bond is lengthened, and each of the molecules above should be individually inspected to exclude these.
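
The "shifted to the right on average" statement can be turned into a simple comparison of cluster means. The sketch below is only illustrative: the four hits are invented placeholder values, NOT results from the crystallographic search, and the 30° cutoff for the near-zero cluster is an assumption (the 90-120° window is the one quoted above).

```python
def mean_length_by_cluster(hits):
    """Average C-O length (Å) for the two torsion clusters described in the text.

    `hits` is a list of (abs_torsion_deg, c_o_length_angstrom) tuples.
    """
    clusters = {"near 0": [], "90-120": []}
    for torsion, length in hits:
        if torsion <= 30.0:                 # assumed cutoff for the near-zero cluster
            clusters["near 0"].append(length)
        elif 90.0 <= torsion <= 120.0:      # window quoted in the text
            clusters["90-120"].append(length)
    return {name: sum(vals) / len(vals) for name, vals in clusters.items() if vals}


# Placeholder data for illustration only; replace with exported search results.
toy_hits = [(3.0, 1.43), (8.0, 1.44), (95.0, 1.46), (110.0, 1.48)]
means = mean_length_by_cluster(toy_hits)
for name, value in means.items():
    print(f"{name}: mean C-O = {value:.3f} Å")
```

With real data one would also want the spread of each cluster, and an inspection of individual outliers, before concluding anything.
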

This sort of structural search takes only minutes (if you know how to formulate it) and I would certainly encourage you to try it out on your own favourite effect!  See if the excellent  and open CrystalEye resource gives a similar answer (the Conquest /CCDC system is commercial, and not open).


H. S. Rzepa, 2011-11-02. URL:http://www.ch.imperial.ac.uk/rzepa/blog/?p=5368. Accessed: 2011-11-02. (Archived by WebCite® at http://www.webcitation.org/62tOSgnzK)

Blogbooks, e-books and future proofing chemical diagrams.

Monday, October 31st, 2011

Most of the chemical structure diagrams in this blog originate from Chemdraw, which seems to have been around since the dawn of personal computers! I have tended to use this program to produce JPG bitmaps for the blog, writing them out in 4x magnification, so that they can be scaled down for display whilst retaining some measure of higher resolution if needed for other purposes. These other purposes might be for e.g. the production of e-books (using Calibre), the interesting Blog(e)book format offered as a service by Feedfabrik, or display on mobile tablets where the touch-zoom metaphor to magnify works particularly well. But bitmap images are not really well future proofed for such new uses. Here I explore one solution to this issue.

I have previously mentioned scalable vector graphics (SVG) as an alternative, and fortunately the production of such has become routine.3 The diagram above2 is indeed SVG (and if you cannot see it, then try a modern SVG-capable browser1). It was produced thus:

  1. Drawn in Chemdraw
  2. Exported as Encapsulated postscript
  3. Imported into  Scribus, an Open Source desktop publishing program (where it can be annotated/edited if need be)
    • This program will also need Ghostscript installed to handle the EPS
  4. and exported from Scribus to SVG.
  5. Notice how the diagram above automatically scales to fill the width of the page. If you click on it, you get the diagram on its own. If you zoom the browser window, it should scale perfectly.
  6. I note that these SVG diagrams work well in e-books or blogbooks.
There seem to be many other (open) programs out there which support SVG, so the above combination is not necessarily the only one, or indeed the best. There is one other aspect which might be mentioned. The old GIF or JPG bitmap formats do have good meta-data support, such as EXIF, GPS or XMP. These invisible data have often been used to embed a molecular connection table into a GIF or JPG file, such that the original molecular data can be reconstituted from the image file. Unfortunately, there are no real standards for doing this, and so round-tripping the data is probably a closed process within a specific software environment. However, because SVG is an XML format, it can be readily made to carry such information in a fully inter-operable manner. For example, one could easily embed a CML description of the molecule into its own container (namespace) in the SVG file. For the purposes of rendering an on-screen image, this extra information is of course ignored.
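
To illustrate the idea of carrying a CML description inside an SVG file, here is a minimal sketch using Python's standard XML library. The choice of the SVG metadata element as the container, the toy methanol fragment, and the drawing itself are all assumptions made for illustration; this is not the output of Chemdraw or Scribus.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", SVG_NS)      # serialise SVG as the default namespace
ET.register_namespace("cml", CML_NS)   # CML lives in its own prefixed namespace

svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 100 40"})

# The visible part: one line standing in for a C-O bond.
ET.SubElement(svg, f"{{{SVG_NS}}}line",
              {"x1": "20", "y1": "20", "x2": "80", "y2": "20", "stroke": "black"})

# The invisible part: a CML molecule tucked into the metadata element.
metadata = ET.SubElement(svg, f"{{{SVG_NS}}}metadata")
molecule = ET.SubElement(metadata, f"{{{CML_NS}}}molecule", {"id": "methanol"})
atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
for atom_id, element, h_count in (("a1", "C", "3"), ("a2", "O", "1")):
    ET.SubElement(atom_array, f"{{{CML_NS}}}atom",
                  {"id": atom_id, "elementType": element, "hydrogenCount": h_count})
bond_array = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
ET.SubElement(bond_array, f"{{{CML_NS}}}bond", {"atomRefs2": "a1 a2", "order": "1"})

svg_text = ET.tostring(svg, encoding="unicode")

# Round trip: any XML-aware tool can recover the molecule from the image file.
recovered = ET.fromstring(svg_text)
elements = [atom.get("elementType") for atom in recovered.iter(f"{{{CML_NS}}}atom")]
print(elements)
```

A renderer that only understands SVG simply skips the foreign-namespace content, while a chemistry-aware tool can pick out the elements in the CML namespace.
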

1 I notice that Internet Explorer 9 (both 32- and 64-bit versions) will display (and save) the above diagram if you click on it, but it cannot (yet) be inlined into the post, although the documentation implies it should.
2 The version below is the conventional JPG form (click on it to see the original 4x version).

Diagram displayed using JPG.

3. Historical note. Peter Murray-Rust and I have been promoting SVG for use in chemistry for 11+ years now. For one ancient page, see here. The syntax has decayed somewhat, but some of the diagrams still work!

Science publishers (and authors) please take note.

Monday, October 24th, 2011

I have for perhaps the last 25 years been urging publishers to recognise how science publishing could and should change. My latest thoughts are published in an article entitled “The past, present and future of Scientific discourse” (DOI: 10.1186/1758-2946-3-46). Here I take two articles, one published 58 years ago and one published last year, and attempt to reinvent some aspects. You can see the result for yourself (since this journal is laudably open access, and you will not need a subscription). The article is part of a special issue, arising from a one day symposium held in January 2011 entitled “Visions of a Semantic Molecular Future” in celebration of Peter Murray-Rust’s contributions over that period (go read all 15 articles on that theme in fact!).

Here I want to note just two features, which I have also striven to incorporate into many of the posts on this blog (which in one small regard I have attempted to formulate as an experimental test-bed for publishing innovations). Scalable-Vector-Graphics (SVG) emerged around the turn of the millennium as a sort of HTML for images. To my knowledge, no science publisher has yet made it an intrinsic part of their publishing process (although gratifyingly all modern browsers support at least a sub-set of the format). Until now (perhaps). Thus 10.1186/1758-2946-3-46 contains diagrams in SVG, but you will need to avoid the Acrobat version, and go straight to the HTML version to see them. However, what sparked my noting all of this here was the recent announcement by Amazon that they are adopting a new format for their e-books, which they call Kindle Format 8 or KF8 (the successor to their Mobi7 format). To quote: “Technical and engineering books are created more efficiently with Cascading Style Sheet 3 formatting, nested tables, boxed elements and Scalable Vector Graphics”. This is wrapped in HTML5 to be able to provide (inter alia) a rich interactive experience for the reader. In fairness, there is also the more open epub3 which strives for the same. Other features of HTML5 include embedded chemistry using WebGL and the same mechanisms are being used for the construction of modern chemical structure drawing packages.

It remains to be seen how much of all of this will be adopted by mainstream chemistry publishers. Here, we do get into something of a cyclic argument. I suspect the publishers will argue that few of the authors that contribute to their journals will send them copy in any of these new formats and that it would be too expensive for them to re-engineer these articles with little or no help from such authors. The chemistry researchers who do the writing (perhaps composition might be a better word?) might argue there is little point in adopting innovative formats if the publishers do not accept them (I will point out that my injection of SVG into the above article did have some teething problems). For example, you will not find SVG noted in any of the “instructions for authors” in most “high impact journals” (or, come to that, HTML5).

If one looks back over that 25-year period, in 1986 all chemistry journals were distributed exclusively on paper. My office shelves still show the scars of bearing the weight of all that paper. Move on 25 years, and all journals almost without exception are now distributed electronically. I suspect the outcome in many a reader’s hands is simply that they (rather than the publisher) now bear the printing costs themselves (despite or perhaps because of the introduction of electronic binders such as Mendeley). But it will only be when the article itself grows out of its printable constraints, and hops onto mobile devices such as Kindles and iPads in the promised (scientifically) interactive and data-rich form, that the true revolution will start taking place.

A final observation: you will not readily obtain the interactive features of 10.1186/1758-2946-3-46 on e.g. an iPad or Kindle because the Java-based Jmol is not supported on either. But Jmol has now been ported to Android, and it’s certainly one to watch.

Bonds.

Thursday, October 13th, 2011

Bonds are a good example of something all chemists think they can recognise when they see them. But they are also remarkably dependent on context. We are running a molecular modelling course at the moment, and I found myself explaining to someone how very context-sensitive they can be. I thought it might be useful to collect my thoughts here.

  1. The most primitive bond is the connection type. This is used in chemical informatics to define a connection table for a molecule, which is used by all the major chemical databases to index and hence search for molecules. It is also used by the InChI identifier to create the InChI key, and of course SMILES strings. The connection bond has no other properties (such as its bond order etc), but it is assumed to be covalent rather than ionic.
  2. The next is the display bond. This is used by chemical visualisation programs; it is normally created by the code based on very simple rules, such as how far apart the two (or more) atoms are. Such bonds are normally drawn with straight lines, of which there can be up to five (or six at a pinch) nowadays. There is however only a fuzzy convention for how non-integer bond orders are represented. A dashed line can be added (and it might be the only line for weaker types such as hydrogen bonds), but it’s clear this display convention is suffering at this stage.
    • Perhaps to keep the synthetic chemists happy, I should add two flavours to this category, the stereochemical display bond, which attempts to add a 3D context but in truth does this less than perfectly and the retrosynthetic bond. I will not dwell.
  3. Then there is what I call the mechanical bond. This is used in molecular mechanics force fields. It is a declared bond, i.e. you declare where you want the bond to be, and once that is done, it remains there (it is thus never broken). Each declaration is associated with (quadratic) force constants, which taken as a whole define the force field.
  4. Next comes the quantum chemical bond. This is defined by a wavefunction, which in turn tells us about the electron density. This, to be frank, can be a can of worms. There must be dozens of ways of interpreting the electron density in terms of a bond type. I have used just one of these on this blog, the ELF procedure, which gives an estimate of how many electrons are involved in any bond (and these are always non-integers). Books could be written about this topic, but I will mention just three varieties which indicate how confusing quantum bonds can become. These are the homo(aromatic) bond, which itself comes in two varieties, bond and no-bond types (DOI: 10.1021/jp026521l), bent bonds and transition state bonds. Phew!
  5. The quantum topological bond emerges from  Bader’s QTAIM procedure, which provides a formal topological framework for defining what a bond is. As I noted in earlier posts, it is controversial, since it does not always reflect what chemists might regard as a useful definition that helps them do chemistry.
  6. Finally (?), I could add Rydberg bonds, which are mysterious formations on excited state surfaces, and which can be extraordinarily long (> 500Å), thus defying application of simple distance rules as noted in type 2 above.
It is a given that the moment anyone tries to define boundaries and rules for bonds, people will argue against the scheme. But if you have your own type which is missing above, do let me know!
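
The "very simple rules" behind the display bond (type 2 above) can be sketched as a distance test. This is only an illustration of the general idea: the covalent radii are approximate literature values, the 0.4 Å tolerance is an assumption, and real visualisation programs each apply their own variants.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in Å (illustrative subset only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}
TOLERANCE = 0.4  # Å; assumed slack added to the sum of radii


def display_bonds(atoms):
    """Return index pairs (i, j) of atoms close enough to be drawn as bonded.

    `atoms` is a list of (element_symbol, (x, y, z)) with coordinates in Å.
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + TOLERANCE
        if math.dist(xyz_i, xyz_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Water at an idealised geometry (O-H 0.9572 Å, H-O-H about 104.5 degrees).
water = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.9572, 0.0, 0.0)),
    ("H", (-0.2400, 0.9266, 0.0)),
]
water_bonds = display_bonds(water)
print(water_bonds)
```

A rule of this kind draws the two O-H bonds and no H-H bond, but it would never draw anything for two atoms hundreds of Ångstroms apart, which is exactly why the Rydberg bonds of type 6 defeat it.
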

Steve Jobs and chemistry: a personal recollection.

Sunday, October 9th, 2011

Steve Jobs’ death on October 5th 2011 was followed by a remarkable number of tributes and reflections on the impact the company he founded has had on the world. Many of these tributes summarise the effect as a visionary disruption. Here I describe from my own perspective some of the disruptions to chemistry I experienced (for another commentary, see here).

Chemical diagram, circa 1983.

The diagram above originates in 1983, just before the impact of Jobs’ vision burst upon chemistry. It was published in one of the new generation of camera-ready journals, the objective being to reduce publication times from a typical 12-24 months down to around three months. Camera-ready meant that the authors had to prepare a photo-ready manuscript; the role of these journals was to photograph, print and publish. The diagram above was prepared using stencils and Rotring technical pens together with Letraset lettering. The snippet above would probably take an hour or two to draft; the diagrams for an entire article were probably about a week’s work. Imagine how much time would be needed for a 200 page PhD thesis (some of this time was occupied by rushing out to purchase more Letraset sheets because one had run out of, say, the letter r needed to represent the bromine in the above!). The diagram below was published in the same camera-ready journal in 1987.

Chemical diagram, circa 1987.

It was produced using Chemdraw on an Apple Macintosh computer introduced in 1984 (and which reached UK chemistry departments in 1985) and printed on an Apple laser printer. It would have taken perhaps 5 minutes to produce. More significantly, by copying and pasting (terms which need little explanation nowadays), one could re-use the diagram repeatedly as a template in a more complex scheme for little extra effort. You might argue that these two diagrams do not actually differ in quality that much (actually, the Apple-derived diagrams are of much higher quality than implied above, and the loss of quality is because the article has subsequently been scanned by the journal). But in fact the impact of Jobs’ Macintosh computer was far greater than just being able to produce nice chemical diagrams, because it also introduced chemists to disruptive new concepts, the consequences of which are still being felt today.

The first is the idea of the re-use of digital data, as mentioned above. Once one had a diagram drawn, one could use it to almost instantly derive other properties of the molecule. For example, the molecular weight or an atom connection table. This in turn could be used to start an online search. And it was the Macintosh that really bump-started the idea of online activities.

Although chemistry had started going online around 1980 (I remember a single terminal station enabling STN express online access to chemical abstracts being introduced then, and in fact computational chemists were already online around 1974), the idea of an entire department of researchers ALL being online in their lab or office was very much the result of introducing the Macintosh in 1985. It came with a network connector at no extra cost. This in turn allowed all owners of such a computer to connect online to the (then very expensive) laser printer, and as a by-product almost, to the rest of the world! I have described some of the disruption this introduced elsewhere. By around 1987, most of our Mac users were happily going online (it has to be said that owners of IBM PCs were rarely doing so at this time). That is one of the true legacies that Jobs’ disruptive technologies introduced to us chemists.

I am going to quote Samuel Butler now, writing in 1863: “I venture to suggest that … the general development of the human race to be well and effectually completed when all men, in all places, without any loss of time, at a low rate of charge, are cognizant through their senses, of all that they desire to be cognizant of in all other places. … This is the grand annihilation of time and place which we are all striving for, and which in one small part we have been permitted to see actually realised“.

Steve Jobs made a big contribution to that general development of the human race!

Hunt the charge: the Cheshire cat of chemistry

Thursday, September 29th, 2011

Charges in chemistry, like the grin on Lewis Carroll’s cat, can be mysterious creatures. Take for example the following structure, reported by Paul Lickiss and co-workers (DOI: 10.1039/b513203g).

A silenium cation.

A student of chemistry might be wondering what is going on, since this representation seems to “break the rules”. Thus there is a clear-cut pentavalent carbon atom, but even more mysteriously, there is a positive charge that seems to be floating uncertainly (much like Carroll’s Cheshire cat). Another Lewis famously introduced the concept of the covalent electron pair bond in 1916, and ever since then we tend to represent these types of bond with (straight) lines. If this convention is adhered to rigidly, then the carbon highlighted with the purple dot would have five lines to it, and hence 10 electrons, in violation of the octet rule for main group elements. Nowadays however, a line between two atoms is not necessarily interpreted as a Lewis structure, but more simply as representing a connection to be used in a connection table for indexing and searching the structure. So what IS the valency of this carbon, and where IS that charge located?

One starts by converting the above representation into more formally correct Lewis, or valence-bond structures.

Valence bond structures

These structures (there are more, but they are related by symmetry to those shown above) are bound, by the rules, to locate the bonds exactly, and hence allow one to infer where the charge is. The latter two emerge as different resonance forms of what we call Wheland intermediates. The former is the silicon equivalent of a tertiary carbocation. Quantum mechanics now tells us which of these (if any) is the most realistic. To do this, I invoke the ELF (electron localisation function) method, which identifies so-called synaptic basins in the function, and how much electron density is contained in each (there are of course many other ways of partitioning the electrons).

ELF basin integrations.

Basin 1 has 1.51, basin 2 has 2.39, basin 3 has 2.84 and basin 4 has 2.65e. Let us discuss the significance of these.

  1. Basins 1+2 (+ their symmetric equivalent) together contain 3.02 + 4.78 = 7.8e. Thus this carbon definitely is not hypervalent, since its octet is pretty much satisfied! But notice that this carbon is not a conventional so-called sp3 hybridized carbon, which has four equal two-electron bonds. This one has two bonds with significantly less than two electrons, and two with significantly more! A most unusual 4-coordinate carbon. Bond 1 has a Wiberg bond index of 0.48 and bond  2 is 1.29.
  2. By the same process, each Si atom integrates to 7.72e. If either were to be a silacation, that would imply only 6 electrons, which is clearly not the case for either  Si.
  3. How about the ortho, meta and para carbons of the phenyl ring? These are respectively 7.39, 7.64 and 7.44e. These are definitely a bit low, and taken together they constitute the equivalent of a six-valence electron carbocation. So it looks as if we have found our positive charge, which is delocalized on the phenyl ring in the manner of a Wheland intermediate (with more of the +ve charge on the o/p positions).
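
The electron counting in the three points above is simple arithmetic, and can be reproduced as follows. The basin populations and per-atom totals are the values quoted in the text; the variable names and the "deficit relative to an octet" framing are just one illustrative way of presenting them.

```python
# ELF basin populations (electrons) quoted in the text.
basin = {1: 1.51, 2: 2.39, 3: 2.84, 4: 2.65}

# The 4-coordinate carbon carries basins 1 and 2 plus their symmetric equivalents.
carbon_valence = 2 * basin[1] + 2 * basin[2]

# Per-atom totals quoted for the silicon atoms and the phenyl ring carbons.
atom_totals = {"Si": 7.72, "ortho C": 7.39, "meta C": 7.64, "para C": 7.44}
octet_deficit = {atom: round(8.0 - total, 2) for atom, total in atom_totals.items()}

print(f"carbon valence shell: {carbon_valence:.2f} e")
for atom, deficit in octet_deficit.items():
    print(f"{atom}: {deficit:.2f} e short of an octet")
```

Laid out this way, the ortho and para carbons show the largest shortfalls, in line with the Wheland-like distribution of the positive charge described in point 3.
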

Well, if this species is really a Wheland intermediate in which the cyclic conjugation of π-electrons is disrupted, the phenyl ring should not be aromatic. In fact, it turns out this ring IS recognisably (if not highly) aromatic. Its NICS(1) index value is -8.3ppm (benzene is ~-11 on the same scale). Exactly the same phenomenon was found for the supposed Wheland intermediate (which in fact turned out to be a transition state) identified as the mechanism of nitrosation of benzene using CF3COONO. Can all these disparate properties be reconciled?

Yet another way of looking at what is happening in this molecule is Natural bond orbital analysis (NBO). I have previously used this technique to probe the structure of DNA, and for identifying unexpected anomeric effects (amongst others). Applied to this system, it reveals four donor-acceptor interaction energies E(2), each of ~ 10.5 kcal/mol, between the four permutations of e.g. bond 1 acting as the donor, and bond 3 acting as an acceptor, a σ(Si-C) → σ*(C-C) interaction. The value of E(2) corresponds to almost a full anomeric effect (these tend to be ~15 kcal/mol if the donor is an O lone pair and the acceptor a C-O bond), and there are four of them after all! This particular conjugation is the one that makes the phenyl ring retain much of its aromaticity, i.e. having its cake and eating it.

Notice that individually, each of the effects I have described above is actually borrowed from fairly conventional introductory level organic chemistry (Lewis structures, the octet rule, cation stability, aromaticity, stereoelectronic/anomeric effects), and it shows how a combination of these in a single molecule can result in quite unusual properties.

Computers 1967-2011: a personal perspective. Part 3. 1990-1994.

Tuesday, July 12th, 2011

In 1986 or so, molecular modelling came of age. Richard Counts, who ran an organisation called QCPE (where I had already submitted several of the program codes I had worked on) had a few years before contacted me to ask for my help with his Roadshow. He had started these in the USA as a means of promoting QCPE, which was then the main repository of chemistry codes, and as a means of showing people how to use the codes. My task was to organise a speakers list, the venue being in Oxford in a delightful house owned by the university computing services. Access to VAX computers was provided, via VT100 terminals. Amazingly, these terminals could do very primitive molecular graphics (using delightfully named escape codes, which I learnt to manipulate).

An expert on the use of such codes was George Purvis, who hailed from the quantum theory project at the University of Florida at Gainesville. He had developed QUIPU for VAX/VT100 and together we had much fun setting things up for the participants at these QCPE workshops (which ran 1986-1990). During one session, George asked me whether I thought a properly implemented and reasonably cheap graphical user interface might have commercial potential in chemistry. Remember, the VAX/Evans&Sutherland PS390 system we had acquired in 1987 was NOT cheap. I must have encouraged him, since in 1990 George (now part of the CACHE, or computer assisted chemistry, group at the Tektronix corporation in Beaverton) had brought to market a “shrink-wrapped” system which did just that. This was, in many ways, well ahead of its time. It was based on a then state-of-the-art Macintosh computer, with a co-processor that could crunch floating point numbers quite fast (this was then very rare in so called personal computers, being reserved for supercomputers). It had a unique spherical trackball (almost a haptic device) for rotating molecules, and a liquid crystal polarized screen running at 120Hz (60Hz for the left eye, 60Hz for the right eye). Wearing polarized (passive) glasses, the stereo 3D effect via the 19″ screen (big for its day) was awe inspiring. What is more, two people could sit at it and both see molecules in stereo.

We managed to get a grant to purchase such a system, and I well remember taking it to the 1990 Oxford workshop (I had now taken over from Richard for the UK workshops) in the back of my car. This involved driving to my office on a Saturday, and heaving the thing out. A security guard saw me doing this and arrested me. After much ado, I was forced to take the CACHE to my office and told not to try that again. I waited 30 minutes, and took it out the back door (which nowadays has a black security camera watching it, but in those days was not guarded) and on to Oxford (checking for police sirens all the way). I think I made the trip to Oxford with this thing in the back of the car one more time, where I used it to give a poster at a conference, handing out the 3D glasses to anyone who expressed an interest (and reclaiming them rapidly if they posed no interesting question). I still fancy this was almost unique in the history of posters (which tend, even nowadays, to be printed on paper). Reflecting on this, I realise that my total aversion to Powerpoint probably dates from that time.

At this stage, I will tell you about some of the science we did with the remarkable stereographical 3D CACHE system. The first is our realisation that the Pirkle reagent exhibits a π-facial hydrogen bond from the OH group (DOI: 10.1039/C39910000765). Indeed, I notice that four of the posts here relate to this topic! Once you know what you are looking for, it's trivial to spot. But I recollect that the crystallographers who did the structure for us had failed to identify this unusual hydrogen bond; it took the CACHE, and its 3D glasses, for us to notice it.

But the really important breakthrough using CACHE was a different molecule, halofantrine (X=Y=Cl, DOI: 10.1039/C39940001135), an antimalarial pharmaceutical molecule.

Halofantrine.

At this stage, pharmaceutical companies were assiduously resolving chiral compounds into their enantiomers and testing each separately for biological activity. It had been noticed that whereas X=H, Y=Cl could NOT be resolved on a chiral column, replacing X=H by X=Cl suddenly made it possible to do so. But why? Well, in order to inspect this with the CACHE system, we asked for the crystal structure to be done. Back it came and Mike Webb and I sat inspecting the coordinates in full stereoscopic glory, as I recollect for about an hour, twiddling the viewpoint here and there. Each of us would take over the haptic trackball for 10-15 minutes, and we would then discuss what we saw. In one of those magical moments (I can assure you that shivers do run down one’s back at moments like this) we spotted that X=H had a strong hydrogen bond to the OH of another molecule, whereas X=Cl did not. Suppressing that C-H…O interaction forces the molecule to π-π stack instead, and this mode now enables it to better interact with the chiral column and hence resolve.

Halofantrine. Click for 3D.

Some of that magic is recreated above. If you click on the image, the coordinates will be loaded. Now that the relevant interaction is highlighted, it is so easy to spot you might wonder how anyone would have ever missed it! At any rate, shortly after writing this article, I sat down to write another on a new phenomenon called the World-Wide-Web. And to illustrate why the Web might become important, we highlighted halofantrine, and how the Web could carry such immediately visual information to its readers. This blog, in effect, is a direct descendant of that article (which, by the way, is still available in HTML form here). So, 3D graphics led to the (chemical) Web. What a tangled web indeed.

And to end with 3D. I live in hope that shortly, stereoscopic tablets will make an appearance. Given that the CACHE system noted above was heavy (it was a major struggle moving the monitor into the car, as described above), it will be an amazing evolution to see (almost) pocket-sized devices being carried around for the same purpose.

Computers 1967-2011: a personal perspective. Part 2. 1985-1989.

Friday, July 8th, 2011

As a personal retrospective of my use of computers (in chemistry), the Macintosh plays a subtle role.

  1. 1985: In the previous part, I noted how the Corvus Concept computer introduced a network hard drive (these still being too expensive for any one individual to afford one); the same principle applied to the 1985 Macintosh but now relating to the remarkable introduction of the laser printer. Until then, we chemists had used French curves (see previous post for an explanation), stencils or transfer lettering. It could be really tedious preparing a complex manuscript. Indeed, in some published articles of the time, one often saw hand-drawn chemical diagrams! So when the Macs arrived in 1985 (and, it has to be said, with the associated rise of ChemDraw at that time), it became imperative to network them so that everyone could have access to that precious laser printer (I still remember its network name, selected using the aptly named Chooser utility). Fortunately, the Mac came with a network port (unless I am mistaken, this was not an invariable feature of the IBM PC of the period). The network was created using a router (the first time I had come across one of these) from the Webster corporation in Australia, and our local electrician and his colleagues suddenly found themselves putting in AppleTalk cables everywhere. The poor chemists in the department not only had to get used to the mouse pointing device and unfloppy floppy disks, but to the idea of selecting network devices.
  2. 1987: We also acquired a Microvax with an Evans and Sutherland PS390 stereographics device at this time (more of which later in another post), and this came with an interesting bonus. Haggling had left about £25K over, which I decided to spend on a “grown up proper network”. This took the form of a thickwire ethernet of about 400m length. This stretched from the Microvax to the main college hub and thence the outside world (the “Internet”) and also to the close-by new network distribution cabinet where one end of the fibre optic cable was terminated (a bonus of all this was a Pirelli calendar, yet another story that must wait to be told). The fibre was strung to a catenary connecting to our other building (the idea being that it should be immune to lightning strikes. I had earlier explored the idea of a copper cable routed through tunnels connecting the two chemistry buildings, and spent a most interesting day down in those tunnels exploring. Therein lies yet another story for another day). Anyway, we now had a 10 megabit network (1000 times faster than the old PADs, which were still around) and this was connected to the Webster multigate routers (there were two of them now, one for each building). Our Macs all had the Internet!

    Apple, bless their hearts, distributed a control panel called MacTCP, and after I figured out what it all meant (network masks, Class C subnets and the like) I let everyone know that another network device had been added to join the laserprinter. Few IBM PC owners could boast this. At this stage, in truth, there was not that much people could connect to. Using MacTelnet, we could indeed access CAS Online, and print the search to a laserprinter. Using MacFTP, we could get files remotely from other FTP servers, and we started to acquire coordinate files for our molecular modelling. This in turn brought the realisation that the existing formats (Brookhaven protein databank files were the most common at the time) were not ideally suited for the purpose, and this could be seen as another spark for the CML (XML) work that started about nine years later. I also remember discovering that Apple Computer ran their own FTP server, where I could download the latest operating system disk images (Systems 5-7 as I recollect were obtained from this site). Things were free (but not always that easy) in those days. Our Macs ended up having the latest OS on them (in other words, they tended to crash a little less) almost as soon as it was released (and the Mac app store™, with its impending 4.6 Gbyte of OS X Lion about to be downloaded, is merely the latest example of this).

  3. 1987: Armed with all this experience, I was also asked to serve a two-year stint on the editorial advisory board of the Royal Society of Chemistry. At the time, what is now called supporting information was just starting, and of course it was going to be in print only. I suggested that perhaps the RSC should plan for the day when it could be online instead (the term online was not, I think, in that common use then, and electronic journals were also not yet common). I was still not happy that the only way to access that information would have to be FTP file transfers, but little did I realise then that Tim Berners-Lee at CERN already had a glimmer in his eye.
  4. 1988: The network on the Macs became a little more useful in this year, when a Macintosh email client called Eudora was released (in truth, I had already sent my first email in 1976, from CMU in Pittsburgh whilst on a visit there, to the person standing next to me!). The Microvax alluded to above provided the mail relay, and a few brave individuals started sending email (not that many people had email addresses in those days, mind you). The RSC was still grappling with this. I remember putting my email address at the top of an article submitted to them, and the copy-editor deleted it from the proofs as “unrecognised address form”. I re-instated it, they deleted it again. After some telephone negotiation, it remained (although the RSC assured me it would confuse the journal readers mightily). For the record, if you do manage to find it, it no longer works (being something like rzepa@vaxa.ch.ic.ac.uk. We were still learning how to do things properly then).
  5. 1989: I managed to convince the department that it would be useful to use computers for undergraduate teaching, and we opened a computer room with 12 Macs. I maintained them using a wonderful network utility called RevRDist for Mac, which cloned a master Mac onto the 12 clients, and made the task of adding new software very easy. There was always lots of good software for Macs in those early days. But to introduce students to how to use them, I did feel impelled to produce a four-page printed handout explaining it all. And I only did this once a year. Clearly again, the need to manage this better must have been in my mind.
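
For readers who never had to configure MacTCP, the network masks and Class C subnets mentioned in item 2 can be illustrated with a minimal sketch in modern Python, using the standard `ipaddress` module. The address block and the helper name `same_subnet` are hypothetical, chosen only for illustration; they are not the addresses the department actually used.

```python
import ipaddress

# Hypothetical address block, for illustration only.
# A "Class C" sized network has a 24-bit prefix, i.e. mask 255.255.255.0.
network = ipaddress.ip_network("192.168.1.0/24")

print(network.netmask)        # the dotted-quad network mask
print(network.num_addresses)  # how many addresses the block spans


def same_subnet(host: str, net: ipaddress.IPv4Network) -> bool:
    """Return True if the host address falls inside the given network.

    This is the question a network mask answers: is the destination
    local, or must the packet be handed to a router?
    """
    return ipaddress.ip_address(host) in net


print(same_subnet("192.168.1.42", network))  # address inside the block
print(same_subnet("192.168.2.42", network))  # address outside the block
```

The mask simply marks which leading bits of an address name the network; anything that differs in those bits has to go via a router, which is the decision each Mac had to make once MacTCP was set up.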

This post focuses on a very short period, because I wanted to get across how (in my mind at least) chemistry became globally networked for the (chemical) masses (or at least those with Apple Macintosh computers!), and the role the laserprinter Pippa played in this development.

Computers 1967-2011: a personal perspective. Part 1. 1967-1985.

Thursday, July 7th, 2011

Computers and I go back a while (44 years to be precise), and it struck me (with some horror) that I have been around them for ~62% of the modern computing era (Babbage notwithstanding, ~1940 is normally taken as the start of the modern computing era). So indulge me whilst I record this perspective from the viewpoint of the computers I have used over this 62% of the computing era.

  1. 1967: I encountered (but that term has to be qualified) my first computer, suggested to me as an alternative to running quarter marathons on Wimbledon Common at school by an obviously enlightened teacher! I wrote a program (in Algol) on paper tape, put the tape in an envelope, and sent it off to Imperial College (by van) to run, on an IBM 7094. A week later, printed output showed you had made a mistake on line 1 of the program. As I recollect, after about eight weeks of this, I got the program to run (and calculated π to 5 decimal places).
  2. 1970: By now I was a student (again at Imperial College), and was introduced to Fortran, then a radical new innovation in a chemistry degree. The delightfully named pufft compiler combined with the 7094 again, but this time with punched Hollerith cards as input and line printer output. I cannot remember what we were asked to program. I do remember that the punched cards were produced by a pool of punch card operators, working from code pages written by the programmer. Some students (not me!) thought it great fun to give their Fortran variables naughty names (which the punch card operators then refused to punch, thus causing the student to fail the course!).
  3. 1971: I really liked this programming lark, so when instant-turnaround was introduced that year, I decided to do a proper program. It was called NLADAD (yes, I was no good at names, even then), which stood for non-linear-analysis of donor-acceptor complexes. The idea was to take recorded NMR chemical shifts, and fit them to an equilibrium A+B ⇔ AB+B ⇔ AB2 using non-linear regression analysis. It must have been all of 200 lines of code (OK, I did not write the matrix inversion routine myself)! Instant turnaround was also great, you got to punch your own cards this time, and had the great excitement of feeding them into a card reader yourself. You then walked about 5 yards to the line printer and waited agog. No waiting one week, this was less than a minute. Or it would have been if the line printer did not paper-wreck every two minutes! (I might add that I have a dim recollection of a member of the computer centre staff standing by to recover these paper wrecks. He, by the way, is now the director of the ICT division here!).
  4. 1972: I am now doing a PhD (yes, boringly, yet again at Imperial College). I had found the one and only teletypewriter in the chemistry department. The crystallographers had secreted it away in their empire, but were very dismayed to find me occupying it constantly. Instant was now even more instant. I was now connecting to a time-sharing CDC 6400 computer, at the dazzling speed of 110 baud (or bits per second). The bytes were small too, by the way, since the CDC used 6 bits per byte. The result was that one did everything in UPPER CASE, since a 6-bit byte only allows 64 characters! My (still Fortran) programs reached probably 1000 lines of code now, and I was engrossed in deriving non-linear analyses of steady state chemical kinetics (about four different kinds of rate equation as I recollect). Ah, the joys of covariance analysis, and propagation of errors (I was in a kinetics lab, and all the other students plotted graphs on graph paper, and if pressed, plotted gradients of graphs, the so-called Guggenheim plots. I thought this the dark ages, but no-one volunteered to join me in this single teletypewriter room. Not even the attractive girls in the group. I was the geek of my time, no doubt about that. My kinetic analysis did however have one upside. It's how I met my wife-to-be a few years later!).
  5. 1974: PhD completed, I was now ready to go to Texas, where everything is bigger (and in terms of computers, slightly better, a CDC 6600 now and a 300 baud teletypewriter!). I had been computing now for seven years, and finally I actually got to SEE the device for the very first time. My mentor, Michael Dewar, had a sort of special relationship with the university. His students (and possibly only his students) were allowed to go into the depths of the machine room, where behind plate glass you could see the CDC 6600. I soon learnt how to get even closer. It was not particularly exciting however. I was more entranced with the CALCOMP flatbed plotter, which was located next to the 6600. Pictures at last (you probably do not want to know that to convert my kinetics in 4 above to pictures, I got quite expert in using a French curve. Look it up before you jump to conclusions). Part of the pact I negotiated was that I was only allowed into the inner sanctum at 03:00 in the morning (sic!). Still a geek then! Oddly, I was one of the few students in Dewar’s group using the CALCOMP, but at least we now had pictures of the molecules I was now calculating (using MINDO/3). To put the computing power into context, in 1975, Paul Weiner, another group member, announced that he had completed a full geometry optimisation of LSD, this having taken about 4 days to do on that over-worked 6600. The entire group went out to celebrate. Many pitchers of beer were drunk that night.

    Computer graphics from 1976.

  6. 1977: Back to Imperial, where we might have also now had a CDC 6600. And a Tektronix terminal running at the dizzying (hardwired end-to-end) speed of 9600 baud. I learnt to word process on this device (using a word processor, written in Fortran, although not by me) and I wrote three review articles by this means, using a fancy phototypesetter as the printer. My next program, STEK, probably ran to about 5000 lines of code, and it persuaded the Tektronix to plot all sorts of things, ball-and-stick diagrams, isometric potential surfaces, molecular orbitals, and the like (and jumping ahead, my experience with this program eventually led to CML, and Peter Murray-Rust, but that is indeed jumping ahead). I think I also managed to gain access to the Imperial machine room, that inner sanctum, yet again. But for reasons I will not go into, it was not as interesting as the Texan machine room.

    Chemistry Computer graphics, circa 1977-85.

  7. 1979: I encountered a Cray 1 computer, and probably also 8-bit bytes (and yes, lower case printer outputs) for the first time at the University of London Computing Centre.
  8. 1980: Remember that teletypewriter, encountered earlier? Well, these were now running at 2400 baud and I started to organise the deployment of a chemistry department computer network to sprinkle several such terminals around the department. The controller was a PAD, and in that year, we introduced STN ONLINE using this network. It was the first time we could search CAS online ourselves (previously, it was a service offered by the library). Literature searching has not been the same since.
  9. 1980: I finally again encountered a real computer, which one could happily listen to without creeping into machine rooms in the middle of the night. It was the data system on a Bruker Spectrospin 250 MHz superconducting NMR spectrometer. I had many adventures on this system. It was installed, by the way, on more or less the same day as the birth of my first daughter Joana. It had a hard drive (5 Mbytes as I recollect, and cost an absolute fortune, around £10,000 if I remember correctly).

    Combining Quantum mechanics and NMR.

    Computer graphics 1982, from NMR spectrometer.

  10. 1982: More networks, this time a curious computer known as the Corvus Concept, using a networked hard drive (possibly as big as 20 Mbytes by now), and a large screen.
  11. 1985: Enter the Mac (OK, the IBM PC came a little earlier, but it was not entrancing). Now one really had a tactile computer that made noises (not always nice), produced smoke signals occasionally, and ejected its floppy disk incessantly. Yet another revolution to cope with. As I type this, I look down on that Mac, which is still underneath my desk. Wonder if it's worth anything on eBay?

Well, a second consecutive blog, with (almost) no pictures or molecules. And I have only gotten to the halfway stage of my story. Better break off then.