Posts Tagged ‘chemical’

Impossible molecules.

Monday, April 1st, 2019

Members of the chemical FAIR data community have just met in Orlando (with help from the NSF, the US National Science Foundation) to discuss how such data is progressing in chemistry. A lot of themes are converging at the moment. Thus this article[1] extols the virtues of making raw NMR data available in natural product research, to which we added that such raw data should also be made FAIR (Findable, Accessible, Interoperable and Reusable) by adding rich metadata and then properly registering it so that it can be searched. These themes are combined in another article which has recently appeared.[2]

One of the speakers made a very persuasive case based in part on the following three molecules, which are discussed in the first article[1] (the compound numbers are taken from there). The question was posed at our meeting: why did the referees not query these structures? Part of the answer is to give referees access to the full/primary/raw NMR data (which they almost invariably do not currently have) to help them check the peaks, the purity and indeed the assignments. I am sure tools that do this automatically and routinely from such supplied data exist in industry (enabling exactly this kind of machine processing is what FAIR is designed for). Perhaps there are open-source versions available?

Compound: 17, 18, 19
MMFF94 steric energy (kJ/mol): 328[3], 348, 713

Here I suggest a particularly simple and rapid “reality check” which I occasionally use myself: compute the steric energy of the molecule using molecular mechanics. The method is essentially a summation of simple terms penalising deviations of bond lengths, bond angles and torsion angles from their ideal values, together with terms modelling non-bonded repulsions, dispersion attractions and electrostatic contributions. The first three are close to zero for an unstrained molecule (by definition). The last three can be negative or positive, but unless the molecule is protein-sized, they also do not depart far from zero. A suitable free tool that packages all this up is Avogadro.
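Schematically, the steric energy reported by the force field is a sum of the terms just listed. In the sketch below, E_vdW groups the non-bonded repulsion and dispersion terms, and E_other stands for the smaller extra terms (such as stretch-bend coupling and out-of-plane bending) that MMFF94 also contains; the grouping is illustrative rather than the exact MMFF94 functional form.

```latex
E_{\text{steric}} = E_{\text{stretch}} + E_{\text{bend}} + E_{\text{torsion}}
  + E_{\text{vdW}} + E_{\text{elec}} + E_{\text{other}}
```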

The procedure is as follows:

  1. Start from the ChemDraw representation of the molecule. If the publishing authors have been FAIR, you might be able to acquire it from their deposited data. Otherwise, redraw it yourself and save it as, e.g., a molfile or a ChemDraw .cdxml file.
  2. Open it in Avogadro, which will build a 3D model for you using the stereochemical information present in the ChemDraw file or molfile.
  3. In the E tool (at the top left of the Avogadro menu), select e.g. the MMFF94 force field. This is a good choice for “organic” molecules, for which the total steric energy of “normal” molecules is likely to be < 200 kJ/mol. Calculate the energy for your system; this normally takes less than one minute. The values obtained for the three molecules above are shown in the table. All three are well over 200 kJ/mol, which should set alarm bells ringing.
  4. A “more reasonable” structure for 17 is shown below. This has a steric energy of 152 kJ/mol, some 176 kJ/mol lower than the original structure. This does not of itself “prove” this alternative, but it is a starting point for showing it might be correct. Of course, misassigned but otherwise reasonable structures are unlikely to be revealed by the steric energy test. But impossible ones will probably always be flagged as such by this procedure.
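The alarm rule in step 3 is simple enough to apply in bulk once the energies are in hand. Below is a minimal Python sketch, assuming the MMFF94 steric energies have already been computed (e.g. in Avogadro) and are expressed in kJ/mol; some programs report kcal/mol, which must be multiplied by 4.184 first. The 200 kJ/mol threshold is the rule of thumb from the text, and the function names are hypothetical helpers, not part of Avogadro.

```python
# Hypothetical helpers for the MMFF94 "reality check" described above.
# Assumes the energies were computed elsewhere and are given in kJ/mol.
KCAL_TO_KJ = 4.184
STERIC_ALARM_KJ_PER_MOL = 200.0  # rule of thumb for "normal" organic molecules


def to_kj(energy_kcal_per_mol):
    """Convert an energy from kcal/mol to kJ/mol."""
    return energy_kcal_per_mol * KCAL_TO_KJ


def flag_suspicious(energies, threshold=STERIC_ALARM_KJ_PER_MOL):
    """Return the labels (in input order) whose steric energy exceeds the threshold."""
    return [label for label, energy in energies.items() if energy > threshold]


def strain_relief(original, alternative):
    """Lowering in steric energy (kJ/mol) on moving to an alternative structure."""
    return original - alternative


if __name__ == "__main__":
    # MMFF94 steric energies quoted in this post, kJ/mol.
    reported = {"17": 328.0, "18": 348.0, "19": 713.0, "17 (alternative)": 152.0}
    print(flag_suspicious(reported))
    print(strain_relief(reported["17"], reported["17 (alternative)"]))
```

Run on the values from the table, it flags 17, 18 and 19 but not the alternative structure for 17, and reports the 176 kJ/mol lowering quoted in step 4. As noted above, passing the check does not prove a structure correct.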

Postscript: Hot on the heels of writing this, the molecule Populusone came to my attention.[4] At first sight, it seems to have some of the attributes of an “impossible molecule”.

However, it has been fully characterised by X-ray analysis! The steric energy using the method above comes out at 384 kJ/mol, which is in the region of impossibility! This can be decomposed into the following components (kJ/mol): bond stretch 30, bend 51, torsion 32, van der Waals (including repulsions) 177 and electrostatics 87, plus some minor cross terms. The strain is spread across all the terms, although the van der Waals term, which captures the internal steric repulsions, is clearly the largest contributor. The C=C double bond is, however, hardly distorted, which is in its favour. Clearly a natural product can indeed accumulate unfavourable interactions, and this one must be close to holding the record for the most intrinsically unstable natural product known!
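Summing the quoted components is a useful sanity check on the decomposition. Assuming each value is rounded to the nearest kJ/mol, the five listed terms account for 377 of the 384 kJ/mol, so the minor cross terms contribute roughly 7 kJ/mol, and the van der Waals term alone is just under half of the total.

```latex
\begin{aligned}
E_{\text{stretch}} + E_{\text{bend}} + E_{\text{torsion}} + E_{\text{vdW}} + E_{\text{elec}}
  &= 30 + 51 + 32 + 177 + 87 = 377\ \text{kJ/mol} \\
E_{\text{cross}} &\approx 384 - 377 = 7\ \text{kJ/mol} \\
E_{\text{vdW}} / E_{\text{steric}} &\approx 177 / 384 \approx 0.46
\end{aligned}
```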

References

  1. J.B. McAlpine, S. Chen, A. Kutateladze, J.B. MacMillan, G. Appendino, A. Barison, M.A. Beniddir, M.W. Biavatti, S. Bluml, A. Boufridi, M.S. Butler, R.J. Capon, Y.H. Choi, D. Coppage, P. Crews, M.T. Crimmins, M. Csete, P. Dewapriya, J.M. Egan, M.J. Garson, G. Genta-Jouve, W.H. Gerwick, H. Gross, M.K. Harper, P. Hermanto, J.M. Hook, L. Hunter, D. Jeannerat, N. Ji, T.A. Johnson, D.G.I. Kingston, H. Koshino, H. Lee, G. Lewin, J. Li, R.G. Linington, M. Liu, K.L. McPhail, T.F. Molinski, B.S. Moore, J. Nam, R.P. Neupane, M. Niemitz, J. Nuzillard, N.H. Oberlies, F.M.M. Ocampos, G. Pan, R.J. Quinn, D.S. Reddy, J. Renault, J. Rivera-Chávez, W. Robien, C.M. Saunders, T.J. Schmidt, C. Seger, B. Shen, C. Steinbeck, H. Stuppner, S. Sturm, O. Taglialatela-Scafati, D.J. Tantillo, R. Verpoorte, B. Wang, C.M. Williams, P.G. Williams, J. Wist, J. Yue, C. Zhang, Z. Xu, C. Simmler, D.C. Lankin, J. Bisson, and G.F. Pauli, "The value of universally available raw NMR data for transparency, reproducibility, and integrity in natural product research", Natural Product Reports, vol. 36, pp. 35-107, 2019. https://doi.org/10.1039/c7np00064b
  2. A. Barba, S. Dominguez, C. Cobas, D.P. Martinsen, C. Romain, H.S. Rzepa, and F. Seoane, "Workflows Allowing Creation of Journal Article Supporting Information and Findable, Accessible, Interoperable, and Reusable (FAIR)-Enabled Publication of Spectroscopic Data", ACS Omega, vol. 4, pp. 3280-3286, 2019. https://doi.org/10.1021/acsomega.8b03005
  3. A.I. Savchenko, and C.M. Williams, "The Anti‐Bredt Red Flag! Reassignment of Neoveratrenone", European Journal of Organic Chemistry, vol. 2013, pp. 7263-7265, 2013. https://doi.org/10.1002/ejoc.201301308
  4. K. Liu, Y. Zhu, Y. Yan, Y. Zeng, Y. Jiao, F. Qin, J. Liu, Y. Zhang, and Y. Cheng, "Discovery of Populusone, a Skeletal Stimulator of Umbilical Cord Mesenchymal Stem Cells from Populus euphratica Exudates", Organic Letters, vol. 21, pp. 1837-1840, 2019. https://doi.org/10.1021/acs.orglett.9b00423

Managing (open) NMR data: a working example using Mpublish.

Monday, August 1st, 2016

In March, I posted from the ACS meeting in San Diego on the topic of “Research data: Managing spectroscopy-NMR”, and noted a talk by Mestrelab Research on how a tool called Mpublish, in the forthcoming release of their NMR analysis software MestreNova, could help. With that release now out, the opportunity arose to test the system.

I will start with a reminder that NMR data associated with a published article is (or should be) openly available: one should not need a subscription to the journal to access it (although one might need one to find it). Now, NMR data as it emerges from a spectrometer is complex, comprising a collection of (sometimes binary) proprietary files containing the measured free induction decays (FIDs). Turning this raw data into an interpretable NMR spectrum, the visual form of the data that so appeals to human beings, is non-trivial. It requires what may be highly sophisticated software, which in turn may well be a commercial product. Of course, there are also examples of best-of-breed non-commercial open software packages; indeed, early in its life-cycle MestreNova was known as MESTREC, before becoming a commercial product. Could one combine open and fully functional NMR data, with no loss of information from the original instrument, with the ability to apply top-quality software to its analysis, all in an open manner? What follows demonstrates how Mpublish achieves this.

  1. Open the URL data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 in a browser.
  2. This action queries the metadata deposited with DataCite for the DOI 10.14469/hpc/1087 and retrieves the first instance of any file associated with that dataset that has the format type chemical/x-mnpub. You can view this metadata directly by invoking just data.datacite.org/10.14469/hpc/1087, where you will find both the mnpub and mnova formats listed. A command such as data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 allows file retrieval to be incorporated into automated workflows based just on the DOI and the desired media type. Note my parenthetical comment above about finding data; here you only need its DOI to retrieve it!
  3. The URL above downloads a small text file with the suffix .mnpub which contains in essence two components:

    • A URL pointing directly to an .mnova file at the repository for which the doi has been issued
    • A signature key, used to verify that the public key of the publisher (the data repository in this instance) was counter-signed by Mestrelab.
  4. Now download and install the application program (for the purpose of this demonstration, ignore any requests to license the program and simply use it unlicensed), and open the .mnpub file with it; you should see the display below. The application program checks the signature key and, if it is valid, proceeds to download the full data file (a .mnova file in this case), then analyses and displays it. The data is fully active; it can be manipulated and analysed. In the picture below, the red arrow points to the state of the license, in this case not present.
  5. It is also possible to apply this procedure to the raw data as it emerges from the (Bruker) spectrometer, compressed into a .zip archive. The MestreNova software will automatically process the contents by applying various default parameters, although the result may not correspond exactly to that in, e.g., the equivalent .mnova file (which may have had specific parameters applied).
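Because the retrieval in steps 1 and 2 depends only on the DOI and the desired media type, it is easy to script. The following Python sketch simply reproduces the URL pattern described above. The https:// scheme and the helper names are assumptions, and the 2016-era data.datacite.org service may since have changed or been retired, so treat this as an illustration of the pattern rather than a supported client.

```python
from urllib.request import urlopen

# Host and path pattern as described in this post (2016). The scheme is an
# assumption, and the service may since have changed or been retired.
DATACITE_DATA_HOST = "https://data.datacite.org"


def datacite_media_url(doi, media_type=None):
    """URL requesting the first file of a given media type linked to a DOI.

    With no media type, the URL points at the DOI's metadata record instead.
    """
    if media_type is None:
        return f"{DATACITE_DATA_HOST}/{doi}"
    return f"{DATACITE_DATA_HOST}/{media_type}/{doi}"


def fetch_by_doi(doi, media_type, timeout=30):
    """Download the requested file as bytes (requires network access)."""
    with urlopen(datacite_media_url(doi, media_type), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Builds the URL only; no download is attempted here.
    print(datacite_media_url("10.14469/hpc/1087", "chemical/x-mnpub"))
```

A workflow would save the bytes returned by fetch_by_doi as a .mnpub file and open it in MestreNova as in step 4.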

It is my hope that anyone who records NMR data and processes it using software such as MestreNova will now consider using the mechanism above to accompany their submitted articles, rather than just automatically pasting a static image of the spectrum into a PDF file as "supporting information". This is part of what is meant by research data management (RDM).

One cannot help but note that many types of scientific instrument nowadays come with bespoke software for analysing the data they produce. Very often this software is unavailable to anyone who has not purchased the instrument itself. To make the data available to others, the processed data and its visual interpretation often have to be reduced, with much consequent information loss, to a lowest common denominator format such as Acrobat/PDF. Here we see a mechanism for avoiding any such information loss whilst enabling, for that dataset only, the full potential for (re)analysing the data. It will be interesting to see if other examples of this model or its equivalent emerge in the near future.


Global initiatives in research data management and discovery: searching metadata.

Monday, March 7th, 2016

The upcoming ACS national meeting in San Diego has a CINF (Division of Chemical Information) session entitled "Global initiatives in research data management and discovery". I highlight here just one slide from my contribution to this session, which addresses its discovery aspect.

Data, if you think about it, is rarely discoverable other than by intimate association with a narrative or journal article. Even then, the standard procedure is to identify the article itself as being of interest, and then to dig out the "supporting information", which normally takes the form of a single paginated PDF document. If you are truly lucky, you might also get a CIF file (for crystal structures). But such data has little life of its own outside its parent, the article. Put another way, it has no metadata it can call its own (metadata being data about an object, in this case research data). An alternative is to try to find the data by searching conventional databases such as CAS, Beilstein/Reaxys or CSD, and there of course the searches can be very precise. But (someone) has to pay the bills for such accessibility.

We are now starting to see quite different solutions to finding data (the F in FAIR data, the other letters representing accessibility, interoperability and re-usability). These solutions build metadata in from the outset, rather than adding it as an afterthought in the form of a commercial product. The collection of metadata is part of the overall process called RDM, or research data management, and is perhaps its most important part. In exchange for supplying metadata about one's data, one gets back a "receipt" in the form of a persistent identifier for the data, most commonly a DOI. The agency that issues the DOI also undertakes to look after the donated metadata and to make it searchable. The table below shows eight searches of such metadata, one example of how to obtain statistics on the usage of the data, and one search that finds repositories containing such data.

Search queries enabled by the use of metadata in data publication
# | Search query* | Instances retrieved
1 | http://search.datacite.org/ui?q=alternateIdentifier:InChIKey:* | InChI key
2 | http://search.datacite.org/ui?q=alternateIdentifier:InChI:* | InChI identifier
3 | http://search.datacite.org/ui?q=alternateIdentifier:InChIKey:CULPUXIDFLIQBT-UHFFFAOYSA-N | InChI key CULPUXIDFLIQBT-UHFFFAOYSA-N
4 | http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChIKey:* | ORCID 0000-0002-8635-8390 AND (Boolean) InChI key
5 | http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChI:InChI=1S/C9H11N5O3* | ORCID 0000-0002-8635-8390 AND (Boolean) InChI string 1S/C9H11N5O3, with * as a wildcard
6 | http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469 | Has content media, for publisher 10.14469 (Imperial College)
7 | http://search.datacite.org/ui?q=format:chemical/x-* | Data format type chemical/x-*
8 | http://search.datacite.org/api?&q=prefix:10.14469&fq=alternateIdentifier:InChIKey:*&fl=doi,title,alternateIdentifier&wt=json&rows=15 (alternatively http://api.labs.datacite.org/works?q=prefix:10.14469+AND+alternateIdentifier:InChIKey:*) | First 15 hits in JSON format, batch query mode
9 | http://stats.datacite.org/?fq=datacentre_facet:"BL.IMPERIAL – Imperial College London" | Resolution statistics per month for publisher 10.14469 (Imperial College)
10 | http://service.re3data.org/search?query=&subjects[]=31 | Research data repository search for Chemistry (135 hits)

In this instance (query 7), the three MIME media types are chemical/x-wavefunction, chemical/x-gaussian-checkpoint and chemical/x-gaussian-log. See[1] for chemical MIME (Multipurpose Internet Mail Extensions).
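Query 8 is the one that points towards automated use. A minimal Python sketch of building such a batch query, and of pulling DOIs out of the JSON it returns, is given below. It assumes the 2016 Solr-style endpoint shown in the table (which may since have been retired) and assumes the response follows Solr's usual {"response": {"docs": [...]}} layout; the function names are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint taken from query 8 in the table (2016); it may no longer be live.
DATACITE_SEARCH_API = "http://search.datacite.org/api"


def build_batch_query(prefix, filter_query, fields, rows=15):
    """Build a query-8-style URL: records under a DOI prefix that match a filter."""
    params = {
        "q": f"prefix:{prefix}",
        "fq": filter_query,
        "fl": ",".join(fields),
        "wt": "json",
        "rows": rows,
    }
    # urlencode percent-escapes ':', '*' and ','; a standard server decodes them.
    return f"{DATACITE_SEARCH_API}?{urlencode(params)}"


def describe_query(url):
    """Decode a built URL back into its parameters, for checking a query by eye."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


def extract_dois(solr_json):
    """Collect DOIs from a response ASSUMED to use Solr's response/docs layout."""
    docs = solr_json.get("response", {}).get("docs", [])
    return [doc["doi"] for doc in docs if "doi" in doc]


if __name__ == "__main__":
    url = build_batch_query("10.14469", "alternateIdentifier:InChIKey:*",
                            ["doi", "title", "alternateIdentifier"])
    print(url)
    print(describe_query(url))
```

Keeping URL construction separate from parsing means the same helpers could feed the kind of more sophisticated batch systems mentioned below.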


Anyone familiar with the standard ways of finding data (CAS, CSD, Reaxys) will appreciate that the above does not yet have the finesse to find, e.g., substructures of chemical structures, synthetic procedures or molecular properties. I include it here primarily to show some of the potential such systems have, and to note in particular that the batch query capability of this infrastructure could indeed be used in the future to construct much more sophisticated systems. Oh, and to the end-user at least, the searches shown above do not require institutional licenses. Both the data and its metadata are free, mostly with a CC0 or CC BY 3.0 license for re-use (the R of FAIR).

If anything more of interest related to this topic emerges at the ACS session, I will report back here.

References

  1. H.S. Rzepa, P. Murray-Rust, and B.J. Whitaker, "The Application of Chemical Multipurpose Internet Mail Extensions (Chemical MIME) Internet Standards to Electronic Mail and World Wide Web Information Exchange", Journal of Chemical Information and Computer Sciences, vol. 38, pp. 976-982, 1998. https://doi.org/10.1021/ci9803233

The 5σ-confidence level: how much chemistry achieves this?

Saturday, July 5th, 2014

I was lucky enough to attend the announcement made in 2012 of the discovery of the Higgs boson. It consisted of an hour-long talk mostly about statistics, and about how the particle physics community can only claim a discovery when their data have achieved a 5σ confidence level. This represents a 1 in 3.5 million probability of the result occurring by chance. I started thinking: how much chemistry is asserted at that level of confidence? Today, I read Steve Bachrach’s post on the structure of citrinalin B and how “use of Goodman’s DP4 method indicates a 100% probability that the structure of citrinalin B is (the structure below)”. Wow, that is even higher than the physicists. Of course, 100% must have been obtained by rounding something like 99.7% (3σ is 99.73%), and this is one number that should never be so rounded!

But there was one aspect of this for which I did want a confidence level: the absolute configuration of citrinalin B. Reading the article Steve quotes[1], one sees this aspect is attributed to ref 5[2], dating from 2005. There the configuration was assigned on the basis of “comparison of the electronic circular dichroism (ECD) spectra for 1 and 2 with those of known spirooxindole alkaloids”. However, this method can fail[3]. Also, one finds “comparison of the vibrational circular dichroism (VCD) spectra of 1 with those of model compounds”[2]. Nowadays, one would argue that model compounds are unnecessary: why not measure and compute the VCD of the actual compound? Even a determination using the Flack crystallographic method can occasionally be wrong![4] This leads me to ask what typical confidence levels might be for these three techniques, and whether improving instrumentation means that the confidence level rises with time. OK, I am going to guess these.

  1. I think the confidence level for assigning absolute configurations on the basis of ECD analogy with other compounds is the lowest of all the methods. Around 1σ or 68.3% (and this mostly from additional information such as the chemical transforms performed from starting materials of known absolute configuration).
  2. VCD is higher. If performed on the actual compound, I think it can be as high as 2-3σ or 95.5-99.7%. It is difficult to know how much of this certainty is lost by using only model compounds.
  3. Flack analysis (of anomalous X-ray scattering)[5] is probably also at 2-3σ; I suggest, however, that a fair bit of uncertainty not included in the 2-3σ probably arises from analysing a tiny crystal (1 µg) grown from a sample perhaps 10,000 times larger by weight.
  4. And of course, combining the results of multiple independent experiments reduces the overall uncertainty.
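The σ-to-percentage conversions used in this post (1σ = 68.3%, 3σ = 99.73%, 5σ = 99.99994%, and the one-sided "1 in 3.5 million") all follow from the normal distribution and can be reproduced with Python's standard library, as sketched below. The combination step illustrates point 4 only under a loudly stated assumption: that the methods fail independently, which is optimistic if, say, two of them share the same conformational model. The 1σ and 3σ inputs are simply the guesses above, not measured values, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_confidence(n_sigma):
    """Fraction of a normal distribution lying within +/- n_sigma of the mean."""
    return math.erf(n_sigma / math.sqrt(2))


def one_sided_tail(n_sigma):
    """Chance of a fluctuation beyond +n_sigma (the '1 in 3.5 million' convention)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


def combined_failure(sigmas):
    """Probability that every method is wrong, ASSUMING they fail independently."""
    p = 1.0
    for s in sigmas:
        p *= 1.0 - two_sided_confidence(s)
    return p


def sigma_equivalent(p_failure):
    """Convert a two-sided failure probability back into an equivalent sigma."""
    return NormalDist().inv_cdf(1.0 - p_failure / 2.0)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} sigma: {100 * two_sided_confidence(n):.5f}%")
    print(f"5 sigma, one-sided: 1 in {1 / one_sided_tail(5):,.0f}")
    # Hypothetical combination of the guesses above: ECD at 1 sigma,
    # VCD and Flack at 3 sigma each, treated as independent.
    p = combined_failure([1, 3, 3])
    print(f"combined: {sigma_equivalent(p):.2f} sigma")
```

Under these assumptions the combined result lands between 4.5σ and 5σ, still short of the physicists' threshold.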

I am not casting any doubt on the 2005 assignment of absolute configuration on which that of citrinalin B is based. I have no grounds to think it is wrongly assigned. I am merely suggesting that in 2014, one should be able to achieve an even greater confidence level, and to do what the physicists do: estimate the confidence level attained. I wonder how much chemistry would match the physicists' 5σ confidence level (99.99994%)?

References

  1. E.V. Mercado-Marin, P. Garcia-Reynaga, S. Romminger, E.F. Pimenta, D.K. Romney, M.W. Lodewyk, D.E. Williams, R.J. Andersen, S.J. Miller, D.J. Tantillo, R.G.S. Berlinck, and R. Sarpong, "Total synthesis and isolation of citrinalin and cyclopiamine congeners", Nature, vol. 509, pp. 318-324, 2014. https://doi.org/10.1038/nature13273
  2. T. Mugishima, M. Tsuda, Y. Kasai, H. Ishiyama, E. Fukushi, J. Kawabata, M. Watanabe, K. Akao, and J. Kobayashi, "Absolute Stereochemistry of Citrinadins A and B from Marine-Derived Fungus", The Journal of Organic Chemistry, vol. 70, pp. 9430-9435, 2005. https://doi.org/10.1021/jo051499o
  3. F. Cherblanc, Y. Lo, E. De Gussem, L. Alcazar‐Fuoli, E. Bignell, Y. He, N. Chapman‐Rothe, P. Bultinck, W.A. Herrebout, R. Brown, H.S. Rzepa, and M.J. Fuchter, "On the Determination of the Stereochemistry of Semisynthetic Natural Product Analogues using Chiroptical Spectroscopy: Desulfurization of Epidithiodioxopiperazine Fungal Metabolites", Chemistry – A European Journal, vol. 17, pp. 11868-11875, 2011. https://doi.org/10.1002/chem.201101129
  4. F.L. Cherblanc, Y. Lo, W.A. Herrebout, P. Bultinck, H.S. Rzepa, and M.J. Fuchter, "Mechanistic and Chiroptical Studies on the Desulfurization of Epidithiodioxopiperazines Reveal Universal Retention of Configuration at the Bridgehead Carbon Atoms", The Journal of Organic Chemistry, vol. 78, pp. 11646-11655, 2013. https://doi.org/10.1021/jo401316a
  5. H.D. Flack, and G. Bernardinelli, "The use of X‐ray crystallography to determine absolute configuration", Chirality, vol. 20, pp. 681-690, 2007. https://doi.org/10.1002/chir.20473

Internet Archaeology: Blasts from the past.

Friday, October 11th, 2013

In 1993-1994, when the Web (synonymous in most minds now with the Internet) was still young, the pace of progress was so rapid that some wag worked out that one “web-year” was like a dog-year, worth about 7 years of normal human time. So in this respect, 1994 is now some 133 web-years ago. Long enough for an archaeological excavation.

And so it was that I came across two Web-pages that have suddenly acquired a topical significance:

  1. http://www.ariadne.ac.uk/issue1/clic
  2. http://doi.org/10.1080/13614579509516846[1]

Their topicality arises in part from, e.g., http://www.rsc.org/AboutUs/News/PressReleases/2013/RSC-announces-chemical-sciences-repository.asp, where the RSC seeks community support to help curate the data we as scientists produce.

Some of my recent posts (this one on dual-publisher models and this one on publishing procedures) also pertain to this, and Peter Murray-Rust is constantly blogging on the topic (see this for the latest).

Perhaps 2013 will indeed be the year of data! 

References

  1. D. James, B.J. Whitaker, C. Hildyard, H.S. Rzepa, O. Casher, J.M. Goodman, D. Riddick, and P. Murray‐Rust, "The case for content integrity in electronic chemistry journals: The CLIC project", New Review of Information Networking, vol. 1, pp. 61-69, 1995. https://doi.org/10.1080/13614579509516846

Aromaticity in the benzidine-like π-complex formed from PhNHOPh.

Saturday, January 19th, 2013

The transient π-complex formed during the “[5,5]” sigmatropic rearrangement of protonated N,O-diphenyl hydroxylamine can be (formally) represented as below, namely the interaction of a six-π-electron aromatic ring (the phenoxide anion 2) with a four-π-electron phenyl dication-anion pair 1. Can one analyse this interaction in terms of aromaticity?


I showed previously that the interaction between these two components involves the stabilising overlap (donation) of a filled orbital on 2 with an empty (acceptor) orbital on the dication-anion pair 1. So what does the interaction of a six-electron (and hence 4n+2 Hückel aromatic) donor ring with a four-electron (and hence formally Hückel antiaromatic) acceptor ring lead to? To find out, I carried out a QTAIM analysis of the ring and bond critical points in the topology of the computed electron density of the complex, and then evaluated the NICS (nucleus-independent chemical shift) NMR probe at these points. First, the QTAIM analysis. Green = bond critical points, red = ring and blue = cage.


The NMR analysis (ωB97XD/6-311G(d,p)/SCRF=water) is shown below. This is for a closed shell wavefunction, which does not include contributions from any open shell biradical singlet.


  1. This is the ring centroid of the phenoxide anion 2, and would normally be expected to show a highly diatropic NICS value indicative of ring aromaticity. The value computed for the complex is -0.9 ppm, which is not aromatic!
  2. This is the ring centroid of the (nominally) antiaromatic 1, and has a value of -5.9 ppm (mildly aromatic; benzene itself on this scale is about -10 ppm). Neither ring behaves as might have been expected before formation of the π-complex.
  3. The remaining points all lie in the plane between the two rings; they are unique to the π-complex itself. Point 3 has a NICS of -14.0 ppm; it is located approximately at the centroid of the C=O and C=N bonds.
  4. This point has a NICS of -17.1 ppm, making it the most highly diatropic of the seven computed.
  5. This point (-12.1 ppm) and
  6. the next (-13.2 ppm) are ring points lying between the 3,3′ carbons of either ring.
  7. This point is the (defining) centroid of the whole complex and has NICS -15.7 ppm.

This reveals that neither individual ring of the complex sustains a diatropic ring current, but that the region between the two rings, which defines the π-complex itself, is very highly diatropic. The simplest way of looking at it is that bringing the two rings together has created an aromatic complex (I stress again that this is the closed-shell picture of this system; allowing partial biradical character may change it). To illustrate this holistic aspect, I show below the most stable of the π-MOs (this MO in fact resembles to a remarkable degree the lowest π-MO of ferrocene, which can be used to illustrate the 18-electron filled shell of the iron at its centre). In fact, this is one of seven π-MOs that can be identified, making the system a 14-π-electron (and certainly at least a 10-π-electron) aromatic (the extra electrons come from the oxygen of 2 and the C=N region of 1).
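The electron counting behind this argument is Hückel's 4n+2 versus 4n rule. The tiny sketch below (a hypothetical helper, not part of any analysis package) applies that formal rule to the counts used in this post. It is bookkeeping only: a formal count says nothing about the NICS values that actually diagnose aromaticity here, and applying it to a two-ring complex is a simplification.

```python
def huckel_class(n_pi_electrons):
    """Formal Hueckel label for a cyclic conjugated system with n pi electrons.

    4n+2 electrons -> "aromatic"; 4n electrons -> "antiaromatic".
    """
    if n_pi_electrons <= 0 or n_pi_electrons % 2 != 0:
        raise ValueError("expected a positive, even pi-electron count")
    return "aromatic" if n_pi_electrons % 4 == 2 else "antiaromatic"


if __name__ == "__main__":
    # Counts discussed in this post: phenoxide 2, dication-anion pair 1, the complex.
    for label, count in [("phenoxide 2", 6), ("dication-anion pair 1", 4),
                         ("complex (lower bound)", 10), ("complex", 14)]:
        print(f"{label}: {count} pi electrons -> {huckel_class(count)}")
```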


It is most amusing (which is how Michael Dewar might have stated it) that such an unpretentious molecule as PhNHOPh could reveal such surprises. It is also noteworthy that Dewar championed the concept of using aromaticity to determine selection rules for pericyclic reactions, and so he would perhaps have appreciated that the π-complex he suggested for the benzidine pericyclic rearrangement might have its own unique aromatic character.

Computers 1967-2011: a personal perspective. Part 3. 1990-1994.

Tuesday, July 12th, 2011

In 1986 or so, molecular modelling came of age. A few years before, Richard Counts, who ran an organisation called QCPE (to which I had already submitted several of the program codes I had worked on), had contacted me to ask for my help with his Roadshow. He had started these in the USA as a means of promoting QCPE, then the main repository of chemistry codes, and of showing people how to use the codes. My task was to organise a speakers list, the venue being a delightful house in Oxford owned by the university computing services. Access to VAX computers was provided via VT100 terminals. Amazingly, these terminals could do very primitive molecular graphics (using delightfully named escape codes, which I learnt to manipulate).

An expert on the use of such codes was George Purvis, who hailed from the quantum theory project at the University of Florida at Gainesville. He had developed QUIPU for VAX/VT100 and together we had much fun setting things up for the participants at these QCPE workshops (which ran 1986-1990). During one session, George asked me whether I thought a properly implemented and reasonably cheap graphical user interface might have commercial potential in chemistry. Remember, the VAX/Evans&Sutherland PS390 system we had acquired in 1987 was NOT cheap. I must have encouraged him, since in 1990 George (now part of the CACHE, or computer assisted chemistry, group at the Tektronix corporation in Beaverton) had brought to market a “shrink-wrapped” system which did just that. This was, in many ways, well ahead of its time. It was based on a then state-of-the-art Macintosh computer, with a co-processor that could crunch floating point numbers quite fast (this was then very rare in so-called personal computers, being reserved for supercomputers). It had a unique spherical trackball (almost a haptic device) for rotating molecules, and a liquid crystal polarized screen running at 120Hz (60Hz for the left eye, 60Hz for the right eye). Wearing polarized (passive) glasses, the stereo 3D effect via the 19″ screen (big for its day) was awe-inspiring. What is more, two people could sit at it and both see molecules in stereo.

We managed to get a grant to purchase such a system, and I well remember taking it to the 1990 Oxford workshop (I had now taken over from Richard for the UK workshops) in the back of my car. This involved driving to my office on a Saturday, and heaving the thing out. A security guard saw me doing this and arrested me. After much ado, I was forced to take the CACHE back to my office and told not to try that again. I waited 30 minutes, and took it out the back door (which nowadays has a black security camera watching it, but in those days was not guarded) and on to Oxford (checking for police sirens all the way). I think I made the trip to Oxford with this thing in the back of the car one more time, using it to present a poster at a conference, handing out the 3D glasses to anyone who expressed an interest (and reclaiming them rapidly if they posed no interesting question). I still fancy this was almost unique in the history of posters (which tend, even nowadays, to be printed on paper). Reflecting on this, I realise that my total aversion to PowerPoint probably dates from that time.

At this stage, I will tell you about some of the science we did with the remarkable stereographical 3D CACHE system. The first is our realisation that the Pirkle reagent exhibits a π-facial hydrogen bond from the OH group (DOI: 10.1039/C39910000765). Indeed, I notice that four of the posts here relate to this topic! Once you know what you are looking for, it's trivial to spot. But I recollect that the crystallographers who did the structure for us had failed to identify this unusual hydrogen bond; it took the CACHE, and its 3D glasses, for us to notice it.

But the really important breakthrough using CACHE was a different molecule, halofantrine (X=Y=Cl, DOI: 10.1039/C39940001135), an antimalarial drug.

Halofantrine.

At this stage, pharmaceutical companies were assiduously resolving chiral compounds into their enantiomers and testing each separately for biological activity. It had been noticed that whereas X=H, Y=Cl could NOT be resolved on a chiral column, replacing X=H by X=Cl suddenly made it possible to do so. But why? Well, in order to inspect this with the CACHE system, we asked for the crystal structure to be done. Back it came and Mike Webb and I sat inspecting the coordinates in full stereoscopic glory, as I recollect for about an hour, twiddling the viewpoint here and there. Each of us would take over the haptic trackball for 10-15 minutes, and we would then discuss what we saw. In one of those magical moments (I can assure you that shivers do run down one’s back at moments like this) we spotted that X=H had a strong hydrogen bond to the OH of another molecule, whereas X=Cl did not. Suppressing that C-H…O interaction forces the molecule to π-π stack instead, and this mode now enables it to better interact with the chiral column and hence resolve.

Halofantrine.

Some of that magic is recreated above. If you click on the image, the coordinates will be loaded. Now that the relevant interaction is highlighted, it is so easy to spot that you might wonder how anyone could ever have missed it! At any rate, shortly after writing this article, I sat down to write another on a new phenomenon called the World-Wide-Web. And to illustrate why the Web might become important, we highlighted halofantrine, and how the Web could carry such immediately visual information to its readers. This blog, in effect, is a direct descendant of that article (which, by the way, is still available in HTML form here). So, 3D graphics led to the (chemical) Web. What a tangled web indeed.

And to end with 3D. I live in hope that shortly, stereoscopic tablets will make an appearance. Given that the CACHE system noted above was heavy (it was a major struggle moving the monitor into the car, as described above), it will be an amazing evolution to see (almost) pocket sized devices being carried around for the same purpose.

Computers 1967-2011: a personal perspective. Part 2. 1985-1989.

Friday, July 8th, 2011

As a personal retrospective of my use of computers (in chemistry), the Macintosh plays a subtle role.

  1. 1985: In the previous part, I noted how the Corvus Concept computer introduced a network hard drive (these still being too expensive for any one individual to afford); the same principle applied to the 1985 Macintosh, but now relating to the remarkable introduction of the laser printer. Until then, we chemists had used French curves (see previous post for an explanation), stencils or transfer lettering. It could be really tedious preparing a complex manuscript. Indeed, in some published articles of the time, one often saw hand-drawn chemical diagrams! So when the Macs arrived in 1985 (along, it has to be said, with the associated rise of ChemDraw at that time), it became imperative to network them so that everyone could have access to that precious laser printer (I still remember its network name, selected using the aptly named Chooser utility). Fortunately, the Mac came with a network port (unless I am mistaken, this was not a standard feature of the IBM PC of the period). The network was created using a router (the first time I had come across one of these) from the Webster corporation in Australia, and our local electrician and his colleagues suddenly found themselves putting in AppleTalk cables everywhere. The poor chemists in the department not only had to get used to the mouse pointing device and unfloppy floppy disks, but also to the idea of selecting network devices.
  2. 1987: We also acquired a Microvax with an Evans and Sutherland PS390 stereographics device at this time (more of which later in another post), and this came with an interesting bonus. Haggling had left about £25K over, which I decided to spend on a “grown up proper network”. This took the form of a thickwire ethernet of about 400m length. This stretched from the Microvax to the main college hub and thence the outside world (the “Internet”), and also to the nearby new network distribution cabinet where one end of the fibre-optic cable was terminated (a bonus of all this was a Pirelli calendar, yet another story that must wait to be told). The fibre was strung on a catenary connecting to our other building, the idea being that it would be immune to lightning strikes (I had earlier explored the idea of a copper cable routed through tunnels connecting the two chemistry buildings, and spent a most interesting day down in those tunnels exploring; therein lies yet another story for another day). Anyway, we now had a 10 megabit network (1000 times faster than the old PADs, which were still around) and this was connected to the Webster multigate routers (there were two of them now, one for each building). Our Macs all had the Internet!

    Apple, bless their hearts, distributed a control panel called MacTCP, and after I figured out what it all meant (network masks, Class C subnets and the like) I let everyone know that another network device had been added to join the laserprinter. Few IBM PC owners could boast this. At this stage, in truth, there was not that much people could connect to. Using MacTelnet, we could indeed access CAS Online, and print the search results to the laserprinter. Using MacFTP, we could get files remotely from other FTP servers, and we started to acquire coordinate files for our molecular modelling. This in turn brought the realisation that the existing formats (Brookhaven Protein Data Bank files were the most common at the time) were not ideally suited for the purpose, and this could be seen as another spark for the CML (XML) work that started about nine years later. I also remember discovering that Apple Computer ran their own FTP server, where I could download the latest operating system disk images (Systems 5-7, as I recollect, were obtained from this site). Things were free (but not always that easy) in those days. Our Macs ended up having the latest OS on them (in other words, they tended to crash a little less) almost as soon as it was released (and the Mac App Store™, with its impending 4.6 Gbyte of OS X Lion about to be downloaded, is merely the latest example of this).

  3. 1987: Armed with all this experience, I was also asked to serve a two-year stint on the editorial advisory board of the Royal Society of Chemistry. At the time, what is now called supporting information was just starting, and of course it was going to be in print only. I suggested that perhaps the RSC should plan for the day when it could be online instead (the term online was not, I think, in common use then, and electronic journals were also not yet common). I was still not happy that the only way to access that information would have to be FTP file transfers, but little did I realise that Tim Berners-Lee at CERN already had a glimmer in his eye.
  4. 1988: The network on the Macs became a little more useful in this year, when a Macintosh email client called Eudora was released (in truth, I had already sent my first email in 1976, from CMU in Pittsburgh whilst on a visit there, to the person standing next to me!). The Microvax alluded to above provided the mail relay, and a few brave individuals started sending email (not that many people had email addresses in those days, mind you). The RSC was still grappling with this. I remember putting my email address at the top of an article submitted to them, and the copy-editor deleted it from the proofs as “unrecognised address form”. I reinstated it; they deleted it again. After some telephone negotiation, it remained (although the RSC assured me it would confuse the journal readers mightily). For the record, if you do manage to find it, it no longer works (it was something like rzepa@vaxa.ch.ic.ac.uk; we were still learning how to do things properly then).
  5. 1989: I managed to convince the department that it would be useful to use computers for undergraduate teaching, and we opened a computer room with 12 Macs. I maintained them using a wonderful network utility called RevRDist for Mac, which cloned a master Mac onto the 12 clients, and made the task of adding new software very easy. There was always plenty of good software for Macs in those early days. But to introduce students to how to use them, I did feel impelled to produce a 4-page printed handout explaining it all. And I only did this once a year. Clearly again, the need to manage this better must have been in my mind.
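For readers who never met them, the “network masks, Class C subnets and the like” mentioned in the 1987 entry can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch, not a record of how our own network was configured: the address `192.0.2.17` is a documentation-only example, and `classful_class` and `describe_subnet` are hypothetical helpers written purely for illustration. Under the old classful scheme, a Class C network had a first octet between 192 and 223 and a default mask of 255.255.255.0.

```python
import ipaddress


def classful_class(address: str) -> str:
    """Return the historical (pre-CIDR) class of an IPv4 address from its first octet."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return "A"  # default mask 255.0.0.0
    if first_octet < 192:
        return "B"  # default mask 255.255.0.0
    if first_octet < 224:
        return "C"  # default mask 255.255.255.0
    return "D/E"  # multicast / reserved, no ordinary host networks


def describe_subnet(address: str, netmask: str) -> dict:
    """Summarise the network that `address` belongs to under the dotted `netmask`."""
    # strict=False lets us pass a host address rather than the network address itself.
    net = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return {
        "class": classful_class(address),
        "network": str(net.network_address),
        "prefix_length": net.prefixlen,
        "broadcast": str(net.broadcast_address),
        # The network and broadcast addresses cannot be assigned to a machine.
        "usable_hosts": net.num_addresses - 2,
    }


if __name__ == "__main__":
    print(describe_subnet("192.0.2.17", "255.255.255.0"))
```

A mask of 255.255.255.0 leaves 8 bits for hosts, giving 256 addresses, of which 254 can be handed out once the network and broadcast addresses are excluded.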

This post focuses on a very short period, because I wanted to get across how (in my mind at least) chemistry became globally networked for the (chemical) masses (or at least those with Apple Macintosh computers!), and the role the laserprinter Pippa played in this development.

Molecular illusions and deceptions. Ascending and Descending Penrose stairs.

Wednesday, June 15th, 2011

It is not often that an article on the topic of illusion and deception makes it into a chemical journal. Such a topic is addressed (DOI: 10.1002/anie.201102210) in no less eminent a journal than Angewandte Chemie. The illusion (or deception if you will) actually goes to the heart of how we represent three-dimensional molecules in two dimensions, and the meanings that may be subverted by doing so. As it happens, it is also a recurring theme of this particular blog, which is the need to present chemistry with data for all three dimensions fully intact (hence the Click for 3D captions which often appear profusely here).

Molecular Penrose stair. Click for 3D.

The molecule above has been synthesized and a crystal structure obtained (if you click above, you will get the 3D coordinates; the above is pruned of some sidechains which are irrelevant here). The authors present it as an example of a Penrose stair, or perhaps of the better-known lithograph by M. C. Escher, Ascending and Descending. This is a visual paradox, the point of which is to show how the eye can be easily deceived if the brain is asked to fill in a third dimension given only two. The molecule above has been drawn with the illusion of depth, using (or mis-using) the emboldened bonds akin to those proposed by Maehr (who meant something quite different as it happens). It should be worrying to any chemist who cares about stereochemistry to think that use of these time-honoured conventions could actually result in a paradox! So perhaps this accounts for why an article on this very topic has made it onto the pages of such an eminent journal.

What might be of (chemical) interest about this molecule, other than its illusory aspects? Well, could you for example work out, given ONLY the representation above, that the molecule has D2 symmetry (a point group containing only proper rotations) and is therefore chiral? I wondered if it might also be Möbius (with perhaps two half twists), although in fact the π-system in this case has a linking number Lk of zero (not 2). It also has some interesting, rather close H…H contacts in the middle, and the inner periphery appears to be a [14]annulene and the outer a [26]annulene, both conforming to the Hückel 4n+2 rule. Despite this, the central C-C bond is actually quite long, and the conjugation is hence significantly interrupted. There is more chemistry in the original article.
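The 4n+2 arithmetic behind that observation is easy to check. The sketch below assumes, as is usual for a neutral annulene, that each carbon on a periphery contributes one π electron, so the inner periphery carries 14 and the outer 26; `huckel_n` is a hypothetical helper, and it tests only the electron count, so it says nothing about the long central C-C bond that interrupts the conjugation in the real molecule.

```python
from typing import Optional


def huckel_n(pi_electrons: int) -> Optional[int]:
    """Return n if pi_electrons == 4n + 2 for some integer n >= 0, otherwise None."""
    if pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0:
        return (pi_electrons - 2) // 4
    return None


# Assumption: a neutral [N]annulene periphery contributes one pi electron per carbon.
for ring_size in (14, 26):
    n = huckel_n(ring_size)
    if n is None:
        print(f"[{ring_size}]annulene: {ring_size} pi electrons, not of the 4n+2 form")
    else:
        print(f"[{ring_size}]annulene: {ring_size} = 4*{n} + 2, fits the Hückel rule")
```

Both counts fit the rule (n = 3 and n = 6), whereas a 4n count such as 16 returns `None`.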

But I want to close here on the point that to overcome the deception and illusion, you need the three-dimensional data in chemistry. You can get this by clicking above, or by going to the original article and striving to do so there. I think you will find the latter route the greater challenge! Then, ask yourself why an eminent journal, in publishing an article on the topic of deception and illusion, makes it so relatively difficult to overcome that illusion. Certainly more difficult than I hope it proves to be on this blog!

(re)Use of data from chemical journals.

Wednesday, December 22nd, 2010

If you visit this blog you will see a scientific discourse in action. One of the commentators there notes how they would like to access some data made available in a journal article via the (still quite rare) format of an interactive table, but they are not familiar with how to handle that kind of data (file). The topic in question deals with various kinds of (chemical) data, including crystallographic information, computational modelling, and spectroscopic parameters. It could potentially deal with much more. It is indeed difficult for any one chemist to be familiar with how data is handled in such diverse areas. So I thought I would put up a short tutorial/illustration in this post of how one might go about extracting and re-using data from this one particular source.

Interactive Journal table

The above is a snapshot of part of the table in question, with a box in the middle set aside for a Jmol applet to appear. What might be both less obvious and less familiar, even to many who have seen such a display, is the very rich environment available for manipulating the data. To expose some of this, proceed as follows:

  1. Firstly, load a molecule into the Jmol window by clicking on e.g. the hyperlink shown below.

    Loading a molecule

  2. The display shown below will appear, in this case a set of coordinates used to present a 3D model of a molecule, which can be rotated, zoomed, etc. It has also been labelled with various selected bond lengths etc.

    Interactive table with molecule loaded

  3. To extract data, right-click anywhere in the molecule area. Navigate through the menus which appear as shown below. In this case, the data is present in the form of a Gaussian log file. This can contain the history of the particular calculation performed (e.g. a geometry optimisation) or, as in this case, all 3N-6 calculated normal vibrational modes (N being the number of atoms). The one of interest here is number 318, an O=C=O stretching mode.

    An Interactive table in a chemistry journal.

  4. This mode can now be manipulated visually by selecting various parameters:

    Manipulating a vibrational mode

  5. Jmol has a scintillating display of other options, and more are being added all the time, so the above display is by no means the limit of what one can do.
  6. Now to the most important bit. Invoke the menu as shown below, whereupon a copy of the relevant file (gzipped in this case to reduce its size) will be downloaded to your local system. You will now need to use a program on your own computer capable of reading and processing such a file (after unzipping).

    Downloading a data file.

  7. There may be a bewildering variety of programs and toolkits which can perform the operation you wish on such a file. Some are commercial, some are open source. To help people get going, I link to one of the latter type here. You might also want to visit the Quixote project for ideas.
  8. We are not quite finished yet. Perhaps a Gaussian log file does not suit your purpose. Well, now try clicking on this link

    Link to a digital repository

  9. This produces a page such as below, which contains more files. In this example, several molecular identifiers are present (InChI and InChIKey) to help establish the uniqueness of the system; the molecular coordinates are available as a .cml file which itself can be processed by a variety of software tools; the original file used to run the calculation can be inspected (if you want to e.g. repeat it) as input.gjf; the log file we have seen above is there; and so is a checkpoint file, which is most useful when using either the Gaussian program system or a visualiser (e.g. GaussView or ChemBio3D, both commercial programs). A SMILES string is also offered, and sometimes (not in this example) a so-called wavefunction file, which can be used by some programs to analyse the wavefunction and perform e.g. QTAIM, ELF or NCI analyses.

    A digital repository page.

    It is now up to the user to identify suitable processing programs on their computer which fit their purpose.

  10. There is one other file present which I have not yet explained, the mets.xml manifest. This is a metadata file containing (along with much else) an RDF declaration of some of the properties of the molecule. In theory at least, this file could be automatically harvested for the RDF, which could be injected into a triple store and queried semantically using e.g. SPARQL. That is part of the semantic web.
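Once the files from steps 6 and 9 are on your own computer, even a few lines of standard-library Python can start re-using them. The sketch below is one illustration, not the tooling behind the interactive table: `read_text`, `gaussian_frequencies`, `normal_mode_count` and `cml_atoms` are hypothetical helper names. It assumes the log follows Gaussian's usual layout of `Frequencies --` lines (and holds one frequency calculation), and that the CML file stores 3D coordinates as `x3`/`y3`/`z3` attributes on `atom` elements, with `elementType` giving the element.

```python
import gzip
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Gaussian prints harmonic frequencies (cm^-1) on lines such as
# " Frequencies --   1602.3456   3812.1234   3917.5678".
FREQ_LINE = re.compile(r"^\s*Frequencies\s+--\s+(.*)$")


def read_text(path):
    """Read a text file, transparently decompressing it if the name ends in .gz."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def gaussian_frequencies(log_text):
    """Collect all frequencies in the order printed; imaginary modes appear as negatives."""
    freqs = []
    for line in log_text.splitlines():
        match = FREQ_LINE.match(line)
        if match:
            freqs.extend(float(value) for value in match.group(1).split())
    return freqs


def normal_mode_count(n_atoms, linear=False):
    """3N-6 vibrational modes for a non-linear molecule, 3N-5 for a linear one."""
    return 3 * n_atoms - (5 if linear else 6)


def cml_atoms(cml_text):
    """Return (element, x, y, z) tuples for every CML atom that carries 3D coordinates."""
    root = ET.fromstring(cml_text)
    atoms = []
    for element in root.iter():
        # Compare local names so the code works with or without the CML namespace.
        if element.tag.rsplit("}", 1)[-1] != "atom" or element.get("x3") is None:
            continue
        atoms.append((
            element.get("elementType"),
            float(element.get("x3")),
            float(element.get("y3")),
            float(element.get("z3")),
        ))
    return atoms


if __name__ == "__main__":
    # Made-up water-like snippets, for illustration only (not data from the article).
    log = " Frequencies --   1602.3456   3812.1234   3917.5678\n"
    cml = (
        '<molecule xmlns="http://www.xml-cml.org/schema"><atomArray>'
        '<atom id="a1" elementType="O" x3="0.0" y3="0.0" z3="0.1173"/>'
        '<atom id="a2" elementType="H" x3="0.0" y3="0.7572" z3="-0.4692"/>'
        '<atom id="a3" elementType="H" x3="0.0" y3="-0.7572" z3="-0.4692"/>'
        "</atomArray></molecule>"
    )
    atoms = cml_atoms(cml)
    freqs = gaussian_frequencies(log)
    print(len(atoms), "atoms;", len(freqs), "modes found;", normal_mode_count(len(atoms)), "expected")
```

Applied to the downloaded log (e.g. `gaussian_frequencies(read_text(...))` with whatever name the file was saved under), mode 318 would be `freqs[317]`, since Python counts from zero; its existence alone implies at least 108 atoms, because 3 × 108 − 6 = 318. Comparing `len(freqs)` with `normal_mode_count` applied to the atom count from the CML file is a quick consistency check between the two files.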

I hope some of the screenshots here make the process of extracting data from an interactive table article a little more obvious. I must declare that this way of doing it is just one of the ways being explored and also (much to my regret) is not yet particularly common. But hopefully you might capture a little of what some of us believe to be the future of scientific journals.