Archive for the ‘Chemical IT’ Category

Research data and the “h-index”.

Monday, June 24th, 2013

The blog post by Rich Apodaca entitled “The Horrifying Future of Scientific Communication” is very thought-provoking and well worth reading. He takes us through disruptive innovation, and how it might impact upon how scientists communicate their knowledge. One solution floated for us to ponder is that “supporting Information, combined with data mining tools, could eliminate most of the need for manuscripts in the first place”. I am going to juxtapose that suggestion with something else I recently discovered.

Someone encouraged me to take a look at Google Scholar. It is one of those resources that, amongst other features, computes an individual’s h-index and i10-index (the former, having gone through its purple patch, is now apparently at the end of the road, at least for chemists). One reason perhaps why proper curation of research data is not high on most chemists’ list of priorities is that it does not contribute to one’s h-index, and particularly one’s prospects of a successful research career. Thus “supporting information (data)” is one of those things, like styling the citations in a research article, that most people probably prepare through gritted teeth (a rather annoying ritual without which a research article cannot be published). So when I inspected my own Google Scholar profile (you can do the same here) I was rather surprised to find, appended to all the regular research articles, a long list of data citations (sic!). Because I have placed much of my own data into a digital repository, this has opened it up to Google (where don’t they get to nowadays?) for listing (if not actually mining). These citations of themselves actually do not (currently?) contribute to eg the h-index, since currently these entries are not attracting citations by others. And that of course is because doing so is not yet an accepted part of the ritual of preparing a scientific article.

Most scientists must now be pondering what the future holds in terms of how they can bring themselves to the attention of others (in a good way) and hence progress their careers. So I will take Rich’s suggestion one step further. Those scientists who create new data in a process called research, should firstly curate this data properly (via eg a digital repository) and then expect to promote their activity by garnering not only citations for the published narratives (= articles) but also associated published data. Their success as a researcher would be (in part) judged by both. Who knows, as well as famous published narratives, perhaps we will also rank famous published datasets! 


I do the same for the data I use to support many of the posts for this blog.

Digital repositories. An update to the update.

Monday, August 13th, 2012

A third digital repository has been added to the two I described before. Chempound is a free open-source repository which (unlike DSpace and Figshare) was developed specifically for chemistry.

It carries more semantic information (in the form of an RDF triple declaration), which allows SPARQL queries on the entry to be performed.

Our original DSpace repository is also being tweaked to allow additional information to be added to existing entries; in particular if an entry is linked to in a journal publication, the DOI of that article is inserted into the DSpace descriptions. It is also relatively simple to duplicate the information in one repository by re-depositing it into another. It thus becomes feasible to clone the information about the 600+ entries in our DSpace that have subsequently been published in peer-reviewed journal articles, adding a measure of confidence to their provenance.

To compare how the three repositories carry information about the same molecule, invoke any of the links below:

  1. DSpace
  2. Figshare
  3. Chempound 

QR codes and InChI strings.

Sunday, July 22nd, 2012

A month or so ago at a workshop I was attending, a speaker included in his introductory slide a QR (Quick Response) Code. It is a feature of most digital eco-systems that there is probably already “an app for it”. So I thought I would jump on the bandwagon by coding an InChI string. Here it is below:

QRCode for an InChI string. Point your smart device at it, and see the InChI appear!

You then invoke an appropriate app (I used QR Reader for iPhone, but there are many), point it at the screen (a fair bit of wobble seems tolerated) and you get the InChI. Are there any hackers out there that could process the resulting InChI and display not so much it, but the molecule it corresponds to? A quick mash-up I should imagine (it's probably already been done!).
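As a very first step towards such a mash-up, the scanned string can at least be split into its layers. The sketch below is only that: `parseInChI` is a hypothetical helper name, it assumes the string carries a formula layer, and it does not reconstruct the molecule (a cheminformatics toolkit would be needed for that). The ethanol InChI is used purely as an example.

```javascript
// Split an InChI string into its version, formula and prefixed layers.
// Assumes a simple InChI with a formula layer; it does not validate chemistry.
function parseInChI(inchi) {
  if (!inchi.startsWith('InChI=')) {
    throw new Error('Not an InChI string');
  }
  const parts = inchi.slice('InChI='.length).split('/');
  const version = parts[0];   // e.g. "1S" for a standard InChI
  const formula = parts[1];   // e.g. "C2H6O"
  // Each remaining layer starts with a one-letter prefix (c, h, q, ...).
  // An array is used because a prefix can occur more than once.
  const layers = parts.slice(2).map(function (layer) {
    return { prefix: layer[0], value: layer.slice(1) };
  });
  return { version: version, formula: formula, layers: layers };
}

const ethanol = parseInChI('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3');
console.log(ethanol.formula);          // C2H6O
console.log(ethanol.layers[0].value);  // 1-2-3 (the connectivity layer)
```

The connectivity (`c`) and hydrogen (`h`) layers are what a viewer would need in order to rebuild a structure for display.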

Here is another QR Code, this time for another post on this blog (more serious than this one!). 

QR URL code for using on a mobile device.

If you click on either QR image above, this will take you to one (of several) QR code generators. I found that selecting error correction level H seems to make recognition virtually instant. Suddenly an image popped into my mind, of a class of students in a lecture, pointing their device at my InChI codes on the projected screen, and twiddling with the molecules during my lecture (they probably never listen to me anyway 🙂). This may not be as unlikely as it seems. I am in fact compositing an iTunesU course at the moment. For a sneak β-style preview, open this page on an iPad and click on this link to load the course up (or use the QR code below). You probably need to also load up the iTunesU app first.

QR Code for iTunesU course.

Comments welcome, QR code below!

 

Digital repositories. An update.

Saturday, July 21st, 2012

I blogged about this two years ago and thought a brief update might be in order now. To support the discussions here, I often perform calculations, and most of these are then deposited into a DSpace digital repository, along with metadata. Anyone wishing to have the full details of any calculation can retrieve these from the repository. Now in 2012, such repositories are more important than ever. 

In the UK, the main funding organisations are increasingly requiring researchers to deposit their primary data in such open archives, and some disciplines are better than others at this (chemistry does not rank very highly in general however in terms of deposition of data). Our DSpace server is a local one running at Imperial College, but a few months back I became aware of Figshare, which aspires to operate on a much wider and more general scale. So I have injected one of the calculations reported in another post (the IRC for the sodium tolyl thiolate reaction with dichlorobutenone) into Figshare, making use of the API which has recently been developed for this purpose and implemented by Matt Harvey. As with DSpace, it issues a DOI, which can then be quoted wherever appropriate (and particularly in scientific articles). This particular deposition is 10.6084/m9.figshare.93096

This repository is still undergoing a lot of development, but already one can see many interesting features, such as export to Endnote or Mendeley, and a QR barcode for devices with cameras. I would encourage anyone who regularly generates e.g. computational chemistry data, or knows a group that does, to make use of such facilities.

Postscript: If you have a look at this deposition in Figshare you may already notice some of the developments I note above.  Matt Harvey (who, with Mark Hahnel of Figshare, developed our publish script) has added to the entry:

* A data descriptor document URL

* Wikipedia and PubChem links (automatically resolved from InChI/Key searches)

* Links to ChemSpider searches

* Links to all other objects in the Spectra DSpace repository with a common InChI/Key

Connections in chemistry. Anti-malaria drug ↔ organocatalysis.

Thursday, July 5th, 2012

Back in 1994, we published the crystal structure of the molecule below (X=H), a putative anti-malarial drug called halofantrine. Little did we realise that a whole area of organocatalysis based on a thiourea catalyst with a similar motif would emerge a little later. Here is how the two are connected.

In our original article we described how our interest was sparked by observing the following chiral HPLC behaviour. The two enantiomers of compound 2 (X=Y=Cl) separated nicely on the column. So did the compound where X=Cl, Y=H (4). However, when X=H,Y=Cl specifically (3), all chiral recognition on the column vanished! What was the reason?

As it happens, we had recently acquired our stereoscopic CAChe system, and the structure was loaded into this. After a little while, we noticed that the compound formed a complementary dimer, glued together by C-H…O hydrogen bonds (magenta arrows below). When the H is replaced by Cl, this edge-on dimeric structure is completely destroyed, and is replaced by π-π stacking instead. 

The dimeric structure of Halofantrine. Click for 3D

The significance is that the hydrogen atom specifically antiperiplanar to the electron-withdrawing CF3 group was the one forming the C-H…O hydrogen bond (quite a short one as it happens, see 3D above). This despite seven bonds separating the H-C from the CF3 group. In the article, we speculated on how this effect of acidifying the hydrogen to encourage it to form a hydrogen bond might even be augmented. On a historical note, we had made the article available using the then newly accessible World-Wide Web, providing MPEG diagrams corresponding to the structure above. It would be reasonable to claim that this is the very first article in chemistry to have been made so available, dating from mid 1994! If anyone can find an earlier example, do let me know! You might also note the difference between the MPEG movie and the presentation now available above.

In another part of the world a few years later, Peter Schreiner and his group were exploring organocatalysts based on thiourea (see 10.1021/jo201864e for a recent article) and in 2003 they had discovered remarkable catalytic properties for the system below.

As with us, they speculated that in addition to hydrogen bonds formed to substrates from the N-H groups, these could be augmented by rather weaker secondary interactions to the adjacent C-H bonds; certainly the presence of the CF3 groups was the secret of these catalysts (now quite a family). It is tempting to conclude that both sets of observations are related by the same phenomenon!

I end here by showing a QTAIM analysis of our halofantrine system (see similar analysis for the Pirkle reagent). The key region indicated with magenta arrows above does indeed contain bond critical points (BCPs), with values of ρ(r) ~ 0.045, which is near the top of the range experienced for hydrogen bonds of this type.

QTAIM analysis, showing three bond critical points. Click for 3D.

Connections like this probably permeate chemistry, and all too few of them are actually spotted.

The Dienone-phenol controversies.

Monday, April 30th, 2012

During the 1960s, a holy grail of synthetic chemists was to devise an efficient route to steroids. R. B. Woodward was one of the chemists who undertook this challenge, starting from compounds known as dienones (e.g. 1) and their mysterious conversion to phenols (e.g. 2 or 3) under acidic conditions. This was also the golden era of mechanistic exploration, which coupled with an abundance of radioactive isotopes from the war effort had ignited the great dienone-phenol debates of that time (now largely forgotten). In a classic recording from the late 1970s, Woodward muses how chemistry had changed since he started in the early 1940s. In particular he notes how crystallography had revolutionised the reliability and speed of molecular structure determination. Here I speculate what he might have made of modern computational chemistry, and in particular whether it might cast new light on those mechanistic controversies of the past.

Charting the mechanistic pathway connecting 1 and 2 was first done by Capsi using 14C labels (* in the diagram above, on a steroid derivative),  when after claimed selective Birch reduction of the blue double bond in 2, alkene ozonolysis and decarboxylative loss of *, all the radioactivity ended up in the CO2. This showed that the mechanism involved path (a). For paths (b) or (c), the label would have ended up at * and hence not oxidatively lost as CO2. Futaki did the experiment in a different way, putting his 14C label in the position * where he found that only about half of the label was retained in this position (and then lost when he specifically degraded 2 by oxidatively removing that carbon). This now strongly implicated path (b), and also seemed to disprove not only path (a) but also mechanism (c), where a [1,5] shift should have retained the label at the original position (and caused all of it to be lost upon decarboxylation). It was these two apparently contradictory results that helped ignite the controversies.

All the routes (a)-(e) above involve pericyclic sigmatropic reactions, the understanding of which was about to be revolutionised by Woodward (with Hoffmann) in the mid 1960s. In fact, the mechanism here comprises a mixture of [1,2] cationic sigmatropic migrations and [1,5] neutral sigmatropic migrations. To balance one against the other, can computational chemistry come to the rescue? I first note that the mechanisms above are all shown as cations. Until recently, a computational chemist would simply set the charge on their model to +1 and proceed onwards and upwards. But now we can do a bit better. We can (arguably we always should) include the counterion, and so in my own exploration, I have included a perchlorate anion, and the whole study then becomes one of a neutral system (charge =0), a zwitterion. A B3LYP/6-311G(d,p) model with SCRF=water continuum solvent was employed. Let us see what emerges:

  1. Path (a) involves a [1,2] angular methyl (R=Me) migration, which turns out to have ΔG‡ = 28.5 kcal/mol. The IRC for this migration is shown below.

    The (Wheland) intermediate then loses a proton to give 2.

  2. Path (b) involves an alternative rate-limiting migration of the angular methyl, ΔG‡ = 30.2 kcal/mol, followed by two lower energy [1,2] migrations of the ring (ΔG‡ = 27.8 and 25.3 kcal/mol) via a spiro-ring Wheland intermediate (relative energy +3.8 kcal/mol), and deprotonation to again give 2.

    Path b-1. Click for 3D

    Path b-2. Click for 3D

    Path b-3. Click for 3D

    Notice how the perchlorate counterion is relatively free to change its position relative to the substituents, and not all these positions have been explored here. This stochastic problem is an issue with counterions (more accurately, this problem is almost always massaged away by simply ignoring the counterion. But if its ultimate positioning does matter, then one must argue that its inclusion is essential in order to build a good model).

  3. The energy of path (a) is thus seen to be 1.7 kcal/mol lower than (b/c), which is sufficient to favour positioning of most of the 14C tracer on * rather than on the alternative position, and this seems to favour the Capsi mechanism over the Futaki one, although clearly the balance between the two is a fine one. The Capsi mechanism does seem to hinge on the observation that Birch reduction of 2 reduces the blue bond entirely specifically, and the evidence for this does need to be reviewed (in an informatics sense, this evidence is buried in a string of logically connected semantic inferences, each of which may well be contained as a passing comment in a different article).
  4. Regarding the matter of whether path (b) or path (c) is the better representation, this goes to the heart of whether the path is respectively stepwise or concerted. The barriers for escape out of the spiro-ring intermediate defining the steps in path (b) are key. The IRC for a reaction path with a shallow intermediate is shown below. If the depth of the well it finds itself in imparts sufficient lifetime for it to lose all (dynamic) memory of where it came from, then the probability of the * label remaining in its original position is only 50%, since the other (symmetrically equivalent but unlabeled) position may also migrate in the next step. This seems to be the case for path (b), where the intermediate is in quite a deep well (21.5 kcal/mol for escape), and this is consistent with Futaki’s experiment. If the intermediate however were to be only in a shallow minimum (2-4 kcal/mol), the momentum it inherits from the previous transition state may carry it over to the second stage without scrambling the isotope. For systems such as these, we do encounter a serious limitation of simple transition state theory, and must start to adopt a molecular dynamics approach. This might also apply to the positioning of the counterion, although perhaps less so for the relatively heavy perchlorate. It may also be an interesting issue of electron dynamics. Path (c) formally involves six electrons, path (b) only two. In a previous post, I speculated whether the electronic pack size for proton transfer was 4, 6 or 8 electrons. Perhaps one day it will be possible to either measure (attosecond spectroscopy) or compute the preferred dynamics.
The points made in the last section come to the fore in a result obtained by Hopff and Dreiding (he of the models). They confirmed the formation of 2 from 1, and also reported that at 80°C in 70% perchloric acid, 2 was itself then converted in two hours to 3. The debate again turns to whether this is accomplished via path (d) involving 2-electron shifts or path (e) involving a 6-electron shift. No radio-labelling experiments have been reported on this system.
 
Well, as suspected perhaps, the computational analysis of the dienone-phenol rearrangements has shown the system to be poised on a knife-edge (of chaos). Tiny changes might swing things one way or the other. Adding two further (steroid) rings to 1 might of itself change the balance between e.g. path (a) and path (b). So too might a change of counterion, or indeed solvent. One needs to identify the evidence that selective reduction of 2 reduces just the blue bond. If computational chemistry has not (yet) provided a clear-cut resolution to the chemistry of this system, at least it can identify new experiments that might.
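To put a rough number on that knife-edge, the 1.7 kcal/mol difference between the rate-limiting barriers of path (a) (28.5) and path (b) (30.2) can be converted into a ratio of rates. The sketch below assumes simple transition state theory with both paths competing from a common starting point, and a temperature of 298.15 K; neither assumption comes from the calculations above, and the function names are purely illustrative.

```javascript
// Gas constant in kcal/(mol K).
const R_KCAL = 0.0019872;

// Ratio of rates k_low/k_high for two competing paths whose free-energy
// barriers differ by ddG (kcal/mol) at temperature T (K). Assumes simple
// transition state theory with equal prefactors.
function rateRatio(ddG, T) {
  if (T === undefined) { T = 298.15; } // assumed temperature
  return Math.exp(ddG / (R_KCAL * T));
}

// Fraction of material passing over the lower barrier.
function fractionViaLowerBarrier(ddG, T) {
  const ratio = rateRatio(ddG, T);
  return ratio / (1 + ratio);
}

const ddG = 30.2 - 28.5; // path (b) minus path (a), kcal/mol
console.log(rateRatio(ddG).toFixed(1));
console.log((100 * fractionViaLowerBarrier(ddG)).toFixed(1) + '%');
```

On these assumptions path (a) is favoured by something like 17 or 18 to 1 (about 95%). An error of just 1 kcal/mol in the computed difference would change that ratio by a factor of about five, which is one way of seeing how finely balanced the system is.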

Postscript:  I posed the question above about  Capsi’s identification of the reduction product of 2. The two possible products would give different outcomes for whether the * label would be lost upon subsequent oxidation or not.

If the reaction is thermodynamically controlled, then the relative free energies of 3 and 4 would determine the outcome. A B3LYP/6-311G(d,p) calculation (in ethanol as solvent, which has a very similar dielectric to liquid ammonia) predicts 4 is about 0.3 kcal/mol lower than 3. This does not suggest that the reaction is going to be particularly regioselective, and of course Capsi’s interpretation depends on the product being entirely 4, with no 3 formed.
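A rough worked estimate shows why 0.3 kcal/mol implies poor selectivity. Assuming thermodynamic control and a temperature of 298 K (an assumption; the computed free energies are not tied here to a stated temperature), the equilibrium ratio follows from the Boltzmann factor:

```latex
\frac{[\mathbf{4}]}{[\mathbf{3}]}
  = \exp\!\left(\frac{G_{\mathbf{3}} - G_{\mathbf{4}}}{RT}\right)
  = \exp\!\left(\frac{0.3}{0.001987 \times 298}\right)
  \approx \exp(0.51) \approx 1.7
  \quad\Rightarrow\quad
  \text{about } 62\%\ \mathbf{4} \text{ and } 38\%\ \mathbf{3}
```

At the boiling point of liquid ammonia (about 240 K) the same expression gives a ratio of about 1.9, or roughly 65% of 4, so the conclusion is not sensitive to the temperature chosen.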

The blog post as a scientific article: citation management

Monday, February 27th, 2012

Sometimes, as a break from describing chemistry, I take to describing the (chemical/scientific) creations behind the (WordPress) blog system. It is fascinating how there do seem to be increasing signs of convergence between the blog post and the journal article. Perhaps prompted by transclusion of tools such as Jmol and LaTeX into Wikis and blogs, I list the following interesting developments in both genres.

  1. Improved equation display for Chemistry Central articles using MathJax. This is a way of rendering equations in the pages of both a blog and a journal article. This blog is now so empowered, although in fact I employ few equations on these pages.
  2. Citation management and meta-data gathering. This blog plugin takes the form of a numbered citation[1] as here, and which converts the specified DOI to a listing at the bottom of the post in the manner of a conventional scientific article (conventional document citation managers such as EndNote do this as well). It is actually much more than that, since the plugin automatically uses the CrossRef API to retrieve metadata for the quoted Digital Object Identifier (DOI), thus enhancing the metadata associated with the post and its discoverability. Dublin-Core is already present in the post as well as FOAF output, and I occasionally trawl using the Calais archive tagger (although this is not very good at finding chemistry tags).
  3. I installed Chemicalize a year or so ago. This scans the blog text for chemical terms, and adds a hover/popup image of structures it identifies (it is also responsible for the occasional doubled Gravatar image you may see here! Apologies!).
  4. I noted the addition of ChemDoodle to this blog previously. There may be newcomers which I need to track down to this type of non-Java based molecular rendering.
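The DOI-to-citation step described in item 2 can be sketched as follows. This is not the plugin's own code: the function names are hypothetical, and the sample object is hand-written from the reference at the end of this post rather than fetched. It assumes the CrossRef REST endpoint `https://api.crossref.org/works/{DOI}`, whose JSON reply carries the record under a `message` key with fields such as `title`, `author` and `container-title`.

```javascript
// Build the CrossRef REST API URL for a DOI (the DOI is URL-encoded).
function crossrefUrl(doi) {
  return 'https://api.crossref.org/works/' + encodeURIComponent(doi);
}

// Format one reference line from a CrossRef-style "message" object.
function formatCitation(work) {
  const authors = (work.author || [])
    .map(function (a) { return a.given + ' ' + a.family; })
    .join(', ');
  const title = (work.title || [''])[0];
  const journal = (work['container-title'] || [''])[0];
  const dateParts = work.issued && work.issued['date-parts'];
  const year = dateParts ? dateParts[0][0] : '';
  return authors + ', "' + title + '", ' + journal + ', vol. ' +
    work.volume + ', ' + year + '. https://doi.org/' + work.DOI;
}

// Network lookup (defined but not called here); needs a runtime with fetch.
async function fetchCitation(doi) {
  const response = await fetch(crossrefUrl(doi));
  if (!response.ok) {
    throw new Error('CrossRef lookup failed: ' + response.status);
  }
  const body = await response.json();
  return formatCitation(body.message);
}

// Hand-written sample mirroring reference 1 of this post (not fetched output).
const sample = {
  DOI: '10.1186/1758-2946-3-46',
  title: ['The past, present and future of Scientific discourse'],
  author: [{ given: 'H.S.', family: 'Rzepa' }],
  'container-title': ['Journal of Cheminformatics'],
  volume: '3',
  issued: { 'date-parts': [[2011]] }
};
console.log(formatCitation(sample));
```

Keeping the formatting separate from the network call means the same metadata could equally be written into the page header as additional machine-readable metadata.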

So you can see that building a chemical/science-savvy blog can be great fun! It is also significant that science/chemistry publishers are starting to do this. I bring only one example to your attention, although this introduces a host of other issues that perhaps I should leave for another post.

References

  1. H.S. Rzepa, "The past, present and future of Scientific discourse", Journal of Cheminformatics, vol. 3, 2011. https://doi.org/10.1186/1758-2946-3-46

Shared space (in science).

Friday, January 6th, 2012

I thought I would launch the 2012 edition of this blog by writing about shared space. If you have not come across it before, it is (to quote Wikipedia), “an urban design concept aimed at integrated use of public spaces.” The BBC here in the UK ran a feature on it recently, and prominent in examples of shared space in the UK was Exhibition Road. I note this here on the blog since it is about 100m from my office.

Shared space is the Mornington Crescent of urban design, where you have to work out the rules of the game by, in effect, participating in it. Thus the new “rules” of travelling down Exhibition Road (by either foot, car, bike, bus or indeed motorbike as I do each day) are not declared, and each participant works them out on the fly. This is supposed to lead to fewer misunderstandings, although the practice does seem rather different (at least at the moment). But where is the chemistry? Well, these thoughts were triggered by two colleagues independently asking me about how chemists use metaphors, and how chemists use representations. I have in fact touched upon both of these previously, and it struck me that this last example, of arrow pushing in organic chemistry, was in fact a nice example of a shared space in chemistry. The rules of arrow pushing are not formally set out (in an IUPAC rule book or similar) but are worked out on the hoof so to speak. Except that the space is shared only by organic chemists. I have observed over the years that e.g. physical or inorganic chemists will mostly not dare venture into that shared space; they often give a rather good impression of not understanding the rules. I also know from experience that mathematicians and physicists regard arrow pushing as anything other than a shared (scientific) space.

Yet the modern scope and ethos of science is that we should all venture into shared spaces (whether they are in or out of our comfort zones). Perhaps, in science, the problem is that so much of what we do has what I refer to as “implicit semantics” (it's part of our DNA of e.g. being a chemist). Take for example the diagram below (which I used previously) which sets out four possible sets of rules for this particular shared space. Even so, without further explanation, you might be struggling to infer what message is carried by this diagram. That is because so much of it contains implicit semantics, and unless you recognise the missing features, how can you go about finding out what is invisible?

Curly arrow pushing

My concluding thought would be that shared space is what the semantic web is surely striving for. And if Exhibition Road is anything to go by, it is clearly quite a challenge. But if I (and particularly the pedestrians I encounter there each day) end up surviving 2012, perhaps the Semantic Web may one day come about as well!

Mobile-friendly solutions for viewing (WordPress) Blogs with embedded 3D molecular coordinates.

Sunday, December 11th, 2011

My very first post on this blog, in 2008, was to describe how Jmol could be used to illustrate chemical themes by adding 3D models to posts. Many of my subsequent efforts have indeed invoked Jmol. I thought I might review progress since then, with a particular focus on using the new generations of mobile device that have subsequently emerged.

  1. Jmol is based on Java, which has been adopted by Google’s Android mobile operating system, but not by Apple’s iOS.
    • An Android version of Jmol was recently released, to rave reviews! I do not know however whether the Jmol on these posts can be viewed via Android. Perhaps someone can post a comment here on that aspect?
    • HP has just announced it will open source WebOS, but it seems Java will not be supported so probably no Jmol there then.
    • Windows 8 Mobile (Metro) seems unlikely to support it either.
  2. Apple has been prominent in touting HTML5 as a Java replacement. In practice, this means that any molecular viewer would be based on a combination of Javascript and WebGL technologies.  Whereas Java is a compiled language, Javascript is interpreted on-the-fly by the browser. Its viability has been greatly increased by very large improvements in the speeds that browsers interpret Javascript nowadays, but this speed is unlikely to ever match that of Java. The real issue is whether that matters. The other difference is that whereas a signed Java applet allows data to escape from the security Sandbox (and into eg a file system), Javascript is likely to be much more restrictive. These two properties mean that Javascript/HTML5 implementations make a lot of use of server-side functionality; in other words a lot of bytes may have to flow between server and mobile device to achieve a desired effect (and the user may have to pay for these bytes via their data plan).
    • One early adopter of the Javascript/WebGL HTML5 model has been ChemDoodle, which I illustrated on this blog about a year ago. I have tidied up the recipe for invoking it since then, and this is given below for anyone interested in implementing it. As of this moment, one essential component, WebGL, is only available to developers of Apple’s iOS system, but I expect this to become generally available soon. When that happens, ChemDoodle components on this blog will start working.
    • A new entrant is GLmol, an open source molecular viewer for Apple’s iOS. A version is also available for Android. I may try embedding this into the blog.
It seems that the 3D molecular viewing options are certainly increasing, but at the moment there is some uncertainty in performance, compatibility and the ability to extract molecular data from the “sandboxes”. This last comment relates to the re-usability of data, which I particularly value.

Although this post has focussed on embedding and rendering molecular data into a blog post, the same principle in fact applies to other expressions. Perhaps the most interesting is the epub3 e-book format, which also supports Javascript/HTML5, and which seems likely to be adopted for future interactive e-books. Indeed, it should be possible to fully convert an interactive blog created using this technology to an e-book with relatively little effort. I have also illustrated here how lecture notes can be so converted.

If you get the impression that the task of a modern communicator of science and chemistry is not merely that of penning well chosen words to describe their topic, but of having to program their effort, then you may not be mistaken.


Procedure for creating a 3D model in a WordPress blog post using ChemDoodle.

  1. As administrator, go to
    wp-content/themes/default

    (or whatever theme you use) and in the file header.php, paste the following

    <link rel="stylesheet" href="../ChemDoodle/ChemDoodleWeb.css" type="text/css">
    <script type="text/javascript" src="../ChemDoodle/ChemDoodleWeb-libs.js"></script>
    <script type="text/javascript" src="../ChemDoodle/ChemDoodleWeb.js"></script>
    <script type="text/javascript">
    // Fetch a file (e.g. a molfile) synchronously and return its contents as text.
    function httpGet(theUrl) {
      var xmlHttp = new XMLHttpRequest();
      xmlHttp.open("GET", theUrl, false); // false = synchronous request
      xmlHttp.send();
      return xmlHttp.responseText;
    }
    </script>
  2. From here, get the ChemDoodle components and put them into the directory immediately above the WordPress installation. They are there referenced by the path ../ChemDoodle as in the script above. You can put the folder elsewhere if you modify the path in the script accordingly.
  3. Invoke an instance of a molecule thus;
    <script type="text/javascript">// <![CDATA[
    var transformBallAndStick2 = new ChemDoodle.TransformCanvas3D('transformBallAndStick2', 190, 190);transformBallAndStick2.specs.set3DRepresentation('Ball and Stick');         transformBallAndStick2.specs.backgroundColor = 'white';var molFile = httpGet( 'wp-content/uploads/2011/12/85-trans.mol' );var molecule = ChemDoodle.readMOL(molFile, 2);         transformBallAndStick2.loadMolecule(molecule);
    // ]]></script>
  4. The key requirement is that the body of the script (starting with var) must not contain any line breaks; it must be a single wide line. So that you can see the whole line here, I show it in wrapped form (which you must not use);
    var transformBallAndStick2 = new ChemDoodle.TransformCanvas3D('transformBallAndStick2', 190, 190);
    transformBallAndStick2.specs.set3DRepresentation('Ball and Stick');
    transformBallAndStick2.specs.backgroundColor = 'white';
    var molFile = httpGet('wp-content/uploads/2011/12/85-trans.mol');
    var molecule = ChemDoodle.readMOL(molFile, 2);
    transformBallAndStick2.loadMolecule(molecule);
  5. The key data will be located in the path wp-content/uploads/2011/12/85-trans.mol which you should upload. Note that only the MDL molfile is supported in this mode (which makes no server-side requests). One can use eg CML, but this must be as a server request.
  6. If you want multiple instances, then you must change each occurrence of the name of the variable, e.g. transformBallAndStick2 to be unique for each.
  7. If you want to annotate the resulting display, server-side requests are again needed. I do not illustrate these here, but there is an excellent tutorial.
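One way to reduce the renaming required in step 6 might be to wrap the calls from step 3 in a function placed in header.php next to httpGet, so that the variable holding each canvas is local to the function. This is only a sketch: `showMolecule` is a hypothetical helper (not part of ChemDoodle), it uses only the ChemDoodle calls already shown above, and it relies on both `ChemDoodle` and `httpGet` having been loaded by the header. I have not tested it against the WordPress editor.

```javascript
// Hypothetical helper, not part of ChemDoodle: wraps the calls from step 3.
// canvasId must still be unique on the page; molUrl is the path to a molfile.
function showMolecule(canvasId, molUrl, size) {
  // The canvas variable is local, so repeated calls cannot clash by name.
  var canvas = new ChemDoodle.TransformCanvas3D(canvasId, size, size);
  canvas.specs.set3DRepresentation('Ball and Stick');
  canvas.specs.backgroundColor = 'white';
  var molFile = httpGet(molUrl);                        // synchronous fetch from step 1
  canvas.loadMolecule(ChemDoodle.readMOL(molFile, 2));  // same arguments as step 3
  return canvas;
}
```

Each instance in a post would then shrink to a single short line inside the script tags, for example `showMolecule('mol1', 'wp-content/uploads/2011/12/85-trans.mol', 190);`, with only the first argument changing from one instance to the next.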