Posts Tagged ‘detective’

Conference report: an example of collaborative open science (reaction IRCs).

Thursday, May 25th, 2017

It is a sign of the times that one travels to a conference well-connected. By which I mean email is on a constant drip-feed, with venue organisers ensuring each delegate receives their WiFi password even before their room key. So whilst I was at a conference espousing the benefits of open science, a nice example of open collaboration was initiated as a result of a received email.

Steven Kirk contacted me with the following query: Do you know of any open-access database of calculated IRCs with coverage of as broad a range of classes of chemical reactions as possible? I recollected that about six years ago, I was exploring the use of iTunesU as a system for delivering course content in a rich-media format. I produced animations for about 115 reactions (many of which, as it happens, were taken from this blog, but quite a number were also unique to that project) and placed them into iTunesU, so I sent the URL https://itunes.apple.com/gb/course/id562191342 to Steven.

I should at this point explain something of the structure of such an iTunesU course.

  1. An essential feature is the course icon, seen below on the left. Since the course is hosted by Imperial College, it had to be an officially approved icon. I am sure you can believe me if I tell you that this took a month or so to obtain, with a fair bit of persistence required!
  2. I also had to get approval to place the iTunes app on all the teaching computers so that students could open the course. Believe me again when I tell you that I had to persuade the Apple lawyers in Cupertino to release a special license for this app to persuade our administrators here to install it on the Windows teaching clusters. Another few months had passed by.
  3. When creating an entry (using e.g. https://itunesu.itunes.apple.com/coursemanager/ ) one has to specify values for various descriptors, also often called metadata. Thus any one entry has fields for name and description, with the popularity added by Apple. Only a few words are visible in the description field, which can be expanded in iTunes using the i button.
  4. Steven meanwhile had replied asking if the original data that was used to generate the IRC might be available. Specifically his second question was “So the DOIs are only stamped into the animation’s bitmaps, or are they also somewhere in the metadata?”. That little i button is not easy to spot, and there is no indication, in the event, of what information it might actually contain.
  5. Here it is expanded. The contents are unstructured text, into which I have placed the required DOI.
  6. The lesson here is that I had fortunately had the foresight to include a link to the IRC data in anticipation of just such a question from someone in the future. But black mark to Apple here; the text cannot be selected and copied into a clipboard! It is fairly unFAIR data, since it can only be inter-operated (the I of FAIR) by a human re-typing it by hand. And the human has also to recognise the pattern of a DOI; a machine could not obtain this information easily. Moreover Steven is a Linux user; he does not readily have access to the iTunes app on this operating system!
  7. Also, there were 115 such entries, and now the prospect was rearing that each would have to be hand processed. Moreover, because the text was unstructured, there was no guarantee that I would have adopted the same pattern for all 115 entries.
  8. Fortunately Steven was on the ball. I quote again: it turns out iTunes isn’t needed at all. A service I found on the web http://picklemonkey.net/feedflipper-home/ takes an ITunes URL and converts it to an RSS feed. Opening this feed in Firefox and RSSOwl respectively let me save the feed as XML and HTML (both attached).
  9. This is currently where we stand (Steven’s first email was two days ago), but it’s not finished yet. Depending on how assiduous I was five years ago, some DOIs to the data may be acquired from the list. Sometimes I simply wrote e.g. See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816 knowing that the links to the data were there instead. I can already see that some descriptions have neither a DOI nor a link to the blog. More detective work will be needed, unfortunately.
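The triage described in the list above (DOI present, only a blog link present, or neither) might be sketched in Python as follows. This is a minimal sketch and not the procedure actually used: the feed snippet, the item titles and the DOI `10.1234/example.1` are invented for illustration, and the DOI pattern is a commonly used heuristic rather than an official grammar.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the saved RSS feed; titles and the DOI are invented.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Reaction A</title><description>IRC animation. Data at doi:10.1234/example.1.</description></item>
<item><title>Reaction B</title><description>See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816</description></item>
<item><title>Reaction C</title><description>IRC animation only.</description></item>
</channel></rss>"""

# Heuristic patterns (assumptions): a DOI starts with "10.", a registrant
# code and a slash; a blog link contains "rzepa/blog/".
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
BLOG_PATTERN = re.compile(r"https?://\S*rzepa/blog/\S+")


def triage(feed_xml):
    """Sort feed items by what their free-text description contains."""
    result = {"doi": {}, "blog": {}, "neither": []}
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        text = item.findtext("description", default="")
        doi = DOI_PATTERN.search(text)
        blog = BLOG_PATTERN.search(text)
        if doi:
            # Trailing sentence punctuation is not part of the identifier.
            result["doi"][title] = doi.group(0).rstrip(".,;)")
        elif blog:
            result["blog"][title] = blog.group(0).rstrip(".,;)")
        else:
            result["neither"].append(title)
    return result


report = triage(SAMPLE_FEED)
```

The entries that land in `report["neither"]` are the ones that would still need detective work by hand.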

How might the situation described above have been avoided? Well, Apple in iTunesU provided in effect only one metadata field, and an unstructured one at that. Anything went in that field. Had they provided another field, entitled say “data source” (or had the course creator been able to configure one themselves), it could moreover have been made mandatory and structured. Thus it might have accepted only known types of persistent identifier, such as a DOI. Further, the system could have checked that the DOI was actually resolvable. Before you ask, I did log a “bug” with Apple asking this be done, but nothing ever was. With such a tool to hand, I might have recorded data sources for all the 115 entries. The resulting XML (as generated above) could then have been used to automate the retrieval of all 115 datasets describing this course.
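A mandatory, structured “data source” field of this kind might behave like the sketch below. Everything here is an assumption made for illustration: iTunesU never offered such a field, the function names are hypothetical, the syntax check is a heuristic, and a HEAD request to doi.org is only a rough resolvability test (some landing pages reject HEAD, so a real system would probably need something more robust).

```python
import re
import urllib.error
import urllib.request

# Heuristic DOI shape (assumption): "10.", a registrant code, a slash, a suffix.
DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")


def resolves_at_doi_org(doi, timeout=10.0):
    """Return True if https://doi.org/<doi> answers without an HTTP error."""
    request = urllib.request.Request("https://doi.org/" + doi, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, TimeoutError):
        return False


def validate_data_source(value, resolver=resolves_at_doi_org):
    """Accept the field only if it is non-empty, DOI-shaped and resolvable."""
    doi = value.strip()
    if not doi:
        raise ValueError("data source is mandatory")
    if not DOI_SYNTAX.match(doi):
        raise ValueError("not a DOI: " + doi)
    if not resolver(doi):
        raise ValueError("DOI does not resolve: " + doi)
    return doi
```

The resolver is passed in as a parameter so that the rule itself can be exercised without network access, for example `validate_data_source("10.1039/JR9410000243", resolver=lambda doi: True)`.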

At this stage then, Steven can follow up his interest in building a reaction IRC library and analysing it. I will do all I can to encourage Steven not to make the mistakes I did and to ensure that any further data that is required to augment the library does not suffer the problems above. On the other hand, I console myself that in two days, much of the data for the course I created five years ago was salvageable; I wonder how many other iTunesU courses there are for which that can be said!

I will let (with some blushing) the final word be Steven’s: You are one of the few chemists who has both pioneered and built the principles of ‘open chemistry’ into their actual scientific work. I visit your blog occasionally knowing that there is a very high probability I could download and tinker with the results of real calculations.


Might I assure all the speakers that I concentrated totally on their talks rather than incoming emails!

William Henry Perkin: The site of the factory and the grave.

Monday, March 11th, 2013

William Henry Perkin is a local chemical hero of mine. The factory where he founded the British (nay, the World) fine organic chemicals industry is in Greenford, just up the road from where we live. The factory used to be close to the Black Horse pub (see below) on the banks of the Grand Union Canal. It is now commemorated merely by a blue plaque placed on the wall of the modern joinery building occupying the location (circled in red on the photo).



Perkin factory plaque; Perkin factory.

But when BBC TV contacted me to ask where his grave was, a little detective work was needed to track it down to the cemetery in Christchurch, Roxeth (near Harrow-on-the-Hill). 



Perkin's gravestone; wide shot of grave.

And if you ever need to track me down, my office window is the one with the translucent image of a mauveine molecular orbital.



Historical detective stories: colourful crystals.

Friday, October 21st, 2011

Organic chemists have been making (more or less pure) molecules for the best part of 180 years. Occasionally, these ancient samples are unearthed in cupboards, and then the hunt for their origin starts. I have previously described tracking down the structure of a 120 year-old sample of a naphthalene derivative. But I visited a colleague’s office today, and recollected having seen a well-made wooden display cabinet there on a previous visit. Today I took a photo of one of the samples:

One of the "Hofmann" collection.

No date, no name, but a structure! As I noted before, when it comes to structures, you have to research the conventions (and numbering) used at the time. Thus note the apparent cyclohexane rings, the N(Me)2 group and the lack of stereochemistry around the alkenes. The former dates the sample to before 1950, whilst the use of Me to mean methyl puts it in the 20th century. Which is a shame, since it had been known as the “Hofmann” collection, meaning some sort of association with August von Hofmann, the first professor of organic chemistry in the UK, who occupied that position from 1845 to 1864. Samples that old are very rare. The one above by the way is very deep green (the photo does not do it justice), and very crystalline! Tracing the history of where the display cabinet might have been did indeed reveal that it probably started its life at the same institute as Hofmann was working in (and where I now work), but little more than this was known about it.

A search of the Beilstein database (nowadays known as Reaxys) revealed a collection of samples corresponding to the above structure (with benzenes of course, not cyclohexanes), but co-crystallised with different molecules, and dating from 1921. These were known as the Heilbron collection, and this was encouraging, since Heilbron was indeed a successor to Hofmann, being active in the 1920s. During his career, he and his students probably made 100s, if not 1000s, of compounds, so why did they go to the considerable expense of having beautiful wooden cases built to house these particular samples? Probably because the basic colour varied from yellow to black (perhaps a 400 nm difference in λmax), a variation for which they had no explanation! So, much like some people are cryofrozen in the hope an advanced civilisation might bring them back to life in the future, these samples were mounted in a display cabinet in the hope that someone would find out the origins of their variable colour.

Well, in 1984 (some 63 years after the event) researchers in the Technion-Israel Institute of Technology, Haifa, came upon the 1921 article (but not the samples; if they read this they might be amazed that these still exist!), repeated (most of the) syntheses, and determined the crystal structure of three of the molecules (but conspicuously not the one above). One 3D structure is shown below. The colours were ascribed to charge-transfer interactions between the components of the molecules.

DADZIR.

As I noted previously, it is well worth preserving chemical samples for future generations (and sometimes that generation is 120 years in the future!). Sadly, health and safety aspects (real or imagined) mean that such collections are being lost to posterity at an ever-increasing rate. Soon, there may be no collections of old chemicals left. That would indeed be a loss to science. So if you know of a lovingly preserved case of old chemicals, go take a look at it. And if it’s in danger of being put in the skip, then rescue it. There is no telling what may be scientifically interesting about it.

A historical detective story: 120 year old crystals

Wednesday, November 17th, 2010

In 1890, chemists had to work hard to find out what the structures of their molecules were, given they had no access to the plethora of modern techniques we are used to in 2010. For example, how could they be sure what the structure of naphthalene was? Well, two such chemists, Henry Edward Armstrong (1848-1937) and his student William Palmer Wynne (1861-1950; I might note that despite working with toxic chemicals for years, both made it to the ripe old age of ~90!) set out on an epic 11-year journey to synthesize all possible mono-, di-, tri- and tetra-substituted naphthalenes. Tabulating how many isomers they (we will call them AW here) could make would establish beyond doubt the basic connectivity of the naphthalene ring system. This was in fact very important, since many industrial dyes were based on this ring system, and patents depended on getting it correct! Amazingly, their collection of naphthalenes survives to this day. With the passage of 120 years, we can go back and check their assignments. The catalogued collection (located at Imperial College) comprises 263 specimens. Here the focus is on just one, specimen number 22, which bears an original label of trichloronaphthalene [2:3:1] and for which was claimed a melting point of 109.5°C. What caught our attention is that a search for this compound in modern databases (Reaxys if you are interested, what used to be called Beilstein) reveals the compound to have a melting point of ~84°C. So, are alarm bells ringing? Did AW make a big error? Were many of the patented dyes not what they seemed?
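The isomer-counting argument can be illustrated with a short enumeration. This is a sketch under a simplifying assumption, namely that all substituents are identical (all chloro, say), and it uses modern 1-8 locants rather than the 1890 numbering. Two substitution patterns count as the same isomer if a symmetry operation of the naphthalene skeleton maps one onto the other.

```python
from itertools import combinations

# Modern locants 1-8. The non-trivial symmetry operations of the naphthalene
# skeleton, written as position -> position maps.
LONG_AXIS = {1: 4, 2: 3, 3: 2, 4: 1, 5: 8, 6: 7, 7: 6, 8: 5}
SHORT_AXIS = {1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}
ROTATION = {p: SHORT_AXIS[LONG_AXIS[p]] for p in range(1, 9)}
IDENTITY = {p: p for p in range(1, 9)}
SYMMETRIES = (IDENTITY, LONG_AXIS, SHORT_AXIS, ROTATION)


def canonical(positions):
    """Lowest-locant representative among all symmetry-equivalent patterns."""
    return min(tuple(sorted(op[p] for p in positions)) for op in SYMMETRIES)


def isomers(n_substituents):
    """Distinct ways of placing n identical substituents on naphthalene."""
    patterns = combinations(range(1, 9), n_substituents)
    return sorted({canonical(pattern) for pattern in patterns})


counts = {n: len(isomers(n)) for n in (1, 2, 3, 4)}
```

For one to four identical substituents this gives 2, 10, 14 and 22 isomers respectively. A different ring connectivity would have different symmetry and hence predict different counts, which is why tabulating the isomers actually obtained was such a powerful test.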

1,2,3-trichloronaphthalene

The story starts to get murky when Reaxys reports the earliest literature for this compound as being 1941 (DOI: 10.1039/JR9410000243), the authority being Wynne himself (now a sprightly 80). The collection of 263 specimens was thought to go back to the 1890s, so how could it contain a compound only made about 50 years later? Time to do an X-ray determination. Remarkably, the 120 year old crystals of specimen 22 were still in good shape, but the determined structure held an initial surprise. The compound was in fact 1,6,7-trichloronaphthalene, quite a different species from the label.

1,6,7-trichloronaphthalene

So, did AW get things badly wrong, and were all those patents based on these structures potentially invalid? A little more detective work using Reaxys reveals that the 1,6,7 isomer melts at 109.5°C, the same as reported by AW in 1890 (Chem. News J. Ind. Sci., 1890, 61, p. 273). So how did the 1,6,7-compound come to be mistaken for a 1,2,3-isomer? The culprit turns out to be one prime (′).

1,6,7 = 2:3:1′

Updated (see comment)

The numbering system in 1890 was different from what it is now. Then, primes were used to distinguish the numbering on each ring. When the collection was catalogued (in the 1990s), the 1′ was mistaken for 1 (you can see the prime on the original label). AW were correct all along, and the patent owners for all those naphthalene dyes can rest easy.
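The effect of the lost prime can be reproduced with a small conversion routine. This is a sketch only: the table mapping primed locants onto modern ones (1′ to 8, 2′ to 7 and so on) is an assumption made for illustration and should be checked against the conventions of the period. For this particular label, the other plausible assignment (1′ to 5) happens to lead to the same modern name, because the two results are related by symmetry.

```python
# Symmetry operations of the naphthalene skeleton on modern locants 1-8.
LONG_AXIS = {1: 4, 2: 3, 3: 2, 4: 1, 5: 8, 6: 7, 7: 6, 8: 5}
SHORT_AXIS = {1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}
ROTATION = {p: SHORT_AXIS[LONG_AXIS[p]] for p in range(1, 9)}
IDENTITY = {p: p for p in range(1, 9)}
SYMMETRIES = (IDENTITY, LONG_AXIS, SHORT_AXIS, ROTATION)

# ASSUMED correspondence between 1890-style locants and modern ones.
OLD_TO_MODERN = {"1": 1, "2": 2, "3": 3, "4": 4,
                 "1'": 8, "2'": 7, "3'": 6, "4'": 5}


def canonical(positions):
    """Lowest-locant representative among all symmetry-equivalent patterns."""
    return min(tuple(sorted(op[p] for p in positions)) for op in SYMMETRIES)


def to_modern(label, mapping=OLD_TO_MODERN):
    """Convert a colon-separated 1890-style label such as 2:3:1' to modern locants."""
    tokens = [t.strip() for t in label.replace("\u2032", "'").split(":")]
    return canonical([mapping[t] for t in tokens])


as_labelled = to_modern("2:3:1'")   # label as AW wrote it
as_catalogued = to_modern("2:3:1")  # label with the prime lost
```

With the prime, the label converts to the 1,6,7 pattern; without it, to 1,2,3, which is exactly the discrepancy between the crystal structure and the catalogue entry.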

Sample 22 from AW collection

What this teaches us is that crystallography on 120 year old organic compounds is perfectly viable. Indeed, can anyone else claim to have solved the structure of such an old compound? It also teaches us that those old chemists knew what they were doing, despite not having any instrumentation to help them. Oh, and a final comment. Precious few collections of molecules made by the original scientists exist nowadays. Many a collection has literally been skipped because of health and safety concerns. The AW collection itself was rescued from oblivion by the narrowest of margins. For the collections that were not rescued, we have permanently lost the opportunity for any detective work of the type described above. You can see that I am very upset by this. Heritage conservation should cover not just old buildings, paintings etc., but chemical heritage collections as well.

Thanks to Andrew White for the crystal structures (of this and three other samples, but their stories are for another day).