Posts Tagged ‘Imperial College’
Wednesday, August 8th, 2018
White City is a small area in west London created as an exhibition site in 1908, morphing over the years into an Olympic games venue, a greyhound track, the home nearby of the BBC (British Broadcasting Corporation) and most recently the new western campus for Imperial College London.♣ The first Imperial department to move into the MSRH (Molecular Sciences Research Hub) building is chemistry. As a personal celebration of this occasion, I here dedicate three transition states located during my first week of occupancy there, naming them the White City trio following earlier inspiration by a string trio and their own instruments.
The chemistry revisits the mechanism of amide formation from an acid and an amine, which I first described on this blog about four years ago. I had constructed a model of one amine and one carboxylic acid, to which I added a further acid in recognition that proton transfers are a key aspect of the mechanism. When the model was quantified using quantum calculations (ωB97XD/6-311G(d,p)/SCRF=p-toluene) it resulted in a free energy barrier ΔG298‡ of about 22 kcal/mol. Re-reading what I wrote, I see I did rather gloss over this value, which implies a decently rapid reaction! In fact, the reaction occurs relatively slowly at the temperature of refluxing toluene. Perhaps some alarm bells should have been tinkling at this stage (although the sluggish reaction might, for example, instead be due to poor solubility) and so here I have a rethink of the model used to see if that modest barrier really is correct.
The new premise is to test if the required proton transfers can instead be mediated using a second molecule of amine instead of acid; thus two molecules of carboxylic acid are now accompanied by two of amine, one of which will be used to transfer protons. The second acid is retained to facilitate comparison. As before, the mechanism is characterised by three transition states and two tetrahedral intermediates. The new mechanism is summarised below, with TS1-3 being the White City Trio.
The free energies are summarised in the table below. TS3, the rate limiting step, is slightly lower in energy if the amine is used for the proton transfer than via carboxylic acid. This is the wrong direction; we really want the barrier to increase to explain the relative difficulty of the reaction as observed in refluxing toluene! Fear not however, the new barrier is indeed a much more sluggish 28.6 kcal/mol (30.5 using a larger basis set).
How did this happen? It’s the reactants! The original reactant model was based on the known structure of acetic acid dimer, with an amine weakly hydrogen bonded. Adding an extra amine now allows an entirely new motif to form, in which the amine disrupts the acetic dimer to form a cyclic system with a pair of very strong (-)O-H-N(+)-H-O(-) hydrogen bond units.† The original model did not have sufficient components to fully allow this to happen.
So the White City Trio achieve a performance which helps explain why a reaction is sluggish rather than facile (normally one strives to show the opposite). Perhaps however it should be the White City quartet, in recognition that the reactant also had a role to play?
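To put the barriers quoted above into perspective, a rough conversion from a free energy barrier to a rate can be sketched with the Eyring equation, k = (k_B·T/h)·exp(−ΔG‡/RT). This is only an illustration under stated assumptions: a transmission coefficient of 1, first-order kinetics (so that a half-life of ln 2/k is meaningful), and the computed ΔG298‡ reused unchanged at about 384 K (roughly the boiling point of toluene), which neglects the temperature dependence of the entropy term. The function names are illustrative and not part of the original calculations.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature_k):
    """First-order rate constant (s^-1) from the Eyring equation.

    Assumes a transmission coefficient of 1 and that the barrier
    (kcal/mol) is valid at the requested temperature.
    """
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


def half_life_s(barrier_kcal, temperature_k):
    """Half-life in seconds of a first-order process with this barrier."""
    return math.log(2.0) / eyring_rate(barrier_kcal, temperature_k)


if __name__ == "__main__":
    # Barriers quoted in the post; 384 K is an assumed reflux temperature.
    for barrier in (22.0, 28.6, 30.5):
        for temperature in (298.15, 384.0):
            hours = half_life_s(barrier, temperature) / 3600.0
            print(f"dG = {barrier:4.1f} kcal/mol, T = {temperature:6.2f} K, "
                  f"half-life = {hours:.3g} h")
```

Under these assumptions a barrier of 22 kcal/mol corresponds to a half-life of tens of minutes at room temperature, whereas 28.6 kcal/mol corresponds to something of the order of years at 298 K, which is the sense in which the original value looked too facile.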
♣A photograph of the building under construction can be seen here. ‡Def2-TZVPPD basis set. †There does not appear to be a recorded structure for methylammonium acetate. We hope to obtain one to check what the extended structure actually is. ♥I will elaborate an interesting new use of this value in a separate post.
Tags:acetic acid, Acid, Amide, Amine, carboxylic acid, Chemistry, Company: BBC, Company: British Broadcasting Corporation, energy, Ester, exhibition site, free energy barrier, Functional groups, Hydrogen bond, Imperial College, Imperial College London, Ionic product, Newspaper & Magazine Printing Services, Non-ionic product, Olympic games, Organic chemistry, White City Trio
Posted in Interesting chemistry | 6 Comments »
Tuesday, January 23rd, 2018
Another occasional conference report (day 1). So why is one about “persistent identifiers” important, and particularly to the chemistry domain?
The PID most familiar to most chemists is the DOI (digital object identifier). In fact there are many; some 60 types have been collected by ORCID (themselves purveyors of researcher identifiers). They sometimes even have different names; in life sciences they tend to be known instead as accession numbers. One theme common to many (probably not all) is that they represent sources of metadata about the object being identified: further information which allows you (or a machine) to decide if acquiring the full object is worthwhile. So in no particular order, here are some of the things I learnt today.
- Mark Hahnel noted the recent launch of the Dimensions resource which links research data with other research activities; I have not yet had a chance to learn its capabilities, but it seems an interesting alternative to other stalwarts such as Google Scholar.
You can try this example: https://app.dimensions.ai/discover/publication?search_text=10.6084&search_type=kws&full_search=true which retrieves articles in which the data repository with prefix 10.6084 (Figshare) is cited. Try also the prefix 10.14469 which is the Imperial College repository.
- Andy Mabbett talked about the deployment and use of persistent identifiers (the Q numbers) in Wikidata, which increasingly underpin the basis for the various flavours of Wikipedia. He also noted their use of some 50 different identifiers.
- Johanna McEntyre noted some 5M published articles in life sciences which reference 1M+ ORCID identifiers, easily the domain with the fastest uptake of this type. Also noted was the new FREYA project, which aims to connect open identifiers for discovery, access and use of research resources.
- Tom Gillespie talked about RRID, or Research Resource Identifiers. These include hardware such as instruments, with around 6000 RRIDs systematized so far. They argue this area promotes both the A and I of FAIR (accessible and inter-operable). Of course A and I mean many things to many people.
- Several other presentations talked about the finer detail of metadata, such as sub-classifications into e.g. descriptive/admin/technical, but I did rather miss demos showing how search queries of such fine-grained metadata could be constructed.
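The Dimensions query quoted above can be generated for any repository DOI prefix. The following is a minimal sketch that simply reassembles the URL given in the list; the parameter names are copied from that URL, and whether Dimensions regards them as a stable public interface is an assumption. The function name is illustrative.

```python
from urllib.parse import urlencode

DIMENSIONS_DISCOVER = "https://app.dimensions.ai/discover/publication"


def dimensions_search_url(search_text):
    """Build a Dimensions keyword-search URL for a DOI prefix or other text."""
    query = urlencode({
        "search_text": search_text,
        "search_type": "kws",      # value copied from the example URL
        "full_search": "true",     # value copied from the example URL
    })
    return f"{DIMENSIONS_DISCOVER}?{query}"


if __name__ == "__main__":
    # 10.6084 is Figshare; 10.14469 is the Imperial College repository.
    for prefix in ("10.6084", "10.14469"):
        print(dimensions_search_url(prefix))
```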
Apart from the presentations themselves, PIDapalooza is unusual for some other activities. Thus you could go get your PIDnails done, with a selection of 8 or so tasteful logos to choose from. There will be tattoos tomorrow (this is a conference for younger people after all). I may grab a photo or two to provide evidence!
Tags:Academic publishing, Andy Mabbett, Digital Object Identifier, Identifiers, Imperial College, Index, Information science, Johanna McEntyre, Knowledge, Mark Hahnel, ORCiD, Persistent identifier, Publishing, Quotation, researcher, Scholarly communication, SciCrunch, search engines, Technical communication, Technology/Internet, Tom Gillespie
Posted in Chemical IT | 1 Comment »
Thursday, May 25th, 2017
It is a sign of the times that one travels to a conference well-connected. By which I mean email is on a constant drip-feed, with venue organisers ensuring each delegate receives their WiFi password even before their room key. So whilst I was at a conference espousing the benefits of open science, a nice example of open collaboration was initiated as a result of a received email.‡
Steven Kirk contacted me with the following query: Do you know of any open-access database of calculated IRCs with coverage of as broad a range of classes of chemical reactions as possible? I recollected that about six years ago, I was exploring the use of iTunesU as a system for delivering course content in a rich-media format. I produced animations for about 115 reactions (many of which as it happens were taken from this blog, but quite a number were also unique to that project) and placed them into iTunesU, and I now sent the URL https://itunes.apple.com/gb/course/id562191342 to Steven.
I should at this point explain something of the structure of such an iTunesU course.
- An essential feature is the course icon, seen below on the left. Since the course is hosted by Imperial College, it had to be an officially approved icon. I am sure you can believe me if I tell you that this took a month or so to obtain, with a fair bit of persistence required!
- I also had to get approval to place the iTunes app on all the teaching computers so that students could open the course. Believe me again when I tell you that I had to persuade the Apple lawyers in Cupertino to release a special license for this app to persuade our administrators here to install it on the Windows teaching clusters. Another few months had passed by.

- When creating an entry (using e.g. https://itunesu.itunes.apple.com/coursemanager/ ) one has to specify values for various descriptors, also often called metadata. Thus any one entry has fields for name and description, with the popularity added by Apple. Only a few words are visible in the description field, which can be expanded in iTunes using the i button.

- Steven meanwhile had replied asking if the original data that was used to generate the IRC might be available. Specifically his second question was “So the DOIs are only stamped into the animation’s bitmaps, or are they also somewhere in the metadata?“. That little i button is not easy to spot, and there is no indication, in the event, of what information it might actually contain.
- Here it is expanded. The contents are unstructured text, into which I have placed the required DOI.

- The lesson here is that I had fortunately had the foresight to include a link to the IRC data in anticipation of just such a question from someone in the future. But black mark to Apple here; the text cannot be selected and copied into a clipboard! It is fairly unFAIR data, since it can only be inter-operated (the I of FAIR) by a human re-typing it by hand. And the human has also to recognise the pattern of a DOI; a machine could not obtain this information easily. Moreover Steven is a Linux user; he does not readily have access to the iTunes app on this operating system!
- Also, there were 115 such entries, and now the prospect was rearing that each would have to be hand processed. Moreover, because the text was unstructured, there was no guarantee that I would have adopted the same pattern for all 115 entries.
- Fortunately Steven was on the ball. I quote again: it turns out iTunes isn’t needed at all. A service I found on the web http://picklemonkey.net/feedflipper-home/ takes an ITunes URL and converts it to an RSS feed. Opening this feed in Firefox and RSSOwl respectively let me save the feed as XML and HTML (both attached).
- This is currently where we stand (Steven’s first email was two days ago), but it’s not finished yet. Depending on how assiduous I was five years ago, some DOIs to the data may be acquired from the list. Sometimes I simply wrote e.g. See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816 knowing that the links to the data were there instead. I can already see that some descriptions have neither a DOI nor a link to the blog. More detective work will be needed, unfortunately.
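The detective work described in the list above can be partly automated once the feed has been saved as XML. The sketch below assumes a conventional RSS layout (item elements with title and description children), which may not match exactly what the feed conversion produced, and the sample feed and its placeholder DOI are invented for illustration. The DOI pattern is a heuristic, not a formal definition of a DOI.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic patterns: a DOI-like string, and a link to a post on the blog.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
BLOG_PATTERN = re.compile(r"https?://www\.ch\.imperial\.ac\.uk/rzepa/blog/\?p=\d+")

# Hypothetical feed; the first DOI is a placeholder, not a real dataset.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Reaction A</title>
    <description>IRC animation. Data at DOI: 10.5555/example-irc-1.</description></item>
  <item><title>Reaction B</title>
    <description>See http://www.ch.imperial.ac.uk/rzepa/blog/?p=6816</description></item>
  <item><title>Reaction C</title>
    <description>IRC animation only.</description></item>
</channel></rss>"""


def classify_entries(rss_xml):
    """Return, per feed item, any DOIs and blog links found in its description."""
    root = ET.fromstring(rss_xml)
    report = []
    for item in root.iter("item"):
        text = item.findtext("description", default="")
        # Strip sentence punctuation that the pattern may have swallowed.
        dois = [found.rstrip(".,;") for found in DOI_PATTERN.findall(text)]
        report.append({
            "title": item.findtext("title", default=""),
            "dois": dois,
            "blog_links": BLOG_PATTERN.findall(text),
        })
    return report


if __name__ == "__main__":
    for entry in classify_entries(SAMPLE_FEED):
        status = "ok" if entry["dois"] or entry["blog_links"] else "needs detective work"
        print(entry["title"], entry["dois"], entry["blog_links"], status)
```

Entries with neither a DOI nor a blog link are the ones that would still need inspecting by hand.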
How might the situation described above have been avoided? Well, Apple in iTunesU only provided in effect one metadata field, and this was an unstructured one. Anything went in that field. Had they provided another field entitled, say, “data source” (or had the course creator been able to configure one themselves), this could moreover have been made a mandatory field and a structured one. Thus it might have only accepted known types of persistent identifier, such as a DOI. Further, the system could have checked that the DOI was actually resolvable. Before you ask, I did log a “bug” with Apple asking this be done, but nothing ever was. With such a tool to hand, I might have achieved data sources for all the 115 entries. The resulting XML (as generated above) could have been used to automate the retrieval of all 115 datasets describing this course.
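What such a mandatory, structured “data source” field might have checked can be sketched as follows. This is purely hypothetical (iTunesU offered nothing of the kind): the function names are invented, the syntax check is a loose heuristic, and the resolvability check simply asks https://doi.org/ whether the identifier leads anywhere. Some landing pages reject automated requests, so a failed check is not proof that a DOI is invalid.

```python
import re
import urllib.error
import urllib.request

# Loose syntax check: "10.", a registrant code, "/", then a suffix.
DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")


def resolves_at_doi_org(doi, timeout=10.0):
    """Return True if https://doi.org/<doi> answers without an HTTP error."""
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False


def validate_data_source(value, resolver=resolves_at_doi_org):
    """Validate a mandatory 'data source' field and return the cleaned DOI.

    Raises ValueError if the field is empty, is not DOI-like, or does not
    resolve. The resolver is injectable so the check can run offline.
    """
    doi = value.strip()
    if not doi:
        raise ValueError("data source is mandatory")
    if not DOI_SYNTAX.match(doi):
        raise ValueError(f"not a recognised persistent identifier: {doi!r}")
    if not resolver(doi):
        raise ValueError(f"DOI does not resolve: {doi}")
    return doi


if __name__ == "__main__":
    # Offline demonstration with a stub resolver that accepts everything.
    print(validate_data_source(" 10.14469/hpc/2133 ", resolver=lambda doi: True))
```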
At this stage then, Steven can follow-up his interest in building a reaction IRC library and analysing it. I will do all I can to encourage Steven not to make the mistakes I did and to ensure that any further data that is required to augment the library does not suffer the problems above. On the other hand, I console myself that in two days, much of the data for the course I created five years ago was salvageable; I wonder how many other iTunesU courses there are for which that can be said!
I will let (with some blushing) the final word be Steven’s: You are one of the few chemists who has both pioneered and built the principles of ‘open chemistry’ into their actual scientific work. I visit your blog occasionally knowing that there is a very high probability I could download and tinker with the results of real calculations.
‡Might I assure all the speakers that I concentrated totally on their talks rather than incoming emails!
Tags:animation, chemical reactions, City: Cupertino, Company: Cupertino Elec, Company: Firefox Communic, Computer Hardware - NEC, computing, detective, Digital media, Drip, Electronic documents, Electronic publishing, Email, HTML, Imperial College, Linux, operating system, Password, Person Location, Steven Kirk, Technology/Internet, XML
Posted in Chemical IT | No Comments »
Thursday, February 2nd, 2017
Almost exactly 20 years ago, I started what can be regarded as the precursor to this blog. As part of a celebration of this anniversary, I revisited the page to see whether any of it had withstood the test of time. Here I recount what I discovered.
The site itself is at www.ch.ic.ac.uk/motm/perkin.html and has the title “Mauveine: The First Industrial Organic Fine-Chemical”. It was an application of an earlier experiment[1] to which we gave the title “Hyperactive Molecules and the World-Wide-Web Information System“. The term hyperactive was supposed to be a play on hyperlinking to the active 3D models of molecules built using their 3D coordinates. The word has another, more negative, association with food additives such as tartrazine – which can induce hyperactivity in children – and we soon discontinued the association. This page was cast as a story about a molecule local to me in two contexts; the first being that the discoverer of mauveine, W. H. Perkin, had been a student at what is now the chemistry department at Imperial College. The second was the realization that where we lived in west London was just down the road from Perkin’s manufacturing factory. Armed with (one of the first) digital cameras, a Kodak DC25, I took some pictures of the location and added them later to the web page. The page also included two sets of 3D coordinates for mauveine itself and alizarin, another dyestuff associated with the factory. These were “activated” using HTML to make use of the then very new Chime browser plugin; hence the term hyperactive molecule.
This first effort, written in December 1995, soon needed revision in several ways. I note that I had maintained the site in 1998, 2001, 2004 and 2006. This took the form of three postscripts to add further chemical context and more recent developments, and of replacing the original Chime code with Java code to support the new Jmol software (Chime itself had been discontinued, probably around 2001 or possibly 2004). With the passage of a further ten years, I now noticed that the hyperactive molecules were no longer working; the original Jmol applet was no longer considered secure by modern browsers and hence deactivated. So I replaced this old code with the latest version (14.7.5 as JmolAppletSigned.jar) and this simple fix has restored the functionality. The coordinates themselves were invoked using the HTML applet tag, which amazingly still works (the applet tag had replaced an earlier one, which I think might have been embed?). A modern invocation would be by using e.g. the JSmol Javascript based tool and so perhaps at some stage this code will indeed need further revision when the Java-based applet is permanently disabled.

You may also notice that the 3D coordinates are obtained from an XML document, where they are encoded using CML (chemical markup language[2]), which is another expression from the family that HTML itself comes from. That form may well last rather longer than earlier formats – still commonly used now – such as .pdb or .mol (for an MDL molfile).
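Because CML is ordinary XML, the coordinates can be recovered with any standard XML parser, which is part of the argument for its longevity. The sketch below reads a small CML-like fragment and writes the older XYZ text format. The water fragment and its approximate coordinates are invented for illustration and are not taken from the mauveine page, and it assumes no XML namespace is declared; real CML documents usually declare one, in which case the tag lookups would need qualifying.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: approximate water geometry, no namespace.
SAMPLE_CML = """<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.470"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.470"/>
  </atomArray>
</molecule>"""


def cml_to_xyz(cml_text):
    """Convert the 3D coordinates in a CML-like molecule to XYZ text."""
    root = ET.fromstring(cml_text)
    lines = []
    for atom in root.iter("atom"):
        # x3, y3 and z3 hold the Cartesian coordinates of each atom.
        x, y, z = (float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        lines.append(f"{atom.get('elementType'):<2} {x:10.4f} {y:10.4f} {z:10.4f}")
    # XYZ format: atom count, a comment line, then one line per atom.
    return "\n".join([str(len(lines)), root.get("id", "")] + lines)


if __name__ == "__main__":
    print(cml_to_xyz(SAMPLE_CML))
```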
Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.

The final part of the maintenance (which I had probably also done with the earlier versions) was to re-validate the HTML code. Checking that a web page has valid HTML was always a behind-the-scenes activity which I remember doing when constructing the ECTOC conferences also back in 1995 and doing so probably does prolong the longevity of a web page. This requires “tools-of-the-trade” and I now use (and indeed did also back in 1995 or so) an industrial strength HTML editor called BBedit. To this is added an HTML validation tool, the installation of which is described at https://wiki.ch.ic.ac.uk/wiki/index.php?title=It:html5 . I re-ran this† and so this 2017 version should be valid for a little while longer at least. The page itself now has not just a URL but a persistent version called a DOI (digital object identifier), which is 10.14469/hpc/2133[3]. In theory at least, even if the web server hosting the page itself becomes defunct, the page could – if moved – be found simply from its DOI. The present URL-based hyperlink of course is tied to the server and would not work if the server stopped serving.
To complete this revisitation, I can add here a recent result‡. Back in 1995, I had obtained the 3D coordinates of mauveine using molecular modelling software (MOPAC) together with a 2D structure drawing package (ChemDraw) because no crystal structure was available. Well, in 2015 such structures were finally published.[4] Twenty years on from the original “hyperactive” models, their crystal structures can be obtained from their assigned DOI, much in the same manner as is done for journal articles: Try DOI: 10.5517/CC1JLGK4[5] or DOI: 10.5517/CC1JLGL5[6].
At some stage, web archaeology might become a fashionable pursuit. Twenty year old Web pages are actually not that common and it would be of interest to chart their gradual decay as security becomes more important and standards evolve and mature. One might hope that at the age of 100, they could still be readable (or certainly rescuable). During this period, the technology used to display 3D models within a web page has certainly changed considerably and may well still do so in the future. Perhaps I will revisit this page in 2037 to see how things have changed!
†The old code can still be seen at www.ch.ic.ac.uk/motm/perkin-old.html
‡It should really be postscript 4.
References
- O. Casher, G.K. Chandramohan, M.J. Hargreaves, C. Leach, P. Murray-Rust, H.S. Rzepa, R. Sayle, and B.J. Whitaker, "Hyperactive molecules and the World-Wide-Web information system", Journal of the Chemical Society, Perkin Transactions 2, pp. 7, 1995. https://doi.org/10.1039/p29950000007
- P. Murray-Rust, and H.S. Rzepa, "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", Journal of Chemical Information and Computer Sciences, vol. 39, pp. 928-942, 1999. https://doi.org/10.1021/ci990052b
- H. Rzepa, "Molecule of the month: Mauveine.", Imperial College London, 2017. https://doi.org/10.14469/hpc/2133
- M.J. Plater, W.T.A. Harrison, and H.S. Rzepa, "Syntheses and Structures of Pseudo-Mauveine Picrate and 3-Phenylamino-5-(2-Methylphenyl)-7-Amino-8-Methylphenazinium Picrate Ethanol Mono-Solvate: The First Crystal Structures of a Mauveine Chromophore and a Synthetic Derivative", Journal of Chemical Research, vol. 39, pp. 711-718, 2015. https://doi.org/10.3184/174751915x14474318419130
- Plater, M. John., Harrison, William T. A.., and Rzepa, Henry S.., "CCDC 1417926: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgk4
- Plater, M. John., Harrison, William T. A.., and Rzepa, Henry S.., "CCDC 1417927: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgl5
Tags:10.5517, Advertising & Marketing - NEC, chemical context, chemical markup language, City: London, Commercial REITs - NEC, Company: Chime, Company: Eastman Kodak, Company: First Industrial, digital cameras, Digital Object Identifier, food additives, HTML, Imperial College, industrial strength HTML editor, Java, JavaScript, manufacturing factory, mauveine using molecular modelling software, Person Attributes, Photographic Equipment, Technology/Internet, validation tool, Web, web archaeology, web server, XML, year old Web pages
Posted in Chemical IT, Historical | 1 Comment »
Friday, June 3rd, 2016
The title might give it away; this is my 500th blog post, the first having come some seven years ago. Very little online activity nowadays is excluded from measurement and so it is no surprise that this blog and another of my “other” scholarly endeavours, viz publishing in traditional journals, attract such “metrics” or statistics. The h-index is a well-known but somewhat controversial measure of the impact of journal articles; here I thought I might instead take a look at three less familiar ones – one relating to blogging, one specific to journal publishing and one to research data.
First, an update on the accumulated outreach of this blog over this seven-year period. The total number of country domains measured is 190. The African continent still has quite a few areas with zero hits (as does Svalbard, with a population of only 2600 for a land mass area 61,000 km2 or 23 km2 per person). Given the low blog readership density on the African continent, it would be interesting to find out whether journal readership is any better.

Next, I look at the temporal distribution for individual posts. The first has attracted the highest total; in five years it has had 19,262 views (the diagram below shows the number of views per day). Four others exceed 10,000 and 80 exceed 1000 views.

Of these five, the next is the oldest, going back to 2009. I was very surprised to find such longevity, with the number of views increasing rather than decreasing with the passage of time.

So time now to compare these statistics with the journals. And of course it’s chalk and cheese. A “view” for a post means someone (or something) accessing the post URL, which is then recorded in the server log. Resolving the URL does at least load the entire content of the post; whether it’s read or not is of course not recorded. Importantly, if you want to view the content at some later stage, a new “view” has to be made (although some browsers do save a web page and allow offline viewing at a later stage, but I suspect this usage is low). With electronic journal access, it’s rather different. Access to an article is now predominantly via two mechanisms:
- From the table of contents (this is somewhat analogous to browsing a blog)
- From the article DOI.
Statistics for these two methods are gathered differently. The new CrossRef resource chronograph.labs.crossref.org (CrossRef allocate all journal DOIs) can be used to measure what they call DOI “resolutions”. A DOI resolution however leads one only to what is called the “landing page”, where the interested reader can view the title, the graphical abstract and some other metadata. It does not mean of course that they go on to actually view the article (as HTML, equivalent to the blog above, or probably more often by downloading a PDF file). Here are a few results using this method:
What about the other main journal article access method, not via a DOI but from a journal’s table of contents page? A Google search revealed this site: jusp.mimas.ac.uk (JUSP stands for Journal usage statistics portal, which sounded promising). This site collects “COUNTER compliant usage data”. COUNTER (Counting Online Usage of Networked Electronic Resources) is an initiative supported by many journal publishers and it sounds an interesting way of measuring “usage” (as opposed to “views” or “resolutions”; it’s that chalk and cheese again!). I would love to be able to show you some statistics using this resource, but the “small print” caught me out: “JUSP gives librarians a simple way of analysing the value and impact of their electronic journals”. Put simply, I am a researcher, not a librarian. As a researcher I do not have direct access; JUSP is a closed, restricted access (albeit taxpayer-funded) resource. I am discussing this with our head of information resources (who is a librarian) and hope to report back here on the outcome.
Finally research data. This is almost too new to be able to measure, but this resource stats.datacite.org is starting to collect statistics on data resolutions (similar to DOI resolutions).
- You can see from the below for Imperial College (in fact this represents the two data repositories that we operate and which I cite here extensively on these blogs) that resolutions are running at up to about 200 a month per dataset (more typically ~25 a month), with a total of 5065 resolutions for all items in March 2016 (the blog has ~12,000 views per month).

- Figshare is another data repository we have made use of:
So to the summary.
- Firstly, we see that I have shown three forms of impact, views, resolutions and usage. If one had statistics on all three, one might then try to see if they are correlated in any way. Even then, normalisation might be a challenge.
- Over ~7 years, five posts on this blog have attracted >10,000 views.
- Many of the blog posts have a long “finish” (to use a wine tasting term); the views continue regularly and often increase over time.
- My analysis of the three journal articles above (and about 15 others) shows that between 50 and 300 resolutions over a few years is fairly typical (for this researcher at least; I am sure most better known researchers attract far far more).
- The temporal distribution for article resolutions and blog views shows both can have continuing impact over an extended period. None of the 18 articles I looked at show a significantly increasing impact with time but many of the blog posts do. This tends to suggest that the audiences for each are quite different; researchers for articles and a fair proportion of inquisitive students for the blog?
- I might speculate that a correlation between my article resolutions and my h-index could probably be found, but the article resolution count has a fine-grained temporal resolution (allowing a derivative with respect to time to be obtained) that is potentially more valuable than just the coarse h-index integration (an article can of course be cited for both positive and negative reasons!).
- Initial analysis for data shows resolutions running at a similar rate to article resolutions. It is not yet possible to correlate data resolutions with article resolutions in which that data is discussed.
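The contrast drawn in the summary between the coarse h-index and the time derivative of resolutions can be made concrete with a short sketch. The h-index is the largest h such that h articles each have at least h citations; the "derivative" is taken here as a simple month-to-month difference of a cumulative count. The function names are illustrative and the numbers in the usage lines are invented, not taken from any of the statistics above.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def monthly_rate(cumulative):
    """Finite-difference rate: new resolutions in each successive month."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(h_index([10, 8, 5, 4, 3]))
    print(monthly_rate([0, 12, 30, 55]))
```

The first function collapses a whole publication record into one integer, whereas the second keeps the month-by-month shape of the curve, which is what allows a rising or falling trend to be seen.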
References
- S.M. Rappaport, and H.S. Rzepa, "Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes", Journal of the American Chemical Society, vol. 130, pp. 7613-7619, 2008. https://doi.org/10.1021/ja710438j
- A.E. Aliev, J.R.T. Arendorf, I. Pavlakos, R.B. Moreno, M.J. Porter, H.S. Rzepa, and W.B. Motherwell, "Surfing π Clouds for Noncovalent Interactions: Arenes versus Alkenes", Angewandte Chemie International Edition, vol. 54, pp. 551-555, 2014. https://doi.org/10.1002/anie.201409672
- K. Abersfelder, A.J.P. White, H.S. Rzepa, and D. Scheschkewitz, "A Tricyclic Aromatic Isomer of Hexasilabenzene", Science, vol. 327, pp. 564-566, 2010. https://doi.org/10.1126/science.1181771
Tags:Country: Svalbard and Jan Mayen, CrossRef, head of information resources, HTML, Imperial College, librarian, online activity, Online Usage, PDF, researcher, search engines, usage statistics portal
Posted in Chemical IT | 4 Comments »
Monday, March 7th, 2016
The upcoming ACS national meeting in San Diego has a CINF (chemical information division) session entitled "Global initiatives in research data management and discovery". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session.
Data, if you think about it, is rarely discoverable other than by intimate association with a narrative or journal article. Even then, the standard procedure is to identify the article itself as being of interest, and then to dig out the "supporting information", which normally takes the form of a single paginated PDF document. If you are truly lucky, you might also get a CIF file (for crystal structures). But such data has little life of its own outside of its parent, the article. Put another way, it has no metadata it can call its own (metadata is data about an object, in this case research data). An alternative is to try to find the data by searching conventional databases such as CAS, Beilstein/Reaxys or CSD, and there of course the searches can be very precise. But someone has to pay the bills for such accessibility.
We are now starting to see quite different solutions to finding data (the F in FAIR data, the other letters representing accessibility, interoperability and re-usability). These solutions depend on metadata being a part of the solution from the outset, rather than any afterthought produced as a commercial solution. The collection of metadata is part of the overall process called RDM, or research data management, perhaps even the most important part of it. In exchange for identifying metadata about one's data, one gets back a "receipt" in the form of a persistent identifier for the data, more commonly known as a DOI. The agency that issues the DOI also undertakes to look after the donated metadata, and to make it searchable. The table below shows eight searches of such metadata, one example of how to acquire statistics relating to the usage of the data and one search of how to find repositories containing the data.
‡In this instance the three MIME media types are chemical/x-wavefunction, chemical/x-gaussian-checkpoint and chemical/x-gaussian-log. See[1] for chemical MIME (multipurpose internet mail extensions).
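As a small illustration of how such media types can be attached to files, here is a minimal Python sketch using the standard-library `mimetypes` module. The three media type names are those quoted in the footnote; the file extensions paired with them are my assumptions for the purpose of the example, and `media_type_of` is a hypothetical helper.

```python
import mimetypes

# Media type names come from the footnote; the extensions are assumed
# pairings chosen for illustration only.
CHEMICAL_TYPES = {
    ".wfn": "chemical/x-wavefunction",
    ".fchk": "chemical/x-gaussian-checkpoint",
    ".log": "chemical/x-gaussian-log",
}

# A private registry avoids changing the process-wide defaults.
registry = mimetypes.MimeTypes()
for extension, media_type in CHEMICAL_TYPES.items():
    registry.add_type(media_type, extension)


def media_type_of(filename):
    """Return the media type registered for the filename's extension, or None."""
    guessed, _encoding = registry.guess_type(filename)
    return guessed
```

A repository could record the result of something like `media_type_of("ts1.fchk")` as the format entry in the metadata for each deposited file, which is what makes a search by media type possible.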
Anyone familiar with the standard ways of finding data (CAS, CSD, Reaxys) will appreciate that the above does not yet have the finesse to find e.g. sub-structures of chemical structures, synthetic procedures or molecular properties. My including it here is primarily to show some of the potential such systems have, and to remark particularly that the batch query capability of this infrastructure could indeed be used in the future to construct much more sophisticated systems. Oh, and to the end-user at least, the searches shown above do not require institutional licenses to use. Both the data and its metadata are free, mostly with a CC0 or CC BY 3.0 license for re-use (the R of FAIR).
If more of interest related to this topic emerges at the ACS session, I will report back here.
References
- H.S. Rzepa, P. Murray-Rust, and B.J. Whitaker, "The Application of Chemical Multipurpose Internet Mail Extensions (Chemical MIME) Internet Standards to Electronic Mail and World Wide Web Information Exchange", Journal of Chemical Information and Computer Sciences, vol. 38, pp. 976-982, 1998. https://doi.org/10.1021/ci9803233
Tags:Academic publishing, chemical, chemical information division, Chemical nomenclature, chemical structures, Chemical substance, chemical/x-wavefunction, Cheminformatics, City: San Diego, content media, data repository search, format type chemical/x-* , Identifiers, Imperial College, Imperial College London, International Chemical Identifier, JSON, media types, multipurpose internet media extensions, ORCiD, PDF, potential such systems, research data management, Search queries, Technical communication, Technology/Internet
Posted in Chemical IT | 2 Comments »
Friday, June 26th, 2015
Open principles in the sciences in general and chemistry in particular are increasingly preached from funding councils down, but it can be more of a challenge to find innovative practitioners. Part of the problem perhaps is that many of the current reward systems for scientists do not always help promote openness. Jean-Claude Bradley was a young scientist who was passionately committed to practising open chemistry, even though when he started he could not have anticipated any honours for doing so. A year ago a one-day meeting at Cambridge was held to celebrate his achievements, followed up with a special issue of the Journal of Cheminformatics. Peter Murray-Rust and I both contributed, and following the meeting we decided to help promote Open Chemistry via an annual award to be called the Bradley-Mason prize. This would celebrate both “JC” himself and Nick Mason, who also made outstanding contributions to the cause whilst studying at Imperial College. The prize was initially to be given to an undergraduate student at Imperial, but was also extended to postgraduate students who have promoted and showcased open chemistry in their PhD researches.
Peter and I are delighted to announce the inaugural winners of this prize.
The postgraduate winner is Tom Phillips for his open blog describing his experiences as a PhD student and for leading by example. He has published his instrumental codes on GitHub (and now Zenodo[1]), and the data and codes for reproducing the graphs in his work on the “lab on a chip” on Figshare[2], and through his blog has encouraged other research students to do the same. Tom has worked assiduously to ensure that all the articles describing his PhD work are or will be open access.[3]
The undergraduate winner is Tom Arrow for his “spare time” involvement with Wikimedia (the foundation that underpins the open Wikipedia), including participating in a Wikimedia EU hackathon in Lyon, France, and feeding his experiences and skills back into his undergraduate environment as well as enhancing the teaching Wiki used by his fellow students. Tom took the lead in introducing us to Wikidata[4] for storing chemical data in an open Wikibase data repository and in promoting its use for enriching Wikipedia chemistry pages and showcasing open data in undergraduate teaching environments.
References
- T. Phillips, and S. Macbeth, "pumpy: Zenodo release", 2015. https://doi.org/10.5281/zenodo.19033
- T. Phillips, J.H. Bannock, and J.D. Mello, "Data for microscale extraction and phase separation using a porous capillary", 2015. https://doi.org/10.6084/m9.figshare.1447208
- T.W. Phillips, J.H. Bannock, and J.C. deMello, "Microscale extraction and phase separation using a porous capillary", Lab on a Chip, vol. 15, pp. 2960-2967, 2015. https://doi.org/10.1039/c5lc00430f
- D. Vrandečić, and M. Krötzsch, "Wikidata", Communications of the ACM, vol. 57, pp. 78-85, 2014. https://doi.org/10.1145/2629489
Tags:Cambridge, chemical data, Chemistry Central, Collective intelligence, Crowdsourcing, Doctor of Philosophy, Education, European Union, France, GITHUB INC., Imperial College, Jean Claude Bradley, lab on a chip, Lyon, Nick Mason, Nonprofit technology, Open content, Peter Murray-Rust, reward systems, Technology/Internet, Tom Arrow, Tom Phillips, Wikimedia Foundation, wikipedia, World Wide Web, young scientist
Posted in Bradley-Mason Prize for Open Chemistry, Chemical IT | 1 Comment »
Wednesday, April 1st, 2015
The reduction of cinnamaldehyde by lithium aluminium hydride (LAH) was reported in a classic series of experiments[1],[2],[3] dating from 1947-8. The reaction was first introduced into the organic chemistry laboratories here at Imperial College decades ago, vanished for a short period, and has recently been reintroduced.‡ The experiment is really simple in concept; add LAH to cinnamaldehyde and you get just reduction of the carbonyl group; invert the order of addition and you additionally get reduction of the double bond. Here I investigate the mechanism of these reductions using computation (ωB97XD/6-311+G(d,p)/SCRF=diethyl ether).

The mechanism can be envisaged as proceeding through a 1,4-hydride attack (TS14) with a hidden intermediate (HI14) on the reaction path, or instead via a pathway involving either one or two consecutive 1,2-attacks (TS12-1, TS12-2) via an explicit intermediate I12. Experimentally, quenching with D2O at the end of the reduction replaces a C-Al bond with a C-D bond, which certainly seems to rule out the 1,4 route, since that would not lead to incorporation of deuterium at the benzylic position. So does the computational model reflect this reality?
I have chosen a model in which two dimethyl ether molecules solvate the lithium cation. The reactant itself has an interesting structure, in which two of the Al-H bonds form bridges to the Li, which ends up being five-coordinated. Further weak C-H…O=C hydrogen bonding is also observed. The NCI (non-covalent-interaction) surfaces are well worth inspecting (inspection notes: the NCI surrounding the Al has artefacts, since the value of the electron density surrounding the metal is lower than covalent density for the other elements. Click on the image below to load the 3D model).

TS14 retains that C-H…O=C hydrogen bond, but the double Al-H-Li bridge is lost. The 8-ring for the TS allows the hydride transfer to be approximately linear, and the Bürgi-Dunitz angle of approach of the hydride to the double bond is 107.4°. Whilst the barrier is acceptably low, the reaction reaches a cul-de-sac down this path; it has no low energy escape route.

TS12-1 loses the C-H…O=C hydrogen bond, but being 3.3 kcal/mol lower in free energy than TS14 fortunately provides a lower energy alternative to that cul-de-sac! The Bürgi-Dunitz angle is 112.0°.


TS12-2 is required to proceed further to the dihydrocinnamyl alcohol reduction product P12, and now we have to confront the nub of the problem. Why does this further reduction only proceed when the LAH is in excess? TS12-2 itself corresponds to an Al-H addition across a C=C double bond,[11]† with a similar barrier to TS12-1. The answer to this conundrum is to recognise that I12 forms what is called a resting state for the reaction, and that to proceed further the reaction has to overcome the barrier from I12 to TS12-2. That barrier is 42.3 kcal/mol, far too high for the reaction to proceed thermally. When one encounters an unreasonable barrier, one has to look very carefully at the model one has constructed for the process.
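To see quite how unreasonable a barrier of 42.3 kcal/mol is, one can convert it into a rate using the Eyring equation. The sketch below is a rough estimate only: it assumes a transmission coefficient of 1, a unimolecular step from the resting state, and a temperature of 298.15 K (my assumption for illustration). The 20 kcal/mol value is a hypothetical reference barrier included purely for comparison, and the function names are my own.

```python
import math

BOLTZMANN = 1.380649e-23     # J/K
PLANCK = 6.62607015e-34      # J s
GAS_CONSTANT = 1.987204e-3   # kcal/(mol K)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def eyring_rate_constant(barrier_kcal, temperature=298.15):
    """First-order rate constant (s^-1) for a free energy barrier in kcal/mol.

    Eyring equation with the transmission coefficient taken as 1.
    """
    prefactor = BOLTZMANN * temperature / PLANCK
    return prefactor * math.exp(-barrier_kcal / (GAS_CONSTANT * temperature))


def half_life_years(barrier_kcal, temperature=298.15):
    """Half-life in years of a first-order process with the given barrier."""
    return math.log(2) / eyring_rate_constant(barrier_kcal, temperature) / SECONDS_PER_YEAR


# 42.3 kcal/mol is the computed I12 -> TS12-2 barrier; 20.0 is a
# hypothetical comparison value, not a result from this post.
for barrier in (20.0, 42.3):
    print(f"{barrier:5.1f} kcal/mol: k = {eyring_rate_constant(barrier):.2e} s^-1, "
          f"half-life = {half_life_years(barrier):.2e} years")
```

Under these assumptions a 20 kcal/mol barrier corresponds to a half-life of under a minute, whereas 42.3 kcal/mol gives a half-life of the order of tens of billions of years, which is why the model itself has to be questioned.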

Clearly, the model I used here is lacking something. Since the reaction only proceeds when LAH is in excess, we can formulate the hypothesis that further LAH must be added to the model, from which a more reasonable barrier might emerge. If I find out how that can be done, I will report back here.
‡ LAH as a reagent was originally available in powder form, which could be quite tricky to handle and could cause fires if not handled properly. The lab organiser Chris tells me it now comes in standard-sized pellets which are far easier and safer to handle in a laboratory, allowing its re-introduction.
†Biographical note. This footnote is added because I spent three years as a Ph.D. student trying to construct transition state models by measuring kinetic isotope effects. My failure to do so convincingly meant I decided to spend a further three years as a Post Doc inverting the concept by learning how to model transition states using quantum mechanical computation. I first applied these skills as an independent researcher to locating the transition state for Cl-H addition (vs Al-H in this post) across a C=C double bond and computing the associated isotope effects.[12] This article ends with the assertion that “SCF-MO calculations may provide a more rational basis for interpreting kinetic isotopes than the reverse procedure of attempting to establish a transition state model from the observed kinetic data.” It is nice to see that posterity has shown that this assessment was about right.
References
- R.F. Nystrom, and W.G. Brown, "Reduction of Organic Compounds by Lithium Aluminum Hydride. I. Aldehydes, Ketones, Esters, Acid Chlorides and Acid Anhydrides", Journal of the American Chemical Society, vol. 69, pp. 1197-1199, 1947. https://doi.org/10.1021/ja01197a060
- R.F. Nystrom, and W.G. Brown, "Reduction of Organic Compounds by Lithium Aluminum Hydride. II. Carboxylic Acids", Journal of the American Chemical Society, vol. 69, pp. 2548-2549, 1947. https://doi.org/10.1021/ja01202a082
- F.A. Hochstein, and W.G. Brown, "Addition of Lithium Aluminum Hydride to Double Bonds", Journal of the American Chemical Society, vol. 70, pp. 3484-3486, 1948. https://doi.org/10.1021/ja01190a082
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191154
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191148
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191152
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191149
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191151
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191156
- H.S. Rzepa, "C 13 H 24 Al 1 Li 1 O 3", 2015. https://doi.org/10.14469/ch/191155
- H.S. Rzepa, "Gaussian Job Archive for C2H7Al", 2015. https://doi.org/10.6084/m9.figshare.1362146
- H.S. Rzepa, "MNDO SCF-MO calculations of kinetic isotope effects for dehydrochlorination reactions of chloroalkanes", Journal of the Chemical Society, Chemical Communications, pp. 939, 1981. https://doi.org/10.1039/c39810000939
Tags:Al-H-Li bridge, dihydrocinnamyl alcohol reduction product, free energy, Imperial College, independent researcher, low energy escape route, lower energy alternative, metal, pence
Posted in reaction mechanism | 5 Comments »
Saturday, March 8th, 2014
The diazo-coupling reaction dates back to the 1850s (and a close association with Imperial College via the first professor of chemistry there, August von Hofmann) and its mechanism was much studied in the heyday of physical organic chemistry.[1] Nick Greeves, purveyor of the excellent ChemTube3D site, contacted me about the transition state (I have commented previously on this aspect of aromatic electrophilic substitution). ChemTube3D recruits undergraduates to add new entries; Blue Jenkins is one such, adding a section on dyes.

The mechanism can be rate limiting either in the initial electrophilic attack (black arrows) or in the subsequent proton removal (red arrows using an intermolecular base such as chloride anion).[2]‡ The product is normally assumed to be the trans-diazo compound rather than cis. This distribution is certainly true in the crystal structure database (below, although some examples of cis are known, including azobenzene itself). Would this distribution be reflected in the transition states? Initial attempts by the ChemTube3D team had resulted only in a cis-transition state being located, and they asked me to check this out.

ωB97XD/6-311G(d,p)/SCRF=water calculations using phenyl diazonium chloride (I do like my counter-ions) coupling to benzene resulted in location of both cis[3] and trans[4] transition states, the former being the lower by 1.0 kcal/mol in free energy (this might well be due to the dispersion stabilisation from π-π stacking).† The IRC for the cis is shown below.[5]



You can see that the entire process is concerted. The Wheland intermediate normally invoked as part of the mechanism of aromatic electrophilic substitution is not a proper intermediate but a hidden one for the reaction with X=Y=H. The reaction coordinate has a flat top, and passage along this part represents the hidden Wheland intermediate. The reaction barrier is high, however, and it is certainly observed that only activated arenes (phenols, anilines, X,Y=OH, NH2) actually couple with diazonium cations. For these, the hidden intermediate is stabilized by the substituent, and no doubt emerges as a real intermediate.
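To put the 1.0 kcal/mol preference for the cis transition state quoted earlier into perspective, a rough kinetic cis:trans ratio can be estimated. This assumes that both pathways start from a common reactant, that simple transition state theory applies, and that T = 298 K (my assumption, giving RT ≈ 0.593 kcal/mol):

```latex
\frac{k_{\mathrm{cis}}}{k_{\mathrm{trans}}}
  = \exp\!\left(\frac{\Delta\Delta G^{\ddagger}}{RT}\right)
  = \exp\!\left(\frac{1.0}{0.593}\right)
  \approx 5.4
```

That corresponds to roughly 84:16 in favour of cis. A difference of this size is comparable to the uncertainty one might expect from such calculations, so the ratio should be read as indicative only.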
For my thesis work, I studied[2] diazo-coupling of indoles. I might have a go at returning to that work, to see if calculations can replicate my finding, that for unhindered indoles proton removal from the Wheland intermediate is fast, but add a few t-butyl hindering groups and it becomes slow.
PS. Here is the IRC for the formation of trans-diazobenzene.[6]

‡Such diazo compounds make up a significant proportion of the 50 or so real molecules I have personally added to the collection of 84 million or so thus far identified.
†Working with ions has one statistical problem that covalent systems do not have; where to geometrically place the counter-ion. One should really stochastically explore reasonable locations before concluding the likely location of the globally lowest energy pose.
References
- S.B. Hanna, C. Jermini, H. Loewenschuss, and H. Zollinger, "Indices of transition state symmetry in proton-transfer reactions. Kinetic isotope effects and Brønsted's β in base-catalyzed diazo-coupling reactions", Journal of the American Chemical Society, vol. 96, pp. 7222-7228, 1974. https://doi.org/10.1021/ja00830a009
- B.C. Challis, and H.S. Rzepa, "The mechanism of diazo-coupling to indoles and the effect of steric hindrance on the rate-limiting step", Journal of the Chemical Society, Perkin Transactions 2, pp. 1209, 1975. https://doi.org/10.1039/p29750001209
- H.S. Rzepa, "Gaussian Job Archive for C12H11ClN2", 2014. https://doi.org/10.6084/m9.figshare.956138
- H.S. Rzepa, "Gaussian Job Archive for C12H11ClN2", 2014. https://doi.org/10.6084/m9.figshare.956139
- H.S. Rzepa, "Gaussian Job Archive for C12H11ClN2", 2014. https://doi.org/10.6084/m9.figshare.956209
- H.S. Rzepa, "Gaussian Job Archive for C12H11ClN2", 2014. https://doi.org/10.6084/m9.figshare.956213
Tags:covalent systems, first professor, free energy, Imperial College, lowest energy pose, Nick Greeves, professor of chemistry
Posted in reaction mechanism | 2 Comments »