Henry Rzepa's blog

Tag: Chemical IT

Data-round-tripping: wherein the future?

Moving (chemical) data around in a manner which allows its (automated) use in whichever context it finds itself must be a holy grail for all scientists and chemists. I posted earlier on the fragile nature of molecular diagrams making the journey between the editing program used to create them (say ChemDraw) and the word processor used to place them into a context (say Microsoft Office), via an intermediate storage area known as the clipboard. The round trip between the Macintosh (OS X) versions of these programs had been broken for a little while, but it is now fixed! A small victory. This blog reports what happens when such a Mac-created Word document is sent to someone using Microsoft Windows as an OS (or vice versa).

As you might have guessed, the molecular diagram arrives largely dead, and not re-usable. Opening the .docx archive (it is nothing more than a zip file) reveals only a JPEG file residing inside; nothing that can be chemically repurposed. If the reverse process is undertaken, creating a ChemDraw diagram and pasting it into Word on Windows, one finds in the .docx two components: a bit-mapped image linked to an active object containing the data. Only the first of these is recognised if the file makes its way to a Macintosh; i.e. the same story, the data is again lost. So the bottom line is that Mac users and Windows users cannot, after all, exchange repurposable molecular diagrams via Word documents using this combination of programs. This is not good.
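For anyone wishing to repeat the inspection described above, here is a minimal Python sketch that opens a .docx as the zip archive it is and lists what travelled inside it. The folder names `word/media/` and `word/embeddings/` are the locations Word conventionally uses for pictures and embedded objects; treat them as an assumption rather than a guarantee, and note that the function name and the demo content are purely illustrative.

```python
import io
import zipfile


def classify_docx_parts(docx_file):
    """Return (images, embedded_objects) found inside a .docx archive.

    `docx_file` may be a path or a file-like object, as accepted by
    zipfile.ZipFile. The two folder prefixes are assumed conventions.
    """
    images, embedded = [], []
    with zipfile.ZipFile(docx_file) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.startswith("word/media/"):
                images.append(name)      # pictures only, no chemistry
            elif lower.startswith("word/embeddings/"):
                embedded.append(name)    # objects that may still carry data
    return images, embedded


def demo():
    """Build a tiny stand-in archive in memory and classify its parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/media/image1.jpeg", b"not a real JPEG")
    buffer.seek(0)
    return classify_docx_parts(buffer)
```

On a real file one would call `classify_docx_parts("report.docx")` (a hypothetical file name); an empty second list is the signature of the "largely dead" diagram.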

But let me remind you what happened around 1993. The word processor was joined by a program called the Web browser. In 1996, the underlying content carrier, HTML, became XHTML (an instance of XML). Almost from day 1, such XHTML could be, and frequently was, repurposed. A memorable example is that search engines could use it to index the Web. The XHTML easily survived trips to and from clipboards. In 1996, CML joined HTML as a way of carrying chemical information capable of round-tripping without loss (if need be). There are other chemical XML languages in use nowadays, including CDXML used by the ChemDraw program. Word itself now uses XML (the x in .docx). So, after 14 years, why am I still describing the difficulties above? I am frankly at a loss to explain why there is still a need to write this post.
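To make the point about lossless round-tripping concrete, here is a small sketch using only Python's standard library. The CML fragment for hydrogen cyanide is hand-written for illustration (it is not taken from any of the programs mentioned), and the parse-then-serialise step merely stands in for whatever a clipboard or document container might do to the XML on its journey.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# A hand-written, minimal CML fragment for HCN (illustrative only).
CML_TEXT = f"""<molecule xmlns="{CML_NS}" id="hcn">
  <atomArray>
    <atom id="a1" elementType="H"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="N"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="3"/>
  </bondArray>
</molecule>"""


def read_molecule(cml_text):
    """Extract (atoms, bonds) from a CML string as plain Python data."""
    root = ET.fromstring(cml_text)
    ns = {"cml": CML_NS}
    atoms = [(a.get("id"), a.get("elementType"))
             for a in root.findall("cml:atomArray/cml:atom", ns)]
    bonds = [(tuple(b.get("atomRefs2").split()), b.get("order"))
             for b in root.findall("cml:bondArray/cml:bond", ns)]
    return atoms, bonds


def round_trip(cml_text):
    """Parse and re-serialise, as a data-preserving carrier would."""
    ET.register_namespace("", CML_NS)
    return ET.tostring(ET.fromstring(cml_text), encoding="unicode")
```

The atoms and bonds read back after `round_trip(CML_TEXT)` are identical to those read from the original text, which is exactly the property a bit-mapped image cannot offer.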

All is not entirely lost. The CML4Word approach is designed to enable (chemical) data round tripping from the outset, although I do not yet know whether the CML created and stored in the Word document using this mechanism is recognised anywhere outside of Word 2007 on Windows. If anyone can let me know of examples where such a CML-enabled Word document can be used in other environments, I would be very grateful (but not on OS X, as I know already).

And as I might have mentioned in the previous post on this topic, things may not however be getting better in that other carrier of information and data, the mobile phone/iPad, as exemplified by operating systems such as iOS or Android. Watch this space, as they say.

December 7, 2010
Data-round-tripping: moving chemical data around.
For those of us who were around in 1985, an important chemical IT innovation occurred. We could acquire a computer which could be used to draw chemical structures in one application, and via a mysterious and mostly invisible entity called the clipboard, paste them into a word processor (it was called a Macintosh). Perchance even print the result on a laser printer. Most students of the present age have no idea what we used to do before this innovation! Perhaps not in 1985, but at some stage shortly thereafter, and in effect without most people noticing, the return journey also started working, the so-called round trip. It seemed natural that a chemical structure diagram subjected to this treatment could still be chemically edited, and that it could make the round trip repeatedly. Little did we realise how fragile this round trip might be. Years later, the computer and its clipboard, the chemistry software, and the word processor had all moved on many generations (it is important to flag that three different vendors were involved, all using proprietary formats to weave their magic). And (on a Mac at least) the round-tripping no longer worked. Upon its return to the chemistry program (ChemDraw in this instance), the diagram had been rendered inert, un-editable, and devoid of semantic meaning unless a human intervened. By the way, this process of data-loss is easily demonstrated even on this blog. The chemical diagrams you see here are similarly devoid of data, being merely bit-mapped JPG images. Which is why, on many of these posts, I put in the caption Click for 3D, which gives you access to the chemical data proper (in CML or other formats). And I throw in a digital repository identifier for good measure should you want a full dataset.

It is only now that we (more specifically, this user) understand what had happened under-the-hood to break this round-tripping. In 1984, when Apple produced the Mac, they also produced a most interesting data format called PICT. A human saw the PICT as a PICTure, but the computer saw more. It (could) see additional data embedded in the PICT. The clipboard supported the PICT format, which meant that both picture and data could be transferred between programs. And ChemDraw and Word also understood this. Hence the ability to round-trip noted above (it has to be said between specifically these programs).

Times moved on and the limitations of PICT set in. Apple refocussed on the PDF format (related, notice, to the PostScript format that Adobe had introduced in order to allow high quality laser printing). PICT support was abandoned, and the various components no longer carried recognisable data (specifically the clipboard or the ability of Word to recognise the data). Round-tripping broke. Does this matter? Well, one colleague where I work had accumulated more than 1000 chemical diagrams, which he decided to store in PowerPoint (and yes, he threw the original ChemDraw files away). The day came when he wanted to round trip one of them. And of course he could not. He was rather upset I have to say!

PDF was not really a format designed to carry data (see DOI: 10.1021/ci9003688). But, bless their hearts, the three vendors involved in this story all agreed to support data embedded in the PDF hamburger (and Adobe to tolerate it) and now once again, a structure diagram can move into an Office program (on Mac) and out again and retain its chemical integrity. What lessons can be learnt?
1. Firstly, out of sight, out of mind. The clipboard is truly mostly out of sight, and it was not really designed from the outset to preserve data properly. Nowadays I wonder whether clipboards in general recognise XML (and hence CML) and preserve it. I truly do not know. But they should.
2. Secondly, any system which relies on three or four commercial vendors, who at least in the past, devised proprietary formats which they could change without warning, is bound to be fragile.
3. We have learnt that data is valuable. More so than the representation of it (i.e. a 2D or 3D structure diagram). But when it’s lost, the users should care! And tell the vendors.
4. Peter Murray-Rust and his team have produced CML4Word (or as Microsoft call it, Chemistry add-in for Word). At its heart is data integrity. Fantastic! But I wonder if it survives on Microsoft’s clipboard (I know it does not on Apple’s, since CML4Word is not available on that OS, and is unlikely ever to become so).
5. And I can see history about to repeat itself. The same seems about to happen on new devices such as the Apple iPad. It too has copy/paste via a clipboard. I bet this will not round trip chemistry (or much other) data! Want to bet that the lessons of this story have not yet been learnt?
Oh, for those who wish to round-trip chemistry on a Mac, you will have to acquire ChemDraw 12.0.2 and Word 2011 (version 14.01), as well as OS X 10.6 for it to work.
November 20, 2010
Semantically rich molecules
Peter Murray-Rust in his blog asks for examples of the Scientific Semantic Web, a topic we have both been banging on about for ten years or more (DOI: 10.1021/ci000406v). What we are seeking of course is an example of how scientific connections have been made using inference logic from semantically rich statements to be found on the Web (ideally connections that might not have previously been spotted by humans, and lie overlooked and unloved in the scientific literature). It’s a tough cookie, and I look forward to the examples that Peter identifies. Meanwhile, I thought I might share here a semantically rich molecule. OK, I identified this as such not by using the Web, but as someone who is in the process of delivering an undergraduate lecture course on the topic of conformational analysis. This course takes the form of presenting a set of rules or principles which relate to the conformations of molecules, and which themselves derive from quantum mechanics, and then illustrating them with selected annotated examples. To do this, a great many semantic connections have to be made, and in the current state of play, only a human can really hope to make most of these. We really look to the semantic web as it currently is to perhaps spot a few connections that might have been overlooked in this process. So, below is a molecule, and I have made a few semantic connections for it (but have not actually fully formalised them in this blog; that is a different topic I might return to at some time). I feel in my bones that more connections could be made, and offer the molecule here as the fuse!

Two chair conformations of the molecule DULSAE. Click here for 3D. Note the (attractive) short H…H contacts.

To list all the likely semantics that a chemist would perceive in the graphic above would take far too long (by the time one would have finished, a text book would have been written). So here is a very very short summary in the context of conformational analysis.
1. The molecule has a six membered ring as its backbone
2. which can adopt two possible chair conformations
3. which can interconvert by exchanging the axial and equatorial group pair for each of the four carbon atoms in the ring.
4. An organic chemist will immediately notice a very unusual group, Fe(CO)₂Cp, which itself is a semantic goldmine,
5. but which for the purposes here we will regard merely as a C-Fe bond!
The (semantic) question to be posed is “which of the two conformations shown above is the more stable?” That too of course has an abundance of implicit semantics, but most human chemists will probably know that this refers to asking which of the two geometries represents the lowest thermodynamic free energy (and we leave aside the issue of what medium the molecule is in, i.e. solid, solution or gas). A far trickier question is “why”?

So to (some interim) answers. Well, an ωB97XD/6-311G(d) calculation (wow, think of what is implied in that concise notation) predicts conformation (a) to be more stable by 2.3 kcal/mol (2.1 in ΔG, see DOI: 10042/to-4911). Now to the why. What connections would someone well versed in conformational analysis spot?
1. The molecule has two methyl groups on adjacent atoms. They may prefer to be di-axial rather than di-equatorial to avoid excessive steric repulsions (whatever we mean by that!). That might favour (b).
2. The molecule has one carbon with both a cyano and an ether linkage. Well, that is susceptible to an anomeric effect (although, as I pointed out in an earlier post here, this connection has in fact often NOT been made in the literature). Only in conformation (a) is one of the oxygen lone pairs aligned anti-periplanar to the axis of the C-CN bond. The reasons why this is important are outlined in my Lecture course.
3. Having spotted the last, the human might ask whether there is any possibility of an anomeric effect between an oxygen lone pair and the axis of the C-Fe bond? Well, I rather think that not a single human ever has asked that question! (I cannot know that of course, and perhaps someone has speculated upon this in the literature; this is where a full semantic web would help. That question could be posed of it! The reason I suspect the connection might not have been made is that the anomeric effect is the domain of the organic chemist, and C-Fe bonds are those of the organometallic chemist. They do tend to see the chemical world rather differently, these two groups of chemists). If there were such an effect, it would favour (a).
4. Then we have an X-C-C-Y motif. Depending on the nature of X and Y, the molecule might actually prefer a gauche conformation, i.e. the dihedral angle XCCY would be around 60°. There are several such motifs one can detect; X=Y=O (twice). It might be that other permutations such as X=CN, Y=Fe(CO)₂Cp, favour anti-periplanar. There are other permutations whose orientational preference may not even be recorded (in text books). Suddenly it’s gotten complicated!
5. There are a number of short (~2.4Å) H…H contacts
6. We are starting to understand that to unravel the conformation of this molecule, one may have to identify quite a number of different “rules”, and then to quantify each, and add up the numbers to get the final result. That energy of 2.3 kcal/mol may be composed of the result of applying quite a number of different rules. Hence the title of this post, a semantically rich molecule!
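As an aside on the X-C-C-Y motifs in the list above, the gauche or anti-periplanar label is easily computed once 3D coordinates are to hand. The following is a minimal sketch in plain Python; the function names and the 30° tolerance are my own illustrative choices, and the test coordinates are idealised points, not atoms taken from DULSAE.

```python
import math


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points X, C, C, Y."""
    def sub(a, b):
        return tuple(ai - bi for ai, bi in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals to the two planes
    b2_len = math.sqrt(dot(b2, b2))
    # atan2 form keeps the sign and behaves well near 0 and 180 degrees.
    y = dot(cross(n1, n2), b2) / b2_len
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def classify(angle, tolerance=30.0):
    """Label an angle as gauche (~60), anti-periplanar (~180) or other."""
    a = abs(angle)
    if abs(a - 60.0) <= tolerance:
        return "gauche"
    if abs(a - 180.0) <= tolerance:
        return "anti-periplanar"
    return "other"


# Idealised geometry: central bond along z, X on the x axis.
X, C1, C2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
Y_GAUCHE = (math.cos(math.radians(60)), math.sin(math.radians(60)), 1.0)
Y_ANTI = (-1.0, 0.0, 1.0)
```

Run over every X-C-C-Y combination in a real structure, such a function would at least enumerate the motifs to which the various "rules" then have to be applied; quantifying each rule remains the hard part.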
Well, I will leave it here for this post, without giving answers to the six points listed above, or really answering my main question posed above. That would make the post too complex (but I will follow this up!). I do want to end by planting the idea that answering this question involves making a great many chemical connections about the properties of this molecule, and then identifying quantitative ways (algorithms) in which an answer can be formulated. The molecule above is presented as a challenge for the Semantic Web to address!
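To put the computed free energy difference in perspective, a two-state Boltzmann estimate converts it into a population. This is only a back-of-envelope sketch: it assumes just two conformers in equilibrium and a temperature of 298.15 K, neither of which is stated in the post.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def population_of_lower(delta_g_kcal, temperature_k=298.15):
    """Equilibrium fraction of the more stable of two conformers.

    delta_g_kcal is the free energy of the less stable conformer
    relative to the more stable one, in kcal/mol (so it is >= 0).
    """
    ratio = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return ratio / (1.0 + ratio)


# The 2.1 kcal/mol difference in free energy quoted in the post.
fraction_a = population_of_lower(2.1)
```

Under those assumptions `fraction_a` comes out at roughly 97%, i.e. a free energy gap of about 2 kcal/mol already means that conformation (a) dominates almost completely at room temperature.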
May 2, 2010
WebCite and Jmol

Since I have gotten into the habit of quoting some of my posts in other contexts, I have started to also archive them using WebCite. One can quote the resulting archive as:

Rzepa, Henry. Quintuple bonds. 2010-04-18. URL:http://www.ch.ic.ac.uk/rzepa/blog/?p=1722. Accessed: 2010-04-18. (Archived by WebCite® at http://www.webcitation.org/5p5BtuzSH)

There is one issue though. Many of my posts expose molecules via a Jmol popup. WebCite cannot archive that aspect; the Jmol applet fails to run (it would be surprising if it did, since WebCite would have to archive a local copy of Jmol to create a new sandbox). Anyone got any thoughts?

April 18, 2010
To blog or to publish. That is the question.

Scientists write blogs for a variety of reasons. But these probably do not include getting tenure (or grants). For that one has to publish. And I will argue here that a blog is not currently accepted as a scientific publication (for more discussion on this point, see this article by Maureen Pennock and Richard Davis). For chemists, publication means in a relatively small number of high-impact journals. Anything more than five articles a year in such journals, and your tenure is (probably) secure (if not your funding).

Can one do both? Post a blog item, and then publish a follow-up in a high-impact journal? Well, yes and no.

I had better explain. A blog post is more often than not catalysed by reading an article, viewing another blog, or discussing something with a colleague. One posts in the hope of getting some feedback, from which one’s ideas might mature, develop, or indeed collapse! Scientists have long done this of course, albeit with a colleague down the corridor, at conferences or seminars. The ideas thus cast forth may of course also get stolen, and so these traditional mechanisms for floating ideas are often very short on detail. Sometimes, returning to the idea of blogs, one post can lead to another, and the nature of the blog means the ideas can evolve and mutate very rapidly. Eventually, one might wish to take a good overview of all the various efforts. At this point, one is now considering publishing a journal article, since currently at least, the longevity of a journal is considered longer than that of a blog (see this post here for more ruminations on that theme). There are other good reasons for then choosing a journal rather than one’s blog. The QA (quality assurance) necessary to get an article accepted in a good journal is, let’s face it, rather greater than that of a blog (although to be fair, it is only motivation that limits the quality of the latter). Apart from adding all those control experiments/calculations that may be missing from the blog, one also must be far more fastidious in citing the literature correctly.

I do speak from (thus far one) experience. The story starts here, this being the initial post on a story that broke on Steve Bachrach’s blog about a compound with a potentially pentavalent carbon; Steve’s own post was based on an original article on the theme. Several more blog posts followed as the logical theme gradually developed. I eventually decided that telling how this set of logical connections came about was almost as interesting as the specific molecules it covered. The story had also evolved from discussing the element Astatine to speculating about the rare gas Helium, a somewhat less than obvious connection path (and how to discover connections between disparate and apparently unconnected concepts is a different story). Where should the story about how astatine was connected to helium be told? I decided it should indeed be in a formally published journal article. But it was also important to tell the story more or less as it happened, and particularly to include the role that the blogs themselves had played.

In fact, as soon as I started this undertaking, I realised that more calculations, and at a rather higher theoretical level, needed to be done in order to persuade the referees of the article that the science was sound, and also that it advanced our knowledge significantly. In the event, although the calculations were repeated, enhanced, or evolved in some manner or other, and new ideas injected, none of the original assertions was proven wrong (and of course it’s now not just me that thinks this, but the 2-3 referees who also commented). Ultimately, I would estimate I ended up spending perhaps ten times as much time on the journal article as on the sum of the initial blog posts on the topic. It is an interesting question as to whether the motivation needed to put in this amount of care and attention could also have been generated with a blog as the sole output medium (see my opening remarks).

The article is now published (DOI: 10.1038/nchem.596). Of course, you can only read it if your institution (or you personally) has a subscription to the journal (although, like this blog, the article can be located using public search facilities such as Google Scholar). There is another aspect of both the blog and the article worth mentioning. Both contain data. The blogs contain the molecular coordinates of all the molecules discussed, as well as the DOIs for the digital repository where the calculations are archived. So does the article, in the form of an interactive table, although again access to this table may or may not require a journal subscription (in this regard I note that whereas an earlier article I wrote for this publisher, see DOI 10.1038/nchem.373, is protected from non-subscribers, the interactive table which is part of the article is openly accessible. The journal deserves full credit for allowing this data to be on public access).

There is another aspect of the blog and the article, which was alluded to above. I introduced the theme of linking concepts together. This very blog post (and all the others) has been subjected to analysis using the calais archive tagger. This automatically determines appropriate tags to annotate each post with, and then declares them using standard methods (which include RDF). The published article is similarly tagged by the publisher. In theory at least, this collection of materials, the blogs and their tags, and the article and indeed commentaries about both, should be reconcilable using appropriate semantic searches. But at this point, I feel that this topic deserves separate attention and I will close here.

February 9, 2010
Semantic Blogs
A Semantic blog is one in which the system at least in part understands (some of the) concepts and topics that are in the content. The idea is that this content can be more intelligently (is that the correct word?) and importantly, automatically searched, harvested, and connected to the same or similar concepts found elsewhere in other blogs and the Web as a whole. I am writing this blog using Firefox, having added a Firefox extension called Zemanta. As I write, the system offers suggestions for similar themes elsewhere that I could choose to link to the blog (and obviously the more one writes, or the more specific the terms one uses, the more sensible the suggestions become. At this precise moment, it is still offering fairly generic suggestions, one of which I have just chosen to add). My purpose in this particular post is to explore how the very process of writing a blog might be affected by such a product. I am also inferring (but cannot add detail at the moment) that all the (semantic) connections or links to other materials will be expressed in this blog using some form of formal declaration, such as e.g. RDF or RDFa.

Thus this blog has a WordPress plugin called wp-RDFa as part of its library. This gathers meta-data in two forms, FOAF and Dublin-Core, and expresses it using the RDFa formalism. This is really just a standard way of letting any software that might visit the blog know that this meta-data is available for harvesting. FOAF is something we discussed a year or so back; it is a formal way of expressing information about yourself in RDF (see an ACS talk on the topic), and in particular indicating what you are interested in (as a chemist in my case), who you collaborate with, where you visit (information, of course, that you do wish to make public; you do not have to include any private details). Nowadays, a variety of social networking tools have become semantically enabled. This blog is one, as are a flavour of Wikis (Semantic MediaWiki, and its potential as a format for science journals), Second Life and many others. At the moment, there is little apparent added value emerging from such enrichment (I have just noted another two Zemanta articles flagged, which I will add at this instant) and certainly little in chemistry.

But what could one aspire to? For example, Steve Bachrach on his blog routinely adds InChI identifiers and keys to uniquely identify all molecules mentioned on his site. Just imagine a situation where one is describing a molecule in one’s own blog, and e.g. Zemanta instantly flags up any other article out there which has tagged the same molecule. That article and your blog can now be semantically identified as talking about the same system. A harvester could collect the information about this molecule, and create a superset of information about it (hey, we chemists already have such a system, it is called Chemical Abstracts! But of course it’s not quite the same, and I had better reserve a comparison with CAS for another post), which in turn enriches resources such as Zemanta. It’s a sort of positive feed-back loop!
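The harvester imagined above need not be elaborate. Here is a minimal sketch in Python which scans the text of posts for standard InChIKeys and reports which molecules are mentioned in more than one place. The post URLs and texts are invented for illustration; the two keys are those of benzene and water. Nothing here reflects how Zemanta itself works.

```python
import re
from collections import defaultdict

# Standard InChIKeys have the shape 14 letters - 10 letters - 1 letter.
INCHIKEY = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")

# Hypothetical posts on two hypothetical blogs.
EXAMPLE_POSTS = {
    "https://example.org/blog-a/1":
        "Aromaticity of benzene (UHOVQNZJYSORNB-UHFFFAOYSA-N) revisited.",
    "https://example.org/blog-b/7":
        "Benzene, UHOVQNZJYSORNB-UHFFFAOYSA-N, dissolved in "
        "water (XLYOFNOQVPJJNP-UHFFFAOYSA-N).",
}


def index_by_molecule(posts):
    """Map each InChIKey to the sorted list of post URLs mentioning it."""
    index = defaultdict(set)
    for url, text in posts.items():
        for key in INCHIKEY.findall(text):
            index[key].add(url)
    return {key: sorted(urls) for key, urls in index.items()}


def shared_molecules(index):
    """Keep only the molecules discussed in more than one post."""
    return {key: urls for key, urls in index.items() if len(urls) > 1}
```

`shared_molecules(index_by_molecule(EXAMPLE_POSTS))` links the two posts through benzene alone. The design point is that the identifier, not the name or the picture, does the connecting.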

Well, the Semantic Web has been a long time coming (see DOI: http://dx.doi.org/10.1021/ci000406v or 10.1087/095315101750240421 which were both written in 2001), and since it has not yet changed the Web, some tend to write it off as a lost cause. Perhaps the semantification of blogs will make a difference?

Related articles by Zemanta
- More on Zemanta (thwaits.wordpress.com)
- Google, RDFa, and Reusing Vocabularies (go-to-hellman.blogspot.com)
- The end of Search? Linked Data, Semantic Web & thoughts. (webr3.org)
January 17, 2010
How long will a blog last? ArchivePress
After around 40 posts here, I decided to take a look at the whole effort and ask some questions. For example
1. Should (scientific) blogs be used to report new science, or merely opinion on existing science (see this blog also)?
2. If the former, should they be abstracted in the manner of regular articles (e.g. by CAS etc.)?
3. Unlike e.g. a journal, a blog is often (and certainly in this case) the effort of an individual. Journals on the other hand can last for centuries (see for example this link to the ToC of the world’s oldest scientific journal that has been in continuous publication for 345 years!). So how long should/can a blog last?
4. The last question leads on to whether blogs should be archived or curated in a larger sense?
The last question leads directly to projects such as ArchivePress, which started up just a few months ago. I will quote two of their objectives
- Methodology and guidance for the effective capture and management of blog posts.
- Scripts/plugins to enable WordPress to be used as a blog aggregator and archiving engine.
Of course, this will have to be a fairly generic solution, and certainly one aspect of my blog presents another challenge, namely how to preserve the molecules mentioned here (many of the posts include 3D coordinates lurking under the images). But one step at a time!

I will post on another solution to the preservation issues, which should enter the public domain in a month or so. Meanwhile, let’s see what the ArchivePress project can offer!
January 9, 2010
Spotting the unexpected: Anomeric effects

Chemistry can be very focussed nowadays. This especially applies to target-driven synthesis, where the objective is to make a specified molecule, in perhaps as original a manner as possible. A welcome, but not always essential aspect of such syntheses is the discovery of new chemistry. In this blog, I will suggest that the focus on the target can mean that interesting chemistry can get over-looked (or if observed, not fully exploited in subsequent publications). Taking a synthesis-oriented publication at (almost) random entitled Synthesis of 1-Oxadecalins from Anisole Promoted by Tungsten (DOI: 10.1021/ja803605m) which appeared in 2008, the following molecule appears as one of the (many) intermediates.

A cyano-substituted cis decalin. Click for 3D
This molecule has an X-ray structure reported, as a means of confirming the stereochemistry at the various centres, and particularly at the carbons bearing a cyano group. Labelled as compound 22 in the publication, there is no discussion or follow-up on the resulting conformation of this compound, which in fact adopts one with both cyano groups axial (there are three other possibilities of course, in which the cyano groups can be both equatorial, or one axial and the other equatorial). A B3LYP/6-31G(d,p) calculation of these conformations confirms that the di-axial isomer is indeed the most stable (see for example DOI: 10042/to-2402 for a digital repository entry for the calculation).

An inspection of the molecular orbitals for the di-axial isomer reveals that the HOMO involves interaction of the alkene π-MO with the C…CN bond (top) and the HOMO-1 involves interaction of the oxygen lone pair with the C…CN bond (bottom). This sort of interaction is a classical anomeric effect!

HOMO with alkene-cyano anomeric interaction. Click for 3D
HOMO-1 with O-CN anomeric interaction. Click for 3D
So what is unusual about it? Well, anomeric effects are normally described in text books and lecture courses as involving predominantly oxygen (and nitrogen) as an electron pair donor, and C…O (and C…N and C…F) σ-bonds as the acceptors. The stereoelectronic alignment of course has to be anti-periplanar, and this orientation will control how the anomeric effect operates. What you may not find in the text books is a C…CN bond as the electron acceptor! But if e.g. C…F can be one, why not C…CN (the cyano group is often described as a pseudo-halogen)? If you inspect the 3D model above, you can see that the C…CN bond associated with the adjacent oxygen is perfectly set up for anti-periplanar alignment with one of the oxygen lone pairs (an arrangement not possible if the CN group had been equatorial). The C…CN bond length (1.49 Å) is indeed about 0.02 Å longer than one would normally expect of such a bond.

Inspection of the HOMO shows an almost identical interaction between the C…CN bond and the alkene, implying that here it is the electrons from an alkene that are the donor. This combination, of an alkene as donor and a C…CN group as an acceptor has (to my knowledge) never been suggested as an anomeric effect pair. It is not as strong as before (C…CN 1.47Å) and perhaps in this case, it adopts the axial position because the alternative equatorial conformation is disfavoured for other reasons.

But, and this is the point of this blog, the structure of compound 22 in the synthesis project above has some interesting aspects, which perhaps can lead to new insights and even new chemistry. One can but wonder how many reported compounds have properties which are perhaps more interesting than their authors realize, and how much new chemistry is lurking in the literature which has not (yet) been noticed. With more than 50,000,000 compounds now reported in Chemical Abstracts, there is surely lots out there to discover. However, will it be humans who will increasingly do so in the future, or automatons scouring the Semantic Web? But here we digress to a new topic!

September 18, 2009
(Hyper)activating the chemistry journal.
The science journal is generally acknowledged as first appearing around 1665 with the Philosophical Transactions of the Royal Society in London and (simultaneously) the French Academy of Sciences in Paris. By the turn of the millennium, around 10,000 science and medical journals were estimated to exist. By then, the Web had been around for a decade, and most journals had responded to this new medium by re-inventing themselves for it. For the most part, they adopted a format which emulated paper (Acrobat), with a few embellishments (such as making the text fully searchable) and then used the Web to deliver this new reformulation of the journal. Otherwise, Robert Hooke would have easily recognized the medium he helped found in the 17th century.

In 1994, a small group of us thought that one could, and indeed should go further than emulated paper. We argued (DOI: 10.1039/C39940001907) that journals should be activated by delivering not merely the logic of a scientific argument, but also the data on which it might have been based. Of course, we encountered the usual problem; doing this might cost publishers more in production resources, and in the absence of a market prepared to pay the extra, the business model did not make sense (to the publishers). Well, 15 years later, and most publishers are indeed now thinking about how their journals can be enhanced. A number of interesting projects (the RSC’s Project Prospect is one which strives to bring science alive) have emerged. Another is the topic of this blog; the activation of the journal with molecular coordinates and data using the Jmol applet.

Initially (~2005), this project met with resistance from publishers, and the issue really amounted to what the definitive version of a scientific article should be. Should that definitive version be printable? That model, after all, had served the community well for more than 300 years! And journals from the very beginning are still as readable now as when first published. In other words, print lasts! But print is pretty limiting after all. For a start, it is limited to 2D static representations. Molecules, by and large, do their magic dynamically in three dimensions (4D in an Einsteinian sense). But print is also expensive; not merely to produce, but to transport paper around the world.

From the turn of the millennium, a number of publishers, amongst them the American Chemical Society, started to evolve the scientific article such that the pre-eminent version would now be considered to be the HTML form (perhaps as a prelude to phasing out print entirely? See an interesting commentary by a journal editor) and perhaps a digital Acrobat form which would be deemed to lose some of its functionality once printed (again see here for how Acrobat can be used to enhance things). Again, however, a chicken-and-egg scenario resulted. To enhance the articles with extra functionality (such as data), publishers would need to find authors prepared to put the extra work into preparing the material. In fact, most authors already do that, but they call it supporting information. This is often highly data rich, covering materials such as spectra, coordinates and other information nowadays provided to researchers for analysis. Unfortunately, what has been missing is the education of authors to provide this information in a proper digital form which can be easily re-used by others and, on a Web page, converted automatically to nice interactive models. Most spectra which form part of the supporting information are in fact still scanned versions of printed spectra!

Enter computational chemists. Nowadays, they live in a world that truly does not need printing! Almost all of their data is already suitably digital. So perhaps it is no surprise to find that when enhanced journal articles started appearing around 2005, many were produced by this group of chemists. By now perhaps you are wondering what such an article might look like. Well, the remainder of this blog will be devoted to listing some examples. You will also notice that they come exclusively from our own publications. Perhaps someone will find the time to collect a far more representative set to better illustrate the diversity of this form, and how it is evolving. Meanwhile, you might wish to take a look at the following.

Part 1: The early days: 1994 onwards

These examples all relied on a browser plugin called Chime, which is no longer with us! Hence the pages designed to invoke it no longer display properly. But the data associated with the articles is still there!
1. An early 1994 example of (hyper)activating a journal article can be seen here as the preliminary communication and
2. in 1995 here as the final full article. I am told that this was the article that actually inspired the developers of Chime to enhance (Netscape) with a chemical plugin.
3. This one from 1998 illustrates how articles can decay in functionality when Chime is no longer available.
4. An ab initio and MNDO-d SCF-MO Computational Study of Stereoelectronic Control in Extrusion Reactions of R₂I-F Iodine (III) Intermediates, M. A. Carroll, S. Martin-Santamaria, V. W. Pike, H. S. Rzepa and D. A. Widdowson, Perkin Trans. 2, 1999, 2707-2714 with the supporting information here.
5. Huckel and Mobius Aromaticity and Trimerous transition state behaviour in the Pericyclic Reactions of [10], [14], [16] and [18] Annulenes. Sonsoles Martín-Santamaría, Balasundaram Lavan and H. S. Rzepa, J. Chem. Soc., Perkin Trans 2, 2000, 1415, with the supporting information here.
6. Peter Murray-Rust, H. S. Rzepa and Michael Wright, “Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content”, New J. Chem., 2001, 618-634. DOI: 10.1039/b008780g. This article broke new ground in that the supporting information was something of a misnomer. It was expressed entirely in XML, including all the chemistry data, and used XSLT transforms on the fly to regenerate the article. In that sense, it was actually a superset of the published article. It would be fair to say that this article was rather ahead of its time (although it does seem appropriate to publish it in a new journal!).
7. M. Jakt, L. Johannissen, H. S. Rzepa, D. A. Widdowson and R. Wilhelm, “A Computational Study of the Mechanism of Palladium Insertion into Alkynyl and Aryl Carbon-Fluorine bonds”, Perkin Trans. 2, 2002, 576-581 and supporting information.
8. P. Murray-Rust and H. S. Rzepa, chapter in “Handbook of Chemoinformatics. Part 2. Advanced Topics.”, ed. J. Gasteiger and T. Engel, 2003, Vol 1, was not enhanced per se, but did lay out the principles of how it might/should be done.
9. K. P. Tellmann, M. J. Humphries, H. S. Rzepa and V. C. Gibson, “An experimental and computational study of β-H transfer between organocobalt complexes and 1-alkenes”, Organometallics, 2004, 23, 5503-5513. DOI: 10.1021/om049581h and supporting information.
Part 2: 2005

These four examples all now invoke Jmol, which downloads upon request and hence does not rely on the presence of any browser plugin. The four articles were submitted with supporting information in the form of HTML. This was associated with the main article, but was not a formal part of that article. In that sense, they represent an incarnation of the traditional model, with all the data firmly resident in the supporting information.
1. Gibson, Vernon C.; Marshall, Edward L.; Rzepa, H. S. “A computational study on the ring-opening polymerization of lactide initiated by β-diketiminate metal alkoxides: The origin of heterotactic stereocontrol”, J. Am. Chem. Soc., 2005, 127, 6048-6051. DOI: 10.1021/ja043819b and supporting information.
2. H. S. Rzepa, “Mobius aromaticity and delocalization”, Chem. Rev., 2005, 105, 3697 – 3715. DOI: 10.1021/cr030092l and supporting information.
3. H. S. Rzepa, “Double-twist Möbius Aromaticity in a 4n+2 Electron Electrocyclic Reaction”, 2005, Chem Comm, 5220-5222. DOI: 10.1039/b510508k The supporting information is also available directly.
4. H. S. Rzepa, “A Double-twist Mobius-aromatic conformation of [14]annulene”, Org. Lett., 2005, 7, 4637 – 4639. DOI: 10.1021/ol0518333 and supporting information.
Part 3: 2006 onwards

The supporting information has now been assimilated into the main body of the article proper, where it contributes components such as enhanced figures or tables (i.e. enhanced with data).
1. A. P. Dove, V. C. Gibson, E. L. Marshall, H. S. Rzepa, A. J. P. White and D. J. Williams, “Synthetic, Structural, Mechanistic and Computational Studies on Single-Site β-Diketiminate Tin(II) Initiators for the Polymerization of rac-Lactide”, J. Am. Chem. Soc., 2006, 128, 9834-9843. DOI: 10.1021/ja061400a The enhancement can be seen in Figure 11.
2. O. Casher and H. S. Rzepa, “SemanticEye: A Semantic Web Application to Rationalise and Enhance Chemical Electronic Publishing”, J. Chem. Inf. Mod., 2006, 46, 2396-2411. DOI: 10.1021/ci060139e
3. H. S. Rzepa and M. E. Cass, “A Computational Study of the Nondissociative Mechanisms that Interchange Apical and Equatorial Atoms in Square Pyramidal Molecules”, Inorg. Chem., 2006, 45, 3958–3963. DOI: 10.1021/ic0519988. Interactive table at 10.1021/ic0519988/ic0519988.html
4. M. E. Cass and H. S. Rzepa, “In Search of The Bailar Twist and Ray-Dutt mechanisms that racemize chiral tris-chelates: A computational study of Sc(III), V(III), Co(III), Zn(II) and Ga(III) complexes of a ligand analog of acetylacetonate”, Inorg. Chem., 2007, 46, 8024-8031. DOI: 10.1021/ic062473y The enhancement can be seen in Figure 2.
5. H. S. Rzepa, “Lemniscular Hexaphyrins as examples of aromatic and antiaromatic Double-Twist Möbius Molecules”, Org. Lett., 2008, 10, 949-952. DOI: 10.1021/ol703129z The enhancement can be seen in Web Table 1.
6. D. C. Braddock and H. S. Rzepa, “Structural Reassignment of Obtusallenes V, VI and VII by GIAO-based Density functional prediction”, J. Nat. Prod., 2008, DOI: 10.1021/np0705918 and WEO1.
7. S. M. Rappaport and H. S. Rzepa, “Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes”, J. Am. Chem. Soc., 2008, 130, 7613-7619. DOI: 10.1021/ja710438j and WEO1 to 4
8. C. S. M. Allan and H. S. Rzepa, “AIM and ELF Critical point and NICS Magnetic analyses of Möbius-type Aromaticity and Homoaromaticity in Lemniscular Annulenes and Hexaphyrins”, J. Org. Chem., 2008, 73, 6615-6622. DOI: 10.1021/jo801022b and WEO1
9. C. S. M. Allan and H. S. Rzepa, “Chiral aromaticities. Möbius Homoaromaticity”, J. Chem. Theory. Comp., 2008, 4, 1841-1848. DOI: 10.1021/ct8001915 and WEO1
10. C. S. M Allan and H. S. Rzepa, “The structure of Polythiocyanogen: A Computational investigation”, Dalton Trans., 2008, 6925 – 6932. DOI: 10.1039/b810147g and enhanced Table
11. H. S. Rzepa, “Wormholes in Chemical Space connecting Torus Knot and Torus Link π-electron density topologies”, Phys. Chem. Chem. Phys., 2009, 1340-1345. DOI: 10.1039/b810301a and enhanced Table.
12. H. S. Rzepa, “The Chiro-optical properties of a Lemniscular Octaphyrin”, Org. Lett., 2009, 11, 3088-3091. DOI: 10.1021/ol901172g
13. C. S. Wannere, H. S. Rzepa, B. C. Rinderspacher, A. Paul, H. F. Schaefer III, P. v. R. Schleyer and C. S. M. Allan, “The geometry and electronic topology of higher-order Möbius charged Annulenes”, J. Phys. Chem., 2009, DOI: 10.1021/jp902176a and enhanced table
14. H. S. Rzepa, “The distortivity of π-electrons in conjugated Boron rings.”, Phys. Chem. Chem. Phys., 2009, DOI: 10.1039/B911817A and enhanced table.
15. H. S. Rzepa, “The importance of being bonded”, Nature Chem., 2009, DOI: 10.1038/nchem.373 and the exploratorium.
16. King Kuok Hii, J.L.Arbour, H.S.Rzepa, A.J.P.White, “Unusual Regiodivergence in Metal-Catalysed Intramolecular Cyclisation of γ-Allenols”, Chem. Commun, 2009, DOI: 10.1039/b913295c and enhanced table.
17. L. F. V. Pinto, P. M. C. Glória, M. J. S. Gomes, H. S. Rzepa, S. Prabhakar, A. M. Lobo. “A Dramatic Effect of Double Bond Configuration in N-Oxy-3-aza Cope Rearrangements – A simple synthesis of functionalised allenes”, Tet. Lett., 2009, 50, 3446-3449. DOI: 10.1016/j.tetlet.2009.02.228 and interactive table.
18. H. S. Rzepa and C. S. M. Allan, “Racemization of isobornyl chloride via carbocations: a non-classical look at a classic mechanism”, J. Chem. Educ., 2010, DOI: 10.1021/ed800058c and interactive table.
19. K. Abersfelder, A. J. P. White, H. S. Rzepa, and D. Scheschkewitz “A Tricyclic Aromatic Isomer of Hexasilabenzene”, Science, 2010, DOI: 10.1126/science.1181771 and interactive table.
20. A. C. Spivey, L. Laraia, A. R. Bayly, H. S. Rzepa and A. J. P. White “Stereoselective Synthesis of cis- and trans-2,3-Disubstituted Tetrahydrofurans via Oxonium−Prins Cyclization: Access to the Cordigol Ring System”, Org. Lett., 2010, DOI 10.1021/ol9024259 and interactive table.
21. J. Kong, P. v. R. Schleyer and H. S. Rzepa, “Successful Computational Modeling of Iso-bornyl Chloride Ion-Pair Mechanisms”, J. Org. Chem., 2010, DOI: 10.1021/jo100920e and interactive table.
22. A. Smith, H. S. Rzepa, A. White, D. Billen, K. K. Hii, “Delineating Origins of Stereocontrol in Asymmetric Pd-Catalyzed α-Hydroxylation of 1,3-Ketoesters”, J. Org. Chem., 2010, 75, 3085-3096. DOI: 10.1021/jo1002906 and interactive table.
23. H. S. Rzepa “The rational design of helium bonds”, Nature Chem., 2010, 2, 390-393. DOI: 10.1038/NCHEM.596 and web enhanced table.
24. P. Rivera-Fuentes, J. Lorenzo Alonso-Gómez, A. G. Petrovic, P. Seiler, F. Santoro, N. Harada, N. Berova, H. S. Rzepa, and F. Diederich, “Enantiomerically Pure Alleno–Acetylenic Macrocycles: Synthesis, Solid-State Structures, Chiroptical Properties, and Electron Localization Function Analysis”, Chem. Eur. J., 2010, DOI: 10.1002/chem.201001087 and interactive figure
25. H. S. Rzepa, “The Nature of the Carbon-Sulfur bond in the species H-CS-OH”, J. Chem. Theory. Comput., 2010, 49, DOI: 10.1021/ct100470g and interactive table.
26. H. S. Rzepa, “Can 1,3-dimethylcyclobutadiene and carbon dioxide co-exist inside a supramolecular cavity?”, Chem. Commun., 2010, DOI: 10.1039/C0CC04023A and interactive table
27. M. R. Crittall, H. S. Rzepa, and D. R. Carbery, “Design, Synthesis, and Evaluation of a Helicenoidal DMAP Lewis Base Catalyst”, Org. Lett., 2011, DOI: 10.1021/ol2001705 and interactive table
28. H. S. Rzepa, “The past, present and future of Scientific discourse”, J. Cheminformatics, 2011, 3, 46. DOI: 10.1186/1758-2946-3-46 and interactive figure 3, figure 4 and figure 5.
29. H. S. Rzepa, “A computational evaluation of the evidence for the synthesis of 1,3-dimethylcyclobutadiene in the solid state and aqueous solution”, Chem. Euro. J., 2013, 19, 4932–4937. DOI: 10.1002/chem.201102942.
30. J. L. Arbour, H. S. Rzepa, L. A. Adrio, E. M. Barreiro, P. G. Pringle and K. K. (Mimi) Hii, “Silver-catalysed enantioselective additions of O-H and N-H to C=C bonds: Non-covalent interactions in stereoselective processes”, Chem. Euro. J., 2012, in press, Web table 1 and Web table 2.
31. H. S. Rzepa, “Chemical datuments as scientific enablers”, J. Cheminformatics, 2013, DOI: 10.1186/1758-2946-5-6.
32. A. P. Buchard, F. Jutz, F. M. R. Kember, H. S. Rzepa, C. K. Williams, “Experimental and Computational Investigation of the Mechanism of Carbon Dioxide/Cyclohexene Oxide Copolymerization Using A Dizinc Catalyst”, in press. Interactivity box
33. D. C. Braddock, D. Roy, D. Lenoir, E. Moore, H. S. Rzepa, J. I-Chia Wu and P. von R. Schleyer, “Verification of Stereospecific Dyotropic Racemisation of Enantiopure d and l-1,2-Dibromo-1,2-diphenylethane in Non-polar Media”, Chem. Comm., 2012, just published. DOI: 10.1039/C2CC33676F and interactivity box.
34. K. Leszczyńska, K. Abersfelder, M. Majumdar, B. Neumann, H.-G. Stammler, H. S. Rzepa, P. Jutzi and D. Scheschkewitz, “The Cp*Si⁺ Cation as a Stoichiometric Source of Silicon”, Chem. Comm., 2012, 48, 7820-7822. DOI: 10.1039/c2cc33911k. Cites links to 10042/to-13974, 10042/to-13982, 10042/to-13969, 10042/20028, 10042/to-13973, 10042/to-13985
35. H. S. Rzepa, “A computational evaluation of the evidence for the synthesis of 1,3-dimethylcyclobutadiene in the solid state and aqueous solution”, Chem. Euro. J., 2013, 4932-4937. DOI: 10.1002/chem.201102942 and WebTable
36. H. S. Rzepa, “Chemical datuments as scientific enablers”, J. Cheminformatics, 2013, 4, DOI: 10.1186/1758-2946-5-6. The interactivity box is integrated into the body of the article.
37. M. J. Cowley, V. Huch, H. S. Rzepa, D. Scheschkewitz, “A Silicon Version of the Vinylcarbene – Cyclopropene Equilibrium: Isolation of a Base-Stabilized Disilenyl Silylene”, 2013, Nature Chem., 5, 876–879. doi:10.1038/nchem.1751 and Webtable.
38. M. J. S. Gomes, L. F. V. Pinto, H. S. Rzepa, S. Prabhakar, A. M. Lobo, “N-Heteroatom Substitution Effects in 3-Aza-Cope Rearrangements”, Chemistry Central, 2013, 7:94. doi:10.1186/1752-153X-7-94 and Table.
39. H. S. Rzepa and C. Wentrup, “Mechanistic Diversity in Thermal Fragmentation Reactions: a Computational Exploration of CO and CO₂ Extrusions from Five-Membered Rings”, J. Org. Chem., DOI: 10.1021/jo401146k and Table.
40. D. C. Braddock, J. Clarke and H. S. Rzepa “Epoxidation of Bromoallenes Connects Red Algae Metabolites by an Intersecting Bromoallene Oxide – Favorskii Manifold”, Chem. Comm., 2013, DOI: 10.1039/C3CC46720A and Table.
41. M. J. Fuchter, Ya-Pei Lo and H. S. Rzepa, “Mechanistic and chiroptical studies on the desulfurization of epidithiodioxopiperazines reveal universal retention of configuration at the bridgehead carbon atoms”, J. Org. Chem., 2013, 78, 11646-11655. doi:10.1021/jo401316a and data
42. A. Armstrong, R. A. Boto, P. Dingwall, J. Contreras-García, M. J. Harvey, N. Mason and H. S. Rzepa, “The Houk-List Transition states for organocatalytic mechanisms revisited”, Chem. Sci., 2014, 5, 2057-2071. doi:10.1039/C3SC53416B and data, data, data, data, data, data, data, data.
43. S. Lai, H. S. Rzepa, and S. Díez-González, “N-Heterocyclic Carbene or Phosphine-Containing Copper(I) Complexes for the Synthesis of 5-Iodo-1,2,3-Triazoles: Catalytic and Mechanistic Studies”, ACS Catalysis, 2014, doi:10.1021/cs500326e and data, data, data, data
44. A. Jana, I. Omlor, V. Huch, H. S. Rzepa, D. Scheschkewitz, “Neutral and Cationic NHC-Coordinated Heavier Cyclopropylidenes”, Angew. Chemie. Intl. Ed., 2014, doi:10.1002/anie.201405238 and data
45. M. J. Harvey, N. J. Mason and H. S. Rzepa “Digital data repositories in chemistry and their integration with journals and electronic laboratory notebooks”, J. Chem. Inf. Comp., 2014, doi:10.1021/ci500302p and data, data
46. A. Jana, V. Huch, H. S. Rzepa, and D. Scheschkewitz, “A base-coordinated multiply functionalized Ge(II) compound and its reversible dimerization to the corresponding digermene”, Angew. Chemie., 2014, DOI:10.1002/anie.201407751 and data
47. A. E. Aliev, J. R. Arendorf, I. Pavlakos, R. B. Moreno, M. J. Porter, H. S. Rzepa and W. B. Motherwell, “Surfing the π-clouds for Non-covalent Interactions: A comparative Study of arenes versus Alkenes”, Angew. Chemie., 2014, 54, 551-555. doi:10.1021/om501286g and data
48. J. Jana, H. S. Rzepa and D. Scheschkewitz, “A Molecular Complex with Formally Neutral Irongermanide Motif (Fe₂Ge₂)”, Organometallics, 2015, doi:10.1021/om501286g and data
49. E. H. Smith, H. S. Rzepa and M. Hii, “Asymmetric epoxidation: a twinned laboratory and molecular modelling experiment”, J. Chem. Ed., 2015, doi:10.1021/ed500398e and data or here
50. P. Bultinck, F. L. Cherblanc, M. J. Fuchter, W. A. Herrebout, Y.-P. Lo, H. S. Rzepa, G. Siligardi, M. Weimar and R. M. Williams, “Chiroptical studies on brevianamide B”, J. Org. Chem., 2015, doi:10.1021/jo5022647 and data
51. T. Lanyon-Hogg, M. Ritzefeld, N. Masumoto, A. I. Magee, H. S. Rzepa and E. W. Tate, “Modulation of cis-Trans Amide Bond Rotamers in 5-Acyl-6,7-dihydrothieno[3,2-c]pyridines”, J. Org. Chem., 2015, doi:10.1021/acs.joc.5b00205
Acknowledgments

This post has been cross-posted in PDF format at Authorea.
September 7, 2009
The Fragile Web

One of the many clever things that clever people can do with the Web is harvest it, aggregate it, classify it etc. It's not just Google that does this sort of thing! Egon Willighagen is one of those clever people. He runs the Chemical blogspace, which does all sorts of amazing things with blogs.

He sent me a message recently, saying that unfortunately he was not able to do any amazing things to my blog, since it was not failsafe any more. Apparently, deep down in the software he was using to harvest the details of my blog, an error along the lines of Bytes: 0xA0 0x0A 0x49 0x74 was causing grief. This is the sort of message that would make most people quake. In this instance, the excellent W3C comes to the rescue. By putting this blog feed into their RSS Validator, one can narrow down the error. It proved to be on a single line of an earlier blog posting. Remove this line, and all becomes well. In fact, if the line is displayed in a regular text editor, one eventually notices that the end of the line (which looks just like a space) might be the suspect. Remove just that one character, and the RSS Validator is (almost perfectly) happy. I hope that Egon will be too now!
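
For readers who would like to hunt for such a character themselves, here is a minimal sketch in Python of one way to locate bytes that are not valid UTF-8. It is not how the W3C validator works internally; the function name find_invalid_utf8 and the sample bytes are invented purely for illustration, and the sketch assumes the feed is meant to be UTF-8. The byte 0xA0 is a non-breaking space in Latin-1, but on its own it is not a legal UTF-8 sequence, which is consistent with the error quoted above.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, column, byte_value) for each byte that breaks UTF-8.

    Hypothetical helper for illustration. Line and column are 1-based, and the
    column counts bytes, not characters. Each bad byte is reported on its own,
    so a truncated multi-byte sequence may yield several entries.
    """
    problems = []
    # 0x0A never occurs inside a multi-byte UTF-8 sequence, so splitting
    # the raw bytes on newlines is safe.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of this line decodes cleanly
            except UnicodeDecodeError as err:
                bad = pos + err.start
                problems.append((line_no, bad + 1, line[bad]))
                pos = bad + 1  # skip the offending byte and carry on
    return problems


# Invented sample: a line ending in a lone 0xA0, followed by "It ...".
sample_feed = b"<p>A sentence ending in a stray byte.\xa0\nIt continues here.</p>\n"

for line_no, column, value in find_invalid_utf8(sample_feed):
    print(f"line {line_no}, byte column {column}: 0x{value:02X}")
```

A properly encoded non-breaking space (the two bytes 0xC2 0xA0 in UTF-8) passes through this check untouched; only the bare 0xA0 is flagged.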

But the lesson of this little exercise is that a single character can still bring the whole edifice crashing down (or at least my entire blog). Single characters of course have been notorious in the past. One that springs to mind was a single (white) space, inserted by accident into a line of Fortran code. That space subverted the meaning of the code, which in fact was being used to control the navigation of a spacecraft on its way to Jupiter. Result? The probe missed Jupiter by quite a margin, and the entire cost of the mission was lost (around $1 billion!).

It is also a lesson in how an individual might operate within the modern Web. During the period 1993 to around 2001, most of the content on the Web was in the form of static HTML pages. This was written either by hand, or using software tools to do so. This was scary stuff for most people. Then along came two social inventions: the Wiki and the Blog. Each of these hid (most of) the scary HTML from the user, and allowed pain-free (almost) creation of content. As time passed, everyone became accustomed to using such tools, and they started to trust them implicitly to produce valid HTML under the hood. In my case, I trusted the Blog software (WordPress) either not to produce faulty HTML, or at least to detect it if it got in by accident. In this instance, the problem is more subtle: an error in the character encoding. But this is the lesson. As the skills of olden time (i.e. writing native HTML) are lost, we will be more and more at the mercy of the modern tools. Will we even notice the errors, which might propagate out with our name attached? Or will the software get even smarter and fix the errors before they cause problems? Will humans become almost entirely redundant?

August 31, 2009
