Posts Tagged ‘RDF’
Thursday, April 18th, 2019
In a previous post, I looked at the Findability of FAIR data in common chemistry journals. Here I move on to the next letter, the A = Accessible.
The attributes of A[1] include:
- (meta)data are retrievable by their identifier using a standardized communication protocol.
- the protocol is open, free and universally implementable.
- the protocol allows for an authentication and authorization procedure.
- metadata are accessible, even when the data are no longer available.
- The metadata should include access information that enables automatic processing by a machine as well as a person.
Items 1-2 are covered by associating a DOI (digital object identifier) with the metadata. Item 3 relates to data which is not necessarily also OPEN (FAIR and OPEN are complementary, but do not mean the same).
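As a minimal sketch of items 1-2, the following Python prepares (but does not send) an HTTP request that asks a DOI resolver for a metadata record rather than a landing page. The DOI and the media type are taken from the DataCite URL quoted later in this post. The use of an Accept header against https://doi.org/ is an assumption about how such retrieval is offered, and the helper name is purely illustrative.

```python
from urllib.request import Request

# Media type for DataCite XML, as it appears in the DataCite URL in this post.
DATACITE_XML = "application/vnd.datacite.datacite+xml"

def build_metadata_request(doi, media_type=DATACITE_XML):
    """Prepare an HTTP GET for the metadata behind a DOI (hypothetical helper).

    Assumption: the resolver at https://doi.org/ honours an Accept header.
    Nothing is sent here; pass the result to urllib.request.urlopen to fetch.
    """
    return Request("https://doi.org/" + doi, headers={"Accept": media_type})

request = build_metadata_request("10.14469/hpc/5496")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

The point of the sketch is that the identifier plus an open protocol (HTTP) is all a machine needs; no publisher-specific interface is involved.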
Item 4 mandates that a copy of the metadata be held separately from the data itself; currently the favoured repository is DataCite (and this metadata may well be duplicated at CrossRef, thus providing a measure of redundancy). It also addresses an interesting debate on whether the container for data, such as a ZIP or other compressed archive, should also contain the full metadata descriptors internally. Doing so would not directly address item 4, but the requirement could still be met by also registering a copy of the metadata externally with e.g. DataCite.
Item 4 also implies some measure of separation between the data and its metadata, which now raises an interesting and separate issue (introduced with this post): the metadata can be considered a living object, with some attributes being updated after deposition of the data itself. Thus such metadata could include an identifier for the journal article relating to the data, information that only appears after the FAIR data itself is published, or pointers to other datasets published at a later date. Such updating of metadata contained in an archive along with the data itself would be problematic, since the data itself should not be a living object.
Item 5 is the need for Accessibility to relate both to a human acquiring FAIR data and to a machine. The latter needs direct information on exactly how to access the data. To illustrate this, I will use data deposited in support of the previous post, for which a representative example of metadata can be found (per item 4) at a separate location:
data.datacite.org/application/vnd.datacite.datacite+xml/10.14469/hpc/5496
This contains the components:
- <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata" relatedMetadataScheme="ORE" schemeURI="http://www.openarchives.org/ore/">https://data.hpc.imperial.ac.uk/resolve/?ore=5496</relatedIdentifier>
- <relatedIdentifier relatedIdentifierType="URL" relationType="HasPart" relatedMetadataScheme="Filename" schemeURI="filename://aW5wdXQuZ2pm">https://data.hpc.imperial.ac.uk/resolve/?doi=5496&file=1</relatedIdentifier>
Item 6 (the first component above) is a machine-suitable RDF declaration of the full metadata record. Item 7 (the second) allows direct access to the datafile. This in turn allows programmed interfaces to the data to be constructed, which include e.g. components for immediate visualisation and/or analysis. It also allows access on a large scale (mining), something a human is unlikely to try.
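To make the machine's side of this concrete, here is a hedged sketch in Python that reads the two components and lists where the metadata and the datafile can be fetched. The fragment is simplified: a wrapping root element has been added, XML namespaces are omitted, and the "&" in the second URL is escaped as XML requires. Treating the text after filename:// as a Base64-encoded filename is an assumption, not something the record itself states.

```python
import base64
import xml.etree.ElementTree as ET

# Simplified fragment: the two components quoted above, wrapped in a root element.
# Assumptions: namespaces are omitted and "&" is escaped as "&amp;".
FRAGMENT = """<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata"
      relatedMetadataScheme="ORE" schemeURI="http://www.openarchives.org/ore/"
      >https://data.hpc.imperial.ac.uk/resolve/?ore=5496</relatedIdentifier>
  <relatedIdentifier relatedIdentifierType="URL" relationType="HasPart"
      relatedMetadataScheme="Filename" schemeURI="filename://aW5wdXQuZ2pm"
      >https://data.hpc.imperial.ac.uk/resolve/?doi=5496&amp;file=1</relatedIdentifier>
</relatedIdentifiers>"""

def access_points(xml_text):
    """Return (relation, scheme, url, filename) for each relatedIdentifier."""
    results = []
    for element in ET.fromstring(xml_text).iter("relatedIdentifier"):
        scheme_uri = element.get("schemeURI", "")
        filename = None
        if scheme_uri.startswith("filename://"):
            # Assumption: the text after filename:// is a Base64-encoded name.
            encoded = scheme_uri[len("filename://"):]
            filename = base64.b64decode(encoded).decode("utf-8")
        results.append((element.get("relationType"),
                        element.get("relatedMetadataScheme"),
                        element.text.strip(), filename))
    return results

for entry in access_points(FRAGMENT):
    print(entry)
```

Decoding the second schemeURI in this way gives input.gjf, which suggests (though the record does not say so explicitly) that the filename is carried in Base64 form.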
It would be fair to say that the A of FAIR is still evolving. Moreover, searches of the DataCite metadata database are not yet at the point where one can automatically identify metadata records that have these attributes. When they do become available, I will show some examples here.
Added: This search: https://search.test.datacite.org/works?query=relatedIdentifiers.relatedMetadataScheme:ORE shows how it might operate.
References
- M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L.B. da Silva Santos, P.E. Bourne, J. Bouwman, A.J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C.T. Evelo, R. Finkers, A. Gonzalez-Beltran, A.J. Gray, P. Groth, C. Goble, J.S. Grethe, J. Heringa, P.A. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S.J. Lusher, M.E. Martone, A. Mons, A.L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M.A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons, "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, vol. 3, 2016. https://doi.org/10.1038/sdata.2016.18
Tags:Academic publishing, automatic processing, Data management, Digital Object Identifier, EIDR, FAIR data, Findability, Identifiers, Information, Information architecture, Information science, Knowledge, Knowledge representation, metadata, mining, Open Archives Initiative, RDF, Records management, representative, standardized communication protocol, Technical communication, Technology/Internet, Web design, Written communication, XML
Saturday, February 3rd, 2018
The topic of open citations was presented at the PIDapalooza conference and represents a third component in the increasing corpus of open scientific information.
David Shotton gave us an update on Citations as First Class data objects – Citation Identifiers and introduced (me) to the blog where he discusses this topic. The citation list or bibliography has long been regarded as an essential, and until recently inseparable, component at the end of a scientific article. It is also a component easily susceptible to “game play”. Authors can be tempted to self-cite, possibly to excess, and perhaps worse, to cite their friends and colleagues for other than purely scientific reasons. There are other issues. Thus to infer the context of any particular citation, one has to read the text where it is cited, and this too can be subjected to game play. One may have to “read between the lines” to try to judge whether the article is being cited favourably, as supporting any case being made, or instead to indicate disagreement with the cited authors. An article that is being cited because one disagrees with the conclusions therein may still go on to contribute to the cited author’s “h-index” of esteem. So there are various aspects of citations that deserve improvement, or certainly development and evolution.
Shotton told us that many publishers are now releasing article citations as open (CC0) data in their own right, as urged by the Initiative for Open Citations site. A corpus of some 13 million of these is now available as RDF triples with a SPARQL end-point. The latter means that semantic searches of the corpus can be undertaken. So what are the benefits? Worthy aspirations such as exploring connections between knowledge fields, and following the evolution of ideas and scholarly disciplines (similar in fact to the new Dimensions product I discussed in the previous post). When I probed into the various sites linked above, I had in mind to identify some clear scientific outcomes of making them available in this manner, perchance even in the field of chemistry. When I succeed I will follow up on this post, but at the moment I am not yet in a position to illustrate these benefits with chemical stories. If anyone reading this post has such, please let us know!
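To give a flavour of what a semantic search of such a corpus might look like, here is a minimal Python sketch that builds a SPARQL query for the works citing one particular article. Several things here are assumptions rather than facts about the corpus: the end-point address is a placeholder, the use of the CiTO predicate cito:cites to record a citation is a guess at the data model, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the address of the real SPARQL end-point.
ENDPOINT = "https://example.org/sparql"

def citing_works_query(cited_doi_url, limit=10):
    """Build a SPARQL query listing works that cite the given article.

    Assumption: citations are stored as triples using the CiTO predicate
    cito:cites; the actual corpus may model them differently.
    """
    return (
        "PREFIX cito: <http://purl.org/spar/cito/>\n"
        "SELECT ?citing WHERE {\n"
        f"  ?citing cito:cites <{cited_doi_url}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

query = citing_works_query("https://doi.org/10.1038/sdata.2016.18")
# SPARQL end-points conventionally accept the query as a "query" parameter.
print(ENDPOINT + "?" + urlencode({"query": query}))
```

Nothing is sent over the network here; the sketch only shows that asking "who cites this?" reduces to a short pattern over triples.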
I will conclude here by noting much discussion at universities of the future of the scientific article itself; whether it should be increasingly mandated as GOLD Open Access (made so by payment of an article processing charge, or APC, by its authors), or whether journals should retain the hybrid publishing models where only a proportion of articles are GOLD, and the remainder are paid for by subscription fees for licensing access to the non-GOLD articles in the journal. Meanwhile, in what sometimes seems a separate conversation, the article itself is being dis-assembled into components such as open and/or FAIR data, open citations, infographics, social media and yes, even blogs. Are these two evolutions headed in different directions? Certainly, I think the future is not what it used to be!
Tags:Academic publishing, Applied linguistics, article processing charge, British National Corpus, chemical stories, cited author, Corpus linguistics, David Shotton, Entertainment/Culture, Linguistics, Open access, Quotation, RDF, social media, Texas A&M–Corpus Christi Islanders women's basketball
Sunday, June 2nd, 2013
A few years ago, we published an article which drew a formal analogy between chemistry and iTunes (sic)[1]. iTunes was the first really large commercial digital music library, and a feature under-the-skin was the use of meta-data to aid discoverability of any of the 10 million (26M in 2013) or so individual items in the store.‡ The analogy to digital chemistry and discoverability of the 70 or so million known molecules is, we argued, a good one.
Well, the digital photography revolution is very similar; I just checked my personal digital photo library to find it contains almost 14,000 photos dating back ten years now. It is not easy to find a particular photograph! Well, the reason I am posting here is to bring to your attention the first 6 minutes or so of an item in the BBC collection.† It is a very nice accessible explanation of the importance of meta-data for photography, and some of the innovative things that are being done for both acquiring and for manipulating this data. As I listened to this, I felt that for photograph, think molecule! And think of all the innovative things that could be done there as well.

Actually, you might reasonably ask how/whether molecular metadata is deployed here in this blog. It certainly is on Steve Bachrach’s site (see for example this recent post where you will find InChI keys for every molecule displayed; thus InChIKey=GOOHAUXETOMSMM-GSVOUGTGSA-N). I don’t do that on this blog (perhaps I should), but instead I provide URL links to a digital repository where they are displayed: thus follow http://dx.doi.org/10.6084/m9.figshare.706756 and you will find InChIKey=USGIFUSOUDIDJL-UHFFFAOYSA-N where it can be used as a search term to find any other instances of the same molecule at the site.
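Because an InChIKey has a fixed layout (14 letters, a hyphen, 10 letters, a hyphen, and one final letter), it is easy for software to spot in running text. The following Python is a small sketch of that idea, using the two keys quoted above as test input; the function name is illustrative only.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")

def find_inchikeys(text):
    """Return the distinct InChIKeys found in text, in order of appearance."""
    seen = []
    for key in INCHIKEY.findall(text):
        if key not in seen:
            seen.append(key)
    return seen

sample = ("thus InChIKey=GOOHAUXETOMSMM-GSVOUGTGSA-N ... and you will find "
          "InChIKey=USGIFUSOUDIDJL-UHFFFAOYSA-N")
print(find_inchikeys(sample))
```

A pattern match of this kind only confirms the shape of the key, not that it corresponds to a real molecule.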
‡ Historical note: In 1997, we produced a CD-ROM containing the proceedings of the Electronic Conference on Trends in Heterocyclic Chemistry (ECHET96), H. S. Rzepa, J. Snyder and C. Leach, (Eds), ISBN 0-85404-894-4. Because it was entirely digital, we were able to include an “app” which created a visual navigation point derived from analysing the meta-data present (the entire contents had been expressed in HTML and so it was relatively easy to gather this meta-data). The software we used was called Hotsauce and was based on MCF (meta content framework) as developed by Apple engineer Ramanathan V. Guha for an internal experiment (we sometimes forget that in those days Apple was the Google of its day!). Guha left Apple, joined Netscape and MCF became RDF. The rest, as they say, is history. But you can see an early deployment on the CD-ROM I refer to above (these are NOT yet collectors’ items. Hint!).
† This being the BBC iPlayer collection, it is quite possible that it is not accessible outside the UK, or indeed even within the UK it may only be available for 8 days after broadcast. Which would be a shame.
References
- O. Casher, and H.S. Rzepa, "SemanticEye: A Semantic Web Application to Rationalize and Enhance Chemical Electronic Publishing", Journal of Chemical Information and Modeling, vol. 46, pp. 2396-2411, 2006. https://doi.org/10.1021/ci060139e
Tags:Apple, BBC, digital photography, engineer, Google, Historical, HTML, metadata, opendata, RDF, search term, Steve Bachrach, United Kingdom
Wednesday, December 22nd, 2010
If you visit this blog you will see a scientific discourse in action. One of the commentators there notes how they would like to access some data made available in a journal article via the (still quite rare) format of an interactive table, but they are not familiar with how to handle that kind of data (file). The topic in question deals with various kinds of (chemical) data, including crystallographic information, computational modelling, and spectroscopic parameters. It could potentially deal with much more. It is indeed difficult for any one chemist to be familiar with how data is handled in such diverse areas. So I thought I would put up a short tutorial/illustration in this post of how one might go about extracting and re-using data from this one particular source.

Interactive Journal table
The above is a snapshot of part of the table in question, with a box in the middle set aside for a Jmol applet to appear. What might be both less obvious, and less familiar to many who might have seen such a display is the very rich environment available for manipulating the data. To expose some of this, proceed as follows:
- Firstly, load a molecule into the Jmol window by clicking on e.g. the hyperlink shown below.

Loading a molecule
- The display shown below will appear, in this case a set of coordinates used to present a 3D model of a molecule, which can be rotated, zoomed, etc. It also has been labelled with various selected bond lengths etc.

Interactive table with molecule loaded
- To extract data, right-click anywhere in the molecule area. Navigate through the menus which appear as shown below. In this case, the data is present in the form of a Gaussian log file. This can contain the history of the particular calculation performed (e.g. a geometry optimisation) or as in this case, all 3N-6 calculated normal vibrational modes. The one of interest here is number 318, being an O=C=O stretching mode.

An Interactive table in a chemistry journal.
- This mode can now be manipulated visually by selecting various parameters:

Manipulating a vibrational mode
- Jmol has a scintillating display of other options, and more are being added all the time, so the above display is by no means the limit of what one can do.
- Now to the most important bit. Invoke the menu as shown below, whereupon a copy of the relevant file (gzipped in this case to reduce its size) will be downloaded to your local system. You will now need to use a program on your own computer capable of reading and processing such a file (after unzipping).

Downloading a data file.
- There may be a bewildering variety of programs and toolkits which may perform the operation you wish on such a file. Some are commercial, some are open source. To help people get going, I link to one of the latter type here. You might also want to visit the Quixote project for ideas.
- We are not quite finished yet. Perhaps a Gaussian log file does not suit your purpose. Well, now try clicking on this link

Link to a digital repository
- This produces a page such as the one below, which contains more files. In this example, several molecular identifiers are present (InChI and InChI key) to help establish the uniqueness of the system. The molecular coordinates are available as a .cml file, which itself can be processed by a variety of software tools. The original file used to run the calculation can be inspected (if you want to e.g. repeat it) as input.gjf. Also present are the logfile we have seen above and a checkpoint file, which is most useful when using either the Gaussian program system or a visualiser (Gaussview, ChemBio3D etc., both commercial programs). A SMILES string is also offered, and sometimes (not in this example) a so-called wavefunction file which can be used by some programs to analyse the wavefunction and perform e.g. QTAIM, ELF, NCI analyses.

A digital repository page.
It is now up to the user to identify suitable processing programs on their computer which fit their purpose.
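As one hedged illustration of such processing, the Python sketch below reads a gzipped Gaussian log file directly (no separate unzipping step) and collects the calculated vibrational frequencies. It assumes the usual layout in which each relevant line begins with "Frequencies --" followed by up to three values; the function name is hypothetical, and a dedicated parsing toolkit would be the more robust choice for real work.

```python
import gzip
import re

def read_frequencies(path):
    """Collect vibrational frequencies (cm-1) from a gzipped Gaussian log.

    Assumption: each relevant line starts with "Frequencies --" followed by
    up to three numbers, as in typical Gaussian frequency output.
    """
    frequencies = []
    with gzip.open(path, "rt", errors="replace") as log:
        for line in log:
            if line.strip().startswith("Frequencies --"):
                values = line.split("--", 1)[1]
                # Keep the sign: imaginary modes are printed as negative values.
                frequencies.extend(float(v) for v in re.findall(r"-?\d+\.\d+", values))
    return frequencies
```

With a downloaded file (the name will depend on the article), the mode numbered 318 in the Jmol menu would then be element 317 of the returned list, since Python counts from zero.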
- There is one other file present which I have not yet explained, the mets.xml manifest. This is a metadata file, containing (along with much else) an RDF declaration of (some of) the properties of the molecule. In theory at least, this file could be automatically harvested for the RDF, which could be injected into a triple store, and queried semantically using e.g. SPARQL. That is part of the semantic web.
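The harvesting step could be sketched along the following lines. This Python example looks for any rdf:RDF block embedded in an XML document and lists simple subject/predicate/value triples. The sample document is an invented stand-in, not the actual structure of a mets.xml manifest, and only the simplest RDF/XML pattern (rdf:Description elements with literal-valued properties) is handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-in for a manifest; a real mets.xml will be structured differently.
SAMPLE = """<manifest xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:ex="http://example.org/terms#">
  <section>
    <rdf:RDF>
      <rdf:Description rdf:about="http://example.org/molecule/1">
        <ex:inchikey>USGIFUSOUDIDJL-UHFFFAOYSA-N</ex:inchikey>
      </rdf:Description>
    </rdf:RDF>
  </section>
</manifest>"""

def harvest_triples(xml_text):
    """Yield (subject, predicate, literal) from every embedded rdf:RDF block.

    Only the simplest RDF/XML pattern is handled: rdf:Description elements
    whose children are literal-valued properties.
    """
    root = ET.fromstring(xml_text)
    for rdf in root.iter(f"{{{RDF_NS}}}RDF"):
        for description in rdf.iter(f"{{{RDF_NS}}}Description"):
            subject = description.get(f"{{{RDF_NS}}}about")
            for prop in description:
                # ElementTree writes tags as "{namespace}local"; join them into a URI.
                predicate = prop.tag[1:].replace("}", "")
                yield subject, predicate, (prop.text or "").strip()

for triple in harvest_triples(SAMPLE):
    print(triple)
```

Triples gathered this way are what would be loaded into a triple store; a full RDF parser would be needed to cope with everything the RDF/XML syntax allows.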
I hope some of the screenshots here make the process of extracting data from an interactive table article a little more obvious. I must declare that this way of doing it is just one of the ways being explored and also (much to my regret) is not yet particularly common. But hopefully you might capture a little of what some of us believe to be the future of scientific journals.
Tags:chemical, chemical journals, chemist, opendata, RDF, semantic web, software tools, suitable processing programs, XML
Tuesday, February 9th, 2010
Scientists write blogs for a variety of reasons. But these probably do not include getting tenure (or grants). For that one has to publish. And I will argue here that a blog is not currently accepted as a scientific publication (for more discussion on this point, see this article by Maureen Pennock and Richard Davis). For chemists, publication means in a relatively small number of high-impact journals. Anything more than five articles a year in such journals, and your tenure is (probably) secure (if not your funding).
Can one do both? Post a blog item, and then publish a follow-up in a high-impact journal? Well, yes and no.
I had better explain. A blog post is more often than not catalysed by reading an article, viewing another blog, or discussing something with a colleague. One posts in the hope of getting some feedback, from which one’s ideas might mature, develop, or indeed collapse! Scientists have long done this of course, albeit with a colleague down the corridor, at conferences or seminars. The ideas thus cast forth may of course also get stolen, and so these traditional mechanisms for floating ideas are often very short on detail. Sometimes, returning to the idea of blogs, one post can lead to another, and the nature of the blog means the ideas can evolve and mutate very rapidly. Eventually, one might wish to take a good overview of all the various efforts. At this point, one is now considering publishing a journal article, since currently at least, the longevity of a journal is considered greater than that of a blog (see this post here for more ruminations on that theme). There are other good reasons for then choosing a journal rather than one’s blog. The QA (quality assurance) necessary to get an article accepted in a good journal is, let’s face it, rather greater than that of a blog (although to be fair, it is only motivation that limits the quality of the latter). Apart from adding all those control experiments/calculations that may be missing from the blog, one also must be far more fastidious in citing the literature correctly.
I do speak from (thus far one) experience. The story starts here, this being the initial post on a story that broke on Steve Bachrach’s blog about a compound with a potentially pentavalent carbon; Steve’s own post was based on an original article on the theme. Several more blog posts followed as the logical theme gradually developed. I eventually decided that telling how this set of logical connections came about was almost as interesting as the specific molecules it covered. The story had also evolved from discussing the element Astatine to speculating about the rare gas Helium, a somewhat less than obvious connection path (and how to discover connections between disparate and apparently unconnected concepts is a different story). Where should the story about how astatine was connected to helium be told? I decided it should indeed be in a formally published journal article. But it was also important to tell the story more or less as it happened, and particularly to include the role that the blogs themselves had played.
In fact, as soon as I started this undertaking, I realised that more calculations, and at a rather higher theoretical level, needed to be done in order to persuade the referees of the article that the science was sound, and also that it advanced our knowledge significantly. In the event, although the calculations were repeated, enhanced, or evolved in some manner or other, and new ideas injected, none of the original assertions was proven wrong (and of course it’s now not just me that thinks this, but the 2-3 referees who also commented). Ultimately, I would estimate I ended up spending perhaps ten times as much time on the journal article as on the sum of the initial blog posts on the topic. It is an interesting question as to whether the motivation needed to put in this amount of care and attention could also have been generated with a blog as the sole output medium (see my opening remarks).
The article is now published (DOI: 10.1038/nchem.596). Of course, you can only read it if your institution (or you personally) has a subscription to the journal (although, like this blog, the article can be located using public search facilities such as Google Scholar). There is another aspect of both the blog and the article worth mentioning. Both contain data. The blogs contain the molecular coordinates of all the molecules discussed, as well as the DOIs for the digital repository where the calculations are archived. So does the article, in the form of an interactive table, although again access to this table may or may not require a journal subscription (in this regard I note that whereas an earlier article I wrote for this publisher, see DOI 10.1038/nchem.373, is protected from non-subscribers, the interactive table which is part of the article is openly accessible. The journal deserves full credit for allowing this data to be on public access).
There is another aspect of the blog and the article, which was alluded to above. I introduced the theme of linking concepts together. This very blog post (and all the others) has been subjected to analysis using the calais archive tagger. This automatically determines appropriate tags with which to annotate each post, and then declares them using standard methods (which include RDF). The published article is similarly tagged by the publisher. In theory at least, this collection of materials, the blogs and their tags, and the article and indeed commentaries about both, should be reconcilable using appropriate semantic searches. But at this point, I feel that this topic deserves separate attention and I will close here.
Tags:astatine, Chemical IT, General, Google, helium, Maureen Pennock, public search facilities, rare gas, RDF, Richard Davis, Steve Bachrach
Sunday, January 17th, 2010
A Semantic blog is one in which the system at least in part understands (some of) the concepts and topics that are in the content. The idea is that this content can be more intelligently (is that the correct word?) and importantly, automatically searched, harvested, and connected to the same or similar concepts found elsewhere in other blogs and the Web as a whole. I am writing this blog using Firefox, having added a Firefox extension called Zemanta. As I write, the system offers suggestions for similar themes elsewhere that I could choose to link to the blog (and obviously the more one writes, or the more specific the terms one uses, the more sensible the suggestions become. At this precise moment, it is still offering fairly generic suggestions, one of which I have just chosen to add). My purpose in this particular post is to explore how the very process of writing a blog might be affected by such a product. I am also inferring (but cannot add detail at the moment) that all the (semantic) connections or links to other materials will be expressed in this blog using some form of formal declaration, such as e.g. RDF or RDFa.
Thus this blog has a WordPress plugin called wp-RDFa as part of its library. This gathers meta-data in two forms, FOAF and Dublin-Core, and expresses it using the RDFa formalism. This is really just a standard way of letting any software that might visit the blog know that this meta-data is available for harvesting. FOAF is something we discussed a year or so back; it is a formal way of expressing information about yourself in RDF (see an ACS talk on the topic), and in particular indicating what you are interested in (as a chemist in my case), who you collaborate with, and where you visit (only information, of course, that you do wish to make public; you do not have to include any private details). Nowadays, a variety of social networking tools have become semantically enabled. This blog is one; others include a flavour of Wikis (SemediaWiki, and its potential as a format for science journals), Second Life and many more. At the moment, there is little apparent added value emerging from such enrichment (I have just noted another two Zemanta articles flagged, which I will add at this instant) and certainly little in chemistry.
But what could one aspire to? For example, Steve Bachrach on his blog routinely adds InChI identifiers and keys to uniquely identify all molecules mentioned on his site. Just imagine a situation where one is describing a molecule in one’s own blog, and e.g. Zemanta instantly flags up any other article out there which has tagged the same molecule. That article and your blog can now be semantically identified as talking about the same system. A harvester could collect the information about this molecule, and create a superset of information about it (hey, we chemists already have such a system, it is called Chemical Abstracts! But of course it’s not quite the same, and I had better reserve a comparison with CAS for another post), which in turn enriches resources such as Zemanta. It’s a sort of positive feed-back loop!
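The harvester idea can be sketched in a few lines of Python: given (InChIKey, source) pairs gathered from different sites, group them so that each molecule points to every place that discusses it. The records below are invented for illustration; the URLs are placeholders and the function name is hypothetical.

```python
from collections import defaultdict

def merge_by_inchikey(records):
    """Group harvested (inchikey, source_url) pairs into one entry per molecule."""
    merged = defaultdict(list)
    for inchikey, source_url in records:
        if source_url not in merged[inchikey]:
            merged[inchikey].append(source_url)
    return dict(merged)

# Hypothetical harvest: the URLs are placeholders, not real posts.
harvest = [
    ("USGIFUSOUDIDJL-UHFFFAOYSA-N", "https://example.org/blog-a/post-1"),
    ("GOOHAUXETOMSMM-GSVOUGTGSA-N", "https://example.org/blog-b/post-7"),
    ("USGIFUSOUDIDJL-UHFFFAOYSA-N", "https://example.org/blog-b/post-9"),
]
for key, sources in merge_by_inchikey(harvest).items():
    print(key, sources)
```

The identifier does all the work here: two authors who have never heard of each other are linked simply because they tagged the same molecule in the same standard way.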
Well, the Semantic Web has been a long time coming (see DOI: http://dx.doi.org/10.1021/ci000406v or 10.1087/095315101750240421 which were both written in 2001), and since it has not yet changed the Web, some tend to write it off as a lost cause. Perhaps the semantification of blogs will make a difference?
Tags:Chemical IT, RDF, semantic
Monday, September 7th, 2009
The science journal is generally acknowledged as first appearing around 1665 with the Philosophical Transactions of the Royal Society in London and (simultaneously) the French Academy of Sciences in Paris. By the turn of the millennium, around 10,000 science and medical journals were estimated to exist. By then, the Web had been around for a decade, and most journals had responded to this new medium by re-inventing themselves for it. For the most part, they adopted a format which emulated paper (Acrobat), with a few embellishments (such as making the text fully searchable) and then used the Web to deliver this new reformulation of the journal. Otherwise, Robert Hooke would have easily recognized the medium he helped found in the 17th century.
In 1994, a small group of us thought that one could, and indeed should go further than emulated paper. We argued [1] that journals should be activated by delivering not merely the logic of a scientific argument, but also the data on which it might have been based. Of course, we encountered the usual problem; doing this might cost publishers more in production resources, and in the absence of a market prepared to pay the extra, the business model did not make sense (to the publishers). Well, 15 years later, and most publishers are indeed now thinking about how their journals can be enhanced. A number of interesting projects (the RSC’s Project Prospect is one which strives to bring science alive) have emerged. Another is the topic of this blog; the activation of the journal with molecular coordinates and data using the Jmol applet.
Initially (~2005), this project met with resistance from publishers, and the issue really amounted to what the definitive version of a scientific article should be. Should that definitive version be printable? That model, after all, had served the community well for more than 300 years! And journals from the very beginning are still as readable now as when first published. In other words, print lasts! But print is pretty limiting after all. For a start, it is limited to 2D static representations. Molecules, by and large, do their magic in a dynamic three dimensions (4D in an Einsteinian sense). But print is also expensive; not merely to produce, but to transport paper around the world.
From the turn of the millennium, a number of publishers, amongst them the American Chemical Society, started to evolve the scientific article such that the pre-eminent version would now be considered to be the HTML form (perhaps as a prelude to phasing out print entirely? See an interesting commentary by a journal editor) and perhaps a digital Acrobat form which would be deemed to lose some of its functionality once printed (again see here for how Acrobat can be used to enhance things). Again however, a chicken-and-egg scenario resulted. To enhance the articles with extra functionality (such as data), they would need to find authors prepared to put the extra work into preparing the material. In fact, most authors already do that, but they call it supporting information. This is often highly data rich, covering materials such as spectra, coordinates and other information nowadays provided to researchers for analysis. Unfortunately, what has been missing is the education of authors to provide this information in a proper digital form which can be easily re-used by others, and on a Web page, converted automatically to nice interactive models. Most spectra which form part of the supporting information are in fact still scanned versions of printed spectra!
Enter computational chemists. Nowadays, they live in a world that truly does not need printing! Almost all of their data is already suitably digital. So perhaps it is no surprise to find that when enhanced journal articles started appearing around 2005, many were produced by this group of chemists. By now perhaps you are wondering what such an article might look like. Well, the remainder of this blog will be devoted to listing some examples. You will also notice that they come exclusively from our own publications. Perhaps someone will find the time to collect a far more representative set to better illustrate the diversity of this form, and how it is evolving. Meanwhile, you might wish to take a look at the following.
Part 1: The early days: 1994 onwards
These examples all relied on a browser plugin called Chime, which is no longer with us! Hence the pages designed to invoke it no longer display properly. But the data associated with the articles is still there!
- An early 1994 example of (hyper)activating a journal article can be seen here as the preliminary communication and
- in 1995 here as the final full article. I am told that this was the article that actually inspired the developers of Chime to enhance (Netscape) with a chemical plugin.
- This one from 1998 illustrates how articles can decay in functionality when Chime is no longer available.
- An ab initio and MNDO-d SCF-MO Computational Study of Stereoelectronic Control in Extrusion Reactions of R2I-F Iodine (III) Intermediates, M. A. Carroll, S. Martin-Santamaria, V. W. Pike, H. S. Rzepa and D. A. Widdowson, Perkin Trans. 2, 1999, 2707-2714 with the supporting information here.
- Huckel and Mobius Aromaticity and Trimerous transition state behaviour in the Pericyclic Reactions of [10], [14], [16] and [18] Annulenes. Sonsoles Martín-Santamaría, Balasundaram Lavan and H. S. Rzepa, J. Chem. Soc., Perkin Trans 2, 2000, 1415, with the supporting information here.
- Peter Murray-Rust, H. S. Rzepa and Michael Wright, “Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content”, New J. Chem., 2001, 618-634. DOI: 10.1039/b008780g. This article broke new ground in that the supporting information was something of a misnomer. It was expressed entirely in XML, including all the chemistry data, and used XSLT transforms on the fly to regenerate the article. In that sense, it was actually a superset of the published article. It would be fair to say that this article was rather ahead of its time (although it does seem appropriate to publish it in a new journal!).
- M. Jakt, L. Johannissen, H. S. Rzepa, D. A. Widdowson and R. Wilhelm, “A Computational Study of the Mechanism of Palladium Insertion into Alkynyl and Aryl Carbon-Fluorine bonds”, Perkin Trans. 2, 2002, 576-581 and supporting information.
- P. Murray-Rust and H. S. Rzepa, chapter in “Handbook of Chemoinformatics. Part 2. Advanced Topics.”, ed. J. Gasteiger and T. Engel, 2003, Vol 1, was not enhanced per se, but did lay out the principles of how it might/should be done.
- K. P. Tellmann, M. J. Humphries, H. S. Rzepa and V. C. Gibson, “An experimental and computational study of β-H transfer between organocobalt complexes and 1-alkenes”, Organometallics, 2004, 23, 5503-5513. DOI: 10.1021/om049581h and supporting information.
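The 2001 CML article in the list above regenerated the article on the fly from XML using XSLT transforms. As a loose illustration of that idea (data held as XML, presentation generated from it), the sketch below turns a simplified, CML-like fragment into an HTML table. It uses plain Python rather than XSLT, and the fragment, its approximate water coordinates and the function names are invented for illustration; none of them is taken from the article itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, CML-like fragment (illustrative only, approximate coordinates).
CML = """<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>"""

def atoms_from_cml(text):
    """Return (element, x, y, z) tuples for every atom element in the fragment."""
    root = ET.fromstring(text)
    return [
        (a.get("elementType"), float(a.get("x3")), float(a.get("y3")), float(a.get("z3")))
        for a in root.iter("atom")
    ]

def html_table(atoms):
    """Render the atoms as an HTML table, one row per atom."""
    rows = "".join(
        f"<tr><td>{escape(el)}</td><td>{x:.3f}</td><td>{y:.3f}</td><td>{z:.3f}</td></tr>"
        for el, x, y, z in atoms
    )
    header = "<tr><th>Element</th><th>x</th><th>y</th><th>z</th></tr>"
    return f"<table>{header}{rows}</table>"

atoms = atoms_from_cml(CML)
table = html_table(atoms)
print(table)
```

The point of the design is that the XML remains the record of the data, while the HTML is disposable and can be regenerated whenever the presentation technology changes.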
Part 2: 2005.
These four examples all now invoke Jmol, which downloads upon request and hence does not rely on the presence of any browser plugin. The four articles were submitted with supporting information in the form of HTML. These were associated with the main article, but were not a formal part of that article. In that sense, they represent an incarnation of the traditional model, with all the data firmly resident in the supporting information.
- Gibson, Vernon C.; Marshall, Edward L.; Rzepa, H. S. “A computational study on the ring-opening polymerization of lactide initiated by β-diketiminate metal alkoxides: The origin of heterotactic stereocontrol”, J. Am. Chem. Soc., 2005, 127, 6048-6051. DOI: 10.1021/ja043819b and supporting information.
- H. S. Rzepa, “Mobius aromaticity and delocalization”, Chem. Rev., 2005, 105, 3697 – 3715. DOI: 10.1021/cr030092l and supporting information.
- H. S. Rzepa, “Double-twist Möbius Aromaticity in a 4n+2 Electron Electrocyclic Reaction”, 2005, Chem Comm, 5220-5222. DOI: 10.1039/b510508k The supporting information is also available directly.
- H. S. Rzepa, “A Double-twist Mobius-aromatic conformation of [14]annulene”, Org. Lett., 2005, 7, 4637 – 4639. DOI: 10.1021/ol0518333 and supporting information.
Part 3: 2006 onwards
The supporting information has now been assimilated into the main body of the article proper, where it contributes components such as enhanced figures or tables (i.e. enhanced with data).
- A. P. Dove, V. C. Gibson, E. L. Marshall, H. S. Rzepa, A. J. P. White and D. J. Williams, “Synthetic, Structural, Mechanistic and Computational Studies on Single-Site β-Diketiminate Tin(II) Initiators for the Polymerization of rac-Lactide”, J. Am. Chem. Soc., 2006, 128, 9834-9843. DOI: 10.1021/ja061400a The enhancement can be seen in Figure 11.
- O. Casher and H. S. Rzepa, “SemanticEye: A Semantic Web Application to Rationalise and Enhance Chemical Electronic Publishing”, J. Chem. Inf. Mod., 2006, 46, 2396-2411. DOI: 10.1021/ci060139e
- H. S. Rzepa and M. E. Cass, “A Computational Study of the Nondissociative Mechanisms that Interchange Apical and Equatorial Atoms in Square Pyramidal Molecules”, Inorg. Chem., 2006, 45, 3958–3963. DOI: 10.1021/ic0519988. Interactive table at 10.1021/ic0519988/ic0519988.html
- M. E. Cass and H. S. Rzepa, “In Search of The Bailar Twist and Ray-Dutt mechanisms that racemize chiral tris-chelates: A computational study of Sc(III), V(III), Co(III), Zn(II) and Ga(III) complexes of a ligand analog of acetylacetonate”, Inorg. Chem., 2007, 46, 8024-8031. DOI: 10.1021/ic062473y The enhancement can be seen in Figure 2
- H. S. Rzepa, “Lemniscular Hexaphyrins as examples of aromatic and antiaromatic Double-Twist Möbius Molecules”, Org. Lett., 2008, 10, 949-952. DOI: 10.1021/ol703129z The enhancement can be seen in Web Table 1.
- D. C. Braddock and H. S. Rzepa, “Structural Reassignment of Obtusallenes V, VI and VII by GIAO-based Density functional prediction”, J. Nat. Prod., 2008, DOI: 10.1021/np0705918 and WEO1.
- S. M. Rappaport and H. S. Rzepa, “Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes”, J. Am. Chem. Soc., 2008, 130, 7613-7619. DOI: 10.1021/ja710438j and WEO1 to 4
- C. S. M. Allan and H. S. Rzepa, “AIM and ELF Critical point and NICS Magnetic analyses of Möbius-type Aromaticity and Homoaromaticity in Lemniscular Annulenes and Hexaphyrins”, J. Org. Chem., 2008, 73, 6615-6622. DOI: 10.1021/jo801022b and WEO1
- C. S. M. Allan and H. S. Rzepa, “Chiral aromaticities. Möbius Homoaromaticity”, J. Chem. Theory. Comp., 2008, 4, 1841-1848. DOI: 10.1021/ct8001915 and WEO1
- C. S. M Allan and H. S. Rzepa, “The structure of Polythiocyanogen: A Computational investigation”, Dalton Trans., 2008, 6925 – 6932. DOI: 10.1039/b810147g and enhanced Table
- H. S. Rzepa, “Wormholes in Chemical Space connecting Torus Knot and Torus Link π-electron density topologies”, Phys. Chem. Chem. Phys., 2009, 1340-1345. DOI: 10.1039/b810301a and enhanced Table.
- H. S. Rzepa, “The Chiro-optical properties of a Lemniscular Octaphyrin”, Org. Lett., 2009, 11, 3088-3091. DOI: 10.1021/ol901172g
- C. S. Wannere, H. S. Rzepa, B. C. Rinderspacher, A. Paul, H. F. Schaefer III, P. v. R. Schleyer and C. S. M. Allan, “The geometry and electronic topology of higher-order Möbius charged Annulenes”, J. Phys. Chem., 2009, DOI: 10.1021/jp902176a and enhanced table
- H. S. Rzepa, “The distortivity of π-electrons in conjugated Boron rings.”, Phys. Chem. Chem. Phys., 2009, DOI: 10.1039/B911817A and enhanced table.
- H. S. Rzepa, “The importance of being bonded”, Nature Chem., 2009, DOI: 10.1038/nchem.373 and the exploratorium.
- King Kuok Hii, J.L.Arbour, H.S.Rzepa, A.J.P.White, “Unusual Regiodivergence in Metal-Catalysed Intramolecular Cyclisation of γ-Allenols”, Chem. Commun, 2009, DOI: 10.1039/b913295c and enhanced table.
- L. F. V. Pinto, P. M. C. Glória, M. J. S. Gomes, H. S. Rzepa, S. Prabhakar, A. M. Lobo. “A Dramatic Effect of Double Bond Configuration in N-Oxy-3-aza Cope Rearrangements – A simple synthesis of functionalised allenes”, Tet. Lett., 2009, 50, 3446-3449. DOI: 10.1016/j.tetlet.2009.02.228 and interactive table.
- H. S. Rzepa and C. S. M. Allan, “Racemization of isobornyl chloride via carbocations: a non-classical look at a classic mechanism”, J. Chem. Educ., 2010, DOI: 10.1021/ed800058c and interactive table.
- K. Abersfelder, A. J. P. White, H. S. Rzepa, and D. Scheschkewitz “A Tricyclic Aromatic Isomer of Hexasilabenzene”, Science, 2010, DOI: 10.1126/science.1181771 and interactive table.
- A. C. Spivey, L. Laraia, A. R. Bayly, H. S. Rzepa and A. J. P. White “Stereoselective Synthesis of cis- and trans-2,3-Disubstituted Tetrahydrofurans via Oxonium−Prins Cyclization: Access to the Cordigol Ring System”, Org. Lett., 2010, DOI 10.1021/ol9024259 and interactive table.
- J. Kong, P. v. R. Schleyer and H. S. Rzepa, “Successful Computational Modeling of Iso-bornyl Chloride Ion-Pair Mechanisms”, J. Org. Chem., 2010, DOI: 10.1021/jo100920e and interactive table.
- A. Smith, H. S. Rzepa, A. White, D. Billen, K. K. Hii, “Delineating Origins of Stereocontrol in Asymmetric Pd-Catalyzed α-Hydroxylation of 1,3-Ketoesters”, J. Org. Chem., 2010, 75, 3085-3096. DOI: 10.1021/jo1002906 and interactive table.
- H. S. Rzepa “The rational design of helium bonds”, Nature Chem., 2010, 2, 390-393. DOI: 10.1038/NCHEM.596 and web enhanced table.
- P. Rivera-Fuentes, J. Lorenzo Alonso-Gómez, A. G. Petrovic, P. Seiler, F. Santoro, N. Harada, N. Berova, H. S. Rzepa, and F. Diederich, “Enantiomerically Pure Alleno–Acetylenic Macrocycles: Synthesis, Solid-State Structures, Chiroptical Properties, and Electron Localization Function Analysis”, Chem. Eur. J., 2010, DOI: 10.1002/chem.201001087 and interactive figure
- H. S. Rzepa, “The Nature of the Carbon-Sulfur bond in the species H-CS-OH”, J. Chem. Theory. Comput., 2010, 49, DOI: 10.1021/ct100470g and interactive table.
- H. S. Rzepa, “Can 1,3-dimethylcyclobutadiene and carbon dioxide co-exist inside a supramolecular cavity?”, Chem. Commun., 2010, DOI: 10.1039/C0CC04023A and interactive table
- M. R. Crittall, H. S. Rzepa, and D. R. Carbery, “Design, Synthesis, and Evaluation of a Helicenoidal DMAP Lewis Base Catalyst”, Org. Lett., 2011, DOI: 10.1021/ol2001705 and interactive table
- H. S. Rzepa, “The past, present and future of Scientific discourse”, J. Cheminformatics, 2011, 3, 46. DOI: 10.1186/1758-2946-3-46 and interactive figure 3, figure 4 and figure 5.
- H. S. Rzepa, “A computational evaluation of the evidence for the synthesis of 1,3-dimethylcyclobutadiene in the solid state and aqueous solution”, Chem. Euro. J., 2012, in press.
- J. L. Arbour, H. S. Rzepa, L. A. Adrio, E. M. Barreiro, P. G. Pringle and K. K. (Mimi) Hii, “Silver-catalysed enantioselective additions of O-H and N-H to C=C bonds: Non-covalent interactions in stereoselective processes”, Chem. Euro. J., 2012, in press, Web table 1 and Web table 2.
- H. S. Rzepa, “Chemical datuments as scientific enablers”, J. Chemoinformatics, submitted.
- A. P. Buchard, F. Jutz, F. M. R. Kember, H. S. Rzepa, C. K. Williams, “Experimental and Computational Investigation of the Mechanism of Carbon Dioxide/Cyclohexene Oxide Copolymerization Using A Dizinc Catalyst”, in press. Interactivity box
- D. C. Braddock, D. Roy, D. Lenoir, E. Moore, H. S. Rzepa, J. I-Chia Wu and P. von R. Schleyer, “Verification of Stereospecific Dyotropic Racemisation of Enantiopure d and l-1,2-Dibromo-1,2-diphenylethane in Non-polar Media”, Chem. Comm., 2012, just published. DOI: 10.1039/C2CC33676F and interactivity box.
- K. Leszczyńska, K. Abersfelder, M. Majumdar, B. Neumann, H.-G. Stammler, H. S. Rzepa, P. Jutzi and D. Scheschkewitz, “The Cp*Si+ Cation as a Stoichiometric Source of Silicon”, Chem. Comm., 2012, 48, 7820-7822. DOI: 10.1039/c2cc33911k. Cites links to 10042/to-13974, 10042/to-13982, 10042/to-13969, 10042/20028, 10042/to-13973, 10042/to-13985
- H. S. Rzepa, “A computational evaluation of the evidence for the synthesis of 1,3-dimethylcyclobutadiene in the solid state and aqueous solution”, Chem. Euro. J., 2013, 4932-4937. DOI: 10.1002/chem.201102942 and WebTable
- H. S. Rzepa, “Chemical datuments as scientific enablers”, J. Chemoinformatics, 2013, 4, DOI: 10.1186/1758-2946-5-6. The interactivity box is integrated into the body of the article.
- M. J. Cowley, V. Huch, H. S. Rzepa, D. Scheschkewitz, “A Silicon Version of the Vinylcarbene – Cyclopropene Equilibrium: Isolation of a Base-Stabilized Disilenyl Silylene”, 2013, Nature Chem., in press and Webtable.
- M. J. S. Gomes, L. F. V. Pinto, H. S. Rzepa, S. Prabhakar, A. M. Lobo, “N-Heteroatom Substitution Effects in 3-Aza-Cope Rearrangements”, Chemistry Central, 2013, 7:94. doi:10.1186/1752-153X-7-94 and Table.
- H. S. Rzepa and C. Wentrup, “Mechanistic Diversity in Thermal Fragmentation Reactions: a Computational Exploration of CO and CO2 Extrusions from Five-Membered Rings”, J. Org. Chem., DOI: 10.1021/jo401146k and Table.
- D. C. Braddock, J. Clarke and H. S. Rzepa “Epoxidation of Bromoallenes Connects Red Algae Metabolites by an Intersecting Bromoallene Oxide – Favorskii Manifold”, Chem. Comm., 2013, DOI: 10.1039/C3CC46720A and Table.
- M. J. Fuchter, Ya-Pei Lo and H. S. Rzepa, “Mechanistic and chiroptical studies on the desulfurization of epidithiodioxopiperazines reveal universal retention of configuration at the bridgehead carbon atoms”, J. Org. Chem., 2013, in press. doi: 10.1021/jo401316a and table.
References
- H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, pp. 1907, 1994. https://doi.org/10.1039/c39940001907
Tags:A. I. Magee, A. Jana, A. P. Dove, Acrobat, American Chemical Society, aqueous solution, Balasundaram Lavan, C. S. M Allan, C. Wentrup, Chemical IT, chemical plugin, Chemoinformatics, Colorado, D. A. Widdowson, D. C. Braddock, D. J. Williams, D. R. Carbery, D. Scheschkewitz, Dalton Trans, digital Acrobat, E. H. Smith, E. M. Barreiro, E. W. Tate, Enhance Chemical Electronic Publishing, Extrusion Reactions, F. Diederich, F. Santoro, French Academy, G. Siligardi, G. Stammler, Ge, H. S. Rzepa, HTML, I. Omlor, I. Pavlakos, Interchange Apical, Interesting chemistry, Ion-Pair Mechanisms, J. Clarke, J. Jana, J. L. Arbour, J. Lorenzo Alonso-Gómez, J. P. White, J. R. Arendorf, journal editor, K. K. (Mimi) Hii, K. P. Tellmann, King, Kuok Hii, L. A. Adrio, L. Johannissen, Lewis Base Catalyst, M. E. Cass, M. Hii, M. J. Cowley, M. J. Fuchter, M. J. Harvey, M. J. Humphries, M. J. Porter, M. Jakt, M. R. Crittall, M. Ritzefeld, M. Weimar, Marshall, Michael Wright, N. Berova, N. Harada, N. J. Mason, N. Mason, N. Masumoto, O. Casher, opendata, P. G. Pringle, P. Jutzi, P. Lo, P. Seiler, Paris, Peter Murray-Rust, polymerization, Porter, printing, R. B. Moreno, R. M. Williams, R. Schleyer, R. Wilhelm, Rappaport, RDF, representative, Robert Hooke, Royal Society in London, S. Díez-González, S. Lai, S. M. Allan, S. Martin-Santamaria, Sonsoles Martên-Santamarêa, Square Pyramidal Molecules, T. Lanyon-Hogg, the Philosophical Transactions of the Royal Society, V, V. C. Gibson, V. Huch, V. W. Pike, V(III) Co, W. B. Motherwell, Web Application, Web Table, XML, XSLT, Ya-Pei Lo, β-diketiminate metal alkoxides
Posted in Chemical IT, Interesting chemistry | 6 Comments »