Chemical IT « Henry Rzepa's blog

Archive for the ‘Chemical IT’ Category

Revisiting (and maintaining) a twenty year old web page. Mauveine: The First Industrial Organic Fine-Chemical.

Thursday, February 2nd, 2017

Almost exactly 20 years ago, I started what can be regarded as the precursor to this blog. As part of a celebration of this anniversary, I revisited the page to see whether any of it had withstood the test of time. Here I recount what I discovered.

The site itself is at www.ch.ic.ac.uk/motm/perkin.html and has the title “Mauveine: The First Industrial Organic Fine-Chemical”. It was an application of an earlier experiment[1] to which we gave the title “Hyperactive Molecules and the World-Wide-Web Information System”. The term hyperactive was supposed to be a play on hyperlinking to the active 3D models of molecules built using their 3D coordinates. The word has another, more negative, association with food additives such as tartrazine – which can induce hyperactivity in children – and we soon discontinued the association. This page was cast as a story about a molecule local to me in two contexts; the first being that the discoverer of mauveine, W. H. Perkin, had been a student at what is now the chemistry department at Imperial College. The second was the realization that where we lived in west London was just down the road from Perkin’s manufacturing factory. Armed with one of the first digital cameras, a Kodak DC25, I took some pictures of the location and added them later to the web page. The page also included two sets of 3D coordinates, for mauveine itself and for alizarin, another dyestuff associated with the factory. These were “activated” using HTML to make use of the then very new Chime browser plugin; hence the term hyperactive molecule.

This first effort, written in December 1995, soon needed revision in several ways. I note that I had maintained the site in 1998, 2001, 2004 and 2006. This took the form of three postscripts to add further chemical context and more recent developments, and of replacing the original Chime code with Java code to support the new Jmol software (Chime itself had been discontinued, probably around 2001 or possibly 2004). With the passage of a further ten years, I now noticed that the hyperactive molecules were no longer working; the original Jmol applet was no longer considered secure by modern browsers and hence deactivated. So I replaced this old code with the latest version (14.7.5 as JmolAppletSigned.jar) and this simple fix has restored the functionality. The coordinates themselves were invoked using the HTML applet tag, which amazingly still works (the applet tag had replaced an earlier one, which I think might have been embed?). A modern invocation would use e.g. the JavaScript-based JSmol tool, and so perhaps at some stage this code will indeed need further revision when the Java-based applet is permanently disabled.

You may also notice that the 3D coordinates are obtained from an XML document, where they are encoded using CML (chemical markup language[2]), which is another expression from the family that HTML itself comes from. That form may well last rather longer than earlier formats – still commonly used now – such as .pdb or .mol (for an MDL molfile).
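As a rough illustration of why an XML encoding such as CML is easy to reuse, here is a minimal sketch that reads 3D coordinates out of a CML fragment with Python's standard library. The fragment is a hypothetical water molecule, not the mauveine file from the page, and the function name `read_cml_atoms` is invented for this example; the `atom` element with `elementType`, `x3`, `y3` and `z3` attributes follows the usual CML convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical CML fragment (water), used only for illustration.
CML_TEXT = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>"""

# Namespace prefix mapping used in the search path below.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}


def read_cml_atoms(cml_text):
    """Return a list of (element, x, y, z) tuples found in a CML string."""
    root = ET.fromstring(cml_text)
    atoms = []
    for atom in root.iterfind(".//cml:atom", CML_NS):
        atoms.append((
            atom.get("elementType"),
            float(atom.get("x3")),
            float(atom.get("y3")),
            float(atom.get("z3")),
        ))
    return atoms


atoms = read_cml_atoms(CML_TEXT)
for element, x, y, z in atoms:
    print(f"{element:2s} {x:8.3f} {y:8.3f} {z:8.3f}")
```

Because every value is a named attribute rather than a fixed column position, a reader written today should still cope with a file written many years ago.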

Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.

The final part of the maintenance (which I had probably also done with the earlier versions) was to re-validate the HTML code. Checking that a web page has valid HTML was always a behind-the-scenes activity, which I remember doing when constructing the ECTOC conferences back in 1995, and doing so probably does prolong the life of a web page. This requires “tools-of-the-trade” and I now use (and indeed did also back in 1995 or so) an industrial strength HTML editor called BBEdit. To this is added an HTML validation tool, the installation of which is described at https://wiki.ch.ic.ac.uk/wiki/index.php?title=It:html5. I re-ran this again^† and so this 2017 version should be valid for a little while longer at least. The page itself now has not just a URL but a persistent identifier called a DOI (digital object identifier), which is 10.14469/hpc/2133[3]. In theory at least, even if the web server hosting the page itself becomes defunct, the page could – if moved – be found simply from its DOI. The present URL-based hyperlink of course is tied to the server and would not work if the server stopped serving.

To complete this revisitation, I can add here a recent result^‡. Back in 1995, I had obtained the 3D coordinates of mauveine using molecular modelling software (MOPAC) together with a 2D structure drawing package (ChemDraw) because no crystal structure was available. Well, in 2015 such structures were finally published.[4] Twenty years on from the original “hyperactive” models, their crystal structures can be obtained from their assigned DOI, much in the same manner as is done for journal articles: Try DOI: 10.5517/CC1JLGK4[5] or DOI: 10.5517/CC1JLGL5[6].
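To make the DOI route concrete, here is a minimal sketch of how the DOIs quoted above map onto resolvable links. It only builds the URL and does not contact the network; the helper name `doi_to_url` is invented for this example, and `https://doi.org/` is the resolver prefix already used in the reference list of this post.

```python
from urllib.parse import quote


def doi_to_url(doi, resolver="https://doi.org/"):
    """Build a resolver URL for a DOI, percent-encoding everything except '/'."""
    return resolver + quote(doi.strip(), safe="/")


# The DOIs mentioned in this post: the web page and the two crystal structures.
for doi in ("10.14469/hpc/2133", "10.5517/CC1JLGK4", "10.5517/CC1JLGL5"):
    print(doi_to_url(doi))
```

The point of the indirection is that the resolver, not the link on the page, knows where the object currently lives.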

At some stage, web archaeology might become a fashionable pursuit. Twenty year old Web pages are actually not that common and it would be of interest to chart their gradual decay as security becomes more important and standards evolve and mature. One might hope that at the age of 100, they could still be readable (or certainly rescuable). During this period, the technology used to display 3D models within a web page has certainly changed considerably and may well still do so in the future. Perhaps I will revisit this page in 2037 to see how things have changed!

^†The old code can still be seen at www.ch.ic.ac.uk/motm/perkin-old.html

^‡It should really be postscript 4.

References

O. Casher, G.K. Chandramohan, M.J. Hargreaves, C. Leach, P. Murray-Rust, H.S. Rzepa, R. Sayle, and B.J. Whitaker, "Hyperactive molecules and the World-Wide-Web information system", Journal of the Chemical Society, Perkin Transactions 2, pp. 7, 1995. https://doi.org/10.1039/p29950000007
P. Murray-Rust, and H.S. Rzepa, "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", Journal of Chemical Information and Computer Sciences, vol. 39, pp. 928-942, 1999. https://doi.org/10.1021/ci990052b
H. Rzepa, "Molecule of the month: Mauveine.", Imperial College London, 2017. https://doi.org/10.14469/hpc/2133
M.J. Plater, W.T.A. Harrison, and H.S. Rzepa, "Syntheses and Structures of Pseudo-Mauveine Picrate and 3-Phenylamino-5-(2-Methylphenyl)-7-Amino-8-Methylphenazinium Picrate Ethanol Mono-Solvate: The First Crystal Structures of a Mauveine Chromophore and a Synthetic Derivative", Journal of Chemical Research, vol. 39, pp. 711-718, 2015. https://doi.org/10.3184/174751915x14474318419130
M.J. Plater, W.T.A. Harrison, and H.S. Rzepa, "CCDC 1417926: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgk4
M.J. Plater, W.T.A. Harrison, and H.S. Rzepa, "CCDC 1417927: Experimental Crystal Structure Determination", 2016. https://doi.org/10.5517/cc1jlgl5


OpenCon (2016)

Friday, November 25th, 2016

Another conference, a Cambridge satellite meeting of OpenCon, and I quote here its mission: “OpenCon is a platform for the next generation to learn about Open Access, Open Education, and Open Data, develop critical skills, and catalyze action toward a more open system of research and education” targeted at students and early career academic professionals. But they do allow a few “late career” professionals to attend as well!

I could only attend the morning session, for which the keynote speaker was Erin McKiernan. The presentation was entitled “How open science helps researchers succeed”, presented as an exploration of an article of the same name written by Erin and colleagues and published in eLife.[1] Erin has created a support page at http://whyopenresearch.org to augment the presentation and it’s well worth a visit.

One striking point made was the assertion that Open publications get more citations!

As with many metrics of the impacts of the science publication processes, a citation itself lacks the context of why it was made (see this post for further discussion), but the expectation is that a citation is “good”. From my perspective as a chemist, I did wonder why molecular science was missing from the graphic above. Do open chemistry publications also get more citations?

Which brings me to another point made during the talk, the increasingly controversial aspect of (journal) impact factors and the pressure placed on early career researchers to publish only in those with “high” impact factors, and for their careers to be assessed at least in part based on these and the anticipated “h-index”. The audience was indeed encouraged to go visit http://www.ascb.org/Dora/ (Declaration on Research Assessment, or Putting science into the assessment of research). Have you signed it yet?

Another manifestation of the modern trend to analyse impact metrics is the site Impactstory.org. This is a scripted resource that starts from your ORCID identifier and (optionally) your Twitter account (yes, apparently Tweets matter!) to derive a more complex alternative metric of an individual’s impact. I had not tried this one before and so I submitted my ORCID and my Twitter account, and watched as the system went off to http://orcid.scopusfeedback.com (Scopus is an Elsevier product) to attempt to create my profile. It ground for quite a while, reporting initially that I had no publications! This was followed by an unexpected error; I did not get my impact back! But this experiment served to highlight one aspect that was discussed at the meeting: data and other research objects. The graphic above refers only to the citation of journal articles; it does not yet include the citation of data. However, ORCID does include data and research objects as works. And because the granularity of my data and research objects is very fine (one molecule = one work), I have quite a few. In fact ~200,000! ORCID gets to about 8000 before it gives up. I suspect http://orcid.scopusfeedback.com queries ORCID, gets back ~8000 entries and crashes. No doubt the programmer tasked with implementing this resource did not anticipate that any individual could accumulate 8000+ entries! Nor, probably, did they factor in that the vast majority of these would of course not be journal articles but data. If the site gets back to me about the crash I experienced, I will post an update here.

Simon Deakin was the next speaker, with (open) data as the focus, along with the worries many researchers have about being scooped by others who re-use their open data without proper attribution. The discussion teased out that if data is properly deposited, it will indeed have full associated metadata and in particular a date stamp that could help protect an author’s interests.

It was really good to meet so many early career researchers who espouse the open ethos. Perhaps, in 20 years’ time, another graphic akin to the one above might demonstrate that open researchers get more promotions!

References

E.C. McKiernan, P.E. Bourne, C.T. Brown, S. Buck, A. Kenall, J. Lin, D. McDougall, B.A. Nosek, K. Ram, C.K. Soderberg, J.R. Spies, K. Thaney, A. Updegrove, K.H. Woo, and T. Yarkoni, "How open science helps researchers succeed", eLife, vol. 5, 2016. https://doi.org/10.7554/elife.16800


Pidapalooza!

Thursday, November 10th, 2016

This is sent from the PIDapalooza event in Reykjavik, Iceland, and is a short collection of notable things I learnt or which attracted my attention.

Firstly, what IS PIDapalooza[1]? Well, it’s all about persistent identifiers, but don’t let that put you off! Another way of putting it is that it’s a way of finding things scientific on the Web. Not just publications, but conferences, social media, teaching, research datasets, infrastructure, grants, organizations, instruments, scientific objects and samples and no doubt much more. These (will) live in an inter-connected eco-system, and so the idea goes, will become an integral part of how a scientist accumulates and disseminates information nowadays. Yes, the conference itself has its own PID: 10.5438/11.0001 and the individual talks will also appear as both a collection and with their own PID in the near future.

The first example comes from WikiData, a collection of carefully curated data, from which can be dynamically assembled say a periodic table of the elements. All the data here is included from other objects, and everything is referenced by its PID. Since it’s all assembled from data, if say the name of element 118 is assigned, then it will automatically be absorbed into this presentation.
This next example proved highly contentious, but is included here anyway. It is templated PIDs, as in http://doi.org/10.5446/12780#t=00:20.00:27 which allows navigation to a particular part of an object referenced by the PID. In this case a time code for a movie, but it might be say an active site in a protein, or a key atom or group in a molecular complex for example. This might never happen (for reasons only the computer scientists currently understand!) but it does show one way in which the humble DOI might evolve.
http://typeregistry.org exists for registering data types. It has almost no chemistry at the moment, but perhaps it should have!
There was a great deal about ORCIDs, and the ways in which uses of this particular PID are evolving. For example, the next big effort is to use the ORCID system for organisations. You will find my ORCID at the top of this post.
PIDs are also being mooted for instruments. The idea is that instrumental capabilities, settings, calibration etc. are often an integral part of the data acquisition for a project. So if data is generated using such a device, why not quote its PID in any derived article so that others can more easily replicate a particular experiment in their own laboratory?
A quote by one of the speakers was attributed to Bill Gates around 1997 “We need banking. We don’t need banks anymore” (think how this might apply to 2016. Was he correct?). This was followed by straw men such as: “We need publications. We don’t need publishers anymore”. Or “We need archiving. We don’t need libraries anymore”. Just like Gates’ own quote, the reality is of course far more complex.
And PID fatigue; I hope you are not getting too much of that at the moment.
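The templated PID in the list above can be taken apart mechanically. A minimal sketch follows, assuming only that the part after `#` is an ordinary URL fragment of the form `key=value`; the time code itself is returned uninterpreted, since its exact syntax is not defined in this post. The function name `split_templated_pid` is invented for the example.

```python
from urllib.parse import urlsplit


def split_templated_pid(url):
    """Separate a DOI-style PID from the fragment that addresses part of the object."""
    parts = urlsplit(url)
    doi = parts.path.lstrip("/")                    # e.g. "10.5446/12780"
    key, _, value = parts.fragment.partition("=")   # e.g. "t" and the raw time code
    return doi, key, value


doi, key, value = split_templated_pid("http://doi.org/10.5446/12780#t=00:20.00:27")
print(doi, key, value)
```

The same split would apply if the fragment named, say, an atom or a residue rather than a time code; only the interpretation of `value` would change.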

There is a lot more that I have learnt and that I need to fix/enhance/address in our own experiments in the use of PIDs in chemistry, so I had better get on with it now!

References

ORCID, DataCite, Crossref, and California Digital Library, "PIDapalooza 2016", 2016. https://doi.org/10.5438/11.0001


An inorganic double helix: SnIP.

Sunday, October 16th, 2016

After sixty years of searching, the first non-templated double helical carbon-free inorganic molecular structure has been reported.[1] That is so neat that I thought to load the 3D coordinates here for you to interact with and then to explore the prospect of using these coordinates to add some value with e.g. some chiroptical calculations.

I cannot really show you a diagram at this stage, since the article is not gold open access (OA) and hence is copyright protected as © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. So to progress I have to get the 3D coordinates, which as data cannot be copyrighted and from these generate my own diagram. How did I go about getting this data and how FAIR (Findable, Accessible, Interoperable, Reusable) did I find it? Here I list the actions I went through.

I went to the article[1] via its “landing page” and there (as a human) navigated to the supporting information. I wonder whether automated software unfamiliar with the journal could have done this?
There I found a PDF file and two MP4 movies. I know movies are unlikely to contain FAIR data, so I tried the former. On pages 16-17 you find the space group, cell dimensions and fractional atomic coordinates. It’s not really formatted to be “I” (copying/pasting out of PDF can be a challenge), and you have to be familiar with what is a specialised format (neither A nor really R then) and have some knowledge of appropriate crystallographic software or procedure to convert Tables S1 and S2 into an inter-operable format such as CIF (Crystallographic Information File).
The main article does have the following statement: Further details of the crystal structure investigation(s) may be obtained from the Fachinformationszentrum Karlsruhe, 76344 Eggenstein-Leopoldshafen (Germany), on quoting the depository number CSD-430054. Do they want you to write them a letter?
Well, a bit of Googling reveals https://www.fiz-karlsruhe.de/en/leistungen/kristallographie/kristallstrukturdepot/order-form-request-for-deposited-data.html as the required online link (why could that not be shortened and included in the article?)
This form has not quite yet caught up with modern journal practice. It apparently makes a page number mandatory, but although this article is fully published, it is too new to have one. I wrote “not assigned yet” and hoped for the best; a “clever” non-human script might well decide the data type of this response is wrong and reject the request! There is no field for the article DOI, which is really all the information that is needed. I pasted that into the “volume number” and again crossed my fingers.
Two days later, whilst awaiting a response to the above, I revisited Table S1/S2 but now armed with a sample CIF file for the space group P 2/c and using a text editor, inserted into it the values found in these tables (~15 minutes). The result is shown below.

[Interactive 3D model: SnIP as a helical polymer; coordinates at https://www.rzepa.net/blog/wp-content/uploads/2016/10/SnIP.mol]

This double helix is not of the complementary type found in DNA but a concentric one. The inner helix of a chain of P atoms is enclosed by the outer helix (winding in the same sense, anticlockwise as shown above) of a Sn-I-Sn-I chain. Click on the diagram above to load the 3D coordinates and inspect this for yourself.
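The hand-editing step described in the list above (pasting tabulated values into a template CIF) could also be scripted. The following is a minimal sketch only: the cell dimensions and atom positions are made-up placeholders, not the published SnIP values, and the function name `build_cif` is invented for this example. The data names such as `_cell_length_a` and `_atom_site_fract_x` are standard CIF items; a complete file would normally carry more, for example symmetry operators.

```python
def build_cif(name, space_group, cell, atoms):
    """Assemble a minimal CIF text from a cell dictionary and a list of atoms.

    cell  : dict with keys length_a/b/c and angle_alpha/beta/gamma
    atoms : iterable of (label, element symbol, frac_x, frac_y, frac_z)
    """
    lines = [
        f"data_{name}",
        f"_symmetry_space_group_name_H-M '{space_group}'",
    ]
    for key in ("length_a", "length_b", "length_c",
                "angle_alpha", "angle_beta", "angle_gamma"):
        lines.append(f"_cell_{key} {cell[key]}")
    lines += [
        "loop_",
        "_atom_site_label",
        "_atom_site_type_symbol",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    for label, symbol, x, y, z in atoms:
        lines.append(f"{label} {symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only (NOT the values from Tables S1/S2).
cif_text = build_cif(
    "example",
    "P 2/c",
    {"length_a": 10.0, "length_b": 11.0, "length_c": 12.0,
     "angle_alpha": 90.0, "angle_beta": 100.0, "angle_gamma": 90.0},
    [("Sn1", "Sn", 0.1, 0.2, 0.3),
     ("I1", "I", 0.4, 0.5, 0.6),
     ("P1", "P", 0.7, 0.8, 0.9)],
)
print(cif_text)
```

Even a script this small shows how little is needed to turn a table in a PDF into something inter-operable, which is rather the point of asking for FAIR data in the first place.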

The article reporting this structure[1] is full of fascinating insights into this new material. Time will no doubt tell whether it has exploitable properties. Meanwhile, when the CIF file arrives from my query above, I will make it available here as properly FAIR data.

References

D. Pfister, K. Schäfer, C. Ott, B. Gerke, R. Pöttgen, O. Janka, M. Baumgartner, A. Efimova, A. Hohmann, P. Schmidt, S. Venkatachalam, L. van Wüllen, U. Schürmann, L. Kienle, V. Duppel, E. Parzinger, B. Miller, J. Becker, A. Holleitner, R. Weihrich, and T. Nilges, "Inorganic Double Helices in Semiconducting SnIP", Advanced Materials, vol. 28, pp. 9783-9791, 2016. https://doi.org/10.1002/adma.201603135


Chemistry preprint servers (revisited).

Tuesday, August 16th, 2016

This week the ACS announced its intention to establish a “ChemRxiv preprint server to promote early research sharing”. This was first tried quite a few years ago, following the example of the physicists in particular. As I recollect, the experiment lasted about a year, attracted few submissions and even fewer of high quality. Will the concept succeed this time, in particular as promoted by a commercial publisher rather than a community of scientists (as was the original physicists’ model)?

The RSC (itself a highly successful commercial publisher) has picked up on this and run its own commentary. You will find quotes from yours truly there, along with Peter Murray-Rust, a long time ardent promoter of community driven open science. One interesting aspect is that the ACS runs around 50 journals, and the decision on whether each will accept preprints for publication will (shortly = next few weeks) be made by the individual editors. I wonder if the eventual list of those supporting the project will bring any surprises (bets on J. Am. Chem. Soc. preprints anyone)?

But I want to pick up on the declared aspiration “to promote early research sharing”. Here I couple research sharing with data sharing. If you share your research, you should also share the data resulting from that research. We are now entering a new era of data sharing (in part as a result of mandates from various funding bodies) and so one has to ask whether a pre-print server will encourage people to create and share FAIR data (data which is findable, accessible, inter-operable and re-usable) as a model to replace the current one of “supporting information” held in enormous PDF files (mostly unFAIR on at least three counts). This question is indeed posed in the RSC commentary. What I would like to see happen are projects such as that described here, which create what were described as “first class research objects”, and which I think amply fulfil the criteria of being FAIR. So, will ChemRxiv preprint servers help promote such FAIR data sharing as part of early research sharing? We will find out soon.

The ACS supports OA (Open Access) sharing of articles, provided the authors pay (or arrange payment of) the appropriate APC or article processing charge. These charges are complex, being subject to various discounts (for example depending on whether you as an author are an ACS member) but are generally not insignificant (> $1000). I wondered whether preprints might be subject to an APC, and so I asked the ACS. The response was “we don’t anticipate any submission or usages fees at this time”. I think that means free at point of submission, and free at point of readership “at this time”.

Finally, let me now summarise as I understand the current family of “research publications”:

1. The preprint
2. The final author version as submitted to a journal
3. The “version of record” (VoR) as published by the journal
4. Any FAIR published data associated with the article

All four of these are attempts at “research sharing”. Each may be held in a different location, and each may have its own DOI. And of course we cannot easily know how much overlap there is between them. Thus, how might 1-3 differ in terms of the story or “narrative” of scientific claims? Does 4 agree with or support 1-3? Does 4 agree with any data subsets contained in 1-3? If keeping abreast of the current research literature is a challenge, imagine having to cope with/reconcile up to four versions of each “publication”!

Lots of food for thought here. We have not heard the last of these themes.


Managing (open) NMR data: a working example using Mpublish.

Monday, August 1st, 2016

In March, I posted from the ACS meeting in San Diego on the topic of Research data: Managing spectroscopy-NMR, and noted a talk by MestreLab Research on how a tool called Mpublish in the forthcoming release of their NMR analysis software MestreNova could help. With that release now out, the opportunity arose to test the system.

I will start with a reminder that NMR data associated with a published article is (or should be) openly free: one should not need a subscription to the journal to access it (although one might in order to find it). Now, NMR data as it emerges from a spectrometer is highly sophisticated, comprising a collection of (sometimes) binary proprietary files containing the measured free induction decays (FID). Turning this raw data into an interpretable NMR spectrum, the visual form of the data that so appeals to human beings, is non-trivial. This requires what may be highly sophisticated software and that in turn means that it may be a commercial product. Of course there are also examples of non-commercial open software packages that are best-of-breed; indeed in its early life-cycle MestreNova was known as MESTREC before becoming a commercial product. Could one achieve the benefits of both open and fully functional NMR data, with no loss from the original instrument, coupled with the ability to apply top-quality software for its analysis in an open manner? This is a demonstration of how Mpublish achieves this.

Invoke the URL data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 from a browser
This action queries the metadata deposited with DataCite for the doi 10.14469/hpc/1087 and retrieves the first instance of any file associated with that dataset that has the format type chemical/x-mnpub. You can directly view this metadata by invoking just data.datacite.org/10.14469/hpc/1087 where you can find both mnpub and mnova formats listed. A command such as data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 allows the file retrieval to be incorporated into automated workflows based just on the doi and the media type desired. Note my parenthetical comment above about finding data; here you only need its doi to retrieve it!
The URL above downloads a small text file with the suffix .mnpub which contains in essence two components:
- A URL pointing directly to an .mnova file at the repository for which the doi has been issued
- A signature key used to verify that the public key of the publisher (the data repository in this instance) was counter-signed by Mestrelab.
If you now download and install the application program (for the purpose of this demonstration, ignore any requests to license the program and use it unlicensed) and open the .mnpub file with it, you should get the result shown below. The application program checks the signature key and, if it is valid, proceeds to download the full data file (a .mnova file in this case) and to analyze and display it within the program. The data is fully active; it can be manipulated and analysed. Notice that in the picture below, the red arrow points to the state of the license, in this case not present.
It is also possible to apply this procedure to the raw data as it emerges from the (Bruker) spectrometer, and compressed into a .zip archive. The MestreNova software will automatically process the contents by applying various default parameters, although the result may not correspond exactly to that present in e.g. the equivalent .mnova file (which may have had specific parameters applied).
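The retrieval step at the start of this list lends itself to automation, since the address is built from nothing more than a media type and a DOI. A minimal sketch of that URL pattern follows. The function name `datacite_media_url` is invented for this example, the `https://` scheme is an assumption (the post quotes the addresses without one), and the code does not contact the service or check that it still answers at this address.

```python
def datacite_media_url(doi, media_type=None, host="data.datacite.org"):
    """Build the address pattern described above.

    With a media type, the address requests a file of that type for the dataset;
    without one, it points at the metadata for the DOI.
    """
    if media_type:
        return f"https://{host}/{media_type}/{doi}"
    return f"https://{host}/{doi}"


# The dataset used in this post, requested as an Mpublish (.mnpub) file.
print(datacite_media_url("10.14469/hpc/1087", "chemical/x-mnpub"))
# The plain metadata view for the same DOI.
print(datacite_media_url("10.14469/hpc/1087"))
```

A workflow would then fetch the first address, save the small .mnpub file, and hand it to the analysis program.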

It is my hope that anyone who records NMR data and processes it using software such as MestreNova will now consider using the mechanism above to accompany their submitted articles, rather than just automatically pasting a static image of the spectrum into a PDF file as "supporting information". This is part of what is meant by "managed research data" (RDM).

One cannot help but note that many types of scientific instrument nowadays come with bespoke software for analysing the data they produce. Very often this software is unavailable to anyone who has not purchased the instrument itself. To make the data available to others, the processed data and its visual interpretation often have to be reduced, with much consequent information loss, to a lowest common denominator format such as Acrobat/PDF. Here we see a mechanism for avoiding any such information loss whilst enabling, for that dataset only, the full potential for (re)analysing the data. It will be interesting to see if other examples of this model or its equivalent emerge in the near future.


How does an OH or NH group approach an aromatic ring to hydrogen bond with its π-face?

Wednesday, June 22nd, 2016

I previously used data mining of crystal structures to explore the directing influence of substituents on aromatic and heteroaromatic rings. Here I explore, quite literally, a different angle to the hydrogen bonding interactions between a benzene ring and OH or NH groups.

aromatic-pi-query

I start by defining a benzene ring with a centroid. The distance is from that centroid to the H atom of an OH or NH group and the angle is C-centroid-H. To limit the search to approach of the OH or NH group more or less orthogonal to the ring, the absolute value of the torsion between the centroid-H vector and the ring C-C vector is constrained to lie between 70° and 100° (the other constraints being no disorder, no errors, T < 140K and R < 0.05).[1]
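The two quantities defined in this query are simple to compute from coordinates. Here is a minimal sketch using an idealised, planar benzene ring (C atoms on a circle of radius 1.39 Å, an assumed textbook value) and a hypothetical H atom placed 2.5 Å directly above the ring centre. The coordinates and helper names are illustrative and are not taken from any of the crystal structures in the search.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def sub(a, b):
    """Vector a - b."""
    return tuple(a[i] - b[i] for i in range(3))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    u, v = sub(a, vertex), sub(b, vertex)
    cos_t = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Idealised benzene ring in the xy plane (assumed geometry).
R = 1.39
ring = [(R * math.cos(math.radians(60 * k)),
         R * math.sin(math.radians(60 * k)),
         0.0) for k in range(6)]

# Hypothetical H atom of an OH or NH group, directly above the ring centre.
h_atom = (0.0, 0.0, 2.5)

ring_centre = centroid(ring)
distance = norm(sub(h_atom, ring_centre))      # centroid...H distance
angle = angle_deg(ring[0], ring_centre, h_atom)  # C-centroid-H angle

print(f"distance = {distance:.2f} A, angle = {angle:.1f} deg")
```

An H atom heading for the centroid keeps the angle near 90° as the distance shrinks, whereas one heading for a ring carbon would show the angle to that carbon dropping below 90°.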

aromatic-pi-HN-140

The above shows the results for NH groups interacting with the aromatic ring. The maximum distance of 2.8Å is more or less the van der Waals contact distance between a hydrogen and a carbon and, as you can see, the contacts "funnel down" to the centroid at < 2.1Å. The shortest distance[2] is for ammonium tetraphenylborate, which you can view in e.g. spacefill mode here.[3]


The other interesting close contact derives from a protonated pyridine[4], which can in turn be viewed here.[5] The main message from the distribution shown above is that as the distances between the HN and the centroid get shorter, the "trajectory" of approach remains orthogonal to the ring (the angle defined above remains ~90°) and heads towards the centroid of the π-cloud. The hotspot itself (red, ~2.6Å) also lies along this trajectory.

Recollect that when I used such hydrogen bonding to see if crystal structures discriminate between the ortho or meta positions of a ring carrying an electron donating substituent, it was the distance from a HO to the carbon that was measured as the discriminator. So it's a faint surprise to find that with HN, and without the necessary perturbation of an electron donating substituent, the intrinsic preference seems to be for the ring centroid and not any specific carbon atom of the ring.

So how about the OH group? There are in fact rather fewer examples, and so the statistics are a bit less clear-cut. But there is a tantalising suggestion that this time, the trajectory is not ~90° but rather less, implying that the destination is no longer the centroid of the π-cloud but one of the carbon atoms of the ring itself. For those who like to "read between the lines" and spot things that are absent rather than present, you may have asked yourself why I did not use NH probes in my earlier post. Well, it appears that the NH group is less effective at e.g. o/p discrimination than is an OH group.

aromatic-pi-OH-140

I can only speculate as to the origins (real or not) of the difference in behaviour between OH and NH groups towards a phenyl π-face. Perhaps it is simply bias in the CSD database? Or might there be electronic origins? Time to end with that phrase "watch this space".

References

H. Rzepa, "How does an OH or NH group approach an aromatic ring to hydrogen bond with its π-face?", 2016. https://doi.org/10.14469/hpc/673
T. Steiner, and S.A. Mason, "Short N⁺—H...Ph hydrogen bonds in ammonium tetraphenylborate characterized by neutron diffraction", Acta Crystallographica Section B Structural Science, vol. 56, pp. 254-260, 2000. https://doi.org/10.1107/s0108768199012318
T. Steiner, and S.A. Mason, "CCDC 144361: Experimental Crystal Structure Determination", 2000. https://doi.org/10.5517/cc4v6tz
O. Danylyuk, B. Leśniewska, K. Suwinska, N. Matoussi, and A.W. Coleman, "Structural Diversity in the Crystalline Complexes of para-Sulfonato-calix[4]arene with Bipyridinium Derivatives", Crystal Growth & Design, vol. 10, pp. 4542-4549, 2010. https://doi.org/10.1021/cg100831c
O. Danylyuk, B. Lesniewska, K. Suwinska, N. Matoussi, and A.W. Coleman, "CCDC 819118: Experimental Crystal Structure Determination", 2011. https://doi.org/10.5517/ccwhc5w

Tags:10.1021, 10.1107, 10.5517, aromaticity, benzene, Centroid, chemical bonding, data mining, Functional groups, Hydrogen bond, Physical organic chemistry, Pyridine, Simple aromatic rings, Supramolecular chemistry
Posted in Chemical IT, crystal_structure_mining | 3 Comments »

Why is the carbonyl IR stretch in an ester higher than in a ketone: crystal structure data mining.

Saturday, June 18th, 2016

In this post, I pondered upon the C=O infra-red spectroscopic properties of esters, and showed three possible electronic influences:

s-cis-ester1

The red (and blue) arrows imply the C-O bond might shorten and the C=O bond would lengthen; the green the reverse. So time for a search of the crystal structure database as a reality check. The query is as follows:

s-cis-ester1

The response shows the bimodal distribution with as expected the s-cis conformation dominating. There is indeed a hint that for the s-cis, the C-O distance is rather shorter than for the s-trans conformation.

s-cis-ester1

Repeating the search, but specifying that the temperature of data acquisition is < 90K, one gets a much clearer indication of the difference in bond lengths.

s-cis-ester1

This alternative representation shows the C-O and the C=O distances, with red indicating s-trans and blue indicating s-cis conformations (T < 140K). The red dots occupy a bottom right cluster for which the C-O distance is longer and the C=O shorter than the corresponding blue cluster.

s-cis-ester1

Again reducing the temperature of data collection to < 90K shows a rather weak inverse correlation between the two distances for e.g. the blue dots.

s-cis-ester1
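The strength of an inverse correlation such as this one could be put on a numerical footing with a Pearson coefficient. The sketch below is not the analysis used for the plots; the function name is my own and the distances are synthetic placeholder values chosen to be perfectly anti-correlated, not values from the crystal structure database.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a sequence with zero variance has no correlation")
    return cov / math.sqrt(var_x * var_y)

# Synthetic placeholder distances in Å (NOT database values):
# as the C-O distance lengthens, the C=O distance shortens.
c_o_single = [1.33, 1.34, 1.35, 1.36]
c_o_double = [1.210, 1.205, 1.200, 1.195]

r = pearson_r(c_o_single, c_o_double)
```

A value of r near -1 would indicate a strong inverse relationship; a "rather weak" correlation of the kind described would give a negative value much closer to zero.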

A shame, however, that this database does not hold IR values for the carbonyl stretches. I am sure correlations must exist, but how does one get at them (other than by manual collection of data)?

Tags:Ester, Functional groups, Infra-Red
Posted in Chemical IT, crystal_structure_mining | 1 Comment »

500 chemical twists: a (chalk and cheese) comparison of the impacts of blog posts and journal articles.

Friday, June 3rd, 2016

The title might give it away; this is my 500th blog post, the first having come some seven years ago. Very little online activity nowadays is excluded from measurement and so it is no surprise that this blog and another of my “other” scholarly endeavours, viz publishing in traditional journals, attract such “metrics” or statistics. The h-index is a well-known but somewhat controversial measure of the impact of journal articles; here I thought I might instead take a look at three less familiar ones – one relating to blogging, one specific to journal publishing and one to research data.

First, an update on the accumulated outreach of this blog over this seven-year period. The total number of country domains measured is 190. The African continent still has quite a few areas with zero hits (as does Svalbard, with a population of only 2600 for a land mass area of 61,000 km², or 23 km² per person). Given the low blog readership density on the African continent, it would be interesting to find out whether journal readership is any better.

Next, I look at the temporal distribution for individual posts. The first has attracted the highest total; in five years it has had 19,262 views (the diagram below shows the number of views per day). Four others exceed 10,000 and 80 exceed 1000 views.

Of these five, the next is the oldest, going back to 2009. I was very surprised to find such longevity, with the number of views increasing rather than decreasing with the passage of time.

So time now to compare these statistics with the journals. And of course it’s chalk and cheese. A “view” for a post means someone (or something) accessing the post URL, which is then recorded in the server log. Resolving the URL does at least load the entire content of the post; whether it’s read or not is of course not recorded. Importantly, if you want to view the content at some later stage, a new “view” has to be made (although some browsers do save a web page and allow offline viewing at a later stage, but I suspect this usage is low). With electronic journal access, it’s rather different. Access to an article is now predominantly via two mechanisms:

From the table of contents (this is somewhat analogous to browsing a blog)
From the article DOI.

Statistics for these two methods are gathered differently. The new CrossRef resource chronograph.labs.crossref.org (CrossRef allocate all journal DOIs) can be used to measure what they call DOI “resolutions”. A DOI resolution however leads one only to what is called the “landing page”, where the interested reader can view the title, the graphical abstract and some other metadata. It does not mean of course that they go on to actually view the article (as HTML, equivalent to the blog above, or probably more often by downloading a PDF file). Here are a few results using this method:

chronograph.labs.crossref.org/dois/10.1021/ja710438j tracks this article[1] which I selected (in part) because it was published in 2008, just slightly before the oldest post above. In fact, the resolutions log only goes back to October 2010, by which time the initial flush of any interest in this article would have subsided and so it’s nice to see continuing interest (= impact?).
chronograph.labs.crossref.org/dois/10.1002/anie.201409672 [2] totals 208 resolutions, but as the graph below shows, 188 of these were on the first day of publication (Nov 19, 2014), then a few days gap and then about a month of daily resolutions, followed by occasional interest since then.
chronograph.labs.crossref.org/dois/10.1126/science.1181771 dates from 2010[3] and this time shows no peak on the first day, but again steady continuing interest to a current 245 resolutions.

What about the other main journal article access method, not via a DOI but from a journal’s table of contents page? A Google search revealed this site: jusp.mimas.ac.uk (JUSP stands for Journal usage statistics portal, which sounded promising). This site collects “COUNTER compliant usage data”. COUNTER (Counting Online Usage of Networked Electronic Resources) is an initiative supported by many journal publishers and it sounds like an interesting way of measuring “usage” (as opposed to “views” or “resolutions”; it’s that chalk and cheese again!). I would love to be able to show you some statistics using this resource, but the “small print” caught me out: “JUSP gives librarians a simple way of analysing the value and impact of their electronic journals”. Put simply, I am a researcher, not a librarian. As a researcher I do not have direct access; JUSP is a closed, restricted access (albeit taxpayer-funded) resource. I am discussing this with our head of information resources (who is a librarian) and hope to report back here on the outcome.

Finally research data. This is almost too new to be able to measure, but this resource stats.datacite.org is starting to collect statistics on data resolutions (similar to DOI resolutions).

You can see from the below for Imperial College (in fact this represents the two data repositories that we operate and which I cite here extensively on these blogs) that the resolutions are running at up to about 200 a month per dataset (more typically ~25 a month), with a total of 5065 resolutions for all items in March 2016 (the blog has ~12,000 views per month).
Figshare is another data repository we have made use of:

So to the summary.

Firstly, we see that I have shown three forms of impact, views, resolutions and usage. If one had statistics on all three, one might then try to see if they are correlated in any way. Even then, normalisation might be a challenge.
Over ~7 years, five posts on this blog have attracted >10,000 views.
Many of the blog posts have a long “finish” (to use a wine tasting term); the views continue regularly and often increase over time.
My analysis of the three journal articles above (and about 15 others) shows that between 50 and 300 resolutions over a few years is fairly typical (for this researcher at least; I am sure most better-known researchers attract far more).
The temporal distributions for article resolutions and blog views show that both can have continuing impact over an extended period. None of the 18 articles I looked at shows a significantly increasing impact with time but many of the blog posts do. This tends to suggest that the audiences for each are quite different; researchers for articles and a fair proportion of inquisitive students for the blog?
I may speculate that a correlation between my article resolutions and my h-index might be found, but the article resolutions have a fine-grained temporal resolution (allowing a derivative with respect to time to be obtained) that is potentially more valuable than the coarse h-index integration (an article can of course be cited for both positive and negative reasons!).
Initial analysis for data shows resolutions running at a similar rate to article resolutions. It is not yet possible to correlate data resolutions with article resolutions in which that data is discussed.
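The contrast drawn above between the coarse h-index and a time derivative of resolutions can be made concrete with a small sketch. The h-index function follows the standard definition (the largest h such that h articles each have at least h citations); the rate function is simply a finite difference over equally spaced periods. All the numbers below are hypothetical and serve only to show the calculation.

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def resolution_rate(cumulative):
    """Finite-difference derivative of a cumulative count.

    Takes cumulative resolutions sampled at equal intervals and
    returns the number gained in each interval.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical citation counts for five articles.
example_h = h_index([10, 8, 5, 4, 3])

# Hypothetical cumulative DOI resolutions for one article, sampled monthly.
example_rate = resolution_rate([0, 5, 15, 18])
```

The h-index collapses an entire publication record into a single integer, whereas the per-period rate retains the shape of the response over time, which is the point being made about temporal resolution.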

References

S.M. Rappaport, and H.S. Rzepa, "Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes", Journal of the American Chemical Society, vol. 130, pp. 7613-7619, 2008. https://doi.org/10.1021/ja710438j
A.E. Aliev, J.R.T. Arendorf, I. Pavlakos, R.B. Moreno, M.J. Porter, H.S. Rzepa, and W.B. Motherwell, "Surfing π Clouds for Noncovalent Interactions: Arenes versus Alkenes", Angewandte Chemie International Edition, vol. 54, pp. 551-555, 2014. https://doi.org/10.1002/anie.201409672
K. Abersfelder, A.J.P. White, H.S. Rzepa, and D. Scheschkewitz, "A Tricyclic Aromatic Isomer of Hexasilabenzene", Science, vol. 327, pp. 564-566, 2010. https://doi.org/10.1126/science.1181771

Tags:Country: Svalbard and Jan Mayen, CrossRef, head of information resources, HTML, Imperial College, librarian, online activity, Online Usage, PDF, researcher, search engines, usage statistics portal
Posted in Chemical IT | 4 Comments »

The geometries of 5-coordinate compounds of group 14 elements.

Monday, May 30th, 2016

This is a follow-up to one aspect of the previous two posts dealing with nucleophilic substitution reactions at silicon. Here I look at the geometries of 5-coordinate compounds containing as a central atom 4A = Si, Ge, Sn, Pb and of the specific formula C₃4AO₂ with a trigonal bipyramidal geometry. This search arose because of a casual comment I made in the earlier post regarding possible cooperative effects between the two axial ligands (the ones with an angle of ~180 degrees subtended at silicon). Perhaps the geometries might expand upon this comment?

The search query shown above results in 394 hits (May 2016) and is presented with the three variables in the query plotted as below, with the O-4A-O angle indicated by colour (red ~180°; blue ~90° and green ~120°).

The cluster at distances of 4A-O of ~1.9Å represents silicon compounds, and tends to suggest that the pair of distances 4A-O are quite similar in value. The angles correspond to a di-axial arrangement around the silicon. In this scenario, one might imagine a stereoelectronic effect similar to the anomeric effect when 4A = C operates and which has the potential to strengthen both di-axial oxygens.
The bulk of the points come at higher 4A-O distances of > 2.1Å and consist mostly of 4A = Sn. There are two clear-cut distributions, one for angles of ~180° and a separate one for angles of ~90°, and both are qualitatively different from the Si distribution. The 180° set corresponds to a di-axial arrangement for the oxygens, whereas the 90° set suggests an axial-equatorial geometry. Both distributions have prominent tails which reveal that as one 4A-O distance shortens, the other lengthens, equivalent to asymmetric anomeric effects at O-C-O.
Noticeably absent are any green points; these would correspond to bond angles of ~120° and hence would correspond to di-equatorial ligands.
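The assignment of each O-4A-O angle to an arrangement of the two oxygens in a trigonal bipyramid can be sketched as a nearest-ideal-angle classification. The function name and the 15° tolerance are my own assumptions for illustration; the plot itself simply colours points by angle.

```python
# Ideal O-4A-O angles for two ligands in a trigonal bipyramid.
IDEAL_ANGLES = {
    "di-axial": 180.0,
    "di-equatorial": 120.0,
    "axial-equatorial": 90.0,
}

def classify_o_m_o(angle_deg, tolerance=15.0):
    """Label an O-4A-O angle by the nearest ideal value.

    The tolerance (in degrees) is an arbitrary choice for this sketch;
    angles further than this from every ideal value are left unassigned.
    """
    label, ideal = min(IDEAL_ANGLES.items(), key=lambda kv: abs(kv[1] - angle_deg))
    return label if abs(ideal - angle_deg) <= tolerance else "unassigned"

labels = [classify_o_m_o(a) for a in (175.0, 92.0, 118.0, 140.0)]
```

Applied to the hits from a search such as this one, the absence of green points would show up as no angles falling in the "di-equatorial" class.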

This quick exploration (with potential variations that I have not explored above) can be added to the collection of “ten minute explorations” I have described elsewhere.[1]

References

H.S. Rzepa, "Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases", Journal of Chemical Education, vol. 93, pp. 550-554, 2015. https://doi.org/10.1021/acs.jchemed.5b00346

Tags:Anomer, Anomeric effect, Carbohydrate chemistry, Carbohydrates, Ligand, Molecular geometry, Physical organic chemistry, Stereochemistry, Stereoelectronic effect, Trigonal bipyramidal molecular geometry
Posted in Chemical IT, crystal_structure_mining | 3 Comments »
