Chemical IT « Henry Rzepa's blog

Archive for the ‘Chemical IT’ Category

The Bürgi–Dunitz angle revisited: a mystery?

Tuesday, May 12th, 2015

The Bürgi–Dunitz angle is one of those memes that most students of organic chemistry remember. It hypothesizes the geometry of attack of a nucleophile on a trigonal unsaturated (sp²) carbon in a molecule such as a ketone, aldehyde, ester or amide. Its value obviously depends on the exact system, but is generally taken to be in the range 105-107°. A very good test of this hypothesis is to search the crystal structure database (this was how it was originally established[1]).

The search is defined as follows:

R can be either H or C
The carbon is constrained to 3-coordinate
The carbonyl oxygen is constrained to 1-coordinate
QA can be any of N, O, S, Cl, F.
QB can be any of H (aldehyde), C (ketone), N (amide), O (ester) or S (thioester).
The distance QA…C is constrained to any intermolecular non-bonded contact ≤ the sum of the van der Waals radii of the two atoms involved and the angle QA…C=O is the Bürgi–Dunitz angle.
I have also added a torsion constraint to specify that the nucleophile (QA) must lie within ± 20° of orthogonality to the plane of the carbonyl, to allow it to attack the π* orbital.
The crystallographic R factor must be < 0.05, with no disorder and no crystallographic errors, and the temperature is either unrestricted or < 120K.
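
The geometric constraints listed above can be sketched in a few lines of Python. This is only an illustration of what is being measured, not the query the crystal structure database software actually runs. The Bondi van der Waals radii, the choice of R–C–O…QA as the torsion, and the example coordinates are all assumptions made for this sketch; the post does not state which four atoms define its torsion.

```python
import math

# Bondi van der Waals radii in Å (an assumption; the database software
# may use a different set of radii).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47, "S": 1.80, "Cl": 1.75}

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def burgi_dunitz_angle(qa, c, o):
    """Angle QA...C=O in degrees, measured at the carbonyl carbon."""
    v1, v2 = _sub(qa, c), _sub(o, c)
    cos_angle = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, about the p1-p2 axis."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    length = _norm(b1)
    axis = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the p1-p2 axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))

def is_contact(qa_element, qa, c):
    """True if QA...C is no longer than the sum of the van der Waals radii."""
    return _norm(_sub(qa, c)) <= VDW_RADII[qa_element] + VDW_RADII["C"]

# Hypothetical coordinates (Å): a planar carbonyl in the xy plane with an
# oxygen nucleophile placed 3.0 Å from the carbon at 105° to the C=O bond.
c = (0.0, 0.0, 0.0)
o = (1.2, 0.0, 0.0)
r = (-0.75, 1.3, 0.0)
theta = math.radians(105.0)
qa = (3.0 * math.cos(theta), 0.0, 3.0 * math.sin(theta))

angle = burgi_dunitz_angle(qa, c, o)
torsion = dihedral(r, c, o, qa)
accepted = is_contact("O", qa, c) and abs(abs(torsion) - 90.0) <= 20.0
```

Applied to every hit in a search, `angle` is the quantity whose distribution is plotted, while `accepted` mimics the distance and torsion filters.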

With no temperature specified, 6994 hits are obtained as below. So the most probable angle (red spot) is ~90°.

One important change to the search is to restrict it to structures measured below 120K, since these will have less vibrational noise. The number of hits decreases to 1279, but the most probable angle, if anything, reduces slightly.

So we have something of a mystery; this crystallographic data shows an angle of approach about 15° less than the oft quoted value. Here are some thoughts:

This search is the average for all types of carbonyl, whereas the original suggestion was constrained to four types of nucleophiles and simple ketones.
This search extends the interacting distance of the nucleophile and the carbon out to 3.5Å which is significantly longer than the normally considered length of ~2.85Å. The hotspots occur at about 3.15Å and not 2.85Å.
There is obviously considerably more data available in 2015 than in 1974, and in particular at low temperature.
The Bürgi–Dunitz angle is in fact one of two defining the trajectory, the other being the Flippin–Lodge angle which defines the displacement towards R or QB. The search above gives no direct information about this angle, but the torsion is related since it is constrained to bisect the C=O to within ± 20° and hence bisect the groups R and QB.
An angle of ≤ 90° does not match the normal explanation, which is that the nucleophile attacks the π* orbital, each lobe of which “leans out” from the centre at about 105° rather than leaning in at ≤ 90°.
Decreasing the torsion range to ± 5° at 120K gives 592 hits with a hot spot at 95°.
Also constraining the distance QA…C to be 0.3Å less than the van der Waals sum at 120K gives 59 hits with a hot spot at 95° and 2.9Å.

Well, to get to the bottom of this will require reducing the scope of both QA and QB, to find which, if any, of the discrete values for these two variables can indeed give an angle of 105-107°. This would make for quite a good student group project; I expect a group of 8 students could sort this out quite quickly!

References

H. Bürgi, J. Dunitz, J. Lehn, and G. Wipff, "Stereochemistry of reaction paths at carbonyl centres", Tetrahedron, vol. 30, pp. 1563-1572, 1974. https://doi.org/10.1016/s0040-4020(01)90678-7


The status of blogging as scientific communication.

Sunday, May 10th, 2015

Blogging in chemistry remains something of a niche activity, albeit with a variety of different styles. The most common is commentary or opinion on the scientific literature or conferencing, serving to highlight what their author considers interesting or important developments. There are even metajournals that aggregate such commentaries. The question therefore occasionally arises: should blogs aspire to any form of permanence, or are they simply creatures of their time?

In this blog, as you might have noticed, I take a slightly different tack. One focus is on exploring, perchance in more detail than might be found in the standard text-book, some of the dogmas of chemistry. It happens that occasionally when writing a conventional scientific article, I find myself wishing to cite such sources. This of itself raises interesting issues (such as should one cite what might be considered material that has not been peer-reviewed in the conventional manner) but the most important would be whether one should cite evanescent sources. So this brings me to the topic of this post; can a post be archived in a sense that achieves a greater perceived permanence? Nowadays, permanence tends to be associated with a digital object identifier, or DOI. So one can boil this question down to: can one assign a DOI to a blog post?

Well, if you came to this post via the main page, you may indeed have spotted that some do have a DOI. This is an experiment I have been running with an organisation known as The Winnower, who provide a WordPress extension to archive any individual post and assign it a (CrossRef) DOI. The archived version also includes metadata that points back to the original post.

This archival is not yet perfect. In its current state it does not (yet) capture:

Comments on any post (which could be considered a form of open peer review)
Enhancements such as the links to Jmol/JSmol that I associate with some of the posts
The ORCID identifier, which adds a layer of additional provenance.
We of course do not yet know what life expectancy the archiving organisations will achieve (could it be 100 years for example?).

It does capture the citation list when there is one, and since I include citations to my data sources (for the computations performed in support of many of my posts) the archive is I think accordingly rendered more valuable.

What brought this post on? Well, the Journal of Chemical Education has put out a call for articles on chemical information for a special issue. I decided to contribute by aggregating some of my teaching related posts; indeed, individually they could perhaps only have appeared here, as opposed to a more traditional means of dissemination such as the JCE journal itself. And I wanted to cite them using the DOI rather than simply the URL of the post. It’s an experiment, and one which I do not yet know if anyone else will try. That in some ways is the point of a blog; it is an interesting experimental vehicle!


ORCID identifiers galore!

Tuesday, April 21st, 2015

Egon has reminded us that adoption of ORCID (Open Researcher and Contributor ID) is proceeding apace. It is a mechanism to disambiguate (a Wikipedia term!) contributions in the researcher community and also to remove much of the anonymity (where that is undesirable) that often lurks in social media sites.

This blog is now ORCID-enabled (my ORCID should appear at the top of this post for example, where you should be able to find it as 0000-0002-8635-8390, although the signature thumbnail is obscured by my gravatar, an older system for providing information about someone^‡). We also add ORCIDs to all data depositions[1].
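
An ORCID iD such as the one quoted above carries its own check character, computed from the first 15 digits with the ISO 7064 MOD 11-2 algorithm. A minimal sketch of that check follows; the function names are illustrative choices for this example, not part of any ORCID library.

```python
def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_well_formed_orcid(orcid: str) -> bool:
    """True if the iD has 16 characters (ignoring hyphens) and a valid check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()

ok = is_well_formed_orcid("0000-0002-8635-8390")        # the iD quoted in this post
mistyped = is_well_formed_orcid("0000-0002-8635-8391")  # last character altered
```

The check only guards against mistyped identifiers. It says nothing about whether the person quoting an ORCID iD is actually its owner.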

You will not yet find them in many journal articles, which was the whole original point of their introduction. They can however already be used to log into e.g. manuscript submission sites, for example that of the Journal of Cheminformatics, and I gather many other journal submission systems will probably start using them in 2015. From there it is a short step to incorporating them into journal articles routinely.

To counter the slightly awkward association with “being reduced to a mere number”,^† we need to start seeing genuine benefits from its pervasive use. From my point of view, there will be one immediate application. At my university we run a system called Symplectic, which in effect tracks all aspects of one’s research activities, including sourcing online publications. Each time Symplectic thinks it has found e.g. one of my articles, it sends me an email asking me to verify its discovery. I then have to spend 5 minutes or so acknowledging it was written by me, and then adding further links to e.g. instrumental resources used for that research. One of those resources is the high performance computing unit here. But since that resource already incorporates ORCID via e.g. [1], there is no reason why Symplectic need ever bother me with such questions in the future; it could automatically harvest all the information defined by my ORCID.

As with many steps forward, there are often steps back, following the law of unforeseen consequences. Perhaps “identity theft” is one; how easy could it be to use someone else’s ORCID for example? I think however that ORCID is here to stay, and we should explore both the good and the potentially bad aspects of its increasing deployment.

^‡Gravatar offer a list of verified services similar in concept to ORCID. But ORCID itself is not on that list; http://en.gravatar.com/profiles/edit/#verified-services
^†In the dystopian novel We by Yevgeny Zamyatin, there is no way of referring to people save by their given numbers. Wikipedia tells us We is considered as having influenced the later novel 1984 by George Orwell.

References

H.S. Rzepa, "C6H6Br2", 2015. https://doi.org/10.14469/ch/191199


A new way of exploring the directing influence of (electron donating) substituents on benzene.

Friday, April 17th, 2015

The knowledge that substituents on a benzene ring direct an electrophile engaged in a ring substitution reaction according to whether they withdraw or donate electrons is very old.[1] Introductory organic chemistry tells us that electron donating substituents promote the ortho and para positions over the meta. Here I try to recover some of this information by searching crystal structures.

I conducted the following search:

Any electron donating group as a ring substituent, defined by any of the elements N, O, F, S, Cl, Br.
A distance from the H of an OH fragment (as a hydrogen bonder to the aryl ring) to the ortho position relative to the electron donating group.
A similar distance to the meta position.
The |torsion angle| between the aryl plane and the C…H axis to be constrained to 90° ± 20°.
Restricting the H…C contact distance to 0.3Å less than the sum of the van der Waals radii (to capture only the stronger interactions).
The usual crystallographic requirements of R < 0.1, no disorder, no errors and normalised H positions.

The result of such a search is seen below. The red line indicates those hits where the distance from the H to the ortho and meta positions is equal. In the top left triangle, the distance to ortho is shorter than to meta (and the converse in the bottom right triangle). You can see that both the red hot-spot and indeed the majority of the structures conform to ortho direction of (π-facial) hydrogen bonding.
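
The same comparison can be made numerically on an exported list of hits. The sketch below counts how many hits fall on each side of the diagonal and reports the mean meta minus ortho distance. The four distance pairs are invented placeholders to show the bookkeeping, not values from the crystal structure database, and the function name and tolerance are likewise assumptions.

```python
from statistics import mean

def summarise_ortho_meta(pairs, tolerance=0.01):
    """Classify (d_ortho, d_meta) pairs in Å relative to the d_ortho = d_meta diagonal."""
    ortho = sum(1 for d_o, d_m in pairs if d_m - d_o > tolerance)
    meta = sum(1 for d_o, d_m in pairs if d_o - d_m > tolerance)
    return {
        "ortho": ortho,                        # H closer to the ortho carbon
        "meta": meta,                          # H closer to the meta carbon
        "diagonal": len(pairs) - ortho - meta, # equal within the tolerance
        "mean_discrimination": mean(d_m - d_o for d_o, d_m in pairs),
    }

# Placeholder (d_ortho, d_meta) values, purely illustrative.
hits = [(2.50, 2.70), (2.60, 2.60), (2.75, 2.55), (2.45, 2.65)]
summary = summarise_ortho_meta(hits)
```

A positive `mean_discrimination` corresponds to the hot-spot lying in the triangle where the ortho distance is the shorter one.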

Here is a little calculation, optimising the position that HBr adopts with respect to bromobenzene. You can see that the distance discrimination towards ortho is ~ 0.17Å, a very similar value to the “hot-spot” in the diagram above.

This little search of course has hardly scratched the surface of what could be done. Changing e.g. the OH hydrogen-bond donor to other electronegative groups. Restricting the wide span of N, O, F, S, Cl, Br. Probing rings bearing two substituents. What of the minority of points in the bottom right triangle; are they true exceptions or does each have extenuating circumstances? Why do many points actually lie on the diagonal? Can one correlate the distances with the substituent? Is there a difference between intra and intermolecular H-bonds? What of electron withdrawing groups?

The above search took perhaps 20 minutes to define and optimise, and it gives a good statistical overview of this age-old effect. It is something every new student of organic chemistry can try for themselves! If you run an introductory course in organic aromatic chemistry, or indeed a laboratory, try to see what your students come up with!

References

H.E. Armstrong, "XXVIII.—An explanation of the laws which govern substitution in the case of benzenoid compounds", J. Chem. Soc., Trans., vol. 51, pp. 258-268, 1887. https://doi.org/10.1039/ct8875100258


Goldilocks Data.

Wednesday, April 8th, 2015

Last August, I wrote about data galore, the archival of data for 133,885 (134 kilo) molecules into a repository, together with an associated data descriptor[1] published in the new journal Scientific Data. Since six months is a long time in the rapidly evolving field of RDM, or research data management, I offer an update in the form of some new observations.

Firstly, 131 kilo molecules are now offered in a new, different form at http://gdb.koitz.info/gdbrowse/ and it is worth comparing the presentation of the two sets of otherwise identical data.

The original archive had a single assigned DOI[2] from where you could download a ZIP file to be unpacked and navigated on your own computer. The exposed metadata for the deposition (by which I mean in this case, metadata registered with DataCite, the registration authority used by Figshare) was limited to general information about the 133,885 molecules such as the authorship and license. The granularity is coarse, not extending to descriptions of individual molecules.
The new version forgoes the ZIP archive, replacing it with a proper database (based on MongoDB) containing information about 130,832 molecules. This allows one to search the data at the individual molecule level (formula, InChI descriptor, mass, etc) using the tools provided. To the end-user, this is much more useful; the data is both discoverable and re-usable.

There is no overlap between these two presentations of the data. There also appears to be no API (application programming interface) which might allow one to write code to construct one’s own searches. The apparent absence of an API also means that really only a human navigating the set menus can discover and re-use that data; the data might not be mineable by a machine, for example. The absence of an API is not that unusual; only some of the best known molecular databases offer one, the RCSB Protein Data Bank being a good example. More significantly, each instance of such a molecule-based database is likely to have its own way of accessing the data and even if a documented API were available, one would still have to write specific code for each such resource.

So the first bowl contains what I suggest is cold porridge and the second is perhaps equivalent to a table d’hôte menu. Does Goldilocks have a third option? I would argue yes, she could have:

We recently published data for 158 kilo molecules[3] for which each molecule carries its own metadata. That metadata can be queried using any search engine that supports the basic metadata standards:
http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469
is an example. Or armed with the metadata schema, one could also write one’s own search engine and in theory at least, that code should serve to query ANY repository that supports these standards.
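
As a small illustration of how such a query is put together, the sketch below rebuilds the URL quoted above from its two parts: the `has_media:true` query and the DOI prefix filter. The endpoint is taken verbatim from this post and may no longer respond; the function name is an assumption, and no request is actually sent.

```python
from urllib.parse import urlencode

# Search endpoint exactly as quoted in the post (it may since have changed).
DATACITE_SEARCH = "http://search.datacite.org/ui"

def datacite_search_url(doi_prefix: str, has_media: bool = True) -> str:
    """Build a metadata search URL restricted to one DOI prefix."""
    params = [
        ("q", f"has_media:{str(has_media).lower()}"),
        ("fq", f"prefix:{doi_prefix}"),
    ]
    # Keep ':' unescaped so the result matches the form quoted in the post.
    return DATACITE_SEARCH + "?" + urlencode(params, safe=":")

url = datacite_search_url("10.14469")
```

Swapping the prefix for that of another repository gives the equivalent query there, which is the point being made about shared metadata standards.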

You could argue that all that has happened is one has simply replaced a specific database API (if it exists) with a specific metadata schema. But these metadata schemas are controlled standards, the components of which should be self-describing (and one can see the schema components by invoking the link above).

As the archival of data (RDM) becomes increasingly important, communities will have to start making decisions about which flavour of data-porridge to offer Goldilocks. For molecular data at least, I suggest the third option is highly desirable and perhaps likely to be the most persistent. Parochial databases very much depend on a specialised team of people to maintain them in perpetuity, which I gather now means 20 years. At the very least, we should start to have a debate about how the future will evolve. Let us not leave this debate merely in the hands of a small number of large organisations that are likely to make decisions based on their own business models. After all, it starts off at least as our data, not theirs! Arguably, we as authors have now largely lost control over how our stories (journal articles) are managed; let us not cede the same for data.

References

R. Ramakrishnan, P.O. Dral, M. Rupp, and O.A. von Lilienfeld, "Quantum chemistry structures and properties of 134 kilo molecules", Scientific Data, vol. 1, 2014. https://doi.org/10.1038/sdata.2014.22
R. Ramakrishnan, P.O. Dral, M. Rupp, and O.A. von Lilienfeld, "Quantum chemistry structures and properties of 134 kilo molecules", 2014. https://doi.org/10.6084/m9.figshare.978904
Y. Zhang, H.S. Rzepa, J.J.P. Stewart, P. Murray-Rust, M.J. Harvey, N. Mason, A. McLean, and Imperial College High Performance Computing Service., "Revised Cambridge NCI database", 2014. https://doi.org/10.14469/ch/2


How-open-is-it?

Thursday, February 12th, 2015

The title of this post refers to the site http://howopenisit.org/ which is in effect a license scraper for journal articles. In the past 2-3 years in the UK, we have been able to make use of grants to our university to pay publishers to convert our publications into Open Access (also called GOLD). I thought I might check out a few of my recent publications to see what http://howopenisit.org/ makes of them.

This was catalysed by an article which revealed that UK universities spent £9M in 2014 on the purchase of such openness. One of the “challenges” identified is the difficulty in converting such payment into an article that actually is open. Apparently, publishers make not a few mistakes in their quality controls in ensuring it is so, relying on irate authors informing them of such mistakes. This can be quite tedious to do, and so a tool that largely automates this checking is most useful. So here we go.

doi: 10.1039/C3SC53416B[1] This is a good start. The output looks like this. Green is GOLD so to speak. Well done the Royal Society of Chemistry.
doi: 10.1021/ci500302p[2] from the ACS this time. Pink, but at least free to read. Quite what that means is less certain. The adage “the right to read means the right to mine” presumably means this article is OK to mine, but then why does it not say so?
doi: 10.1002/anie.201405238[3]. Pink again, but the colour now simply means no information about the license could be obtained from the publisher (Wiley).

I ran a few more and sadly the third of the above, “no information” was the most common response. And the legal response is invariably that if no information can be obtained, the answer is NO, it is not free to read. In other words, not providing a license is just as bad as saying it’s not free to read.

Article aggregators such as Symplectic do not yet perform the service above (which to be fair is still in beta), and so I cannot yet check how many GOLD articles there are to my name. I think it should be about 8, and I might add that the time I have to spend in arranging for this to happen is not negligible. Hell, I could probably have found a few more reaction mechanisms in the time I have spent on achieving GOLD. This is one of those topics which would be interesting to revisit say in five years’ time to see how the world has changed. So I leave this little time capsule and will update it then!

References

A. Armstrong, R.A. Boto, P. Dingwall, J. Contreras-García, M.J. Harvey, N.J. Mason, and H.S. Rzepa, "The Houk–List transition states for organocatalytic mechanisms revisited", Chem. Sci., vol. 5, pp. 2057-2071, 2014. https://doi.org/10.1039/c3sc53416b
M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014. https://doi.org/10.1021/ci500302p
A. Jana, I. Omlor, V. Huch, H.S. Rzepa, and D. Scheschkewitz, "N‐Heterocyclic Carbene Coordinated Neutral and Cationic Heavier Cyclopropylidenes", Angewandte Chemie International Edition, vol. 53, pp. 9953-9956, 2014. https://doi.org/10.1002/anie.201405238


A convincing example of the need for data repositories. FAIR Data.

Thursday, January 15th, 2015

Derek Lowe in his In the Pipeline blog is famed for spotting unusual claims in the literature and subjecting them to analysis. This one is entitled Odd Structures, Subjected to Powerful Computations. He looks at this image below, and finds the structures represented there might be a mistake, based on his considerable experience of these kinds of molecules. I expect he had a gut feeling within seconds of seeing the diagram.

Indeed so; you will now find that the authors have apparently acknowledged a mistake[1]. My interest piqued, I went to the article, and immediately tracked down the supplementary information. Surely, if these molecules had been subjected to powerful computation, this supporting information should contain coordinates of some kind that would allow a correlation with the 2D structural representation shown above. I have just returned from FORCE2015, a three-day event in Oxford. From the detailed agenda, you can see that a lot of the conference centered around what is called FAIR Data. FAIR stands for:

Findable
Accessible
Interoperable
Re-usable

So I then set out to find if the supplementary information WAS FAIR. Well, check for yourself (unlike the narrative article, the data should be accessible outside of the paywall, i.e. you should not need a subscription to access it). It is certainly big, running out to 45 pages, in the form of a paginated PDF file (the norm). The table of contents does not refer to data as such, but it does quote 25 figures, from which you might just be able to extract some data. But no molecules as such! So:

No data is findable, although the PDF which might contain it is reasonably so.
The data is not easily accessible,
let alone interoperable (thus many of the charts were probably created using spreadsheet software, but the source files for these are not available),
and not-reusable (certainly not without loss and possible error in any attempt at capture).

I think it fair to say that the data for these powerful computations are not FAIR. Had we had at least some coordinates (the computations involved molecular mechanics based dynamics simulations, which certainly involve manipulating atom coordinates in some form) then the structures shown in the figure above could be checked, and perhaps even the apparent error would have been quickly spotted.

Derek does not make the point about FAIR data (to be fair, he was not at FORCE2015) and so I will make the case. If you are reporting a computational model or simulation, there is no excuse for not supplying FAIR data to accompany it. If the data is FAIR it will be interoperable and re-usable. And this will instantly allow anyone to check e.g. the structures above. You would not need to have Derek’s vast experience and instinct (although having it also helps). And of course we might presume that there were 2-3 referees that also looked at the article, and presumably none of them requested FAIR data.

Oh, if you are interested in my take on FAIR data, I gave a talk about that at FORCE2015, which you are welcome to view; I hope it constitutes a FAIR talk!

References

K.J. Kohlhoff, D. Shukla, M. Lawrenz, G.R. Bowman, D.E. Konerding, D. Belov, R.B. Altman, and V.S. Pande, "Cloud-based simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways", Nature Chemistry, vol. 6, pp. 15-21, 2013. https://doi.org/10.1038/nchem.1821


Data discoverability

Wednesday, December 17th, 2014

I have written earlier about the Amsterdam Manifesto. That arose out of a conference on the theme of “beyond the PDF“, with one simple question at its heart: what can be done to liberate data from containers it was not designed to be in? The latest meeting on this topic will happen in January 2015 as FORCE2015.

The format is suitably modern, starting with a Hackathon, and then two days of talks, posters and demos. We will be presenting both a talk and a demo. In the spirit of emancipated data, we have placed the latter into a container that is most certainly not a PDF. That demo has been archived, and there assigned a DOI[1] and for good measure transcluded into this post in its entirety. We hope this demonstrates that such “containers” can be usefully moved around to where they might be needed. I should say that the core of this demo is not just the data, but the metadata associated with it. Metadata renders that data discoverable (mineable) and its usage measurable.^†

I hope to report here on anything interesting happening at the FORCE2015 event.

^‡The format of this blog is a tiny bit too narrow for the demo to fit comfortably. Go see it here[1] and “enlarge” the view for a better experience.

^†Full details of this are in preparation.

References

H.S. Rzepa, N. Mason, A. Mclean, and M. Harvey, "Interoperability for Data Repositories. Machine Methods for Retrieving Data for Display or Mining Utilising Persistent (data-DOI) Identifiers", 2014. https://doi.org/10.6084/m9.figshare.1266197


Blasts from the past. A personal Web presence: 1993-1996.

Saturday, November 1st, 2014

Egon Willighagen recently gave a presentation at the RSC entitled “The Web – what is the issue” where he laments how little uptake of web technologies as a “channel for communication of scientific knowledge and data” there is in chemistry after twenty years or more. It caused me to ponder what we were doing with the web twenty years ago. Our HTTP server started in August 1993, and to my knowledge very little content there has been deleted (it’s mostly now just hidden). So here are some ancient pages which whilst certainly not examples of how it should be done nowadays, give an interesting historical perspective. In truth, there is not much stuff that is older out there!

This page was written in May 1994 as a journal article, although it did have to be then converted into a Word document to actually be submitted.[1] Because it introduced hyperlinks to a chemical audience, we wanted to illustrate these in the article itself! Hence permission was obtained from the RSC for an HTML version to be “self-archived” on our own servers where the hyperlinks were supposed to work (an early example of Open Access publishing!). I say supposed because quite a few of them have now “decayed”. We were aware of course that this might happen, but back in 1994, no-one knew how quickly this would happen. What is interesting is that the HTML itself (written by hand then) has survived pretty well! I will leave you to decide how much the message itself has decayed.
This HTML actually predates the above; it was written around November 1993 and represented the very first lecture notes I converted into this form (on the topic of NMR spectroscopy). A noteworthy aspect is the scarce use of colour images. At the start of 1994, the bandwidth available on our campus was pretty limited (the switches were 10 Mbps only) and a request went out to reduce the bit-depth of any colour images to 4-bits to help conserve that bandwidth! I rather doubt anyone took much notice however, and the policy was forgotten just a few months later.
In 1996, I had two visitors to the group, Guillaume Cottenceau, a French undergraduate student, and Darek Bogdal, a Polish researcher who wanted to learn some HTML. Together they produced this, which was an interactive tutorial to accompany the NMR lecture notes previously mentioned. These pages introduce the Java applet (yes, it was very new in 1996), which Guillaume had written and which Darek then made use of. And hey, what do you know, the applet still works (although you might have to coerce your browser into accepting an unsigned applet).
Here is a programming course that I had been running with Bryan Levitt for a few years, now recast into HTML web pages some time in 1994-5. This particular project I still hold dear, since it expanded upon the NMR lectures by getting the students to synthesize a FID (free induction decay) using the program they wrote, and then perform a Fourier Transform on it. I even encouraged students to present their results in HTML (I cannot now remember how many did). This link is to the computing facilities we offered students in 1994 for this project, ah those were the times! In 1996, the programming course was replaced by one on chemical information technologies, and here students were most certainly expected to write HTML. Some of the best examples are still available. And to illustrate how things happen in cycles, that course itself is now gone to be replaced by, yes, a programming course (but using Python, and not the original Fortran).
In tracking down the materials for the programming course described above, I re-discovered something far older. It is linked here and is (some of) the Fortran source code I wrote as a PhD student in ~~1974~~ 1972. So I will indulge in a short digression. My Ph.D. involved measuring rate constants, and the accepted method for analysing the raw kinetic data was using graph paper. For first order rate behaviour, this required one to measure a value at time=∞, which is supposed to be measured after ten half-lives. I was too impatient to wait that long, and worked out that a non-linear least squares analysis did not require the time=∞ value; indeed this value could be predicted accurately from the earlier measurements. So in 1974, I wrote this code to do this; no graph paper for me! Also for good measure is a least squares analysis of the Eyring equation. And you get proper standard deviations for your errors. In retrospect I should have commercialised this work, but in 1974, almost no-one paid money for software! What a change since then. I must try recompiling this code to see if it still works! And for good measure, here is a Huckel MO program I wrote in 1984 or earlier (I did compile this recently and found it works) and here is a little program for visualising atomic orbitals.
In January 1994, I was asked to create a web page for the WATOC organisation. This certainly predated the web sites for e.g. the RSC, the ACS, indeed famous sites such as the BBC and Tesco (a large supermarket chain) which only started up in mid 1994. The WATOC site itself moved a few years ago.
This is one of those wonderfully naive things I started in 1994, and which did not last long (in my hands). Nowadays, the concept lives on as MOOCs. Note again the almost complete expiry of the hyperlinks.
This is a project we also started in 1994, Virtual reality[2],[3]. The idea was that if HTML was text-markup, VRML was going to be 3D markup. VRML itself never quite caught on, but it is having a new life as a 3D printing language!
And by 1995, I felt confident enough in my ability to (edit) HTML that we started a virtual conference in organic chemistry (we did four of them in the end). I remember the first one involved contributors sending me a Word version of their poster, and I did all the work in converting it into HTML. Such virtual conferences still run, but in truth most participants still prefer to travel long distances to go drink a beer with their chums, rather than hack HTML.

I am going to stop now, since this is far too much wallowing in the past. But at least all this stuff is not (yet) lost to posterity.

References

H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, p. 1907, 1994. https://doi.org/10.1039/c39940001907
O. Casher and H.S. Rzepa, "Chemical collaboratories using World-Wide Web servers and EyeChem-based viewers", Journal of Molecular Graphics, vol. 13, pp. 268-270, 1995. https://doi.org/10.1016/0263-7855(95)00053-4
O. Casher, C. Leach, C.S. Page, and H.S. Rzepa, "Advanced VRML based chemistry applications: a 3D molecular hyperglossary", Journal of Molecular Structure: THEOCHEM, vol. 368, pp. 49-55, 1996. https://doi.org/10.1016/s0166-1280(96)90535-7

Tags:3D printing language, ACS, BBC, Bryan Levitt, chemical audience, chemical information technologies, Darek Bogdal, Fortran, Guillaume Cottenceau, HTML, http, Java, large supermarket chain, personal Web presence, Python, researcher, spectroscopy, Tesco, Virtual reality, WATOC, web technologies
Posted in Chemical IT, Historical | No Comments »

More simple experiments with crystal data. The pyramidalisation of nitrogen.

Saturday, November 1st, 2014

We are approaching 1 million recorded crystal structures (actually, around 716,000 in the CCDC and just over 300,000 in COD). One delight with having this wealth of information is the simple little explorations that can take just a minute or so to do. This one was sparked by my helping a colleague update a set of interactive lecture demos dealing with stereochemistry. Three of the examples included molecules where chirality originates in stereogenic centres with just three attached groups. An example might be a sulfoxide, for which the priority rule is to assign the lone pair an atomic number of zero. The issue then arises as to whether this centre is configurationally stable, i.e. does it invert in an umbrella motion slowly or quickly. My initial intention was to see if crystal structures could cast any light at all on this aspect.

The central atom has three bonded carbon atoms, of which either all three must themselves have four attached atoms, or one can have just three attached atoms as shown above. The three bonds attached to the central atom must be acyclic, with R ≤ 0.1, no disorder and no errors.

Using the search definition above for R₃N one gets the result below. It shows a hot spot for an angle subtended at the nitrogen of ~111°, indicating a pyramidal nitrogen. But how easily is that perturbed? (This is almost like asking how easily it can invert its configuration.)

A perturbation can be applied by changing just one of the attached carbons to have three attached atoms of its own (sp² hybridised). The response is that the hot spot moves to 120° (below). Of course now this includes compounds such as amides and the like. But we have learnt that it takes just one such attached sp² hybridised carbon to planarize an adjacent nitrogen.

The control experiment will now be to apply the same test to a P. The hot spot moves from ~99° (P with three sp³ carbons attached) to ~103° (P with two sp³ and one sp²). This reminds us that the overlap and energy-match between a p-orbital on carbon and an adjacent p-orbital on nitrogen is good, whereas the same overlap/energy match to a p-orbital on P is significantly less so.

One gets the same result when the central atom is S; the hotspot moves from ~102° to ~105°. Unfortunately, not enough tri-substituted oxygen compounds are known to see how this element responds.
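For anyone wanting to reproduce the angle subtended at the central atom from a set of coordinates, here is a minimal Python sketch. It is not part of the database search itself; the function names are hypothetical, and averaging the three angles is simply one assumed way of summarising a single centre.

```python
import math


def bond_angle(center, a, b):
    """Angle a-center-b in degrees from Cartesian coordinates."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))


def mean_angle_at_center(center, substituents):
    """Mean of the three angles subtended at a three-coordinate centre."""
    a, b, c = substituents
    angles = [
        bond_angle(center, a, b),
        bond_angle(center, a, c),
        bond_angle(center, b, c),
    ]
    return sum(angles) / 3
```

A perfectly planar centre gives a mean of 120°, and an ideal tetrahedral arrangement gives about 109.5°, so the hot spots quoted above (~111° for N with three sp³ carbons, ~99° for P) can be read directly as degrees of pyramidalisation.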

My point in illustrating these statistics is to show how much text-book chemistry can be recovered simply by a few quick explorations of crystal structures. One could even argue that much introductory chemistry could be taught by reference to the statistics of such structures.

Tags:energy, overlap/energy match, search definition
Posted in Chemical IT, crystal_structure_mining | No Comments »
