Henry Rzepa's blog

500 chemical twists: a (chalk and cheese) comparison of the impacts of blog posts and journal articles.

June 3rd, 2016

The title might give it away; this is my 500th blog post, the first having come some seven years ago. Very little online activity nowadays is excluded from measurement and so it is no surprise that this blog and another of my “other” scholarly endeavours, viz publishing in traditional journals, attract such “metrics” or statistics. The h-index is a well-known but somewhat controversial measure of the impact of journal articles; here I thought I might instead take a look at three less familiar ones – one relating to blogging, one specific to journal publishing and one to research data.

First, an update on the accumulated outreach of this blog over this seven-year period. The total number of country domains measured is 190. The African continent still has quite a few areas with zero hits (as does Svalbard, with a population of only 2600 for a land area of 61,000 km², or 23 km² per person). Given the low blog readership density on the African continent, it would be interesting to find out whether journal readership is any better.

Next, I look at the temporal distribution for individual posts. The first has attracted the highest total; in five years it has had 19,262 views (the diagram below shows the number of views per day). Four others exceed 10,000 and 80 exceed 1000 views.

Of these five, the next is the oldest, going back to 2009. I was very surprised to find such longevity, with the number of views increasing rather than decreasing with the passage of time.

So time now to compare these statistics with the journals. And of course it’s chalk and cheese. A “view” for a post means someone (or something) accessing the post URL, which is then recorded in the server log. Resolving the URL does at least load the entire content of the post; whether it’s read or not is of course not recorded. Importantly, if you want to view the content at some later stage, a new “view” has to be made (although some browsers do save a web page and allow offline viewing at a later stage, but I suspect this usage is low). With electronic journal access, it’s rather different. Access to an article is now predominantly via two mechanisms:

From the table of contents (this is somewhat analogous to browsing a blog)
From the article DOI.

Statistics for these two methods are gathered differently. The new CrossRef resource chronograph.labs.crossref.org (CrossRef allocate all journal DOIs) can be used to measure what they call DOI “resolutions”. A DOI resolution however leads one only to what is called the “landing page”, where the interested reader can view the title, the graphical abstract and some other metadata. It does not mean of course that they go on to actually view the article (as HTML, equivalent to the blog above, or probably more often by downloading a PDF file). Here are a few results using this method:

chronograph.labs.crossref.org/dois/10.1021/ja710438j tracks this article[1] which I selected (in part) because it was published in 2008, just slightly before the oldest post above. In fact, the resolutions log only goes back to October 2010, by which time the initial flush of any interest in this article would have subsided and so it’s nice to see continuing interest (= impact?).
chronograph.labs.crossref.org/dois/10.1002/anie.201409672 [2] totals 208 resolutions, but as the graph below shows, 188 of these were on the first day of publication (Nov 19, 2014), then a few days gap and then about a month of daily resolutions, followed by occasional interest since then.
chronograph.labs.crossref.org/dois/10.1126/science.1181771 dates from 2010[3] and this time shows no peak on the first day, but again steady continuing interest to a current 245 resolutions.
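
The first-day spike described for the Angewandte article can be expressed as a share of the total. This is a minimal sketch using only the two counts quoted above; the function name is illustrative and nothing here queries the Chronograph service itself.

```python
def first_day_share(first_day: int, total: int) -> float:
    """Fraction of all DOI resolutions recorded on the day of publication."""
    if total <= 0:
        raise ValueError("total must be positive")
    return first_day / total

# Counts quoted in the text for 10.1002/anie.201409672
FIRST_DAY, TOTAL = 188, 208

share = first_day_share(FIRST_DAY, TOTAL)
later = TOTAL - FIRST_DAY
print(f"{share:.1%} of resolutions on day one, {later} spread over the following period")
```

On these numbers roughly nine in ten resolutions arrived on the first day, which is why the long tail rather than the total is the more interesting signal.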

What about the other main journal article access method, not via a DOI but from a journal's table of contents page? A Google search revealed this site: jusp.mimas.ac.uk (JUSP stands for Journal usage statistics portal, which sounded promising). This site collects “COUNTER compliant usage data”. COUNTER (Counting Online Usage of Networked Electronic Resources) is an initiative supported by many journal publishers and it sounds an interesting way of measuring “usage” (as opposed to “views” or “resolutions”; it’s that chalk and cheese again!). I would love to be able to show you some statistics using this resource, but the “small print” caught me out: “JUSP gives librarians a simple way of analysing the value and impact of their electronic journals”. Put simply, I am a researcher, not a librarian. As a researcher I do not have direct access; JUSP is a closed, restricted access (albeit taxpayer-funded) resource. I am discussing this with our head of information resources (who is a librarian) and hope to report back here on the outcome.

Finally research data. This is almost too new to be able to measure, but this resource stats.datacite.org is starting to collect statistics on data resolutions (similar to DOI resolutions).

You can see from the below for Imperial College (in fact this represents the two data repositories that we operate and which I cite extensively on this blog) that resolutions are running at up to about 200 a month per dataset (more typically ~25 a month), with a total of 5065 resolutions for all items in March 2016 (the blog has ~12,000 views per month).
Figshare is another data repository we have made use of:

So to the summary.

Firstly, we see that I have shown three forms of impact, views, resolutions and usage. If one had statistics on all three, one might then try to see if they are correlated in any way. Even then, normalisation might be a challenge.
Over ~7 years, five posts on this blog have attracted >10,000 views.
Many of the blog posts have a long “finish” (to use a wine tasting term); the views continue regularly and often increase over time.
My analysis of the three journal articles above (and about 15 others) shows that between 50 and 300 resolutions over a few years is fairly typical (for this researcher at least; I am sure most better known researchers attract far, far more).
The temporal distributions for article resolutions and blog views show that both can have continuing impact over an extended period. None of the 18 articles I looked at shows a significantly increasing impact with time, but many of the blog posts do. This tends to suggest that the audiences for each are quite different; researchers for articles and a fair proportion of inquisitive students for the blog?
One might speculate that a correlation between my article resolutions and my h-index could be found, but the article resolution has a fine-grained temporal resolution (allowing a derivative wrt time to be obtained) that is potentially more valuable than just the coarse h-index integration (an article can of course be cited for both positive and negative reasons!).
Initial analysis for data shows resolutions running at a similar rate to article resolutions. It is not yet possible to correlate data resolutions with article resolutions in which that data is discussed.
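
The contrast drawn above between a coarse integrated measure (the h-index) and a time derivative of resolutions can be sketched as follows. All numbers here are hypothetical and chosen purely for illustration; they are not my citation or resolution counts.

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def resolution_rate(cumulative):
    """Finite-difference derivative of a cumulative resolution series
    (resolutions per sampling interval, e.g. per month)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical data, for illustration only
citations = [25, 8, 5, 3, 3, 1]
cumulative_resolutions = [0, 40, 55, 63, 70, 74]

h = h_index(citations)
rate = resolution_rate(cumulative_resolutions)
print("h-index:", h)
print("resolutions per interval:", rate)
```

The h-index collapses the whole record into one integer, whereas the rate series shows whether interest in a single article is decaying, steady or growing.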

References

S.M. Rappaport, and H.S. Rzepa, "Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes", Journal of the American Chemical Society, vol. 130, pp. 7613-7619, 2008. https://doi.org/10.1021/ja710438j
A.E. Aliev, J.R.T. Arendorf, I. Pavlakos, R.B. Moreno, M.J. Porter, H.S. Rzepa, and W.B. Motherwell, "Surfing π Clouds for Noncovalent Interactions: Arenes versus Alkenes", Angewandte Chemie International Edition, vol. 54, pp. 551-555, 2014. https://doi.org/10.1002/anie.201409672
K. Abersfelder, A.J.P. White, H.S. Rzepa, and D. Scheschkewitz, "A Tricyclic Aromatic Isomer of Hexasilabenzene", Science, vol. 327, pp. 564-566, 2010. https://doi.org/10.1126/science.1181771


The geometries of 5-coordinate compounds of group 14 elements.

May 30th, 2016

This is a follow-up to one aspect of the previous two posts dealing with nucleophilic substitution reactions at silicon. Here I look at the geometries of 5-coordinate compounds containing as a central atom 4A = Si, Ge, Sn, Pb and of the specific formula C₃4AO₂ with a trigonal bipyramidal geometry. This search arose because of a casual comment I made in the earlier post regarding possible cooperative effects between the two axial ligands (the ones with an angle of ~180 degrees subtended at silicon). Perhaps the geometries might expand upon this comment?

The search query shown above results in 394 hits (May 2016); the three variables in the query are plotted below, with the O-4A-O angle indicated by colour (red ~180°, blue ~90° and green ~120°).

The cluster at distances of 4A-O of ~1.9Å represents silicon compounds, and tends to suggest that the pair of distances 4A-O are quite similar in value. The angles correspond to a di-axial arrangement around the silicon. In this scenario, one might imagine a stereoelectronic effect similar to the anomeric effect when 4A = C operates and which has the potential to strengthen both di-axial oxygens.
The bulk of the points come at higher 4A-O distances of > 2.1Å and consist mostly of 4A = Sn. There are two clear-cut distributions, one for angles of ~180° and a separate one for angles of ~90°, and both are qualitatively different from the Si distribution. The 180° set corresponds to a di-axial arrangement for the oxygens, whereas the 90° set suggests an axial-equatorial geometry. Both distributions have prominent tails which reveal that as one 4A-O distance shortens, the other lengthens, equivalent to asymmetric anomeric effects at O-C-O.
Noticeably absent are any green points; these would correspond to bond angles of ~120° and hence would correspond to di-equatorial ligands.
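
The colour coding used in the plot amounts to assigning each O-4A-O angle to the nearest ideal trigonal bipyramidal arrangement. A minimal sketch of that assignment is shown here; the function name and the 15° tolerance are arbitrary illustrative choices, not part of the crystal structure search itself.

```python
def classify_o_m_o_angle(angle_deg: float, tolerance: float = 15.0) -> str:
    """Map an O-4A-O angle (degrees) to the ligand arrangement it suggests.

    The tolerance is an assumed cut-off for illustration only.
    """
    ideal = {
        "di-axial": 180.0,          # red points in the plot
        "axial-equatorial": 90.0,   # blue points
        "di-equatorial": 120.0,     # green points (absent in the search)
    }
    # Pick the ideal angle closest to the observed one
    label, target = min(ideal.items(), key=lambda item: abs(angle_deg - item[1]))
    return label if abs(angle_deg - target) <= tolerance else "unassigned"

for angle in (176.0, 88.0, 121.0, 150.0):
    print(angle, classify_o_m_o_angle(angle))
```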

This quick exploration (with potential variations that I have not explored above) can be added to the collection of “ten minute explorations” I have described elsewhere.[1]

References

H.S. Rzepa, "Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases", Journal of Chemical Education, vol. 93, pp. 550-554, 2015. https://doi.org/10.1021/acs.jchemed.5b00346


An alternative mechanism for nucleophilic substitution at silicon using a tetra-alkyl ammonium fluoride.

May 27th, 2016

In the previous post, I explored the mechanism for nucleophilic substitution at a silicon centre proceeding via retention of configuration involving a Berry-like pseudorotation. Here I probe an alternative route involving inversion of configuration at the Si centre. Both stereochemical modes are known to occur, depending on the leaving group, solvent and other factors.[1],[2],[3]

This alternative involves attack by F^– along the axial trajectory of the trigonal bipyramidal Si centre, with the OR group occupying the other axial position (TS1). In order to prepare the OR group for elimination with inversion of stereochemistry, the ion-pair complex has to reorganise via TS2 (a process replacing the Berry pseudorotation previously found necessary for stereochemical retention). And finally the OR is eliminated in TS3. The energetics of this pathway (ωB97XD/6-31+G(d) or Def2-TZVPPD/SCRF=thf) are shown below, with the inversion pathway coming out lower in energy than the previously reported retention pathway.

System	Relative free energy	DataDOI
Inversion mechanism
Reactants	0.0	[4]
TS1	4.9 (4.1)*	[5]
TS2	3.1	[6]
TS3	0.0 (-0.8)*	[7]
Retention mechanism
TS1	7.9 (8.3)*	[8]
TS2	9.2 (8.7)*	[9]
TS3	5.2 (4.9)*	[10]

* Values in parentheses are computed for the Def2-TZVPP basis set.
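
To get a feel for what the difference between the two pathways means kinetically, the highest transition state on each can be compared using simple transition state theory. This is only a sketch under stated assumptions: the tabulated values are taken to be in kcal/mol, the temperature is assumed to be 298.15 K, and the highest-lying transition state relative to reactants is assumed to control the rate of each pathway.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def rate_ratio(dg_high: float, dg_low: float, temperature: float = 298.15) -> float:
    """Ratio k(low barrier)/k(high barrier) from exp(ddG/RT); barriers in kcal/mol."""
    return math.exp((dg_high - dg_low) / (R_KCAL * temperature))

# 6-31+G(d) values from the table above (assumed kcal/mol)
inversion = {"TS1": 4.9, "TS2": 3.1, "TS3": 0.0}
retention = {"TS1": 7.9, "TS2": 9.2, "TS3": 5.2}

ddg = max(retention.values()) - max(inversion.values())
ratio = rate_ratio(max(retention.values()), max(inversion.values()))
print(f"ddG = {ddg:.1f} kcal/mol, inversion favoured by a factor of ~{ratio:.0f}")
```

On these assumptions a difference of a few kcal/mol translates into a preference of roughly three orders of magnitude for inversion.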

The key new finding for the inversion mechanism is the ion-pair isomerisation (TS2), which is animated below. Transition states which involve no rearrangement at a bond (either formation/cleavage or rotation) are quite rare, and it is nice to show one here.

So the nucleophilic displacement reaction at 4-substituted silicon centres is really quite different from carbon. Two distinct associative/elimination mechanisms proceeding through 5-coordinate silicon seem possible. For the specific case of tetra-alkyl ammonium fluoride as nucleophile and an enolate anion as the leaving group, it appears that an inversion mechanism is favoured, and one gets strong indications of this from crystal structures of such 5-coordinate species. It might be nice to repeat this study with a reaction which is known to strongly favour retention of configuration.

References

L. Wozniak, M. Cypryk, J. Chojnowski, and G. Lanneau, "Optically active silyl esters of phosphorus. II. Stereochemistry of reactions with nucleophiles", Tetrahedron, vol. 45, pp. 4403-4414, 1989. https://doi.org/10.1016/s0040-4020(01)89077-3
L.H. Sommer, and H. Fujimoto, "Stereochemistry of asymmetric silicon. X. Solvent and reagent effects on stereochemistry crossover in alkoxy-alkoxy exchange reactions at silicon centers", Journal of the American Chemical Society, vol. 90, pp. 982-987, 1968. https://doi.org/10.1021/ja01006a024
D.N. Roark, and L.H. Sommer, "Dramatic stereochemistry crossover to retention of configuration with angle-strained asymmetric silicon", Journal of the American Chemical Society, vol. 95, pp. 969-971, 1973. https://doi.org/10.1021/ja00784a081
H. Rzepa, "enol + Me4N(+).F(-) Reactant", 2016. https://doi.org/10.14469/hpc/565
H. Rzepa, "Di-axial elimination of F", 2016. https://doi.org/10.14469/hpc/570
H.S. Rzepa, "C 9 H 24 F 1 N 1 O 1 Si 1", 2016. https://doi.org/10.14469/ch/195052
H. Rzepa, "Di-axial elimination of O TS", 2016. https://doi.org/10.14469/hpc/567
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate F axial TS", 2016. https://doi.org/10.14469/hpc/554
H. Rzepa, "5-coordinate intermediate Berry pseudorotation TS2 New conf?", 2016. https://doi.org/10.14469/hpc/577
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) TS", 2016. https://doi.org/10.14469/hpc/539


The mechanism of silylether deprotection using a tetra-alkyl ammonium fluoride.

May 25th, 2016

The substitution of a nucleofuge (a good leaving group) by a nucleophile at a carbon centre occurs with inversion of configuration at the carbon, the mechanism being known by the term S_N2 (a story I have also told in this post). Such displacement at silicon famously proceeds by a quite different mechanism, which I here quantify with some calculations.

Trialkylsilyl is often used to protect OH groups, and as shown in the diagram above is specifically used to enforce the enol form of a ketone by replacing the OH with OTMS. The TMS can then be removed when required by utilising nucleophilic addition of e.g. fluoride anion from tetra-alkyl ammonium fluoride to form a 5-coordinate silicon intermediate, followed by collapse of this intermediate with expulsion of the oxygen to form an enolate anion. Before starting the calculations, I searched the crystal structure database for examples of R₃SiF(OR), as in the search query below.

There were 55 instances of such species, and shown below are their geometric characteristics. In all cases, the two electronegative substituents occupy the axial positions of a trigonal bipyramidal geometry. This of course is the orientation adopted by the two electronegative substituents in the S_N2 mechanism for carbon, but with silicon this carbon "transition state" can be replaced by a stable (and as we see often crystalline) intermediate!

Turning to calculations (ωB97XD/6-31+G(d)/SCRF=thf), one can locate three transition states for the silicon process (there is only one for the S_N2 reaction with carbon).

TS1 represents attack of fluoride anion along the axial position of the forming 5-coordinate silicon.[1],[2] The oxygen in this instance occupies an equatorial position, and this close proximity between the incoming F(-) and the about to depart OR groups represents a retention of configuration at the Si. Note that the reaction is endo-energic. (c.f. [3]).
The next step, TS2[4],[5] is to move the F ligand to an equatorial position and the OR group from equatorial to its own axial position so that it can depart in the manner the F adopted to arrive. This requires what is called a Berry pseudorotation, an essentially isoenergic process.

You might note a "hidden intermediate" at IRC ~-7 (the "bump" in the energy profile). This is caused by re-organisation of the ion-pair geometry, with the tetra-alkyl ammonium cation moving its orientation.
TS3[6],[7] now eliminates the ^–OR group to complete the deprotection.

The free energies are summarised below. Key points include:

The overall free energy of deprotection is appropriately exo-energic.
The highest energy barrier is actually for pseudo-rotation! This suggests that tuning the deprotection with alternative alkyl or aryl groups on the silicon may be a matter of controlling the Berry pseudorotation process.
TS1-3 proceed with the attacking and leaving groups in close proximity (the angle between an axial and an equatorial group is ~90° of course, whereas for a di-axial relationship (the inversion of the S_N2 mechanism) it is instead 180°). This close proximity of nucleophile and nucleofuge minimises the required reorganisation of the ammonium counter-ion in the ion-pairs, and possibly also the dipole moments induced by the reactions, the changes of which for the three reactions are shown below:
The 5-coordinate intermediate where both F and O are axial is in fact significantly lower in energy (a cooperative effect) than when only one of them is axial, which matches the orientations identified above in the 55 crystal structures. For a substitution to occur, the cooperative strengthening of the Si-O and Si-F bonds must be removed; hence the retention of configuration.

System	Relative free energy	DataDOI
Reactants	0.0	[8]
TS1	7.9	[1]
Int F(ax), O(eq)	5.1	[9]
TS2	10.2 (9.2)*	[4]
Int F(eq), O(ax)	5.1	[10]
TS3	5.2	[6]
Products	-4.0	[11]
Int F,O(ax)	-2.5	[12]

*A lower energy orientation of the ion-pair has subsequently been found.[13]
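
The table can also be read programmatically to pick out the highest-lying transition state and convert it into an approximate Eyring rate constant, k = (k_B T/h) exp(-ΔG‡/RT). This is a rough sketch only: the values are assumed to be in kcal/mol relative to reactants, the temperature is assumed to be 298.15 K, and a transmission coefficient of one is assumed.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol K)

def eyring_rate(dg_kcal: float, temperature: float = 298.15) -> float:
    """First-order Eyring rate constant (s^-1) for a free energy barrier in kcal/mol."""
    return (K_B * temperature / H) * math.exp(-dg_kcal / (R_KCAL * temperature))

# Transition states from the table above (assumed kcal/mol, relative to reactants)
transition_states = {"TS1": 7.9, "TS2": 10.2, "TS3": 5.2}

highest = max(transition_states, key=transition_states.get)
k = eyring_rate(transition_states[highest])
print(f"highest point: {highest} at {transition_states[highest]} kcal/mol, k ~ {k:.1e} s^-1")
```

Even the highest point on this profile corresponds to a fast process at room temperature, consistent with the deprotection being facile.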

This analysis shows just how different the carbon and the silicon substitution reactions are and how it is the pseudorotation interconverting two 5-coordinate intermediates that appears to be a key step. But questions remain unanswered. What is the energy of the pseudorotation interconverting an intermediate with ax/eq electronegative groups to one with di-axial electronegative groups? Are there transition states starting from the diaxial intermediate and resulting in elimination, and if so what are their relative energies? I leave answers to a follow up post.

References

H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate F axial TS", 2016. https://doi.org/10.14469/hpc/554
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate F axial TS IRC", 2016. https://doi.org/10.14469/hpc/564
L. Wozniak, M. Cypryk, J. Chojnowski, and G. Lanneau, "Optically active silyl esters of phosphorus. II. Stereochemistry of reactions with nucleophiles", Tetrahedron, vol. 45, pp. 4403-4414, 1989. https://doi.org/10.1016/s0040-4020(01)89077-3
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate Berry pseudorotation TS", 2016. https://doi.org/10.14469/hpc/551
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate Berry pseudorotation TS IRC", 2016. https://doi.org/10.14469/hpc/553
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) TS", 2016. https://doi.org/10.14469/hpc/539
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) TS IRC", 2016. https://doi.org/10.14469/hpc/552
H. Rzepa, "enol + Me4N(+).F(-) Reactant", 2016. https://doi.org/10.14469/hpc/565
H. Rzepa, "enol + Me4N(+).F(-) 5-coordinate intermediate F axial", 2016. https://doi.org/10.14469/hpc/555
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate", 2016. https://doi.org/10.14469/hpc/540
H. Rzepa, "enol + Me4N(+).F(-) Product", 2016. https://doi.org/10.14469/hpc/563
H. Rzepa, "trimethyl silyl enol + Me4N(+).F(-) 5-coordinate intermediate F/O axial", 2016. https://doi.org/10.14469/hpc/550
H. Rzepa, "5-coordinate intermediate Berry pseudorotation TS2 New conf?", 2016. https://doi.org/10.14469/hpc/577


Data-free research data management? Not an oxymoron.

May 24th, 2016

I occasionally post about "RDM" (research data management), an activity that has recently become a formalised essential part of the research processes. I say recently formalised, since researchers have of course kept research notebooks recording their activities and their data since the dawn of science, but not always in an open and transparent manner. The desirability of doing so was revealed by the 2009 "Climategate" events. In the UK, Climategate was apparently the catalyst which persuaded the funding councils (such as the EPSRC, the Royal Society, etc) to formulate policies which required all their funded researchers to adopt the principles of RDM by May 2015 and in their future researches. An early career researcher here, anxious to conform to the funding body instructions, sent me an email a few days ago asking about one aspect of RDM which got me thinking.

The question related to the divide between data as a separate research object (and which therefore has to be managed), and data as an inseparable part of the article narrative, which is of course ostensibly managed by the journal publication processes. Such data may often be the description of a process rather than simply tables of numbers or graphs. In chemistry it may include chemical names and chemical terms as part of an experimental procedure. For one nice illustration of such embedded data, go look at the chemical tagger page. Here the data is blending with the semantics, and the two are not easily separated. So, when such separation is not easily achieved, should the specific processes required by RDM as illustrated in the five bullet points below actually be followed?

1. Specify a data management plan to be followed, as for example points 2-5 below.
2. Decide upon a location for your data, separated into one for "live" or working data (the purpose simply being to ensure it is properly backed up) and the other for a sub-set of formally "published data" which has to be available for at least ten years after its publication.
3. Use 2 to gather metadata (see 6-14 below) and in return get a DOI representing the location of the combined metadata + data, from a suitable registration authority such as DataCite.
4. Quote this DOI(s) in the article describing the results of analysing the data and presenting hypotheses, and conversely once the article itself is allocated its own DOI from a registration authority such as CrossRef, update the metadata in item 3 so as to achieve a bidirectional link between the data and its narrative (and we assume that DataCite and CrossRef will also increasingly exchange the metadata they each hold about the items).
5. Add both the data and the article DOIs to any institutional CRIS or current research information system (parenthetically, I regard this last stipulation as rather redundant if items 3 and 4 are working effectively, but it's a good interim measure whilst the overall system matures).

So, should step 2 be included if the data itself is inextricably intertwined with the narrative and cannot be separated? The slightly surprising advice I would suggest is yes! The reason is that it IS possible to generate metadata (data about the, possibly entwined, data) which CAN be processed in such a step. What forms would such metadata take?

6. Identification of the researcher(s) involved. This would nowadays take the form of an ORCID (Open Researcher and Contributor ID).
7. Identification of the hosting institution where the data has been produced. There is currently no equivalent to an ORCID for institutions, but it is very likely to come in the future.
8. A date stamp formalising when the (meta)data is actually deposited.
9. A title for the project being described. Here we see a blurring between the narrative/article and the data; a title is the shortest possible description of the narrative/article, and it may also apply to the data object(s) or it could have its own title.
10. A slightly fuller abstract of the project being described. Here we see further blurring between the narrative and data objects.
11. One can include "related identifiers", in particular the DOIs of any other relevant articles that might have been published which may expand the context of the data, and also the DOIs of any other relevant datasets which may have been allocated in step 2 above.
12. It is also beneficial to include "chemical identifiers". These can take the form of InChI strings and InChI keys, which allow discretely defined molecular objects which were the object of the research to be tracked and which relate to both the narrative and any other data objects.
13. If specific software has been used to analyse data, it too can be included as a "related identifier" (e.g. [1]).
14. Potentially at least, if a well-defined instrument has been involved, it too could be included with its own "related identifier". With both 13-14, other issues may need addressing, such as versioning etc, but this no doubt will be sorted in due course.
15. etc.

So items such as 6-14 can be collected and sent to e.g. DataCite with a DOI received in return as part of item 2 of the RDM processes. No "pure" data need be involved, only metadata. Nonetheless such metadata can only increase the visibility and discoverability of the research, as illustrated in how such metadata can be searched for the components described above.
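
To make the idea of a metadata-only deposit concrete, here is a minimal sketch of how items such as 6-13 might be assembled before being sent to a registration agency. The property names are loosely modelled on those used in DataCite metadata, but the record is not validated against any schema, the function name is my own, and every value other than the software DOI from reference [1] is a placeholder.

```python
import json

def build_metadata_record(orcid, institution, deposited, title, abstract,
                          related_dois, inchi_keys):
    """Assemble a metadata-only record; no 'pure' data payload is attached."""
    return {
        "creators": [{                                   # item 6: researcher
            "name": "(researcher name)",
            "nameIdentifiers": [{"nameIdentifier": orcid,
                                 "nameIdentifierScheme": "ORCID"}],
        }],
        "publisher": institution,                        # item 7: hosting institution
        "dates": [{"date": deposited, "dateType": "Submitted"}],  # item 8
        "titles": [{"title": title}],                    # item 9
        "descriptions": [{"description": abstract,       # item 10
                          "descriptionType": "Abstract"}],
        "relatedIdentifiers": [                          # items 11 and 13
            {"relatedIdentifier": doi, "relatedIdentifierType": "DOI",
             "relationType": relation}
            for doi, relation in related_dois
        ],
        "subjects": [{"subject": key, "subjectScheme": "InChIKey"}  # item 12
                     for key in inchi_keys],
    }

record = build_metadata_record(
    orcid="0000-0000-0000-0000",                 # placeholder, not a real ORCID
    institution="(hosting institution)",         # placeholder
    deposited="2016-05-24",                      # placeholder date
    title="(project title)",                     # placeholder
    abstract="(short abstract of the project)",  # placeholder
    related_dois=[("10.5281/zenodo.19272", "References")],  # software, ref. [1]
    inchi_keys=["(InChIKey placeholder)"],
)
print(json.dumps(record, indent=2))
```

Note that nothing in the record is a table of numbers or a spectrum; it consists entirely of descriptions of, and links to, the research.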

References

H.S. Rzepa, "KINISOT. A basic program to calculate kinetic isotope effects using normal coordinate analysis of transition state and reactants.", 2015. https://doi.org/10.5281/zenodo.19272


What is the approach trajectory of enhanced (super?) nucleophiles towards a carbonyl group?

May 11th, 2016

I have previously commented on the Bürgi–Dunitz angle, this being the preferred approach trajectory of a nucleophile towards the electrophilic carbon of a carbonyl group. Some special types of nucleophile such as hydrazines (R₂N-NR₂) are supposed to have enhanced reactivity[1] due to what might be described as buttressing of adjacent lone pairs. Here I focus in on how this might manifest by performing searches of the Cambridge structural database for intermolecular (non-bonded) interactions between X-Y nucleophiles (X,Y= N,O,S) and carbonyl compounds OC(NM)₂.

The search query[2] is shown above and involves plotting the distance from the nucleophilic atom (N above) to the carbon of the carbonyl group. The carbon is defined as having 3-coordination, one of which is O=C and two non-metal attachments. The torsion is constrained to values of |70-110|° to ensure that the approach of the nucleophile is approximately perpendicular to the plane of the carbonyl in order to overlap with the π*-orbital as electrophile. The pairwise sums of van der Waals radii are NC, 3.25; OC, 3.22 and SC, 3.5Å and the plots show all contacts shorter than these. The results of the searches are shown below.
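
The two geometric filters described above (a contact shorter than the sum of the van der Waals radii, and a torsion of 70-110° in magnitude) can be sketched in a few lines. This is not how the database search itself is implemented; the function names and the coordinates are hypothetical, and the torsion is simply passed in as a number rather than computed.

```python
import math

# Pairwise van der Waals sums quoted in the text (angstroms), keyed by nucleophilic atom
VDW_SUM_WITH_C = {"N": 3.25, "O": 3.22, "S": 3.5}

def angle_deg(a, b, c):
    """Angle a-b-c in degrees for three Cartesian points."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cosine = dot / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def is_short_contact(element, nucleophile, carbon, torsion_deg):
    """True if Nu...C is below the vdW sum and |torsion| lies in the 70-110 degree window."""
    close = math.dist(nucleophile, carbon) < VDW_SUM_WITH_C[element]
    perpendicular = 70.0 <= abs(torsion_deg) <= 110.0
    return close and perpendicular

# Hypothetical coordinates (angstroms): carbonyl C at the origin, O along x
carbon, oxygen = (0.0, 0.0, 0.0), (1.22, 0.0, 0.0)
nitrogen = (-0.5, 0.0, 2.9)

accepted = is_short_contact("N", nitrogen, carbon, torsion_deg=90.0)
approach = angle_deg(nitrogen, carbon, oxygen)
print(accepted, round(approach, 1))
```

The N···C=O angle computed here is the trajectory angle that is plotted against the contact distance in the searches.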

The general observation is that the red hotspots do tend to come at trajectory angles of <100° and many are <90° such as the X=Y=N or X=Y=S examples. Given that the original Bürgi–Dunitz hypothesis (actually based on a small number of molecules synthesized for the purpose) proposed rather larger angles (105±5°) corresponding to optimum alignment of the nucleophile with the carbonyl π*-orbital, we might speculate whether the use of enhanced nucleophiles is the reason for the apparent decrease in the angle. And if so, what the underlying reasons would be.

I also cannot help but observe that the term supernucleophile is quite rare in the literature; SciFinder gives only 45 hits, but most are about neither hydrazines nor peroxides. There are also some unusual nucleophile varieties such as Cob(I)alamin[3], of which there are probably insufficient examples to reflect in the crystal structure statistics shown above. Given the interest in superbases, the relative lack of examples of unusual supernucleophiles seems surprising.

References

G. Klopman, K. Tsuda, J. Louis, and R. Davis, "Supernucleophiles—I", Tetrahedron, vol. 26, pp. 4549-4554, 1970. https://doi.org/10.1016/s0040-4020(01)93101-1
H. Rzepa, "Crystal structure search using enhanced nucleophiles", 2016. https://doi.org/10.14469/hpc/487
K.P. Jensen, "Electronic Structure of Cob(I)alamin: The Story of an Unusual Nucleophile", The Journal of Physical Chemistry B, vol. 109, pp. 10505-10512, 2005. https://doi.org/10.1021/jp050802m


Autoionization of hydrogen fluoride.

April 24th, 2016

The autoionization of water involves two molecules transferring a proton to give hydronium hydroxide, a process for which the free energy of reaction is well known. Here I ask what might happen with the next element along in the periodic table, F.

I have been unable to find much about the autoionization of HF in the literature; the pH of neat HF appears unreported (unlike that of H₂O, which of course is 7). Even the dielectric constant of liquid HF[1],[2] seems to vary widely, the largest reported being ~84. It is suggested that liquid HF is much less ordered than e.g. water, and this suggests that a single static model is unlikely to be entirely realistic. Nonetheless, I thought it might be informative to take the model I previously constructed for water and try applying it to HF. Here is part of the geometry optimisation cycle[3] from the original edited water model. I used ωB97XD/Def2-TZVPPD/SCRF=water for the model. Why continuum water as the solvation treatment? Well, standard parameters for liquid HF are not available (perhaps given the variation in dielectric) and since the upper bound might be similar to water, I decided to use that to see what I got. Clearly however an approximation.

The low energy final geometry corresponds to 10 HF molecules and lies about 16 kcal/mol lower (in total energy) than the cyclic structure containing H₂F⁺.F^– species connected by two (HF)₃ bridges and two further non-bridge HF molecules hydrogen bonding to the H₂F⁺ and the F^–. In fact the ionic structure turns out to be a transition state for proton shifting along the chain to create (HF)₁₀, with a free energy barrier of 9.2 kcal/mol above the neutral form.[4] This difference between ionic and non-ionic forms is considerably less than that for water as previously indicated. Note also how much shorter the hydrogen bonding H…F distances are in the HF cluster.

So unlike water, where hydronium hydroxide is a clear minimum in the potential with a small but distinct barrier (~3.5 kcal/mol[5]) to proton transfer, with HF at the same level of theory the barrier is zero, the ionic form being a transition state rather than a minimum. Perhaps the difference arises because hydronium hydroxide can support three stabilizing (H₂O)₃ bridges, whereas only two (HF)₃ bridges are possible with H₂F⁺.F^–. Higher levels of theory (or better/larger models of the HF cluster) might well give a barrier for the process, but this result does suggest that the dynamics of liquid HF may lead to quite different lifetimes for autoionized forms of HF compared to water. Liquid HF is clearly just as complicated a liquid as H₂O, and certainly much less is known about it.

References

R.H. Cole, "Dielectric constant and association in liquid HF", The Journal of Chemical Physics, vol. 59, pp. 1545-1546, 1973. https://doi.org/10.1063/1.1680219
P.H. Fries, and J. Richardi, "The solution of the Wertheim association theory for molecular liquids: Application to hydrogen fluoride", The Journal of Chemical Physics, vol. 113, pp. 9169-9179, 2000. https://doi.org/10.1063/1.1319172
H.S. Rzepa, "H 10 F 10", 2016. https://doi.org/10.14469/ch/192032
H.S. Rzepa, "H 10 F 10", 2016. https://doi.org/10.14469/ch/192034
H.S. Rzepa, "H22O11", 2016. https://doi.org/10.14469/ch/192022

Tags: dielectric, energy, Equilibrium chemistry, Fluorides, free energy, free energy barrier, Hydrogen bond, Hydronium, Inorganic solvents, Lithium fluoride, low energy final geometry corresponds, Oxides, PH, Properties of water, Self-ionization of water, Water, Water model
Posted in Interesting chemistry | 10 Comments »

Deuteronium deuteroxide. The why of pD 7.435.

April 22nd, 2016

Earlier, I constructed a possible model of hydronium hydroxide, or H₃O⁺.OH^–. One way of assessing the quality of the model is to calculate the free energy difference between it and two normal water molecules and compare the result to the measured difference. Here I apply a further test of the model using isotopes.

Pure water has pH 7, which means equal concentrations for both [H₃O⁺] and [OH^–] of 10⁻⁷ M. Converting this to a free energy one gets ΔG₂₉₈ = 19.088 kcal/mol. Now the pD of pure deuterium oxide is reported as 7.435, equivalent to ΔG₂₉₈ = 20.274 kcal/mol, an isotope effect on the free energy of ΔΔG₂₉₈ = 1.186 kcal/mol. How does the theoretical model (ωB97XD/Def2-TZVPPD/SCRF=water^‡) previously reported[1],[2] do? The value obtained is 1.215 kcal/mol,[3] an apparent error of only 0.029 kcal/mol. I am quite pleased with the close correspondence; at least the model is capable of reporting good isotope effects on the ionisation equilibrium of pure water!
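The conversion from pH (or pD) to a free energy is not spelled out above, so here is a minimal sketch of one way to reproduce the quoted numbers. It assumes ΔG = −RT ln([X₃O⁺][OX⁻]), with both concentrations taken as 10^(−pX) and T = 298 K; the function names and the choice of gas constant are mine, and the last decimal place depends on the values of R and T used.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
T_KELVIN = 298.0       # assumed temperature, matching the subscript in ΔG₂₉₈


def autoionization_free_energy(p_value: float, temperature: float = T_KELVIN) -> float:
    """Free energy (kcal/mol) for an ion product of 10**(-2 * p_value).

    Assumes equal cation and anion concentrations of 10**(-p_value) M,
    so that -ln(ion product) = 2 * p_value * ln(10).
    """
    return R_KCAL * temperature * math.log(10.0) * 2.0 * p_value


def p_value_from_free_energy(delta_g: float, temperature: float = T_KELVIN) -> float:
    """Inverse of autoionization_free_energy: recover pH/pD from ΔG."""
    return delta_g / (2.0 * R_KCAL * temperature * math.log(10.0))


dg_h2o = autoionization_free_energy(7.0)      # pure H2O, pH 7
dg_d2o = autoionization_free_energy(7.435)    # pure D2O, pD 7.435
ddg = dg_d2o - dg_h2o                         # isotope effect on ΔG

print(f"ΔG(H2O) = {dg_h2o:.2f} kcal/mol")
print(f"ΔG(D2O) = {dg_d2o:.2f} kcal/mol")
print(f"ΔΔG     = {ddg:.2f} kcal/mol")
```

Under the same assumptions, `p_value_from_free_energy` can be used to turn a calculated ΔG (for example the H₂O value plus a computed ΔΔG) back into a predicted pD or pT.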

Finally, with some confidence assured, one might apply this to tritonium tritoxide. Tritiated water is so radioactive it would boil in an instant, probably well before its pT could be measured. ΔΔG₂₉₈ is calculated as 1.798 kcal/mol. Will this estimate ever be challenged by experiment?

‡ It is assumed no isotope effect acts on the dielectric constant of water and hence the continuum model used here to model it. In fact the isotope effect on this property is modest; ε₂₉₈ = 77.94, compared with 78.36 for normal water.[4]

References

H.S. Rzepa, "H 22 O 11", 2016. https://doi.org/10.14469/ch/191999
H.S. Rzepa, "H 22 O 11", 2016. https://doi.org/10.14469/ch/191998
H. Rzepa, "Deuteronium deuteroxide; free energy differences.", 2016. https://doi.org/10.14469/hpc/407
C. Malmberg, "Dielectric constant of deuterium oxide", Journal of Research of the National Bureau of Standards, vol. 60, pp. 609, 1958. https://doi.org/10.6028/jres.060.060

Tags: dielectric, energy, free energy, Heat transfer, Heavy water, Kilocalorie per mole, model is to calculate the free energy difference, Properties of water, the free energy, thermodynamics, Tritiated water
Posted in Interesting chemistry | 4 Comments »

Collaborative FAIR data sharing.

April 17th, 2016

I want to describe a recent attempt by a group of collaborators to share the research data associated with their just-published article.[1]

I am here introducing things in a hierarchical form (i.e. not necessarily the serial order in which actions were taken).

The data repository selected for the data sharing is described by (m3data) doi: 10.17616/R3K64N[2]
A collaborative project collection was established on this repository (doi: 10.14469/hpc/244[3]). This data collection has some of the following attributes:
Its metadata is sent here: https://search.datacite.org/ui?&q=10.14469/hpc/244 where it can be queried for other details.
The project collaborators are all identified by their ORCID, used to obtain further individual information about the researchers. This information is also propagated to the metadata sent to DataCite.
In the section labelled associated DOIs there is a link to the recently published peer-reviewed article, which itself cites the data via doi: 10.14469/hpc/244 and which thus establishes a bidirectional link between the article and its data.
Also in the associated DOIs section are other DOIs (to two figures and two tables) held in a separate location. One example (doi: 10.14469/hpc/332[4]) illustrates the original type of data sharing we started about 10 years ago. This form has been variously called a "WEO" or Web-enhanced object (by the ACS) or interactivity boxes (RSC, etc). In such WEOs, we wrap the data into an interactive visual appearance using Jmol or JSmol software. The data itself is directly available to the reader using the Jmol export functions (right mouse click in the visual window).
- In this specific example the WEO has been assigned its DOI using the repository noted above.[2]
- We have in the past also used Figshare[5] for this purpose, see e.g. 10.6084/m9.figshare.1181739^‡
- The WEO can itself reference a more complete set of data used to create the visual appearance, for example data that allows the wavefunction of the molecule to be computed, doi: 10.6084/m9.figshare.2581987.v1[6]. In this instance this is held on the Figshare[5] repository.
The collection has another section labelled Members. These are individual datasets associated with the collection and held on the SAME repository as the collection itself. In this case, there are five such members, two of which are listed below:
1. 10.14469/hpc/281[7] contains a variety of other data such as outputs from an IRC (intrinsic reaction coordinate), energy profile diagrams and ZIP archives of other calculations.
2. 10.14469/hpc/272[8] itself contains five members, one of which is e.g.
  - 10.14469/hpc/267[9] which contains a ZIP archive with NMR data (see here for how this might be packaged in the future) and a file for a GPC (chromatography) instrument.
  - This last item also contains a new section labelled Metadata, which includes e.g. the InChI key and InChI string for the molecule whose properties are reported.
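The attributes listed above (ORCID identifiers for the collaborators, associated DOIs linking data and article, member datasets) are all carried in the metadata sent to DataCite, so they can be read by software as well as by people. The following is a minimal sketch of how that might be done, assuming the DataCite REST endpoint `https://api.datacite.org/dois/` and its JSON field names (`creators`, `nameIdentifiers`, `relatedIdentifiers`). The sample record is hand-written for illustration only: the ORCID is a placeholder and the relation types are assumptions, not the actual content of the record for 10.14469/hpc/244.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

DATACITE_API = "https://api.datacite.org/dois/"  # assumed endpoint


def datacite_url(doi: str) -> str:
    """Build the metadata URL for a DOI (slashes in the DOI are kept)."""
    return DATACITE_API + quote(doi, safe="/")


def fetch_record(doi: str, timeout: float = 10.0) -> dict:
    """Download the JSON metadata record (needs network access)."""
    request = Request(datacite_url(doi), headers={"Accept": "application/vnd.api+json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(record: dict) -> dict:
    """Pull out contributor ORCIDs and DOI-to-DOI links from a record."""
    attrs = record["data"]["attributes"]
    orcids = [
        ident["nameIdentifier"]
        for creator in attrs.get("creators", [])
        for ident in creator.get("nameIdentifiers", [])
        if ident.get("nameIdentifierScheme", "").upper() == "ORCID"
    ]
    related = [
        (rel.get("relationType"), rel.get("relatedIdentifier"))
        for rel in attrs.get("relatedIdentifiers", [])
        if rel.get("relatedIdentifierType") == "DOI"
    ]
    return {"orcids": orcids, "related_dois": related}


# Hypothetical record, shaped like a DataCite response but NOT real data.
SAMPLE_RECORD = {
    "data": {
        "attributes": {
            "doi": "10.14469/hpc/244",
            "creators": [
                {
                    "name": "Example, Researcher",
                    "nameIdentifiers": [
                        {
                            "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
                            "nameIdentifierScheme": "ORCID",
                        }
                    ],
                }
            ],
            "relatedIdentifiers": [
                {"relationType": "IsReferencedBy", "relatedIdentifierType": "DOI",
                 "relatedIdentifier": "10.1021/jacs.5b13070"},
                {"relationType": "HasPart", "relatedIdentifierType": "DOI",
                 "relatedIdentifier": "10.14469/hpc/281"},
            ],
        }
    }
}

summary = summarize(SAMPLE_RECORD)
print(summary["orcids"])
print(summary["related_dois"])
```

Calling `summarize(fetch_record("10.14469/hpc/244"))` would apply the same extraction to the live record, provided the endpoint and field names are as assumed here.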

If this mode of presenting data seems a little more complex than a single monolithic PDF file, it's because it's designed for:

collaboration between scientists, potentially at different locations and institutions.
attribution of provenance/credit for the individual items (via ORCID).
separate date stamping by the various contributors.
providing bi-directional links between data and publications.
holding what we call FAIR (findable, accessible, interoperable and reusable) data, rather than just data encapsulated in a PDF file.
collecting, storing and sending metadata for aggregation in a formal way, i.e. to DataCite using a formal schema to render the metadata properly searchable.

Thus 10.14469/hpc/244 represents our most complex attempt yet at such collaborative FAIR data sharing with multiple contributors. The tools for packaging many of the datasets are still quite limited (see again here) and the design is still being optimised (call it α). When the repository[2] has been more extensively tested, we intend to make it available as open source for others to experiment with. And of course, when this happens the source code too will have its own DOI!

^‡A refactoring of the Figshare site in December 2015 meant that the DOI no longer points directly to the WEO, and you have to follow a manually inserted link on that page to see it.

References

C. Romain, Y. Zhu, P. Dingwall, S. Paul, H.S. Rzepa, A. Buchard, and C.K. Williams, "Chemoselective Polymerizations from Mixtures of Epoxide, Lactone, Anhydride, and Carbon Dioxide", Journal of the American Chemical Society, vol. 138, pp. 4120-4131, 2016. https://doi.org/10.1021/jacs.5b13070
Re3data.Org., "Imperial College Research Computing Service Data Repository", 2016. https://doi.org/10.17616/r3k64n
C. ROMAIN, "Chemo-Selective Polymerizations Using Mixtures of Epoxide, Lactone, Anhydride and CO2", 2016. https://doi.org/10.14469/hpc/244
H. Rzepa, "Table S8: Comparison of two different basis sets for selected intermediates for CHO/PA ROCOP.", 2016. https://doi.org/10.14469/hpc/332
Re3data.Org., "figshare", 2012. https://doi.org/10.17616/r3pk5r
P. Dingwall, "Gaussian Job Archive for C6H10O", 2016. https://doi.org/10.6084/m9.figshare.2581987.v1
C. ROMAIN, "Figure 9, Figure S18, Figure S19: ROCOP of PA/CHO + IRC", 2016. https://doi.org/10.14469/hpc/281
C. ROMAIN, "Table 1 : Polymerizations Using Lactone, Epoxide, and CO2", 2016. https://doi.org/10.14469/hpc/272
C. ROMAIN, "Table 1, entry 1 : Polymerizations Using Lactone, Epoxide, and CO2", 2016. https://doi.org/10.14469/hpc/267

Tags: 10.17616, Academic publishing, DataCite, energy profile diagrams, Figshare, Identifiers, Open science, ORCiD, PDF, Scholarly communication, Technical communication, Technology/Internet, Web-enhanced object
Posted in Chemical IT | No Comments »

Metametadata: data about data about (chemical) data.

April 16th, 2016

Scientists are familiar with the term data, at least in a scientific or chemical context, but appreciating metadata (the prefix meta meaning "after" or "beyond") is slightly more subtle, in the sense of using it to mean data about data. The challenge lies in clarifying where the boundary between data and its metadata lies and in specifying and controlling the vocabulary used for these metadata descriptions. Items in a chemical metadata dictionary might include e.g. subject classifications such as Organic Molecular Chemistry or identifiers such as InChIkey. But what could metametadata be? Here I briefly show some examples by way of illustration.

Let me start by defining a data repository as a store of both data and the metadata describing it. The metadata is to be exposed in a standard manner which allows it to be aggregated by other agencies. Nowadays, it is becoming common to identify such a data object together with its metadata using a persistent identifier, or DOI. But to decide if any particular repository and the data objects contained therein are generally useful to you, you need information about the metadata itself. Technically, this is defined using a schema[1] describing the metadata (which might e.g. identify any dictionaries used); hence metametadata. Now you need to store the metametadata and so I introduce the concept of a registry which does this. This metametadata object is itself assigned a DOI^‡ and here I list these DOIs for a personal selection of some chemically oriented examples, in this case deriving from the largest registry of research data repositories, re3data.org. You can search for your own entry at their site: http://service.re3data.org/search.

Data repository	Repository metametadata DOI^♣
Figshare	10.17616/R3PK5R[2]
Zenodo	10.17616/R3QP53[3]
Cambridge Structural Database	10.17616/R36011[4]
Crystallography Open Database	10.17616/R37S31[5]
Oxford University Research Archive	10.17616/R3Q056[6]
Open Notebook Science	10.17616/R3859D[7]
UsefulChem	10.17616/R3Z89N[8]
Chemotion	10.17616/R34P5T[9]
ChemSpider	10.17616/R38P4P[10]
Chemical Database Service	10.17616/R36P42[11]
Imperial College HPC data repository	r3d100011965[12],[13]
Imperial College SPECTRa repository[14]	10.17616/R30316[15]

Not all of the repositories listed in the table above assign formal DOIs to their data collections, meaning that the metadata for their entries cannot be aggregated in a searchable manner using e.g. search.datacite.org/ui (or search.datacite.org/api for the machine version). Currently, the metametadata does not fully carry this information, an aspect which I gather will be rectified in a future revision of the re3data schema.[1]

Importantly, both metadata and (repository) metametadata can be searched using APIs (application programming interfaces), ensuring that the entire flow of meta information can be subject to automated software analysis rather than just visual inspection by a human. This should allow a rich and open infrastructure for handling research objects or data to be built up using hierarchical metadata. The examples above indeed show that the chemical space is already the largest component of the Natural Sciences space.

Although the edifice is still largely in its infancy, already I think we can start to see an alternative open approach emerging to "Googling" for data, or the even older traditional bespoke (i.e. non-open) services offered by commercial human-based abstractors of chemical metadata.

^‡This DOI is information about the metametadata, and hence it is metametametadata, or m3data. Sorry!

^♣The citations at the foot of this post are generated entirely automatically (by a WordPress plugin called Kcite) from the m3data associated with each entry, i.e. the DOI listed. Were the persistent identifier for the entry ever to be changed, this would propagate automatically to the citation, unlike the static entries in the table.
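To illustrate how a citation can be generated from nothing more than a DOI, here is a minimal sketch based on DOI content negotiation, where a request to `https://doi.org/` with an `Accept` header of `application/vnd.citationstyles.csl+json` returns citation metadata as CSL-JSON. This is not a description of how Kcite itself works; the function names are mine, and the sample record is hand-written to mirror the figshare entry in the reference list below rather than copied from a live response.

```python
from urllib.request import Request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def csl_request(doi: str) -> Request:
    """Prepare (but do not send) a content-negotiation request for a DOI."""
    return Request("https://doi.org/" + doi, headers={"Accept": CSL_JSON})


def format_citation(csl: dict) -> str:
    """Render a CSL-JSON record in the style of the reference lists here."""
    authors = []
    for author in csl.get("author", []):
        # Organisations use "literal"; people use "given" and "family".
        name = author.get("literal") or " ".join(
            part for part in (author.get("given"), author.get("family")) if part
        )
        authors.append(name)
    year = csl.get("issued", {}).get("date-parts", [[None]])[0][0]
    doi = csl.get("DOI", "").lower()
    return f'{", ".join(authors)}, "{csl.get("title", "")}", {year}. https://doi.org/{doi}'


# Hand-written sample in CSL-JSON shape (an assumption, not a fetched record).
SAMPLE_CSL = {
    "author": [{"literal": "Re3data.Org."}],
    "title": "figshare",
    "issued": {"date-parts": [[2012]]},
    "DOI": "10.17616/R3PK5R",
}

request = csl_request("10.17616/R3PK5R")
citation = format_citation(SAMPLE_CSL)
print(citation)
```

Because the citation is rebuilt from whatever the DOI currently resolves to, any change to the registered metadata would show up the next time the request is made, which is the behaviour the footnote describes.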

References

J. Rücknagel, P. Vierkant, R. Ulrich, G. Kloska, E. Schnepf, D. Fichtmüller, E. Reuter, A. Semrau, M. Kindling, H. Pampel, M. Witt, F. Fritze, S. Van De Sandt, J. Klump, H. Goebelbecker, M. Skarupianski, R. Bertelmann, P. Schirmbacher, F. Scholze, C. Kramer, C. Fuchs, S. Spier, and A. Kirchhoff, "Metadata Schema for the Description of Research Data Repositories", 2015. https://doi.org/10.2312/re3.008
Re3data.Org., "figshare", 2012. https://doi.org/10.17616/r3pk5r
Re3data.Org., "Zenodo", 2013. https://doi.org/10.17616/r3qp53
Re3data.Org., "The Cambridge Structural Database", 2013. https://doi.org/10.17616/r36011
Re3data.Org., "Crystallography Open Database", 2013. https://doi.org/10.17616/r37s31
Re3data.Org., "Oxford University Research Archive", 2014. https://doi.org/10.17616/r3q056
Re3data.Org., "ONSchallenge", 2013. https://doi.org/10.17616/r3859d
Re3data.Org., "UsefulChem", 2014. https://doi.org/10.17616/r3z89n
Re3data.Org., "chemotion", 2013. https://doi.org/10.17616/r34p5t
Re3data.Org., "ChemSpider", 2013. https://doi.org/10.17616/r38p4p
Re3data.Org., "Chemical Database Service", 2012. https://doi.org/10.17616/r36p42
https://doi.org/
H. Rzepa, "Imperial College High Performance Computing Service Data Repository Metadata Schema", 2016. https://doi.org/10.14469/hpc/382
J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, "SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories", Journal of Chemical Information and Modeling, vol. 48, pp. 1571-1581, 2008. https://doi.org/10.1021/ci7004737
Re3data.Org., "SPECTRa Project", 2013. https://doi.org/10.17616/r30316

Tags: Academic publishing, automated software analysis, BASE, chemical context, Chemical Database Service, chemical metadata, chemical metadata dictionary, chemical space, City: Cambridge, Data dictionary, Data management, Identifiers, Knowledge representation, programmer, Registry of Research Data Repositories, search.datacite.org/api, SPECTRa, Technology/Internet
Posted in Chemical IT | No Comments »
