Posts Tagged ‘Nuclear magnetic resonance’

Organocatalytic cyclopropanation of an enal: (computational) product stereochemical assignments.

Sunday, August 26th, 2018

In the previous post, I investigated the mechanism of cyclopropanation of an enal with a benzylic chloride, using a quantum-chemistry-based procedure. Here I take a look at the NMR spectra of the resulting cyclopropane products, with an evaluation of the original stereochemical assignments.[1]

Three products were identified, 4a-c (aryl = 2,4-dinitro), with a fourth diastereomer undetected. The relative stereochemistries were assigned[1] on the basis of NMR coupling constants, using the empirical Karplus or Bothner-By relationships. Here I calculate the NMR couplings at the B3LYP+GD3BJ/Def2-TZVPP/SCRF=chloroform level for comparison, using a methyl group rather than the full n-heptyl one shown above.
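For orientation, the Karplus relationship takes the generic form J = A·cos²θ + B·cosθ + C for the H-C-C-H dihedral angle θ. A minimal sketch in Python; the coefficients used here are illustrative only, and a real assignment would use a substituent-corrected parameterisation such as Bothner-By or Haasnoot-Altona:

```python
import math

def karplus_j(dihedral_deg, a=7.76, b=-1.10, c=1.40):
    """Estimate a vicinal 3J(H,H) coupling (Hz) from the H-C-C-H dihedral
    angle using the Karplus form J = A cos^2(t) + B cos(t) + C.
    The default coefficients are one commonly quoted set, used here
    purely for illustration."""
    t = math.radians(dihedral_deg)
    return a * math.cos(t) ** 2 + b * math.cos(t) + c

# Anti-periplanar protons couple strongly, gauche protons weakly:
print(round(karplus_j(180.0), 1))  # 10.3 Hz
print(round(karplus_j(60.0), 1))   # 2.8 Hz
```

It is exactly because such simple angular formulae can break down on strained rings like cyclopropanes that a full quantum-mechanical calculation of the couplings is attractive.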

System; Data DOI: 10.14469/hpc/4650

                       Gibbs energy    J1(a)-2(b)   J1(a)-3(c)   J3(c)-2(b)
                       (Hartree)       (Hz)         (Hz)         (Hz)
4a (1S,2R,3R)  expt                    4.9          9.0          7.5
4a             calc    -910.861653     4.6          9.9          8.3
                       -910.860816     4.4          10.7         7.9
                       -910.859908     4.9          10.9         7.7
                       -910.860299     5.2          8.1          8.1
4b (1R,2R,3R)  expt                    9.6          5.3          6.7
4b             calc    -910.859549     10.8         5.1          7.7
4c (1S,2R,3S)  expt                    5.4          5.4          9.9
4c             calc    -910.859820     4.2          5.5          10.4
4d (1R,2R,3S)  expt                    n/a          n/a          n/a
4d             calc    -910.855965     10.3         9.4          9.6

The variation resulting from rotations about the substituents (the o-nitro and the carbaldehyde groups), as seen for 4a, can be up to ~2 Hz. This could, if needed, be averaged by weighting with the Boltzmann populations. Even without this procedure, one can see that for the three diastereomers where values were measured, the calculated couplings agree to 1 Hz or better. This provides confirmation of the original assignments. This quantum-based method can be used in cases where the simple formulaic relationships may apply less well.


For the four conformations, obtained by rotating the carbaldehyde and the o-nitro groups, as shown in red above.
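The Boltzmann weighting mentioned above is straightforward arithmetic. A sketch using the four 4a conformer Gibbs energies and the J1(a)-2(b) couplings from the table, assuming 298 K:

```python
import math

HARTREE_TO_KJ = 2625.4996          # kJ/mol per hartree
RT = 8.31446e-3 * 298.15           # kJ/mol at 298 K

# Gibbs energies (hartree) and calculated J1(a)-2(b) couplings (Hz)
# for the four conformers of 4a, taken from the table above.
conformers = [
    (-910.861653, 4.6),
    (-910.860816, 4.4),
    (-910.859908, 4.9),
    (-910.860299, 5.2),
]

g_min = min(g for g, _ in conformers)
weights = [math.exp(-(g - g_min) * HARTREE_TO_KJ / RT) for g, _ in conformers]
z = sum(weights)
j_avg = sum(w * j for w, (_, j) in zip(weights, conformers)) / z
print(f"Boltzmann-averaged J1(a)-2(b): {j_avg:.1f} Hz")  # ~4.7 Hz
```

The population-weighted value (~4.7 Hz) sits close to the experimental 4.9 Hz, slightly better than any single conformer would suggest.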

References

  1. M. Meazza, A. Kowalczuk, S. Watkins, S. Holland, T.A. Logothetis, and R. Rios, "Organocatalytic Cyclopropanation of (E)-Dec-2-enal: Synthesis, Spectral Analysis and Mechanistic Understanding", Journal of Chemical Education, vol. 95, pp. 1832-1839, 2018. https://doi.org/10.1021/acs.jchemed.7b00566

MOLinsight: A web portal for the processing of molecular structures by blind students.

Friday, March 31st, 2017

Occasionally one comes across a web site that manages to combine being unusual, interesting and also useful. Thus www.molinsight.net is, I think, a unique chemistry resource for blind and visually impaired students.

If you think perhaps that it might be a little too specialised to be useful for you, go visit it first. It does not overwhelm, but contains much valuable information about topics such as open source chemical structure editors, property calculators and stereochemical utilities. Some topics really stand out. For example, Sonification of IR spectra describes the technique for converting an infrared spectrum into non-speech sounds of varying tones. I wonder if they have plans for sonified NMR spectra?

Sonification of visual spectra
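As a toy illustration of the sonification idea (this is my own sketch, not the molinsight implementation; the peak list and frequency range below are made up), one can map each point of a spectrum to an audible tone, wavenumber to pitch and absorbance to loudness:

```python
def sonify_ir(spectrum, f_lo=200.0, f_hi=2000.0):
    """Map each (wavenumber, absorbance) point of an IR spectrum to a
    (pitch in Hz, relative loudness) tone. Wavenumber maps linearly onto
    the audible range [f_lo, f_hi]; absorbance is normalised to 0..1.
    Real sonification tools are far more sophisticated; this only
    illustrates the mapping idea."""
    wn_lo = min(wn for wn, _ in spectrum)
    wn_hi = max(wn for wn, _ in spectrum)
    a_max = max(a for _, a in spectrum) or 1.0
    tones = []
    for wn, a in spectrum:
        pitch = f_lo + (wn - wn_lo) / (wn_hi - wn_lo) * (f_hi - f_lo)
        tones.append((pitch, a / a_max))
    return tones

# Hypothetical peaks: a carbonyl stretch (~1715 cm^-1) would sound
# mid-range, a C-H stretch (~2950 cm^-1) higher-pitched.
peaks = [(1100, 0.4), (1715, 1.0), (2950, 0.6)]
tones = sonify_ir(peaks)
```

Feeding the resulting (pitch, loudness) pairs to any audio library would let a listener "hear" the positions and intensities of the bands.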

The project has been around for a little while and it's really nice to see it well curated and up to date. In an era when billions of dollars seem to be focused on augmented visual reality as the future means of delivering information, it's good to know that enabling chemistry via the other senses is not forgotten.

The provenance of scientific data – establishing an audit trail.

Thursday, March 30th, 2017

In an era when alternative facts and fake news afflict us, the provenance of scientific data becomes ever more important, especially if that data is available as open access and exploitable by others, for valid scientific reasons but potentially also by those with other motives. Here I consider the audit trail that might serve to establish data provenance in one typical situation in chemistry: the acquisition of NMR instrumental data.

Here I describe how such data is generated in my department; details may vary elsewhere.

  1. The prospective user of the NMR service is allocated a service ID. In our case, that ID relates to the research group rather than to individual researchers. This ID is parochial: it does not reference any other information about the user in the institute. Only the service manager has the information needed to associate this ID with real users, and this information is normally not distributed.
  2. When a sample is submitted, this ID is used to create a new folder containing the data as a sub-folder of the group ID and located on the NMR data servers.
  3. The dataset itself contains a number of files that contain an audit trail (with names such as audita.txt and auditp.txt) with the fields ##AUDIT TRAIL= $$ (NUMBER, WHEN, WHO, WHERE, PROCESS, VERSION, WHAT). Typically, none of these files propagates the original user ID under which the data was collected; to do so would require a programmatic connection between the local authentication systems and the spectrometer software, a connection that is normally missing. Thus the first break in the provenance trail.
  4. In principle, other audit trails can be inferred from these files, such as the unique identity of the instrument provided by its manufacturer. Further information, such as the probe used to collect the data (probes can readily be changed over) or any calibration data used in setting up the instrument, is by and large not recorded. To my knowledge, although an instrument can have a unique serial number, the serial numbers of swappable components such as probes are not recorded by the collection software. Thus the second break in the provenance trail.
  5. This data then needs to be processed by further software. In this case we use the MestreNova system for this task. Each dataset has editable assigned properties; below I show those that can be associated with the spectrum (accessed with MestreNova using Edit/Properties). All this comes from the information collected by the instrument. The user’s identity can be inserted into the “title” field, the display of which is off by default. 
  6. There is also a section for parameters, a synonym for which might be metadata, accessed in this program from View/Tables/Parameters. If Author was entered as a parameter in the dataset by the spectrometer software, the Mnova document would retrieve that information. Equally, an ORCID identifier for the author, entered at the time of data collection and thus stored in the dataset, could be read by Mnova, stored and displayed if configured to do so. It would be fair to say, however, that this option is rarely, if indeed ever, systematically implemented by NMR instrument data-collection software, and so it is never propagated to the data-processing software (as highlighted in red below). Thus a third break in the provenance trail.
    There is also an alternative, and this time formal, metadata field that can be populated, by default (as shown below) with the type of spectrum and nucleus. These properties are not controlled, in the sense of only allowing terms that are present in a specified dictionary; the jargon for such control is a metadata schema. No schema is used here, since dissemination of this information is not intended: the software accepts whatever information it is given.
    There are thus several opportunities to collect the identity of the experimenter and thus attribute provenance to the collected data, but this does very much depend on the will of researchers, institutions or publishers to enforce specific policies around this. The fourth break in the provenance trail.
  7. The dataset can then be uploaded (DOI: 10.14469/hpc/1291), at which stage provenance can finally be added using the ORCID credentials of the person publishing the dataset, who of course may or may not be the person who actually recorded the data! The full metadata for this specific collection can be seen at data.datacite.org/10.14469/hpc/1291. To put it another way, this is the first point in the provenance chain where the metadata is controlled by a schema and is also discoverable in a standard programmatic manner, i.e. via the preceding link. The provenance is now formally associated with the ORCID identifier using the DataCite metadata schema. Note that a local policy is that access to the repository at https://data.hpc.imperial.ac.uk is only allowed by cross-authentication with http://orcid.org/ using the user's ORCID; this identifier is then automatically propagated to the metadata held at e.g. data.datacite.org/10.14469/hpc/1095. Currently, however, none of the metadata originally recorded in either the instrumental file set or the processed MestreNova file is forwarded to the metadata record held at DataCite; again, a loss of information and potentially of provenance.
  8. The peer-reviewed article resulting from the interpretation of this data can, however, be associated with the provenance introduced in the previous stage; see data.datacite.org/10.14469/hpc/1267 and the IsReferencedBy property.
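The audit-trail files described in step 3 are plain text and easy to inspect programmatically. A minimal sketch; the record below is fabricated for illustration, using only the field names quoted above, and real audit files differ in detail:

```python
import re

# A fabricated example of the kind of entry found in auditp.txt;
# all of the values shown are purely illustrative.
sample = """##AUDIT TRAIL=  $$ (NUMBER, WHEN, WHO, WHERE, PROCESS, VERSION, WHAT)
(   1,<2017-03-30 10:15:02.123 +0100>,<nmrsu>,<spect>,<go4>,
      <TOPSPIN 3.5>,<acquisition in progress>)
"""

FIELDS = ["NUMBER", "WHEN", "WHO", "WHERE", "PROCESS", "VERSION", "WHAT"]

def parse_audit(text):
    """Pair the <...>-delimited values of each numbered audit entry with
    the declared field names. Note that WHO is the spectrometer login,
    not the submitting researcher: exactly the provenance gap above."""
    entries = []
    for m in re.finditer(r"\(\s*(\d+)\s*,([\s\S]*?)\)", text):
        values = [m.group(1)] + re.findall(r"<(.*?)>", m.group(2))
        entries.append(dict(zip(FIELDS, values)))
    return entries

entry = parse_audit(sample)[0]
print(entry["WHO"])  # 'nmrsu': no link back to the actual user
```

Reading WHO back out makes the first break in the provenance trail concrete: the field exists, but nothing in it identifies the researcher.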

Now imagine if there was a common thread in all the stages of acquiring, processing and publishing this scientific data based on the ORCID. 

  1. Providing an ORCID could be made an essential requirement of access to the instrument.
  2. This information would be propagated to the dataset …
  3. by inclusion in one or more of the audit trail files.
  4. At this stage, further persistent identifiers associated with the instrument manufacturer could be added, which would help identify not only the instrument used but also sub-components such as the changeable probe. This would allow access to any calibration curves, probe sensitivities and other such information.
  5. The ORCID and other relevant information could be picked up by the software used to convert the data into spectra and propagated into the metadata containers for this software …
  6. where its use is controlled by a specified schema.
  7. At this stage, the ORCID and information such as the nucleus recorded, the sample temperature etc can be propagated on to the final metadata records.
  8. And the reader of the article describing this work would have a formally defined provenance audit trail they could follow back to the start of the experiment or forward to a published article. In this case, the data claims provenance (acquired from peer review) from the article, but it should also work in reverse with the article claiming provenance from the data on which it is based. The indexing of this bidirectional exchange is one of the exciting features that we should see emerging from CrossRef (holders of metadata about articles) and DataCite (holders of metadata about research data) in the near future.

We are clearly a little way from having the infrastructures described above for establishing such data audit trails. To do so will require cooperation from instrument manufacturers, at least in the example charted above, as well as from researchers, institutions, publishers, peer-reviewers and funding bodies. The first step would be to ensure that all scientists who intend to collect, process and publish data claim an ORCID. That remark is directed specifically at undergraduate, postgraduate and post-doctoral researchers, not just at their supervisor or PI (principal investigator). At a point when the discussion about alternative facts, and perhaps even alternative data, risks a general loss of confidence in science, we should be proactive in establishing trust in the scientific processes.


You can see an example obtained by this process at DOI: 10.14469/hpc/1095

This requirement is a strong driver for the uptake of ORCID amongst our student population.

Hydrogen bonding to chloroform.

Monday, November 14th, 2016

Chloroform, often in the deuterated form CDCl3, is a very common solvent for NMR and other types of spectroscopy. Quantum mechanics is increasingly used to calculate such spectra to aid assignment and the solvent is here normally simulated as a continuum rather than by explicit inclusion of one or more chloroform molecules. But what are the features of the hydrogen bonds that form from chloroform to other acceptors? Here I do a quick search for the common characteristics of such interactions.

  1. This first search (R < 0.05, no errors, no disorder) is for CH…O interactions, and plots the H…O distance against the angle subtended at the oxygen.


    Note that there are not that many crystalline examples. The "hotspot" is at a distance of ~2.3Å, but real examples down to 1.9Å exist. The angle subtended at the oxygen is close to 120° (the angle subtended at the hydrogen is always close to 180°). The plot below constrains the search to data collected below 140K to reduce the thermal noise in the measurements, with the hotspot shortening slightly to 2.2Å.

  2. The next search is for interactions to N rather than O (T < 140K). There are rather fewer hits, but again with similar features.
  3. Finally, an attempt to see if there is a correlation between the C-H length and the H…O length.

    This has odd characteristics, which suggest that in most cases the C-H distance is not measured from the diffraction data but simply "idealised" (which therefore renders the plot meaningless). Unless it has been added recently, it is not possible to specify in the search how the hydrogen positions were refined, if at all, and hence to restrict the search to only those structures where the C-H distance is meaningful.
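The geometric descriptors used in these scatter plots (the H…O distance, the angle at H and the angle at O) are simple to compute from atomic coordinates. A sketch with made-up coordinates, chosen only to reproduce the "hotspot" geometry described above:

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def angle(p, centre, q):
    """Angle p-centre-q in degrees, clamped against rounding error."""
    u = [a - b for a, b in zip(p, centre)]
    v = [a - b for a, b in zip(q, centre)]
    cosang = sum(a * b for a, b in zip(u, v)) / (dist(p, centre) * dist(q, centre))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

# Made-up coordinates (angstroms) for an idealised Cl3C-H...O=C contact
C_don = (0.0, 0.0, 0.0)     # chloroform carbon
H     = (0.0, 0.0, 1.08)    # C-H bond along z
O     = (0.0, 0.0, 3.38)    # linear C-H...O with H...O = 2.30 A (the hotspot)
C_acc = (1.06, 0.0, 4.00)   # acceptor carbon placed to give ~120 deg at O

print(round(dist(H, O), 2))        # 2.3
print(round(angle(C_don, H, O)))   # 180
print(round(angle(H, O, C_acc)))   # 120
```

These three numbers are exactly the quantities plotted against one another in the CSD searches above.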

In the last ten years or so, great progress has been made in assigning experimental spectra with the help of quantum calculations. This is true of chemical shifts in NMR, but especially so for chiroptical measurements such as ORP, ECD and VCD. Given that explicit hydrogen bonds can introduce anisotropy into the otherwise isotropic solvent continuum, it might be worth including perhaps one explicit chloroform molecule in these calculations, especially if the CH…O distance is <2Å (which suggests a fairly strong interaction). If nothing else, chloroform is rather big and might exert effects based on dispersion attractions or steric repulsions as well as on the H-bonding.

Managing (open) NMR data: a working example using Mpublish.

Monday, August 1st, 2016

In March, I posted from the ACS meeting in San Diego on the topic of Research data: Managing spectroscopy-NMR, and noted a talk by MestreLab Research on how a tool called Mpublish in the forthcoming release of their NMR analysis software MestreNova could help. With that release now out, the opportunity arose to test the system.

I will start with a reminder that NMR data associated with a published article is (or should be) freely and openly available: one should not need a subscription to the journal to access it (although one might in order to find it). Now, NMR data as it emerges from a spectrometer is highly sophisticated, comprising a collection of (sometimes binary) proprietary files containing the measured free induction decays (FIDs). Turning this raw data into an interpretable NMR spectrum, the visual form of the data that so appeals to human beings, is non-trivial. It requires what may be highly sophisticated software, and that in turn means the software may be a commercial product. Of course there are also examples of non-commercial open software packages that are best-of-breed; indeed, early in its life-cycle MestreNova was known as MESTREC, before becoming a commercial product. Could one achieve the benefits of both open and fully functional NMR data, with no loss from the original instrument, coupled with the ability to apply top-quality software for its analysis in an open manner? What follows is a demonstration of how Mpublish achieves this.

  1. Invoke the URL data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 from a browser
  2. This action queries the metadata deposited with DataCite for the DOI 10.14469/hpc/1087 and retrieves the first instance of any file associated with that dataset that has the format type chemical/x-mnpub. You can view this metadata directly by invoking just data.datacite.org/10.14469/hpc/1087, where you will find both mnpub and mnova formats listed. A URL such as data.datacite.org/chemical/x-mnpub/10.14469/hpc/1087 allows the file retrieval to be incorporated into automated workflows based on just the DOI and the desired media type. Note my parenthetical comment above about finding data; here you only need its DOI to retrieve it!
  3. The URL above downloads a small text file with the suffix .mnpub which contains in essence two components:

    • A URL pointing directly to an .mnova file at the repository for which the doi has been issued
    • A signature key used to verify that the public key of the publisher (the data repository in this instance) was counter-signed by Mestrelab.
  4. If you now download the application program and install it (for the purpose of this demonstration, ignore any requests to license the program and use it unlicensed), then open the .mnpub file with it, you should see the result below. The application checks the signature key and, if it is valid, proceeds to download the full data file (a .mnova file in this case), then analyses and displays it within the program. The data is fully active; it can be manipulated and analysed. In the picture below, the red arrow points to the state of the license, in this case not present.
  5. It is also possible to apply this procedure to the raw data as it emerges from the (Bruker) spectrometer, and compressed into a .zip archive. The MestreNova software will automatically process the contents by applying various default parameters, although the result may not correspond exactly to that present in e.g. the equivalent .mnova file (which may have had specific parameters applied).
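The DOI-plus-media-type pattern in step 2 is trivially scriptable. A sketch that just builds the resolver URLs quoted above; actually fetching them would need an HTTP client, and the service's behaviour may of course change over time:

```python
def media_url(doi, media_type=None):
    """Build a data.datacite.org resolver URL, following the pattern
    used in this post: with no media type you get the metadata record
    for the DOI; with one, the first registered file of that format."""
    base = "https://data.datacite.org"
    return f"{base}/{media_type}/{doi}" if media_type else f"{base}/{doi}"

print(media_url("10.14469/hpc/1087", "chemical/x-mnpub"))
print(media_url("10.14469/hpc/1087"))
```

A workflow thus needs to store only the DOI; the media type selects which representation of the dataset to pull down.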

It is my hope that anyone who records NMR data and processes it using software such as MestreNova will now consider using the mechanism above to accompany their submitted articles, rather than just automatically pasting a static image of the spectrum into a PDF file as "supporting information". This is part of what is meant by research data management (RDM).

One cannot help but note that many types of scientific instrument nowadays come with bespoke software for analysing the data they produce. Very often this software is unavailable to anyone who has not purchased the instrument itself. To make the data available to others, the processed data and its visual interpretation often have to be reduced, with much consequent information loss, to a lowest common denominator format such as Acrobat/PDF. Here we see a mechanism for avoiding any such information loss whilst enabling, for that dataset only, the full potential for (re)analysing the data. It will be interesting to see if other examples of this model or its equivalent emerge in the near future.

Research data: Managing spectroscopy-NMR.

Wednesday, March 16th, 2016

At the ACS conference, I have attended many talks these last four days, but one made some “connections” which intrigued me. I tell its story (or a part of it) here.

But to start, try the following experiment.

  1. Find a Word document of .docx type on your hard drive
  2. Remove the .docx suffix and replace it with a .zip suffix.
  3. Expand as if it is an archive (it is!).
  4. A folder is created, which itself contains four further folders. These all contain XML files, and in the sub-folder actually called word you will find one named document.xml. That file contains the visible content of the document; all the others are supporting files, covering styles etc.
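The experiment above can also be done in a few lines of Python. Here a minimal stand-in for a .docx is built in memory (a real one has more parts, such as content types and relationships), which is enough to show the zip-of-XML structure:

```python
import io
import zipfile

def list_parts(docx_bytes):
    """Treat the .docx bytes as the zip archive they really are and
    return the names of the internal parts."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        return z.namelist()

# Build a minimal stand-in for a .docx to show the structure
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document>visible content</w:document>")
    z.writestr("word/styles.xml", "<w:styles/>")

print(list_parts(buf.getvalue()))  # ['word/document.xml', 'word/styles.xml']
```

Pointing list_parts at a genuine .docx from disk would reveal the same word/document.xml holding the visible text.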

The reason this is important was made clear in Santi Dominguez's talk. Most of it was concerned with introducing Mbook, an ELN (electronic laboratory notebook), but the relevance to the above comes from his introduction of Mpublish, a forthcoming product targeting the area of research data management. What is the connection? Well, NMR spectrometers produce raw outputs as collections of files, much in the manner of the exploded Word document above. Some files contain the raw FID, others contain the acquisition parameters, and so on. These files are then turned into the traditional spectra by suitable processing software such as MestreNova (part of the same ecosystem as Mpublish). Most users of such programs then squirt the spectra into a PDF file, and it is this last document that is preserved as "research data"; almost invariably this is the version sent off to journals as the supporting information (SI) for the article. SI is called information for a good reason: in such a container it is very often not easily usable data, and functions just visually.

So what is the problem? Well, the conversion of the NMR fileset (and quite possibly of many other forms of spectroscopy data) into a PDF file is a lossy process. It cannot be reversed; information has been lost. And really only a human can easily retrieve and interpret such a visual presentation.

Santi described how Mpublish can assemble all the files associated with the instrumental outputs, optionally add chemical structures and other information, collect suitable metadata describing the contents and create a .zip archive. As we saw with Word, the suffix does not even need to be .zip. It was suggested that it be this information-complete archive that should really be used as the SI accompanying an article in which NMR data is invoked to support the narrative. In the reverse process, anyone downloading this zip archive could themselves potentially acquire full access, without information loss, to the original NMR data. There is a little further magic needed to make the process work, which I do not describe here. When Mpublish becomes available to play with, I will complete that story.
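The assembly step Santi described can be imagined along these lines. This is a sketch of the general idea only; Mpublish's actual archive layout and metadata format were not described in the talk, so the file and field names below (including "metadata.json") are hypothetical:

```python
import io
import json
import zipfile

def bundle_fileset(files, metadata):
    """Pack a raw instrument fileset together with a descriptive metadata
    record into a single zip archive that can travel losslessly as SI.
    The 'metadata.json' name is an illustrative choice, not Mpublish's."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for name, data in files.items():
            z.writestr(name, data)
        z.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buf.getvalue()

# Hypothetical Bruker-style file names and metadata fields
archive = bundle_fileset(
    {"nmr/fid": b"\x00\x01\x02", "nmr/acqus": "##TITLE= example\n"},
    {"technique": "1H NMR", "solvent": "CDCl3", "doi": "10.14469/hpc/1087"},
)
```

Unzipping such an archive gives back the complete instrument fileset, which is precisely the reversibility that a flattened PDF cannot offer.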

It is good to report that software is starting to appear which enhances the management and reporting of research data as part of the publication process. The “rules” and “best practice” of this game are still being written however. In this regard, I feel that it is the researchers themselves that must play a vital role in defining the rules. Let us not cede that role just to publishers.