Posts Tagged ‘supervisor’

The provenance of scientific data – establishing an audit trail.

Thursday, March 30th, 2017

In an era when alternative facts and fake news afflict us, the provenance of scientific data becomes ever more important. This is especially so if that data is available as open access and can be exploited by others, both for valid scientific reasons and potentially by those with other motives. Here I consider the audit trail that might serve to establish data provenance in one typical situation in chemistry: the acquisition of NMR instrumental data. 

Here I describe how such data is generated in my department; details may vary elsewhere.

  1. The prospective user of the NMR service is allocated a service ID. In our case, that ID relates to the research group rather than to individual researchers. This ID is parochial: it does not reference any other information about the user in the institute. Only the service manager has the information needed to associate this ID with real users, and this information is normally not distributed.
  2. When a sample is submitted, this ID is used to create a new folder for the data, located on the NMR data servers as a sub-folder of the group ID folder.
  3. The dataset itself contains a number of files that record an audit trail (with names such as audita.txt and auditp.txt) using the fields: ##AUDIT TRAIL= $$ (NUMBER, WHEN, WHO, WHERE, PROCESS, VERSION, WHAT). Typically, none of these files propagates the original user ID under which the data was collected; doing so would require a programmatic connection between the local authentication systems and the spectrometer software, a connection that is normally missing. This is the first break in the provenance trail.
  4. In principle, other audit information can be inferred from these files, such as the unique identity of the instrument as provided by its manufacturer. Further information, such as the probe used to collect the data (probes can readily be changed over) or any calibration data used in setting up the instrument for the data collection, is by and large not recorded. To my knowledge, although an instrument can have a unique serial number, the serial numbers of swappable components such as probes are not recorded by the collection software. This is the second break in the provenance trail.
  5. This data then needs to be processed by further software. In this case we use the MestreNova system for this task. Each dataset has editable assigned properties; below I show those that can be associated with the spectrum (accessed with MestreNova using Edit/Properties). All this comes from the information collected by the instrument. The user’s identity can be inserted into the “title” field, the display of which is off by default. 
  6. There is also a section for parameters (a synonym for which might be metadata), accessed in this program from View/Tables/Parameters. If Author was entered as a parameter in the dataset by the spectrometer software, the Mnova document would retrieve that information. Equally, an ORCID identifier for the author, entered at the time of data collection and thus stored in the dataset, could be read by Mnova, stored and displayed if configured to do so. It would be fair to say, however, that this option is rarely, if ever, systematically implemented by NMR instrument data collection software, and so it is never propagated to the data processing software (as highlighted in red below). This is a third break in the provenance trail.
    There is also an alternative, and this time formal, metadata field that can be populated, by default (as shown below) with the type of spectrum and nucleus. These properties are not controlled, in the sense of allowing only those terms that are present in a specified dictionary. The jargon for such control is a metadata schema. No schema is used here, since dissemination of this information is not intended; the software accepts whatever information it is given. 
    There are thus several opportunities to capture the identity of the experimenter and so attribute provenance to the collected data, but this very much depends on the will of researchers, institutions or publishers to enforce specific policies. This is the fourth break in the provenance trail.
  7. The dataset can then be uploaded (DOI: 10.14469/hpc/1291), at which stage provenance can finally be added using the ORCID credentials of the person publishing the dataset, who of course may or may not be the person who actually recorded the data! The full metadata for this specific collection can be seen at data.datacite.org/10.14469/hpc/1291. To put it another way, this is the first point in the provenance chain where the metadata is controlled by a schema and is also discoverable in a standard programmatic manner, i.e. via the preceding link. The provenance is now formally associated with the ORCID identifier using the DataCite metadata schema. You should be aware that a local policy is that access to the repository at https://data.hpc.imperial.ac.uk is only allowed by cross-authentication with http://orcid.org/ using the user’s ORCID. This identifier is then automatically propagated to the metadata held at e.g. data.datacite.org/10.14469/hpc/1095. Currently, however, none of the metadata originally recorded in either the instrumental file set or the processed MestreNova file is forwarded to the metadata record held at DataCite; this is again a loss of information and potentially of provenance.
  8. The peer-reviewed article resulting from the interpretation of this data can, however, be associated with the provenance introduced in the previous stage; see data.datacite.org/10.14469/hpc/1267 and its IsReferencedBy property. 
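To make the first break concrete, here is a minimal Python sketch that reads entries of the (NUMBER, WHEN, WHO, WHERE, PROCESS, VERSION, WHAT) form quoted in step 3. The sample text, the account name nmrsu and the exact layout (one parenthesised tuple per entry, with angle-bracketed fields) are assumptions modelled loosely on such files rather than a copy of any vendor's format; real files may carry extra lines that this sketch simply ignores. What it illustrates is that WHO holds whichever login ran the spectrometer, here a hypothetical shared service account, so nothing in the file identifies the researcher.

```python
import re
from dataclasses import dataclass

# Hypothetical audit-file excerpt; field order follows the header quoted in step 3.
SAMPLE = """##TITLE= Audit trail
##AUDIT TRAIL=  $$ (NUMBER, WHEN, WHO, WHERE, PROCESS, VERSION, WHAT)
(   1,<2017-03-28 10:15:33 +0100>,<nmrsu>,<spectrometer1>,<go>,<acq 1.0>,
      <created by zg>)
(   2,<2017-03-28 10:20:01 +0100>,<nmrsu>,<spectrometer1>,<proc>,<acq 1.0>,
      <ft, phase correction>)
##END=
"""


@dataclass
class AuditEntry:
    number: int
    when: str
    who: str
    where: str
    process: str
    version: str
    what: str


# One entry is "(" + a number + six <...> fields + ")". [^>]* also matches
# newlines, so a WHAT field spread over several lines is captured whole.
FIELD = r"\s*<([^>]*)>"
ENTRY = re.compile(r"\(\s*(\d+)\s*," + r"\s*,".join([FIELD] * 6) + r"\s*\)")


def parse_audit_trail(text: str) -> list[AuditEntry]:
    """Return all audit entries found in the text, in file order."""
    entries = []
    for match in ENTRY.finditer(text):
        number, *fields = match.groups()
        # Collapse runs of whitespace (including line breaks) inside each field.
        cleaned = [" ".join(field.split()) for field in fields]
        entries.append(AuditEntry(int(number), *cleaned))
    return entries


if __name__ == "__main__":
    for entry in parse_audit_trail(SAMPLE):
        # WHO is the spectrometer login, not the person who submitted the sample.
        print(entry.number, entry.who, entry.process, entry.what)
```

The header line is skipped because its parenthesis is followed by NUMBER rather than digits; anything the regular expression does not recognise is silently ignored, which is acceptable for a sketch but not for a validating tool.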

Now imagine if there was a common thread in all the stages of acquiring, processing and publishing this scientific data based on the ORCID. 

  1. Providing an ORCID could be made an essential requirement of access to the instrument.
  2. This information would be propagated to the dataset …
  3. by inclusion in one or more of the audit trail files.
  4. At this stage, further persistent identifiers associated with the instrument manufacturer could be added, which help identify not only the instrument used, but sub-components such as the changeable probe. This would allow access to any calibration curves or probe sensitivity and other aspects.
  5. The ORCID and other relevant information could be picked up by the software used to convert the data into spectra and propagated into the metadata containers for this software …
  6. where its use is controlled by a specified schema.
  7. At this stage, the ORCID and information such as the nucleus recorded, the sample temperature, etc. can be propagated on to the final metadata records.
  8. And the reader of the article describing this work would have a formally defined provenance audit trail they could follow back to the start of the experiment or forward to a published article. In this case, the data claims provenance (acquired from peer review) from the article, but it should also work in reverse with the article claiming provenance from the data on which it is based. The indexing of this bidirectional exchange is one of the exciting features that we should see emerging from CrossRef (holders of metadata about articles) and DataCite (holders of metadata about research data) in the near future.
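Step 1 of this scheme implies that whatever system grants access to the instrument checks the ORCID iD it is given. ORCID iDs end in an ISO 7064 MOD 11-2 check character, so a mistyped iD can be caught locally before anything is written into the dataset. The sketch below implements that documented checksum; the function names and the idea of calling it at instrument login are illustrative assumptions, not a feature of any spectrometer software. The iD 0000-0002-1825-0097 is ORCID's own fictitious test record.

```python
import re
from typing import Optional

# Bare iD (four groups of four characters) optionally preceded by the orcid.org URL.
ORCID_PATTERN = re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalise_orcid(value: str) -> Optional[str]:
    """Return the bare iD if its format and check character are valid, else None."""
    match = ORCID_PATTERN.match(value.strip())
    if match is None:
        return None
    orcid = match.group(1)
    digits = orcid.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        return None
    return orcid


def admit_to_instrument(orcid_input: str) -> str:
    """Hypothetical booking-system gate: no valid ORCID iD, no access."""
    orcid = normalise_orcid(orcid_input)
    if orcid is None:
        raise PermissionError(f"a valid ORCID iD is required, got {orcid_input!r}")
    return orcid


if __name__ == "__main__":
    print(admit_to_instrument("https://orcid.org/0000-0002-1825-0097"))  # 0000-0002-1825-0097
```

The normalised iD returned by the gate is what would then be written into the audit-trail files in steps 2 and 3; a checksum only catches typing errors, of course, and says nothing about whether the iD belongs to the person presenting it.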

We are clearly a little way from having the infrastructures described above for establishing such data audit trails. To do so will require cooperation from instrument manufacturers, at least in the example charted above, as well as from researchers, institutions, publishers, peer-reviewers and funding bodies. The first step would be to ensure that all scientists who intend to collect, process and publish data claim an ORCID. That remark is directed specifically at undergraduate, postgraduate and post-doctoral researchers, not just at their supervisor or their PI (principal investigator). At a point when the discussion about alternative facts, and perhaps even alternative data, risks a general loss of confidence in science, we should be pro-active in establishing trust in the scientific process.


You can see an example obtained by this process at DOI: 10.14469/hpc/1095
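Because the DataCite record is the first point in the chain that follows a schema, it can also be read by a program. The sketch below assumes the JSON returned by the DataCite REST API at https://api.datacite.org/dois/{doi}, in which creators carry nameIdentifiers (with scheme "ORCID") and links to articles appear under relatedIdentifiers with a relationType such as IsReferencedBy. The record used for illustration is a made-up minimal one, not the actual metadata of 10.14469/hpc/1095; its ORCID is ORCID's fictitious test iD and the 10.5555 DOIs are placeholders.

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"  # assumed REST endpoint


def fetch_record(doi: str) -> dict:
    """Download the JSON metadata for a DOI (requires network access)."""
    with urllib.request.urlopen(DATACITE_API + doi) as response:
        return json.load(response)


def creator_orcids(record: dict) -> list[str]:
    """ORCID iDs attached to the creators of a DataCite JSON record."""
    attributes = record["data"]["attributes"]
    return [
        ident["nameIdentifier"]
        for creator in attributes.get("creators") or []
        for ident in creator.get("nameIdentifiers") or []
        if ident.get("nameIdentifierScheme") == "ORCID"
    ]


def related(record: dict, relation_type: str) -> list[str]:
    """Identifiers linked to the record by a relationType such as IsReferencedBy."""
    attributes = record["data"]["attributes"]
    return [
        rel["relatedIdentifier"]
        for rel in attributes.get("relatedIdentifiers") or []
        if rel.get("relationType") == relation_type
    ]


# Hypothetical minimal record in the assumed shape (placeholder DOIs, test ORCID).
SAMPLE = {
    "data": {
        "id": "10.5555/example-dataset",
        "attributes": {
            "creators": [
                {
                    "name": "Carberry, Josiah",
                    "nameIdentifiers": [
                        {
                            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
                            "nameIdentifierScheme": "ORCID",
                        }
                    ],
                }
            ],
            "relatedIdentifiers": [
                {
                    "relatedIdentifier": "10.5555/example-article",
                    "relatedIdentifierType": "DOI",
                    "relationType": "IsReferencedBy",
                }
            ],
        },
    }
}

if __name__ == "__main__":
    print(creator_orcids(SAMPLE))
    print(related(SAMPLE, "IsReferencedBy"))
```

With network access, creator_orcids(fetch_record("10.14469/hpc/1095")) should list the depositor's ORCID, provided the live record has the shape assumed here.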

This requirement is a strong driver for the uptake of ORCID amongst our student population.

The 2016 Bradley-Mason prize for open chemistry.

Tuesday, October 4th, 2016

Peter Murray-Rust and I are delighted to announce that the 2016 award of the Bradley-Mason prize for open chemistry goes to Jan Szopinski (UG) and Clyde Fare (PG).

Jan’s open chemistry derives from a final-year project examining why atom charges derived from the quantum-chemically computed electronic density represent chemical information well, yet the electrostatic potential (ESP) generated from these charges is very poor, whilst conversely charges fitted to the computed electrostatic potential are incommensurate with chemical information (such as the electronegativity of atoms). He has developed a Python program called ‘repESP’, which generates ‘compromise’ charges that attempt to reconcile the physical world-view (fitting the ESP) with the chemical insight provided by NPA (Natural Population Analysis). Jan was the main driver behind making his code open source, “opening his supervisor’s eyes” to the various flavours of open-source licence. To ensure that all subsequent improvements to the program remain available to anyone, the source code has been released under a ‘copyleft’ licence (GPL v3) and is maintained by Jan on GitHub, where he looks forward to helping new users and collaborating with contributors.

Clyde has made various contributions to open-source chemistry over the period of his PhD. These have focused mainly on utilities to improve quantum chemical research, the enhancement of a popular machine learning library with a method that has been successful in chemometrics, the creation of an open-source channel for teaching chemists programming and data analysis, and the creation of a tool to help encourage open-source software development. Cclib is the most popular library for parsing quantum chemical data from output files, and Clyde has contributed patches for the Atomic Simulation Environment, which enables control of quantum chemical codes from a unified Python interface. He was responsible for the construction of a computational chemistry electronic notebook, published to GitHub and now under active development by others as well. This aims to encapsulate computational chemistry research projects, both for the sake of reproducibility and for organising and keeping track of quantum chemical research. Alongside this platform he created an enhanced Gaussian calculator for the Atomic Simulation Environment that enables automatic construction of ONIOM input files, also now under active development. He also contributed to scikit-learn, the most popular Python machine learning framework, implementing a kernel for Kernel Ridge Regression that has become the most successful kernel for regression over molecular properties. He was part of the team that won the 2014 sustainable software conference prize for the creation of the open-source healthchecker software as part of Sustain. He has argued for open source as a platform for teaching resources and created the Imperial Chemistry GitHub user account, which is now run by the department. Materials for the Imperial Chemistry Data Analysis and Programming workshops, implemented as Python notebooks, are now available through this account and continue under active development.

Criteria for the award will include judging the submission on its immediate accessibility via public web sites, on what is visible and re-usable in this way, and on evidence of either community formation/engagement or re-use of materials by people other than the proposer.

Kinetic isotope effect models as a function of ring substituent for indole-3-carboxylic acids and indolin-2-ones.

Wednesday, January 20th, 2016

The original strategic objective of my PhD research in 1972-74 was to explore how primary kinetic hydrogen isotope effects might be influenced by the underlying structures of the transition states involved. Earlier posts dealt with how one can construct quantum-chemical models of these transition states that fit the known properties of the reactions. Now one can reverse the strategy, computing the expected variation with structure to see whether anything interesting emerges and, if it does, opening up the prospect of further exploration by experiment. Here I will use the base-catalysed enolisation of 1,3-dimethylindolin-2-ones and the decarboxylation of 3-indole carboxylates to explore this aspect.


The systems and results are shown in the table below and are summarised in the following points:

1,3-dimethyl-indolinones:

  1. The free energy barriers are very low, but show an overall increase when changing the substituent from nitro to amino, with the 6-position being more sensitive than the 5. However, the increase is not consistent.
  2. The imaginary wavenumber of the transition-state mode (νi) changes regularly, more than doubling along the progression.
  3. The basic structure of the proton transfer evolves smoothly, from being an early transition state with 6-nitro to being a late one with 6-amino.
  4. The primary kinetic isotope effect shows less variation, but the trend is to increase as the transition state gets later, even beyond the point where the two bond lengths associated with the transferring hydrogen are equal.
  5. As Dan Singleton has pointed out on this blog, the observed KIE is a combination of effects based purely on the transition state structure and effects arising from proton tunnelling, which is induced by the sharpness of the barrier and is itself related to the magnitude of νi. The KIE ratios tabulated below derive purely from the former and do not take any such tunnelling into account. We can see from the variation in νi that such tunnelling contributions are likely to vary substantially across this range of substituents. As a result, deconvoluting the KIE due to the symmetry of the proton transfer from the contribution due to tunnelling is going to be difficult.
  6. There are other computational errors that might contribute, such as solvent reorganisations due to specific substituents, only partially taken into account here. In effect, the unsubstituted reaction geometry was used as the template for the others, followed of course by a re-optimisation, which might not explore other, more favourable orientations brought about by the substituents.
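To give a feel for how strongly tunnelling might vary with νi across this series, the simplest one-dimensional estimate, Wigner's correction κ = 1 + (1/24)(hcνi/kBT)², can be evaluated for each tabulated νi. This is a deliberately crude model chosen here for illustration (it is intended only for small corrections and ignores the width and shape of the barrier), not a method used for the values in the table. The sketch also assumes that the deuterium imaginary wavenumber is νi/√2, as it would be if only the transferring atom moved. On these assumptions κH rises from about 1.4 at 611 cm⁻¹ to about 2.5 at 1226 cm⁻¹.

```python
import math

C2 = 1.438776877  # second radiation constant hc/k_B, in cm K


def wigner_kappa(nu_imag_cm: float, temperature: float = 298.0) -> float:
    """Wigner tunnelling correction for an imaginary wavenumber given in cm^-1."""
    u = C2 * nu_imag_cm / temperature
    return 1.0 + u * u / 24.0


def tunnelling_factor_on_kie(nu_imag_h: float, temperature: float = 298.0) -> float:
    """kappa_H / kappa_D, assuming (for illustration only) nu_D = nu_H / sqrt(2)."""
    kappa_h = wigner_kappa(nu_imag_h, temperature)
    kappa_d = wigner_kappa(nu_imag_h / math.sqrt(2.0), temperature)
    return kappa_h / kappa_d


# Imaginary wavenumbers (cm^-1) for the 1,3-dimethylindolin-2-ones, from the table in this post.
NU_I = {"6-nitro": 611, "5-nitro": 895, "H": 1130, "5-amino": 1182, "6-amino": 1226}

if __name__ == "__main__":
    for name, nu in NU_I.items():
        print(f"{name:8s} nu_i={nu:5d}  kappa_H={wigner_kappa(nu):.2f}  "
              f"kappa_H/kappa_D={tunnelling_factor_on_kie(nu):.2f}")
```

Even this rough estimate changes by a larger factor across the series than the semi-classical KIEs themselves do, which is the point made in item 5; a serious treatment would need a multidimensional tunnelling model.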

Indole-3-carboxylic acids:

  1. The free energy barriers are now much higher than those of the indolinones, but show a consistent decrease along the series from 6-nitro to 6-amino. This is consistent with the idea that the indole is a base whose basicity is increased by electron donation and decreased by electron withdrawal.
  2. The imaginary wavenumber of the transition-state mode again changes regularly, increasing as the barrier decreases.
  3. For 5-H, the computed free energy barrier matches the measured one remarkably well.
  4. The calculated KIEs increase regularly along the series 6-nitro to 6-amino.
  5. The calculated KIE for 5-H matches the measured value very well, but that for 5-chloro does not. One might reasonably conclude that the outlier is probably the experimental value. The KIEs are not obtained by direct measurement of the rate of reaction, but are inferred by solving a relatively complex rate equation that includes some approximations and assumptions. Perhaps one of these approximations is not valid for this substituent, or possibly an experimental error has crept in. Were this work ever to be repeated, this entry should be prioritised.
  6. The overall variation in KIE is in fact quite small, but if the KIE can be measured very accurately, then they should be useful for comparison with such calculations.
  7. We cannot really conclude whether the magnitude of the KIE closely reflects the symmetry of the transition state. For all the examples below, the C-H bond is always shorter than the H-O bond. More extreme, and probably multiple, substituents on the ring (5,6-dinitro? 5,6-diamino?) might have to be used to probe a wider variation in transition-state symmetry. For example, the maximum value for proton transfer from a hydronium ion was stated a long time ago to be around 3.6,[1] and it would be of interest to see whether that value is attained when the proton transfer becomes fully symmetric.
1,3-dimethylindolin-2-ones[2]
Model       ΔG298 (ΔH298)      kH/kD (298 K)      rC-H, rH-O (Å)    νi (cm⁻¹)    Data DOIs
6-nitro     1.94               3.22               1.256, 1.417      611          [3],[4]
5-nitro     1.82               3.65               1.289, 1.364      895          [5],[6]
H           2.48               4.40               1.326, 1.316      1130         [7],[8]
5-amino     6.73               3.86               1.337, 1.304      1182         [9],[10]
6-amino     3.19               4.43               1.349, 1.291      1226         [11],[12]

Indole-3-carboxylic acids[13]
Model       ΔG298 (ΔH298)      kH/kD (298 K)      rC-H, rH-O (Å)    νi (cm⁻¹)    Data DOIs
6-nitro     25.1               2.72               1.279, 1.391      706          [14],[15]
5-chloro    23.1               2.80 (2.23)        1.300, 1.361      873          [16],[17]
5-H         22.1ᵃ (22.0)[18]   2.87 (2.72)[18]    1.304, 1.354      921          [19],[20]
6-amino     20.5               3.04               1.308, 1.348      950          [21],[22]

ᵃ The barrier is higher than previously reported because a significantly lower isomer of the ionised reactant was subsequently located.[21] Use of this new isomer also has a modest knock-on effect on the computed isotope effect for this system, bringing it into line with the other substituents and also with experiment.
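The trends described above (a KIE that tends to increase as the transition state becomes later for the indolinones, and a regular increase from 6-nitro to 6-amino for the indole-3-carboxylic acids) can be checked against the tabulated numbers with a few lines of Python. This sketch assumes that the difference rC-H minus rH-O is a reasonable one-number proxy for the position of the transition state (negative means early, with the proton still closer to carbon) and uses Spearman's rank correlation in its simple form, which assumes no tied values; that holds for these data.

```python
# Transcribed from the table: (rC-H, rH-O in Å, computed kH/kD at 298 K).
INDOLINONES = {
    "6-nitro": (1.256, 1.417, 3.22),
    "5-nitro": (1.289, 1.364, 3.65),
    "H": (1.326, 1.316, 4.40),
    "5-amino": (1.337, 1.304, 3.86),
    "6-amino": (1.349, 1.291, 4.43),
}
INDOLE_ACIDS = {
    "6-nitro": (1.279, 1.391, 2.72),
    "5-chloro": (1.300, 1.361, 2.80),
    "5-H": (1.304, 1.354, 2.87),
    "6-amino": (1.308, 1.348, 3.04),
}


def ranks(values):
    """1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def asymmetry_vs_kie(series):
    """Correlate (rC-H - rH-O), i.e. lateness of the transition state, with kH/kD."""
    asymmetry = [r_ch - r_ho for r_ch, r_ho, _ in series.values()]
    kie = [k for _, _, k in series.values()]
    return spearman(asymmetry, kie)


if __name__ == "__main__":
    print(f"indolinones:  rho = {asymmetry_vs_kie(INDOLINONES):.2f}")  # 0.90
    print(f"indole acids: rho = {asymmetry_vs_kie(INDOLE_ACIDS):.2f}")  # 1.00
```

The single rank inversion among the indolinones is the 5-amino entry, which is also the one whose barrier breaks the trend noted in item 1 of that list.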

Overall, this study of the variation in kinetic isotope effects for proton transfer induced by ring substitution shows the viability of such computation. The total elapsed time since the start of this project is about three weeks, very much shorter than the original time taken to synthesise the molecules and measure their kinetics. Importantly, these were very much reactions occurring in aqueous solution, where solvation and general acid or general base catalysis occurred. Such reactions have long been thought to be very difficult to model in a non-dynamic, discrete sense. The results obtained here encourage optimism that such calculations may have a useful role to play in understanding such mechanisms.


I would like to express my enormous gratitude to my Ph.D. supervisor, Brian Challis, for starting me along this life-long exploration of reaction mechanisms. I hope the above gives him satisfaction that the endeavour back in 1972 has borne some more fruits.


References

  1. C.G. Swain, D.A. Kuhn, and R.L. Schowen, "Effect of Structural Changes in Reactants on the Position of Hydrogen-Bonding Hydrogens and Solvating Molecules in Transition States. The Mechanism of Tetrahydrofuran Formation from 4-Chlorobutanol", Journal of the American Chemical Society, vol. 87, pp. 1553-1561, 1965. https://doi.org/10.1021/ja01085a025
  2. H. Rzepa, "Kinetic isotope effects for the ionisation of 5- and 6-substituted 1,3-dimethyl indolinones.", 2016. https://doi.org/10.14469/hpc/208
  3. H.S. Rzepa, "C 10 H 19 N 2 Na 1 O 8", 2016. https://doi.org/10.14469/ch/191802
  4. H.S. Rzepa, "C 10 H 19 N 2 Na 1 O 8", 2016. https://doi.org/10.14469/ch/191796
  5. H.S. Rzepa, "C 10 H 19 N 2 Na 1 O 8", 2016. https://doi.org/10.14469/ch/191800
  6. H.S. Rzepa, "C 10 H 19 N 2 Na 1 O 8", 2016. https://doi.org/10.14469/ch/191789
  7. H.S. Rzepa, "C 10 H 20 N 1 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191787
  8. H.S. Rzepa, "C 10 H 20 N 1 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191782
  9. H.S. Rzepa, "C 10 H 21 N 2 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191803
  10. H.S. Rzepa, "C 10 H 21 N 2 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191797
  11. H.S. Rzepa, "C 10 H 21 N 2 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191804
  12. H.S. Rzepa, "C 10 H 21 N 2 Na 1 O 6", 2016. https://doi.org/10.14469/ch/191799
  13. H. Rzepa, "Decarboxylation of 5- and 6-substituted indole-3-carboxylic acids", 2016. https://doi.org/10.14469/hpc/220
  14. H.S. Rzepa, "C 9 H 15 Cl 1 N 2 O 8", 2016. https://doi.org/10.14469/ch/191807
  15. H.S. Rzepa, "C 9 H 15 Cl 1 N 2 O 8", 2016. https://doi.org/10.14469/ch/191805
  16. H.S. Rzepa, "C 9 H 15 Cl 2 N 1 O 6", 2016. https://doi.org/10.14469/ch/191822
  17. H.S. Rzepa, "C 9 H 15 Cl 2 N 1 O 6", 2016. https://doi.org/10.14469/ch/191825
  18. B.C. Challis, and H.S. Rzepa, "Heteroaromatic hydrogen exchange reactions. Part 9. Acid catalysed decarboxylation of indole-3-carboxylic acids", Journal of the Chemical Society, Perkin Transactions 2, pp. 281, 1977. https://doi.org/10.1039/p29770000281
  19. H.S. Rzepa, "C 9 H 16 Cl 1 N 1 O 6", 2016. https://doi.org/10.14469/ch/191828
  20. H.S. Rzepa, "C 9 H 16 Cl 1 N 1 O 6", 2016. https://doi.org/10.14469/ch/191790
  21. H.S. Rzepa, "C 9 H 17 Cl 1 N 2 O 6", 2016. https://doi.org/10.14469/ch/191810
  22. H.S. Rzepa, "C 9 H 17 Cl 1 N 2 O 6", 2016. https://doi.org/10.14469/ch/191806

A two-publisher model for the scientific article: narrative+shared data.

Sunday, September 15th, 2013

I do go on rather a lot about enabling or hyper-activating[1] data. So do others[2]. Why is sharing data important?

  1. Reproducibility is a cornerstone of science.
  2. To achieve this, it is important that scientific research be open and transparent.
  3. Openly available research data is central to achieving this. It has been estimated that less than 20% of the data collected in chemistry is made available in any open manner.
  4. RCUK (the UK research councils) wish to see increased transparency in publicly funded research and the availability of its outputs.

But it’s not all hot air, honestly. Peter Murray-Rust and I had started out on a journey to improve reproducibility, openness and transparency in (inter alia) scientific publishing in 1994. In 2001 we published an example of a data-rich article[3] based on CML, and by 2004 the concept had evolved into something Peter termed a datument[4]. Some forty such have now been crafted.[5]

In 2009, the journal Nature Chemistry was starting up, and I approached them with the idea of an interactive data exploratorium, on the premise that a new journal might be receptive to new ways of presenting science. It was accepted and published[6] and was followed in 2010 by a second variation.[7] In both cases, these activated figures were sent to the journal as part of the submission process and hosted by them (they still are). You can even access them without a subscription to the journal!

Move on to 2012, when David Scheschkewitz had some very exciting silicon chemistry to report; we collaborated on some computational modelling and sent the resulting article to Nature Chemistry for publication. This included the usual interactive table reporting the modelling and its data. However, it transpired that the production workflows for Nature Chemistry had been streamlined, and I was informed that interactive tables could no longer be accepted. This time, we (i.e. the authors) would have to solve the issue of how to host and present the data ourselves.

I was very keen that this table be treated with equal weight to the article itself (citable in its own right) and that it not be downgraded to supporting information (ESI). My objection to ESI is that it is often poorly structured by authors, i.e. it is not prepared in a form which allows the data to be re-used, either by a perceptive human, or a logical machine. As a result it is often given little attention by referees (although bloggers seem to do a far better job) and furthermore can end up being lost behind a pay wall (the two Nature Chem interactive objects noted above can be openly accessed, but only if you know that they exist). So I determined that:

  1. The table should be immediately accessible to non-experts, without any convoluted process of downloading a file, expanding it and finding the correct document within the resulting fileset to view in the correct program, which is how normal ESI is handled.
  2. The table and the data it contains should be capable of acting as a scientific tool, forming what could be the starting point for a new investigation if appropriate.

To solve this issue, some lateral and quick thinking was needed. The solution was a two-component model in which the original article is treated as a “narrative“, intertwingled with a second, but nevertheless distinct component, the “data“. This data would follow the principles of the Amsterdam Manifesto; it would itself be citable. The two components would become symbiotes (a datument). The narrative[8] could cite this data and the data could back-link to the narrative. The data would inherit trust (i.e. peer review) from that applied to the narrative and the latter would inherit a date stamp and integrity from the data host (in this case Figshare[9]).*
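The two-component model rests on a pair of typed links that should always exist in both directions: the narrative cites the data, and the data links back to the narrative. A minimal sketch of checking that invariant follows. The relation names are taken from the DataCite relationType vocabulary (References/IsReferencedBy, IsSupplementTo/IsSupplementedBy); the link store is an in-memory stand-in, and the two DOIs (refs. 8 and 9) are used only as labels, not read from their live metadata records.

```python
from collections import defaultdict

# Inverse pairs from the DataCite relationType vocabulary (a small subset).
INVERSE = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


def add_link(links, source, relation, target):
    """Record a typed link: source --relation--> target."""
    links[source].add((relation, target))


def missing_back_links(links):
    """Return (source, relation, target) links whose inverse link is absent."""
    missing = []
    for source, outgoing in links.items():
        for relation, target in outgoing:
            # .get does not create entries in the defaultdict while we iterate.
            if (INVERSE[relation], source) not in links.get(target, set()):
                missing.append((source, relation, target))
    return sorted(missing)


NARRATIVE = "10.1038/nchem.1751"     # the article (ref. 8), used as a label
DATA = "10.6084/m9.figshare.744825"  # the data deposit (ref. 9), used as a label

if __name__ == "__main__":
    links = defaultdict(set)
    add_link(links, NARRATIVE, "References", DATA)  # the narrative cites the data
    print(missing_back_links(links))                # the data does not yet link back
    add_link(links, DATA, "IsReferencedBy", NARRATIVE)
    print(missing_back_links(links))                # []
```

A check of this kind could be run by either publisher: the narrative publisher verifying that the cited data links back, or the data publisher verifying that the back-link it holds is matched by a citation.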

The data itself can have two layers: a presentation layer[9], built using a combination of software (Jmol or JSmol for chemistry) that invokes the “raw” data, and that raw data itself, which is citable[10] (this is just a single example, resident as it happens on a different repository). The reader can choose to use just the presentation layer or the underlying data.

The data object can be embedded in other pages. The data sources for this table are themselves citable[11].



What are the advantages of such an approach? (the “what’s in it for me” question often asked by research students and their supervisors)

  1. Each of the components is held in an environment optimised for it and so can be presented to full advantage.
  2. The conventional narrative publisher does not necessarily also have to develop their own infrastructures for handling the data. They can choose to devolve that task to a “data publisher”.
  3. The data publisher (Figshare in this case) makes the data open. One does not need an institutional subscription to access it.
  4. “Added value” can be provided for each component separately. Thus most narrative publishers would not necessarily wish to develop infrastructures for validating the data or for subsequently mining such “big data”. Indeed, data mining of journals is prohibited by many publishers; it is simply either not possible or rendered so administratively difficult as to be impractical.
  5. Whilst a narrative article must clearly exist as a single instance (otherwise the authors would be accused of plagiarism), data can have multiple instances. Indeed, there exist protocols (SWORD) for moving data from one repository to another as the need arises. Publishing the same data in two or more locations is not currently considered plagiarism!
  6. The data component can be published as part of an article or say as part of a PhD thesis. This way, the creator of the data gets the advantages not of a date stamp associated with a narrative citation but of a much earlier stamp associated more closely with the actual creation of the data. That could easily and usefully resolve many disputes about who discovered what first, leaving the other issue of who interpreted what first to the narrative. I should mention that it is perfectly possible to “embargo” the data deposition so that it only becomes public when the narrative does (although you may choose not to do this).
  7. A data deposition cannot be modified, but a new version (which bidirectionally links back to the old one) can be published if say more data is collected at a future date.
  8. A whole infrastructure devoted just to enhancing the cited data can evolve; one that is unlikely to do so if the narrative publishers are the only stakeholders. For example, synthetic procedural data can be tagged using the excellent chemical tagger.
  9. It is relatively simple (=cheap) to build a pre-processor for publishing data, which for a research student can act as an electronic laboratory notebook, holding metadata about the deposited/published data and the handles (DOIs) associated with each deposition. I have been using such an environment for about seven years now, for example as the e-notebook for this blog. Thus the task of preparing figures and tables for a publication (or a blog post) is greatly facilitated. The same system is also used by research students and undergraduates for their lab work.
  10. I have noted previously how e.g. Google Scholar identifies data citations along with article citations in constructing an individual research profile. A researcher could become known for their published data as well as their published narratives. Indeed, it seems likely that the person who acquires and publishes the data, i.e. the research student, would then get accolades directly, rather than all of them accruing to their supervisor.
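Point 7 (a deposition that cannot be modified, superseded by a linked new version) maps naturally onto an immutable record type. The sketch below uses frozen Python dataclasses for the deposits and keeps the version links in a separate relation set; the repository class and the 10.5555 identifiers are hypothetical, and the relation names IsNewVersionOf/IsPreviousVersionOf are borrowed from the DataCite vocabulary rather than from any Figshare or DSpace API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    """An immutable data deposition: once published, its DOI and files never change."""
    doi: str
    files: tuple[str, ...]


class Repository:
    """Toy in-memory repository; all identifiers used with it are hypothetical."""

    def __init__(self) -> None:
        self.deposits: dict[str, Deposit] = {}
        # Typed links stored apart from the frozen deposits: (source, relationType, target).
        self.relations: set[tuple[str, str, str]] = set()

    def publish(self, doi: str, files: tuple[str, ...]) -> Deposit:
        """Publish a new deposition; an existing DOI cannot be overwritten."""
        if doi in self.deposits:
            raise ValueError(f"{doi} is already published and cannot be modified")
        deposit = Deposit(doi, files)
        self.deposits[doi] = deposit
        return deposit

    def new_version(self, old_doi: str, new_doi: str, extra_files: tuple[str, ...]) -> Deposit:
        """Publish a new version (old files plus new ones) and link the two both ways."""
        old = self.deposits[old_doi]
        new = self.publish(new_doi, old.files + extra_files)
        self.relations.add((new_doi, "IsNewVersionOf", old_doi))
        self.relations.add((old_doi, "IsPreviousVersionOf", new_doi))
        return new


if __name__ == "__main__":
    repo = Repository()
    repo.publish("10.5555/nmr-set.v1", ("spectrum-1.jdx",))
    repo.new_version("10.5555/nmr-set.v1", "10.5555/nmr-set.v2", ("spectrum-2.jdx",))
    for link in sorted(repo.relations):
        print(link)
```

Keeping the relations outside the frozen records mirrors the distinction drawn in point 7: the deposited data is fixed, while the metadata describing how versions relate to each other can grow.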

But what can you, gentle reader of this blog, do to help? Well, ask if your institution already has, or plans to create a data repository. It can be local (we use DSpace) or “in-the-cloud” (e.g. Figshare). If not, ask why not! And if you are planning to submit an article for publication in the near future, ponder how you might better share its data.


As first circulated on 28 April 2011. See http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

The example given at the start of this post[8] contains only one table processed in this manner; the actual synthetic procedures are still held in more conventional SI.

*This blog uses the excellent Kcite plugin to manage citations.

The good folks at Figshare were extremely helpful in converting this deposition into an interactive presentation. Thanks guys!


References

  1. O. Casher, G.K. Chandramohan, M.J. Hargreaves, C. Leach, P. Murray-Rust, H.S. Rzepa, R. Sayle, and B.J. Whitaker, "Hyperactive molecules and the World-Wide-Web information system", Journal of the Chemical Society, Perkin Transactions 2, pp. 7, 1995. https://doi.org/10.1039/p29950000007
  2. R. Van Noorden, "Data-sharing: Everything on display", Nature, vol. 500, pp. 243-245, 2013. https://doi.org/10.1038/nj7461-243a
  3. P. Murray-Rust, H.S. Rzepa, and M. Wright, "Development of chemical markup language (CML) as a system for handling complex chemical content", New Journal of Chemistry, vol. 25, pp. 618-634, 2001. https://doi.org/10.1039/b008780g
  4. H.S. Rzepa, "Chemical datuments as scientific enablers", Journal of Cheminformatics, vol. 5, 2013. https://doi.org/10.1186/1758-2946-5-6
  5. H.S. Rzepa, "Transclusions of data into articles", 2013. https://doi.org/10.6084/m9.figshare.797481
  6. H.S. Rzepa, "The importance of being bonded", Nature Chemistry, vol. 1, pp. 510-512, 2009. https://doi.org/10.1038/nchem.373
  7. H.S. Rzepa, "The rational design of helium bonds", Nature Chemistry, vol. 2, pp. 390-393, 2010. https://doi.org/10.1038/nchem.596
  8. M.J. Cowley, V. Huch, H.S. Rzepa, and D. Scheschkewitz, "Equilibrium between a cyclotrisilene and an isolable base adduct of a disilenyl silylene", Nature Chemistry, vol. 5, pp. 876-879, 2013. https://doi.org/10.1038/nchem.1751
  9. D. Scheschkewitz, M.J. Cowley, V. Huch, and H.S. Rzepa, "The Vinylcarbene – Cyclopropene Equilibrium of Silicon: an Isolable Disilenyl Silylene", 2013. https://doi.org/10.6084/m9.figshare.744825
  10. H.S. Rzepa, "Gaussian Job Archive for C60H92Si3", 2012. https://doi.org/10.6084/m9.figshare.96410