Posts Tagged ‘editor’

Two stories about Open Peer Review (OPR), the next stage in Open Access (OA).

Thursday, October 5th, 2017

We have heard a lot about Open Access (OA) to journal articles over the last five years, often in association with the APC (Article Processing Charge) model of funding such access. Rather less discussed is how the peer review of these articles might also evolve into an open environment. Here I muse about two experiences I had recently.

Organising the peer review of journal articles is now often seen as the single most important activity a journal publisher can undertake on behalf of the scientific community; the very reputation of the journal depends on the process being conducted responsibly, thoroughly and with integrity by the selected reviewers. Reviewers undertake this task voluntarily, mostly anonymously, without remuneration or recognition and often with short deadlines. After one such review, I recently received an interesting follow-up email from the journal suggesting that I register my activity with Publons.com, a site set up to record reviewing activity and give non-anonymous credit for it. I should say that Publons is a commercial company, set up in 2012 to “address the static state of peer-reviewing practices in scholarly communication, with a view to encourage collaboration and speed up scientific development”. Worthy aims, but as with many a .com company nowadays, one might ask what the back-story is. Many of the Internet giants (Google, Facebook, Twitter etc.) do have back-stories, which often underpin their business models but may only emerge years after their founding. With only a hazy idea of what Publons’ back-story might be, I went ahead and registered my reviewing activity.

Having done so, I then accessed my entry. It shows only that I have reviewed for a particular journal, and nothing about the review process itself. I did not really think that this experiment had done much to encourage collaboration or speed up scientific development. It might, however, be useful for early-career researchers wanting to get their names known.

I can almost understand why the review itself might not be publicly displayed, but as a result you learn nothing about the factual basis of the review and whether it might have been conducted responsibly, thoroughly and with integrity. Instead, I now suspect that the presence of my name on this site might merely encourage other publishers to deluge me with requests for further (freely donated) refereeing.

Discussing this at lunch, a colleague (thanks Ed!) reminded me of a venerable journal called Organic Syntheses. Here, authors submit a synthetic procedure and openly identified “checkers” are invited to repeat the procedure and comment on it. The two roles are kept separate (i.e. the checkers do not become co-authors), but the checkers do get credit for their activity. Thus if you view a typical recent entry[1] you will see a full biography and affiliation of the checkers given at the end, with footnotes often describing their own observations where these differ from those of the authors.

This set me thinking whether an open peer review process might also contain such an element of checking, as well as informed comment, nay opinion, about the article itself and the conclusions it makes. The opportunity arose when I was contacted by an author who was about to submit a computational article to a journal. This journal allowed open peer review. If I agreed to review, my name would be attached to the article if accepted for publication. I undertook this on the basis that I would use this review to conduct some limited checking of the computations and other assumptions underpinning the conclusions in the submitted article. I also wanted this open process to include the data on which my review was based. Most importantly if anyone wished to replicate my replication, the barriers to doing so should be as low as is possible. Shortly thereafter, I received a formal invitation from the journal and I set about my task. Crucially, all my own calculations supporting the review were archived in a data repository, albeit under embargo. In my cover letter I included the DOI for my data and the embargo access code, so that the authors (and the editor of the journal if they so wished) could inspect the data against which I wrote my review.

Then followed the standard procedure: the authors took my comments into consideration, revised the article, and the final version was accepted and published.[2] You will find the two referees/checkers listed, although unlike Organic Syntheses, there is no biographical information about them or their affiliations. I did ask the journal if they could at least link my ORCID identifier to my name, but that request was refused. Had my name been a common one, disambiguating it into a unique identity could have been a challenge. There was also no mechanism to associate my identity on the journal with any data on which I had based my review. Really, the only open aspect of this process was my (potentially ambiguous) name, nothing else. No follow-up was received from the journal suggesting I add the review to Publons.

The next stage was to contact the author who had originally set the process under way to ask whether they would mind my releasing the data on which my review had been based. They agreed, as they also did to my telling this story. The overall outcome is thus a published article with the reviewers (if not their reviews or any supporting evidence for them) openly named. In this specific case, there is also an open dataset with a formal link back to the article in the form of a DOI (10.14469/hpc/2640; I suspect this aspect is unique, even precedent-setting), but one driven by the reviewer and not the journal. It would be nice to have bidirectional links between the article and the review data, but I do not know of any publishers currently operating such a mechanism (if anyone knows of one, please tell).

Now to the broader questions about the process described above. I think that the aspiration to encourage collaboration and speed up scientific development may indeed have been promoted by this association between the article and the data assembled by the reviewer. Whether the final article was improved as a result of the processes described here I will leave to the authors to comment on, if they wish. As with the checkers employed by Organic Syntheses, such a review process takes not just time, but resources; resources that currently have to be freely donated by the reviewers and their host institutions, and which therefore cannot be allowed to become expensive, time-consuming or onerous. As it happens, that was not the case here; my having sufficient expertise meant that I could perform the tasks I undertook really quite quickly.

I will raise one more issue: whether to add my review to the dataset which is now openly available. In fact it is not included, in part because it related to the initially submitted version of the manuscript. The final version has been revised, and so many of the comments in my review may only make sense if you have the first version to hand. It would perhaps be unreasonable to make first drafts of manuscripts routinely available (although historians of science would probably love that!) alongside the reviews of those drafts. But I could also see a case for doing so if the community agreed to it. One to discuss for the future, I think. There is also the associated issue of what should happen to any dataset associated with a review if the article is ultimately rejected. Should the data remain permanently under embargo and the reviewer’s identity permanently anonymous? Perhaps opening up even such datasets might nevertheless encourage collaboration and speed up scientific development, but I fancy some would consider that a step too far!

References

  1. J. Zhu, "Preparation of N-Trifluoromethylthiosaccharin: A Shelf-Stable Electrophilic Reagent for Trifluoromethylthiolation", Organic Syntheses, vol. 94, pp. 217-233, 2017. https://doi.org/10.15227/orgsyn.094.0217
  2. L. Li, M. Lei, Y. Xie, H.F. Schaefer, B. Chen, and R. Hoffmann, "Stabilizing a different cyclooctatetraene stereoisomer", Proceedings of the National Academy of Sciences, vol. 114, pp. 9803-9808, 2017. https://doi.org/10.1073/pnas.1709586114

The challenges in curating research data: one case study.

Friday, April 28th, 2017

Research data (and its management) is rapidly emerging as a focal point for the development of research dissemination practices. An important aspect of ensuring that such data remains fit for purpose is identifying what curation activities need to be associated with it. Here I revisit one particular case study associated with the molecular structure of a product identified from a photolysis reaction[1] and the curation of the crystallographic data associated with this study.

This particular dataset (CSD, dataDOI: 10.5517/cctnx5j) is associated with an article entitled “Single-Crystal X-ray Structure of 1,3-Dimethylcyclobutadiene by Confinement in a Crystalline Matrix”.[1] Crystal structure data supporting a research article must be deposited (at least in part) in the Cambridge Structural Database (CSD), where this entry has the internal reference MUWMEX, and a significant level of curation is performed on such depositions. Although the definition of the term curation has evolved over the last few years, here I take it to include the following:

  1. Identification of appropriate metadata describing the data. For molecules, this would include any identifiers such as the name of the molecule and the connectivities of the atoms constituting that molecule.
  2. The submission of this metadata to a suitable aggregator, e.g. DataCite, and its inclusion in any other databases associated with the data. These two steps address the F (findable) and A (accessible) of the FAIR data principles.[2]
  3. Performing any validation tests that can be identified for the data. For crystal structure data in CIF format, these are defined by the utility checkCIF, which helps to ensure the I (interoperable) of FAIR. The R (re-usable) refers in part to the licenses under which the data can be re-used.
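Steps 1 and 2 above amount to assembling a minimal set of descriptive metadata before a DOI can be registered. As a rough sketch, the check below tests a record for the six properties that the DataCite Metadata Schema lists as mandatory (identifier, creator, title, publisher, publication year and resource type). The flat JavaScript object and its key names are a simplified, hypothetical shape, not the actual DataCite API format.

```javascript
// Mandatory DataCite properties, written as hypothetical flat keys of a
// plain object (an illustrative shape, not the DataCite JSON/XML format).
const MANDATORY = ["identifier", "creators", "titles", "publisher", "publicationYear", "resourceType"];

// Return the mandatory properties that are missing or empty in a record.
function missingMandatory(record) {
  return MANDATORY.filter((key) => {
    const value = record[key];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value)) return value.length === 0; // e.g. no creators listed
    return String(value).trim() === "";                  // e.g. blank publisher
  });
}

// Usage with a placeholder record (the DOI below is not a real dataset):
const draft = { identifier: "10.1234/example", creators: ["A. Author"], titles: [], publisher: "", publicationYear: 2017 };
console.log(missingMandatory(draft));
```

A curator would fix whatever is reported before registering the record; real DataCite deposition also validates formats (e.g. of the DOI), which this sketch does not attempt.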

On (it has to be said, rare) occasions, these procedures can reveal a disparity between the authors’ conclusions, arrived at on the basis of their acquired data, and the metadata identified by the independent curators. In this case study the difference is most obviously illustrated by the chemical names assigned during curation to the structure represented by the data in the CSD:

  • chemical name: “tetrakis(Guanidinium) 25,26,27,28-tetrahydroxycalix(4)arene-5,11,17,23-tetrasulfonate 1,5-dimethyl-2-oxabicyclo[2.2.0]hex-5-en-3-one clathrate trihydrate”
  • chemical name synonym: “tetrakis(Guanidinium) tetra-p-sulfocalix(4)arene 1,3-dimethylcyclobutadiene carbon dioxide clathrate trihydrate”.

Only the synonym agrees with the title given by the original authors in their publication.[1] One might indeed argue strongly that these two names are not in fact synonyms, since they refer to quite different chemical structures with different atom connectivities. A search of the database for the substructure corresponding to 1,3-dimethylcyclobutadiene does not reveal any hits, and so the structure implied by this synonym is not recorded in the CSD search index.

I asked the scientific editors of the CSD for some guidance on the curation procedures applied to crystal structure datasets and they have kindly allowed me to quote some of this.

  1. “In cases such as this, we as editors are sometimes faced with conflicting information and have to try our best to strike a balance between the data presented in the CIF, a published interpretation and our knowledge based on the information already in the CSD”.
  2. “In areas where there is a particular conflict between these, we often would include a comment (usually in the Remarks or Disorder field as appropriate)”. For this particular dataset, one finds the following under the Disorder field:
    • “Under UV radiation the clathrated pyrone molecule converts to a disordered mixture of square-planar 1, 3-dimethylcyclobutadiene and rectangular-bent 1, 3-dimethylcyclobutadiene in van der Waals contact with a carbon dioxide molecule. The ratio of the square-planar to rectangular-bent 1, 3-dimethylcyclobutadiene clathrate is modelled with occupancies 0.6292:0.3708”.
    • It is not entirely obvious, however, whether this last comment originates from the original authors or from the data curators. Nor does it resolve the difference between the assigned chemical name and the chemical name synonym.
  3. “In the case of MUWMEX, I think that the editor produced a diagram (below) which seems chemically reasonable based on the crystallographic data with which we were provided and tried to cover the situation regarding disorder, van der Waals contacts etc in the ‘Disorder’ field. At this point, it is left to the CSD user to decide for themselves.”

We have arrived at a point where the CSD user must indeed decide what species this dataset actually describes. Ideally, the best recourse would be to acquire the original data in full and repeat the crystallographic analysis. Such re-analysis is not part of current curation processes, and it would require, as a minimum, that the more complete hkl (reflection) information be present in the dataset. Again, to quote the CSD scientific editors:

  1. “With regard to your question: Is there any mechanism in the Conquest search to identify structures where the hkl information is present? I understand that it is not currently possible to do this in ConQuest. It is, however, possible … to access structure factor data (where available) using Access Structures.”

For MUWMEX, the hkl information is not present in the CSD dataset; in 2010, when the structure was published, it would have had to be obtained directly from the authors. By 2016, however, its inclusion in deposited datasets was becoming far more common. It is worth pointing out that even the hkl information is not the complete data recorded for the experiment. That is represented by the original image files recording the X-ray diffraction patterns, and these are hardly ever available as FAIR data even nowadays.
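Whether a deposited CIF actually carries reflection data can be checked crudely by looking for the relevant data names. A minimal sketch, assuming the CIF is available as a text string: `_refln_index_h` and `_diffrn_refln_index_h` are the CIF data names for looped lists of (merged and raw) reflections, and `_shelx_hkl_file` is the name recent versions of SHELXL use to embed the whole .hkl file. This is a text-matching heuristic, not a substitute for a proper CIF parser.

```javascript
// Data names whose presence suggests that reflection (hkl) data is included.
const HKL_TAGS = ["_refln_index_h", "_diffrn_refln_index_h", "_shelx_hkl_file"];

// Heuristic: report which of these data names start a line of the CIF text.
// CIF data names are case-insensitive, so the match ignores case ("i" flag),
// and "m" makes ^ match at the start of every line.
function hklTagsPresent(cifText) {
  return HKL_TAGS.filter((tag) => new RegExp("^\\s*" + tag + "\\b", "im").test(cifText));
}

// Usage with a tiny made-up CIF fragment containing a reflection loop:
const fragment = "data_x\n_cell_length_a 10.0\nloop_\n_refln_index_h\n_refln_index_k\n_refln_index_l\n1 0 0\n";
console.log(hklTagsPresent(fragment));
```

An empty result would suggest that, as for MUWMEX, the reflection data has to be sought elsewhere.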

I hope I have illustrated here at least some of the challenges of curating scientific data, and the issues that can arise when derived metadata (in this case the name and the atom connectivities of a molecule) conflict with the original interpretations. And this is in an area of chemistry where both data deposition and curation are very mature, having operated for ~52 years now. Curation still requires the intervention of skilled curators, but perhaps even more importantly this case reveals the need to record even more strictly the provenance of each interpretation. Should CSD curation stop at the stage of teasing out and flagging inconsistencies, leaving the user to take over and resolve the conflicts? Or should it be more active, re-analysing the data for each entry where conflicts have been detected? Perhaps the latter is not practical now, but it might be in the near future. What is certain is that with the increasing availability of FAIR data these sorts of issues will increasingly come to the fore, and not just for the very well understood case of crystallographic data but for many other types of data too.

References

  1. Y. Legrand, A. van der Lee, and M. Barboiu, "Single-Crystal X-ray Structure of 1,3-Dimethylcyclobutadiene by Confinement in a Crystalline Matrix", Science, vol. 329, pp. 299-302, 2010. https://doi.org/10.1126/science.1188002
  2. M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L.B. da Silva Santos, P.E. Bourne, J. Bouwman, A.J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C.T. Evelo, R. Finkers, A. Gonzalez-Beltran, A.J. Gray, P. Groth, C. Goble, J.S. Grethe, J. Heringa, P.A. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S.J. Lusher, M.E. Martone, A. Mons, A.L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M.A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons, "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, vol. 3, 2016. https://doi.org/10.1038/sdata.2016.18

A wider look at π-complex metal-alkene (and alkyne) compounds.

Monday, June 13th, 2016

Previously, I looked at the historic origins of the so-called π-complex theory of metal-alkene complexes. Here I follow this up with some data mining of the crystal structure database for such structures.

Alkene-metal "π-complexes" have what might be called a representational problem; they do not fit happily into the standard Lewis model of using lines connecting atoms to represent electron pairs. Structure 1 was the original representation used by Dewar, intended to convey partial back-donation from a filled metal orbital into the empty π* orbital of the alkene. At the other extreme these compounds can be called metallacyclopropanes (2), in which only single bonds feature (these can be thought of as representing full back-bonding from metal to alkene and full forward bonding from alkene to metal). Representations 3 and 4 are a fuzzier blend of these, implying some sort of partial bond order for the metal-carbon bonds. Taken together, they imply that the formal order of the C-C bond might vary from single to double. Structures 1 and 2 in particular suggest that there might be two distinct ways of arranging the bonding, and that π-complexes and metallacyclopropanes might therefore be distinct valence-bond isomers, each potentially capable of separate existence.

Why do these representations matter? Well, I am going to mine the crystal structure database for these species to see if there is any evidence of a bimodal distribution in the C-C lengths, which might indicate the isomerism suggested above. Such a structural database is indexed first by atom-pair connectivity and then by bond type; one can specify the following types of bond connecting any two atoms: single, double, triple, quadruple, polymeric, delocalised, pi and any. It is not entirely obvious which, if any, of these types apply to structure 1 (it is not possible to draw a bond ending at the mid-point of another bond using the ConQuest structure editor); the dashed lines in structures 3 and 4 could be classed as delocalised, pi or, most generally, any. In the search query, the two carbons each carry a group R, which can be either H or C, and all four C-R bonds are specified as acyclic (to try to avoid complications by excluding cyclic species such as metallocenes). Because representation 1 cannot be constructed in the editor, I specify in the first instance that each carbon carries four bonds of any type. The torsion specified is defined as R-C-C-M, and the full queries can be found deposited here.[1]
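The test being applied here, whether a histogram of C-C lengths shows one peak or two, can be sketched in a few lines. This is a minimal illustration, assuming the bond lengths have already been exported from a ConQuest search as a plain array of numbers in Å; the bin width and the simple local-maximum rule are arbitrary choices of mine, and real, noisy histograms would need smoothing before counting peaks.

```javascript
// Count bond lengths (in Å) falling into fixed-width bins.
// Values outside [min, max) are ignored.
function histogram(lengths, min, max, binWidth) {
  const nBins = Math.ceil((max - min) / binWidth);
  const counts = new Array(nBins).fill(0);
  for (const d of lengths) {
    if (d < min || d >= max) continue;
    // Guard against floating-point rounding pushing a value past the last bin.
    const i = Math.min(nBins - 1, Math.floor((d - min) / binWidth));
    counts[i] += 1;
  }
  return counts;
}

// Count local maxima: a bin is a peak if it is strictly higher than its left
// neighbour and at least as high as its right one (edges count as zero),
// so a flat-topped peak is counted only once.
function countPeaks(counts) {
  let peaks = 0;
  for (let i = 0; i < counts.length; i++) {
    const left = i > 0 ? counts[i - 1] : 0;
    const right = i < counts.length - 1 ? counts[i + 1] : 0;
    if (counts[i] > left && counts[i] >= right) peaks += 1;
  }
  return peaks;
}

// Usage with made-up values (not CSD data): one cluster near 1.4 Å.
const made_up = [1.36, 1.37, 1.41, 1.42, 1.43, 1.44, 1.46, 1.47];
console.log(countPeaks(histogram(made_up, 1.2, 1.6, 0.05)));
```

A count of 1 corresponds to the monomodal distributions reported below; a count of 2 would be the signature of two distinct valence-bond isomers.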

If the metallacyclopropane representation 2 is defined with explicit single bonds, one gets only 22 hits (no errors, no disorder, R < 0.1). The distribution of C-C bond lengths is shown below. Already one sees a representational problem emerging. A true metallacyclopropane might be expected to show a C-C single bond length, say > ~1.5Å. But only one or two of these examples actually have this value, the most probable value being ~1.4Å.

Using representation 3, one gets 1861 hits, but as before one sees a maximum at ~1.4Å with a tail reaching to both single and double bond values for the C-C distance.

If the C-C bond is also specified as "any", the hits increase to 3948, but the bond length distribution is still very similar, with no sign of any bimodal distribution.

A bimodal distribution is, however, found if the torsions between the R-C bond vector and the C-M bond vector are plotted (for all types of bond). A large number of the complexes have a torsion <90°, which suggests that the substituent R is in fact probably interacting with the metal (even though this would lead to formal cyclicity, specifying R-C as acyclic does not detect this interaction). Could this be masking a bimodal distribution in the C-C lengths?

If the previous search is repeated, but this time specifying that all four torsions must lie in the range 90-180° (the range expected for a "classical" alkene-metal complex, which selects only the top right-hand cluster in the plot above), a reduced set of 1051 hits is obtained, but the monomodal distribution remains.
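The torsion restriction described above can also be applied as a simple filter over exported hits. A sketch, assuming a hypothetical record shape holding the C-C distance and the four signed R-C-C-M torsions; since torsions are normally reported in the range -180° to +180°, the absolute value is what is tested against 90-180°.

```javascript
// Hypothetical record shape for one exported hit:
//   { cc: C-C distance in Å, torsions: [four R-C-C-M torsions in degrees] }
// Keep only hits whose four torsions all lie in the "classical" range,
// comparing absolute values because the torsions are signed.
function classicalHits(hits, lo = 90, hi = 180) {
  return hits.filter(
    (h) => h.torsions.length === 4 &&
      h.torsions.every((t) => Math.abs(t) >= lo && Math.abs(t) <= hi)
  );
}

// Usage with made-up records: the second has one torsion below 90°.
const hits = [
  { cc: 1.41, torsions: [100, -120, 150, -170] },
  { cc: 1.44, torsions: [60, 120, 150, 170] },
  { cc: 1.39, torsions: [95, -95, 180, -180] },
];
console.log(classicalHits(hits).map((h) => h.cc));
```

The surviving `cc` values are what would then be histogrammed.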

For this last set, a plot of the two C-metal bond lengths against each other, with colour indicating the C-C bond length, shows that the two C-metal bonds are clearly linearly correlated.
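The strength of such a linear relationship can be summarised by Pearson's correlation coefficient r, which is +1 for a perfect positive linear relationship and -1 for a perfect negative one. A minimal sketch, assuming the two sets of C-metal distances are available as equal-length arrays:

```javascript
// Pearson correlation coefficient between two equal-length arrays of numbers.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of equal length with at least 2 points");
  }
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN if either set has zero variance
}
```

For the real data one would pass the first and second C-M distances of each hit as `xs` and `ys`; values of r close to +1 would quantify the "clearly linearly correlated" description.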

One final variation: the substituent on either C is restricted to H or a 4-coordinate (sp3) carbon, giving 645 hits. Again, the distribution is monomodal, centred at ~1.4Å.

So this foray through metal-alkene complexes suggests that there is a continuum between the formal metallacyclopropane with a C-C single bond and the only slightly perturbed alkene-metal complex with a C=C double bond. Whilst this would not prevent any one of these compounds from existing as two distinctly different valence-bond isomers, it makes this very unlikely. By contrast, I had noted in an earlier post that for molecules of the type RX≡XR (X=Si, Ge, Sn, Pb) there was indeed a clear bimodal distribution of the X-X lengths in the crystal structures (albeit for a relatively small sample). The structures 1-4 shown at the start of this post are simply variations along a continuum, and not distinct isomers.

POSTSCRIPT: I noted above the bimodal distribution found for compounds with formal triple bonds, so I repeated the search above for metal-alkyne π-complexes. Specifying an acyclic C-R bond, and "any" for the C-C bond type, one gets the following.

There is now a tantalizing suggestion of two clusters, one at ~1.3Å and another at ~1.4Å. The torsional distribution shows that the longer distance appears to be associated with much smaller torsions, whereas the top right cluster is associated with shorter lengths.

If the torsions are restricted to the range 90-180°, then the histogram loses the smaller cluster, and perhaps gains a second cluster at 1.22Å? As I said, all quite tantalizing!


The tail in all the histograms extends into the 1.1-1.3Å region, which seems unreasonable for a carbon where four bonds are specified. This region probably represents errors in the crystallographic analysis or reporting. But who knows, perhaps some very unusual compounds are lurking there!

 

References

  1. H. Rzepa, "A wider look at the π-complex theory of metal-alkene compounds.", 2016. https://doi.org/10.14469/hpc/642

Jmol and WordPress: Loading 3D molecular models, molecular isosurfaces and molecular vibrations into a blog

Saturday, April 12th, 2008

A lemniscular molecular orbital

Click on the static image to get an active model. The code used to obtain the above was:

  1. First load the Jmol JavaScript library. This is best done by editing the theme header file /wp-content/themes/default/header.php to add the following line in the header:

    <script src="../Jmol/Jmol.js" type="text/javascript"></script>

  2. Then add the static image to the post, with an onclick handler that replaces it with the Jmol applet:

    <img onclick="jmolInitialize('../Jmol/','JmolAppletSigned.jar');jmolSetAppletColor('yellow');
    jmolApplet([500,500],'load http://www.ch.imperial.ac.uk/rzepa/blog/wp-content/uploads/2009/08/HV2-62.jvxl;isosurface translucent;zoom 5;moveto 4 0 2 0 90 70;');"  alt="A lemniscular molecular orbital" src="http://www.ch.imperial.ac.uk/rzepa/blog/wp-content/uploads/2008/04/14-knot.jpg" />

    where of course the uploads directory needs to be changed to correspond to your own content, and the file and the script following it changed to achieve the effect you want.

The path wp-content/uploads/2009/08/ is the one created by the built-in editor's Add media file-upload mechanism. The Jmol directory is located at the level above that of the blog itself. The JVXL file is created either from the corresponding (Gaussian) output file or from a cube (.cub) file created using a program such as GaussView. Any suitable surface can be displayed using JVXL. In addition to MOs, we have also displayed ELF (electron localization function) isosurfaces and molecular vibrations. For the latter, use a script of the form

'load wp-content/uploads/2008/04/vibration.log; frame 9; vectors on;vectors 4;vectors scale 5.0; color vectors green; vibration 10;animation mode loop;'

where frame 9 is, for example, the frame containing the vibration you want.
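Since usually only the file name and frame number change from one molecule to the next, the script string can be generated rather than edited by hand. A small sketch; the helper name and its parameters are hypothetical, and the Jmol commands are exactly those of the script above (only the optional spaces after the semicolons differ).

```javascript
// Hypothetical helper: build the Jmol script that animates one vibrational
// mode. `file` is the uploaded output file, `frame` the frame holding the mode.
function vibrationScript(file, frame) {
  return [
    `load ${file}`,
    `frame ${frame}`,
    "vectors on",
    "vectors 4",
    "vectors scale 5.0",
    "color vectors green",
    "vibration 10",
    "animation mode loop",
  ].join(";") + ";";
}

console.log(vibrationScript("wp-content/uploads/2008/04/vibration.log", 9));
```

The resulting string can be passed as the script argument of jmolApplet, in the same way as in the onclick handler above.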

There does appear to be a display bug with the above; the Jmol model replaces the window rather than being inlined in it. Once the model is displayed, just refresh the page to return to the blog entry.

A recent addition is the display of non-covalent-interaction (NCI) surfaces, which are colour coded by using the values in one cube of points to colorize a second cube.

'load wp-content/uploads/2011/05/isobornyl1.xyz;isosurface wp-content/uploads/2011/05/isobornyl1.jvxl colorscheme translucent bgyor;'
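If, as in this example, the coordinates and the NCI surface are uploaded with the same base name (my assumption, not a Jmol requirement), the script can be generated from that base name. A sketch with a hypothetical helper that reproduces the command above:

```javascript
// Hypothetical helper: `base` is the upload path without extension; the .xyz
// holds the coordinates and the .jvxl the NCI surface. `scheme` is the Jmol
// colour scheme used in the script above.
function nciScript(base, scheme = "bgyor") {
  return `load ${base}.xyz;isosurface ${base}.jvxl colorscheme translucent ${scheme};`;
}

console.log(nciScript("wp-content/uploads/2011/05/isobornyl1"));
```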