Nowadays, the data supporting most publications on the synthesis of organic compounds are more likely than not to be found in the associated “supporting information” rather than in the (often page-limited) article itself. For example, this article[1] has an SI that runs to 907 pages; almost a mini-database in its own right! Here I ponder whether such dissemination of data is FAIR (Findable, Accessible, Interoperable and Re-usable).[2]
I am going to use this article as my starting point.[3] One of the compounds it reports is shown below; it is not explicitly discussed in the main body of the article. So how findable is it?
- A search of SciFinder (Chemical Abstracts) using the structure above reveals one hit, the source being the expected one.[3]
- A search of Reaxys (formerly Beilstein) reveals no hits in its own database, but one hit is noted in …
- PubChem, where it occurs as substance 163835830. The source is again cited correctly.[3] One of the properties reported is the InChI key: JSLVVAICXSKSEQ-UHFFFAOYSA-N. This is the same key as that generated by the structure-drawing programs ChemDraw or ChemDoodle.
- Google, on the other hand, finds nothing for JSLVVAICXSKSEQ-UHFFFAOYSA-N.[4]
- I also tried Google Scholar but again with no luck.
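The InChI key search described in the list above can also be scripted. What follows is only a minimal sketch, not the procedure used for this post: it checks that a string has the layout of a standard InChI key (14 letters, 10 letters, 1 letter, separated by hyphens) and builds a lookup URL for the PubChem PUG REST service. The function names are my own illustrative choices, and the URL targets PubChem's compound records, whereas the hit reported above was a substance record, so I am assuming rather than asserting that this particular key resolves there.

```python
import re
from urllib.parse import quote

# Layout of a standard InChI key: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Base address of the PubChem PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def is_well_formed_inchikey(key: str) -> bool:
    """Return True if the string has the layout of an InChI key.

    This checks the shape only; it does not prove that the key
    corresponds to any real molecule.
    """
    return INCHIKEY_PATTERN.fullmatch(key) is not None


def pubchem_cid_lookup_url(key: str) -> str:
    """Build the PUG REST URL that asks for compound IDs matching a key.

    Illustrative helper (not part of any library). No network request
    is made here; the caller decides how to fetch the URL.
    """
    if not is_well_formed_inchikey(key):
        raise ValueError(f"not a well-formed InChI key: {key!r}")
    return f"{PUG_REST_BASE}/compound/inchikey/{quote(key)}/cids/JSON"


if __name__ == "__main__":
    print(pubchem_cid_lookup_url("JSLVVAICXSKSEQ-UHFFFAOYSA-N"))
```

Fetching the printed URL in a browser, or with any HTTP client, is then a separate step, which keeps the sketch usable without network access.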
So supporting information does appear to be indexed by both Chemical Abstracts and PubChem; it is thankfully not a graveyard![5] The chemical databases do return valuable additional information about the molecule, such as its InChI key and much else besides. Given that the open PubChem resource is presumably indexed by Google, there must be a policy somewhere that prevents e.g. JSLVVAICXSKSEQ-UHFFFAOYSA-N from being found.
I suppose the next question might be: “Supporting information: chemical graveyard or invaluable resource for chemical spectra?” I confess that this post was in fact inspired by a previous one on the topic of the provenance of NMR spectra, and perhaps also by the concept of sonification of spectra, in which an instrumental spectrum is converted into a sound signature to give blind people access to such information.‡ I wonder whether a sonified unique digital signature could be used to search for spectra, somewhat in the manner that the InChI key helped in tracking down (or not) the molecule above? I think it would be reasonable to say that NMR spectra embedded in, say, a 907-page supporting information document are likely to be very much less FAIR.[2] The solution there, of course, is better provenance and better metadata, as I previously mulled.
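To make the idea of a searchable spectral signature a little more concrete, here is a purely hypothetical sketch. It is not an existing standard and has nothing to do with how InChI keys are actually computed: it simply quantises a list of peak positions to a chosen tolerance, puts them in a canonical order, and hashes the result into a short fixed-length string that could be typed into a search engine. The function name, the 0.01 ppm tolerance and the 14-character length are all assumptions made for illustration.

```python
import hashlib


def spectral_signature(peaks_ppm, tolerance=0.01, length=14):
    """Reduce a peak list to a short, order-independent hash string.

    Hypothetical scheme for illustration only. Peaks are snapped to
    integer multiples of `tolerance`, de-duplicated and sorted, so the
    order in which peaks are listed does not affect the result.
    """
    # Snap each peak to an integer number of tolerance steps.
    bins = sorted({round(p / tolerance) for p in peaks_ppm})
    # Canonical text form of the quantised peak list.
    canonical = ",".join(str(b) for b in bins)
    # Hash the canonical form and keep a short prefix as the signature.
    digest = hashlib.sha256(canonical.encode("ascii")).hexdigest()
    return digest[:length].upper()


if __name__ == "__main__":
    # Invented peak values, listed in two different orders.
    print(spectral_signature([7.26, 2.17, 1.05]))
    print(spectral_signature([1.05, 7.26, 2.17]))
```

The obvious weakness is that a hash supports exact matching only: two measurements of the same compound whose peaks fall either side of a bin boundary would give unrelated signatures, which is one more reason why good metadata matters more than any clever key.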
‡I cannot help but wonder what a carbonyl group sounds like!
References
- J.M. Lopchuk, K. Fjelbye, Y. Kawamata, L.R. Malins, C. Pan, R. Gianatassio, J. Wang, L. Prieto, J. Bradow, T.A. Brandt, M.R. Collins, J. Elleraas, J. Ewanicki, W. Farrell, O.O. Fadeyi, G.M. Gallego, J.J. Mousseau, R. Oliver, N.W. Sach, J.K. Smith, J.E. Spangler, H. Zhu, J. Zhu, and P.S. Baran, "Strain-Release Heteroatom Functionalization: Development, Scope, and Stereospecificity", Journal of the American Chemical Society, vol. 139, pp. 3209-3226, 2017. https://doi.org/10.1021/jacs.6b13229
- M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L.B. da Silva Santos, P.E. Bourne, J. Bouwman, A.J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C.T. Evelo, R. Finkers, A. Gonzalez-Beltran, A.J. Gray, P. Groth, C. Goble, J.S. Grethe, J. Heringa, P.A. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S.J. Lusher, M.E. Martone, A. Mons, A.L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M.A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons, "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, vol. 3, 2016. https://doi.org/10.1038/sdata.2016.18
- G.M.S. Yip, Z. Chen, C.J. Edge, E.H. Smith, R. Dickinson, E. Hohenester, R.R. Townsend, K. Fuchs, W. Sieghart, A.S. Evers, and N.P. Franks, "A propofol binding site on mammalian GABAA receptors identified by photolabeling", Nature Chemical Biology, vol. 9, pp. 715-720, 2013. https://doi.org/10.1038/nchembio.1340
- S.J. Coles, N.E. Day, P. Murray-Rust, H.S. Rzepa, and Y. Zhang, "Enhancement of the chemical semantic web through the use of InChI identifiers", Organic & Biomolecular Chemistry, vol. 3, p. 1832, 2005. https://doi.org/10.1039/b502828k
- M. Karthikeyan and R. Vyas, "ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files", Journal of Cheminformatics, vol. 8, 2016. https://doi.org/10.1186/s13321-016-0175-x