{"id":27090,"date":"2024-06-11T14:55:01","date_gmt":"2024-06-11T13:55:01","guid":{"rendered":"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=27090"},"modified":"2024-06-11T14:55:01","modified_gmt":"2024-06-11T13:55:01","slug":"data-discoverability-as-a-feature-of-journal-articles","status":"publish","type":"post","link":"https:\/\/www.rzepa.net\/blog\/?p=27090","title":{"rendered":"Data Discoverability as a feature of Journal Articles."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"27090\">\n<p>I can remember a time when journal articles carried selected data within their body as <em>e.g.<\/em> Tables, Figures or Experimental procedures, with the rest consigned to a box of paper deposited (for UK journals) at the British Library. Then came ESI, or electronic supporting information. More recently, many journals have begun including what is called a &#8220;<strong>Data availability<\/strong>&#8221; statement at the end of an article, which often just cites the ESI but can increasingly point to so-called FAIR data. The latter is especially important in the new AI age (&#8220;FAIR is AI-Ready&#8221;). One attribute of FAIR data is that it can be associated with a DOI in addition to that assigned to the article itself, and we have been promoting the inclusion of that Data DOI in the citation list of the article.<span id=\"cite_ITEM-27090-0\" name=\"citation\"><a href=\"#ITEM-27090-0\">[1]<\/a><\/span> Since the data can also cite the article, a bidirectional link between data and article is established. ESI itself can exceed 1000 &#8220;pages&#8221; of a PDF document, and examples of chemical FAIR data exceeding 62 Gbytes<span id=\"cite_ITEM-27090-1\" name=\"citation\"><a href=\"#ITEM-27090-1\">[2]<\/a><\/span> (see also DOI: <a href=\"https:\/\/doi.org\/10.14469\/hpc\/10386\" target=\"_blank\" rel=\"noopener\">10.14469\/hpc\/10386<\/a>) are known. 
Finding the chemical needle in that data haystack can become a serious problem. So here I illustrate a recent suggestion for moving to the next stage, namely the inclusion of a &#8220;<strong>Data Availability and Discovery<\/strong>&#8221; statement. Below is the text of such a statement from a recently published article.<span id=\"cite_ITEM-27090-2\" name=\"citation\"><a href=\"#ITEM-27090-2\">[3]<\/a><\/span><\/p>\n<hr \/>\n<p><em><strong>Data availability and discovery statement<\/strong>. Available as a FAIR\u00a0and AI-ready data collection accessible via doi: <a href=\"https:\/\/doi.org\/10.14469\/hpc\/13058\">10.14469\/hpc\/13058<\/a> for the overall collection<sup>18<\/sup> and Findable by following the hierarchy of data collections identified there. The data discovery and accessibility aspects are further enabled by using one of the following methods.<\/em><\/p>\n<ul>\n<li><em>Discovery using a metadata search query. An example template for identifying kinetic isotope data deriving from a Gaussian calculation is here provided as: <a href=\"https:\/\/commons.datacite.org\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)\">https:\/\/commons.datacite.org\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)<\/a> This query can be modified or extended as required.<\/em><\/li>\n<li><em><span style=\"font-weight: 400;\">Discovery using an enhancement of Table 1,<\/span><sup style=\"font-weight: 400;\">55<\/sup><span style=\"font-weight: 400;\"> doi: <\/span><span style=\"font-weight: 400;\"><a href=\"https:\/\/doi.org\/10.14469\/hpc\/13370\">10.14469\/hpc\/13370<\/a><\/span><span style=\"font-weight: 400;\"> acting as a data Finding Aid via links to data collections associated with each row of Table 1. 
\u00a0<\/span><\/em><em><span style=\"font-weight: 400;\">A visual tool for displaying interactive 3D coordinate models illustrating the transition structure normal vibrational modes is interactively created by re-using data accessed and exploiting information in the metadata record associated with the FAIR doi of each dataset.<\/span><\/em><\/li>\n<\/ul>\n<p><a href=\"https:\/\/commons.datacite.org\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)\" target=\"_blank\" rel=\"noopener\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"27097\" data-permalink=\"https:\/\/www.rzepa.net\/blog\/?attachment_id=27097\" data-orig-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2024\/06\/Screenshot-328-scaled-1.jpg?fit=2560%2C1615&amp;ssl=1\" data-orig-size=\"2560,1615\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;Screenshot&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2024\/06\/Screenshot-328-scaled-1.jpg?fit=300%2C189&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2024\/06\/Screenshot-328-scaled-1.jpg?fit=450%2C284&amp;ssl=1\" class=\"size-large wp-image-27097\" src=\"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2024\/06\/Screenshot-328-1024x646.jpg?w=450&#038;ssl=1\" alt=\"\"  \/><\/a><\/p>\n<hr \/>\n<p>Many variations on the above search can be 
constructed.<span id=\"cite_ITEM-27090-3\" name=\"citation\"><a href=\"#ITEM-27090-3\">[4]<\/a><\/span> Note that the above syntax presents the results of the search in &#8220;human readable&#8221; form. For a machine-readable version, use either of the two forms below.<\/p>\n<ol>\n<li><tt><small><a href=\"https:\/\/api.datacite.org\/dois\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)\" target=\"new\" rel=\"noopener\">https:\/\/api.datacite.org\/dois\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)<\/a><\/small><\/tt><\/li>\n<li><tt><small>curl \"https:\/\/api.datacite.org\/dois\/?query=media.media_type:chemical\/x-gaussian-log+AND+media.media_type:text\/plain+AND+(titles.title:*Exo*+OR+titles.title:*Endo*)\"<\/small><\/tt><\/li>\n<\/ol>\n<p>These last forms emphasise that data discovery is aimed at machine automation as well as at humans.<\/p>\n<p>Finally, I ponder how machines will respond to articles containing references to such discoverability. Ideally, the machine-actionable information should itself be included in the (CrossRef) metadata describing the article. At the moment, that aspect is perhaps the weakest point of machine discoverability associated with journals.<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-27090-0\">H. Rzepa, \"The journey from Journal &quot;ESI&quot; to FAIR data objects: An eighteen year old (continuing) experiment.\", 2023. <a href=\"https:\/\/doi.org\/10.59350\/g2p77-78m14\">https:\/\/doi.org\/10.59350\/g2p77-78m14<\/a>\n\n<\/li>\n<li id=\"ITEM-27090-1\">T. Mies, A.J.P. White, H.S. Rzepa, L. 
Barluzzi, M. Devgan, R.A. Layfield, and A.G.M. Barrett, \"Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, and Actinide Complexes of Bidentate Acylpyrazolone Ligands\", <i>Inorganic Chemistry<\/i>, vol. 62, pp. 13253-13276, 2023. <a href=\"https:\/\/doi.org\/10.1021\/acs.inorgchem.3c01506\">https:\/\/doi.org\/10.1021\/acs.inorgchem.3c01506<\/a>\n\n<\/li>\n<li id=\"ITEM-27090-2\">D.C. Braddock, S. Lee, and H.S. Rzepa, \"Modelling kinetic isotope effects for Swern oxidation using DFT-based transition state theory\", <i>Digital Discovery<\/i>, vol. 3, pp. 1496-1508, 2024. <a href=\"https:\/\/doi.org\/10.1039\/d3dd00246b\">https:\/\/doi.org\/10.1039\/d3dd00246b<\/a>\n\n<\/li>\n<li id=\"ITEM-27090-3\">H. Rzepa, \"A cascading tutorial in finding rich NMR data using the Datacite datasearch engine.\", 2020. <a href=\"https:\/\/doi.org\/10.59350\/7jq8v-z4p56\">https:\/\/doi.org\/10.59350\/7jq8v-z4p56<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 27090 -->","protected":false},"excerpt":{"rendered":"<p>I can remember a time when journal articles carried selected data within their body as e.g. Tables, Figures or Experimental procedures, with the rest consigned to a box of paper deposited (for UK journals) at the British library. Then came ESI or electronic supporting information. 
Most recently, many journals are now including what is called [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[6],"tags":[],"class_list":["post-27090","post","type-post","status-publish","format-standard","hentry","category-interesting-chemistry"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPyz-72W","jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/27090","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=27090"}],"version-history":[{"count":0,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/27090\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=27090"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2
Fv2%2Fcategories&post=27090"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=27090"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}