{"id":22043,"date":"2020-04-07T11:04:44","date_gmt":"2020-04-07T10:04:44","guid":{"rendered":"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=22043"},"modified":"2020-04-07T11:04:44","modified_gmt":"2020-04-07T10:04:44","slug":"new-generations-of-globally-aggregating-search-engines-for-chemical-data","status":"publish","type":"post","link":"https:\/\/www.rzepa.net\/blog\/?p=22043","title":{"rendered":"New generations of globally aggregating search engines &#8211; for (chemical) data."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"22043\">\n<p>Chemists have long been familiar with search engines that aspire to index a large proportion of the chemical literature. Think, for example, of the old-generation (and commercial)\u00a0<a href=\"https:\/\/scifinder.cas.org\/\">SciFinder (Scholar)<\/a>\u00a0and <a href=\"https:\/\/www.reaxys.com\/\">Reaxys<\/a>\u00a0or those that arrived in the 2000s in the online era<sup>\u2021<\/sup> such as the non-commercial <a href=\"https:\/\/pubchem.ncbi.nlm.nih.gov\">PubChem<\/a> or <a href=\"https:\/\/www.chemspider.com\">ChemSpider<\/a> (there are more). But you may not be as familiar with the latest generation of global search engines, and here I will focus on three relatively new ones that specialise in tracking down data rather than just publications.<\/p>\n<p>I will illustrate first using a <i>regular<\/i> or <i>non-advanced<\/i> search. The keyword will be <strong>obtusallene<\/strong>, selected largely because it is a fairly distinctive string that is likely to produce few false positives. The obtusallenes are a family of marine natural products containing, unusually, bromine and\/or chlorine<span id=\"cite_ITEM-22043-0\" name=\"citation\"><a href=\"#ITEM-22043-0\">[1]<\/a><\/span>; the citation here is to a journal article describing some of their chemistry. 
But what if you want to find data associated with such molecules?<\/p>\n<ol>\n<li><strong>DataCite<\/strong> (the name gives a clue) specialises in finding data. It was launched ten years ago and has been rapidly expanding its index since. A regular search can be formulated using the query string:\n<ul>\n<li><a href=\"https:\/\/search.datacite.org\/works?query=obtusallene\">https:\/\/search.datacite.org\/works?query=obtusallene<\/a> (25 hits).<\/li>\n<li>What might be considered an advanced query constrains the search, here to occurrences of the phrase specifically in the <strong>title<\/strong> field of the information descriptor of the item (also called the metadata for the item):<br \/>\n<a href=\"https:\/\/search.datacite.org\/works?query=titles.title:*Obtusallene*\">https:\/\/search.datacite.org\/works?query=titles.title:*Obtusallene*<\/a> (24 hits)<\/li>\n<li>A variation on this would be to specify the <strong>description field<\/strong>:\u00a0<a href=\"https:\/\/search.datacite.org\/works?query=descriptions.description:*Obtusallene*\">https:\/\/search.datacite.org\/works?query=descriptions.description:*Obtusallene*<\/a> (3 hits)<\/li>\n<li>And these can be combined: <a href=\"https:\/\/search.datacite.org\/works?query=descriptions.description:*Obtusallene*+OR+titles.title:*Obtusallene*\">https:\/\/search.datacite.org\/works?query=descriptions.description:*Obtusallene*+OR+titles.title:*Obtusallene*<\/a>\u00a0(25 hits)<\/li>\n<\/ul>\n<p>As these three advanced queries imply, there are many more ways of constraining the search, which I will describe at a later time.<\/p><\/li>\n<li>A more recent introduction is <strong>Dataset Search<\/strong> from Google.\n<ul>\n<li><a href=\"https:\/\/datasetsearch.research.google.com\/search?query=obtusallene\">https:\/\/datasetsearch.research.google.com\/search?query=obtusallene<\/a> (20 hits). 
Google cites as its sources DataCite itself and the specific repository Figshare (for this search query).\u00a0<\/li>\n<li>This leaves a slight mystery. Whilst there is considerable overlap between the DataCite and Google searches, the latter should in principle be a superset of the former, but in fact it is slightly less comprehensive (by at least 5 hits). <a href=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"22051\" data-permalink=\"https:\/\/www.rzepa.net\/blog\/?attachment_id=22051\" data-orig-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?fit=1706%2C1672&amp;ssl=1\" data-orig-size=\"1706,1672\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"google\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?fit=300%2C294&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?fit=450%2C441&amp;ssl=1\" class=\"aligncenter size-large wp-image-22051\" src=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=450%2C441&#038;ssl=1\" alt=\"\" width=\"450\" height=\"441\" srcset=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=1024%2C1004&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=300%2C294&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=768%2C753&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=1536%2C1505&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?resize=50%2C50&amp;ssl=1 50w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?w=1706&amp;ssl=1 1706w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?w=900&amp;ssl=1 900w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/google.jpg?w=1350&amp;ssl=1 1350w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/li>\n<\/ul>\n<\/li>\n<li>My third new engine is <strong><a href=\"https:\/\/explore.openaire.eu\/search\/find\">OpenAIRE<\/a><\/strong> (a European project supporting Open Science). It is also the search engine provided by Zenodo.\n<ul>\n<li><a href=\"https:\/\/explore.openaire.eu\/search\/find?keyword=obtusallene\">https:\/\/explore.openaire.eu\/search\/find?keyword=obtusallene<\/a> (20 hits on research data, 6 hits on publications, 5 hits on &#8220;other research products&#8221; and zero hits on &#8220;software&#8221;).<\/li>\n<li>This introduces not just data but other concepts associated with &#8220;research objects&#8221;, clearly more useful than data alone. One of these may well shortly be <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=21960\">Instruments<\/a> (<em>e.g.<\/em> as used to acquire data) and another is, <em>e.g.<\/em>, the software used to analyse the data. 
<a href=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"22050\" data-permalink=\"https:\/\/www.rzepa.net\/blog\/?attachment_id=22050\" data-orig-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?fit=2201%2C1487&amp;ssl=1\" data-orig-size=\"2201,1487\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"openaire\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?fit=300%2C203&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?fit=450%2C304&amp;ssl=1\" class=\"aligncenter size-large wp-image-22050\" src=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=450%2C304&#038;ssl=1\" alt=\"\" width=\"450\" height=\"304\" srcset=\"https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=1024%2C692&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=300%2C203&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=768%2C519&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=1536%2C1038&amp;ssl=1 1536w, 
https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?resize=2048%2C1384&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?w=900&amp;ssl=1 900w, https:\/\/i0.wp.com\/www.rzepa.net\/blog\/wp-content\/uploads\/2020\/04\/openaire.jpg?w=1350&amp;ssl=1 1350w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>I think these new-generation search engines specialising in data have lots of exciting potential. They are still maturing and I hope we will see some interesting new capabilities emerge which we have not had before.<\/p>\n<hr \/>\n<p><sup>\u2021<\/sup>All are on-line nowadays, but engines such as SciFinder had two previous existences, from about 1980 as CAS online using merely a terminal interface, and prior to that as printed copies to be searched manually.<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-22043-0\">J. Clarke, K.J. Bonney, M. Yaqoob, S. Solanki, H.S. Rzepa, A.J.P. White, D.S. Millan, and D.C. Braddock, \"Epimeric Face-Selective Oxidations and Diastereodivergent Transannular Oxonium Ion Formation Fragmentations: Computational Modeling and Total Syntheses of 12-Epoxyobtusallene IV, 12-Epoxyobtusallene II, Obtusallene X, Marilzabicycloallene C, and Marilzabicycloallene D\", <i>The Journal of Organic Chemistry<\/i>, vol. 81, pp. 9539-9552, 2016. <a href=\"https:\/\/doi.org\/10.1021\/acs.joc.6b02008\">https:\/\/doi.org\/10.1021\/acs.joc.6b02008<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 22043 -->","protected":false},"excerpt":{"rendered":"<p>Chemists have long been familiar with search engines that aspire to index a large proportion of the chemical literature. 
Think, for example, of the old-generation (and commercial)\u00a0SciFinder (Scholar)\u00a0and Reaxys\u00a0or those that arrived in the 2000s in the online era\u2021 such as the non-commercial PubChem or ChemSpider (there are more). But you may not be as familiar [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3],"tags":[1523],"class_list":["post-22043","post","type-post","status-publish","format-standard","hentry","category-chemical-it","tag-chemical-it"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPyz-5Jx","jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/22043","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=22043"}],"version-history":[{"count":0,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/22043\/revisions"}],"wp:attachment":[{"href":
"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=22043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=22043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=22043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}