{"id":24951,"date":"2022-03-28T12:53:31","date_gmt":"2022-03-28T11:53:31","guid":{"rendered":"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24951"},"modified":"2022-03-28T12:53:31","modified_gmt":"2022-03-28T11:53:31","slug":"raw-data-and-the-evolution-of-crystallographic-fair-data-journals-processed-and-raw-structure-data","status":"publish","type":"post","link":"https:\/\/www.rzepa.net\/blog\/?p=24951","title":{"rendered":"Raw data and the evolution of crystallographic FAIR data. Journals, processed and raw structure data."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"24951\">\n<p>In <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24723\" target=\"_blank\" rel=\"noopener\">my previous post on the topic<\/a>,\u00a0I introduced the concept that data can come in several forms, most commonly as &#8220;raw&#8221; or primary data and as a &#8220;processed&#8221; version of this data that has added value. In crystallography, the chemist is interested in this processed version, carried by a CIF file. However, on the rare occasions when a query arises about the processed component, it can, at least in principle, be resolved by inspecting the original raw data, expressed as diffraction images.\u00a0With much-appreciated help from the CCDC, I established that since 2016 around 65 datasets in the CSD (Cambridge Structural Database) have appeared with such associated raw data. The problem is how to reconcile the two sets of data easily (the raw data is not stored in the CSD); one way of doing this is <em>via<\/em> the metadata associated with the datasets. In turn, if this metadata is suitably registered, one can query the metadata store for such associations, as was illustrated in the previous post on the topic. 
Here\u00a0I explore the metadata records for five of these 65 sets, selected so as to represent each of the five data repositories that have so far hosted such raw data for compounds in the CSD.<\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Raw data<br \/>\nrepository<\/th>\n<th>Raw data<br \/>\nDOI<\/th>\n<th>Raw data<br \/>\n\u2192CSD?<\/th>\n<th>CSD\u2192<br \/>\nRaw data?<\/th>\n<th>\u21d0Journal\u21d2<\/th>\n<\/tr>\n<tr>\n<td><small>Zenodo<\/small><\/td>\n<td><a href=\"https:\/\/doi.org\/10.5281\/zenodo.4271549\" target=\"references\" rel=\"noopener\"><small>10.5281\/zenodo.4271549<\/small><\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5281\/zenodo.4271549\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5517\/ccdc.csd.cc1mw6k8\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.crossref.org\/works\/10.1039\/C6RA28567H\/transform\/application\/vnd.crossref.unixsd+xml\" target=\"_blank\" rel=\"noopener\"><small>10.1039\/C6RA28567H<\/small><\/a><\/td>\n<\/tr>\n<tr>\n<td><small>Imperial College research data repository<\/small><\/td>\n<td><a href=\"https:\/\/dx.doi.org\/10.14469\/hpc\/2298\" target=\"references\" rel=\"noopener\"><small>10.14469\/hpc\/2298<\/small><\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.14469\/hpc\/2298\" target=\"references\" rel=\"noopener\">Yes<\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5517\/ccdc.csd.cc1n9pn9\" target=\"references\" rel=\"noopener\">Yes<\/a><\/td>\n<td><a href=\"https:\/\/api.crossref.org\/works\/10.1021\/acsomega.7b00482\/transform\/application\/vnd.crossref.unixsd+xml\" target=\"_blank\" 
rel=\"noopener\"><small>10.1021\/acsomega.7b00482<\/small><\/a><\/td>\n<\/tr>\n<tr>\n<td><small>RepoD, a Harvard Dataverse instance<\/small><\/td>\n<td><a href=\"https:\/\/doi.org\/10.18150\/repod.6628285\" target=\"references\" rel=\"noopener\"><small>10.18150\/repod.6628285<\/small><\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.18150\/repod.6628285\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5517\/ccdc.csd.cc24lk1c\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.crossref.org\/works\/10.1021\/acs.cgd.0c01252\/transform\/application\/vnd.crossref.unixsd+xml\" target=\"_blank\" rel=\"noopener\"><small>10.1021\/acs.cgd.0c01252<\/small><\/a><\/td>\n<\/tr>\n<tr>\n<td><small>University of Cambridge repository<\/small><\/td>\n<td><a href=\"https:\/\/doi.org\/10.17863\/CAM.21968\" target=\"references\" rel=\"noopener\"><small>10.17863\/CAM.21968<\/small><\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.17863\/CAM.21968\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5517\/ccdc.csd.cc215gc2\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.crossref.org\/works\/10.1016\/j.inoche.2018.08.024\/transform\/application\/vnd.crossref.unixsd+xml\" target=\"_blank\" rel=\"noopener\"><small>10.1016\/j.inoche.2018.08.024<\/small><\/a><\/td>\n<\/tr>\n<tr>\n<td><small>ISIS Neutron and Muon Source data journal<\/small><\/td>\n<td><a href=\"https:\/\/doi.org\/10.5286\/ISIS.E.RB1620465\" target=\"references\" rel=\"noopener\"><small>10.5286\/ISIS.E.RB1620465<\/small><\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5286\/ISIS.E.RB1620465\" target=\"references\" 
rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.datacite.org\/application\/vnd.datacite.datacite+xml\/10.5517\/ccdc.csd.cc24ykvj\" target=\"references\" rel=\"noopener\">No<\/a><\/td>\n<td><a href=\"https:\/\/api.crossref.org\/works\/10.1039\/D0CC02418J\/transform\/application\/vnd.crossref.unixsd+xml\" target=\"_blank\" rel=\"noopener\"><small>10.1039\/D0CC02418J<\/small><\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Ideally, the metadata for the raw and the processed data should each link to the other, so that the connection can be followed in both directions. As the table shows, such bidirectional links are present for only one of the five sets. More commonly, both the raw and the processed data contain links to the journal article in which the data are discussed. Links from the journal article to the raw data are much rarer, although links from the journal to the processed data are slightly more likely to exist. Clicking on the link in any of the last three columns downloads a copy of the relevant metadata record, from which you can verify whether the assertions made above are correct.\u00a0<\/p>\n<p>Together, the metadata records above form a very small-scale example of a so-called PID graph (DOI: <span id=\"cite_ITEM-24951-0\" name=\"citation\"><a href=\"#ITEM-24951-0\">[1]<\/a><\/span> <a href=\"https:\/\/doi.org\/10.5438\/jwvf-8a66\">10.5438\/jwvf-8a66<\/a>),\u00a0in which each DOI is a node and any connection between two DOIs is shown as a line joining the corresponding nodes.\u00a0The PID graph can be extended to include a third type of node, the journal article, and then it starts to get interesting!\u00a0I will investigate whether I can generate the PID graph for the above examples, but be prepared: it will not (yet) contain very many lines between nodes!<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-24951-0\">M. Fenner, and A. Aryani, \"Introducing the PID Graph\", 2019. 
<a href=\"https:\/\/doi.org\/10.5438\/jwvf-8a66\">https:\/\/doi.org\/10.5438\/jwvf-8a66<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 24951 -->","protected":false},"excerpt":{"rendered":"<p>In my previous post on the topic,\u00a0I introduced the concept that data can come in several forms, most commonly as &#8220;raw&#8221; or primary data and as a &#8220;processed&#8221; version of this data that has added value. In crystallography, the chemist is interested in this processed version, carried by a CIF file. However on rare occasions [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3],"tags":[1523],"class_list":["post-24951","post","type-post","status-publish","format-standard","hentry","category-chemical-it","tag-chemical-it"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPyz-6ur","jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24951","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/
users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24951"}],"version-history":[{"count":0,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24951\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24951"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24951"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rzepa.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24951"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}