Posts Tagged ‘social media’
Saturday, February 16th, 2019
The title of this post comes from the site www.crossref.org/members/prep/. Here you can explore how your favourite publisher of scientific articles exposes metadata for their journals.
Firstly, a reminder that when an article is published, the publisher collects information about the article (the “metadata”) and registers this information with CrossRef in exchange for a DOI. This metadata in turn is used to power e.g. a search engine which allows “rich” or “deep” searching of the articles to be undertaken. There is also what is called an API (Application Programming Interface) which allows services to be built offering deeper insights into what are referred to as scientific objects. One such service is “Event Data”, which attempts to create links between various research objects such as publications, citations, data and even commentaries in social media. A live feed can be seen here.
So here are the results for the metadata provided by six publishers familiar to most chemists, with categories including:
- References
- Open References
- ORCID IDs
- Text mining URLs
- Abstracts

RSC

ACS

Elsevier

Springer-Nature

Wiley

Science
One immediately notices the large differences between publishers. Thus most have 0% metadata for the article abstracts, but one (the RSC) has 87%! Another striking difference is in the support for open references (OpenCitations). The RSC and Springer Nature are 99-100% compliant whilst the ACS is at 0%. Yet another variation is the adoption of the ORCID (Open Researcher and Contributor ID), where the learned society publishers (RSC, ACS) achieve > 80%, but the commercial publishers are in the lower range of 20-49%.
To me the most intriguing category was the Text mining URLs. From the help pages, “The Crossref REST API can be used by researchers to locate the full text of content across publisher sites. Publishers register these URLs – often including multiple links for different formats such as PDF or XML – and researchers can request them programatically”. Here the RSC is at 0% and the ACS is at 8%, but the commercial publishers are at 80+%. I tried to find out more at e.g. https://www.springernature.com/gp/researchers/text-and-data-mining but the site was down when I tried. This can be quite a controversial area. Sometimes the publisher exerts strict control over how the text mining can be carried out and how any results can be disseminated. Aaron Swartz famously fell foul of this.
I am intrigued as to how, as a reader with no particular pre-assembled toolkit for text mining, I can use this metadata provided by the publishers to enhance my science. After all, 80+% of articles with some of the publishers apparently have a mining URL that I could use programmatically. If anyone reading this can send some examples of the process, I would be very grateful.
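As a possible starting point, here is a minimal Python sketch of how such mining URLs might be read from a Crossref record. It assumes that the REST API returns the registered full-text links in a `link` list on each work, with an `intended-application` field marking those meant for text mining. The function names, the `sample_work` record and its example.org URLs are invented for illustration; only the DOI in the usage comment comes from this post. Finding a URL says nothing about whether you are licensed to download or mine the content behind it.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi, mailto=None):
    """Fetch the Crossref metadata record for one DOI (needs network access)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    if mailto:
        # Optional contact address, passed as a query parameter.
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def text_mining_links(work):
    """Return (URL, content-type) pairs flagged for text mining in a work record."""
    return [
        (link["URL"], link.get("content-type", "unspecified"))
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


# Hypothetical record, shaped like the "message" part of an API response.
sample_work = {
    "DOI": "10.0000/example",
    "link": [
        {"URL": "https://example.org/fulltext/article.xml",
         "content-type": "application/xml",
         "intended-application": "text-mining"},
        {"URL": "https://example.org/fulltext/article.pdf",
         "content-type": "application/pdf",
         "intended-application": "text-mining"},
        {"URL": "https://example.org/fulltext/article.html",
         "content-type": "text/html",
         "intended-application": "similarity-checking"},
    ],
}

for url, content_type in text_mining_links(sample_work):
    print(content_type, url)

# With network access, a real record could be inspected in the same way:
# print(text_mining_links(fetch_work("10.1021/acsomega.8b03005")))
```

Keeping the network call in `fetch_work` separate from the filtering in `text_mining_links` means the latter can be tried on a saved record without contacting Crossref at all.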
Finally I note the absence of any metadata in the above categories relating to FAIR data. Such data also has the potential for programmatic procedures to retrieve and re-use it (some examples are available here[1]), but apparently publishers do not (yet) collect metadata relating to FAIR. Hopefully they soon will.
References
- A. Barba, S. Dominguez, C. Cobas, D.P. Martinsen, C. Romain, H.S. Rzepa, and F. Seoane, "Workflows Allowing Creation of Journal Article Supporting Information and Findable, Accessible, Interoperable, and Reusable (FAIR)-Enabled Publication of Spectroscopic Data", ACS Omega, vol. 4, pp. 3280-3286, 2019. https://doi.org/10.1021/acsomega.8b03005
Tags:Aaron Swartz, Academic publishing, API, Business intelligence, CrossRef, data, Data management, Elsevier, favourite publisher, Identifiers, Information, Information science, Knowledge, Knowledge representation, metadata, mining, ORCiD, PDF, Pre-exposure prophylaxis, Publishing, Publishing Requirements for Industry Standard Metadata, Records management, Research Object, Scholarly communication, Scientific literature, search engine, social media, Technical communication, Technology/Internet, text mining, Written communication, XML
Posted in Interesting chemistry | 1 Comment »
Saturday, February 3rd, 2018
The topic of open citations was presented at the PIDapalooza conference and represents a third component in the increasing corpus of open scientific information.
David Shotton gave us an update on Citations as First Class data objects – Citation Identifiers and introduced me to the blog where he discusses this topic. The citations or bibliography has long been regarded as an essential, and until recently inseparable, component at the end of a scientific article. It is also a component easily susceptible to “game play”. Authors can be tempted to self-cite, possibly to excess and, perhaps worse, to cite their friends and colleagues for other than purely scientific reasons. There are other issues. Thus to infer the context of any particular citation, one has to read the text where it is cited and this too can be subjected to game play. One may have to “read between the lines” to try to judge whether the citation is being cited favourably as supporting any case being made, or instead to indicate disagreement with the cited authors. An article that is being cited because one disagrees with the conclusions therein may still go on to contribute to the cited author’s “h-index” of esteem. So there are various aspects of citations that deserve improvement, or certainly development and evolution.
Shotton told us that many publishers are now releasing article citations as open (CC0) data in their own right, as urged by the Initiative for Open Citations site. A corpus of some 13 million of these is now available as RDF triples with a SPARQL end-point. This latter means that semantic searches of the corpus can be undertaken. So what are the benefits? Worthy aspirations such as to explore connections between knowledge fields, and to follow the evolution of ideas and scholarly disciplines (similar in fact to the new Dimensions product I discussed in the previous post). When I probed into the various sites linked above, I had in mind to identify some clear scientific outcomes of making them available in this manner, perchance even in the field of chemistry. When I succeed I will follow up on this post, but at the moment I am not yet in a position to illustrate these benefits with chemical stories. If anyone reading this post has such, please let us know!
I will conclude here by noting much discussion at universities of the future of the scientific article itself; whether it should be increasingly mandated as GOLD Open Access (made so by payment of an article processing charge, or APC, by its authors), or whether journals should retain the hybrid publishing models where only a proportion of articles are GOLD, and the remainder are paid for by subscription fees for licensing access to the non-GOLD articles in the journal. Meanwhile, in what sometimes seems like a separate conversation, the article itself is being dis-assembled into components such as open and/or FAIR data, open citations, infographics, social media and yes, even blogs. Are these two evolutions headed in different directions? Certainly, I think the future is not what it used to be!
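To give a flavour of what a semantic search of such a corpus might look like, here is a small Python sketch that builds a SPARQL query asking which works cite a given work. It assumes the citation triples use the `cito:cites` property from the CiTO ontology. The endpoint address and the resource IRI are placeholders invented for this example, not the real addresses of the corpus, and the function names are likewise my own.

```python
import urllib.parse

# Assumption: placeholder endpoint, to be replaced by the real SPARQL end-point.
SPARQL_ENDPOINT = "https://example.org/sparql"


def citing_works_query(cited_iri, limit=10):
    """Build a SPARQL query listing resources that cite the given resource."""
    return (
        "PREFIX cito: <http://purl.org/spar/cito/>\n"
        "SELECT ?citing WHERE {\n"
        f"  ?citing cito:cites <{cited_iri}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(endpoint, query):
    """Encode the query as an HTTP GET request URL (SPARQL protocol style)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


# Hypothetical IRI standing in for one bibliographic resource in the corpus.
query = citing_works_query("https://example.org/br/1", limit=5)
print(query)
print(query_url(SPARQL_ENDPOINT, query))
```

Sending the resulting URL with an `Accept: application/sparql-results+json` header should, for an end-point that follows the standard protocol, return the matches as JSON.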
Tags:Academic publishing, Applied linguistics, article processing charge, British National Corpus, chemical stories, cited author, Corpus linguistics, David Shotton, Entertainment/Culture, Linguistics, Open access, Quotation, RDF, social media, Texas A&M–Corpus Christi Islanders women's basketball
Posted in Chemical IT | 2 Comments »
Saturday, February 11th, 2017
On February 6th I was alerted to this intriguing article[1] by a phone call, made 55 minutes before the article embargo was due to be released. Gizmodo wanted to know if I could provide an (almost)† instant‡ quote. After a few days, this report of a stable compound of helium and sodium still seems impressive to me and I now impart a few more thoughts here.
The discovery originates from 17 authors based in 17 different institutions, an impressive illustration of global science and cooperation. I illustrate with this diagram, to be found not in the main article body but in its supporting information and for which the caption reads:

Computed charge density (e Å⁻³) of Na₂He at 300 GPa, plotted in the [110] plane of the conventional cell. The color bar gives the scale.
The nuclei carry of course the greatest charge density, but the density labelled “2e” is not nuclear-centered. This is typical of species known as electrides, where positive cations are associated with just electrons acting as the counter-anion and about which there was an extensive debate earlier on this blog. There is much discussion in the article[1] about the essential role of the He atoms in bringing about the formation of such an electride, an effect that is summarised in a second diagram also found in the supporting information:

I found myself thinking that it would be great to have the first diagram represented as a movie, evolving as the pressure is increased from say ambient to 300 GPa, and presumably showing the “2e” feature (which means diamagnetic electrons) forming as the pressure increases. Would their evolution be abrupt (a step change) or gradual as the pressure increases and the interatomic distances all decrease? As I understand it, this chemical phenomenon is due not so much to the usual coulombic attraction between positive nuclei and negative charge density from the electronic wavefunction leading to e.g. covalent bonds, but to electron repulsions induced by decreasing nuclear separations resulting in electride-like ionisation and hence electron localisation into the “interstitial cavities” of the lattice. Without pressure, you would just have sodium and helium atoms!
The urge to obtain this intriguing electronic wavefunction for myself now appeared (wavefunctions are rarely if ever included in supporting information). To do this you must have atom coordinates available, but such data was not to be found in the supporting information. It was eventually tracked down (by a crystallographer; thanks Andrew!) to the caption in Figure 2.

However, you probably do need to be a crystallographer to convert this data into a set of coordinates. This was done and is here deposited as a CIF file for you to play with if you wish (DOI:10.14469/hpc/2154)[2]. I have reduced the packing of the unit cell obtained from this CIF file (198 atoms) to just 60 and you can enjoy them by clicking on the diagram below. I should point out that if one uses a program that can recognise the periodic lattice such as Crystal (used in the article discussed here), there is no need to make such reductions, but in this instance I wanted to use a program such as Gaussian in discrete (non-periodic) mode, for which the calculation (B3LYP/Def2-SVPD) has DOI: 10.14469/hpc/2156[3] and where you can also find a wavefunction file to play with if you wish.

Click for 3D model
An ELF analysis for this non-periodic wavefunction looks as below. The ELF basins labelled “2e” located in the centre of the cube show an integrated electron population of ~1.9e and correspond to the localised electron pairs noted in the article above.

Click for 3D
The basins on the boundaries of this non-periodic unit show reduced integrations (red arrows below, 0.08 – 1.7e) and are artefacts of the non-periodic approximation introduced.

The ionization into an electride is brought about by the close proximity of the atoms as induced by high pressure. Releasing the pressure would allow the ionized electrons to re-attach themselves to the valence shell of the sodium atoms, thus destroying the unique properties of the system. It is certainly true that this system challenges our normal concepts of what a molecule is. The presence of He is essential and yet its electrons are hardly involved in the re-organised wavefunction. I cannot wait for more examples to be discovered!
†To meet the 55 minute deadline, I was given about 15 minutes thinking time!
‡Instant responses on social media now seem a sine qua non of the political world, so why not the scientific one?!
References
- X. Dong, A.R. Oganov, A.F. Goncharov, E. Stavrou, S. Lobanov, G. Saleh, G. Qian, Q. Zhu, C. Gatti, V.L. Deringer, R. Dronskowski, X. Zhou, V.B. Prakapenka, Z. Konôpková, I.A. Popov, A.I. Boldyrev, and H. Wang, "A stable compound of helium and sodium at high pressure", Nature Chemistry, vol. 9, pp. 440-445, 2017. https://doi.org/10.1038/nchem.2716
- H. Rzepa, "Na2He: a stable compound of helium and sodium at high pressure.", 2017. https://doi.org/10.14469/hpc/2154
- H. Rzepa, "He20Na40", 2017. https://doi.org/10.14469/hpc/2156
Tags:10.1038, Atom, Chemical elements, chemical phenomenon, Chemistry, Company: P. Acucar-CBD, Electride, Electron, Food Retail & Distribution - NEC, helium, Hydrogen, Matter, Oxygen, Physics, social media
Posted in Bond slam, crystal_structure_mining, Interesting chemistry | 11 Comments »
Thursday, November 10th, 2016
This is sent from the PIDapalooza event in Reykjavik, Iceland, and is a short collection of notable things I learnt or which attracted my attention.
Firstly, what IS PIDapalooza[1]? Well, it’s all about persistent identifiers, but don’t let that put you off! Another way of putting it is that it’s a way of finding things scientific on the Web. Not just publications, but conferences, social media, teaching, research datasets, infrastructure, grants, organizations, instruments, scientific objects and samples and no doubt much more. These (will) live in an inter-connected eco-system, and so the idea goes, will become an integral part of how a scientist accumulates and disseminates information nowadays. Yes, the conference itself has its own PID: 10.5438/11.0001 and the individual talks will also appear as both a collection and with their own PID in the near future.
- The first example comes from WikiData, a collection of carefully curated data, from which can be dynamically assembled say a periodic table of the elements. All the data here is included from other objects, and everything is referenced by its PID. Since it’s all assembled from data, if say the name of element 118 is assigned, then it will automatically be absorbed into this presentation.
- This next example proved highly contentious, but is included here anyway. It is templated PIDs, as in http://doi.org/10.5446/12780#t=00:20.00:27 which allows navigation to a particular part of an object referenced by the PID. In this case a time code for a movie, but it might be say an active site in a protein, or a key atom or group in a molecular complex for example. This might never happen (for reasons only the computer scientists currently understand!) but it does show one way in which the humble DOI might evolve.
- http://typeregistry.org exists for registering data types. It has almost no chemistry at the moment, but perhaps it should have!
- There was a great deal about ORCIDs, and the ways in which uses of this particular PID are evolving. For example, the next big effort is to use the ORCID system for organisations. You will find my ORCID at the top of this post.
- PIDs are also being mooted for instruments. The idea is that instrumental capabilities, settings, calibration etc are often an integral part of the data acquisition for a project. So if data is generated using such a device, why not quote its PID in any derived article so that others can more easily replicate a particular experiment in their own laboratory.
- A quote by one of the speakers was attributed to Bill Gates around 1997 “We need banking. We don’t need banks anymore” (think how this might apply to 2016. Was he correct?). This was followed by straw men such as: “We need publications. We don’t need publishers anymore”. Or “We need archiving. We don’t need libraries anymore”. Just like Gates’ own quote, the reality is of course far more complex.
- And PID fatigue; I hope you are not getting too much of that at the moment.
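To make the templated PID idea in the list above a little more concrete, the following Python sketch separates such an identifier into the PID itself and the part that navigates within the object. It uses only the standard library; the function name is my own, and the time code is returned as an uninterpreted string, since how it should be read is up to the service resolving the PID.

```python
from urllib.parse import parse_qs, urldefrag


def split_templated_pid(url):
    """Split a templated PID into its base URL and its fragment parameters."""
    base, fragment = urldefrag(url)
    # The fragment is treated as key=value pairs, e.g. t=<time code>.
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    return base, params


base, params = split_templated_pid("http://doi.org/10.5446/12780#t=00:20.00:27")
print(base)    # http://doi.org/10.5446/12780
print(params)  # {'t': '00:20.00:27'}
```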
There is a lot more I have learnt which I need to fix/enhance/address in our own experiments in the use of PIDs in chemistry, so I had better get on with it now!
References
- ORCID, DataCite, Crossref, and California Digital Library, "PIDapalooza 2016", 2016. https://doi.org/10.5438/11.0001
Tags:active site, Bill Gates, City: Reykjavik, Country: Iceland, scientist, social media, Technology/Internet
Posted in Chemical IT | 1 Comment »