librarian « Henry Rzepa's blog

Posts Tagged ‘librarian’

PIDapalooza 2018: the open festival for persistent identifiers.

Tuesday, November 14th, 2017

PIDapalooza is a new forum concerned with discussing all things persistent, hence PID. You might wonder what possible interest a chemist might have in such an apparently arcane subject, but think of it in terms of how to find the proverbial needle in a haystack at a time when the needles might all look very similar. Even needles need descriptions; they are not all alike, and PIDs are a way of providing high-quality information (metadata) about a digital object.

The topics for discussion along with descriptions are now available at https://pidapalooza18.sched.com/list/descriptions/ and yes, before you ask, the event has its own PID (DOI: 10.5438/11.0002). Check out the speakers at https://pidapalooza18.sched.com/directory/speakers. I will be telling some stories from chemistry, and who knows, even some of the posts on this blog might feature. And if you do not brush up on the topic, no doubt your librarian, your funding body and your publisher will be telling you about it soon!

Tags:chemist, computing, Information, Information science, Knowledge representation, librarian, Needle, PID
Posted in Chemical IT | No Comments »

500 chemical twists: a (chalk and cheese) comparison of the impacts of blog posts and journal articles.

Friday, June 3rd, 2016

The title might give it away; this is my 500th blog post, the first having come some seven years ago. Very little online activity nowadays is excluded from measurement and so it is no surprise that this blog and one of my “other” scholarly endeavours, viz. publishing in traditional journals, attract such “metrics” or statistics. The h-index is a well-known but somewhat controversial measure of the impact of journal articles; here I thought I might instead take a look at three less familiar ones – one relating to blogging, one specific to journal publishing and one to research data.

First, an update on the accumulated outreach of this blog over this seven-year period. The total number of country domains measured is 190. The African continent still has quite a few areas with zero hits (as does Svalbard, with a population of only 2600 for a land mass area of 61,000 km², or about 23 km² per person). Given the low blog readership density on the African continent, it would be interesting to find out whether journal readership is any better.

Next, I look at the temporal distribution for individual posts. The first has attracted the highest total; in five years it has had 19,262 views (the diagram below shows the number of views per day). Four others exceed 10,000 and 80 exceed 1000 views.

Of these five, the next is the oldest, going back to 2009. I was very surprised to find such longevity, with the number of views increasing rather than decreasing with the passage of time.

So time now to compare these statistics with the journals. And of course it's chalk and cheese. A “view” for a post means someone (or something) accessing the post URL, which is then recorded in the server log. Resolving the URL does at least load the entire content of the post; whether it's read or not is of course not recorded. Importantly, if you want to view the content at some later stage, a new “view” has to be made (although some browsers do save a web page and allow offline viewing at a later stage, but I suspect this usage is low). With electronic journal access, it’s rather different. Access to an article is now predominantly via two mechanisms:

From the table of contents (this is somewhat analogous to browsing a blog)
From the article DOI.

Statistics for these two methods are gathered differently. The new CrossRef resource chronograph.labs.crossref.org (CrossRef allocate all journal DOIs) can be used to measure what they call DOI “resolutions”. A DOI resolution however leads one only to what is called the “landing page”, where the interested reader can view the title, the graphical abstract and some other metadata. It does not mean of course that they go on to actually view the article (as HTML, equivalent to the blog above, or probably more often by downloading a PDF file). Here are a few results using this method:

chronograph.labs.crossref.org/dois/10.1021/ja710438j tracks this article[1] which I selected (in part) because it was published in 2008, just slightly before the oldest post above. In fact, the resolutions log only goes back to October 2010, by which time the initial flush of any interest in this article would have subsided and so it's nice to see continuing interest (= impact?).
chronograph.labs.crossref.org/dois/10.1002/anie.201409672 [2] totals 208 resolutions, but as the graph below shows, 188 of these were on the first day of publication (Nov 19, 2014), then a gap of a few days and then about a month of daily resolutions, followed by occasional interest since then.
chronograph.labs.crossref.org/dois/10.1126/science.1181771 dates from 2010[3] and this time shows no peak on the first day, but again steady continuing interest to a current 245 resolutions.
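
To put the first-day spike for the second article in proportion, here is a small arithmetic sketch. The two figures (188 and 208) are the ones quoted above; the function name `first_day_share` is simply a label chosen for this illustration.

```python
def first_day_share(first_day, total):
    """Fraction of all DOI resolutions that arrived on the day of publication."""
    return first_day / total


# Figures quoted above for 10.1002/anie.201409672
total_resolutions = 208
first_day_resolutions = 188

share = first_day_share(first_day_resolutions, total_resolutions)
later = total_resolutions - first_day_resolutions

print(f"{share:.0%} of resolutions on day one; {later} in the period since")
```

In other words, roughly nine in ten of the resolutions for that article came on the day it was published.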

What about the other main journal article access method, not via a DOI but from a journal's table of contents page? A Google search revealed this site: jusp.mimas.ac.uk (JUSP stands for Journal Usage Statistics Portal, which sounded promising). This site collects “COUNTER compliant usage data”. COUNTER (Counting Online Usage of Networked Electronic Resources) is an initiative supported by many journal publishers and it sounds like an interesting way of measuring “usage” (as opposed to “views” or “resolutions”; it’s that chalk and cheese again!). I would love to be able to show you some statistics using this resource, but the “small print” caught me out: “JUSP gives librarians a simple way of analysing the value and impact of their electronic journals”. Put simply, I am a researcher, not a librarian. As a researcher I do not have direct access; JUSP is a closed, restricted-access (albeit taxpayer-funded) resource. I am discussing this with our head of information resources (who is a librarian) and hope to report back here on the outcome.

Finally, research data. This is almost too new to be able to measure, but the resource stats.datacite.org is starting to collect statistics on data resolutions (similar to DOI resolutions).

You can see from the below for Imperial College (in fact this represents the two data repositories that we operate and which I cite extensively on this blog) that resolutions are running at up to about 200 a month per dataset (more typically ~25 a month), with a total of 5065 resolutions for all items in March 2016 (the blog has ~12,000 views per month).
Figshare is another data repository we have made use of:

So to the summary.

Firstly, I have shown three forms of impact: views, resolutions and usage. If one had statistics on all three, one might then try to see if they are correlated in any way. Even then, normalisation might be a challenge.
Over ~7 years, five posts on this blog have attracted >10,000 views.
Many of the blog posts have a long “finish” (to use a wine tasting term); the views continue regularly and often increase over time.
My analysis of the three journal articles above (and about 15 others) shows that between 50 and 300 resolutions over a few years is fairly typical (for this researcher at least; I am sure most better-known researchers attract far, far more).
The temporal distributions for article resolutions and blog views show that both can have continuing impact over an extended period. None of the 18 articles I looked at shows a significantly increasing impact with time, but many of the blog posts do. This tends to suggest that the audiences for each are quite different: researchers for articles and a fair proportion of inquisitive students for the blog?
One might speculate that a correlation could be found between my article resolutions and my h-index, but article resolutions have a fine-grained temporal resolution (allowing a derivative with respect to time to be obtained) that is potentially more valuable than the coarse h-index integration alone (an article can of course be cited for both positive and negative reasons!).
Initial analysis for data shows resolutions running at a similar rate to article resolutions. It is not yet possible to correlate data resolutions with article resolutions in which that data is discussed.
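
The contrast drawn in the summary between the h-index as a single coarse number and resolutions as a time series with a derivative can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are chosen here for convenience, and every number in the example is hypothetical, invented to show the calculation rather than taken from any real citation or resolution record.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def rate_of_change(cumulative):
    """Finite-difference 'derivative' of a cumulative series, one value per interval."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical citation counts for six articles (illustration only)
citations_per_article = [25, 8, 5, 3, 3, 1]

# Hypothetical cumulative resolutions for one article, sampled monthly
cumulative_resolutions = [10, 14, 15, 21, 30]

print(h_index(citations_per_article))          # 3
print(rate_of_change(cumulative_resolutions))  # [4, 1, 6, 9]
```

The h-index collapses the whole record into one integer, whereas the differenced series shows whether interest in a single article is currently rising or falling.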

References

[1] S.M. Rappaport and H.S. Rzepa, "Intrinsically Chiral Aromaticity. Rules Incorporating Linking Number, Twist, and Writhe for Higher-Twist Möbius Annulenes", Journal of the American Chemical Society, vol. 130, pp. 7613-7619, 2008. https://doi.org/10.1021/ja710438j
[2] A.E. Aliev, J.R.T. Arendorf, I. Pavlakos, R.B. Moreno, M.J. Porter, H.S. Rzepa, and W.B. Motherwell, "Surfing π Clouds for Noncovalent Interactions: Arenes versus Alkenes", Angewandte Chemie International Edition, vol. 54, pp. 551-555, 2014. https://doi.org/10.1002/anie.201409672
[3] K. Abersfelder, A.J.P. White, H.S. Rzepa, and D. Scheschkewitz, "A Tricyclic Aromatic Isomer of Hexasilabenzene", Science, vol. 327, pp. 564-566, 2010. https://doi.org/10.1126/science.1181771

Tags:Country: Svalbard and Jan Mayen, CrossRef, head of information resources, HTML, Imperial College, librarian, online activity, Online Usage, PDF, researcher, search engines, usage statistics portal
Posted in Chemical IT | 4 Comments »
