Posts Tagged ‘Academia’

Open Access journal publishing debates – the elephant in the room?

Sunday, November 4th, 2018

For perhaps ten years now, the future of scientific publishing has been hotly debated. The traditional models are often thought to be badly broken, although no consensus on what a better model should look like appears close. To my mind, much of this debate misses one important point: how to publish data.

Thus, at one extreme is cOAlition S, which promotes the key principle that “after 1 January 2020 scientific publications on the results from research funded by public grants provided by national and European research councils and funding bodies, must be published in compliant Open Access Journals or on compliant Open Access Platforms.” Plan S comprises ten principles, one of which, “The ‘hybrid’ model of publishing is not compliant with the above principles”, has provoked some strong dissent, as seen at forbetterscience.com/2018/09/11/response-to-plan-s-from-academic-researchers-unethical-too-risky. I should explain that hybrid journals are those whose business model combines institutional closed access via a subscription charge paid by the library with the option for individual authors to purchase Open Access release of an article, so that it sits outside the subscription. The dissenters argue that non-OA and hybrid journals include many traditional titles which, especially in chemistry, are regarded as having the best impact factors and very much as the journals to publish in to maximise readership, and hence both the impact of the research and the researcher’s career prospects. Many (not all) of the American Chemical Society (ACS) and Royal Society of Chemistry (RSC) journals currently fall into this category, as do commercially published journals such as Nature, Nature Chemistry, Science, Angewandte Chemie, etc.

So the debate is whether funded top-ranking research in chemistry should in future always appear in non-hybrid OA journals (where the cost of publication is borne by article processing charges, or APCs) or in traditional subscription journals, where the costs are borne by those institutions that can afford the subscription charges, which of course also limit access. A measure of the debate's importance and topicality is that there is now even a movie devoted to the topic, which makes the point of how profitable commercial scientific publishing has become and hence how much resource is being diverted into these profit margins at the expense of funding basic science.

None of these debates however really takes a close look at the nature of the modern research paper. In chemistry at least, the evolution of such articles in the last 20 years (~ corresponding to the online era) has meant that whilst the size of the average article has remained static at around 10 “pages” (in quotes because of course the “page” is one of those legacy concepts related to print), another much newer component known as “Supporting information” or SI has ballooned to absurd sizes. It can reach 1000 pages[1] and there are rumours of even larger SIs. The content of SI is of course mostly data, and the size is often because the data is presented in visual form (think spectra). As visual information, it is not easily “inter-operable” or “accessible”. Nor is it “findable” until commercial abstracting agencies choose to index it. Searches of such indexed data are most certainly “closed” (again depending on institutional purchases of access) and not “open access”. You may recognise these attributes as those of FAIR (Findable, Accessible, Interoperable and Re-usable). So even if an article in chemistry is published in pure OA form, in order to get FAIR access to the data associated with the article, you will probably have to go to a non-OA resource run by a commercial organisation for profit. Thus a 10-page article might itself be OA, but the full potential of its 1000+ pages of data (an elephant if ever there was one) ends up being very much not OA.

You might argue that the 1000+ pages of data do not require the services of an abstracting agency to be useful. Surely a human can get all the information they want from inspecting a visual spectrum? Here I raise the future prospects of AI (artificial intelligence). The ~1000-page SI I noted above[1] includes e.g. NMR spectra for around 70 compounds (I tried to count them all visually, but could not be certain I found them all). A machine, trained to identify spectra from associated metadata (a feature of FAIR), could extract vastly more information from FAIR raw data than a human could in a given time (a spectrum is already processed data, with implied information loss). And for many articles, not just one. Thus FAIR data is very much targeted not only at humans but at the AI-trained machines of the future.

So I again repeat my assertion that focussing on whether an article is OA or not, and whether publishing in hybrid journals is to be allowed by funders, misses that 100-fold bigger elephant in the room. For me, a publishing model fit for the future should include as a top priority a declaration of whether the data associated with an article is FAIR. Yet in the ten Plan S principles, FAIR is not mentioned at all. Only when FAIR-enabled data becomes part of the debates can we truly say that an article and its data are on their way to being properly open access.


The FAIR concept did not originally differentiate between processed data (e.g. spectra) and the underlying primary or raw data on which the processed data is based. Our own implementation of FAIR data includes both types: raw data for machine reprocessing if required, and processed data for human interpretation, along with a rich set of metadata, itself often created using carefully designed workflows conducted by machines.
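To make the idea of such rich metadata concrete, here is a purely illustrative sketch of what a DataCite-style record for a single spectroscopic dataset might look like. The field names follow the DataCite metadata schema, but the DOIs, title and publisher shown are invented placeholders, not a real deposit:

```json
{
  "identifier": { "identifierType": "DOI", "identifier": "10.0000/example-dataset" },
  "creators": [ { "name": "Example, Author" } ],
  "titles": [ { "title": "13C NMR data (raw FID and processed spectrum) for compound 1" } ],
  "publisher": "Example Institutional Data Repository",
  "publicationYear": "2018",
  "resourceType": { "resourceTypeGeneral": "Dataset" },
  "formats": [ "chemical/x-jcamp-dx", "application/zip" ],
  "subjects": [ { "subject": "NMR spectroscopy" } ],
  "relatedIdentifiers": [
    {
      "relationType": "IsSupplementTo",
      "relatedIdentifierType": "DOI",
      "relatedIdentifier": "10.0000/example-article"
    }
  ]
}
```

It is precisely this kind of machine-readable record, rather than a page image in a PDF, that makes a spectrum findable and re-usable by both humans and machines.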

The proportion of articles relating to chemistry which do not include some form of SI is probably low. These would include articles which simply provide a new model or interpretation of previously published data, reporting no new data of their own. A famous historical example is Michael Dewar’s re-interpretation of the structure of stipitatic acid[2] which founded the new area of non-benzenoid aromaticity.

References

  1. J.M. Lopchuk, K. Fjelbye, Y. Kawamata, L.R. Malins, C. Pan, R. Gianatassio, J. Wang, L. Prieto, J. Bradow, T.A. Brandt, M.R. Collins, J. Elleraas, J. Ewanicki, W. Farrell, O.O. Fadeyi, G.M. Gallego, J.J. Mousseau, R. Oliver, N.W. Sach, J.K. Smith, J.E. Spangler, H. Zhu, J. Zhu, and P.S. Baran, "Strain-Release Heteroatom Functionalization: Development, Scope, and Stereospecificity", Journal of the American Chemical Society, vol. 139, pp. 3209-3226, 2017. https://doi.org/10.1021/jacs.6b13229
  2. M.J.S. Dewar, "Structure of Stipitatic Acid", Nature, vol. 155, pp. 50-51, 1945. https://doi.org/10.1038/155050b0

OpenCon (2016)

Friday, November 25th, 2016

Another conference, a Cambridge satellite meeting of OpenCon, and I quote here its mission: “OpenCon is a platform for the next generation to learn about Open Access, Open Education, and Open Data, develop critical skills, and catalyze action toward a more open system of research and education” targeted at students and early career academic professionals. But they do allow a few “late career” professionals to attend as well!

I could only attend the morning session, for which the keynote speaker was Erin McKiernan. The presentation, entitled “How open science helps researchers succeed”, was presented as an exploration of an article of the same name written by Erin and colleagues and published in eLife.[1] Erin has created a support page at http://whyopenresearch.org to augment the presentation and it is well worth a visit.

One striking point made was the assertion that Open publications get more citations! 
[Graphic: citation advantage of open publications, broken down by discipline]

As with many metrics of the impacts of the science publication processes, a citation itself lacks the context of why it was made (see this post for further discussion), but the expectation is that a citation is “good”. From my perspective as a chemist, I did wonder why molecular science was missing from the graphic above. Do open chemistry publications also get more citations?

Which brings me to another point made during the talk, the increasingly controversial aspect of (journal) impact factors and the pressure placed on early career researchers to publish only in those with “high” impact factors, and for their careers to be assessed at least in part based on these and the anticipated “h-index”. The audience was indeed encouraged to go visit http://www.ascb.org/Dora/ (Declaration on Research Assessment, or Putting science into the assessment of research). Have you signed it yet?

Another manifestation of the modern trend to analyse impact metrics is the site Impactstory.org. This is a scripted resource that starts from your ORCID identifier and (optionally) your Twitter account (yes, apparently Tweets matter!) to derive a more complex alternative metric of an individual’s impacts. I had not tried this one before, so I submitted my ORCID and my Twitter account and watched as the system went off to http://orcid.scopusfeedback.com (Scopus is an Elsevier product) to attempt to create my profile. It ground away for quite a while, reporting initially that I had no publications! This was followed by an unexpected error; I did not get my impact back! But this experiment served to highlight one aspect that was discussed at the meeting: data and other research objects. The graphic above refers only to the citation of journal articles; it does not yet include the citation of data. However, ORCID DOES include data and research objects as works. And because the granularity of my data and research objects is very fine (one molecule = one work), I have quite a few. In fact ~200,000! ORCID gets to about 8000 before it gives up. I suspect http://orcid.scopusfeedback.com queries ORCID, gets back ~8000 entries and crashes. No doubt the programmer tasked with implementing this resource did not anticipate that any individual could accumulate 8000+ entries, nor factor in that the vast majority of these would of course not be journal articles but data. If the site gets back to me about the crash I experienced, I will update here.
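The failure mode I am guessing at can be sketched in a few lines of Python. The `fetch_page` function below is a stand-in stub (the real ORCID public API and the Scopus feedback tool will differ in their actual interfaces and limits); the point is only to show how a client written with a hard ceiling silently truncates, or falls over, on a record with ~200,000 works:

```python
def fetch_page(offset, limit, total=200_000):
    """Stub for a paged works query, yielding fake work identifiers.

    Stands in for a real call such as a query against the ORCID
    public API; the actual interface and limits will differ.
    """
    end = min(offset + limit, total)
    return [f"work-{i}" for i in range(offset, end)]


def fetch_all(page_size=100, hard_cap=None):
    """Page through all works.

    An optional hard_cap models a client written on the assumption
    that no researcher could have more than N entries.
    """
    works, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        works.extend(page)
        offset += page_size
        if hard_cap is not None and len(works) >= hard_cap:
            # A naive client stops (or crashes) here, silently
            # discarding everything beyond the cap.
            break
    return works
```

A client capped at 8000 entries sees only 4% of a 200,000-work record; a correct client simply keeps paging until the result set is exhausted.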

Simon Deakin was the next speaker with (open) data as the focus and the worries many researchers have in being scooped by others who have re-used your open data without proper attributions. The discussion teased out that if data is properly deposited, it will indeed have full associated metadata and in particular a date stamp that could help protect an author’s interests.

It was really good to meet so many early career researchers who espouse the open ethos. Perhaps, in 20 years' time, another graphic akin to the one above might demonstrate that open researchers get more promotions!

References

  1. E.C. McKiernan, P.E. Bourne, C.T. Brown, S. Buck, A. Kenall, J. Lin, D. McDougall, B.A. Nosek, K. Ram, C.K. Soderberg, J.R. Spies, K. Thaney, A. Updegrove, K.H. Woo, and T. Yarkoni, "How open science helps researchers succeed", eLife, vol. 5, 2016. https://doi.org/10.7554/elife.16800

Journal innovations – the next step is augmented reality?

Wednesday, August 17th, 2016

In the previous post, I noted that a chemistry publisher is about to repeat an earlier experiment in serving pre-prints of journal articles. It would be fair to suggest that following the first great period of journal innovation, the boom in rapid-publication “camera-ready” articles in the 1960s, the next period of rapid innovation started around 1994, driven by the uptake of the World Wide Web. The CLIC project[1] aimed to embed additional data-based components into the online presentation of the journal Chemical Communications, taking the form of pop-up interactive 3D molecular models and spectra. The Internet Journal of Chemistry was designed from scratch to take advantage of this new medium.[2] Here I take a look at one recent experiment in innovation which incorporates “augmented reality”.[3]

The title is interesting: “Combination of Enabling Technologies to Improve and Describe the Stereoselectivity of Wolff–Staudinger Cascade Reaction”. One of these technologies relates to “microwave-assisted flow generation of primary ketenes by thermal decomposition of α-diazoketones at high temperature”, but the journal presentation itself attempts “faster interpretation of computed data via a new web-based molecular viewer, which takes advantage from Augmented Reality (AR) technology”. To access this component directly, go to https://leyscigateway.ch.cam.ac.uk/staudinger/. It is not incorporated into the journal infrastructure as the CLIC project attempted, but is perhaps closer to the model I noted in the previous post, of supporting (FAIR) data associated with the article and hosted separately from the journal.

What happens next depends rather on the Web browser you are using. With many browsers and tablets, a conventional 3D molecular presentation appears; there is no button present where the red arrow points. You will find out this is because “Augmented Reality is not available in your browser, as the getUserMedia() API is not supported”.

[Screenshot: the conventional 3D molecular viewer, without an AR button]

Some browsers (the latest Opera, FireFox, Chrome) do support this feature, and a new AR button appears. Selecting this now layers the video from the device camera onto the 3D molecular model; the molecule now floats in the scene captured by the camera (which in the case below is the room I am sitting in). After a few seconds you are urged to “point the camera towards the AR marker”. The supporting information contains such AR markers as a navigation aid for the 3D coordinates contained there. An example is:

[Image: an example AR marker from the supporting information]

If this marker is now brought into the camera view (by printing it, sic) and holding it in front of the camera image, the marker resolves into further data relevant to the molecule of interest, layered into the existing scene of the room and the molecule. For the marker above, it resolves to a reaction energy profile which reveals where the specific molecule sits energetically in terms of the overall reaction.

[Screenshot: the reaction energy profile layered into the AR scene]

This layering of “heads up” molecular data into a scene comprising a 3D molecular model and the human viewer of that molecule captured in video is what defines the concept of “augmented reality” (the data being the augmentation, rather than the human). 

Having now tried it out, I was left wondering whether this truly was a great advance in enabling technology for chemistry journals. The role of the camera seems primarily to capture the AR markers contained in the supporting information; the presence of the reader in the video image, apparently inspecting the molecule, could be regarded as a distraction. The AR markers (QR codes) are merely visual representations of a URL; using a DOI (as in this blog) to locate data is rather more familiar to most readers. The DOI, by the way, carries further information in the form of metadata, which, when sent to e.g. DataCite, enables the data to be found. Does the data need to be layered onto the molecule (and apparently floating in front of the reader) to become usable? Could it instead be placed in a pop-up or separate window of its own (as the 1994 CLIC project achieved)? Do the AR markers enable the data to be FAIR? One can Find the data (albeit only by reading and printing the supporting information) and view it in the AR scene, but is it Accessible (can one access the underlying numerical data?), Interoperable (can one place it into another program?) or Re-usable?
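For comparison, the conventional route from a DOI to its machine-readable metadata is HTTP content negotiation against the doi.org resolver. The sketch below only constructs the request, with no network call made; the Accept type shown is the DataCite-flavoured JSON documented for DOI content negotiation (other types, such as CSL JSON, are also served), and one would pass the result to any HTTP client to actually fetch the record:

```python
def doi_metadata_request(doi):
    """Build the URL and headers for retrieving machine-readable
    metadata for a DOI via content negotiation at doi.org.

    No request is performed here; hand the result to an HTTP client
    such as urllib.request or the requests library.
    """
    return {
        "url": f"https://doi.org/{doi}",
        "headers": {
            # Ask the resolver for DataCite-style JSON rather than
            # redirecting to the human-readable landing page.
            "Accept": "application/vnd.datacite.datacite+json",
        },
    }


# For example, for the article discussed in this post:
req = doi_metadata_request("10.1055/s-0035-1562579")
```

No camera, marker or printout is involved: the same string that a QR code encodes resolves directly to structured metadata.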

As with all enabling technologies, one has to always ask if that technology helps or hinders. Or is the principle of KISS (keep it simple) sometimes better? It is however good to see research groups experimenting with these themes and meanwhile readers can judge for themselves whether “heads up” AR augmentation of the data describing research is indeed the next big thing.

References

  1. D. James, B.J. Whitaker, C. Hildyard, H.S. Rzepa, O. Casher, J.M. Goodman, D. Riddick, and P. Murray‐Rust, "The case for content integrity in electronic chemistry journals: The CLIC project", New Review of Information Networking, vol. 1, pp. 61-69, 1995. https://doi.org/10.1080/13614579509516846
  2. S.M. Bachrach, and S.R. Heller, "The Internet Journal of Chemistry: A Case Study of an Electronic Chemistry Journal", Serials Review, vol. 26, pp. 3-14, 2000. https://doi.org/10.1080/00987913.2000.10764578
  3. S. Ley, B. Musio, F. Mariani, E. Śliwiński, M. Kabeshov, and H. Odajima, "Combination of Enabling Technologies to Improve and Describe the Stereoselectivity of Wolff–Staudinger Cascade Reaction", Synthesis, vol. 48, pp. 3515-3526, 2016. https://doi.org/10.1055/s-0035-1562579

Chemistry preprint servers (revisited).

Tuesday, August 16th, 2016

This week the ACS announced its intention to establish a “ChemRxiv preprint server to promote early research sharing”. This was first tried quite a few years ago, following the example of especially the physicists. As I recollect, the experiment lasted about a year, attracted few submissions and even fewer of high quality. Will the concept succeed this time, in particular as promoted by a commercial publisher rather than a community of scientists (as was the original physicists’ model)?

The RSC (itself a highly successful commercial publisher) has picked up on this and run its own commentary. You will find quotes from yours truly there, along with Peter Murray-Rust, a long-time ardent promoter of community-driven open science. One interesting aspect is that the ACS runs around 50 journals, and the decision on whether each will accept preprints for publication will (shortly = next few weeks) be made by the individual editors. I wonder if the eventual list of those supporting the project will bring any surprises (bets on J. Am. Chem. Soc. preprints anyone)?

But I want to pick up on the declared aspiration “to promote early research sharing”. Here I couple research sharing with data sharing: if you share your research, you should also share the data resulting from that research. We are now entering a new era of data sharing (in part as a result of mandates from various funding bodies) and so one has to ask whether a pre-print server will encourage people to create and share FAIR data (data which is findable, accessible, inter-operable and re-usable) as a model to replace the current one of “supporting information” held in enormous PDF files (mostly unFAIR on at least three counts). This question is indeed posed in the RSC commentary. What I would like to see happen are projects such as that described here, which create what were described as “first class research objects”, and which I think amply fulfil the criteria of being FAIR. So, will ChemRxiv preprint servers help promote such FAIR data sharing as part of early research sharing? We will find out soon.

The ACS supports OA (Open Access) sharing of articles, provided the authors pay (or arrange payment of) the appropriate APC or article processing charge. These charges are complex, being subject to various discounts (for example if you as an author are an ACS member or not) but are generally not insignificant (> $1000). I wondered whether preprints might be subject to an APC, and so I asked the ACS. The response was “we don’t anticipate any submission or usages fees at this time“. I think that means free at point of submission, and free at point of readership “at this time“.

Finally, let me now summarise as I understand the current family of “research publications”:

  1. The preprint
  2. The final author version as submitted to a journal
  3. The “version of record” (VoR) as published by the journal
  4. Any FAIR published data associated with the article

All four of these are attempts at “research sharing”. Each may be located in a different location, and each may have its own DOI. And of course we cannot easily know how much overlap there is between each of them. Thus, how might 1-3 differ in terms of the story or “narrative” of scientific claims? Does 4 agree or support 1-3? Does 4 agree with perhaps data subsets contained in 1-3? If keeping abreast of the current research literature is a challenge, imagine having to cope with/reconcile up to four versions of each “publication”! 

Lots of food for thought here. We have not heard the last of these themes. 

 

Single Figure (nano)publications, reddit AMAs and other new approaches to research reporting

Wednesday, August 5th, 2015

I recently received two emails, each with the subject line “new approaches to research reporting”. The traditional 350-year-old model of the (scientific) journal is undergoing upheavals at the moment with the introduction of APCs (article processing charges), a refereeing crisis and much more. Some argue that brand new thinking is now required. Here are two such innovations (and I leave you to judge whether that last word should have an appended ?).

To set the scene for the first, I will quote the abstract: “The single figure publication is a novel, efficient format by which to communicate scholarly advances. It will serve as a forerunner of the nano-publication, a modular unit of information critical for machine-driven data aggregation and knowledge integration”.[1] The kernel of this suggestion is (again I quote) “We offer the idea of the micro-publication unit, the single figure publication (SFP), to provide scholars with a real-world, manageable method to inform research.” I was struck by the overlap between this suggestion and the one you may find on many of the posts on this blog, where what I refer to as FAIR Data is assigned a digital object identifier (DOI) and included in the citation lists at the end of the post. The key phrase in the above abstract is machine-driven data aggregation and knowledge integration, although the article does not really go into any mechanisms for easily achieving this. It is my argument that the act of assigning a DOI carries with it the association that there is machine-searchable metadata which can be retrieved and used for the aggregation and knowledge mining. The authors of this article, Do and Mobley, advocate adoption of nanopublications defined by inclusion of just a single figure (notably, not a table of results!) and some accompanying context, which they claim would reduce the unit of publication to a more tractable size. This does raise the question of whether science needs more publications (in chemistry alone there are said to be more than a million published each year) or whether we should instead be concentrating our efforts on improving the data side of things by increasing its semantic content and formalising its structures, its preservation and curation. I certainly argue that far too little effort has been poured into these latter activities.
You only have to look at the typical SI (supporting information) associated with many chemistry articles to realise that in many cases they are still hardly fit for purpose. There is one concept introduced by Do and Mobley that also deserves mention. Their nanopublications are structured to be read by machines, not people. They will therefore not be refereed by people (my inference). They do not really discuss how else the quality will be assessed, but of course if you treat their nanopublication as essentially FAIR data, then it does become possible to develop methods of machine refereeing.

The second email alerted me to an article[2] in the Winnower, a forum that offers a bridge between “traditional scholarly publishing tools to traditional and non-traditional scholarly outputs—because scholarly communication doesn’t just happen in scholarly journals”. Here, the concept of scholarly communication is extended to the New Reddit Journal of Science and introduces the concept pioneered by reddit of the AMA, or “ask me anything” environment. I occasionally publish some of the posts on this blog to the Winnower, receiving in return the increasingly ubiquitous DOI. I have also occasionally quoted these DOIs in articles submitted to conventional chemistry journals. What we see now is the propagation of a Winnower DOI on to e.g. https://www.reddit.com/r/science/ where anyone can post a question related to the original research reporting. I must state that I do have some reservations about this. Whilst the majority of traditional scholarly reporting is likely to receive no AMAs (just as a very high proportion of research articles attract few if any citations over a period of decades), it is also likely that the quality of posted AMAs may turn out to be very low. At that point the original researcher has to make a judgement as to whether to devote any of their increasingly precious and fragmented time to answering them. And if few if any answers are posted in response to an AMA, the system seems unlikely to flourish.

But what we see here are two serious attempts to develop new approaches to research reporting, and no doubt others will emerge. To quote Yogi Berra, the future ain't what it used to be.


Anyone can also post to this blog to ask similar questions. But note that associating an ORCID with such comments is highly recommended. I do not think that reddit currently supports ORCID, but I would argue that if the intent is serious, it certainly should.

References

  1. L. Do, and W. Mobley, "Single Figure Publications: Towards a novel alternative format for scholarly communication", F1000Research, vol. 4, pp. 268, 2015. https://doi.org/10.12688/f1000research.6742.1
  2. RobustTempComparison, and r/Science, "Science AMA Series: Climate models are more accurate than previous evaluations suggest. We are a bunch of scientists and graduate students who recently published a paper demonstrating this, Ask Us Anything!", The Winnower. https://doi.org/10.15200/winn.143871.12809

Impact factors, journals and blogs: a modern distortion.

Thursday, May 21st, 2015

A lunchtime conversation with a colleague had us both bemoaning the distorting influence on chemistry of bibliometrics, h-indices and journal impact factors, all very much a modern phenomenon of scientific publishing. Young academics on a promotion fast-track, for example, are apparently advised not to publish in a well-known journal devoted to organic chemistry because of its apparently “low” impact factor. Chris suggested that the real reason the impact factor is “low” is that this particular journal concentrates on full articles, which for a subject area such as organic chemistry can take years to assemble, and hence years for others to assimilate and report their own results, only then generating a citation of the first article. So this slow but steady accumulation of citations over a long time frame apparently makes such a journal appear to have less (short-term) impact than the fast-publishing notes-type variety, where the impact is immediate but possibly less long-lived. That would be no reason of itself not to publish there, of course!

Most would describe a blog as an ultimate medium for short-term publishing (shortened only by e.g. Twitter). I began to wonder what the statistics for this particular blog would show, so I looked at the timelines for the five most-read posts, ranked below in terms of total hits. The oldest (and second most popular) is exactly six years old and so represents a reasonably long evolutionary time frame. The graphs below show the daily hits (red bars are annual). The immediate impact of each lasts less than a week, but the long-term analysis shows each accumulated its total not by such immediate impact but by long-term accretion. For most, the first derivatives are still on the increase. This might all come as a surprise to those who tend to regard scientific blogs as having only short-term impact. But it would also be true to say that chemistry operates not on a time scale of days or years but of centuries, and so it will take a little while longer to assess impacts on that scale.
[Charts: daily-hit timelines for the five most-read posts]

What of course is not measured by simply integrating total views over time is what purpose if any each viewing serves. This is as true of journal articles as it is of blogs. And a viewing is not quite the same as a citation (although the latter does not always imply a viewing!). But it is tempting to conclude that we have all become far too fixated on short-term impacts and the bibliometrics that provide this information.