Posts Tagged ‘mobile devices’

Chemistry data round-tripping. Has there been ANY progress?

Monday, December 2nd, 2013

This is one of those topics that seems to crop up every three years or so. Since the last time, we have had new versions of operating systems, new versions of programs, and mobile devices. And perhaps some progress?

Right, I will briefly recapitulate. Chemical structure diagrams are special; they contain chemical semantics (what an atom is, what a bond is, stereochemistry, charges, etc). One needs special programs to represent this. Take two well-known ones. ChemBioDraw V 13 is the latest in a long line dating back to 1985 or so. A newcomer is ChemDoodle, just updated to version 6. The idea is you express your molecule, and capture some of its semantics using one of these programs. And then paste the data into another venerable program, the word processor Word (also dating back to around 1984). Then send the Word document to a colleague. Who might want to copy the structure back out, and put it back into ChemBioDraw/ChemDoodle. And put those semantics to good use, by editing it, or re-purposing the information. This is round-tripping the data. It's been almost 30 years, surely the process should be seamless by now? Wrong!

One problem is that the “exchange-particle” is the clipboard, yet another ancient and presumed mature technology. It's invisible of course; we rarely get to see it. And very operating system specific! So what is the current state of play? Round-tripping ChemBioDraw structures within a single operating system might work. Well, it currently does for just one of the two most common desktop operating systems (remember, Word is provided by the originator of one of these operating systems). The other program, ChemDoodle, round-trips within both operating systems.

But, here is the key point, not across operating systems. Paste either a ChemBioDraw or a ChemDoodle structure into Word on one of these OSs, and try re-editing that diagram in the version of Word on the other OS. The data is lost unless you have the “right” operating system.

An experiment I have not tried, but regarding which I would welcome any feedback, is to factor in the two newest operating systems, this time for mobile devices such as tablets and phones. Let's not even worry whether different flavours of one of these mobile OSs are compatible. Apps for drawing chemical structures are available for both of these. Here, the amazing clipboard still exists. One now has four OSs to consider, and four homogeneous permutations and a minimum of six heterogeneous round trips the data could try to take for any given app. We do not even consider app2app transfers not involving discrete intermediate documents. I would predict that only a few of these permutations preserve round-tripped data and its semantics.
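The counting above can be checked with a short sketch. The four operating-system labels are placeholders (the post does not name them), and each heterogeneous round trip is assumed to be counted once per unordered pair of systems.

```python
from itertools import combinations

# Hypothetical labels: the post deliberately does not name the four OSs.
systems = ["desktop_A", "desktop_B", "mobile_A", "mobile_B"]

# Homogeneous round trips: copy out and paste back on the same OS.
homogeneous = [(s, s) for s in systems]

# Heterogeneous round trips: each unordered pair of different OSs counted once,
# which is the minimum number of cross-OS cases one would have to test.
heterogeneous = list(combinations(systems, 2))

print(len(homogeneous), len(heterogeneous))  # prints: 4 6
```

If the direction of travel also matters (A to B may behave differently from B to A), `itertools.permutations(systems, 2)` gives 12 ordered pairs, which is why six is only a minimum.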

Perhaps we need to look at it in a different way? One simply avoids putting data from one program into another. Chemical data is kept in its own files, never mixed with data from other programs, but always kept/sent separately. Pre-1984 and the clipboard, this might have made sense. But in an era when XML was invented around 17 years ago to allow data to fully retain semantic information in any environment it finds itself in, it seems surprising that we still have this situation.

I mention all of this, since there is a current refocusing on the importance of data; “emancipating data” is now important. But the reality is that much current software destroys the semantics in data at almost every turn. Thirty years of no progress then. But what of Chem4Word, a combination of differently namespaced XML in which the chemistry is expressed in CML (it is only available for a single operating system!). I will perhaps devote a separate post to that one; first I have to try a few experiments!
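To illustrate the general idea of differently namespaced XML (and not how Chem4Word actually stores its data, which is not described here), the following is a minimal sketch. The wrapping document namespace and its `document` and `paragraph` elements are hypothetical; the chemistry is assumed to follow CML conventions (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`). The point is that the chemical semantics can be pulled back out of the mixed document by namespace, whatever surrounds them.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"        # CML namespace
DOC_NS = "http://example.org/hypothetical-doc"  # hypothetical host-document namespace

ET.register_namespace("cml", CML_NS)
ET.register_namespace("doc", DOC_NS)


def build_document() -> str:
    """Serialize a host document that embeds a hydroxide ion as CML."""
    root = ET.Element(f"{{{DOC_NS}}}document")
    para = ET.SubElement(root, f"{{{DOC_NS}}}paragraph")
    para.text = "Hydroxide ion:"
    mol = ET.SubElement(root, f"{{{CML_NS}}}molecule", {"id": "m1"})
    atoms = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    ET.SubElement(atoms, f"{{{CML_NS}}}atom",
                  {"id": "a1", "elementType": "O", "formalCharge": "-1"})
    ET.SubElement(atoms, f"{{{CML_NS}}}atom", {"id": "a2", "elementType": "H"})
    bonds = ET.SubElement(mol, f"{{{CML_NS}}}bondArray")
    ET.SubElement(bonds, f"{{{CML_NS}}}bond", {"atomRefs2": "a1 a2", "order": "1"})
    return ET.tostring(root, encoding="unicode")


def extract_atoms(xml_text: str) -> list:
    """Recover (element, formal charge) pairs, ignoring all non-CML content."""
    root = ET.fromstring(xml_text)
    ns = {"cml": CML_NS}
    return [(a.get("elementType"), int(a.get("formalCharge", "0")))
            for a in root.findall(".//cml:atom", ns)]


serialized = build_document()        # what would travel inside the document
recovered = extract_atoms(serialized)
print(recovered)  # prints: [('O', -1), ('H', 0)]
```

Because the atoms, charges and bonds travel as labelled text rather than as a picture, nothing in this round trip depends on the clipboard or on the operating system.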

Science publishers (and authors) please take note.

Monday, October 24th, 2011

I have for perhaps the last 25 years been urging publishers to recognise how science publishing could and should change. My latest thoughts are published in an article entitled “The past, present and future of Scientific discourse” (DOI: 10.1186/1758-2946-3-46). Here I take two articles, one published 58 years ago and one published last year, and attempt to reinvent some aspects. You can see the result for yourself (since this journal is laudably open access, and you will not need a subscription). The article is part of a special issue, arising from a one-day symposium held in January 2011 entitled “Visions of a Semantic Molecular Future” in celebration of Peter Murray-Rust’s contributions over that period (go read all 15 articles on that theme in fact!).

Here I want to note just two features, which I have also striven to incorporate into many of the posts on this blog (which in one small regard I have attempted to formulate as an experimental test-bed for publishing innovations). Scalable-Vector-Graphics (SVG) emerged around the turn of the millennium as a sort of HTML for images. To my knowledge, no science publisher has yet made it an intrinsic part of their publishing process (although gratifyingly all modern browsers support at least a sub-set of the format). Until now (perhaps). Thus 10.1186/1758-2946-3-46 contains diagrams in SVG, but you will need to avoid the Acrobat version, and go straight to the HTML version to see them. However, what sparked my noting all of this here was the recent announcement by Amazon that they are adopting a new format for their e-books, which they call Kindle Format 8 or KF8 (the successor to their Mobi7 format). To quote: “Technical and engineering books are created more efficiently with Cascading Style Sheet 3 formatting, nested tables, boxed elements and Scalable Vector Graphics”. This is wrapped in HTML5 to be able to provide (inter alia) a rich interactive experience for the reader. In fairness, there is also the more open epub3 which strives for the same. Other features of HTML5 include embedded chemistry using WebGL, and the same mechanisms are being used for the construction of modern chemical structure drawing packages.
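As a small illustration of why SVG can be thought of as a sort of HTML for images, here is a minimal sketch (not taken from the article itself) that builds a crude H-Cl diagram as SVG and then reads the atom labels back out. The coordinates and the diagram are invented for the example; the point is only that the labels survive as text rather than as pixels.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace

# A crude diagram: one bond line with an atom label at each end.
svg = ET.Element(f"{{{SVG_NS}}}svg",
                 {"width": "120", "height": "40", "viewBox": "0 0 120 40"})
ET.SubElement(svg, f"{{{SVG_NS}}}line",
              {"x1": "30", "y1": "20", "x2": "90", "y2": "20", "stroke": "black"})
for x, label in ((15, "H"), (95, "Cl")):
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text", {"x": str(x), "y": "25"})
    text.text = label

svg_text = ET.tostring(svg, encoding="unicode")

# Unlike a bitmap, the serialized image can be parsed and queried again.
labels = [t.text for t in ET.fromstring(svg_text).iter(f"{{{SVG_NS}}}text")]
print(labels)  # prints: ['H', 'Cl']
```

Saving `svg_text` to a file with an `.svg` extension should give something a modern browser can display directly.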

It remains to be seen how much of all of this will be adopted by mainstream chemistry publishers. Here, we do get into something of a cyclic argument. I suspect the publishers will argue that few of the authors that contribute to their journals will send them copy in any of these new formats and that it would be too expensive for them to re-engineer these articles with little or no help from such authors. The chemistry researchers who do the writing (perhaps composition might be a better word?) might argue there is little point in adopting innovative formats if the publishers do not accept them (I will point out that my injection of SVG into the above article did have some teething problems). For example, you will not find SVG noted in any of the “instructions for authors” in most “high impact journals” (or, come to that, HTML5).

If one looks back over that 25-year period: in 1986 all chemistry journals were distributed exclusively on paper. My office shelves still show the scars of bearing the weight of all that paper. Move on 25 years, and all journals almost without exception are now distributed electronically. I suspect the outcome in many a reader’s hands is simply that they (rather than the publisher) now bear the printing costs themselves (despite or perhaps because of the introduction of electronic binders such as Mendeley). But it will only be when the article itself grows out of its printable constraints, and hops onto mobile devices such as Kindles and iPads in the promised (scientifically) interactive and data-rich form, that the true revolution will start taking place.

A final observation: you will not readily obtain the interactive features of 10.1186/1758-2946-3-46 on e.g. an iPad or Kindle because the Java-based Jmol is not supported on either. But Jmol has now been ported to Android, and it's certainly one to watch.

What is the future of books?

Friday, April 29th, 2011

At a recent conference, I talked about what books might look like in the near future, with the focus on mobile devices such as the iPad. I ended by asserting that it is a very exciting time to be an aspiring book author, with one’s hands on (what matters), the content. Ways of expressing that content are currently undergoing an explosion of new metaphors, and we might even expect some of them to succeed! But content is king, as they say.

Here I list only some innovative solutions which have emerged in the last year or so, but which also raise important issues which we ignore at our peril.

  1. TouchPress were one of the first publishers to get off the mark with their living books. Their first offering was The Elements, deriving from an earlier interactive display of the periodic table (an example of which can be seen in the entrance to the chemistry building at Imperial College). It is a programmed book, in the sense that the content is expressed using code written by the publisher (very much in the manner of interactive games).
  2. Next to appear were Inkling, who describe their offering as interactive. Their approach is described in a blog written by their founder, Matt MacInnis. There he talks about The Art of Content Engineering, which again makes it sound as if authoring a book is in effect programming it! (I know what he means; if you follow the link to the talk I allude to above, you may spot that it too is, at least in part, programmed, and not simply written). Inkling also promote the book as part of a social network, with readers able to annotate the content, and share that annotation with others.
  3. The latest company to change the way books are both read and authored is Push Pop Press, the heart of which is also an interactive app.
  4. Then there is the epub3 format. This is a free and open standard for e-books. This third revision in particular is meant to enhance interactivity.

Something of a common theme so far. Books are going to be interactive! But what about these issues?

  1. Each of the first three (commercial) publishers above has adopted their own programming format. Although HTML5 may be at the heart of some of this, programming may also mean control (in the sense that the creative industries must put control of their content at the heart of what they do). Each of the first three above sounds like a closed system, and extracting re-usable content is, I argue, an essential part of doing science. I am just a tad worried that the approaches exemplified above may not allow this to happen.
  2. Suppose you manage to acquire a chemistry textbook in any of the four approaches listed above. Will they inter-operate, in the sense of being able to extract data from one and perhaps inject it into another? Or will each be a data- or information silo, rigidly controlled by the creative content generator (whoever that is)?
  3. What might an aspiring author, intent on creating interactive content do? Should they go closed/proprietary or open? They will clearly need to retrain themselves. We have indeed come a long way along the road: hand-written manuscript → typed manuscript → word-processed manuscript → interactive app! Like computer games, is the day of the single-authored book rapidly fading, to be replaced by a large team, each with their own tasks to perform?

I end with this question. Is the era of books, just like the Web itself, going to be the app? And who will be able (to find the time) to participate?