Posts Tagged ‘Web browser’
Saturday, December 29th, 2018
The traditional structure of the research article has been honed and perfected for over 350 years by its custodians, the publishers of scientific journals. Nowadays, for some journals at least, it might be viewed as much as a profit centre as the perfected mechanism for scientific communication. Here I take a look at the components of such articles to try to envisage their future, with the focus on molecules and chemistry.
The formula which is mostly adopted by authors when they sit down to describe their chemical discoveries is more or less as follows:
1. An introduction, setting the scene for the unfolding narrative.
2. Results. This is where much of the data from which the narrative is derived is introduced. Such data can be presented in the form of:
   - Tables
   - Figures and schemes
   - Numerical and logical data embedded in narrative text
3. Discussion, where the models constructed from the data are illustrated and new inferences presented. Very often categories 2 and 3 are conflated into one single narrative.
4. Conclusions, where everything is brought together to describe the essential aspects of the new science.
5. Bibliography, where previous articles pertinent to the narrative are listed.
In the last decade or so, the management of research data has developed as a field of its own, with three phases:
6. Setting out a data management plan at the start of the project, often a set of aspirations together with putative actions,
7. the day-to-day management of the data as it emerges in the form of an electronic laboratory notebook (ELN),
8. the publication of selected data from the ELN into a repository, together with the registration of metadata describing the properties of the data.
In the latter category, item 8 can be said to be a game-changer, a true disruptive influence on the entire process. The key aspect is that it constitutes independent publication of data to sit alongside the object constructed from 1-5. More disruption emerges from the open citations project, whereby category 5 above can be released by publishers to adopt its own separate existence. So now we see that of the five essential anatomic components of a research article, two are already starting to achieve their own independence. Clearly the re-invention of the anatomy of the research article is well under way already.
Next I take a look at what sorts of object might be found in category 8, drawing very much on our own experience of implementing 7 and 8 over the last twelve years or so. I start by observing that in 2 above, figures are perhaps the object most in need of disruptive re-invention. In the 1980s, authors were much taken by the introduction of colour as a means of conveying information within a figure more clearly, although the significant costs then had to be borne directly by these authors (and with a few journals this persists to this day). By the early 1990s, the introduction of the Web[1] offered new opportunities not only of colour but of an extra dimension (or at least the illusion of one) by means of introducing interactivity for three-dimensional models. Some examples resulting from combining figures from category 2 with 8 above are listed in the table below.
Example 1 illustrates how a figure from category 2 above can be augmented with active hyperlinks specifying the DOI of the data in category 8 from which the figure is derived, thus creating a direct and contextual connection between the research article and the research data it is based upon. These links are embedded only in the Acrobat (PDF) version of the article as part of the production process undertaken by the journal publisher. Download Figure 9 from the link here and try it for yourself or try the entire article from the journal, where more figures are so enhanced.
Example 2 takes this one stage further. The hyperlinks in the published figure in example 1 were embedded in software capable of resolving them, namely a PDF viewer, but resolving them is all that this software allows. By relocating the hyperlink into a Web browser instead, one can add further functionality in the form of JavaScript programs, perhaps better described as workflows (supported by browsers but not by Acrobat). There are three such workflows in example 2.
- The first uses an image map to associate a region of the figure with a data object defined by a DOI.
- The second interrogates the metadata specifically associated with the DOI (the same DOIs that are seen in the figure itself) to see if there is any so-called ORE metadata available (ORE = Object Re-use and Exchange). If there is, it uses this information to retrieve the data itself and pass it through to
- the third workflow, represented by a set of JavaScripts known as JSmol. These interpret the data received and construct an interactive visual 3D molecular model representing the retrieved data.
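The chain of three workflows above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the repository's actual code: the region coordinates are invented, the two DOIs are placeholders borrowed from elsewhere on this blog, and the helper names (`findRegion`, `doiUrl`, `jsmolLoadScript`) are hypothetical. The metadata lookup of the second workflow is left out because it needs a network request.

```javascript
// Hypothetical image-map regions: pixel rectangles [x1, y1, x2, y2] on a figure,
// each tied to the DOI of the dataset it was drawn from. Coordinates are invented
// and the DOIs are placeholders taken from other posts on this page.
const regions = [
  { rect: [0, 0, 200, 150], doi: "10.14469/ch/775" },
  { rect: [200, 0, 400, 150], doi: "10.14469/hpc/1850" },
];

// Workflow 1: find which region of the figure (if any) contains the clicked point.
function findRegion(regionList, x, y) {
  return (
    regionList.find(
      ({ rect: [x1, y1, x2, y2] }) => x >= x1 && x < x2 && y >= y1 && y < y2
    ) || null
  );
}

// Turn a DOI into its resolver URL; workflow 2 would start its metadata query here.
function doiUrl(doi) {
  return `https://doi.org/${doi}`;
}

// Workflow 3: build the Jmol script that asks JSmol to load a retrieved data file.
function jsmolLoadScript(dataUrl) {
  return `load "${dataUrl}";`;
}

const hit = findRegion(regions, 250, 40);
console.log(hit ? doiUrl(hit.doi) : "no data object under this point");
```

Keeping the three steps as separate small functions mirrors the description above: the figure only needs to know about DOIs, and the viewer only needs to know about the file it is handed.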
All this additional workflowed activity is implemented in a data repository. It is not impossible that it could also be implemented at the journal publisher end of things, but it is an action that would have to be supported by multiple publishers. Arguably this sort of enhancement is far better suited and more easily implemented by a specialised data publisher, i.e. a data repository.
Example 3 does the same thing for a table.
Example 4 enhances in a different manner. Conventionally NMR data is added to the supporting information file associated with a journal article, but such data is already heavily processed and interpreted. The raw instrumental data is never submitted to the journal and is almost always available only by direct request from the original researchers (at least if the request is made whilst the original researchers are still contactable!). The data repository provides a new mechanism for making such raw instrumental (and indeed computational) data an integral part of the scientific process.
Example 5 shows how a bibliography can be linked to a secondary bibliography (citations 35 and 36 in this example in the narrative article) and perhaps in the future to Open Citations semantic searches for further cross references.
So by deconstructing the components of the standard scientific article, re-assembling some of them in a better-suited environment and then linking the two sets of components to each other, one can start to re-invent the genre and hopefully add more tools for researchers to use to benefit their basic research processes. The scope for innovation seems considerable. The issue of course is (a) whether publishers see this as a viable business model or whether they instead wish to protect their current model of the research article and (b) whether authors wish to climb the learning curve and make the additional effort to go in this direction. As I have noted before, the current model is deficient in various ways; I do not think it can continue without significant reinvention for much longer. And I have to ask that if reinvention does emerge, will science be the prime beneficiary?
References
- H.S. Rzepa, B.J. Whitaker, and M.J. Winter, "Chemical applications of the World-Wide-Web system", Journal of the Chemical Society, Chemical Communications, pp. 1907, 1994. https://doi.org/10.1039/c39940001907
Monday, May 29th, 2017
As the Internet and its Web-components age, so early pages start to decay as technology moves on. A few posts ago, I talked about the maintenance of a relatively simple page first hosted some 21 years ago. In my notes on the curation, I wrote the phrase “Less successful was the attempt to include buttons which could be used to annotate the structures with highlights. These buttons no longer work and will have to be entirely replaced in the future at some stage.” Well, that time has now come, for a rather more crucial page associated with a journal article published more recently in 2009.[1]
The story started a few days ago when I was contacted by the learned society publisher of that article, noting they were “just checking our updated HTML view and wanted to test some of our old exceptions”. I should perhaps explain what this refers to. The standard journal production procedures involve receiving a Word document from authors and turning that into XML markup for the internal production processes. For some years now, I have found such passive (i.e. printable only) Word content unsatisfactory for expressing what is now called FAIR (Findable, accessible, inter-operable and re-usable) data. Instead, I would create another XML expression (using HTML), which I described as Interactive Tables, and then ask the publisher to host it and add that as a further link to the final published article. I have found that learned society publishers have not been unwilling to create an “exception” to their standard production workflows (the purely commercial publishers rather less so!). That exceptional link is http://www.rsc.org/suppdata/cp/b8/b810301a/Table/Table1.html but it has now “fallen foul of the java deprecation”.
Back in 2008 when the table was first created, I used the Java-based Jmol program to add the interactive component. That page, when loaded, now responds with the message:

This, I must emphasise, is nothing to do with the publisher; it is the Jmol certificate that has been revoked. That of itself requires explanation. Java is a powerful language which needs to be “sandboxed” to ensure system safety. But commands can be created which can access local file stores and write files out there (including potentially dangerous ones). So it became the practice to sign the Java code with the developer's certificate to ensure provenance for the code. These certificates expire after a set time, and around 2015 the time came to renew this one. Normally, when such a certificate is renewed, the old one is allowed to continue operation. On this occasion the agency renewing the certificate did not do this but revoked the old one instead (Certificate has been revoked, reason: CESSATION_OF_OPERATION, revocation date: Thu Oct 15 23:11:18 BST 2015). So all instances of Jmol with the old certificate now give the above error message.
The solution in this case is easy; the old Jmol code (as JmolAppletSigned.jar) is simply replaced with the new version for which the certificate is again valid. But simply doing that alone would merely have postponed the problem; Java is now indeed deprecated for many publishers, which is a warning that it will be prohibited at some stage in the future.‡ So time to bite the bullet and remove the dependency on Java-Jmol, replacing it with JSmol which uses only JavaScript.
Changing published content is in general not allowed; one instead must publish a corrigendum. But in this instance, it is not the content that needs changing but the style of its presentation (following the principle of the Web of a clear-cut separation of style and content). So I set out to update the style of presentation, but I was keen to document the procedures used. I did this by commenting out non-functional parts of the style components of my original HTML document (as <!-- comment -->) and adding new ones. I describe the changes I made below.
- The old HTML contained the following initialisation code: jmolInitialize(".","JmolAppletSigned.jar");jmolSetLogLevel('0'); which was commented out.
- New scripts to initialise JSmol instead were added, such as:
<script src="JSmol.min.js" type="text/javascript"> </script>
- I added further scripts to set up controls to add interactivity.

- The now deprecated buttons had been invoked using a Jmol instance: jmolButton('load "7-c2-h-020.jvxl";isosurface "" opaque; zoom 120;',"rho(r) H")
- which was replaced by the JSmol equivalent, but this time to produce a hyperlink rather than a button (to allow the Greek ρ to appear, which it could not on a button): <a href="javascript:show_jmol_window();Jmol.script(jmolApplet0,'load 7-c2-020.jvxl;isosurface &quot;&quot; translucent;spin 3;')">ρ(r)</a>,
- Some more changes were made to another component of the table, the links to the data repository. Originally, these quoted a form of persistent identifier known as a Handle, e.g. 10042/to-800. Since the data was deposited in 2008, the data repository has licensed further functionality to add DataCite DOIs to each entry; for this entry, the DOI is 10.14469/ch/775. Why? Well, the original Handle registration had very little (chemically) useful registered metadata, whereas DataCite allows far richer content. So an extra column was added to the table to indicate these alternate identifiers for the data.
- We are now at the stage of preparing to replace the Java applet at the publisher's site with the JavaScript version, along with the amended HTML file. The above link, as I write this post, still invokes the old Java, but hopefully it will shortly change to function again as a fully interactive table.
- I should say that the whole process, including finding a solution and implementing it, took 3-4 hours' work, of which the major part was the analysis rather than its implementation.
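Converting each old button by hand is error-prone, mostly because a Jmol script containing double quotes has to sit inside a double-quoted href attribute. A minimal sketch of a helper that generates such links is given here; `show_jmol_window`, `Jmol.script` and `jmolApplet0` are the names quoted in the list above, whereas `escapeHtml` and `jsmolLink` are hypothetical names of my own and not part of JSmol.

```javascript
// Escape characters that would otherwise break out of HTML text or attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a hyperlink that runs a Jmol script in a JSmol applet.
// jsmolLink is an illustrative helper, not a JSmol function.
function jsmolLink(script, label, applet = "jmolApplet0") {
  // The script sits inside a single-quoted JavaScript string within the href,
  // so backslashes and single quotes are protected first.
  const quoted = script.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  const href = `javascript:show_jmol_window();Jmol.script(${applet},'${quoted}')`;
  return `<a href="${escapeHtml(href)}">${escapeHtml(label)}</a>`;
}

const link = jsmolLink('load 7-c2-020.jvxl;isosurface "" translucent;spin 3;', "ρ(r)");
console.log(link);
```

The two layers of escaping are applied in order: first for the JavaScript string, then for the HTML attribute, which is where the empty pair of double quotes after isosurface becomes a pair of `&quot;` entities.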
It might be interesting to speculate how long the curated table will last before it too needs further curation. There are some specifics in the files which might be a cause for worry, namely the so-called JVXL isosurfaces which are displayed. These are currently only supported by Jmol/JSmol. They were originally deployed because isosurfaces tend to be quite large datafiles and JVXL uses a remarkably efficient compression scheme (built around the “marching cubes” algorithm) which reduces their size ten-fold or more. Should JSmol itself become non-operational at some time in the (hopefully) far future (which we take to be ~10 years!) then a replacement for the display of JVXL will need to be found. But the chances are that the table itself will decay “gracefully”, with the HTML components likely to outlive most of the other features. The data repository quoted above has itself now been available for ~12 years and it too is expected to survive in some form for perhaps another 10. Beyond that period, no-one really knows what will still remain.
You may well ask why the traditional journal model of printing articles on paper, which has survived some 350 years now, is being replaced by one which struggles to survive 10 years without expensive curation. Obviously, a 3D interactive display is not possible on paper. But one also hears that publishers are increasingly dropping printed versions entirely. One presumes that the XML content will be assiduously preserved, but re-working (transforming, as in XSLT) any particular flavour of XML into another publisher's systems is also likely to be expensive. Perhaps in the future the preservation of 100% of all currently published journals will indeed become too expensive and we might see some of the less important ones vanishing for ever?†
‡Nowadays it is necessary to configure your system or Web browser to allow even signed valid Java applets to operate. Thus in the Safari browser (which still allows Java to operate; other popular browsers such as Chrome and Firefox have recently removed this ability), one has to go to preferences/security/plugin-settings/Java, enter the URL of the site hosting the applet and set it to either “ask” (when a prompt will always appear asking if you want to accept the applet) or “on” (when it will always do so). How much longer this option will remain in this browser is uncertain.
†In the area of chemistry, an early pioneer was the Internet Journal of Chemistry, where the presentation of the content took full advantage of Web-technologies and was on-line only. It no longer operates and the articles it hosted are gone.
References
- H.S. Rzepa, "Wormholes in chemical space connecting torus knot and torus link π-electron density topologies", Phys. Chem. Chem. Phys., vol. 11, pp. 1340-1345, 2009. https://doi.org/10.1039/b810301a
Monday, October 31st, 2016
Is asking a question such as “what is the smallest angle subtended at a chain of three connected 4-coordinate carbon atoms” just seeking another chemical record, or could it unearth interesting chemistry?
A simple search of the Cambridge Structural Database for a chain of three carbons, each bearing four substituents (sp3 hybridized in normal parlance) reveals the following distribution:

The value of 60° corresponds, of course, to a three-membered cyclopropane ring. The tail of the distribution is very small, and there are a few outliers with values of <54°. Most of the time such outliers are simple errors, but here they are in fact semibullvalenes, of the type shown below, with the small angle subtended at the central of the three carbon atoms coloured in red.

In this diagram I have added my own semantic interpretation of what is going on. Let me itemise this:
- These molecules can undergo very rapid [3,3] sigmatropic rearrangements, shifting a σ-bond away from the 3-ring to create another such ring.
- This process elongates one of the C-C bonds and of necessity this reduces the angle at the associated carbon.
- I have drawn two types of arrow connecting the two structures. The first is an equilibrium arrow, which implies a transition state connecting the two species. This transition state will have equal bond lengths for the forming/breaking C-C bonds, and passage through it will take longer than one molecular vibration (~10⁻¹⁵ s).
- It is also possible however that the second arrow is the correct one, and this implies an electronic resonance rather than a nuclear motion. This would operate on a timescale commensurate with electron dynamics (~10⁻¹⁸ s) rather than nuclear vibrations.
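The geometric point in the list above (stretching one C-C bond of a three-membered ring closes the angle at the carbon that bond is attached to) can be checked numerically. The sketch below computes the angle subtended at a central atom from three sets of Cartesian coordinates. The coordinates are idealised illustrations with assumed bond lengths of 1.5 and 1.8 Å; they are not taken from any crystal structure.

```javascript
// Angle (in degrees) subtended at the central atom b by its neighbours a and c.
// Each atom is an [x, y, z] coordinate triple in any consistent length unit.
function angleDegrees(a, b, c) {
  const u = a.map((value, i) => value - b[i]); // vector from b to a
  const w = c.map((value, i) => value - b[i]); // vector from b to c
  const dot = u.reduce((sum, value, i) => sum + value * w[i], 0);
  const cosine = dot / (Math.hypot(...u) * Math.hypot(...w));
  // Clamp to guard against rounding slightly outside [-1, 1].
  return (Math.acos(Math.max(-1, Math.min(1, cosine))) * 180) / Math.PI;
}

// Idealised three-membered ring: all three sides equal (1.5 Å assumed).
const c2 = [0, 0, 0];
const c3 = [1.5, 0, 0];
const c1Regular = [0.75, (1.5 * Math.sqrt(3)) / 2, 0];

// Same ring with the C1-C2 bond stretched to 1.8 Å while C1-C3 and C2-C3 stay at 1.5 Å.
const c1Stretched = [1.08, 1.44, 0];

console.log(angleDegrees(c1Regular, c2, c3).toFixed(1));   // angle at C2, regular ring
console.log(angleDegrees(c1Stretched, c2, c3).toFixed(1)); // angle at C2, one bond elongated
```

With these assumed lengths the regular ring gives 60° and the stretched one about 53°, which is the region of the outliers in the distribution.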
What does X-ray crystallography measure? Well, the diffraction of photons by electrons. In order to obtain a diffraction pattern, enough photons have to be diffracted to be measured, and even with the most modern instruments this still takes minutes or hours. During this period, all the various nuclear positions encountered as a result of vibrations or equilibria are sampled. So if the rate constant for the [3,3] sigmatropic rearrangement is fast, X-ray diffraction will measure the average of the two sets of nuclear positions, which can be distinguished only with some difficulty (if at all) from the structure implied instead by electronic resonance.
If the equilibrium arrow applies, then the small angles of <54° are merely the average of the normal value for a 3-membered ring and a smaller value for a structure where one of the C-C bonds has been removed. In my opening sentence, I noted that the three carbon atoms had to be connected in a chain. This is no longer true; the goalposts have been moved (a lot)!
If it's an electronic resonance, then the three carbon atoms are still connected, albeit one of the two C-C bonds is no longer a single bond but rather weaker and hence longer. The goalposts have merely been slightly shifted!
A calculation (B3LYP/Def2-TZVPP+D3 dispersion, doi: 10.14469/hpc/1850, [1]) of the structure KUZFUE [2] shows the C₂-symmetric species shown below, with an elongated C-C bond and hence a reduced C-C-C angle, as being a true minimum (a resonance) rather than a transition state (an equilibrium). The vibration which shortens one C-C bond and lengthens the other has the real calculated wavenumber 244 cm⁻¹.‡ But the boundary between the two possibilities (often referred to as the boundary between a single and a double minimum in a potential energy surface) is notoriously difficult to capture using calculations.

How could experiment definitively settle the issue? Well, the SLAC beam is a unique instrument. Its source of X-rays is so intense that you can get an analysable diffraction pattern from a crystal on a timescale so short that during this period no nuclear motions occur (not even vibrations). Those nuclear positions capture the true equilibrium positions of the atoms in the molecule. Now, how does one get beam time at SLAC?
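To put the calculated wavenumber of 244 cm⁻¹ on the same footing as the measurement times discussed above, it can be converted into the time taken for one cycle of that vibration, using period = 1/(c × wavenumber). This is a small worked conversion only; the function name is my own.

```javascript
// Speed of light in cm/s, so that wavenumbers in cm^-1 convert directly to Hz.
const SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10;

// Time in seconds for one full cycle of a vibration with the given wavenumber (cm^-1).
function vibrationalPeriodSeconds(wavenumberPerCm) {
  return 1 / (SPEED_OF_LIGHT_CM_PER_S * wavenumberPerCm);
}

const period = vibrationalPeriodSeconds(244);
console.log(`${(period * 1e15).toFixed(0)} fs per vibration`);
```

For this low-frequency mode the period comes out at a little over a tenth of a picosecond, so a diffraction experiment lasting minutes or hours averages over an enormous number of such vibrations.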
‡ Click on the image above to see an animation of this normal mode. If you are running the macOS Safari browser, make sure Preferences/Security/Plug-in settings/Java has the site ch.ic.ac.uk or ch.imperial.ac.uk set to on. If you do not do this, the somewhat unhelpful message “You do not have Java applets enabled in your web browser, or your browser is blocking this applet.” will appear. Note also that new system installations might tend to switch these settings to off.
References
- H. Rzepa, "CAZFUE", 2016. https://doi.org/10.14469/hpc/1850
- L.M. Jackman, A. Benesi, A. Mayer, H. Quast, E.M. Peters, K. Peters, and H.G. Von Schnering, "The Cope rearrangement of 1,5-dimethylsemibullvalene-2,6- and 3,7-dicarbonitriles in the solid state", Journal of the American Chemical Society, vol. 111, pp. 1512-1513, 1989. https://doi.org/10.1021/ja00186a064
Wednesday, August 17th, 2016
In the previous post, I noted that a chemistry publisher is about to repeat an earlier experiment in serving pre-prints of journal articles. It would be fair to suggest that following the first great period of journal innovation, the boom in rapid publication “camera-ready” articles in the 1960s, the next period of rapid innovation started around 1994 driven by the uptake of the World-Wide-Web. The CLIC project[1] aimed to embed additional data-based components into the online presentation of the journal Chemical Communications, taking the form of pop-up interactive 3D molecular models and spectra. The Internet Journal of Chemistry was designed from scratch to take advantage of this new medium.[2] Here I take a look at one recent experiment in innovation which incorporates “augmented reality”.[3]
The title is interesting: “Combination of Enabling Technologies to Improve and Describe the Stereoselectivity of Wolff–Staudinger Cascade Reaction”. One of these technologies relates to “microwave-assisted flow generation of primary ketenes by thermal decomposition of α-diazoketones at high temperature”, but the journal presentation itself attempts the “faster interpretation of computed data via a new web-based molecular viewer, which takes advantage from Augmented Reality (AR) technology”. To access this component directly, go to the link https://leyscigateway.ch.cam.ac.uk/staudinger/. It is not incorporated into the journal infrastructure as the CLIC project attempted, but is perhaps closer to the model I noted in the previous post of supporting (FAIR) data associated with the article and hosted separately from the journal.
What happens next depends rather on the Web browser you are using. With many browsers and tablets, a conventional 3D molecular presentation appears; there is no button present where the red arrow points. You will find out this is because “Augmented Reality is not available in your browser, as the getUserMedia() API is not supported”.
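A page can make this decision with a simple feature test before offering an AR button. The sketch below is not the code used by the viewer in question; it is a minimal illustration, and the function name `hasGetUserMedia` is hypothetical. It checks for the standard `navigator.mediaDevices.getUserMedia` method and, failing that, for the older prefixed forms that some browsers shipped. The navigator object is passed in as an argument so that the function can also be exercised outside a browser.

```javascript
// Report whether a navigator-like object offers any form of getUserMedia.
function hasGetUserMedia(nav) {
  if (!nav) return false;
  // Standard, promise-based API.
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    return true;
  }
  // Older callback-based and vendor-prefixed variants.
  return ["getUserMedia", "webkitGetUserMedia", "mozGetUserMedia"].some(
    (name) => typeof nav[name] === "function"
  );
}

// Stand-ins for two kinds of browser, for illustration only.
const modernBrowser = { mediaDevices: { getUserMedia: () => Promise.resolve() } };
const olderBrowser = {};

console.log(hasGetUserMedia(modernBrowser) ? "show AR button" : "plain 3D viewer only");
console.log(hasGetUserMedia(olderBrowser) ? "show AR button" : "plain 3D viewer only");
```

In a real page one would call `hasGetUserMedia(navigator)` and show or hide the button accordingly.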

Some browsers (the latest Opera, Firefox, Chrome) do support this feature, and a new AR button appears. Selecting this now layers the video from the device camera onto the 3D molecular model; the molecule now floats in the scene captured by the camera (which in the case below is the room I am sitting in). After a few seconds you are urged to “point the camera towards the AR marker”. The supporting information contains such AR markers as a navigation aid for the 3D coordinates contained there. An example is:

If this marker is now brought into the camera view (by printing it, sic, and holding it in front of the camera), the marker resolves into further data relevant to the molecule of interest, layered into the existing scene of the room and the molecule. For the marker above, it resolves to a reaction energy profile which reveals where the specific molecule sits energetically in terms of the overall reaction.

This layering of “heads up” molecular data into a scene comprising a 3D molecular model and the human viewer of that molecule captured in video is what defines the concept of “augmented reality” (the data being the augmentation, rather than the human).
Having now tried it out, I was left wondering whether this truly was a great advance in enabling technology for chemistry journals. The role of the camera seems primarily to capture the AR markers contained in the supporting information; the presence of the reader in the video image apparently inspecting the molecule could be regarded as a distraction. The AR markers (QR codes) are merely visual representations of a URL, which in the form of a DOI (as used in this blog) to locate data is rather more familiar to most readers. The DOI, by the way, carries further information in the form of metadata which, when sent to e.g. DataCite, enables the data to be found. Does the data need to be layered onto the molecule (and apparently floating in front of the reader) to become usable? Could it instead be placed in a pop-up or separate window of its own (as the 1994 CLIC project achieved)? Do the AR markers enable the data to be FAIR? One can Find the data (albeit only by reading and printing the supporting information) and view it in the AR scene, but is it Accessible (can one access the underlying numerical data?) or Interoperable (place it into another program) or Re-usable?
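The contrast between a bare URL and a DOI can be made concrete. The sketch below pulls a DOI out of a URL, such as one a marker might encode, and builds the address at which DataCite's public REST API serves the metadata record for DOIs registered with it. The function names are my own, the regular expression is a deliberately loose approximation of DOI syntax, and the example DOI is one from this blog rather than anything taken from the article's markers.

```javascript
// Extract the first DOI-like string (10.<registrant>/<suffix>) from some text.
// This pattern is a loose approximation, not a full DOI grammar.
function extractDoi(text) {
  const match = text.match(/10\.\d{4,9}\/[^\s"<>]+/);
  return match ? match[0] : null;
}

// Address of the metadata record for a DataCite-registered DOI.
// DOIs registered with other agencies will not be found here.
function dataciteRecordUrl(doi) {
  return `https://api.datacite.org/dois/${doi}`;
}

const doi = extractDoi("https://doi.org/10.14469/hpc/1850");
console.log(doi);
console.log(doi ? dataciteRecordUrl(doi) : "no DOI found in this URL");
```

A marker that resolves to an arbitrary URL stops at the first step; one that resolves to a DOI allows the second, which is where the metadata that makes data Findable comes in.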
As with all enabling technologies, one has to always ask if that technology helps or hinders. Or is the principle of KISS (keep it simple) sometimes better? It is however good to see research groups experimenting with these themes and meanwhile readers can judge for themselves whether “heads up” AR augmentation of the data describing research is indeed the next big thing.
References
- D. James, B.J. Whitaker, C. Hildyard, H.S. Rzepa, O. Casher, J.M. Goodman, D. Riddick, and P. Murray‐Rust, "The case for content integrity in electronic chemistry journals: The CLIC project", New Review of Information Networking, vol. 1, pp. 61-69, 1995. https://doi.org/10.1080/13614579509516846
- S.M. Bachrach, and S.R. Heller, "The Internet Journal of Chemistry: A Case Study of an Electronic Chemistry Journal", Serials Review, vol. 26, pp. 3-14, 2000. https://doi.org/10.1080/00987913.2000.10764578
- S. Ley, B. Musio, F. Mariani, E. Śliwiński, M. Kabeshov, and H. Odajima, "Combination of Enabling Technologies to Improve and Describe the Stereoselectivity of Wolff–Staudinger Cascade Reaction", Synthesis, vol. 48, pp. 3515-3526, 2016. https://doi.org/10.1055/s-0035-1562579
Saturday, March 5th, 2011
My colleague Bill Griffith has again come up with another colour challenge: that of the ancient semi-precious stone Lapis Lazuli, mined in the mountains of Afghanistan for more than 6000 years and used by painters in some medieval paintings of the Virgin, the Wilton diptych etc.

Lapis Lazuli (photo from Wikipedia).
The formula is (approximately) (Na,Ca)₈(AlSiO₄)₆(S,SO₄,Cl)₁₋₂, which sounds a bit of a challenge! But, as a very recent article points out (DOI: 10.1039/b910469k), the component that imparts the colour is the sulfur, more specifically present in the stone as the S₃⁻ radical anion. No recent calculation of the UV/Vis spectrum of this simple triatomic has been reported, so here goes. An ωB97XD/aug-cc-pVQZ calculation, embedded in a continuum solvent field of water (which serves to compactify the otherwise diffuse anionic aspect) and with TD-DFT applied, shows the following. (You will need an SVG-enabled Web browser to see the spectrum. I am here promoting the use of this graphical standard, which differs from normal images in scaling with no loss of resolution as you resize the page.)

The λmax is ~650 nm calculated and ~619 nm measured (as a solution in an ionic liquid). Not bad agreement! The molecular orbitals involved in the excitation are shown below.
Highest doubly occupied MO. Click for 3D.
Lowest singly occupied MO. Click for 3D.
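How good is the agreement between the calculated and measured λmax quoted above? Errors in excitation energies are usually judged on an energy scale rather than in nm, so a quick conversion using E = hc/λ (with hc ≈ 1239.84 eV nm) is useful. This is a small worked conversion; the function name is my own.

```javascript
// hc expressed in eV·nm, so that a wavelength in nm converts directly to eV.
const HC_EV_NM = 1239.841984;

// Photon energy in eV for a wavelength given in nm.
function wavelengthToEv(wavelengthNm) {
  return HC_EV_NM / wavelengthNm;
}

const calculated = wavelengthToEv(650); // ~650 nm from the calculation
const measured = wavelengthToEv(619);   // ~619 nm from experiment
console.log(`calculated ${calculated.toFixed(2)} eV, measured ${measured.toFixed(2)} eV`);
console.log(`difference ${(measured - calculated).toFixed(2)} eV`);
```

The 31 nm gap corresponds to a difference of roughly a tenth of an electron volt.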
Such a precious colour, and produced using such a cheap material!
Friday, December 24th, 2010
If you get a small rotatable molecule below, then ChemDoodle/HTML5/WebGL is working. Why might this be important? Well, the future is mobile, in other words, devices that rely on batteries or other sources of built-in power. This means the power-guzzling GPU cards of the past (some reach ~400 Watts!) cannot be used. Rather than using e.g. a full power OpenGL library, one will use Web-based graphics libraries such as WebGL, which (to quote Wikipedia) “extends the capability of the JavaScript programming language to allow it to generate interactive 3D graphics within any compatible web browser”. A typical target device might be for example Apple’s iPad (for which the redoubtable Jmol, which is based on Java, is unlikely to ever work).
To find out if your device and its browser can support this type of graphical display, go to either this test page or this more general one (which at the time of writing actually gets the WebGL test wrong!).
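What such test pages do can be sketched quite briefly: ask a canvas element for a WebGL rendering context and see whether one comes back. This is a minimal illustration, not the code of either test page, and `supportsWebGL` is a hypothetical name. The canvas is supplied by a factory function so that the check can also be tried outside a browser; the "experimental-webgl" name is the one early implementations used.

```javascript
// Return true if a canvas produced by createCanvas can supply a WebGL context.
function supportsWebGL(createCanvas) {
  try {
    const canvas = createCanvas();
    return Boolean(
      canvas.getContext("webgl") || canvas.getContext("experimental-webgl")
    );
  } catch (error) {
    // No canvas support at all, or context creation threw.
    return false;
  }
}

// Stand-ins for illustration: one "device" with WebGL, one without.
const capableCanvas = () => ({ getContext: (name) => (name === "webgl" ? {} : null) });
const incapableCanvas = () => ({ getContext: () => null });

console.log(supportsWebGL(capableCanvas));
console.log(supportsWebGL(incapableCanvas));
```

In a browser the call would be `supportsWebGL(() => document.createElement("canvas"))`.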
I have deployed an earlier graphical methodology in other posts (SVG), which many browsers now support. This combination of HTML5, SVG and WebGL is the future! For its use on another blog, see here.
Tuesday, December 7th, 2010
Moving (chemical) data around in a manner which allows its (automated) use in whichever context it finds itself must be a holy grail for all scientists and chemists. I posted earlier on the fragile nature of molecular diagrams making the journey between the editing program used to create them (say ChemDraw) and the word processor used to place them into a context (say Microsoft Office), via an intermediate storage area known as the clipboard. The round trip between the Macintosh (OS X) versions of these programs had been broken for a little while, but it is now fixed! A small victory. This post reports what happens when such a Mac-created Word document is sent to someone using Microsoft Windows as an OS (or vice versa).
As you might have guessed, the molecular diagram arrives largely dead, and not re-usable. Opening the .docx archive (it is nothing more than a zip file) reveals only a JPEG file residing inside. Nothing that can be chemically repurposed. If the reverse process is undertaken, of creating a ChemDraw diagram and pasting it into Word on Windows, one finds in the .docx two components: a bit-mapped image linked to an active object containing the data. Only the first of these is recognised if the file makes its way to a Macintosh; i.e. the same story, the data is again lost. So the bottom line is that Mac users and Windows users cannot, after all, exchange repurposable molecular diagrams via Word documents with this combination of programs. This is not good.
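The inspection described above is easy to repeat. Once the entries of a .docx archive have been listed with any zip tool, a few lines of JavaScript can sort them into pictures and embedded objects. The sketch assumes the usual layout of such archives, in which pictures sit under word/media/ and embedded (OLE) objects under word/embeddings/; the two file listings are invented examples of what the Mac and Windows documents might contain, and the function name is my own.

```javascript
// Sort the entry names of a .docx (zip) archive into pictures and embedded objects.
function classifyDocxEntries(entryNames) {
  const images = entryNames.filter((name) => name.startsWith("word/media/"));
  const embeddedObjects = entryNames.filter((name) =>
    name.startsWith("word/embeddings/")
  );
  // An embedded object is necessary (though not sufficient) for the diagram to be re-editable.
  return { images, embeddedObjects, hasEmbeddedObject: embeddedObjects.length > 0 };
}

// Hypothetical listings for the two cases described above.
const macDocument = ["[Content_Types].xml", "word/document.xml", "word/media/image1.jpeg"];
const windowsDocument = [
  "[Content_Types].xml",
  "word/document.xml",
  "word/media/image1.emf",
  "word/embeddings/oleObject1.bin",
];

console.log(classifyDocxEntries(macDocument).hasEmbeddedObject);
console.log(classifyDocxEntries(windowsDocument).hasEmbeddedObject);
```

Only the second listing carries anything beyond a picture, and as noted above even that object is not recognised once the file reaches the other platform.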
But let me remind you what happened around 1993. The word processor was joined by a program called the Web browser. In 1996, the underlying content carrier, HTML, became XHTML (an instance of XML). Right from day 1 almost, such XHTML could be, and frequently was, repurposed. A memorable example is that search engines could use it to index the Web. The XHTML easily survived trips to and from clipboards. In 1996, CML joined HTML as a way of carrying chemical information capable of round-tripping without loss (if need be). There are other chemical XML languages in use nowadays, including CDXML used by the ChemDraw program. Word itself now uses XML (the x in .docx). So, after 14 years, why am I still describing the difficulties above? I am frankly at a loss to explain why there is still a need to write this post.
All is not entirely lost. The CML4Word approach is designed to enable (chemical) data round tripping from the outset, although I do not yet know whether the CML created and stored in the Word document using this mechanism is recognised anywhere outside of Word 2007 on Windows. If anyone can let me know of examples where such a CML-enabled Word document can be used in other environments, I would be very grateful (but not on OS X, as I know already).
And as I might have mentioned in the previous post on this topic, things may not however be getting better in that other carrier of information and data, the mobile phone/iPad, as exemplified by operating systems such as iOS or Android. Watch this space, as they say.