Posts Tagged ‘Microsoft’

Refactoring my lecture notes on pericyclic reactions.

Sunday, December 29th, 2013

When I first started giving lectures to students, it was the students themselves who acted as human photocopiers, faithfully trying to duplicate what I was inscribing in chalk on the lecture theatre blackboard. How times have changed! Here I summarise my latest efforts to refactor the material I deliver in one lecture course on pericyclic reactions (and because my notes have always been open, you can view them yourself if you wish).

When I first started this course, I created notes using a typewriter, and used stencils to draw the chemical diagrams (with a French curve for smooth lines). Now of course, for most (organic chemistry) lecturers, it is a combination of a chemical drawing software tool (which we first saw around 1985) and a visual presentation package such as PowerPoint (although we do still have one lecture theatre with a real chalk board, which many lecturers still use, albeit probably to slow their presentation down and avoid “death by PowerPoint” for the students). I never endorsed this combination, having avoided PPT for many years. Instead, I do it differently, and here I list some of the aspects of my newly refactored notes.

  1. I am at ease writing HTML(5). Because it is the native markup of the Web, it has stood me in good stead for many years: it adapts very well to new technologies and is not dependent on proprietary software. One example was when I first refactored such notes into an e-book format (EPUB) using Calibre, which converts HTML delightfully easily. I should say that one does have to use HTML “properly”. By this I mean ensuring that style is implemented using CSS, that metadata is appropriately entered into the document header, and that the resulting document is checked using the HTML syntax checker Tidy. Tidy is actually built into the editor I have used for many years (BBEdit), but the same validation can be achieved via sites such as http://www.dirtymarkup.com/
  2. The diagrams can be created using the usual tools (ChemDraw, ChemDoodle), but I convert the end product to the SVG format. ChemDoodle saves in this format natively; ChemDraw can save as EPS, which I then convert without issue to SVG using Scribus. I adopted this approach quite a few years ago (in 2000, in fact), since I felt that scalable graphics (which is what SVG is) had a sounder future than raster graphics. So it has proved, since SVG is now natively supported in most new Web browsers (IE being the prominent exception).
  3. I decided about six years ago to augment my pericyclic lectures with properly computed transition states. After all, the essence of this topic is the properties of the reaction transition state (my approach is to consider its aromaticity), and so a quantitative model for this seems desirable. Mine have all been computed using the ωB97XD/6-311G(d,p) method (also extensively employed on this blog).
  4. About three years ago, starting from these transition states, I computed intrinsic reaction coordinates (IRCs) for all the reactions. These are now deployed as animations in my notes. I use animated GIFs for this, since the format is widely supported. A bonus was to discover that conversion of such diagrams into EPUB format preserves the animation, and that many (but not all) e-book readers supporting this format honour it.
  5. As my transition state library (now around 100 in number) expanded, I put some thought into how to express these models in my notes. Initially, I used Jmol to do so, and wrote some JavaScript for the notes that would “pop up” a separate window to display the Jmol model. Not surprisingly, this feature does not condense down into an EPUB book at all. Nor does it work if the notes are displayed on a tablet browser (most such devices do not support the Java that Jmol requires). There is some prospect on the horizon that the EPUB3 format might be capable of supporting such interactivity, but not quite yet. In refactoring my notes for the next lecture delivery, I pondered how to handle this aspect. A natural choice was JSmol, a recent evolution of Jmol which does not depend on Java, instead using the very different technology of JavaScript (implemented natively in the Web browser, just like SVG). If you are interested, some of the early evolution of this new approach can be seen in an article I wrote about 18 months ago, which contains very early working examples.[1] Because the pop-up window approach does not work very well on tablets, I adopted a hideable inset in the main window instead. The reader can show or hide this display readily, and its position stays fixed whilst the notes scroll around it. Making this window come and go requires a control toolbar, and my current design (which may change!) is shown above. It clearly indicates to the reader that the notes they are reading have interactive components. I cannot resist also directing you to chemagic.com/ochem/ochem.htm, which is a particularly good example of how to create interactive teaching notes. Into this inset window, all my transition states, molecular orbitals and other relevant surfaces can appear on demand.
  6. Notice that the notes are indexed (by Google) and so are text-searchable (I cannot help but note that my institution uses a commercial content-management system for most lectures except mine, in which, for obscure reasons, text-searching of the materials is not possible). The transition state models can be indexed using the InChIKey generated when each molecule is deposited into a repository (see below), but sub-structure searching is a trickier issue!
  7. If any student wants to follow up on any one entry in my transition state library, I have helped them do so by linking each one to a digital repository entry. Each entry is addressed by a persistent digital object identifier (DOI), and from there anyone interested can branch off and start doing their own modelling.
  8. On the subject of the DOI, I decided to allocate such a handle to the notes themselves: http://doi.org/10042/a3uxp[2]. You might imagine that allocating such persistent identifiers is something only publishers can do (and that it costs money). In fact, a so-called handle server comes with our digital repository, and a clever programmer (thanks Matt!) was able to extract the relevant code and make it a stand-alone resource. I will not list its URL here, since we really only want handles allocated using it to be associated with my department!
  9. The transition states are also allocated handles, and it is perfectly possible to combine JSmol with handles to create interactive windows that retrieve the relevant files using the data DOI and render them. We are writing an article on how to do this, so look out for it.
  10. I pondered printing long and hard! Having sweated to get interactive components and animations into the notes, I felt that destroying these features by what I will call 2D printing was self-defeating. But students like printing (about 20 million print copies are made at my institution each year). In the end, some more JavaScript resulted in a small link in my toolbar above, which first hides all the interactive boxes (including the toolbar itself!), then prints the page, and then returns the screen to normal. In a sense I feel bad about doing this, but printed notes do have one nice feature: one can write notes in the margin!
  11. So can one replicate note-taking electronically, using, say, a tablet? This, it has to be said, remains a somewhat unsolved problem, especially in chemistry. One has vexed issues to confront, such as whether to use a stylus for the purpose (either passive, or palm-rejecting active) and whether the stylus is compatible with whatever note-taking software one wants to use. And how should any notes taken be stored? One chemical solution is www.cambridgesoft.com/land/flick-to-share.aspx, where a ChemDraw diagram can be “flicked” to another student, or indeed an instructor, and also stored in the cloud. But I think I am going to have to work hard to convince most students that e-notes are better than just printing the notes onto paper and using a ball-point pen. I do, however, expect interesting things to happen in this regard in the next year or two, so this is most certainly a space to watch.
  12. I have added 3D printing to my list. It is not yet available on demand from the models built into the notes, but our small library of 3D-printable molecular orbitals and transition states is growing, and perhaps in the near future a student will simply be able to press another link entitled “3D print” to produce a model for themselves. If you have never held a 3D-printed molecular orbital in your hands, try it! You might discover something new.
  13. Templates? I have tried to package the distracting “behind-the-scenes” material into scripts and stylesheets, so that an author need only write fairly simple HTML. Perhaps there are excellent commercial packages out there that might make the task easier. But just as convincing students to abandon paper for their notes is a tough nut to crack, so too, I suspect, will be convincing my colleagues to adopt this format. Most are very wedded indeed to the traditional word processor and the traditional presentation package.
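The IRC animations mentioned in item 4 start life as a sequence of geometries along the intrinsic reaction coordinate (IRC). One common intermediate form, which Jmol and JSmol can both load as a set of frames and play back, is a multi-frame XYZ file: ordinary XYZ blocks simply concatenated. The sketch below assumes the IRC points have already been extracted from the quantum chemistry output as lists of (element, x, y, z) tuples; the function name `write_multiframe_xyz` and the two-frame example are hypothetical, and this is not a description of how the animated GIFs in the notes were actually produced.

```python
from typing import List, Sequence, Tuple

# One atom: element symbol plus Cartesian coordinates (Angstrom).
Atom = Tuple[str, float, float, float]


def write_multiframe_xyz(frames: Sequence[Sequence[Atom]],
                         energies: Sequence[float] = ()) -> str:
    """Concatenate IRC geometries into a single multi-frame XYZ string.

    Each frame is written as: atom count, a comment line, then one line
    per atom. If energies are supplied, each goes into its frame's comment.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_atoms = len(frames[0])
    lines: List[str] = []
    for i, frame in enumerate(frames):
        # Every point on an IRC must contain the same atoms.
        if len(frame) != n_atoms:
            raise ValueError(f"frame {i} has {len(frame)} atoms, expected {n_atoms}")
        comment = f"IRC point {i}"
        if i < len(energies):
            comment += f" E={energies[i]:.6f}"
        lines.append(str(n_atoms))
        lines.append(comment)
        for symbol, x, y, z in frame:
            lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


# Two hypothetical frames of an H-H bond stretching.
frames = [
    [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)],
    [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.80)],
]
print(write_multiframe_xyz(frames), end="")
```

Saved with an .xyz extension, such a file can be played in Jmol with its `animation on` command; rendering the frames to an animated GIF would be a separate step.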

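Items 6 to 9 combine two identifiers: an InChIKey generated for each deposited transition state, and a persistent handle that resolves to its repository entry. A minimal sketch of how the two might be tied together follows. It assumes only the generic InChIKey syntax (a 14-letter block hashing the connectivity, a 10-letter block, and a single protonation letter) and the generic Handle System proxy at hdl.handle.net; the class `TransitionStateIndex` and the example handles are hypothetical. Grouping by the first block finds entries sharing a skeleton (e.g. stereoisomers), which is a long way short of the sub-structure searching that item 6 calls a trickier issue.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Generic InChIKey syntax: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

# Generic Handle System proxy (doi.org can also resolve such handles).
HANDLE_PROXY = "https://hdl.handle.net/"


class TransitionStateIndex:
    """Hypothetical index mapping InChIKeys to repository handles."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}
        self._by_skeleton: Dict[str, List[str]] = defaultdict(list)

    def add(self, inchikey: str, handle: str) -> None:
        if not INCHIKEY_RE.match(inchikey):
            raise ValueError(f"not a valid InChIKey: {inchikey!r}")
        skeleton = inchikey[:14]  # first block: connectivity layer only
        old = self._by_key.get(inchikey)
        if old is not None:
            # Re-registering a key replaces its previous handle.
            self._by_skeleton[skeleton].remove(old)
        self._by_key[inchikey] = handle
        self._by_skeleton[skeleton].append(handle)

    def lookup(self, inchikey: str) -> str:
        """Exact match on the full key."""
        return self._by_key[inchikey]

    def same_skeleton(self, inchikey: str) -> List[str]:
        """All handles whose keys share the connectivity block."""
        return list(self._by_skeleton.get(inchikey[:14], []))

    @staticmethod
    def resolver_url(handle: str) -> str:
        """URL at which the generic proxy resolves a handle."""
        return HANDLE_PROXY + handle
```

For example, `TransitionStateIndex.resolver_url("10042/a3uxp")` gives the proxy form of the handle allocated to these notes.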
Well, the refactored lecture materials described above will be exposed to real students (and real tutors) in a few weeks’ time, and feedback will no doubt be received. But if anyone reading this blog wishes to comment on any aspect, I most certainly welcome it.


In what I initially regarded as an unusual 15th birthday present, my father gave me a mechanical typewriter, along with a small booklet on how to learn to touch-type. In retrospect, this gift was very far-sighted. Rather than just playing football with my friends, I also learned to touch-type. This blog post has just been written employing this skill.

Internet Explorer for Microsoft Windows 8.0 and 8.1 displays these components well.

References

  1. H.S. Rzepa, "Chemical datuments as scientific enablers", Journal of Cheminformatics, vol. 5, 2013. https://doi.org/10.1186/1758-2946-5-6
  2. "Organic Pericyclic Reactions", 2014. http://doi.org/10042/a3uxp

Data-round-tripping: wherein the future?

Tuesday, December 7th, 2010

Moving (chemical) data around in a manner which allows its (automated) use in whichever context it finds itself must be a holy grail for all scientists and chemists. I posted earlier on the fragile nature of molecular diagrams making the journey between the editing program used to create them (say ChemDraw) and the word processor used to place them into a context (say Microsoft Office), via an intermediate storage area known as the clipboard. The round trip between the Macintosh (OS X) versions of these programs had been broken for a little while, but it is now fixed! A small victory. This post reports what happened when such a Mac-created Word document is sent to someone using Microsoft Windows as an OS (or vice versa).

As you might have guessed, the molecular diagram arrives largely dead, and not reusable. Opening the .docx archive (it is nothing more than a zip file) reveals only a JPEG file residing inside: nothing that can be chemically repurposed. Conversely, if a ChemDraw diagram is created and pasted into Word on Windows, one finds two components in the .docx: a bit-mapped image linked to an active object containing the data. Only the first of these is recognised if the file makes its way to a Macintosh; i.e. the same story, the data is again lost. So the bottom line is that, with this combination of programs, Mac users and Windows users cannot after all exchange repurposable molecular diagrams via Word documents. This is not good.
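Because a .docx really is just a zip archive, the inspection described above can be scripted. A minimal sketch using only Python's standard `zipfile` module, assuming the usual Office Open XML layout in which pictures are stored under `word/media/` and embedded OLE objects (such as a ChemDraw object pasted on Windows) under `word/embeddings/`. The function name is hypothetical; an empty `embedded_objects` list is the symptom described here: a picture, but no chemistry.

```python
import io
import zipfile
from typing import Dict, List


def classify_docx_parts(docx: bytes) -> Dict[str, List[str]]:
    """Group the parts of a .docx (a zip archive) by what they carry."""
    groups: Dict[str, List[str]] = {"images": [], "embedded_objects": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(docx)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries, if the archive has any
            if name.startswith("word/media/"):
                groups["images"].append(name)
            elif name.startswith("word/embeddings/"):
                groups["embedded_objects"].append(name)
            else:
                groups["other"].append(name)
    return groups
```

To inspect a real document, pass the file's bytes, e.g. `classify_docx_parts(open("report.docx", "rb").read())`, where the filename is only a placeholder.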

But let me remind you what happened around 1993. The word processor was joined by a program called the Web browser. By 2000, the underlying content carrier, HTML, had been reformulated as XHTML (an instance of XML). Almost from day 1, such markup could be, and frequently was, repurposed. A memorable example is that search engines could use it to index the Web. The XHTML easily survived trips to and from clipboards. In 1996, CML joined HTML as a way of carrying chemical information capable of round-tripping without loss (if need be). There are other chemical XML languages in use nowadays, including CDXML used by the ChemDraw program. Word itself now uses XML (the x in .docx). So, after 14 years, why am I still describing the difficulties above? I am frankly at a loss to explain why there is still a need to write this post.
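The contrast with the .docx story is that XML-based chemistry survives being parsed and re-serialised by entirely generic tools. A minimal sketch, using only Python's standard `xml.etree.ElementTree`, that reads the connection table out of a tiny CML molecule and checks that it is unchanged after a parse/serialise round trip. The three-atom fragment is an illustrative example written for this purpose (heavy atoms only), not output from any particular program, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CML_NS = "http://www.xml-cml.org/schema"
NS = {"cml": CML_NS}

# Illustrative CML: the C-C-O heavy-atom skeleton of ethanol.
SAMPLE_CML = f"""<molecule xmlns="{CML_NS}" id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""


def connection_table(cml: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Return (atoms, bonds) as plain tuples extracted from a CML string."""
    root = ET.fromstring(cml)
    atoms = [(a.get("id"), a.get("elementType"))
             for a in root.iterfind(".//cml:atom", NS)]
    bonds = []
    for b in root.iterfind(".//cml:bond", NS):
        a1, a2 = b.get("atomRefs2").split()
        bonds.append((a1, a2, b.get("order")))
    return atoms, bonds


def round_trip(cml: str) -> str:
    """Parse and re-serialise, as a generic XML-aware tool might."""
    ET.register_namespace("", CML_NS)  # keep CML as the default namespace
    return ET.tostring(ET.fromstring(cml), encoding="unicode")
```

However many times `round_trip` is applied, `connection_table` recovers the same atoms and bonds, which is precisely the property the bit-mapped image lacks.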

All is not entirely lost. The CML4Word approach is designed to enable (chemical) data round-tripping from the outset. However, I do not yet know whether the CML created and stored in the Word document by this mechanism is recognised anywhere outside Word 2007 on Windows. If anyone can let me know of examples where such a CML-enabled Word document can be used in other environments, I would be very grateful (but not on OS X, as I know already).

And, as I mentioned in the previous post on this topic, things may not be getting better in that other carrier of information and data, the mobile phone/iPad, as exemplified by operating systems such as iOS or Android. Watch this space, as they say.

Data-round-tripping: moving chemical data around.

Saturday, November 20th, 2010

For those of us who were around in 1985, an important chemical IT innovation occurred. We could acquire a computer (it was called a Macintosh) which could be used to draw chemical structures in one application and, via a mysterious and mostly invisible entity called the clipboard, paste them into a word processor. Perchance even print the result on a laser printer. Most students of the present age have no idea what we used to do before this innovation! Perhaps not in 1985, but at some stage shortly thereafter, and in effect without most people noticing, the return journey also started working: the so-called round trip. It seemed natural that a chemical structure diagram subjected to this treatment could still be chemically edited, and that it could make the round trip repeatedly. Little did we realise how fragile this round trip might be. Years later, the computer and its clipboard, the chemistry software and the word processor had all moved on many generations (it is important to flag that three different vendors were involved, all using proprietary formats to weave their magic). And (on a Mac at least) the round-tripping no longer worked. Upon its return to the drawing program (ChemDraw in this instance), the diagram had been rendered inert, un-editable, and devoid of semantic meaning unless a human intervened.

By the way, this process of data-loss is easily demonstrated even on this blog. The chemical diagrams you see here are similarly devoid of data, being merely bit-mapped JPG images. This is why, on many of these posts, I put “Click for 3D” in the caption, which gives you access to the chemical data proper (in CML or other formats). And I throw in a digital repository identifier for good measure, should you want a full dataset.

It is only now that we (more specifically, this user) understand what had happened under the hood to break this round-tripping. In 1984, when Apple produced the Mac, they also produced a most interesting data format called PICT. A human saw the PICT as a PICTure, but the computer could see more: additional data embedded in the PICT. The clipboard supported the PICT format, which meant that both picture and data could be transferred between programs, and ChemDraw and Word also understood this. Hence the ability to round-trip noted above (although, it has to be said, only between these specific programs).
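The mechanism can be caricatured in a few lines: one copied item carries several representations (a picture plus embedded data), and each program that pastes and re-copies it keeps only the representations it understands. The sketch below is a deliberately toy model, not Apple's pasteboard API or the PICT format itself; the classes, the `round_trip` helper and the flavour names `"picture"` and `"chem-data"` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class Clipboard:
    """Toy clipboard: one copied item can carry several named representations."""
    flavours: Dict[str, bytes] = field(default_factory=dict)


@dataclass(frozen=True)
class App:
    """Toy application that only keeps the representations it understands."""
    name: str
    understands: FrozenSet[str]

    def paste_then_copy(self, clip: Clipboard) -> Clipboard:
        # Anything the app does not recognise is silently dropped.
        kept = {k: v for k, v in clip.flavours.items() if k in self.understands}
        return Clipboard(kept)


def round_trip(clip: Clipboard, *apps: App) -> Clipboard:
    """Pass a clipboard item through a sequence of applications."""
    for app in apps:
        clip = app.paste_then_copy(clip)
    return clip


# Hypothetical flavour names; "chem-data" stands for the embedded chemistry.
drawing = App("drawing program", frozenset({"picture", "chem-data"}))
wp_aware = App("data-aware word processor", frozenset({"picture", "chem-data"}))
wp_blind = App("picture-only word processor", frozenset({"picture"}))

copied = Clipboard({"picture": b"<pixels>", "chem-data": b"<structure>"})
```

Sending `copied` through `wp_aware` and back to `drawing` preserves both flavours; sending it through `wp_blind` drops the chemistry, and no later program can restore it.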

Times moved on and the limitations of PICT set in. Apple refocussed on the PDF format, which is related, notice, to the PostScript format that Adobe had introduced to allow high-quality laser printing. PICT support was abandoned, and the various components (specifically the clipboard, and Word’s ability to recognise the data) no longer handled the embedded data. Round-tripping broke. Does this matter? Well, one colleague where I work had accumulated more than 1000 chemical diagrams, which he decided to store in PowerPoint (and yes, he threw the original ChemDraw files away). The day came when he wanted to round-trip one of them. And of course he could not. He was rather upset, I have to say!

PDF was not really a format designed to carry data (see DOI: 10.1021/ci9003688). But, bless their hearts, the three vendors involved in this story all agreed to support data embedded in the PDF hamburger (and Adobe to tolerate it), and now once again a structure diagram can move into an Office program (on Mac) and out again and retain its chemical integrity. What lessons can be learnt?

  1. Firstly, out of sight, out of mind. The clipboard is truly mostly out of sight, and it was not really designed from the outset to preserve data properly. Nowadays I wonder whether clipboards in general recognise XML (and hence CML) and preserve it. I truly do not know. But they should.
  2. Secondly, any system which relies on three or four commercial vendors who, at least in the past, devised proprietary formats which they could change without warning, is bound to be fragile.
  3. We have learnt that data is valuable, more so than the representation of it (i.e. a 2D or 3D structure diagram). But when it is lost, users should care! And tell the vendors.
  4. Peter Murray-Rust and his team have produced CML4Word (or, as Microsoft calls it, the Chemistry Add-in for Word). At its heart is data integrity. Fantastic! But I wonder if it survives on Microsoft’s clipboard (I know it does not on Apple’s, since CML4Word is not available for that OS, and is unlikely ever to be).
  5. And I can see history about to repeat itself. The same seems about to happen on new devices such as the Apple iPad. It too has copy/paste via a clipboard. I bet this will not round-trip chemistry (or much other) data! Want to bet that the lessons of this story have not yet been learnt?

Oh, for those who wish to round-trip chemistry on a Mac, you will have to acquire ChemDraw 12.0.2 and Word 2011 (version 14.01), as well as OS X 10.6 for it to work.

A Digital chemical repository – is it being used?

Tuesday, May 4th, 2010

In this previous blog post I wrote about one way in which we have enhanced the journal article. Associated with that enhancement, and also sprinkled liberally throughout this blog, are links to a Digital Repository (if you want to read all about it, see DOI: 10.1021/ci7004737). It is a fairly specific repository for chemistry, with about 5000 entries. These are mostly the results of quantum mechanical calculations on molecules (together with a much smaller number of spectra, crystal structures and general document depositions). Today, with some help (thanks Matt!), I decided to take a look at how much use the repository was receiving.

  1. The first entry in the log dates from 2008-02-05.
  2. The repository is now receiving about 1200 accesses via handle resolutions each day, which come from ~150 unique client IPs and cover ~900 unique handles daily.
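Figures like these come from counting handle-resolution log entries per day. A minimal sketch, assuming a simplified log in which each line holds a date, the client IP and the resolved handle separated by whitespace; the actual handle server log format is not described here and will certainly differ, so the line-splitting step would need adapting. The function name, IP addresses and handles in the usage below are hypothetical. Telling web spiders apart from people would additionally need user-agent information, which this simplified format lacks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def daily_stats(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Per-day counts from lines of the assumed form 'YYYY-MM-DD client_ip handle'."""
    hits: Dict[str, int] = defaultdict(int)
    ips: Dict[str, Set[str]] = defaultdict(set)
    handles: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        day, ip, handle = parts
        hits[day] += 1
        ips[day].add(ip)
        handles[day].add(handle)
    return {
        day: {
            "accesses": hits[day],
            "unique_ips": len(ips[day]),
            "unique_handles": len(handles[day]),
        }
        for day in sorted(hits)
    }


# Hypothetical log excerpt (documentation-range IPs, made-up handles).
sample_log = [
    "2010-02-10 192.0.2.1 10042/to-1",
    "2010-02-10 192.0.2.1 10042/to-2",
    "2010-02-10 192.0.2.7 10042/to-1",
    "2010-02-11 192.0.2.7 10042/to-3",
    "garbage",
]
print(daily_stats(sample_log))
```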

Whilst most of the hits come from web spiders by auto-discovery, a fair number (perhaps ~300) of the 5000 entries have also been linked to via journal articles and, of course, this blog, and some hits may be presumed to be the result of non-random ping-backs. A breakdown of a typical day (2010-02-10), on which 839 unique handles were accessed, shows access by, amongst others, five universities, Google/Yahoo, several other information corporations and Microsoft. I had no idea Microsoft was interested in calculations on molecules! You saw that here first!!

Other anecdotal feedback regarding the repository: I often use it to exchange calculations with collaborators, sending them the handle instead of a vast checkpoint or log file. Some collaborators, it has to be said, are baffled by the interface presented to them (which was designed in large measure by DSpace, not by us).

It is early days in many ways, and being pretty much the only standards-compliant digital repository operating in chemistry in this manner means that awareness is still low. If anyone reading this blog knows of significant others, please comment.