Posts Tagged ‘operating systems’

Data nightmares: B40 and counting its π-electrons

Saturday, July 19th, 2014

Whilst clusters of carbon atoms are well known, my eye was caught by a recent article describing the detection of a cluster of boron atoms, B40 to be specific.[1] My interest was in how the σ- and π-electrons were partitioned. In a C40, one can reliably predict that each carbon would contribute precisely one π-electron. But boron, being more electropositive, does not always play like that. Having one electron fewer per atom, one might imagine that a fullerene-like boron cluster would have no π-electrons. But the element has a propensity[2] to promote its σ-electrons into the π-manifold, leaving a σ-hole. So how many π-electrons does B40 have?

These sorts of clusters are difficult to build using regular structure editors, and so coordinates are essential. The starting point for a set of coordinates with which to compute a wavefunction was the supporting information. Here is the relevant page:

[Image: B401]

The coordinates are certainly there (that is not always the case), but you have to know a few tricks to make them usable.

  1. Open Adobe Reader, select the coordinates and copy
  2. Paste into any application which recognises text. I used an old stalwart on the Mac, BBEdit. It is reliable!
  3. But no, it produces a row of skull-and-crossbones characters (the authors of the program clearly have a sense of humour). [Image: B402]
  4. Thinking that BBEdit might have let me down (for the first time), I tried Word. A little less humour, but the same result. [Image: B403]
  5. There are lots of web sites out there that claim to convert PDF files directly to Word files. Again, no luck; the coordinates are now entirely missing! [Image: B404]
  6. Right, time for the big guns. Adobe Acrobat XI converts .PDF to .DOC, and (if you jump through a lot of hoops to register etc.) they even give you a 30-day trial. Well, at least it gives numbers. But notice that the line breaks are missing, and all the numbers flow from one line to another. [Image: B405]
  7. Another copy/paste from Word to BBEdit, and now I have all the numbers, and adding 40 line breaks is all that is needed (there is sometimes some skill in knowing where to add them, by the way). The time taken from step 1 to step 7 was about 90 minutes (including a necessary cup of tea to recover from steps 1-5, and the realisation that the time was not wasted, since I could blog the experience!).
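The last step, putting the line breaks back, can in simple cases be automated. What follows is only a minimal sketch, not what I actually did: it assumes the run-together text is a plain stream of whitespace-separated tokens in the repeating order symbol, x, y, z, which may not match the layout of any particular supporting information file. The function names and the sample numbers are invented for illustration; they are not the B40 coordinates.

```python
def reflow_coordinates(text, n_atoms=None):
    """Regroup a run-together stream of 'symbol x y z' tokens into atoms.

    Assumes every atom contributes exactly four whitespace-separated tokens.
    """
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError(f"expected groups of 4 tokens, got {len(tokens)}")
    atoms = []
    for i in range(0, len(tokens), 4):
        symbol, *xyz = tokens[i:i + 4]
        if not symbol.isalpha():
            raise ValueError(f"token {symbol!r} does not look like an element symbol")
        x, y, z = (float(value) for value in xyz)  # fails loudly on garbled numbers
        atoms.append((symbol, x, y, z))
    if n_atoms is not None and len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return atoms


def to_xyz(atoms, comment=""):
    """Write the atoms in the common XYZ layout, one atom per line."""
    lines = [str(len(atoms)), comment]
    lines += [f"{s} {x:14.8f} {y:14.8f} {z:14.8f}" for s, x, y, z in atoms]
    return "\n".join(lines) + "\n"


# Placeholder values only, standing in for text pasted without line breaks.
pasted = "B 0.0 1.0 2.0 B 3.5 -1.25 0.0"
print(to_xyz(reflow_coordinates(pasted, n_atoms=2), comment="reflowed"))
```

Passing `n_atoms=40` would catch a dropped or duplicated number in a B40 listing, which is the kind of error that is easy to miss when adding the breaks by hand.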

Well, I am sure you know what is coming next: my usual rant about how little most chemists truly value data, and particularly its integrity and its semantics. And how little almost all journals understand data. Notice that the original article was published in Nature Chemistry. Note also a new journal from that stable, Scientific Data. The journal clearly thinks there is mileage in receiving scholarly articles about scientific data, and what they call data descriptors (they even got me to write a data descriptor a year or so back). It's a shame then that the same publisher allowed the decimation of the core data related to an article about B40.

They have a widely read blog; perhaps they can comment?

One more point to make about data: a phrase has recently been coined: deposition with recognition. Here, I show how my own data has been recognised:

There are various other ways as well, and perhaps I will leave this to another post. To return to the chemistry (where we should have been at the start): I ran the calculation (B3LYP+D3/TZVP) and published the newly enhanced data, citing it in the usual way.[3],[4] To answer my question, for the D2d geometry, B40 has 24 π-electrons (there is some ambiguity; it could be 26). On average, the boron retains only ~0.65 s electrons, balanced by ~2.35 p electrons. The most stable π-pair is shown below. At the centre of the ring is a strongly diatropic ring current (NICS = -42 ppm),[5] suggesting aromaticity (taking the count as 26 electrons = 4n+2, n = 6).

[Image: B40-29]

I conclude by pondering whether the properties of any such boron cluster may in time prove to be directly related to the number of σ-to-π promotions.


Sadly, line breaks in lists of atom coordinates date back to an era of about 50 years ago when text files were first treated differently from binary files. Three different “standards” emerged for specifying a line break (DOS, Mac and Unix) in a text file and much confusion has there been ever since when moving these text files across operating systems. The modern way of doing it is to make line breaks redundant by instead marking up the file. The standard chemical markup, invented in 1996, and formally published in 1999[6], is CML. You will find such CML coordinates in the deposited data from this calculation.[3] You will not have any problems with line breaks!
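To illustrate why marked-up coordinates are immune to the line-break problem, here is a small sketch using Python's standard XML parser. The fragment uses the CML `atomArray`/`atom` elements with `elementType`, `x3`, `y3` and `z3` attributes, but the two atoms and their values are invented placeholders, not the deposited B40 data, and `read_cml_atoms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def read_cml_atoms(cml_text):
    """Return (element, x, y, z) for every CML atom carrying 3D coordinates."""
    root = ET.fromstring(cml_text)
    return [
        (atom.get("elementType"),
         float(atom.get("x3")), float(atom.get("y3")), float(atom.get("z3")))
        for atom in root.iter(f"{{{CML_NS}}}atom")
    ]


# Placeholder molecule written with Unix line breaks.
unix = (
    '<molecule xmlns="http://www.xml-cml.org/schema">\n'
    '  <atomArray>\n'
    '    <atom id="a1" elementType="B" x3="0.0" y3="0.0" z3="0.0"/>\n'
    '    <atom id="a2" elementType="B" x3="1.7" y3="0.0" z3="0.0"/>\n'
    '  </atomArray>\n'
    '</molecule>\n'
)
variants = {
    "unix": unix,
    "dos": unix.replace("\n", "\r\n"),
    "old mac": unix.replace("\n", "\r"),
    "no line breaks": unix.replace("\n", ""),
}

reference = read_cml_atoms(unix)
for name, text in variants.items():
    print(name, read_cml_atoms(text) == reference)
```

Each atom is delimited by its element and each value by its attribute, so the same coordinates come back whichever line-break convention (or none) the file happens to carry.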

Publication assigns a DataCite DOI. This takes about 48 hours to propagate to CrossRef, which is here used by the KCite WordPress plugin to retrieve the metadata and compose a citation. If KCite queries CrossRef before the metadata has propagated, it does not generate a citation. If you are reading this and see no citation, please revisit after 48 hours have elapsed.

The diatropicity is inverted to paratropicity (NICS = +28 ppm) when two electrons are removed to create the dication.[7] This inversion is normally a good test of aromaticity/antiaromaticity.
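The electron counting behind this test can be written out explicitly. The sketch below is illustrative only: `huckel_class` is a hypothetical helper, and it assumes that the neutral cluster is counted as having 26 π-electrons (the text above allows 24 or 26) and that both electrons removed to form the dication come from the π-manifold, neither of which is established here.

```python
def huckel_class(n_pi_electrons):
    """Classify an even π-electron count as '4n+2' or '4n' (simple counting only)."""
    if n_pi_electrons <= 0 or n_pi_electrons % 2 != 0:
        raise ValueError("expected a positive, even number of electrons")
    return "4n+2" if n_pi_electrons % 4 == 2 else "4n"


neutral = 26            # assumed π-electron count for neutral B40
dication = neutral - 2  # assumes both electrons leave the π-manifold

for label, count in (("B40", neutral), ("B40(2+)", dication)):
    print(f"{label}: {count} π-electrons, {huckel_class(count)}")
```

Under those assumptions the neutral species falls in the 4n+2 class and the dication in the 4n class, which is at least consistent with the sign change in the NICS values.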


References

  1. H. Zhai, Y. Zhao, W. Li, Q. Chen, H. Bai, H. Hu, Z.A. Piazza, W. Tian, H. Lu, Y. Wu, Y. Mu, G. Wei, Z. Liu, J. Li, S. Li, and L. Wang, "Observation of an all-boron fullerene", Nature Chemistry, vol. 6, pp. 727-731, 2014. https://doi.org/10.1038/nchem.1999
  2. H.S. Rzepa, "The distortivity of π-electrons in conjugated boron rings", Physical Chemistry Chemical Physics, vol. 11, pp. 10042, 2009. https://doi.org/10.1039/b911817a
  3. H.S. Rzepa, "Gaussian Job Archive for B40", 2014. https://doi.org/10.6084/m9.figshare.1111454
  4. H.S. Rzepa, "B 40", 2014. https://doi.org/10.14469/ch/24884
  5. H.S. Rzepa, "Gaussian Job Archive for B40", 2014. https://doi.org/10.6084/m9.figshare.1111518
  6. P. Murray-Rust, and H.S. Rzepa, "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", Journal of Chemical Information and Computer Sciences, vol. 39, pp. 928-942, 1999. https://doi.org/10.1021/ci990052b
  7. H.S. Rzepa, "Gaussian Job Archive for B40(2+)", 2014. https://doi.org/10.6084/m9.figshare.1111534

Chemistry data round-tripping. Has there been ANY progress?

Monday, December 2nd, 2013

This is one of those topics that seems to crop up every three years or so. Since I last wrote about it there have been new versions of operating systems, new versions of programs, mobile devices and, perhaps, some progress?

Right, I will briefly recapitulate. Chemical structure diagrams are special; they contain chemical semantics (what an atom is, what a bond is, stereochemistry, charges, etc). One needs special programs to represent this. Take two well-known ones. ChemBioDraw V 13 is the latest in a long line dating back to 1985 or so. A newcomer is ChemDoodle, just updated to version 6. The idea is you express your molecule, and capture some of its semantics using one of these programs. And then paste the data into another venerable program, the word processor Word (also dating back to around 1984). Then send the Word document to a colleague. Who might want to copy the structure back out, and put it back into ChemBioDraw/ChemDoodle. And put those semantics to good use, by editing it, or re-purposing the information. This is round-tripping the data. It's been almost 30 years; surely the process should be seamless by now? Wrong!

One problem is that the “exchange-particle” is the clipboard, yet another ancient and presumed mature technology. It's invisible of course; we rarely get to see it. And very operating system specific! So what is the current state of play? Round-tripping ChemBioDraw structures across a single operating system might work. Well, it currently does for just one of the two most common desktop operating systems (remember, Word is provided by the originator of one of these operating systems). The other program, ChemDoodle, round-trips within both operating systems.

But, here is the key point, not across operating systems. Paste either a ChemBioDraw or a ChemDoodle structure into Word on one of these OSs, and try re-editing that diagram on the version of Word on the other OS. The data is lost unless you have the “right” operating system.

An experiment I have not tried, but regarding which I would welcome any feedback, is to factor in the two newest operating systems, this time for mobile devices such as tablets and phones. Let's not even worry whether different flavours of one of these mobile OSs are compatible. Apps for drawing chemical structures are available for both of these. Here, the amazing clipboard still exists. One now has four OSs to consider, with four homogeneous permutations and a minimum of six heterogeneous round trips the data could try to take for any given app. We do not even consider app2app transfers not involving discrete intermediate documents. I would predict that only a few of these permutations preserve round-tripped data and its semantics.
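The counting in that paragraph can be checked with a few lines of code. This is just a sketch of the arithmetic; the four labels are placeholders for two desktop and two mobile operating systems, not a claim about which ones were tested.

```python
from itertools import combinations, permutations

# Placeholder labels for the four operating systems.
systems = ["desktop A", "desktop B", "mobile A", "mobile B"]

# Same OS at both ends of the round trip.
homogeneous = [(os_, os_) for os_ in systems]

# Two different OSs, ignoring which one the diagram starts on.
heterogeneous = list(combinations(systems, 2))

# Two different OSs, treating A -> B -> A and B -> A -> B as separate trips.
directed = list(permutations(systems, 2))

print(len(homogeneous), len(heterogeneous), len(directed))
```

Six is a minimum because it ignores direction; if the OS on which the diagram was first created matters, the heterogeneous count doubles to twelve.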

Perhaps we need to look at it in a different way? One simply avoids putting data from one program into another. Chemical data is kept in its own files, never mixed with data from other programs, but always kept/sent separately. Pre-1984 and the clipboard, this might have made sense. But in an era when XML was invented around 17 years ago to allow data to fully retain semantic information in any environment it finds itself in, it seems surprising that we still have this situation.

I mention all of this, since there is a current refocusing on the importance of data; “emancipating data” is now important. But the reality is that much current software destroys the semantics in data at almost every turn. Thirty years of no progress then. But what of Chem4Word, a combination of differently namespaced XML in which the chemistry is expressed in CML (it is only available for a single operating system!)? I will perhaps devote a separate post to that one; first I have to try a few experiments!

Data-round-tripping: wherein the future?

Tuesday, December 7th, 2010

Moving (chemical) data around in a manner which allows its (automated) use in whichever context it finds itself must be a holy grail for all scientists and chemists. I posted earlier on the fragile nature of molecular diagrams making the journey between the editing program used to create them (say ChemDraw) and the word processor used to place them into a context (say Microsoft Office), via an intermediate storage area known as the clipboard. The round trip between the Macintosh (OS X) versions of these programs had been broken for a little while, but it is now fixed! A small victory. This blog reports what happened when such a Mac-created Word document is sent to someone using Microsoft Windows as an OS (or vice versa).

As you might have guessed, the molecular diagram arrives largely dead, and not re-usable. Opening the .docx archive (it is nothing more than a zip file) reveals only a JPEG file residing inside. Nothing that can be chemically repurposed. If the reverse process is undertaken, of creating a ChemDraw diagram and pasting it into Word on Windows, one finds in the .docx two components: a bit-mapped image linked to an active object containing the data. Only the first of these is recognised if the file makes its way to a Macintosh; i.e. the same story, the data is again lost. So the bottom line is that Mac users and Windows users cannot, after all, exchange repurposable molecular diagrams via Word documents using this combination of programs. This is not good.
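Since a .docx is a zip archive, looking inside one needs nothing more than a standard zip library. The sketch below lists the parts most likely to hold a pasted diagram. It assumes the folder names Word conventionally uses (`word/media/` for images, `word/embeddings/` for embedded objects), which other producers need not follow; `list_embedded_parts` and the file name are hypothetical.

```python
import zipfile


def list_embedded_parts(docx_file):
    """List image and embedded-object parts inside a .docx (path or file object)."""
    # Conventional locations used by Word; treated here as an assumption.
    prefixes = ("word/media/", "word/embeddings/")
    with zipfile.ZipFile(docx_file) as archive:
        return [name for name in archive.namelist() if name.startswith(prefixes)]


if __name__ == "__main__":
    # "diagram.docx" is a placeholder file name.
    for part in list_embedded_parts("diagram.docx"):
        print(part)
```

If only an image part turns up, there is a picture of the molecule but nothing a structure editor could reopen; an additional embedded-object part is where any editable data would have to live.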

But let me remind you what happened around 1993. The word processor was joined by a program called the Web browser. In 1996, the underlying content carrier, HTML, became XHTML (an instance of XML). Right from day 1 almost, such XHTML could be, and frequently was, repurposed. A memorable example is that search engines could use it to index the Web. The XHTML easily survived trips to and from clipboards. In 1996, CML joined HTML as a way of carrying chemical information capable of round-tripping without loss (if need be). There are other chemical XML languages in use nowadays, including CDXML used by the ChemDraw program. Word itself now uses XML (the x in .docx). So, after 14 years, why am I still describing the difficulties above? I am frankly at a loss to explain why there is still a need to write this post.

All is not entirely lost. The CML4Word approach is designed to enable (chemical) data round-tripping from the outset, although I do not yet know if the CML created and stored in the Word document using this mechanism is recognised anywhere outside of Word 2007 on Windows. If anyone can let me know of examples where such a CML-enabled Word document can be used in other environments, I would be very grateful (but not on OS X, as I know already).

And as I might have mentioned in the previous post on this topic, things may not however be getting better in that other carrier of information and data, the mobile phone/iPad, as exemplified by operating systems such as iOS or Android. Watch this space, as they say.