Posts Tagged ‘Ian Bruno’

Tautomeric polymorphism.

Thursday, June 1st, 2017

Conformational polymorphism occurs when a compound crystallises in two polymorphs differing only in the relative orientations of flexible groups (e.g. Ritonavir). At the Beilstein conference, Ian Bruno mentioned another type: tautomeric polymorphism, where a compound can crystallise in two forms differing in the position of acidic protons. Here I explore three such examples.

The term occurs in the title of this article,[1] for a compound known as Omeprazole.

When the bottom structure (the 6-methoxy) is used to search the CSD, two separate series are found. The first of these is UDAVIF (DOI: 10.5517/ccp82qq, 6-Methoxy-2-((4-methoxy-3,5-dimethyl-2-pyridinyl)methylsulfinyl)-1H-benzimidazole). There is no information regarding the absolute configuration of the chiral S-centre. Although the downloaded coordinates show it as R, it is probably a racemic mixture. A note added to the structure declares disorder: “Omeprazole exists as solid solutions of the two tautomers. The structure is mixed 5-methoxy/6-methoxy with occupancies 0.078:0.922”, which indicates that 7.8% is present as the 5-methoxy tautomer (the upper structure above).

The second hit is VAYXOI (DOI: 10.5517/ccp82pp, rac-6-Methoxy-2-(((4-methoxy-3,5-dimethyl-2-pyridinyl)methyl)sulfinyl)-1H-benzimidazole), which now contains no disorder; the contaminating 5-methoxy tautomer is no longer present. This is perhaps not quite a true tautomeric polymorph, since the 5-methoxy tautomer is never observed in pure form.

True tautomeric polymorphism does occur with a second example. DEBFAR[2] represents the keto form on the right, which crystallises from methanol, whilst YUYDOL, the enol form on the left, crystallises from n-hexane.

Calculations shed some light on this behaviour. DEBFAR has a computed (DOI: 10.14469/hpc/2591) dipole moment of 11D, whereas that of YUYDOL (DOI: 10.14469/hpc/2590) is 2.5D. In chloroform solutions (~half way between the two solvent polarities), the keto form is ~6.1 kcal/mol lower in ΔG than the enol. The crystal packing for the two forms is very different and the differences in this packing must clearly amount to >6.1 kcal/mol to override the lesser stability of DEBFAR in solution.
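To put a free energy difference of ~6.1 kcal/mol in perspective, it can be converted into an equilibrium ratio via K = exp(ΔG/RT). The following is only a minimal sketch: it assumes a simple two-state equilibrium between the tautomers at 298.15 K, and both the temperature and the function name are my own choices for illustration rather than anything taken from the calculations cited above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def population_ratio(delta_g_kcal, temperature_k=298.15):
    """Ratio [more stable form]/[less stable form] for a two-state equilibrium.

    delta_g_kcal is the (positive) free energy gap in kcal/mol.
    The temperature default is an assumption, not a value from the post.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

ratio = population_ratio(6.1)
minor_fraction = 1.0 / (1.0 + ratio)
print(f"ratio ~ {ratio:.0f} : 1, minor tautomer fraction ~ {minor_fraction:.1e}")
```

Under these assumptions the ratio comes out at a few tens of thousands to one, which gives a feel for how large the difference in crystal packing energies has to be to reverse the solution preference.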


The final example[3] is illustrated using scheme 2 from that article, entitled “tautomeric species of 4-hydroxynicotinic acid”:

The original diagram has two unfortunate bond errors which are NOT reproduced above (and which perhaps are a good topic for discussion in tutorials with students), along with an unusual interpretation of the term tautomerism. The blue arrows above are mine, and I suggest the isomerism between the connected species is resonance isomerism, and not tautomerism. That leaves three distinct true tautomers. Five crystal structures are reported, which I list below.

  1. 10.5517/cctswjz (KUXPUP, 4-oxo-1,4-dihydropyridine-3-carboxylic acid, no H2O), 10.5517/ccdc.csd.cc1kfyxv (KUXPUP01, no H2O) and 10.5517/ccdc.csd.cc1kfyzx (KUXPUP02, no H2O)
  2. 10.5517/ccx59s4 (AVEMUK, 4-oxo-1,4-dihydropyridine-3-carboxylic acid hemihydrate) and 10.5517/ccdc.csd.cc1kfz21 (AVEMUK01)
  3. 10.5517/ccdc.csd.cc1kfz54 (AKIHIN, 4-hydroxypyridin-1-ium-3-carboxylate monohydrate)
  4. 10.5517/ccdc.csd.cc1kfz10 (AKIHAF, 4-hydroxypyridin-1-ium-3-carboxylate)

KUXPUP and AVEMUK differ only in the presence of one solvent water molecule and both represent tautomer 2 above. AKIHIN and AKIHAF similarly represent tautomer 3 above; both are represented as 3a in the CSD and not as 3b. There are no examples of tautomer 1 in the crystal structure database; it may only exist in the gas phase. So the equilibrium 2 ⇌ 3 is another genuine example of tautomeric polymorphism, with the keto form favoured by more polar solvents, as was noted for the previous example.

With this last article,[3] comprehensive calculations at a good level were reported, including modelling the periodic cell using the Crystal program and including corrections such as BSSE (basis set superposition error) and dispersion terms. I was hopeful that this might lead me to something as simple as the computed dipole moments of the (isolated) species (as I reported above for the previous system), but these were not mentioned in the text of the article. Unfortunately, the supporting information also had no details of any such calculations. This left me frustrated again at how difficult it can be to get details of the calculations from (it has to be said) the vast majority of articles which report them.

Tautomeric polymorphism remains a very rare phenomenon. SciFinder for example only has 19 references citing it (2 of which are to conference talks). Perhaps the most intriguing[4] claims that 2-thiobarbituric acid has the richest collection of tautomeric polymorphs with five. Since no calculations are reported there, I might try these out and report back here.

Postscript: Here is some analysis of 2-thiobarbituric acid.

  1. THBARB (DOI: 10.5517/cctbxcd, 10.5517/cctbxfg and 10.5517/cctbxgh) are three polymorphs of the keto tautomer, the isolated molecule having a small calculated dipole moment (DOI: 10.14469/hpc/2632).
  2. PABNAJ (DOI: 10.5517/cctbxbc) is a polymorph in the enol form, with a much larger calculated dipole moment (DOI: 10.14469/hpc/2633).
  3. PABNIR (DOI: 10.5517/cctbxdf) is a mixed polymorph with one enol paired with one keto form.

The relative free energies of the isolated molecules are 0.0 (keto) and 9.0 (enol) kcal/mol. The keto-enol pair is 0.4 kcal/mol more stable than the isolated components. This again shows the effect that crystal packing can have on the relative energies, and also shows that a simple inspection of the dipole moment may cast light on the polymorphism.

 

References

  1. P.M. Bhatt and G.R. Desiraju, "Tautomeric polymorphism in omeprazole", Chemical Communications, p. 2057, 2007. https://doi.org/10.1039/b700506g
  2. Y. Akama, M. Shiro, T. Ueda, and M. Kajitani, "Keto and Enol Tautomers of 4-Benzoyl-3-methyl-1-phenyl-5(2H)-pyrazolone", Acta Crystallographica Section C Crystal Structure Communications, vol. 51, pp. 1310-1314, 1995. https://doi.org/10.1107/s0108270194007389
  3. S. Long, M. Zhang, P. Zhou, F. Yu, S. Parkin, and T. Li, "Tautomeric Polymorphism of 4-Hydroxynicotinic Acid", Crystal Growth & Design, vol. 16, pp. 2573-2580, 2016. https://doi.org/10.1021/acs.cgd.5b01639
  4. M. Chierotti, L. Ferrero, N. Garino, R. Gobetto, L. Pellegrino, D. Braga, F. Grepioni, and L. Maini, "The Richest Collection of Tautomeric Polymorphs: The Case of 2‐Thiobarbituric Acid", Chemistry – A European Journal, vol. 16, pp. 4347-4358, 2010. https://doi.org/10.1002/chem.200902485

Challenges in reliably representing the chemistry of crystal structures.

Monday, May 29th, 2017

The title here is taken from a presentation made by Ian Bruno from CCDC at the recent conference on Open Science. It also addresses the theme here of the issues that might arise in assigning identifiers for any given molecule.

The structure was represented as shown[1] by the original authors, in which the bonding from S to Sn is indicated with both solid lines (a bond) and dotted lines (an “interaction”).

Why would this matter? Well, to make any entry in the Cambridge Structural Database findable (the F of FAIR), it has to be given a unique identifier. There are in general three such identifiers assigned by the CCDC:

  1. The Refcode, in this case XONHIS. These six or seven letter codes are historically the oldest, and started off at least as an attempt, where possible, to carry some semantic hint of the compound name, even if only occasionally.
  2. The CCDC deposition number, in this case 650011. This is the number that an author will receive immediately upon deposition, and you often find these identifiers quoted in supporting information files.
  3. The DOI (digital object identifier), in this case 10.5517/ccptd3z, which can be used to view the structure even if access to the full CSD is not available to the user. In that sense, the DOI is the FAIRest of the first three of these identifiers.
  4. However, CCDC reported that they are considering adding a 4th very common identifier, based on the InChI (International chemical identifier), which comes as a full string with the structure of the molecule at least in part inferable from it, together with a shortened (almost) unique string which has the advantage of being “Googlable”. Both are helpfully FAIR.

It is this 4th identifier that is at issue here. InChIs are derived from atom connection tables; you need to define all bonds present in the molecule. And it is here that the dotted “bond”/“interaction” above becomes a problem. This is the representation shown in the CSD database, which reveals that all the Sn…S interactions are classified as “bonds”, along with some creative(!) representations of the C…S bonds.

So the InChI will very much depend on whether all the Sn…S contacts are termed as bonds or as interactions. To help clarify that, it is useful to show the typical range of lengths of such contacts. Below is a simple search for all Sn and S systems where the pair are either close in space (< 3.5Å) or have a bond specified between the two atoms.

The main cluster occurs at ~2.5Å, but there is some evidence of a second peak at about 3.0Å. The third distribution up to 3.5Å is probably a continuum of very weak dispersion interaction, which most molecules exhibit. The values for XONHIS are 2.521 and 2.996Å, which match the two clusters above.

So perhaps a quantum calculation can shed some light (DOI: 10.14469/hpc/2593)? The values on the right are the optimised bond lengths which are pretty similar to the crystal structure. On the left are the calculated Wiberg bond orders (B3LYP+D3BJ/Def2-TZVPP/chloroform calculation). These reveal both “bonds” have an order less than 1. The value of ~0.6 is probably not contentious, but it does graphically show that when a compound is indexed as having a “single bond” between two atoms, the quantitative bond order may be substantially less. What however would one make of a bond order of 0.214? Should it be classified as a bond, albeit a much weaker one than normal? Or should it instead simply be a rather strong “interaction” which is not classified as a bond? And perhaps one should have in mind the question “how sensitive is this result to the quantum mechanical procedure used?”

Why does this distinction matter? Well, the InChI algorithm is based on simple connectivity; are two atoms connected by a bond or not? There are no nuances here. At the moment, this decision can be made by an algorithm based on the distance between any atom pair (whether computed or measured), but more often I suspect it derives from a “molfile” which is often derived from a human-drawn representation using a structure drawing program. It does rather boil down to the individual preferences of the human drawing the molecule. Due in part to such uncertainties, it was estimated that only 22% of structures in the CSD can be used to generate a reliable InChI. Hydrogen bonds are almost always classified as non-bonds, which means their presence is rarely systematically flagged during the indexing of the structures. Organometallics often pose some of the greatest representational problems (there are many others).
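A distance-based decision of this kind can be sketched in a few lines. The example below is only an illustration and not the procedure used by the CCDC or by the InChI software: it assumes the common rule that two atoms are bonded if their separation is no more than the sum of their covalent radii plus a tolerance. The radii are the tabulated values as I recall them (Sn 1.39 Å, S 1.05 Å), and the two tolerance values and function names are hypothetical choices.

```python
# Covalent radii in Å; assumed tabulated values, quoted from memory.
COVALENT_RADIUS = {"Sn": 1.39, "S": 1.05}

# The two Sn...S distances (Å) quoted above for XONHIS.
XONHIS_SN_S = [2.521, 2.996]

def is_bonded(elem_a, elem_b, distance, tolerance):
    """True if the distance is within the radii sum plus the tolerance (all in Å)."""
    cutoff = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b] + tolerance
    return distance <= cutoff

def classify(tolerance):
    """Bond/no-bond decision for each XONHIS Sn...S contact at a given tolerance."""
    return [is_bonded("Sn", "S", d, tolerance) for d in XONHIS_SN_S]

# Two assumed tolerances: the connectivity changes with the choice.
for tol in (0.4, 0.6):
    print(f"tolerance {tol} Å:", dict(zip(XONHIS_SN_S, classify(tol))))
```

With the smaller tolerance only the 2.521Å contact counts as a bond, whereas with the larger one both do. The connection table, and hence any InChI derived from it, depends on a threshold that is ultimately somebody's choice.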

I will end by observing another class of structure that I deal with, “reaction transition states”. As you might imagine these forms are full of pairs of atoms with ambiguous bond lengths and hence connectivity. We currently have no truly reliable method for assigning useful identifiers to them. So lots of challenges for the future then!

 

References

  1. R. Reyes-Martínez, R. Mejia-Huicochea, J.A. Guerrero-Alvarez, H. Höpfl, and H. Tlahuext, "Synthesis, heteronuclear NMR and X-ray crystallographic studies of two dinuclear diorganotin(IV) dithiocarbamate macrocycles", Arkivoc, vol. 2008, pp. 19-30, 2007. https://doi.org/10.3998/ark.5550190.0009.503