"Through our active collaboration in the InChI Trust, we can provide high-level technical skills in organization, documentation and the unaltered playback of electronically-published scientific information to assist in global standardization in the identification of chemical structural formulae on the Internet," said Professor Dr. René Deplanque, Managing Director of FIZ CHEMIE, explaining their motivation in taking over this responsibility. According to Deplanque, standardization using a system that is not owned by a single company or a closed group of interested parties is urgently required to make scientific knowledge available to every interested party, now and in the future. "Up until now, anyone who had enough money or access to libraries could buy standard works on chemistry and could look up what was published when on whatever substance, and could base his or her own researches on that. With the rapid dissemination of publications on the Internet, there has been a massive fall, or even a halt, in printed documentation. A replacement must be created for this," explained Deplanque. This is best done through international collaboration between the large processors of information, as this will combine skills, and at the same time prevent the creation of monopolies or oligopolies in the provision of scientific information. The InChI system was originally developed by the chemical association IUPAC (International Union of Pure and Applied Chemistry) as a uniform data representation format in the public domain for chemical structures in the databases of the American authority NIST (National Institute of Standards and Technology). Pushed forward by the members of the InChI Trust, it is now being transferred step by step to scientific publications on the Internet. The first version of the identification system was published in 2005. 
A further version followed in 2008, which introduced InChIKeys: shorter character strings derived from the InChI by means of a hash function. They are built on the original algorithm to maintain interoperability between databases and other InChI sources, such as journal articles. From a software point of view, an InChI is a character string of letters, numbers and symbols; the InChIKey is a pure text string, which can be processed very easily by search engines.

Entered into Google, however, both the InChI and the shorter InChIKey look like a secret alchemical message, and it is hard to type either string without making mistakes. That is not necessary, though, as InChIs can be generated with free tools on the Internet, such as the one on the RSC website. Given a structure, a substance name or a formula, the tool generates the required InChI (Fig. 1). The character string is then copied from the browser into the search field of a search engine, which finds the corresponding InChI codes in very diverse sources of information on the Internet, such as references to publications in the PubChem database. The Resolver tool displays the located code as a structure (Fig. 2). All of this can be tried out on the RSC website at: http://inchis.chemspider.com.
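
To illustrate the difference between the two string types, the following minimal Python sketch splits an InChI into its slash-separated layers and checks whether a string has the fixed InChIKey layout (blocks of 14, 10 and 1 uppercase letters, separated by hyphens). The ethanol strings and the helper names `split_inchi_layers` and `looks_like_inchikey` are supplied here for illustration and do not come from the article. The check is purely a format test; it does not compute or verify the hash.

```python
import re
from typing import List

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Example strings for ethanol (chosen for illustration, not from the article).
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the InChIKey layout (format check only)."""
    return INCHIKEY_RE.fullmatch(text.strip()) is not None


def split_inchi_layers(inchi: str) -> List[str]:
    """Split an InChI string into its '/'-separated layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: missing 'InChI=' prefix")
    return inchi[len(prefix):].split("/")


if __name__ == "__main__":
    print(split_inchi_layers(ETHANOL_INCHI))
    print(looks_like_inchikey(ETHANOL_KEY))
    print(looks_like_inchikey(ETHANOL_INCHI))
```

The InChI contains slashes, commas and digits that search engines may tokenize in unpredictable ways, whereas the key consists only of uppercase letters and hyphens in a fixed length, which is why it is the more convenient search term.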

Fig. 1: Generating InChI Code
[Graphic: V. Vogt-Herrmann]

Fig. 2: Resolving InChI Code
[Graphic: V. Vogt-Herrmann]

Further technical explanations, frequently asked questions and their answers, and information about becoming a member of the InChI Trust can be found on the InChI Trust website.