Abstract: MATH/CHEM/COMP 2002, Dubrovnik, June 24-29, 2002



Self-optimising molecular descriptors


Giorgi Lekishvili and Johann Gasteiger

Computer-Chemie-Centrum, University of Erlangen-Nuremberg, D-91052 Erlangen, Germany




In the past few years, several descriptors have been designed that not only encode the molecular structure but also contain free parameters, or variables. The numerical values of these free parameters do not depend on the molecular structure; instead, they have to be optimised to achieve maximal performance of the model. The success of this approach depends crucially on two points: a refined strategy for varying the free parameters, as this must not be done by hand, and a sound mathematical proof that the models obtained are not merely chance fits to the particular dataset.

This work presents a generalised form of self-optimising indices. Our approach is based on a modern branch of mathematics, the lambda calculus.

Let B be the basis of a structural representation of a molecule, such as the adjacency matrix or connectivity table, or a vector containing the numbers of occurrences of different substructures in the molecule. Let T be an expression possibly containing B. Then a self-optimising index is a lambda function F of B: λB.T(B). Here λ is the so-called abstractor. In a particular case, i.e., given a numerical value b_M of B for the molecule M, the expression is reduced to a numerical value, Descr_M, as shown below:


 (λB.T(B)) b_M → F(M) = Descr_M
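
The abstraction and reduction steps can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the basis B is taken to be a short vector of substructure counts, T is a weighted sum, and the weights and the values in b_M are made up. The helper name make_descriptor is hypothetical.

```python
from typing import Callable, Sequence

Basis = Sequence[float]  # assumed form of B: a vector of substructure counts


def make_descriptor(weights: Sequence[float]) -> Callable[[Basis], float]:
    """Build F = lambda B. T(B), where T is an illustrative weighted sum.

    The weights play the role of the free parameters: they are fixed
    inside F and do not depend on the molecule.
    """
    return lambda B: sum(w * b for w, b in zip(weights, B))


# Abstraction: F is a function of the basis B only.
F = make_descriptor([1.0, 0.5, 0.25])  # made-up parameter values

# Reduction: applying F to the concrete value b_M of molecule M
# yields a single number, Descr_M.
b_M = [4, 3, 2]  # made-up counts for a hypothetical molecule M
descr_M = F(b_M)  # 4*1.0 + 3*0.5 + 2*0.25 = 6.0
```

Changing the weights yields a different function F of the same basis, which is the degree of freedom that an optimiser can exploit.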


In a simplified case, one starts from a basic form of the expression T, which is then optimised for particular tasks. In our studies, we have used the autocorrelation polynomial as the expression and simulated annealing as the technique to optimise it. Alternatively, neural networks could be applied. In this way, the number of candidate descriptors can be reduced from several hundred to fewer than ten polynomials.
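
The optimisation of a free parameter by simulated annealing can be sketched as follows. The abstract does not give the form of the autocorrelation polynomial, so the sketch assumes P(x) = sum over d of A(d) * x^d, where A(d) is the autocorrelation component at topological distance d and x is the single free parameter. The objective (squared Pearson correlation with a target property), the cooling schedule, the step size, and the toy data are likewise assumptions, not the settings used in the studies.

```python
import math
import random


def autocorr_poly(A, x):
    """Assumed form: P(x) = sum_d A[d] * x**d, with x the free parameter."""
    return sum(a * x ** d for d, a in enumerate(A))


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def score(x, vectors, y):
    """Model performance for a given value of the free parameter x."""
    return r_squared([autocorr_poly(A, x) for A in vectors], y)


def anneal(vectors, y, x0=1.0, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Maximise score(x) with a basic simulated-annealing loop."""
    rng = random.Random(seed)
    x, s = x0, score(x0, vectors, y)
    best_x, best_s = x, s
    t = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 0.1)  # random perturbation of x
        cs = score(cand, vectors, y)
        # Metropolis criterion: always accept improvements,
        # accept worse moves with a temperature-dependent probability.
        if cs >= s or rng.random() < math.exp((cs - s) / t):
            x, s = cand, cs
            if s > best_s:
                best_x, best_s = x, s
        t *= cooling  # geometric cooling schedule
    return best_x, best_s


# Toy data: made-up autocorrelation vectors for four hypothetical molecules.
# The target values y were constructed as P(0.5), so x = 0.5 fits exactly.
vectors = [[1, 2, 1], [1, 3, 2], [1, 4, 4], [1, 5, 3]]
y = [2.25, 3.0, 4.0, 4.25]
best_x, best_s = anneal(vectors, y)
```

The fixed seed makes the run reproducible. In practice the search would be repeated from several starting points, and the resulting model validated on data not used for the optimisation.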

In the most general case, genetic programming can be applied to find the optimal expression of the self-optimising descriptor.
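
A minimal sketch of how genetic programming might search over expressions T(B) is given below. It is not the procedure of the authors: the tree encoding, the operator set, the mutation-only variation with truncation selection, the squared-error fitness, and the toy data are all assumptions chosen to keep the example short.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, n_vars, depth=2):
    """Grow a random expression over B[0..n_vars-1] and a few constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.choice([0.5, 1.0, 2.0]))
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(rng, n_vars, depth - 1),
            random_tree(rng, n_vars, depth - 1))


def evaluate(tree, B):
    """Reduce the expression tree to a number for a concrete basis B."""
    kind = tree[0]
    if kind == "var":
        return B[tree[1]]
    if kind == "const":
        return tree[1]
    return OPS[kind](evaluate(tree[1], B), evaluate(tree[2], B))


def mutate(tree, rng, n_vars):
    """Replace a randomly chosen subtree with a fresh random one."""
    if tree[0] in ("var", "const") or rng.random() < 0.3:
        return random_tree(rng, n_vars, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_vars), right)
    return (op, left, mutate(right, rng, n_vars))


def fitness(tree, data, y):
    """Negative sum of squared errors (higher is better)."""
    return -sum((evaluate(tree, B) - t) ** 2 for B, t in zip(data, y))


def evolve(data, y, pop_size=30, generations=20, seed=0):
    """Mutation-only evolution; the better half survives each generation."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    pop = [random_tree(rng, n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data, y), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng, n_vars)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda t: fitness(t, data, y))


# Toy data: made-up two-component bases; target is B[0]*B[1] + B[0].
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [0.0, 5.0]]
y = [3.0, 4.0, 15.0, 0.0]
best = evolve(data, y)
```

Because the better half of the population is carried over unchanged, the best fitness never decreases from one generation to the next. A fuller implementation would typically add crossover and a penalty on expression size.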