Chemometrical treatment of pesticide data
Marjana Novic
National Institute of Chemistry, Ljubljana, Slovenia
The pesticide dataset, containing 235 chemical compounds, was studied for relationships between chemical structure and toxic action on rats (LD50 values). The compounds, represented by up to 173 different descriptors (constitutional, geometrical, topological, electrostatic, quantum-chemical, and logD), were used in a modelling procedure based on a counterpropagation artificial neural network (CP-ANN). A subset of 164 pesticides is additionally represented by 16 descriptors that can be computed only for a limited number of compounds in the set.
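A CP-ANN couples a Kohonen (self-organizing) layer with an output layer of the same size: the winning neuron is found from the descriptors alone, and both its Kohonen weights and its output weight are then corrected toward the training object and its target. The sketch below illustrates this general scheme under stated assumptions and is not the model used in the study. The function names, the 5 x 5 non-toroidal grid, the linearly decreasing learning rate, the shrinking triangular neighbourhood, and the assumption that descriptors and targets are pre-scaled to roughly [0, 1] are all hypothetical choices.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_cpann(X, y, rows=5, cols=5, epochs=100, eta_max=0.5, eta_min=0.01, seed=0):
    """Train a minimal counterpropagation ANN (hypothetical helper).

    X: list of descriptor vectors (assumed pre-scaled), y: list of target values.
    Returns (W, O): Kohonen weights and output-layer weights keyed by grid position.
    """
    rng = random.Random(seed)
    n_desc = len(X[0])
    neurons = [(i, j) for i in range(rows) for j in range(cols)]
    W = {n: [rng.random() for _ in range(n_desc)] for n in neurons}  # Kohonen layer
    O = {n: rng.random() for n in neurons}                           # output layer
    max_radius = max(rows, cols) // 2
    total = epochs * len(X)
    step = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for k in order:
            frac = step / max(total - 1, 1)
            eta = eta_max - (eta_max - eta_min) * frac   # learning rate decreases linearly
            radius = round(max_radius * (1 - frac))      # neighbourhood shrinks toward 0
            # Winner is chosen from the descriptors only (unsupervised Kohonen step).
            winner = min(neurons, key=lambda n: sq_dist(W[n], X[k]))
            for n in neurons:
                # Chebyshev distance on the grid between neuron n and the winner.
                d = max(abs(n[0] - winner[0]), abs(n[1] - winner[1]))
                if d > radius:
                    continue
                h = eta * (1 - d / (radius + 1))         # triangular neighbourhood factor
                W[n] = [w + h * (x - w) for w, x in zip(W[n], X[k])]
                # Output layer is corrected at the same positions with the same factor.
                O[n] += h * (y[k] - O[n])
            step += 1
    return W, O


def predict_cpann(W, O, x):
    """Predict a target: locate the winner by descriptors, return its output weight."""
    winner = min(W, key=lambda n: sq_dist(W[n], x))
    return O[winner]
```

For example, after `W, O = train_cpann([[0.0, 0.0], [1.0, 1.0]], [0.0, 1.0])`, calling `predict_cpann(W, O, [0.0, 0.0])` should give a smaller value than `predict_cpann(W, O, [1.0, 1.0])`, because the two objects excite different regions of the map.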
A CP-ANN model was generated on the basis of the 235 chemicals with known eco-toxicity. Special emphasis was given to assessing the optimal number of descriptors, i.e., the number that still yields a model that is not overfitted; a randomization test was used for this purpose. A modified CP-ANN algorithm capable of dealing with missing data was applied to the subset of pesticides represented by incomplete descriptors. Taking the incomplete descriptors into account is of special benefit when such a descriptor correlates well with the compounds' activity.
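The abstract does not say how the modified CP-ANN deals with missing data. One common approach, assumed here and not necessarily the author's modification, is to compute the winner-search distance only over the descriptors that are present (averaged over their count, so objects with different numbers of available descriptors stay comparable) and to correct only those weight components during training. The function names are hypothetical, and `None` marks a missing descriptor value.

```python
def masked_sq_dist(w, x):
    """Mean squared difference over the descriptors present in x (None = missing)."""
    pairs = [(wi, xi) for wi, xi in zip(w, x) if xi is not None]
    if not pairs:
        raise ValueError("object has no available descriptors")
    return sum((wi - xi) ** 2 for wi, xi in pairs) / len(pairs)


def find_winner(weights, x):
    """Index of the neuron closest to x, judged only on the available descriptors."""
    return min(range(len(weights)), key=lambda n: masked_sq_dist(weights[n], x))


def masked_update(w, x, h):
    """Move weights toward x by factor h; components missing in x are left unchanged."""
    return [wi if xi is None else wi + h * (xi - wi) for wi, xi in zip(w, x)]
```

For example, with `weights = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]` and `x = [0.5, None, 0.75]`, `find_winner(weights, x)` returns `1`, and `masked_update(weights[1], x, 0.5)` returns `[0.75, 1.0, 0.875]`: the second weight stays at `1.0` because that descriptor is missing for this object.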