
Chemoinformatics [15L]

4.1 Introduction to Chemoinformatics: History & evolution of chemoinformatics,
fundamental questions & learning, major tasks [2L]

4.2 Representation of molecules: Nomenclature, different types of notations, Line
notations- SMILES coding, InChI notation; Graph theory & matrix representations,
input and output of chemical structures, standard structure exchange formats, structures
of molfiles and sdfiles, Tools- academic programs: Marvin Sketch, ACD labs;
commercial tools: ChemDraw, Schrödinger, Accelrys [5L]

4.3 Representation of chemical reactions: Reaction types, reaction center, chemical
reactivity, Hendrickson’s scheme [2L]

4.4 Searching Chemical Structures: Full structure search, sub-structure search, basic
ideas, similarity search, basics of computation of physical and chemical data and
structure descriptors, data visualisation [2L]

4.5 Applications: QSPR, Spectra correlations, Computer aided synthesis design, docking
& computer aided drug designing [4L]

Introduction to Chemoinformatics

Computers have become intricately linked to human life. It was realised quite some decades
ago that the amount of information accumulated by chemists can, in the long run, be made
accessible to the scientific community only in electronic form; in other words, it has to be
stored in databases. This new field, which deals with the storage, the manipulation, and the
processing of chemical information, was emerging without a proper name. The terms
chemometrics and computer chemistry have been prevalent in the literature. However,
chemometrics is now accepted to be associated with data analysis methods involved in
analytical chemistry. A broader term was required which could encompass all the potential
applications of computers in chemical information. The term chemoinformatics was first
coined by Dr. Frank Brown in the Annual Reports in Medicinal Chemistry. In 1998, Dr. Brown
gave his original definition of chemoinformatics as follows: “The use of information
technology and management has become a critical part of the drug discovery process.
Chemoinformatics is the mixing of those information resources to transform data into
information and information into knowledge for the intended purpose of making better
decisions faster in the area of drug lead identification and optimization.”
Chemoinformatics has since ventured into and found applications in many fields other than
drug discovery. A broad and general definition of the term is: chemoinformatics is the
application of informatics methods to solve chemical problems.

Chemoinformatics has two broad areas of application:


1. Databases
2. Prediction

Databases:
Probably the most important achievement of chemoinformatics is the building of databases on
chemical information. The development of chemoinformatics technologies made it possible to
store and retrieve chemical information. Databases are routinely used in chemical research
and have been made possible by the work of chemists, mathematicians, and
computer scientists in the decades from 1960 to 1990. The creation of these databases requires
representation of chemical compounds and chemical reactions in a graphical form. This will
be the major focus of the first part of the unit. Retrieving information from databases
involves development of different types of queries like full structure search, substructure
search, similarity search etc. The databases and the technology surrounding them have grown
enormously in recent decades and modern chemical research cannot be done without
consulting these databases on chemical information.

Prediction:
In his Norris Award Lecture of 1968, George S. Hammond said: “The most fundamental and
lasting objective of synthesis is not production of new compounds but production of
properties.” With this in mind, the first fundamental task of a chemist is to relate the desired
property with a chemical structure. This is the domain of structure-property relationships
(SPR). Once we have an idea about the structure that would give a desired property, we need
to plan how to synthesize this compound- which reaction or sequence of reactions to perform
to make this structure from available starting materials. This is the domain of synthesis design
and the planning of chemical reactions. Once a reaction has been performed, we have to
establish whether the reaction took the desired course, and whether we obtained the desired
structure. This is the domain of structure elucidation, which for the most part utilizes information
from a battery of spectra (IR, NMR, Mass).

Fundamental questions of a Chemist

All the fundamental tasks of a chemist are closely connected to one another and in order to
have a smooth flow, computational help is needed to process the large amount of data.
In essence, these fundamental questions all boil down to having to predict a property, be it
physical, chemical or biological, of a chemical compound or an ensemble of compounds, such
as the starting materials of a chemical reaction. In order to make predictions, one has to pass
through a process of learning. There exist two different types of learning:
1. Deductive learning
2. Inductive learning
In deductive learning, one must have a fundamental theory that allows one to make inferences
and to calculate the property of interest. Such a fundamental theory does exist for chemistry:
quantum mechanics. The structure, energy and many properties of a molecule can be described
quantum mechanically by the Schrödinger equation. Great progress has been made both in the
development of the theory and in advances in hardware and software technology, which now
allows the calculation of many interesting properties of chemical compounds of fairly
reasonable size with high accuracy. However, this equation quite often cannot be solved in a
straightforward manner, or its solution would require a large amount of computation time. This
is even more true for chemical reactions. Only the simplest reactions can be calculated in a
rigorous manner, others require a series of approximations, and most are still beyond an exact
quantum mechanical treatment. There are large areas of interest in chemistry that are still
beyond a theoretical treatment, for example:
• The prediction of the course of an organic reaction in a certain solvent, at a given
temperature, using a specific catalyst, or
• The prediction of the biological activity of a certain compound

There is, however, another type of learning: inductive learning. From a series of observations,
inferences are made to predict new observations. In order to be able to do this, the observations
have to be put into a scheme that allows one to order them, and to recognize the common
features and the features that are different. On the basis of these observations, a model of the
principles that govern these observations must be built; such a model then allows one to make
predictions by analogy. The cycle then continues with either confirmation, rejection or
refinement of models. This process is called inductive learning.

The use of inductive learning is very common in chemistry. One such example is provided by
the bromination of various monosubstituted benzene derivatives: it was realised that
substituents with atoms carrying free electron pairs bonded directly to the benzene ring (-OH,
-NH2 etc.) gave o- and p-substituted benzene derivatives. Furthermore, in all cases except for
the halogen atoms the reaction rates were higher than with unsubstituted benzene. On the other
hand, substituents with double bonds in conjugation with the benzene ring (-NO2, -CHO etc.)
decreased reaction rates and provided m-substituted benzene derivatives.
This led to the introduction of the concepts of inductive and resonance effects and to the
establishment of the mechanism of electrophilic aromatic substitution. It should be emphasized
that the concepts of inductive and resonance effect have not been derived from theory but have
been introduced to “explain” the experimental observations, to put them into a systematic
framework of an ordering scheme.
In the endeavour to deepen understanding of chemistry, many an experiment has been
performed, and many data have been accumulated. The task is then to derive knowledge from
these data by inductive learning. In this context we have to define the terms data, information
and knowledge.
• Data: any observation provides data, which could be a result of a physical
measurement, a yes/no answer to whether a reaction occurs or not, or the determination
of a biological activity.
• Information: if data are put into context with other data, we call the result
information. The measurement of the biological activity of a compound gains in value if
we also know the molecular structure of that compound.
• Knowledge: obtaining knowledge needs some level of abstraction. Many pieces of
information are ordered in the framework of a model; rules are derived from a sequence
of observations; predictions can be made by analogy.

Major Tasks in Chemoinformatics:

1. Representation of molecules- 2D, 3D & Modeling


2. Data analysis & chemometric QSPR models
3. Chemical reactions & synthesis design
4. Structure elucidation & Spectral prediction

Representation of molecules

One of the major tasks in chemoinformatics is to represent chemical structures in application
programs. The first and basic step of teaching chemistry to the computer is to transform the
molecular structure into a language amenable to computer representation and manipulation.
Computers can handle only bits of 0 and 1. Coding is the basis for transferring data, and writing
is a kind of coding of the language.
A nomenclature or notation is called unambiguous if it produces only one structure. However,
the structure could be expressed by more than one representation, all producing the same
structure. Moreover, “uniqueness” demands that the transformation results in only one – unique
– structure or nomenclature, respectively, in both directions.
Nomenclature is one of the elementary and introductory modules in chemical education.
Learning in the subject first involves representation of chemical elements in abbreviated form.
This universal coding of chemical elements in the form of symbols accepted by the IUPAC
helps to communicate the chemical knowledge across the globe. The nomenclature of
compounds (organic & inorganic) is also based on rigorous rules laid down by IUPAC. There
are trivial and systematic nomenclatures for molecules; however, neither of the two is ideal for
computer processing. The reason is that various valid compound names can describe one
chemical structure. As a consequence, the name/structure correlation is unambiguous but not
unique.

There are different notations that have been developed in order to transform chemical structures
into forms that are easily identified by computing systems. They can be categorised into:
• 2D Representations
A] Line Notations
B] Matrix Representations
• 3D Representations
A] Coordinate matrix

Line Notations
Line notations represent the structure of chemical compounds as a linear sequence of letters
and numbers. Chemical line notations represent molecular structural formulas by sequences of
ASCII characters. Line notations can be very compact, can be used as keys for database access
and provide an easy way to communicate molecular structures. The important line notations in
use even today are the SMILES and InChI notations.

SMILES
The SMILES notation (Simplified Molecular Input Line Entry System)[1] provides a compact
intuitive representation of molecules by strings of ASCII characters. For example, n‐propanol
can be represented by “CCCO.” One can readily suspect that atoms are represented by their
symbols and bonded atoms follow each other in the sequence. The main rules to generate a
SMILES string are briefly presented in a simplified way:
1) Atoms are represented by their atomic symbols enclosed in square brackets, []. Elements
in the “organic subset” (B, C, N, O, P, S, F, Cl, Br, and I) may be written without brackets.
2) Adjacent atoms are assumed to be connected to each other. Single, double, triple, and
aromatic bonds are represented by the symbols -, =, #, and :, respectively (single and
aromatic bonds may be omitted, as well as implicit hydrogen atoms). For example, C=CC
represents propene.
3) Aromaticity can be specified with lower‐case atomic symbols.
4) Branches are specified by enclosing them in parentheses. For example, acetone can be
represented as CC(=O)C.
5) Cyclic structures are represented by breaking one bond in each ring. The bonds are
numbered in any order, designating ring opening (or ring closure) bonds by a digit
immediately following the atomic symbol at each ring closure. For example, the string
c1ccccc1 represents benzene.
6) Configuration around double bonds is specified by the characters / and \.
7) The configuration around tetrahedral centers is specified by @ or @@ written as an
atomic property following the atomic symbol of the chiral atom inside the square
brackets, and it is based on the order in which neighbors occur in the SMILES string.
Looking from the first neighbor of the chiral atom to the chiral atom, the symbol “@”
indicates that the three other neighbors appear anticlockwise in the order that they are
listed. “@@” indicates that the neighbors are written clockwise.
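Rules 1 to 5 can be tried out with a few lines of code. The following is a minimal sketch in Python, not a complete SMILES reader: the function name `parse_smiles` and the use of 1.5 as the order of an aromatic bond are choices made only for this illustration, and stereo symbols (rules 6 and 7), charges, disconnected fragments and two-digit ring closures are deliberately ignored.

```python
import re

# Tokens: bracket atoms, two-letter organic-subset atoms first, then one-letter
# atoms (upper case = aliphatic, lower case = aromatic), bonds, branches, ring digits.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5}


def parse_smiles(smiles):
    """Return (atoms, bonds) for a simple SMILES string.

    atoms is a list of atom symbols (bracket contents are kept verbatim);
    bonds is a list of (i, j, order) tuples with 0-based atom indices.
    Characters outside the token set above are silently skipped.
    """
    atoms, bonds = [], []
    prev = None          # atom that the next atom will be bonded to
    pending = None       # explicit bond symbol waiting for its second atom
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring digit -> (atom index, bond symbol or None)

    def order(a, b, symbol):
        if symbol is not None:
            return BOND_ORDER[symbol]
        # Implicit bond: aromatic between two lower-case atoms, otherwise single.
        return 1.5 if atoms[a].islower() and atoms[b].islower() else 1

    for tok in TOKEN.findall(smiles):
        if tok in BOND_ORDER:
            pending = tok
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok.isdigit():
            if tok in open_rings:            # second occurrence closes the ring
                start, sym = open_rings.pop(tok)
                bonds.append((start, prev, order(start, prev, pending or sym)))
            else:                            # first occurrence opens the ring
                open_rings[tok] = (prev, pending)
            pending = None
        else:                                # an atom
            atoms.append(tok.strip("[]"))
            cur = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, cur, order(prev, cur, pending)))
            prev, pending = cur, None
    return atoms, bonds


propanol = parse_smiles("CCCO")
acetone = parse_smiles("CC(=O)C")
benzene = parse_smiles("c1ccccc1")
```

Parsing "CCO" and "OCC" with this sketch gives the same atoms and bonds in a different order, which illustrates why a SMILES string is unambiguous but not unique unless a canonical atom ordering is imposed.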

Exercise:

Use MarvinSketch or ACD/ChemSketch (both available free of charge for academic use) to verify
your answers.

InChI Notation
The InChI (phonetic /ˈɪntʃiː/ IN-chee) (International Chemical Identifier) was developed by
IUPAC to be a representation standard yielding a unique textual label for any chemical
substance. InChIs comprise different layers and sub‐layers of information separated by slashes
(/). Each InChI string starts with the InChI version number followed by the main layer. This
main layer contains sub‐layers for chemical formula, atom connections, and hydrogen atoms.
The identity of each atom and its covalently bonded partners provide all the information
necessary for the main layer. The main layer may be followed by additional layers, for example,
for charge, isotopic composition, tautomerism, and stereochemistry. Unlike SMILES, InChI is
a canonical line notation and so is a unique identifier that is built upon a set of nomenclature
rules.
The generation of the identifier starts with a normalization step (in which information is
selected and separated into layers), followed by canonicalization (in which atoms are labelled
not depending on how the structure was entered) and serialization (in which the string is
generated based on the canonic labels). A standard InChI starts with a number followed by the
letter “S” to indicate the version of the standard. The InChI standard document released in
January 2017 has 6 core layers (and several sublayers within the core layers). The InChI
software generates both standard and nonstandard InChI, with the standard InChI having “fixed
options” that ensure interoperability between databases and software agents. The standard
InChI (version 1.05) has the following layers:
1. Main layer:
   a. Chemical formula layer
   b. Connections- bonds between atoms, may have sublayers, with the last one
      dealing with mobile hydrogens.
2. Charge Layer:
   a. Component charge
   b. Protons
3. Stereochemical layer
   a. Double bond sp2 (Z/E) Stereochemistry
   b. Tetrahedral stereochemistry
4. Isotopic Layer
5. Fixed Hydrogen Layer
6. Polymer Layer

A nonstandard InChI does not start with InChI=1S/… but with InChI=1/… and has additional
layers that address different facets of a molecule’s structure or features. The nonstandard
InChI may not be canonical, but can handle facets of information that a standard InChI cannot.
For example, for a standard InChI to be canonical, tautomers must have the same InChI;
otherwise there would be two InChIs for the same molecule. If one wishes to define just one of
the tautomers, one needs to use a nonstandard InChI and add a fixed hydrogen layer.
The InChI suite can generate a hashed version of the InChI, the InChIKey. The hash function
generates a fixed-length key of 27 characters that stores information in four parts. The InChIKey
may be a standard or nonstandard key as indicated by the version, but all keys are of the same
length and format. The hash function is a one-way conversion, that is, if you have an InChI you
can generate the key, but if you have the key you cannot generate the InChI. The key can
function as an identifier: if two chemical compounds in databases have the same standard
InChIKey, they are, barring a rare hash collision, the same chemical.
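The layered structure of an InChI and the fixed format of an InChIKey can be inspected with plain string handling. The sketch below is only an illustration: `split_inchi` and `looks_like_inchikey` are hypothetical helper names, the example strings are, to the best of our knowledge, the standard InChI and InChIKey of ethanol, and the code does not compute an InChI or a key (that requires the InChI software). Nonstandard InChIs, in which layer prefixes can repeat, are not handled.

```python
import re


def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        # Every layer after the formula starts with a one-letter prefix,
        # e.g. 'c' for connections and 'h' for hydrogen atoms.
        layers[part[0]] = part[1:]
    return layers


def is_standard(inchi):
    """A standard InChI carries an 'S' after the version number."""
    return split_inchi(inchi)["version"].endswith("S")


# 27 characters: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def looks_like_inchikey(key):
    """Check only the format of a key, not whether it belongs to a real compound."""
    return len(key) == 27 and INCHIKEY.match(key) is not None


ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ethanol_key = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
ethanol_layers = split_inchi(ethanol_inchi)
```

Here `ethanol_layers` holds the formula sub-layer under "formula", the connection sub-layer under "c" and the hydrogen sub-layer under "h", which are the three parts of the main layer described above.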
Exercise:

Visit the PubChem site (https://pubchem.ncbi.nlm.nih.gov/) and search for paracetamol. Note
down the InChI notation for the molecule. See the different layers in the InChI notation. Next
search for Ibuprofen and then for S-Ibuprofen. See the difference in the InChI notation when
the stereochemistry is specified.

Matrix Representations
Chemists have been used to drawing chemical structures for more than a hundred years.
Nowadays, structures are not only drawn on paper but they are also available in electronic form
on a computer for publications, presentations or for the input and output with computer
programs. The “pictures” cannot be used directly by the computer in this form. To process a
chemical structure on the computer, the structure drawing must be converted to another form
of representation.

One of the approaches applies graph theory. In mathematical terms, the structure diagrams
drawn by a chemist, can be considered as ordinary graphs. Graphs consist of nodes (vertices),
which are the atoms, and edges, which are the bonds. A structure diagram is an undirected (the
bonds have no direction) and labelled graph (the nodes are characterized by atom symbols).
The graph so constructed carries no geometric information about the molecule. Two nodes can
have several edges between them (in chemistry, multiple bonds).

A graph can also be represented as a matrix. The major advantage of this representation is the
calculation of paths and cycles by matrix operations. A variety of matrices have been proposed
to contain structural information- adjacency, distance, incidence, bond and bond-electron
matrices.

Adjacency matrix: It is a square (n x n) matrix with the entries giving all the connectivity of the
atoms. The intersection of a row and a column obtains a value of 1 if the corresponding atoms
are connected. If there is no bond between the atoms being considered, the position in the
matrix obtains the value 0. Thus, this matrix representation is a Boolean matrix with bits (0 or
1).
The diagonal elements of such a matrix are always zero and it is symmetric about the diagonal.
Thus, it is a redundant matrix and can be reduced to half of its entries. Zeros can be
omitted for further clarity. The size of such a matrix depends only on the number of
nodes (atoms). An adjacency matrix alone is, however, unsuitable for reconstructing the
constitution of a molecule because it does not provide any information about the bond orders.
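As a small illustration of the adjacency matrix and its reduction to half of its entries, consider the following sketch. The heavy-atom skeleton of acetone is used as the example, and the atom numbering (0 to 3) is an arbitrary choice made here.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric 0/1 matrix from a list of (i, j) atom-index pairs."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def upper_triangle(matrix):
    """Keep only the entries above the diagonal, row by row."""
    return [row[i + 1:] for i, row in enumerate(matrix)]


# Heavy atoms of acetone, CC(=O)C: 0 = C, 1 = C, 2 = O, 3 = C.
acetone_bonds = [(0, 1), (1, 2), (1, 3)]
A = adjacency_matrix(4, acetone_bonds)
A_half = upper_triangle(A)
```

The C=O double bond and the C-C single bonds both appear as 1, which is exactly the loss of bond-order information noted above.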
Distance Matrix: The elements of a distance matrix contain values which specify the shortest
distance between the atoms involved. Distance can be expressed either as geometric distances
(in Å) or as topological distances (in number of bonds).
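Topological distances can be obtained from the bond list by a breadth-first search started from every atom. The sketch below assumes a connected molecule and again uses the heavy-atom skeleton of acetone with an arbitrary numbering; geometric distances would need 3D coordinates and are not covered.

```python
from collections import deque


def topological_distance_matrix(n_atoms, bonds):
    """Shortest path length, in number of bonds, between every pair of atoms."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:                      # breadth-first search from 'start'
            a = queue.popleft()
            for b in neighbours[a]:
                if dist[start][b] is None:
                    dist[start][b] = dist[start][a] + 1
                    queue.append(b)
    return dist


# Heavy atoms of acetone, CC(=O)C: 0 = C, 1 = C, 2 = O, 3 = C.
D = topological_distance_matrix(4, [(0, 1), (1, 2), (1, 3)])
```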

Incidence Matrix: It is a (m x n) matrix where the columns (n) represent the nodes (atoms) and
the rows (m) represent the edges (bonds). An entry of 1 is obtained if the corresponding edge
ends in the particular node.
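A sketch of the incidence matrix for the same kind of input follows, with one row per bond and one column per atom as described above. The acetone numbering is again an arbitrary choice for the illustration.

```python
def incidence_matrix(n_atoms, bonds):
    """One row per bond (edge), one column per atom (node)."""
    matrix = [[0] * n_atoms for _ in bonds]
    for row, (i, j) in enumerate(bonds):
        matrix[row][i] = matrix[row][j] = 1   # the bond ends in atoms i and j
    return matrix


# Heavy atoms of acetone, CC(=O)C: 0 = C, 1 = C, 2 = O, 3 = C.
M = incidence_matrix(4, [(0, 1), (1, 2), (1, 3)])
```

Every row contains exactly two entries of 1, because each bond ends in exactly two atoms.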
Bond Matrix: The bond matrix is related to the adjacency matrix but gives information also on
the bond order of the connected atoms. A value of 2 is assigned to the cell if there is a double
bond between atoms. Values of 0, 1, 2 or 3 are assigned for different bonding combinations. Like
the adjacency matrix, the bond matrix is symmetric and therefore redundant: the diagonal
elements can be omitted and the representation reduced to half of its elements.

Bond electron matrix: The BE matrix, in addition to the entries of bond values in the off-
diagonal elements, gives the number of free valence electrons on the corresponding atom in
the diagonal elements.
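The bond matrix and the BE matrix can be sketched together. The example below uses formaldehyde with explicit hydrogen atoms. It assumes neutral atoms, so that the free valence electrons on the diagonal are simply the valence electrons of the element minus the sum of its bond orders; the small `VALENCE_ELECTRONS` table is supplied here only for the illustration.

```python
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}


def bond_matrix(atoms, bonds):
    """Symmetric matrix holding the bond order between atoms i and j."""
    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return matrix


def be_matrix(atoms, bonds):
    """Bond matrix plus free valence electrons on the diagonal (neutral atoms only)."""
    matrix = bond_matrix(atoms, bonds)
    for i, symbol in enumerate(atoms):
        # The diagonal is still 0 here, so sum(matrix[i]) is the sum of bond orders.
        matrix[i][i] = VALENCE_ELECTRONS[symbol] - sum(matrix[i])
    return matrix


# Formaldehyde, H2C=O: 0 = C, 1 = O, 2 = H, 3 = H.
atoms = ["C", "O", "H", "H"]
bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]
B = bond_matrix(atoms, bonds)
BE = be_matrix(atoms, bonds)
```

In the BE matrix the oxygen atom carries 4 on the diagonal (two lone pairs), and each row sums to the number of valence electrons of the corresponding atom.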

Evaluation of matrix representation of chemical structures


Connection Table: A major disadvantage of a matrix representation for a molecular graph is
that the number of entries increases with the square of the number of atoms in the molecule.
What is needed is a representation of a molecular graph where the number of entries increases
only linearly with the number of atoms in the molecule. Such a representation can be obtained
by listing, in tabular form, only the atoms and the bonds of a molecular structure.
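The difference in growth can be made concrete with a short sketch. The connection table below is a simplified one (an atom list and a bond list for the heavy atoms of acetone), and the comparison for unbranched alkanes assumes that only carbon atoms are stored; both are choices made for this illustration.

```python
# Simplified connection table for acetone, CC(=O)C (heavy atoms only).
atom_list = [(1, "C"), (2, "C"), (3, "O"), (4, "C")]      # (atom number, element)
bond_list = [(1, 2, 1), (2, 3, 2), (2, 4, 1)]             # (atom 1, atom 2, bond order)


def matrix_entries(n_atoms):
    """Number of entries in a full n x n matrix."""
    return n_atoms * n_atoms


def connection_table_entries(n_atoms, n_bonds):
    """One row per atom plus one row per bond."""
    return n_atoms + n_bonds


def compare_for_alkane(n_carbons):
    """Unbranched alkane skeleton: n atoms and n - 1 bonds."""
    return matrix_entries(n_carbons), connection_table_entries(n_carbons, n_carbons - 1)


acetone_sizes = (matrix_entries(len(atom_list)),
                 connection_table_entries(len(atom_list), len(bond_list)))
```

For a chain of 20 carbon atoms, `compare_for_alkane(20)` gives 400 matrix entries against 39 rows in the connection table.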

File formats (Molfiles & SDfiles)


Some of the file formats have been widely used and accepted by the chemoinformatics
community and are used as standard formats for the exchange of information on chemical
structures and reactions. Molfile and SDfile formats were first described by Molecular Design
Limited (MDL). A Molfile describes a single molecular structure which can contain disjointed
fragments. In turn, an SDfile (SD stands for structure-data) contains structure and data
(properties) for any number of molecules. Within an SDfile, each molecule is represented by
its Molfile with additional data items describing non-structural properties (molecular weight,
heat of formation, molecular descriptors, biological activity etc). This makes it especially
convenient for handling large sets of molecules & for exchange of data between databases as
well as between computational software.
Each Molfile consists of two parts: the so-called header block specific to Molfiles (lines 1-3)
and a connection table, the Ctab (from line 4 onwards), which is fundamental to all of MDL’s
CTfile formats. The first line of the header block contains the molecule name and does not
require any particular format. The second line, however, has a strict fixed-width format, with
fields for the user's initials, the program name, the date and time, and the dimensionality of
the coordinates.

The counts line, which is the first line of the Ctab, specifies information like number of atoms,
number of bonds, chirality, additional properties and the version (e.g., V2000) of the Ctab format.
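The layout of a Molfile can be explored with a short reader. The sketch below assumes the usual V2000 fixed-width layout (three-character count fields, the chiral flag in the fifth field, the version in the last six characters of the counts line, the atom symbol in columns 32 to 34 of an atom line). The example file is written by hand for this illustration: it describes the heavy atoms of ethanol with made-up coordinates and leaves the second header line blank. `parse_molfile` is a hypothetical helper, not part of any toolkit.

```python
EXAMPLE_MOLFILE = "\n".join([
    "ethanol",                                   # header line 1: molecule name
    "",                                          # header line 2: left blank here
    "heavy atoms only, illustrative coordinates",  # header line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",   # counts line
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",                     # bond: atom 1, atom 2, order 1
    "  2  3  1  0  0  0  0",
    "M  END",
])


def parse_counts_line(line):
    """Read a few fixed-width fields of a V2000 counts line."""
    return {
        "atoms": int(line[0:3]),
        "bonds": int(line[3:6]),
        "chiral": int(line[12:15]) == 1,
        "version": line[33:39].strip(),
    }


def parse_molfile(text):
    """Return name, counts, atom symbols and bonds of a single V2000 Molfile."""
    lines = text.splitlines()
    counts = parse_counts_line(lines[3])
    first_bond = 4 + counts["atoms"]
    atom_lines = lines[4:first_bond]
    bond_lines = lines[first_bond:first_bond + counts["bonds"]]
    return {
        "name": lines[0].strip(),
        "counts": counts,
        "atoms": [ln[31:34].strip() for ln in atom_lines],
        "bonds": [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9])) for ln in bond_lines],
    }


mol = parse_molfile(EXAMPLE_MOLFILE)
```

An SDfile could be handled in the same spirit by splitting the text at the `$$$$` record delimiter and reading the data items that follow each `M  END` line.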

Markush Structures

A Markush structure diagram is a specific type of representation of a series of chemical
compounds. It is also called a generic structure diagram. Such diagrams are mainly used in
patent databases.

Representation of Chemical Reactions

In order to make predictions of the reaction course and outcomes as well as in designing
synthesis, the following tasks have to be solved
1. Storing information on chemical reactions
2. Retrieving information on chemical reactions
3. Comparing and analyzing sets of reactions
4. Defining the scope and limitations of a reaction type
5. Developing models of chemical reactivity
6. Predicting the course of chemical reactions
7. Analysing reaction networks
8. Developing methods for the design of synthesis.

There are many challenges associated with transforming reactions into computer notations.
This is because of the diversity in the way reactions are written. In some cases trivial products
like water or alcohol are not included in the reaction equation. The plus (+) symbol is also
used in reaction equations for different purposes.
For instance, in the first equation given below, the + symbol on the right-hand side indicates
that a molecule of ethanol is simultaneously and necessarily produced together with acetic acid.
In the second reaction equation, the + symbol on the right-hand side actually reports that there
are two parallel reactions occurring, each leading to one of two regioisomeric products (ortho
and para).

Even on the reactant side, the + symbol is used across a variety of different reaction types.
Though the distinction may be trivial to a trained pair of eyes, it is necessary to be
unambiguous when creating databases of reactions.
An important step in learning from individual reactions is thus the grouping of reaction
instances into reaction types. Further, it is also necessary to identify the reaction center and the
bonds broken and made in a chemical reaction.

Reaction Types
Reactions can be objectively classified according to the overall change in molecularity. On this
criterion, reactions can be classified into substitutions, additions or eliminations.

Reaction Center
The atoms and bonds directly involved in the bond and electron rearrangement process
constitute the reaction center. Consideration of the reaction center or reaction site is of central
importance in reaction searching. It does not suffice to specify the functional groups in the
starting materials and in the products of a reaction when one is interested in a certain
transformation.
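When the atoms of the reactants and products carry a common numbering (an atom mapping), the reaction center can be found by comparing the two bond lists. The sketch below is one simple way to do this and assumes that such a mapping is already available; it looks only at bonds that appear or disappear and ignores changes in bond order. The example, the substitution of bromide by hydroxide in bromomethane, and its atom numbers are chosen here for illustration.

```python
def reaction_center(reactant_bonds, product_bonds):
    """Return (bonds broken, bonds made, atoms involved) for atom-mapped bond lists."""
    before = {frozenset(bond) for bond in reactant_bonds}
    after = {frozenset(bond) for bond in product_bonds}
    broken = before - after          # present in the reactants only
    made = after - before            # present in the products only
    atoms = set().union(*broken, *made)
    return broken, made, atoms


# CH3Br + OH(-) -> CH3OH + Br(-), heavy atoms only: 1 = C, 2 = Br, 3 = O.
broken, made, center_atoms = reaction_center(
    reactant_bonds=[(1, 2)],
    product_bonds=[(1, 3)],
)
```

The C-Br bond is reported as broken, the C-O bond as made, and all three atoms belong to the reaction center. A search restricted to functional groups would not capture this information.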

Hendrickson’s Scheme
Hendrickson concentrated mainly on C-C bond forming reactions. Each carbon atom is
classified according to which kind of atoms are bonded to it and what kind of bonds (σ or π)
are involved. The numbers of σ-bonds to carbon (R), π-bonds to carbon (Π), bonds to
heteroatoms (Z) and bonds to hydrogen (H) are given by σ, π, z and h, respectively. For any
uncharged carbon atom the following equation must hold.

σ + π + z + h = 4

These numbers carry other chemical information. For example z – h = x gives the oxidation
state of a carbon atom. In effect, each carbon atom is classified according to its oxidation state,
x and its attachment to other carbon atoms.
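The bookkeeping of the scheme is easy to check numerically. In the sketch below the four counts are passed as `sigma` and `pi` (σ- and π-bonds to other carbon atoms), `z` (bonds to heteroatoms) and `h` (bonds to hydrogen); the function name and the one-carbon examples are chosen here for illustration.

```python
def hendrickson(sigma, pi, z, h):
    """Classify an uncharged carbon atom by its four bond counts."""
    if sigma + pi + z + h != 4:
        raise ValueError("the four counts must add up to 4 for an uncharged carbon")
    return {
        "oxidation_state": z - h,         # x = z - h
        "carbon_attachment": sigma + pi,  # bonds to other carbon atoms
    }


# One-carbon compounds: no bonds to other carbon atoms, so sigma = pi = 0.
methane = hendrickson(0, 0, 0, 4)          # CH4
methanol = hendrickson(0, 0, 1, 3)         # CH3OH
formaldehyde = hendrickson(0, 0, 2, 2)     # H2C=O, the double bond counts twice
carbon_dioxide = hendrickson(0, 0, 4, 0)   # O=C=O
```

The resulting values of x (-4, -2, 0 and +4) reproduce the familiar oxidation states of carbon along the series from methane to carbon dioxide.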

Skeletal changes are characterized by changes in R, with constructions having positive values
(+R) and fragmentations negative values (-R); functionality changes have ±Π, ±Z or ±H.

STRUCTURE SEARCHING

The databases used to store and search chemical structural information have tended to be rather
specialised, owing to the nature of the methods used to store and manipulate the chemical
structure information. Indeed, many of the chemistry database facilities now available in
commercial products originated in academic research projects. Perhaps the simplest searching
task involves the extraction from the database of information associated with a particular
structure. For example, one may wish to look up the boiling point of acetic acid or the price of
acetone. The first step is to convert the structure provided by the user (the query) into the
relevant canonical ordering or representation. One could then search through the database,
starting at the beginning, in order to find this structure. However, the canonical representation
can also provide the means to retrieve information about a given structure more directly from
the database through the generation of a hash key. A hash key is typically an integer with a
value between 0 and some large number (e.g. 2^32 − 1). If the database is arranged so that the
hash key corresponds to the physical location on the computer disk where the data associated
with that structure is stored, then the information can be retrieved almost instantaneously by
moving the disk read mechanism directly to that location. There are well-established computer
algorithms for the generation of hash keys [Wipke et al. 1978]. Much attention in particular
has been paid to the generation of hash keys for strings of characters (such as canonical
SMILES strings). An example of a hash key devised specifically for chemical information is
the Augmented Connectivity Molecular Formula used in the Chemical Abstracts Service
(CAS) Registry System [Freeland et al. 1979] which uses a procedure similar in concept to the
Morgan algorithm. Ideally, each canonical structure will produce a different hash key.
However, there is always a chance that two different structures will produce the same hash key.
It is thus necessary to incorporate mechanisms that can automatically resolve such clashes.
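The idea of a hash key with clash resolution can be sketched as follows. This is not the algorithm of any particular database system: CRC-32 from the Python standard library stands in for the hash function simply because it yields an integer between 0 and 2^32 − 1, an in-memory dictionary stands in for the disk locations, and the SMILES strings are assumed to be already canonical. The class name `StructureIndex` is hypothetical.

```python
import zlib


def hash_key(canonical_smiles):
    """Map a canonical SMILES string to an integer between 0 and 2**32 - 1."""
    return zlib.crc32(canonical_smiles.encode("utf-8"))


class StructureIndex:
    """Look up records by structure; clashes are resolved by chaining."""

    def __init__(self):
        self._buckets = {}    # hash key -> list of (canonical SMILES, record)

    def add(self, canonical_smiles, record):
        bucket = self._buckets.setdefault(hash_key(canonical_smiles), [])
        bucket.append((canonical_smiles, record))

    def lookup(self, canonical_smiles):
        # Structures that share a key end up in the same bucket, so the
        # stored string is compared before a record is returned.
        for stored, record in self._buckets.get(hash_key(canonical_smiles), []):
            if stored == canonical_smiles:
                return record
        return None


index = StructureIndex()
index.add("CC(=O)O", {"name": "acetic acid"})
index.add("CC(C)=O", {"name": "acetone"})
hit = index.lookup("CC(=O)O")
miss = index.lookup("CCO")
```

Because the full string is compared inside each bucket, two different structures that happen to produce the same key are still told apart.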

SUBSTRUCTURE SEARCHING

Beyond looking up the data associated with a particular structure, substructure searching is
perhaps the most widely used approach to identify compounds of interest. A substructure
search identifies all the molecules in the database that contain a specified substructure. A
simple example would be to identify all structures that contain a particular functional group or
sequence of atoms such as a carboxylic acid, benzene ring or C5 alkyl chain. An illustration of
the range of hits that may be obtained following a substructure search based on a dopamine
derivative is shown in Figure 1-6.

Graph theoretic methods can be used to perform substructure searching, which is equivalent to
determining whether one graph is entirely contained within another, a problem known as
subgraph isomorphism. Efficient algorithms for performing subgraph isomorphism are well
established, but for large chemical databases they are usually too slow to be used alone. For
this reason most chemical database systems use a two-stage mechanism to perform substructure
search [Barnard 1993]. The first step involves the use of screens to rapidly eliminate molecules
that cannot possibly match the substructure query. The aim is to discard a large proportion
(ideally more than 99%) of the database. The structures that remain are then subjected to the
more time-consuming subgraph isomorphism procedure to determine which of them truly do
match the substructure. Molecule screens are often implemented using binary string
representations of the molecules and the query substructure called bitstrings. Bitstrings consist
of a sequence of “0”s and “1”s. They are the “natural currency” of computers and so can be
compared and manipulated very rapidly, especially if held in the computer’s memory. A “1”
in a bitstring usually indicates the presence of a particular structural feature and a “0” its
absence. Thus if a feature is present in the substructure (i.e. there is a “1” in its bitstring) but
not in the molecule (i.e. the corresponding value is “0”) then it can be readily determined from
the bitstring comparison that the molecule cannot contain the substructure. The converse does
not hold; there will usually be features present in the molecule that are not in the substructure.
An example of the use of bitstrings for substructure search is shown in Figure 1-7.
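The screening step reduces to a single bitwise comparison. In the sketch below the feature list, the feature assignments for the two molecules and the helper names are all invented for the illustration; real systems use hundreds or thousands of bit positions.

```python
FEATURES = ["carboxylic acid", "benzene ring", "amine", "hydroxyl", "halogen"]


def to_bitstring(present_features):
    """Set bit i if the i-th feature of FEATURES is present."""
    bits = 0
    for position, feature in enumerate(FEATURES):
        if feature in present_features:
            bits |= 1 << position
    return bits


def passes_screen(query_bits, molecule_bits):
    """True if every bit set in the query is also set in the molecule."""
    return (query_bits & molecule_bits) == query_bits


query = to_bitstring({"benzene ring", "amine"})
dopamine = to_bitstring({"benzene ring", "amine", "hydroxyl"})
benzoic_acid = to_bitstring({"benzene ring", "carboxylic acid"})

candidates = [name for name, bits in [("dopamine", dopamine),
                                      ("benzoic acid", benzoic_acid)]
              if passes_screen(query, bits)]
```

Only the molecules left in `candidates` would be passed on to the slower subgraph isomorphism step. Passing the screen does not prove a match, because the features may be present without being connected in the way the query requires.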

DESCRIPTORS CALCULATED FROM THE 2D STRUCTURE

Simple Counts

Perhaps the simplest descriptors are based on simple counts of features such as hydrogen bond
donors, hydrogen bond acceptors, ring systems (including aromatic rings), rotatable bonds and
molecular weight. Many of these features can be defined as substructures or molecular
fragments and so their frequency of occurrence can be readily calculated from a 2D connection
table using the techniques developed for substructure search. For most applications, however,
these descriptors are unlikely to offer sufficient discriminating power if used in isolation and
so they are often combined with other descriptors.
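A few such counts can be read directly from a connection table. The sketch below assumes a single connected molecule given as a list of heavy-atom symbols and a list of bonds. It uses the graph-theoretical ring count (bonds minus atoms plus one) and a crude count of nitrogen and oxygen atoms as a stand-in for hydrogen bond acceptors; both are simplifications chosen for this illustration.

```python
def simple_counts(atoms, bonds):
    """Count a few simple features of one connected molecule.

    atoms: list of element symbols for the heavy atoms
    bonds: list of (i, j) index pairs between those atoms
    """
    return {
        "heavy_atoms": len(atoms),
        # Number of independent rings in a connected graph: bonds - atoms + 1.
        "rings": len(bonds) - len(atoms) + 1,
        # Crude acceptor estimate: every N or O atom counts as one.
        "n_plus_o": sum(1 for a in atoms if a in ("N", "O")),
        "halogens": sum(1 for a in atoms if a in ("F", "Cl", "Br", "I")),
    }


# Phenol, c1ccccc1O: six ring carbons (0 to 5) plus one oxygen (6).
phenol_atoms = ["C"] * 6 + ["O"]
phenol_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6)]
phenol_counts = simple_counts(phenol_atoms, phenol_bonds)
```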

Physicochemical Properties

Hydrophobicity is an important property in determining the activity and transport of drugs
[Martin and DeWitte 1999, 2000]. For example, a molecule’s hydrophobicity can affect how
tightly it binds to a protein and its ability to pass through a cell membrane. Hydrophobicity is
most commonly modelled using the logarithm of the partition coefficient between n-octanol
and water (log P). The experimental determination of log P can be difficult, particularly for
zwitterionic and very lipophilic or polar compounds; data are currently available for
approximately 30,000 compounds only [Mannhold and van de Waterbeemd 2001]. Of course,
there are no data for compounds not yet synthesised. There is thus considerable interest in the
development of methods for predicting hydrophobicity values. The first approach to estimating
log P was based on an additive scheme whereby the value for a compound with a substituent
X is equal to the log P for the parent compound (typically chosen to be the molecule with the
substituent being hydrogen) plus the appropriate substituent constant π_X [Fujita et al. 1964].
The substituent constants, π_X, were calculated from experimental data as follows:

π_X = log P_X − log P_H

where P_X and P_H are the partition coefficients of the substituted and the parent compound,
respectively.

While the approach worked well within a congeneric series of compounds, the π-values were
found not to be additive across different series. For example, it is inappropriate to use π-values
derived from a benzene parent on electron-deficient rings such as pyridine.
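The arithmetic of the additive scheme is shown in the sketch below. The log P values are placeholder numbers invented for the illustration, not experimental data, and the function names are hypothetical.

```python
def substituent_constant(log_p_substituted, log_p_parent):
    """Substituent constant: log P of the substituted compound minus that of the parent."""
    return log_p_substituted - log_p_parent


def estimate_log_p(log_p_parent, substituent_constants):
    """Additive estimate: parent log P plus one constant per substituent."""
    return log_p_parent + sum(substituent_constants)


# Placeholder values, not measured data.
LOG_P_PARENT = 2.0         # parent compound (substituent = H)
LOG_P_SUBSTITUTED = 2.5    # same compound carrying one substituent X

pi_x = substituent_constant(LOG_P_SUBSTITUTED, LOG_P_PARENT)
log_p_disubstituted = estimate_log_p(LOG_P_PARENT, [pi_x, pi_x])
```

With these numbers the constant is 0.5 and the estimate for a compound carrying two such substituents is 3.0. As noted above, an estimate of this kind can be trusted only within the congeneric series from which the constant was derived.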

Many other methods for estimating log P have been proposed. Some of these are based on
breaking the molecule into fragments. The partition coefficient for the molecule then equals
the sum of fragment values plus a series of “correction factors” to account for interactions
between the fragments such as intramolecular hydrogen bonding. Two ClogP calculations are
illustrated in Figure 3-1 for benzyl bromide and o-methyl acetanilide.

APPLICATIONS IN LFER AND QSPR
