Smiles arbitrary target specification
Introduction and background
Smiles ARbitrary Target Specification (SMARTS) is a language for specifying substructural patterns in molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing.
SMARTS is related to the SMILES line notation, which is used to encode molecular structures, and like SMILES it was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. The most comprehensive descriptions of the SMARTS language can be found in Daylight's SMARTS theory manual, tutorial and examples. OpenEye Scientific Software have developed their own version of SMARTS, which differs from the original Daylight version in how the R descriptor (see cyclicity below) is defined.
Atoms can be specified by symbol or atomic number. Aliphatic carbon is matched by [C], aromatic carbon by [c] and any carbon by [#6] or [C,c]. The wildcard symbols *, A and a match any atom, any aliphatic atom and any aromatic atom respectively. Implicit hydrogens are considered to be a characteristic of atoms, and the SMARTS for an amino group can be written as [NH2]. Charge is specified by the descriptors '+' and '-', as exemplified by the SMARTS [nH+] (protonated aromatic nitrogen atom) and [O-]C(=O)c (deprotonated aromatic carboxylic acid).
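The atom primitives described above can be sketched in plain Python, without a cheminformatics toolkit. The function name matches_primitive, the ATOMIC_NUMBERS table and the representation of an atom as an element symbol plus an aromaticity flag are assumptions made for illustration; hydrogen counts and charges are not handled.

```python
# Small illustrative subset of the periodic table (an assumption of this sketch).
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8}


def matches_primitive(primitive, element, aromatic):
    """Return True if one SMARTS atom primitive matches the given atom.

    primitive: text such as "C", "c", "#6", "*", "A" or "a"
    element:   element symbol of the atom, e.g. "C"
    aromatic:  True if the atom is aromatic
    """
    if primitive == "*":              # wildcard: any atom
        return True
    if primitive == "A":              # any aliphatic atom
        return not aromatic
    if primitive == "a":              # any aromatic atom
        return aromatic
    if primitive.startswith("#"):     # atomic number, ignores aromaticity
        return ATOMIC_NUMBERS.get(element) == int(primitive[1:])
    if primitive.islower():           # lower-case symbol: aromatic atom
        return aromatic and primitive.capitalize() == element
    return (not aromatic) and primitive == element  # upper-case: aliphatic


# An aromatic carbon matches [c] and [#6] but not [C].
aromatic_carbon = ("C", True)
results = {p: matches_primitive(p, *aromatic_carbon) for p in ("C", "c", "#6")}
```

The alternative [C,c] corresponds to calling the function once per primitive and combining the results with a logical 'or'.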
A number of bond types can be specified: '-' (single), '=' (double), '#' (triple), ':' (aromatic) and '~' (any).
The X and D descriptors are used to specify the total numbers of connections (including implicit hydrogen atoms) and connections to explicit atoms. Thus [CX4] matches carbon atoms with bonds to 4 other atoms while [CD4] matches quaternary carbon.
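The difference between X and D can be illustrated with a short sketch in plain Python. The Atom class and its fields are hypothetical names introduced here; D is taken as the number of explicitly listed neighbours and X as D plus the implicit hydrogen count.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Hypothetical minimal atom record, for illustration only."""
    symbol: str
    implicit_h: int = 0
    neighbors: list = field(default_factory=list)  # explicit neighbour symbols


def degree_d(atom):
    """D: number of connections to explicitly represented atoms."""
    return len(atom.neighbors)


def connectivity_x(atom):
    """X: total number of connections, counting implicit hydrogens too."""
    return len(atom.neighbors) + atom.implicit_h


def matches_cx4(atom):
    """Rough equivalent of [CX4] for this toy representation."""
    return atom.symbol == "C" and connectivity_x(atom) == 4


def matches_cd4(atom):
    """Rough equivalent of [CD4] for this toy representation."""
    return atom.symbol == "C" and degree_d(atom) == 4


# Methyl carbon of ethane: one explicit neighbour, three implicit hydrogens.
methyl = Atom("C", implicit_h=3, neighbors=["C"])
# Central carbon of neopentane: four explicit neighbours, no hydrogens.
quaternary = Atom("C", implicit_h=0, neighbors=["C", "C", "C", "C"])
```

Under these assumptions both carbons satisfy the [CX4] check, but only the neopentane carbon satisfies the [CD4] check.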
As originally defined by Daylight, the R descriptor is used to specify ring membership. In the Daylight model for cyclic systems, the smallest set of smallest rings (SSSR) is used as a basis for ring membership. For example indole is perceived as a 5-membered ring fused with a 6-membered ring rather than a 9-membered ring. The two carbon atoms that make up the ring fusion would match [cR2] and the other carbon atoms would match [cR1].
The SSSR model has been criticised by OpenEye who, in their implementation of SMARTS, use R to denote the number of ring bonds for an atom. The two carbon atoms in the ring fusion match [cR3] and the other carbons match [cR2] in the OpenEye implementation of SMARTS. Used without a number, R specifies an atom in a ring in both implementations, for example [CR] (aliphatic carbon atom in ring).
Lower case r specifies the size of the smallest ring of which the atom is a member. The carbon atoms of the ring fusion would both match [cr5]. Bonds can be specified as cyclic, for example C@C matches directly bonded atoms in a ring.
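The three ring descriptors can be compared on indole with a small sketch in plain Python. The atom labels, the function names and the hand-written ring lists are assumptions made for illustration: a real toolkit would perceive the rings itself, and the two rings listed here are simply the 5-membered and 6-membered rings described above.

```python
# Indole ring atoms, labelled by conventional position names and listed in
# ring order. The rings are supplied by hand rather than perceived.
FIVE_RING = ["N1", "C2", "C3", "C3a", "C7a"]
SIX_RING = ["C3a", "C4", "C5", "C6", "C7", "C7a"]
RINGS = [FIVE_RING, SIX_RING]


def ring_bonds(ring):
    """Return the bonds of one ring as frozensets of neighbouring atoms."""
    size = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % size])) for i in range(size)}


def daylight_R(atom, rings):
    """Daylight-style R: number of listed rings that contain the atom."""
    return sum(atom in ring for ring in rings)


def openeye_R(atom, rings):
    """OpenEye-style R: number of ring bonds on the atom."""
    bonds = set().union(*(ring_bonds(ring) for ring in rings))
    return sum(atom in bond for bond in bonds)


def smallest_ring_r(atom, rings):
    """Lower-case r: size of the smallest listed ring containing the atom."""
    sizes = [len(ring) for ring in rings if atom in ring]
    return min(sizes) if sizes else 0


# Ring-fusion carbon C3a versus an ordinary benzene-ring carbon C5.
fusion = (daylight_R("C3a", RINGS), openeye_R("C3a", RINGS), smallest_ring_r("C3a", RINGS))
plain = (daylight_R("C5", RINGS), openeye_R("C5", RINGS), smallest_ring_r("C5", RINGS))
```

The shared C3a-C7a bond appears in both rings but is counted once, because the bonds are collected in a set before counting.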
Four logical operators allow atom and bond descriptors to be combined. The low-priority 'and' operator ';' can be used to define a protonated primary amine as [N;H3;+][C;X4]. The 'or' operator ',' has a higher priority than ';', so [c,n;H] defines (aromatic carbon or aromatic nitrogen) with implicit hydrogen. The 'and' operator '&' has a higher priority than ',', so [c,n&H] defines aromatic carbon or (aromatic nitrogen with implicit hydrogen).
The 'not' operator '!' can be used to define unsaturated aliphatic carbon as [C;!X4] and acyclic bonds as *-!@*.
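The operator priorities can be made concrete with a minimal parser sketch in plain Python. It is a simplification under stated assumptions, not a full SMARTS parser: the input is the text inside the square brackets, every primitive is separated by an explicit operator, and recursive SMARTS is not supported. The function names and the idea of passing the set of primitives that hold for an atom are hypothetical choices for illustration.

```python
def parse_atom_expression(expr):
    """Parse a simplified SMARTS atom expression into a nested tuple tree."""

    def parse_not(text):
        # '!' binds most tightly and applies to the primitive that follows.
        if text.startswith("!"):
            return ("not", parse_not(text[1:]))
        return text

    def parse_level(text, operators):
        # Split on the lowest-priority operator first, then recurse.
        if not operators:
            return parse_not(text)
        (symbol, name), rest = operators[0], operators[1:]
        parts = [parse_level(part, rest) for part in text.split(symbol)]
        return parts[0] if len(parts) == 1 else (name, *parts)

    # Operators ordered from lowest to highest priority.
    return parse_level(expr, [(";", "and"), (",", "or"), ("&", "and")])


def evaluate(tree, true_primitives):
    """Evaluate a parsed tree given the set of primitives true for an atom."""
    if isinstance(tree, str):
        return tree in true_primitives
    op, *args = tree
    if op == "not":
        return not evaluate(args[0], true_primitives)
    results = [evaluate(arg, true_primitives) for arg in args]
    return all(results) if op == "and" else any(results)


# An aromatic carbon for which the H primitive does not hold.
bare_aromatic_carbon = {"c"}
low_and = evaluate(parse_atom_expression("c,n;H"), bare_aromatic_carbon)
high_and = evaluate(parse_atom_expression("c,n&H"), bare_aromatic_carbon)
```

In this sketch the same atom fails [c,n;H], because the hydrogen requirement applies to both alternatives, but passes [c,n&H], where the requirement applies only to the nitrogen alternative.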
Recursive SMARTS allow detailed specification of an atom's environment. For example, the more reactive (with respect to electrophilic aromatic substitution) ortho and para carbon atoms of phenol can be defined as: [$(c1c([OH])cccc1),$(c1ccc([OH])cc1)]
Examples of SMARTS
A number of illustrative examples of SMARTS have been assembled by Daylight.
In real applications the CX4 atoms would need to be defined more precisely to prevent matching against electron-withdrawing groups such as CF3 that would render the amine insufficiently basic to protonate at physiological pH.
SMARTS can be used to encode pharmacophore elements such as anionic centers. In the following example, recursive SMARTS notation is used to combine acid oxygen and tetrazole nitrogen in a definition of oxygen atoms that are likely to be anionic under normal physiological conditions.
The SMARTS above would only match the acid hydroxyl and the tetrazole NH. When a carboxylic acid deprotonates, the negative charge is delocalised over both oxygen atoms, and it may be desirable to designate both as anionic. This can be achieved using the following SMARTS.
Applications of SMARTS
The extremely precise and transparent atom typing provided by SMARTS has been exploited in a number of applications.
ALADDIN is an early pharmacophore matching program that uses SMARTS to define recognition points (e.g. neutral hydrogen bond acceptor) of pharmacophores. A key problem in pharmacophore matching is that functional groups that are likely to be ionised at physiological pH are typically registered in their neutral forms in structural databases. The ROCS shape matching program allows atom types to be defined using SMARTS.
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Smiles_arbitrary_target_specification". A list of authors is available in Wikipedia.