1001Ferramentas

SMILES Chemical Notation Validator

Verify SMILES string syntax (atoms, bonds, rings, isomerism, parentheses). Highlights tokenization and valence errors.


SMILES: writing a molecule as a line of text

SMILES (Simplified Molecular-Input Line-Entry System) is a notation that encodes a molecular structure as a single ASCII string. Unlike the CAS number — an arbitrary registry key — SMILES describes the actual structure: which atoms are bonded to which. It is the lingua franca of cheminformatics, accepted by RDKit, Open Babel, PubChem and virtually every chemistry toolkit. This tool checks that a string is syntactically valid SMILES (there is no checksum — SMILES is a grammar, not a coded number).

The core syntax

  • Atoms: the organic subset B C N O P S F Cl Br I is written bare; any other element (or one with charge/isotope/explicit H) goes in square brackets, e.g. [NH4+], [13C], [Fe].
  • Bonds: - single, = double, # triple, : aromatic; single/aromatic bonds are usually implicit.
  • Branches: parentheses, e.g. acetic acid CC(=O)O.
  • Rings: matching digit labels open and close a ring — cyclohexane is C1CCCCC1; two-digit ring numbers use %nn.
  • Aromaticity: lowercase atoms, e.g. benzene c1ccccc1.
  • Stereochemistry: / and \ for double-bond geometry, @/@@ for chirality.
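The token classes listed above can be sketched as a small regex-based tokenizer. This is an illustrative sketch, not this tool's actual implementation: the names TOKEN_RE and tokenize are hypothetical, the contents of bracket atoms are not parsed further, and only the symbols mentioned in the list (plus "." for disconnected fragments) are recognized.

```python
import re

# One alternative per token class; order matters ("Cl" must be tried before "C").
# Assumption: only the symbols described in the text are covered.
TOKEN_RE = re.compile(r"""
    (?P<bracket>\[[^\[\]]+\])     # bracket atom, e.g. [NH4+], [13C], [Fe]
  | (?P<atom>Cl|Br|[BCNOPSFI])    # organic-subset atom written bare
  | (?P<aromatic>[bcnops])        # aromatic organic-subset atom
  | (?P<bond>[-=#:/\\])           # explicit bond or double-bond geometry mark
  | (?P<ring>%[0-9]{2}|[0-9])     # ring label: one digit, or % plus two digits
  | (?P<branch>[()])              # branch open or close
  | (?P<dot>\.)                   # separator between disconnected fragments
""", re.VERBOSE)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at index {pos}"
            )
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("CC(=O)Oc1ccccc1C(=O)O"):
        print(f"{kind:9} {text}")
```

A tokenizer like this only answers "is every character part of a known token?". Whether the tokens appear in a legal order, and whether rings and branches pair up, has to be checked separately.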

Examples

  • Water O · ethanol CCO · acetic acid CC(=O)O
  • Aspirin CC(=O)Oc1ccccc1C(=O)O
  • Caffeine CN1C=NC2=C1C(=O)N(C)C(=O)N2C

Common pitfalls

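  • Case matters: CO is carbon bonded to oxygen (the methanol skeleton), whereas cobalt must be written in brackets as [Co]. A bare Co is not cobalt: a parser reads it as aliphatic C followed by aromatic o. A lowercase first letter marks an aromatic atom, and a lowercase second letter completes a two-letter symbol such as Cl or Br; the two uses are never interchangeable.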
  • Unmatched ring digits: every ring-opening digit needs its closing partner; a stray 1 is invalid.
  • Unbalanced parentheses/brackets: branches and bracketed atoms must close.
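  • Not InChI or SMARTS: InChI is a different canonical identifier with its own syntax. SMARTS is a query language that extends SMILES; a pattern that uses query-only primitives (for example the atom list [C,N] or the any-bond symbol ~) is not valid as a plain structure.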
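The pairing rules from the pitfalls above (ring digits, parentheses, brackets) can be checked with a single left-to-right scan. The following is a minimal sketch under stated assumptions, not the validator's real code: check_structure is a hypothetical name, the scan does not verify that the remaining characters are legal SMILES symbols, and it assumes %01 and 1 denote the same ring label.

```python
DIGITS = "0123456789"


def check_structure(smiles):
    """Return a list of pairing errors; an empty list means none were found."""
    errors = []
    open_rings = {}  # ring label -> index where the label was opened
    depth = 0        # current parenthesis nesting depth
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        step = 1
        if ch == "[":
            close = smiles.find("]", i)
            if close == -1:
                errors.append(f"unclosed '[' at index {i}")
                break
            # Skip the bracket atom: digits inside it are isotopes,
            # hydrogen counts or charges, not ring labels.
            step = close - i + 1
        elif ch == "]":
            errors.append(f"stray ']' at index {i}")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                errors.append(f"unmatched ')' at index {i}")
            else:
                depth -= 1
        elif ch == "%" or ch in DIGITS:
            digits = smiles[i + 1:i + 3] if ch == "%" else ch
            if ch == "%" and (
                len(digits) != 2 or any(d not in DIGITS for d in digits)
            ):
                errors.append(f"'%' at index {i} is not followed by two digits")
            else:
                step = len(digits) + (1 if ch == "%" else 0)
                label = int(digits)  # assumption: %01 and 1 are the same label
                if label in open_rings:
                    del open_rings[label]   # second occurrence closes the ring
                else:
                    open_rings[label] = i   # first occurrence opens it
        i += step
    if depth > 0:
        errors.append(f"{depth} unclosed '('")
    for label, index in open_rings.items():
        errors.append(
            f"ring label {label} opened at index {index} is never closed"
        )
    return errors


if __name__ == "__main__":
    for s in ["C1CCCCC1", "C1CCCCC", "CC(=O", "[NH4+]"]:
        print(s, check_structure(s))
```

Returning a list rather than raising on the first problem lets a validator report every unmatched label or parenthesis in one pass.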

FAQ

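Is there one correct SMILES per molecule? No. A molecule has many valid SMILES; ethanol, for example, can be written CCO, OCC or C(O)C. Canonical SMILES (produced by a toolkit's canonicalization algorithm) gives one reproducible string per structure, but the algorithms differ, so canonical strings from different toolkits may not match.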

Does valid syntax mean the molecule is real? Not necessarily. The grammar can be satisfied by a chemically implausible valence: C(C)(C)(C)(C)C parses cleanly but gives the first carbon five bonds. Toolkits add a separate valence/sanitization check.
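To illustrate why valence is a separate step from syntax, here is a deliberately small sketch that sums bond orders per atom and flags atoms exceeding a maximum. It is not how any particular toolkit performs sanitization. The assumptions are significant: the input is already syntactically valid, and it uses only bare uppercase organic-subset atoms, the bonds - = #, parentheses and single-digit ring labels. The name over_valent_atoms and the MAX_VALENCE table (the highest commonly listed valence per element) are choices made for this example.

```python
import re

# Assumption: highest commonly listed valence for each bare atom.
MAX_VALENCE = {"B": 3, "C": 4, "N": 5, "O": 2, "P": 5, "S": 6,
               "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]|[0-9]")


def over_valent_atoms(smiles):
    """Return (index, symbol, bond_order_sum) for atoms above MAX_VALENCE."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("input uses syntax outside this sketch's subset")
    atoms = []     # element symbol of each atom, in order of appearance
    used = []      # running sum of bond orders for each atom
    stack = []     # atom to return to when a branch closes
    rings = {}     # ring digit -> (atom index, bond order given at opening)
    prev = None    # index of the atom the next bond attaches to
    order = None   # explicit bond order waiting to be used, if any
    for tok in tokens:
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok.isdigit():
            if tok in rings:
                # Closing a ring adds a bond between the two labelled atoms.
                other, other_order = rings.pop(tok)
                bond = order or other_order or 1
                used[prev] += bond
                used[other] += bond
            else:
                rings[tok] = (prev, order)
            order = None
        else:
            atoms.append(tok)
            used.append(0)
            index = len(atoms) - 1
            if prev is not None:
                bond = order or 1   # an implicit bond counts as single
                used[prev] += bond
                used[index] += bond
            prev = index
            order = None
    return [(i, atoms[i], used[i]) for i in range(len(atoms))
            if used[i] > MAX_VALENCE[atoms[i]]]


if __name__ == "__main__":
    print(over_valent_atoms("CC(=O)O"))         # acetic acid
    print(over_valent_atoms("C(C)(C)(C)(C)C"))  # carbon with five bonds
```

A real sanitization step also has to account for implicit hydrogens, formal charges, bracket atoms and aromatic systems, all of which this sketch leaves out.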

SMILES vs InChI? SMILES is compact and human-writable; InChI is a standardized canonical string designed for exact database matching. Many workflows store both.
