Conformers

This workflow generates a set of conformers from a SMILES string.

Unlike most other conformer generation tools, the Sierra conformers workflow optimizes the conformer structures with the semi-empirical QM method GFN1-xTB. It then ranks the conformers by energy, computed either with one of several QM methods or with OrbNet machine-learning methods.

The workflow is inspired by ReSCoSS and contains the following steps:

1. The input SMILES string is checked for chemical soundness and standardized based on a set of rules.
2. An initial pool of conformers (default: 250) is generated using RDKit.
3. The initial conformers are optimized using the MMFF94 force field.
4. The initial conformers are clustered based on a set of descriptors, and the final pool of conformers (35 at default settings) is selected.
5. The selected conformers are optimized at the GFN1-xTB level of theory.
6. The optimized structures are checked for consistency with the input SMILES string, and duplicates are removed.
7. The final unique conformers are sorted by energy, computed using energy_method (GFN1-xTB at default settings).
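
The first three steps (SMILES check, RDKit embedding, MMFF94 optimization) can be approximated with open-source RDKit calls. This is a minimal sketch, not Sierra's implementation: the helper name `initial_pool`, the pool size of 20, and the use of `Chem.MolFromSmiles` as the only soundness check are assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def initial_pool(smiles, n_conformers=20, seed=2):
    """Embed and MMFF94-optimize a pool of conformers (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES cannot be parsed or sanitized.
        raise ValueError(f"invalid SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for 3D embedding
    AllChem.EmbedMultipleConfs(mol, numConfs=n_conformers, randomSeed=seed)
    # Each result is a (not_converged, energy) pair; MMFF energies are in kcal/mol.
    results = AllChem.MMFFOptimizeMoleculeConfs(
        mol, maxIters=500, mmffVariant="MMFF94"
    )
    energies = [energy for _, energy in results]
    return mol, energies


mol, energies = initial_pool("CCCC")
print(mol.GetNumConformers(), min(energies))
```

The clustering, GFN1-xTB optimization, and ranking steps are left out because they depend on details of the workflow that are not described here.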

Examples

The following example demonstrates conformer generation for butane using default settings.

import sierra
from sierra.inputs import *

# Generate conformers of butane:
butane_input = ConformersInput(
    smiles="CCCC",
)
ret = sierra.run(butane_input)

# There are two particularly interesting results in the returned object.
# `ret.energies` is a list containing the energies of the generated
# conformers, and `ret.conformers` is a list containing the generated
# conformers as `codex.model.Molecule` objects. The conformers are sorted by
# energy (lowest to highest), and the geometry of each conformer is aligned to
# the previous one in the list.

# Conformer Energies:
for index, energy in enumerate(ret.energies):
    print(f"   {index + 1:2d}:  {energy:.6f}")
    #>     1:  -13.866303
    #>     2:  -13.865710
    #>     3:  -13.865710

Note that the workflow only generates unique conformers. For benzene, for example, the workflow will generate only one final conformer, regardless of the value of optimized_conformers.

The default size of the starting pool of initial conformers is 250. For smaller molecules with only a few conformers, this may be reduced, although generating the initial pool takes very little time compared to the subsequent GFN1-xTB optimizations.

Below is an example showing more options for the conformer workflow:

import sierra
from sierra.inputs import *

# Generate conformers for butane
# and rank them by energy in ascending order
# at the wB97XD3/def2-svp level of theory
butane_input = ConformersInput(
    smiles="CCCC",
    details={
        # Maximum number of conformers to generate in the end (default: 35)
        "optimized_conformers": 10,
        # The size of the starting pool of conformers (default: 250)
        "initial_conformers": 500,
        # Fix the random seed if reproducibility is needed
        "rng_seed": 2,
    },
    # Rank the conformers by energy at the wB97XD3/def2-svp level of theory
    energy_method=DFTMethod(xc="wB97XD3", ao="def2-svp"),
)
)
ret = sierra.run(butane_input)

# Conformer Energies:
for index, energy in enumerate(ret.energies):
    print(f"   {index + 1:2d}:  {energy:.6f}")
    #>     1:  -158.300470
    #>     2:  -158.299684
    #>     3:  -158.299666

ConformersInput

Representation of the input for the conformers workflow.

Fields

energy_method

The method used to compute the energy of the conformers.

Type: One of: [XTBMethod,HFMethod,DFTMethod,EMFTMethod,OrbNetMethod]
Default: XTBMethod(model='GFN1')

smiles

The SMILES string used to begin the conformer pipeline.

Type: Optional[Any]

details

Detail Fields

duplicate_energy_threshold

The energy threshold used to identify duplicate conformers. Conformers that have an energy difference of less than this value and an RMSD of less than duplicate_rmsd_threshold are flagged as duplicates.

Type: EnergyQuantity
Default: "0.1 kcal/mol"

duplicate_rmsd_threshold

The RMSD threshold used to identify duplicate conformers. Conformers that have an RMSD of less than this and an energy difference of less than duplicate_energy_threshold are flagged as duplicates.

Type: LengthQuantity
Default: "0.32 angstrom"
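
The two thresholds act together: a pair of conformers counts as a duplicate only when both the energy difference and the RMSD fall below their limits. The sketch below illustrates that rule under stated assumptions. The helpers `rmsd` and `is_duplicate` are hypothetical, the coordinates are made up, and the RMSD is computed on coordinates that are assumed to be already aligned, whereas the actual workflow may align structures and handle symmetry differently.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.509474


def rmsd(coords_a, coords_b):
    """RMSD in angstrom between two pre-aligned, equally ordered coordinate lists."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def is_duplicate(energy_a, energy_b, coords_a, coords_b,
                 energy_threshold=0.1, rmsd_threshold=0.32):
    """Energies in Hartree; thresholds in kcal/mol and angstrom (documented defaults)."""
    delta_e = abs(energy_a - energy_b) * HARTREE_TO_KCAL_PER_MOL
    return delta_e < energy_threshold and rmsd(coords_a, coords_b) < rmsd_threshold


# Made-up two-atom geometries, for illustration only.
geom_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
geom_b = [(0.0, 0.0, 0.1), (1.5, 0.0, 0.1)]  # RMSD to geom_a is 0.1 angstrom
geom_c = [(0.0, 0.0, 0.0), (1.5, 0.0, 1.0)]  # RMSD to geom_a is about 0.71 angstrom

print(is_duplicate(-13.865710, -13.865710, geom_a, geom_b))  # True: both criteria met
print(is_duplicate(-13.866303, -13.865710, geom_a, geom_b))  # False: energies differ by ~0.37 kcal/mol
print(is_duplicate(-13.865710, -13.865710, geom_a, geom_c))  # False: RMSD too large
```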

error_flags

The flags that, when raised, cause the conformer workflow to fail.

Type: List[ConformerFlags]
Default: [optimization_fail, properties_fail, energy_fail, duplicate, bond_length_check]
Possibilities:
- optimization_fail
- properties_fail
- energy_fail
- duplicate
- bond_length_check
- bond_order_check

force_openbabel

Flag indicating whether to use Open Babel when converting the SMILES string to a molecule object.

Type: bool
Default: False

initial_conformers

The number of initial trial conformers to begin from.

Type: PositiveInt
Default: 250

n_optimizer_steps

Number of optimizer steps in each optimizer macro cycle.

Type: PositiveInt
Default: 50

optimized_conformers

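The number of conformers selected from the initial pool for GFN1-xTB optimization and ranking with the provided energy_method. This is the maximum number of conformers returned.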

Type: PositiveInt
Default: 35

rng_seed

Seed for the random number generator when generating conformers.

Type: Optional[int]

warning_flags

The flags that, when raised, emit a warning but do not cause the workflow to fail.

Type: List[ConformerFlags]
Default: [bond_order_check]
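
The split between error_flags and warning_flags can be pictured as a simple partition of whatever flags were raised. The sketch below is an assumption about that logic, not the workflow's code: `classify_flags` is a hypothetical helper, plain strings stand in for `ConformerFlags` members, and giving error_flags precedence when a flag appears in both lists is a guess.

```python
DEFAULT_ERROR_FLAGS = [
    "optimization_fail",
    "properties_fail",
    "energy_fail",
    "duplicate",
    "bond_length_check",
]
DEFAULT_WARNING_FLAGS = ["bond_order_check"]


def classify_flags(raised, error_flags=DEFAULT_ERROR_FLAGS,
                   warning_flags=DEFAULT_WARNING_FLAGS):
    """Split raised flags into (errors, warnings); unlisted flags are ignored."""
    errors = [flag for flag in raised if flag in error_flags]
    warnings = [
        flag for flag in raised
        if flag in warning_flags and flag not in error_flags
    ]
    return errors, warnings


print(classify_flags(["bond_order_check"]))
# ([], ['bond_order_check'])
print(classify_flags(["duplicate", "bond_order_check"]))
# (['duplicate'], ['bond_order_check'])
```

Moving `bond_order_check` from warning_flags to error_flags would, under this reading, turn a bond-order mismatch into a failure instead of a warning.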

ConformersResult

Representation of results of the conformers workflow.

Fields

All of the fields in ConformersInput and the following:

conformers

The list of conformers generated by the workflow.

Type: List[Molecule]

energies

The energy, in Hartree, of each conformer generated by the workflow. The order of the energies matches the order of the conformers.

Type: List[float]
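
Because the energies are absolute values in Hartree, it is often easier to compare conformers through energies relative to the lowest one. The following sketch converts the butane energies from the first example to kcal/mol. The helper `relative_energies` is hypothetical and not part of Sierra; the conversion factor is the standard 627.509474 kcal/mol per Hartree.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474


def relative_energies(energies_hartree):
    """Return energies relative to the lowest one, in kcal/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Values copied from the default-settings butane example.
energies = [-13.866303, -13.865710, -13.865710]
for index, rel in enumerate(relative_energies(energies), start=1):
    print(f"{index:2d}: {rel:.2f} kcal/mol")
    #>  1: 0.00 kcal/mol
    #>  2: 0.37 kcal/mol
    #>  3: 0.37 kcal/mol
```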

radii_of_gyration

The radius of gyration of each conformer generated by the workflow. The order of the radii matches the order of the conformers.

Type: List[float]
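
For orientation, a radius of gyration measures how far the atoms of a conformer lie, on average, from its center. The sketch below uses the common mass-weighted definition, R_g = sqrt(sum_i m_i |r_i - r_c|^2 / sum_i m_i). Whether the workflow applies mass weighting, and which length unit it reports, is not stated here, so both the formula choice and the helper name `radius_of_gyration` are assumptions.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal weights if masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass, computed component by component.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    ]
    weighted = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total)


# Two equal-mass points, each one length unit from their midpoint.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```

Extended conformers give larger values than compact, folded ones, which makes the quantity a quick way to tell such shapes apart.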

warnings

A list of warnings generated while evaluating the conformer generation workflow.

Type: List[List[ConformerFlags]]