Conformers
This workflow generates a set of conformers from a SMILES string.
Unlike most other conformer generation tools, the Sierra conformers workflow optimizes conformer structures with the semi-empirical QM method GFN-xTB and ranks the conformers by energy computed with one of several QM methods or with OrbNet machine learning methods.
The workflow is inspired by ReSCoSS and contains the following steps:
- The input SMILES string is checked for chemical soundness and standardized based on a set of rules.
- An initial pool of conformers (default: 250) is generated using RDKit.
- The initial conformers are optimized using the MMFF94 force field.
- The initial conformers are clustered based on a set of descriptors, and the final pool of conformers (35 at default settings) is selected.
- The selected conformers are optimized at the GFN1-xTB level of theory.
- The optimized structures are checked for consistency with the input SMILES string and duplicates are removed.
- The final unique conformers are sorted by energy computed using energy_method (GFN1-xTB at default settings).
Examples
The following example demonstrates conformer generation for butane using default settings.
import sierra
from sierra.inputs import *

# Generate conformers of butane:
butane_input = ConformersInput(
    smiles="CCCC",
)
ret = sierra.run(butane_input)

# There are two particularly interesting results in the returned object.
# `ret.energies` is a list containing the energies of the generated
# conformers, and `ret.conformers` is a list containing the generated
# conformers as `codex.model.Molecule` objects. The conformers are sorted by
# energy (lowest to highest) and the geometry of each conformer is aligned to
# the previous one in the list.

# Conformer Energies:
for index, energy in enumerate(ret.energies):
    print(f" {index + 1:2d}: {energy:.6f}")
#> 1: -13.866303
#> 2: -13.865710
#> 3: -13.865710
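The energies are reported in Hartree, which makes small differences between conformers hard to read. The following is a minimal sketch, not part of Sierra, that converts a list such as `ret.energies` to relative energies in kcal/mol and to simple Boltzmann populations. The helper names `relative_energies_kcal` and `boltzmann_populations` are hypothetical, and the populations assume that electronic energies alone determine the weights (no entropy or degeneracy corrections).

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard conversion factor
GAS_CONSTANT_KCAL = 0.0019872041      # gas constant in kcal/(mol*K)


def relative_energies_kcal(energies_hartree):
    """Return energies relative to the lowest one, in kcal/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


def boltzmann_populations(energies_hartree, temperature=298.15):
    """Return normalized Boltzmann weights computed from the relative energies."""
    kt = GAS_CONSTANT_KCAL * temperature
    weights = [math.exp(-de / kt) for de in relative_energies_kcal(energies_hartree)]
    total = sum(weights)
    return [w / total for w in weights]


# Energies copied from the butane example output above (Hartree).
energies = [-13.866303, -13.865710, -13.865710]
rows = zip(relative_energies_kcal(energies), boltzmann_populations(energies))
for index, (rel, pop) in enumerate(rows):
    print(f" {index + 1:2d}: {rel:6.3f} kcal/mol  population {pop:.3f}")
```

Because the list is already sorted from lowest to highest energy, the first entry always has a relative energy of zero.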
Note that the workflow only generates unique conformers. For example, for benzene the workflow will generate only 1 final conformer, regardless of the value of optimized_conformers.
The default size of the starting pool of initial conformers is 250. For small molecules with only a few conformers this may be reduced, although generating the initial pool takes very little time compared to the subsequent GFN1-xTB optimizations.
Below is an example showing more options for the conformer workflow:
import sierra
from sierra.inputs import *

# Generate conformers for butane
# and rank them by energy in ascending order
# at the wB97XD3/def2-svp level of theory
butane_input = ConformersInput(
    smiles="CCCC",
    details={
        # Maximum number of conformers to generate in the end (default: 35)
        "optimized_conformers": 10,
        # The size of the starting pool of conformers (default: 250)
        "initial_conformers": 500,
        # Set a seed if reproducibility is needed
        "rng_seed": 2,
    },
    # Rank the conformers by energy at the wB97XD3/def2-svp level of theory
    energy_method=DFTMethod(xc="wB97XD3", ao="def2-svp"),
)
ret = sierra.run(butane_input)

# Conformer Energies:
for index, energy in enumerate(ret.energies):
    print(f" {index + 1:2d}: {energy:.6f}")
#> 1: -158.300470
#> 2: -158.299684
#> 3: -158.299666
ConformersInput
Representation of input for the conformers workflow.
Fields
energy_method
- The method used to compute the energy of the conformers.
- Type: One of: [XTBMethod, HFMethod, DFTMethod, EMFTMethod, OrbNetMethod]
- Default: XTBMethod(model='GFN1')
smiles
- The SMILES string used to begin the conformer pipeline.
- Type: Optional[Any]
details
-
Detail Fields
duplicate_energy_threshold
- The energy threshold used to identify duplicate conformers. Conformers that have an energy difference of less than this value and an RMSD of less than duplicate_rmsd_threshold are flagged as duplicates.
- Type: EnergyQuantity
- Default: "0.1 kcal/mol"
duplicate_rmsd_threshold
- The RMSD threshold used to identify duplicate conformers. Conformers that have an RMSD of less than this value and an energy difference of less than duplicate_energy_threshold are flagged as duplicates.
- Type: LengthQuantity
- Default: "0.32 angstrom"
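The two duplicate thresholds act together: a conformer is flagged only when both the energy difference and the RMSD fall below their limits. The sketch below illustrates that rule with the default values. It is not the Sierra implementation; `rmsd` and `flag_duplicates` are hypothetical helpers, and the RMSD here assumes geometries that are already aligned and share the same atom ordering.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.509474


def rmsd(coords_a, coords_b):
    """Plain RMSD (angstrom) between two pre-aligned geometries with equal atom order."""
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def flag_duplicates(energies_hartree, geometries,
                    energy_threshold_kcal=0.1, rmsd_threshold_angstrom=0.32):
    """Return one boolean per conformer; True marks a duplicate of an earlier kept one."""
    kept = []   # indices of conformers accepted as unique so far
    flags = []
    for i, (energy, coords) in enumerate(zip(energies_hartree, geometries)):
        is_duplicate = any(
            abs(energy - energies_hartree[j]) * HARTREE_TO_KCAL_PER_MOL < energy_threshold_kcal
            and rmsd(coords, geometries[j]) < rmsd_threshold_angstrom
            for j in kept
        )
        flags.append(is_duplicate)
        if not is_duplicate:
            kept.append(i)
    return flags


# Toy two-atom geometries (angstrom) with made-up energies (Hartree).
geom_a = [(0.0, 0.0, 0.0), (1.50, 0.0, 0.0)]
geom_b = [(0.0, 0.0, 0.0), (1.52, 0.0, 0.0)]  # nearly identical to geom_a
geom_c = [(0.0, 0.0, 0.0), (0.0, 1.50, 0.0)]  # same energy, different geometry
print(flag_duplicates([-1.00000, -1.00001, -1.00000], [geom_a, geom_b, geom_c]))
```

In this toy case only the second geometry is flagged: the third has the same energy as the first, but its RMSD exceeds the threshold.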
error_flags
- The flags that, when raised, cause the conformer workflow to fail.
- Type: List[ConformerFlags]
- Default: [optimization_fail, properties_fail, energy_fail, duplicate, bond_length_check]
- Possibilities: optimization_fail, properties_fail, energy_fail, duplicate, bond_length_check, bond_order_check
force_openbabel
- Flag indicating whether to use Open Babel when converting the SMILES string to a molecule object.
- Type: bool
- Default: False
initial_conformers
- The number of initial trial conformers to begin from.
- Type: PositiveInt
- Default: 250
n_optimizer_steps
- Number of optimizer steps in each optimizer macro cycle.
- Type: PositiveInt
- Default: 50
optimized_conformers
- The number of conformers to optimize using the provided energy_method.
- Type: PositiveInt
- Default: 35
rng_seed
- Seed for the random number generator when generating conformers.
- Type: Optional[int]
warning_flags
- Flags that emit warnings when raised but do not cause the workflow to fail.
- Type: List[ConformerFlags]
- Default: [bond_order_check]
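The relationship between error_flags and warning_flags can be pictured as a simple triage of the flags raised for a conformer. The sketch below is only an illustration of that split using the default lists above. The `ConformerFlags` enum here is a stand-in built from the listed flag names, not the Sierra class, and `triage` is a hypothetical helper; what the workflow does after a failure is not covered.

```python
from enum import Enum


class ConformerFlags(str, Enum):
    """Stand-in for the flag names listed above; not the Sierra class."""
    optimization_fail = "optimization_fail"
    properties_fail = "properties_fail"
    energy_fail = "energy_fail"
    duplicate = "duplicate"
    bond_length_check = "bond_length_check"
    bond_order_check = "bond_order_check"


# Defaults copied from the field descriptions above.
DEFAULT_ERROR_FLAGS = (
    ConformerFlags.optimization_fail,
    ConformerFlags.properties_fail,
    ConformerFlags.energy_fail,
    ConformerFlags.duplicate,
    ConformerFlags.bond_length_check,
)
DEFAULT_WARNING_FLAGS = (ConformerFlags.bond_order_check,)


def triage(raised_flags, error_flags=DEFAULT_ERROR_FLAGS,
           warning_flags=DEFAULT_WARNING_FLAGS):
    """Split the raised flags into (errors, warnings) according to the settings."""
    errors = [flag for flag in raised_flags if flag in error_flags]
    warnings = [flag for flag in raised_flags if flag in warning_flags]
    return errors, warnings


errors, warnings = triage([ConformerFlags.bond_order_check])
print("fails:", bool(errors), "warnings:", [flag.value for flag in warnings])
```

Moving a flag such as bond_order_check from warning_flags to error_flags would, under this reading, turn the warning into a failure.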
ConformersResult
Representation of results of the conformers workflow.
Fields
All of the fields in ConformersInput and the following:
conformers
- The list of conformers generated by the workflow.
- Type: List[Molecule]
energies
- The energy, in Hartree, of each conformer generated by the workflow. The order of the energies matches the order of conformers.
- Type: List[float]
radii_of_gyration
- The radius of gyration of each conformer generated by the workflow. The order of the radii matches the order of conformers.
- Type: List[float]
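For reference, the radius of gyration measures how spread out the atoms are around the center of the molecule. The following is a minimal sketch of the standard mass-weighted definition; whether the workflow uses mass weighting and which length unit it reports are not stated here, so both are assumptions, and `radius_of_gyration` is a hypothetical helper rather than a Sierra function.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Return sqrt(sum(m_i * |r_i - r_center|^2) / sum(m_i)).

    With masses=None every atom gets unit weight (purely geometric variant).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Mass-weighted center of the geometry.
    center = [
        sum(m * atom[axis] for m, atom in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from that center.
    weighted_sq = sum(
        m * sum((atom[axis] - center[axis]) ** 2 for axis in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Two equal-weight points 2 units apart sit 1 unit from their center.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

Extended conformers give larger values than folded ones, so the radii can help distinguish compact from stretched geometries of similar energy.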
warnings
- A list of warnings generated while evaluating the conformer generation workflow.
- Type: List[List[ConformerFlags]]