# Molecule

A Molecule defines a collection of atoms by their geometry (in Bohr), atomic numbers, charge, and multiplicity. It can be created from a variety of sources and data formats.

The Molecule building block is a critical input (and often output) of most workflows.

## Creating a new Molecule

A common method of Molecule creation is to directly set the fields:

```python
import sierra
from sierra.inputs import *

# Build a molecule from raw data; note that the distances are in Bohr
he2 = Molecule(atomic_numbers=[2, 2], geometry=[0, 0, 0, 0, 0, 5])
print(he2)
#> Molecule(formula='He2', eoi='c8de2a8')
print(he2.measure([0, 1]))
#> 5.0
```


Here, the charge and multiplicity are left unset: the charge defaults to 0, and the multiplicity resolves to 1, the lowest value for an even number of electrons.
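
To make the flat `geometry` list concrete: six numbers describe two atoms with three Cartesian coordinates each. The sketch below uses only the Python standard library and does not call Sierra. It groups the flat list into one row per atom and computes the distance between atoms 0 and 1, which is presumably what `measure([0, 1])` reports for a pair of indices. The helper names `to_rows` and `distance` are hypothetical and exist only for this illustration.

```python
import math


def to_rows(flat, width=3):
    """Group a flat coordinate list into (x, y, z) rows, one per atom."""
    if len(flat) % width != 0:
        raise ValueError("geometry length must be a multiple of 3")
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]


def distance(rows, i, j):
    """Euclidean distance between atoms i and j, in the units of the input."""
    return math.dist(rows[i], rows[j])


# Same numbers as the He2 example: two atoms, 5 Bohr apart along z
rows = to_rows([0, 0, 0, 0, 0, 5])
print(rows)                  # [(0, 0, 0), (0, 0, 5)]
print(distance(rows, 0, 1))  # 5.0
```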

Element symbols can also be used in place of atomic numbers for initialization:

```python
from sierra.inputs import *

# Build a molecule from symbols; note that the distances are in Bohr
he2 = Molecule(symbols=["He", "He"], geometry=[0, 0, 0, 0, 0, 5])
print(he2)
#> Molecule(formula='He2', eoi='c8de2a8')
```


## Importing common file formats

It is also common to construct a Molecule from SDF, XYZ or XYZ+ text. These formats specify positions in Angstrom, which Molecule converts to Bohr before storing them in the geometry field.
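
As a rough illustration of that unit conversion, the sketch below parses simple `symbol x y z` lines given in Angstrom and converts the coordinates to Bohr. It is not Sierra's parser: the function names are hypothetical, and the conversion factor is the CODATA 2018 Bohr radius, which may differ in the last digits from whatever constant Sierra uses internally.

```python
# CODATA 2018 Bohr radius expressed in Angstrom (assumed constant, see above)
BOHR_IN_ANGSTROM = 0.529177210903


def angstrom_to_bohr(value):
    """Convert a length from Angstrom to Bohr."""
    return value / BOHR_IN_ANGSTROM


def parse_xyz_body(text):
    """Parse 'symbol x y z' lines (Angstrom) into symbols and Bohr coordinates."""
    symbols, geometry = [], []
    for line in text.strip().splitlines():
        symbol, x, y, z = line.split()
        symbols.append(symbol)
        geometry.append([angstrom_to_bohr(float(c)) for c in (x, y, z)])
    return symbols, geometry


symbols, geometry = parse_xyz_body("""
O 0 0 0
H 0 0 1
H 0 1 0
""")
print(symbols)         # ['O', 'H', 'H']
print(geometry[1][2])  # 1 Angstrom in Bohr, approximately 1.8897
```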

A compatible file can be loaded with the file field:

```python
from sierra.inputs import *

# Build a molecule from a SDF, XYZ or XYZ+ file
water = Molecule(file="examples/atoms.xyz")
```


or the file content can be passed to the data field:

```python
from sierra.inputs import *

# Build a molecule from SDF, XYZ or XYZ+ contents
# Note the distances are in Angstrom
water = Molecule(
    data="""
O 0 0 0
H 0 0 1
H 0 1 0
"""
)
print(water)
#> Molecule(formula='H2O', eoi='b7a2e71')
```


## Generating from a SMILES string

A Molecule can be generated directly from a SMILES string:

```python
from sierra.inputs import *

butane = Molecule(smiles="CCCC")
print(butane)
#> Molecule(formula='C4H10', eoi='5c33557')
```


Here, our internal conformer tools generate a structure from the SMILES string. This implementation prioritizes speed, producing a reasonable structure quickly rather than performing a rigorous conformational search. Use the Conformer workflow for full control over geometry generation.

## Importing from PubChem

A convenient way to create a Molecule is through the PubChem interface. The pubchem attribute searches PubChem for the best common-name match and generates a Molecule from the result.

```python
from sierra.inputs import *

caffeine = Molecule(pubchem="caffeine")
print(caffeine)
#> Molecule(formula='C8H10N4O2', eoi='328969a')
```


Warning: The pubchem interface sends data to PubChem servers and should not be used for proprietary material. This is the only operation in Sierra that reaches an outside server; all other calls, including the Conformer workflow, run locally.

## Exporting a Molecule

Molecule objects can easily be exported to a file or a string variable in XYZ+ format:

```python
from pathlib import Path
from sierra.inputs import *

mol = Molecule(pubchem="caffeine")

# Write a molecule as XYZ+ format
xyz_text = mol.write()
print(xyz_text)
"""
24
0 1
O     0.470000000755   2.568800000562   0.000600002289
O    -3.127099997715  -0.443600000591  -0.000300001144
N    -0.968599997718  -1.312500000756   0.000000000000
N     2.218200000188   0.141200000637  -0.000300001144
N    -1.347700000975   1.079700001715  -0.000099998618
N     1.411900002456  -1.937200000728   0.000200002527
C     0.857899998774   0.259199999205  -0.000799999524
C     0.389699999594  -1.026399997716  -0.000399999762
C     0.030699998928   1.422000000414  -0.000600002289
C    -1.906100002038  -0.249500001009  -0.000399999762
C     2.503200002560  -1.199799997711   0.000300001144
C    -1.427600002373  -2.695999998946   0.000799999524
C     3.192600002392   1.206100000576   0.000300001144
C    -2.296900002300   2.188099998257   0.000700000906
H     3.516299998930  -1.578699998441   0.000799999524
H    -1.045099998494  -3.197300000918  -0.893700001282
H    -2.518600001332  -2.759599998139   0.001100000668
H    -1.044699998732  -3.196299998867   0.895700000091
H     4.199199998662   0.780100000095   0.000200002527
H     3.046800001847   1.809199997527  -0.899199999331
H     3.046599999320   1.808299999386   0.900399998617
H    -1.808699999148   3.165100001559  -0.000300001144
H    -2.932199998609   2.102699998809   0.888099999323
H    -2.934600002473   2.102100001812  -0.884900001227
"""

# Write to a file
mol.write(filename=Path("caffeine.xyz+"))
```

The comment line will contain the charge and multiplicity, as per the XYZ+ standard.
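
To show how that header can be read back, here is a minimal sketch that extracts the atom count, charge and multiplicity from XYZ+ text. The layout is inferred from the caffeine output in this section (atom count on the first line, `charge multiplicity` on the second); the function name `parse_xyz_plus_header` is hypothetical and is not part of Sierra.

```python
def parse_xyz_plus_header(text):
    """Return (atom_count, charge, multiplicity) from XYZ+ text.

    Assumes line 1 holds the atom count and line 2 holds
    'charge multiplicity', followed by one coordinate line per atom.
    """
    lines = text.strip().splitlines()
    atom_count = int(lines[0])
    charge, multiplicity = (int(token) for token in lines[1].split())
    if len(lines) - 2 != atom_count:
        raise ValueError("atom count does not match the number of coordinate lines")
    return atom_count, charge, multiplicity


# Two helium atoms about 5 Bohr apart; positions in XYZ+ text are in Angstrom
sample = """
2
0 1
He 0.0 0.0 0.0
He 0.0 0.0 2.6459
"""
print(parse_xyz_plus_header(sample))  # (2, 0, 1)
```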

## Fields

### atomic_numbers

The (n,) array of atomic numbers of the atoms.

- Type: Array[int]
- Additional Details: shape: (-1,)

### charge

The overall charge of the atoms.

- Type: int
- Default: 0

### geometry

The (n, 3) array of coordinates of the atoms in units of Bohr.

- Type: Array[float]
- Additional Details: shape: (-1, 3)

### masses

The (n,) array of masses of the atoms.

- Type: Array[float]
- Additional Details: shape: (-1,)

### multiplicity

A value of None refers to the lowest multiplicity given the electron number parity.

- Type: int
- Default: None
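
One plausible reading of the parity rule for `multiplicity` is sketched below: count the electrons as the sum of the atomic numbers minus the charge, then pick a singlet (1) for an even count and a doublet (2) for an odd count. This is an assumption about how a value of None is resolved, not Sierra's actual code, and `lowest_multiplicity` is a hypothetical name.

```python
def lowest_multiplicity(atomic_numbers, charge=0):
    """Lowest spin multiplicity allowed by the electron-count parity."""
    electrons = sum(atomic_numbers) - charge
    if electrons < 0:
        raise ValueError("charge removes more electrons than the atoms have")
    # Even electron count -> singlet (1); odd electron count -> doublet (2)
    return 1 if electrons % 2 == 0 else 2


print(lowest_multiplicity([2, 2]))               # He2, 4 electrons: 1
print(lowest_multiplicity([8, 1, 1], charge=1))  # H2O+, 9 electrons: 2
```

The lowest multiplicity is not always the ground state (O2, for example, has a triplet ground state), so set the field explicitly when a different spin state is intended.
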

### symbols

The (n,) array of symbols of the atoms.

- Type: Array[str]
- Additional Details: shape: (-1,)