Create topology and coordinates for benzene molecule
For the molecular system that contains a mixture of small molecules it is convenient to have topology and coordinate files for each component. This way the system coordinates can be created by a third-party software (e.g. Packmol or VMD) and the topology file will just list the molecules and their quantities, not the interatomic connectivity in the molecules. We will use an example of benzene to explain the procedure of creating a pair of coordinates and topology files for a single molecule that can be later used when the molecule is added to the mixture of many molecules.
The coordinates
We will write basic code to synthesize the coordinates according to the atomic parameters. Another approach would be to find the coordinates on the internet, as a stand-alone file or part of another molecular system. The later for benzene may be, e.g. a part of phenylalanine side-chain form any protein that contains this amino-acid. Note that the the carbon, that connects the side-chain to the backbone of the protein will have to be substituted with a hydrogen. Also, if the system is synthesized with the correct force-field parameters, there is no need to minimize the energy (but we will do it to show the complete procedure).
Let us use CHARMM36 force-field for the reference.
We assume that you have the force-field files in your local folder, see CHARMM36 section for details.
We need cgenff.rtp
residue topology file, where we can find benzene molecule description (look for BENZ
residue):
[ BENZ ] ; C6H6, benzene, adm jr. [ atoms ] CG CG2R61 -0.1150 1 HG HGR61 0.1150 1 CD1 CG2R61 -0.1150 1 HD1 HGR61 0.1150 1 CD2 CG2R61 -0.1150 1 HD2 HGR61 0.1150 1 CE1 CG2R61 -0.1150 1 HE1 HGR61 0.1150 1 CE2 CG2R61 -0.1150 1 HE2 HGR61 0.1150 1 CZ CG2R61 -0.1150 1 HZ HGR61 0.1150 1 [ bonds ] CD1 CG CD2 CG CE1 CD1 CE2 CD2 CZ CE1 CZ CE2 CG HG CD1 HD1 CD2 HD2 CE1 HE1 CE2 HE2 CZ HZ
Here, we have two sections that describe residue atoms and bonds respectively.
For atoms, the atom name, type, partial charge and charge group are listed.
Each bond is a pair of atom names that should be connected.
Note that angles and dihedrals will be generated by GROMACS according to what is in the header to residue topology file, but we will discuss this later.
All atoms in the residue has unique name, which is used to identify the atom when bonds are defined.
See how CG
atom is connected to CD1
and CD2
carbons and to HG
hydrogen.
Exercise
Using the residue description of benzene molecule, try to draw it on a piece of paper. Write the unique names of the atoms from the file next to them.
Now let us try to see how the parameters for the interactions are determined.
This will also help us to get the equilibrium parameters, needed to create atomic coordinates.
We see that all the carbons have type CG2R61
, hydrogens are of type HGR61
.
These atom types are what defines the parameters for bonded and non-bonded interactions for these atoms.
To see how, open ffbonded.itp
file in the force-field.
The bond and angle parameters are in the following lines of the ffbonded.itp
file (we will skip dihedrals and non-bonded parameters for now):
... CG2R61 CG2R61 1 0.13750000 255224.00 ; PROT benzene, JES 8/25/89 ... CG2R61 HGR61 1 0.10800000 284512.00 ; PROT phe,tyr JES 8/25/89 ... CG2R61 CG2R61 CG2R61 5 120.000000 334.720000 0.24162000 29288.00 ; PROT JES 8/25/89 ... CG2R61 CG2R61 HGR61 5 120.000000 251.040000 0.21525000 18409.60 ; PROT JES 8/25/89 benzene ...
As one can see from the comments, these atom types are found in phenylalanine and tyrosine amino-acids as well as in benzene molecule. The equilibrium bond distances are 0.1375 nm for bond between carbons and 0.108 for carbon-hydrogen bond. The equilibrium angle is 120 degrees for both C-C-C and C-C-H angles.
So, to make a coordinate file, we need to create a regular hexagon with carbons in its corners and hydrogens sticking out. Each side of the hexagon is a=0.1375 nm and hydrogens are farther b=0.108 nm away from the corners.
If placed in the x-y plane so that the center of hexagon is (0,0,0) and the longest diagonal along x-axis, the coordinates of the vertices are (clockwise):
\[ \begin{align}\begin{aligned}(a, 0, 0),\\(a/2, -a\cdot\cos(30), 0),\\(-a/2, -a\cdot\cos(30), 0),\\(-a, 0, 0),\\(-a/2, a\cdot\cos(30), 0),\\(a/2, a\cdot\cos(30), 0)\end{aligned}\end{align} \]
The coordinates of the respected hydrogens are in the same layout, but (a+b) away from the center, i.e.:
\[ \begin{align}\begin{aligned}(a+b, 0, 0),\\((a+b)/2, -(a+b)\cdot\cos(30), 0),\\(-(a+b)/2, -(a+b)\cdot\cos(30), 0),\\(-(a+b), 0, 0),\\(-(a+b)/2, (a+b)\cdot\cos(30), 0),\\((a+b)/2, (a+b)\cdot\cos(30), 0)\end{aligned}\end{align} \]
We also need to synchronize the coordinates with the topology: molecular simulation software needs to know which atom is connected to which.
This is described in the [ bonds ]
section of the topology: atom CD1
is connected to atom CG
, CD2
to CG
as well, and so on.
Hence, if we want to go around the carbon ring, starting from the CG
atom, the atom order will be CG
, CD1
, CE1
, CZ
, CE2
, CD2
.
This notation is inherited from proteins, where atoms are named according to their separation from the C-alpha (CA
) atom: C-beta (CB
), C-gamma (CG
), C-delta (CD
) and so on.
If the chain separates, atom names receive a number (e.g. CD1
and CD2
).
This does not make much sense in case of benzene molecule, but this does not affect us in the future so we can leave it as it is.
Once we decide which coordinates correspond to which atom name, we can create the coordinates file in any format that is suitable for the software. For instance, we can use GROMACS .gro format.
Exercise
Write a basic code that generates coordinates for benzene and saves them in
.gro
format.
Molecular topology
Once the coordinates are ready, the topology file can be created. If the atoms and residue are named according to the force-field, the basic topology can be created using pdb2gmx utility from GROMACS:
GMX=/usr/local/gromacs/bin/gmx SYSTEM_NAME=C6H6 $GMX pdb2gmx -f ${SYSTEM_NAME}.gro -o ${SYSTEM_NAME}.gro -p ${SYSTEM_NAME}.top -ff charmm36 -water tip3p
We set and use some variable to make it easier to copy-paste these commands later.
Here, we take the .gro file we created and pass it to the pdb2gmx.
We use CHARMM36 force-field and tip3p water model (if we don’t specify these options, they will be asked interactively).
Note that CHARMM36 is not supplied with GROMACS by default, so it has to be installed or you need to have force-field files (i.e. charmm36.ff
folder) in your local folder.
As an output we have corrected .gro file (if the correction is needed, e.g. hydrogen were added) and topology file in .top format.
We opted to overwrite the .gro file here, since not much is updated there — only the periodic boundary description is added.
Don’t worry if you wanted to compare the files — GROMACS makes back-ups it case it overwrites (see the log file for the backup name).
Before we modify the topology to make it more useful in the future, we can finalize our coordinates by running energy minimization simulations. Strictly speaking, this step is not required because we created the coordinates using force-field parameters. Hence the structure should be already in its energy minima. But it is useful to do the energy minimization to fix any imperfections in the structure and to make sure that the coordinates are useable.
The following script performs the energy minimization in vacuo for the molecular system. Assuming that you have this minimization .mdp file
and your coordinates are saved as C6H6.gro
:
SYSTEM_NAME=C6H6 $GMX editconf -f ${SYSTEM_NAME}.gro -o ${SYSTEM_NAME}.gro -d 0.1 $GMX editconf -f ${SYSTEM_NAME}.gro -o ${SYSTEM_NAME}.gro -box 100 100 100 -noc $GMX grompp -f em_vac.mdp -c ${SYSTEM_NAME}.gro -o em_vac.tpr $GMX mdrun -deffnm em_vac mkdir hydrocarbons cp ${SYSTEM_NAME}_em.gro hydrocarbons/${SYSTEM_NAME}.gro
Note that this will overwrite your .gro
file, but don’t worry — GROMACS will make a backup, surrounded by #
signs.
We first run the editconf
utility twice to set the periodic boundary correctly.
The first run moves the molecule so that all the coordinates are positive.
The -d 0.1
option means, that the box will be no closer than 0.1 nm from the molecule.
By default, the editconf
shifts the coordinates to the center of the box, which is in the positive quadrant.
The second run of editconf
changes the periodic box definition to a cube with 100 nm side.
This is an arbitrary large number to make sure that PBC don’t affect the system — modern versions of GROMACS can only run with periodic boundary.
Note that we use -noc
option here to leave the molecule where it is instead of moving it to the center of the box.
After coordinates are prepared, we configure GROMACS with grompp
utility.
This takes the .mdp file that describes the simulation protocol (see The energy minimization .mdp file section below), topology and coordinate files.
grompp
creates a portable .tpr
file, which contains all the data for the molecular dynamics simulation run.
This file is used with gmx mdrun
, which is the main simulation engine.
The -deffnm
option means default name, i.e. the name of the files for input/output that will be different only by extension.
Hence, the resulting (energy minimized) structure will be saved as ${SYSTEM_NAME}_em.gro
, which are the coordinates that we are going to save for future use.
Some software packages may need a .pdb
file instead of .gro
.
To convert, one can use editconf
utility from GROMACS:
$GMX editconf -f ${SYSTEM_NAME}_em.gro -o ${SYSTEM_NAME}.pdb cp ${SYSTEM_NAME}.pdb hydrocarbons/${SYSTEM_NAME}.pdb
Now, the hydrocarbons
folder should have both .gro
and .pdb
files.
The energy minimization .mdp file
Above, we ran something called energy minimization in vacuum.
This is not something that is done by default, we explicitly asked molecular simulation software to do it.
This is specified in the configuration or molecular dynamics parameters (.mdp
) file (see full description here).
We used the following parameters in our run:
title = enrgy minimisation ; Parameters describing what to do, when to stop and what to save integrator = steep ; Algorithm (steep = steepest descent minimization) emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm emstep = 0.01 ; Minimization step size nstenergy = 500 ; save energies every 1.0 ps, so we can observe if we are successful nstxout-compressed = 500 ; for writing coords (x) nsteps = -1 ; run as long as we need ; Settings that make sure we run with parameters in harmony with the selected force-field constraints = none ; no constraints rcoulomb = 10 ; short-range electrostatic cutoff (in nm) rvdw = 10 ; short-range van der Waals cutoff (in nm) coulombtype = Cut-Off ; Cutoff electrostatics with large radii rlist = 20
The main parameter here is integrator
, which is set to steep
or steepest descent energy minimization algorithm.
Also note that we set all the cut-offs to an arbitrary large value, which is only allowed since our simulation box is large.
This is to minimize the effect of switching the interaction potential off near the cut-off radius.
Creating molecule topology file for future use
Current .top
file does two things at the same time: it describes the composition of the entire molecular system and topology of a single benzene molecule.
This is not surprising because our system is a singular benzene molecule.
However, this is not convenient if we want to re-use the molecular topology description part of the file when benzene molecule(s) is(are) part of another molecular system.
What we can do in this case is we can create a separate Include ToPology or .itp
file and copy the topology description there.
Let us have a look at the generated .top
file:
... ; Include forcefield parameters #include "charmm36.ff/forcefield.itp" [ moleculetype ] ; Name nrexcl Other 3 [ atoms ] ; nr type resnr residue atom cgnr charge mass typeB chargeB massB ; residue 1 BENZ rtp BENZ q 0.0 1 CG2R61 1 BENZ CG 1 -0.115 12.011 2 HGR61 1 BENZ HG 1 0.115 1.008 3 CG2R61 1 BENZ CD1 1 -0.115 12.011 4 HGR61 1 BENZ HD1 1 0.115 1.008 5 CG2R61 1 BENZ CD2 1 -0.115 12.011 6 HGR61 1 BENZ HD2 1 0.115 1.008 7 CG2R61 1 BENZ CE1 1 -0.115 12.011 8 HGR61 1 BENZ HE1 1 0.115 1.008 9 CG2R61 1 BENZ CE2 1 -0.115 12.011 10 HGR61 1 BENZ HE2 1 0.115 1.008 11 CG2R61 1 BENZ CZ 1 -0.115 12.011 12 HGR61 1 BENZ HZ 1 0.115 1.008 ; qtot 0 [ bonds ] ; ai aj funct c0 c1 c2 c3 1 2 1 1 3 1 ... 9 11 1 11 12 1 [ pairs ] ; ai aj funct c0 c1 c2 c3 1 8 1 1 10 1 ... 8 12 1 10 12 1 [ angles ] ; ai aj ak funct c0 c1 c2 c3 2 1 3 5 2 1 5 5 ... 7 11 12 5 9 11 12 5 [ dihedrals ] ; ai aj ak al funct c0 c1 c2 c3 c4 c5 2 1 3 4 9 2 1 3 7 9 ... 10 9 11 7 9 10 9 11 12 9 ... [ system ] ; Name Benzene [ molecules ] ; Compound #mols Other 1
Here, dots represent parts of the file that were omitted for shortness.
After the header, the description of the molecule starts in [ moleculetype ]
section.
There are two parameters here: name of the newly created molecule and number of exclusions in it.
By default, the molecule is called Other
.
This is followed by the list of atoms with their respective types, residue number, residue name, charge and mass.
Note that we are adding mass here, which is defined by the atom type.
Exercise
In the force-filed files, find where is mass is defined for our atom types.
The following sections describe connectivity between atoms, including bonds, angles and dihedral angles.
Note that now atoms are referred by their unique index, not by their name.
This is because each molecule can have several residues and hence names will not be unique any more.
The angles and dihedral angles were not listed in the residue description above, they were added by GROMACS pdb2gmx
utility according to the rules in the beginning of the residue database file.
We are skipping the position restrain section here for the clarity of the lesson. This section is normally used if one wants to constrain some (e.g. heavy) atoms in their simulations, which we will not be doing on this stage.There are also some include directive for ions and water molecule, which are self-explanatory.
The topology is finalized by [ system ]
and [ molecules ]
sections, which describe what molecules we have in our system.
Here, we only have one benzene molecule, which was given a default name Other
here.
The part we need to keep for the molecule description starts with [ moleculetype ]
and ends after [ dihedrals ]
section of the topology file.
We also want to change the name Other
to avoid duplicating names in future.
The described procedures can be performed with the following script:
cp ${SYSTEM_NAME}.top ${SYSTEM_NAME}.itp sed -i -n '/\[ moleculetype \]/,$p' ${SYSTEM_NAME}.itp sed -i '/; Include Position restraint file/,$d' ${SYSTEM_NAME}.itp sed -i "s/Other/${SYSTEM_NAME}/g" ${SYSTEM_NAME}.itp cp ${SYSTEM_NAME}.itp hydrocarbons/${SYSTEM_NAME}.itp
This will create a molecular topology or include topology C6H6.itp
file in hydrocarbons
folder.
Now, the system topology can just include the molecular topology.
For the system of a singular benzene, this topology will be:
; Include forcefield parameters #include "charmm36.ff/forcefield.itp" #include "hydrocarbons/C6H6.itp" [ system ] ; Name C6H6 [ molecules ] ; Compound #mols C6H6 1Exercise
Create
.gro
and.top
files for a system of two benzene molecules, where one of the molecules is shifted by 0.5 nm from the other in z-direction (perpendicular to the benzene plane).
Conclusions
Creating topologies and coordinates for all the molecules of the mixture allow to re-use them in the future, thus simplifying the process of simulating the arbitrary mixture significantly. The coordinates file can be constructed using pre-generated coordinate files for the molecules (e.g. using Packmol software) and the topologies of molecules can be included into the system topology file. These two files should be kept in-sync though, with the number of molecules and their order in coordinates file strictly correspondent to the number of molecules and their order in the system topology.