Linear hydrocarbons (alkanes)
The aim of this section is to learn how to work with the residue topology (.rtp) files. Proteins, DNA and RNA are heteropolymers, containing different monomers connected to each other the same way. For example, proteins are all made out of 20 amino-acids, connected via polypeptide bonds. This way nature saves information and simplifies the production line. It only needs to remember what sequence of amino-acid to assemble, which is much less data than the entire graph of covalent bonds in the protein. And it requires machinery to produce all twenty amino-acids and to link them together. By doing that, incredible variety of protein molecules can be produces, with very different properties and capable of wide variety of function. Producing correct chemical formula for each amino-acid and linking amino-acids together in the correct order is all that is needed. Molecular dynamics software uses this in slightly different way to simplify the way molecular topology is built. Biomolecular force-field has a set of residue topologies, which describe the internal topology of each residue (be it amino-acid, nucleic acid of any other monomer odd to be part of a (hetero)polymer). These residue topologies also contain information on how to connect monomers together when the (hetero)polymer is built. When software is supplied with the coordinate file in which all residues are properly named and numbered, it draws them from the force-field in a succession and connects them into the polymer, thus building its topology. In this section we show how this mechanism can be explored for linear alkanes — molecules that connect a set of CH2 and CH3 groups into a linear chain.
Coordinates
Alkanes are chains of carbon atoms saturated with hydrogens.
Let us first look at the butane topology file to extract all the necessary parameters for the geometry of alkanes.
To get an idea on what atom types and parameters to use, let us look at ethers.rtp
file.
The very first residue there is BUTA
(butane), which should be a good template for the rest of the alkanes.
[ BUTA ] ; butane [ atoms ] H11 HCA3A 0.0900 1 H12 HCA3A 0.0900 1 H13 HCA3A 0.0900 1 C1 CC33A -0.2700 1 H21 HCA2A 0.0900 2 H22 HCA2A 0.0900 2 C2 CC32A -0.1800 2 H31 HCA2A 0.0900 3 H32 HCA2A 0.0900 3 C3 CC32A -0.1800 3 H41 HCA3A 0.0900 4 H42 HCA3A 0.0900 4 H43 HCA3A 0.0900 4 C4 CC33A -0.2700 4 [ bonds ] H11 C1 H12 C1 H13 C1 C1 C2 H21 C2 H22 C2 C2 C3 H31 C3 H32 C3 C3 C4 H41 C4 H42 C4 H43 C4
Let us start with carbon atoms.
The first and last carbons are of type CC33A
and have a partial charge -0.27.
This type is a carbon, that is covalently linked to one other carbon and three hydrogens.
The rest of carbons have type CC32A
and partial charge of -0.18.
These are linked to two other carbons in the chain and have two hydrogens attached to them.
The hydrogens that are connected to the first and last carbon are of type HCA3A
, the rest are of type HCA2A
.
All hydrogens have partial charge of 0.9, which makes the total charge of the molecule zero.
Let us check the bonded parameters for these atom types in ffbonded.itp
file.
We are interested in equilibrium bond distances and angles, since we are going to use these parameters when constructing the coordinates of the molecule.
Corresponding lines for bond distances in ffbonded.itp
file are:
[ bondtypes ] ... ; i j func b0 kb ... CC32A HCA2A 1 0.11110000 258571.20 ; alkanes, 4/98 CC33A HCA3A 1 0.11110000 269449.60 ; alkanes, 4/98 ... CC32A CC32A 1 0.15300000 186188.00 ; alkanes, 3/92 CC32A CC33A 1 0.15280000 186188.00 ; alkanes, 3/92 CC33A CC33A 1 0.15300000 186188.00 ; alkanes, 3/92 ...
Each line here contains parameters for one type of bond.
The later is defined by the atom types that participate in this bond, which are in the first two columns.
These are followed by the function type (1 for harmonic bond), equilibrium distance (b0
) and spring constant (kb
).
Most of the entries also have a comment which indicates where the parameters are taken from.
Here, we see that the equilibrium distances for C-H bonds are the same for two cases we have: 0.1111 nm. There is a slight variation in C-C bond distances, with bonds at the terminals of the molecule slightly lower in length. For simplicity, we are going to use the same value of 0.153 nm for all of these and will let the energy minimization algorithm to fix this small inconsistency.
The parameters for equilibrium bond angles can be taken from the following lines in ffbonded.itp
file:
[ angletypes ] ... ; i j k func theta0 ktheta ub0 kub ... HCA2A CC32A CC32A 5 110.100000 221.752000 0.21790000 18853.10 ; alkane, 4/98 HCA2A CC32A CC33A 5 110.100000 289.532800 0.21790000 18853.10 ; alkane, 4/98 ... HCA3A CC33A CC33A 5 110.100000 313.800000 0.21790000 18853.10 ; alkane, 4/98 HCA2A CC32A HCA2A 5 109.000000 297.064000 0.18020000 4518.72 ; alkane, 3/92 HCA3A CC33A HCA3A 5 108.400000 297.064000 0.18020000 4518.72 ; alkane, 3/92 ... CC32A CC32A CC32A 5 113.600000 488.272800 0.25610000 9338.69 ; alkane, 3/92 ... CC32A CC32A CC33A 5 115.000000 485.344000 0.25610000 6694.40 ; alkane, 3/92
The first three rows here are the respective atom types, followed by the function id (5 is for urey-bradley angle with both harmonic angle and harmonic distance contribution).
Urey-Bradley angle has three parameters: the equilibrium angle (theta0
), angle spring constant (ktheta
), equilibrium distance between first and third atom (ub0
) and distance spring constant (kub
).
There is slightly more variability in angles for different atom types. But the angles are close enough to rely on the energy minimization to fix inconsistencies if there will be any. We also need to satisfy Urey-Bradley distances, but for simplicity we are going to leave that to energy minimization algorithm as well.
The dihedral angles for our carbon atom types are all having the equilibrium dihedral angles of 0 or 180 degrees, which indicates that the carbon backbone structure is planar. Hence, we can build it in one plane, leaving e.g. z coordinates zero. The hydrogens are not in plane. The backbone hydrogens are sticking equidistantly from the plane while being apart from respective carbons in plane. The terminal hydrogens should form a tetrahedral with the first (last) carbon and three hydrogens in the corners.
[ dihedraltypes ] ... ; i j k l func phi0 kphi mult ... CC32A CC32A CC32A CC32A 9 0.000000 0.470742 5 ; alkane, c27r klauda et al 2004 CC32A CC32A CC32A CC32A 9 0.000000 0.395723 4 ; alkane, c27r klauda et al.2004 CC32A CC32A CC32A CC32A 9 180.000000 0.626554 3 ; alkane, c27r klauda et al 2004 CC32A CC32A CC32A CC32A 9 0.000000 0.269868 2 ; alkane, c27r klauda et al 2004
Let us start with carbon atoms, that form a sawtooth-like structure. We are going to place the first carbon in (0,0,0). If the x is the general direction of the chain, each next carbon is going to be \(r_{CC}\times\sin(\alpha_{CCC})\) further away from the starting point. The y coordinates will be \(r_{CC}\times\cos(\alpha_{CCC})\) for the odd atoms and zero for the even atoms, forming a sawtooth-like structure. Here, \(r_{CC}\) is the equilibrium length of the covalent bond between two carbons, \(\alpha_{CCC}\) is the equilibrium angle between two such bonds.
The two hydrogens that are connected to the carbon in chain are \(\Delta y=r_{CH}\times\cos(\alpha_{HCH})\) further away from the backbone in plane and are sticking out by \(\Delta z=r_{CH}\sin(\alpha_{HCH})\) out of plane (in z direction). Note that due to the geometry of the backbone, \(\Delta y\) should be added to the y coordinate of the respective carbon for odd carbons and subtracted for the even carbons. The \(\Delta z\) should also be added and subtracted from the z coordinates of the respective carbon for the two connected hydrogens.
The terminal carbons are connected to three hydrogens, with four atoms forming a tetrahedron. To simplify the geometry, we are going to assume that the axis of this tetrahedron is along the x (i.e. along the main axis of the alkane, not along the first C-C covalent bond). In this case, the x coordinate of all hydrogens should be a height of the tetrahedron away from the carbon atom, i.e. \(x = r_{CH}\sqrt{1-\frac{4}{3}\sin^2(\alpha_{HCH}/2)}\). If the vertex of tetrahedron is on the same z-plane as the carbon atom, its y coordinate differ by \(2r_{CH}\sin(\alpha_{HCH}/2)\), otherwise by \(r_{CH}\sin(\alpha_{HCH}/2)\). In the later case, difference on z axis is \(2r_{CH}\cos(\alpha_{HCH}/2)\). Here, \(r_{CH}\) is the equilibrium length of the covalent bond between carbon and hydrogen, \(\alpha_{HCH}\) is the equilibrium angle between teo C-H bonds.
One other special case is a methane molecule, in which case the hydrogens are in the vertices of a regular tetrahedron.
Now we can create coordinates for alkane chain of an arbitrary length, let us build topologies for these molecules.
Topology
We already looked at the topology of the butane molecule.
Making topologies for the rest of alkanes should be then simple using analogy.
In fact, there are some more in the ethers.itp
file: ethane, octane. decane, etc.
However, we can take it a notch further, by using the same mechanism that GROMACS uses for proteins.
Proteins are sequences of amino acid residues, linked covalently.
Since each residue is one of 20 standard amino-acids, there are \(20^{100}\) variants to construct a 100-residue-long protein.
Obviously, it is not possible to keep topology for all of them.
Instead, each residue has special entries in their bond listings on how to connect the residue to the next/previous one, marked with +/- signs.
Similarly, we can construct alkanes out of residues: CH3 and CH2 groups.
For these, we can also add the intra-residue bonds, so that they will all be connected in the chain.
To do this, add an alkanes.rtp
file to the force-field folder and add following lines into it:
; Methylene group in the middle of an alkane [ CH2 ] [ atoms ] C CC32A -0.1800 1 H1 HCA2A 0.0900 1 H2 HCA2A 0.0900 1 [ bonds ] C H1 C H2 -C C C +C; Terminal methyl group for an alkane [ CH3 ] [ atoms ] C CC33A -0.2700 1 H1 HCA3A 0.0900 1 H2 HCA3A 0.0900 1 H3 HCA3A 0.0900 1 [ bonds ] C H1 C H2 C H3 -C C
Here we have residue CH2
, containing of three atoms: one carbon (C
) and two hydrogens (H1
and H2
).
The atom types are listed in the second column of the atoms
section, followed by the atom charges.
These are taken from the butane topology (see above).
The last column is the charge group: the total charge of a group should be integer number of electron charges (zero in our case).
The atom names should be unique, since they define the connectivity within and between residues.
As it is follows from the bonds
section, the carbon atom is connected to both hydrogens and to carbon atom of residue before (-C
) and after (+C
).
It is very important that the atom names used here are matched with those we specify in the coordinate files.
Also, it is important the the residues are named and numbered properly there.
Here is an example of such file for butane in .gro
format:
C4H10 14 1CH3 C 1 0.129 0.169 0.182 1CH3 H1 2 0.060 0.254 0.182 1CH3 H2 3 0.108 0.108 0.273 1CH3 H3 4 0.108 0.108 0.091 2CH2 C 5 0.263 0.241 0.182 2CH2 H1 6 0.266 0.307 0.272 2CH2 H2 7 0.266 0.307 0.092 3CH2 C 8 0.389 0.157 0.182 3CH2 H1 9 0.387 0.091 0.273 3CH2 H2 10 0.387 0.091 0.091 4CH3 C 11 0.523 0.229 0.182 4CH3 H1 12 0.593 0.145 0.182 4CH3 H2 13 0.544 0.291 0.273 4CH3 H3 14 0.544 0.291 0.091
There are 4 residues in total, in the following order: CH3-CH2-CH2-CH3.
Note that these topologies above don’t have the angle and dihedral angle description, neither there is a type for the bonds.
These are constructed using the criteria that should be added to the beginning of the alkanes.rtp
file:
[ bondedtypes ] ; Col 1: Type of bond ; Col 2: Type of angles ; Col 3: Type of proper dihedrals ; Col 4: Type of improper dihedrals ; Col 5: Generate all dihedrals if 1, only heavy atoms of 0. ; Col 6: Number of excluded neighbors for nonbonded interactions ; Col 7: Generate 1,4 interactions between pairs of hydrogens if 1 ; Col 8: Remove propers over the same bond as an improper if it is 1 ; bonds angles dihedrals impropers all_dihedrals nrexcl HH14 RemoveDih 1 5 9 2 1 3 1 0
As it follows from the column-by-column comments, GROMACS will generate harmonic bonds (type 1), Urey-Bradley angles (type 5), multiple proper dihedrals (type 9), etc. (more information on bond type is available in GROMACS manual). Column 5 indicates that all dihedrals will be generated, which we will see once it is done.
Another corner-case to take care is the methane molecule, where 4 hydrogens are connected to the carbon atom. The residue description for this molecule is:
; Methane [ CH4 ] [ atoms ] C CC33A -0.3600 1 H1 HCA3A 0.0900 1 H2 HCA3A 0.0900 1 H3 HCA3A 0.0900 1 H4 HCA3A 0.0900 1 [ bonds ] C H1 C H2 C H3 C H4
Generate topology and run energy minimization
Once the coordinates and alkanes.rst
files are ready, we can repeat the procedure we used for the benzene molecule.
First, we create a full topology or the system using gmx pdb2gmx
utility, then modify it, splitting out the molecular topology.
Then we can run energy minimization in vacuum and save energy minimized coordinates in .gro
and .pdb
formats.
Later is useful for third-party software, e.g. for Packmol, if we are to create a mixture of compounds.
We can also do these procedures in bulk, if we create coordinates for alkanes of different length.
The following script rely on properly set environment variables to define path to GROMACS and name of the molecule.
# Run GROMACS to create top files for all the PDBs in the folder $GMX pdb2gmx -f ${name}.gro -o ${name}.gro -p ${name}.top -i ${name}_posre.itp -ff charmm36 -water tip3p # Create a copy of the topology that can be included cp ${name}.top ${name}.itp # Remove th1e header sed -i -n '/\[ moleculetype \]/,$p' ${name}.itp # Remove the footer sed -i '/; Include Position restraint file/,$d' ${name}.itp # Rename the molecule sed -i "s/Other/${name}/g" ${name}.itp # Combine topologies into one file # cat ${name}.itp >> alkanes.itp # Copy the topolgy to separate folder cp ${name}.itp hydrocarbons/${name}.itp # Minimize energy # Move molecule so that all coordinates are positive $GMX editconf -f ${name}.pdb -o ${name}.gro -d 0.1 # Create large simulation box (~no PBC simulations) $GMX editconf -f ${name}.gro -o ${name}.gro -box 100 100 100 -noc # Configure and run GROMACS $GMX grompp -f em.mdp -c ${name}.gro -p ${name}.top -o ${name}_em.tpr $GMX mdrun -deffnm ${name}_em # Rename the system in the resulting file sed -i "s/Protein/${name}/g" ${name}_em.gro # Convert GRO to PDB $GMX editconf -f ${name}_em.gro -o ${name}_em.pdb # Copy the resulting coordinates cp ${name}_em.gro hydrocarbons/${name}.gro cp ${name}_em.pdb hydrocarbons/${name}.pdb