Protocol Sheet: ACETK.BLDCHARMM.v1
Build a globular protein in the Charmm forcefield
Authors: Nathaniel Stanley, Ignasi Buch, Gianni De Fabritiis (acemd@googlegroups.com)
Copyrights 2010-2012, University Pompeu Fabra. All rights reserved.
Software requirements: VMD.1.8.7
Knowledge requirements: Use of VMD and psfgen
Input requirements: This protocol assumes that you are familiar basic VMD functions, such as the program menus, how to load files, and the Tk Console command line feature.
Disclaimer. This protocol is based on a best practice. In no way, we guarantee that it is optimal for your system. You use it at your own risk.
Suggested directory structure:
acetk.bldcharmm.v1
`-- build
|-- 1LKK.pdb
|-- build-functions.tcl
|-- build.vmd
|-- split.vmd
`-- top_all27_prot_na.inpSuggestion. After building use protocol ACETK.EQ to equilibrate your system.
The focus of this protocol is how to manually build a system in the CHARMM force field using VMD scripts. For how to build a system "automatically" using VMD's various GUI plugins, see the addendum at the bottom. While the automatic plugins are covenient, it is easy to have errors while using them, particularly for those who don't understand how they work behind the scenes. This protocol goes over many of the things one needs to consider when building a system.
STEP1: Visually analyze the structure and read the PDB
Before working with any system, it is crucial that you understand that Protein Data Bank (PDB) files or any other type of structure file are not intended to be plug-and-play inputs for simulations. While some are convenient, many others are not. Typical problems for files from the PDB include non-standard biological residues or small organic molecules that are not parameterized and therefore not ready to be effortlessly put into simulations. Additionally, there are other considerations such as missing segments (if part of the structure couldn't be crystallized/resolved), disulfide bonds, salt bridges, or duplicate atoms (in the event two positions were seen when the spectra was taken). In some cases, such as with metalloproteins, you may be precluded from doing any meaningful or trustworthy simulations using pure MM methods; the metallic centers are often involved in complex quantum interactions and, even when not, can not be parameterized using traditional parameterization techniques.
So, the first step for any system is to inspect the structure visually and read REMARKS in the PDB file to find out if there is anything you should know. In the structure we build here, there is a carbon atom that is given two positions, and there is a phosphotyrosine residue which is not one of the 20 standard amino acids (luckily there is a patch for this in the CHARMM force field). Find these features visually using VMD and look for the remark denoting the duplicate atom in the PDB file, then move on to the next section.
STEP2: Split the system into parts
VMD builds larger structures by combining what are called segments. Segments are simply groups of atoms you define, such as a protein chain or a group of waters. In the PDB format, a segment is denoted by an identifier in spaces 73-76 of the 'ATOM' line (for more info on the PDB format, read about it here). The file directly from the PDB database does not have these, because at that point it serves no purpose. Therefore, we must split the structure up in a way that is easiest to handle and select later on. VMD loads segments from pdb files only, so we must create these files with the atom selections we want. This is the function of the following script. Read through it to make sure you understand how it works.
vmd -dispdev text -e split.vmd
It saves three pdb files. The first contains the main protein with hydrogens stripped away. The second contains the peptide ligand with hydrogens stripped away and the ACE terminal residue removed (which conveniently gets rid of our duplicate atoms problem). The third file contains the crystal waters.
STEP3: Combine different parts, make corrections, solvate and ionize
Now we run the build script. The pdb files we created are loaded into segment objects and modified where necessary. The script has several logical steps that should be followed for all builds:
- Load psfgen package and other procedures. We have supplied a file build-funcitons.tcl that contains some useful things, in particular aliases for common residues.
- Load topology files.
- If you plan to use any variable, define them early for ease-of-use. In this script we define several variables relating to the solvent box and ionization that will be performed later.
- Load the segments. The order doesn't matter, but you must assign aliases to strange residues and atoms to match definitions in the topology file before loading the segment that contains them. In this build, we have to assign some for the phosphotyrosine.
- Add patches. Patches are done after the segment is loaded. The patch in this case is of the phosphotyrosine.
- Load the coordinates with the coordpdb command. This command gets atom positions and regenerates angles and dihedrals.
- Guess coordinates of missing atoms with guesscoord command.
- Save the new structures coordinates and protein structure file (here saved as complex.pdb and complex.psf).
- Solvate the structure as desired.
- Ionize with autoionize as desired.
vmd -dispdev text -e build.vmd
General note: The PDB format is very old (punch cards and FORTRAN "old"). In an effort to handle its legacy shortcomings, several versions have been made over the years, and they are not all readily interchangeable and not all software can handle each version perfectly. The most important things to watch out for are: (1) the columns, since many PDB versions have very rigid rules about what values can go in each space. Keep in mind that it is not a space/tab/comma delimited format, but rather has rigid definitions of what should be in each space/column. (2) That the format as originally designed cannot handle more than 9,999 resids or 99,999 atoms (due to the column issues). Several workarounds have been devised, such as using hexadecimal numbers or other compact number formats. VMD has no trouble saving more atoms/residues than that once they are properly loaded, but if your input PDB file does not adhere to these rules it could result in confusing issues like missing atoms or duplicate atoms.
Addendum: Automatic structure building using VMD plugins
VMD comes standard with some nice plugins that can be very useful for building simple or even complex systems using GUI interfaces. The principal ones are the AutoPSF, Solvate, Autoionize, and Membrane Builder plugins. They can all be found under the Extensions > Modeling menu. I will discuss general consideration for using the first three here. The use of the Membrane Builder extension is discussed in ACETK.BLDMEMBPROT.v1. These tools are easiest to use when there are only very simple residues in the structure (standard amino acids, water, etc.), and hardest and error prone when there are not. Attempting to rebuild the system built above using the automatic tools will be good practice, as it is not simple and you can see where problems arise.
The general logical flow of using the GUI tools is the same as when using the manual tools. Before you add a solvation box or ions, you must first make sure that all residues are properly protonated, in particular because we would like to maintain the crystal waters. While this PDB has protons, it is best to strip the protons and any other problematic atoms and allow AutoPSF to guess the segments and coordinates (but do not delete entire residues; obviously it needs a few atoms to build the rest of the residue). (In fact, VMD's ability to guess heavy atom coordinates can prove useful in "mutating" residues into other similar residues.) Now that you have your new selection, you can save this new version of the PDB with VMD by going to File > Save coordinates... and putting in an appropriate atom selection. Then run AutoPSF.
AutoPSF has four main steps for the user:
- Load your topology file (you may not be able to use the standard topology file if you have special residues; delete the standard and load your own here)
- Guess coordinates for missing atoms (and select only particular atoms, if desired)
- Review and correction of segments/chains automatically identified
- Patches for unknown residues or atoms
The most important step is 3. Here you can see if psfgen had any trouble recognizing atoms, residues, or protein chains, giving you a chance to correct problems. This happens often with non-ideal systems, so be ready to make corrections. You can group residues any way you please for non-protein segments. This is also where you can add caps to protein chains and other biomolecules. Many of these caps are pre-parameterized for convenience, and simply become part of the terminal residues after AutoPSF is finished. Note that all protein chains should be grouped by their contiguous covalent bonds, so if a single covalent chain is split into two segments you likely have a problem with the topology file and unrecognized atoms/residues.
Fix any errors you see and finish running AutoPSF. It will save the new PDB and PSF files, which you can then use as input to Solvate. Make sure to untick the "waterbox only" box at the top of Solvate GUI and be sure that the pdb/psf file paths are set properly. The output PDB and PSF from Solvate can then be input into Autoionize. New PDB and PSF files will be generated at each step, and you can choose unique names for them to keep track of your workflow. The use of Solvate and Autoionize is straighforward so I won't go into any more detail here. If you have problems, see their respective manuals or ask on the VMD mailing list.
Appendix: List of common patches
Note: the following may not be in the topology file you have. If not, you will have to find the correct topology and parameter files elsewhere, but this list should help you do that.
C-terminal patches:
Name |
Charge |
Description |
||
CTER |
-1.00 |
standard C-terminus |
||
CT1 |
0.00 |
methylated C-terminus from methyl acetate |
||
CT2 |
0.00 |
amidated C-terminus |
||
CT3 |
0.00 |
N-Methylamide C-terminus |
||
N-terminal patches:
Name |
Charge |
Description |
||
NTER |
1.00 |
standard N-terminus |
||
ACE |
0.00 |
acetylated N-terminus (to create dipeptide) |
||
ACP |
0.00 |
acetylated N-terminus (for proline dipeptide) |
||
PROP |
1.00 |
Proline N-Terminal |
||
GLYP |
1.00 |
Glycine N-terminus |
||
Side chain patches:
Name |
Charge |
Description |
||
ASPP |
0.00 |
patch for protonated aspartic acid, proton on od2 |
||
GLUP |
0.00 |
patch for protonated glutamic acid, proton on oe2 |
||
CYSD |
-1.0 |
patch for deprotonated CYS |
||
DISU |
-0.36 |
patch for disulfides. Patch must be 1-CYS and 2-CYS. |
||
HS2 |
0.00 |
Patch for neutral His, move proton from ND1 to NE2 |
||
TP1 |
-1.00 |
convert tyrosine to monoanionic phosphotyrosine |
||
TP1A |
-1.00 |
patch to convert tyrosine to monoanionic phenol-phosphate model compound when generating tyr, use first none last none for terminal patches |
||
TP2 |
-2.00 |
patch to convert tyrosine to dianionic phosphotyrosine |
||
TP2A |
-2.00 |
patch to convert tyrosine to dianionic phosphotyrosine when generating tyr, use first none last none for terminal patches this converts a single tyrosine to a phenol phosphate |
||
TMP1 |
-1.00 |
patch to convert tyrosine to monoanionic phosphonate ester O -> methylene (see RESI BMPH) |
||
TMP2 |
-2.00 |
patch to convert tyrosine to dianionic phosphonate ester O -> methylene (see RESI BMPD) |
||
TDF1 |
-1.00 |
patch to convert tyrosine to monoanionic difluoro phosphonate ester O -> methylene (see RESI BDFH) |
||
Circular protein chain patches:
Name |
Charge |
Description |
||
LIG1 |
0.00000 |
linkage for cyclic peptide, 1 refers to the C terminus which is a glycine , 2 refers to the N terminus |
||
LIG2 |
0.00000 |
linkage for cyclic peptide, 1 refers to the C terminus, 2 refers to the N terminus which is a glycine |
||
LIG3 |
0.00000 |
linkage for cyclic peptide, 1 refers to the C terminus which is a glycine, 2 refers to the N terminus which is a glycine |
||