Guides
Preparing Input for a3fe
The most basic input that a3fe accepts is PDB files of the protein and crystallographic waters, along with an sdf file for the ligand. Pre-parameterised inputs in AMBER-format are also accepted. The table below details the combinations of input files that can be supplied to a3fe, and the names that they must be given (files for both the free and bound leg must be provided). The preparation stage will be detected by a3fe when you instantiate a Calculation, and only the required preparation steps will be carried out for each leg.
You can also find out which input files are required for a given preparation stage for a given leg programmatically, e.g:
# Minimised parameterised structures for the free leg
a3.PreparationStage.MINIMISED.get_simulation_input_files(a3.LegType.FREE)
PreparationStage |
LegType |
Required Input Files |
Description |
|---|---|---|---|
STRUCTURES_ONLY |
BOUND |
protien.pdb, ligand.sdf (water.pdb) |
The ligand free-protein structure, the ligand, and (optionally) the crystallographic waters |
STRUCTURES_ONLY |
FREE |
ligand.sdf |
The ligand |
PARAMETERISED |
BOUND |
bound_param.prm7, bound_param.rst7 |
The AMBER parm7 and restart files for the complex, including crystallographic waters |
PARAMETERISED |
FREE |
free_param.prm7, free_param.rst7 |
The AMBER parm7 and restart files for the ligand |
SOLVATED |
BOUND |
bound_solv.prm7, bound_solv.rst7 |
The solvated complex with 150 mM NaCl |
SOLVATED |
FREE |
free_solv.prm7, free_solv.rst7 |
The solvated ligand with 150 mM NaCl |
MINIMISED |
BOUND |
bound_min.prm7, bound_min.rst7 |
The solvated complex after minimisation |
MINIMISED |
FREE |
free_min.prm7, free_min.rst7 |
The solvated ligand after minimisation |
PREEQUILIBRATED |
BOUND |
bound_preequil.prm7, bound_preequil.rst7 |
The solvated complex after heating and short initial equilibration steps |
PREEQUILIBRATED |
FREE |
free_preequil.prm7, free_preequil.rst7 |
The solvated ligand after heating and short initial equilibration steps |
Please note that if you are suppling parameterised input files, the ligand must be the first molecule in the system and the ligand must be named “LIG”. The former can be achieved by reordering the system with BioSimSpace, and the latter by simply editing the ligand name in the prm7 files.
The parameterisation step is liable to failure if your pdb is not formatted for tleap, becuase BioSimSpace is very fussy and will raise an error if tleap is not completely happy with the input files.
Tip
If you are having trouble parameterising your protein pdb, the recommended workflow is:
Clean your unsanitised pdb using pdb4amber, e.g.
pdb4amber -i protein.pdb -o protein_sanitised.pdbAttempt to run a3fe again (renaming
protein_sanitised.pdbtoprotein.pdb)If a3fe still raises an error, attempt parameterisation directly with
tleapto get more detailed error messages. E.g., typetleap, then
source leaprc.protein.ff14SB
source leaprc.water.tip3p
# Loading an unsanitised pdb will likely raise an error
prot = loadpdb protein_sanitised.pdb
saveamberparm prot protein.parm7 protein.rst7
savepdb prot protein_fully_sanitised.pdb
If the above fails, this is often due to residue/ atom names which do not match the templates. Read the errors to find out which residues / atoms are causing the issues, then check the expected names in library which was loaded after typing
source leaprc.protein.ff14SBe.g.cat $AMBERHOME/dat/leap/lib/amino12.lib. Rename the offending atoms/ residues and repeat the above step.Finally, rename
protein_fully_sanitised.pdbtoprotein.pdband run a3fe again.
Alternatively, if you have a parameterised protein (protein.rst7 and protein.prm7), and ligand.sdf (and optionally waters.prm7 and waters.rst7), you can use the notebook supplied in a3fe/a3fe/data/example_run_dir/parameterise_and_assemble_input.ipynb to parameterise the ligand and create the parameterised input files required by a3fe.
Running Standard Non-Adaptive Calculations
Once you have the required files in input as described above, you can run a standard non-adaptive ABFE calculation. To run 5 replicates with 5 ns of sampling per window (discarding 1 ns of this to equilibration):
import a3fe as a3
calc = a3.Calculation(ensemble_size = 5)
calc.setup()
calc.run(adaptive=False, runtime=5) # Run for 5 ns per window per replicate
calc.wait() # Wait for the simulations to finish
calc.set_equilibration_time(1) # Discard the first ns of simulation time
calc.analyse() # Fast analyses
calc.analyse_convergence() # Slower convergence analysis
calc.save()
We suggest running this through ipython (so that you can interact with the calculation while it is running) in a tmux session (so that the process is not killed when you log out).
Customising Calculations
Engine Configuration
The default simulation engine is SOMD. You can customize its configuration by using a3fe.configuration.engine_config.SomdConfig.
For example, to change the timestep, create a SomdConfig object and pass it to Calculation:
engine_config = a3.SomdConfig(timestep=2.0) # fs (femtoseconds), Create configuration with custom parameters
calc = a3.Calculation(engine_config=engine_config) # Pass to Calculation during creation
# Or modify parameters after creating the Calculation
calc = a3.Calculation()
# Before setup(): modify the engine_config directly
calc.engine_config.timestep = 2.0 # fs
# After setup(): use update_engine_config_option(option, value)
# to propagate the change to all sub-simulations
calc.update_engine_config_option("timestep", 2.0) # fs
Warning
After calling calc.setup(), always use calc.update_engine_config_option("option", value)
rather than modifying engine_config directly.
Note
nmoves and ncycles are computed properties derived from runtime, timestep, and
max_nmoves; they cannot be set directly.
To see a complete list of available configuration options, run somd-freenrg --help-config
or inspect the a3.SomdConfig API reference.
System Preparation Configuration
Calculation setup options, including the force fields, lambda schedules, and length of the equilibration steps, can be customised using a3fe.configuration.system_preparation.SomdSystemPreparationConfig.
For example, to use GAFF2 instead of OFF2 for the small molecule, set this in the config object and pass this to calc.setup():
cfg = a3.SomdSystemPreparationConfig()
cfg.forcefields["ligand"] = "gaff2"
calc.setup(sysprep_config=cfg)
You can also modify lambda values in the config object and the system will only run this defined leg and stage.
For example, to only run the bound restrain stage with specified lambda values, set:
cfg = a3.SomdSystemPreparationConfig()
cfg.lambda_values = {a3.LegType.BOUND: {a3.StageType.RESTRAIN: [0.0, 1.0]}}
calc.setup(sysprep_config=cfg)
SLURM Options
SLURM options can be customised using the a3fe.SlurmConfig object. For example, to change the partition, set calc.slurm_config.partition = "my-cluster-gpu-partition". You can
also set other options:
slurm_config = a3.SlurmConfig(partition="my-cluster-gpu-partition", time="12:00:00")
Note
The molecular dynamics simulations should be run on GPUs - they are unbearably slow on CPU. However, you may want to run the MBAR analysis on CPUs to minimise submissions to the CPU queue.
To do this, you can supply an analysis_slurm_config which is different to the slurm_config, which will only be used for the MBAR analysis.
analysis_slurm_config = a3.SlurmConfig(partition="my-cluster-cpu-partition", time="00:10:00")
calc = a3.Calculation(slurm_config=slurm_config, analysis_slurm_config=analysis_slurm_config)
Running Fast Non-Adaptive Calculations
By modifying the SomdSystemPreparationConfig object as described above, we can now try running a very fast non-adaptive calculation with just
three replicates. Note that this is expected to produce an erroneously favourable free energy of binding.
import a3fe as a3
# Shorten several of the initial equilibration stages.
# This should still be stable.
cfg = a3.SomdSystemPreparationConfig()
cfg.runtime_npt_unrestrained = 50 # ps
cfg.runtime_npt = 50 # ps
cfg.ensemble_equilibration_time = 100 # ps
calc = a3.Calculation(ensemble_size = 3)
calc.setup(sysprep_config = cfg)
calc.run(adaptive = False, runtime=0.1) # ns
calc.wait() # Wait for the simulations to finish
calc.set_equilibration_time(1) # Discard the first ns of simulation time
calc.analyse() # Fast analyses
calc.save()
Running Adaptive Calculations
You can also take advantage of the adaptive algorithms available with a3fe. The code below uses the automated lambda window selection, simulation time allocation, and equilibration detection algorithms.
import a3fe as a3
calc = a3.Calculation(ensemble_size = 5)
calc.setup()
# Get optimised lambda schedule with thermodynamic speed
# of 2 kcal mol-1
calc.get_optimal_lam_vals(delta_er = 2)
# Run adaptively with a runtime constant of 0.0005 kcal**2 mol-2 ns**-1
# Note that automatic equilibration detection with the paired t-test
# method will also be carried out.
calc.run(adaptive=True, runtime_constant = 0.0005)
calc.wait()
calc.analyse()
calc.save()
Note
It is recommended to run the calc.get_optimal_lam_vals() step before proceeding with calc.run(adaptive=True). This is becuase
the relative simulation costs for the bound and free leg are determined during the optimisation step, and are used to calculate the simulation
time allocations during the adaptive run. If you do not run the optimisation step, you can set the required attributes manually with e.g.
calc.legs[0].recursively_set_attr("relative_simulation_cost", 1, force=True) and calc.legs[1].recursively_set_attr("relative_simulation_cost", 0.2, force=True)
Note
During the adaptive allocation of simulation time, the allocated runtime is computed taking into account the relative simulation cost. To obtain
comparable total simulation times to those described in the manuscript, you should set the reference simulation time to the cost (hr / ns) of the bound leg of the
MIF/ MIF180 complex ([input files here](https://github.com/michellab/Automated-ABFE-Paper/tree/main/simulations/initial_systems/mif/input)). The cost can be obtained
by running a short simulation for the leg and checking the cost with e.g. ref_cost = calc.legs[0].tot_gpu_time / calc.legs[0].tot_simtime. This should then be passed when
optimising the lambda schedule with e.g. calc.get_optimal_lam_vals(delta_er = 2, reference_sim_cost = ref_cost).
Analysis
Analysis can be performed with:
# Calculate the free energy changes using MBAR and
# generate a variety of plots to aid analysis.
# Run through SLURM as MBAR can be computationally intensive.
# Avoid costly RMSD analysis.
calc.analyse(slurm=True, plot_rmsds=False)
# Run longer analysis to check how the estimate is changing with
# simulation time
calc.analyse_convergence()
Note
Your simulations must be equilibrated before analysis can be performed. Practically, this means that all lambda windows must be set as equilibrated.
This will be done automatically by the adaptive equilibration detection algorithms, but can also be done manually with e.g. calc.set_equilibration_time(1).
calc.analyse() generates a variety of outputs in the output directories of the calculation, leg, and stage directories. The most detailed
information is given in the stage output directories. You can get a detailed breakdown of the results as a pandas dataframe by running calc.get_results_df().
Convergence analysis involves repeatedly calculating the free energy changes with different subsets of the
data, and is computationally intensive. Hence, it is implemented in a different function. To run convergence
analysis, enter calc.analyse_convergence(). Plots of the free energy change against total simulation time
will be created in each output directory.
Some useful initial checks on the output are:
Is the calculation equilibrated, or is the estimated free energy strongly dependent on the total simulation time? See the plots of free energy change against total simulation time. Often, the bound vanish stage shows the slowest equilibration
Are there large discrepancies between runs? The overall 95 % confidence interval for the free energy change is typically around 1 kcal / mol for an intermediate-sized ligand in a reasonably behaved system with 5 replicates. If the uncertainty is much larger, identify which leg and stage it originates from by checking the free energy changes for each, and inspect the potential of mean force and histograms of the gradients to get an idea of which lambda windows are problematic. Inspecting the trajectories for these lambda windows is often helpful. Checking for Gelman-Rubin \(\hat{R} > 1.1\) (indicative of substantial discrepancies between runs)(stage output directory) can also be informative.
Are the free energy changes for the bound restraining stage (where the receptor-ligand restraints are introduced) reasonable? As a result of the restraint selection algorithm, these changes should all be around 1.2 kcal/ mol. If they are not, check the plots of the Boresch degrees of freedom in the ensemble equilibration direcoties. Discontinous jumps can indicate a change in binding modules
Running Sets of Calculations
You can run sets of calculations using the a3fe.run.CalcSet class. To do so:
Create a directory containing subdirectories for each calculation, each containing the required input files as described above
Create an
inputdirectory containing anexp_dgs.csvfile with the experimental free energy changes formatted as below, wherecalc_base_diris the name of the subdirectory containing the calculation input files,nameis your desired name for the calculation,exp_dgis the experimental free energy change,exp_eris the experimental uncertainty, andcalc_coris a correction to be applied to the calculated free energy change (for example a symmetry correction).
calc_base_dir,name,exp_dg,exp_er,calc_cor
t4l,t4l,-9.06,0.5,0
mdm2_short,mdm2_short,-2.93,0.5,0
Note
If you do not have experimental data, you can either omit the exp_dgs.csv, or supply it but leave the exp_dg and exp_er columns blank. The advantage of still supplying it is that you can still provide symmetry corrections in the calc_cor column. In both cases, you should set compare_to_exp = False in the calc_set.analyse() call.
Create, run, and analyse the set of calculations, for example for a set of non-adaptive calculations for “t4l” and “mdm2_short”:
import a3fe as a3
calc_set = a3.CalcSet(calc_paths = ["t4l", "mdm2_short"])
calc_set.setup()
calc_set.run(adaptive=False, runtime=5)
calc_set.wait()
calc_set.set_equilibration_time(1)
calc_set.analyse(exp_dgs_path = "input/exp_dgs.csv", offset = False)
calc_set.save()
ABFE with Charged Ligands
Since A3FE 0.2.0, ABFE calculations with charged ligands are supported using a co-alchemical ion approach. The charge of the ligand will be automatically detected, assuming that this is correctly specified in the input sdf. The only change in the input required is that the use of PME, rather than reaction field electrostatics, should be specified in SomdConfig as: e.g.:
engine_config = a3.SomdConfig(cutoff_type="PME", cutoff_distance=10.0) # Configure SOMD to use PME for charged ligands
calc = a3.Calculation(engine_config=engine_config) # Pass to Calculation
The default SomdConfig uses reaction field instead of PME. This is faster (around twice as fast for some of our systems) and has been shown to give equivalent results for neutral ligands in RBFE calculations - see https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c01424.