a3fe.run.Leg

class a3fe.run.Leg(leg_type: LegType, equil_detection: str = 'multiwindow', runtime_constant: float | None = 0.0005, relative_simulation_cost: float = 1, ensemble_size: int = 5, base_dir: str | None = None, input_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, engine_type: EngineType = EngineType.SOMD, update_paths: bool = True)[source]

Class to set up and run the stages of a leg of the calculation.

Attributes:

delta_g
delta_g_er
equil_time: The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.
equilibrated
failed_simulations: The failed sub-simulation runners
input_dir: The input directory for the simulation runner.
output_dir
running
stages
stream_log_level: The log level for the stream handler.
tot_gpu_time
tot_simtime

Methods

`analyse`([slurm, run_nos, subsampling, ...])	Analyse the leg and any sub-simulations, and return the overall free energy change.
`analyse_convergence`([slurm, run_nos, mode, ...])	Get a timeseries of the total free energy change of the Leg against total simulation time.
`clean`([clean_logs])	Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.
`create_stage_input_dirs`(sysprep_config)	Create the input directories for each stage.
`get_optimal_lam_vals`([simtime, er_type, ...])	Determine the optimal lambda windows for each stage of the leg by running short simulations at each lambda value and analysing them, using only a single run.
`get_results_df`([save_csv, add_sub_sim_runners])	Return the results in dataframe format
`lighten`()	Lighten the leg by deleting ensemble equilibration output and lightening all sub-simulation runners
`recursively_get_attr`(attr)	Get the values of the attribute for the simulation runner and any sub-simulation runners.
`recursively_set_attr`(attr, value[, force, ...])	Set the attribute to the value for the simulation runner and any sub-simulation runners.
`reset`([reset_sub_sims])	Reset all attributes changed by the runtime algorithms to their default values.
`run`([run_nos, adaptive, runtime, ...])	Run all stages and perform analysis once finished.
`run_ensemble_equilibration`(sysprep_config)	Run 5 ns simulations with SOMD for each of the ensemble_size runs and extract the final structures to use as diverse starting points for the production runs.
`save`()	Save the current state of the simulation object to a pickle file.
`set_equilibration_time`(equil_time)	Set the equilibration time for the simulation runner and any sub-simulation runners.
`setup`([sysprep_config])	Set up the leg. This involves:
`setup_stages`(pre_equilibrated_system, ...)	Set up the engine configurations for each stage of the leg.
`update_engine_config_option`(option, value)	Update an option in the engine configuration file.
`update_paths`(old_sub_path, new_sub_path)	Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

get_tot_gpu_time
get_tot_simtime
is_equilibrated
kill
wait

__init__(leg_type: LegType, equil_detection: str = 'multiwindow', runtime_constant: float | None = 0.0005, relative_simulation_cost: float = 1, ensemble_size: int = 5, base_dir: str | None = None, input_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, engine_type: EngineType = EngineType.SOMD, update_paths: bool = True) → None[source]

Instantiate a calculation based on files in the input dir. If leg.pkl exists in the base directory, the calculation will be loaded from this file and any arguments supplied will be overwritten.

Parameters:

leg_type (a3.LegType) – The type of leg to set up. Options are BOUND or FREE.
equil_detection (str, Optional, default: “multiwindow”) – Method to use for equilibration detection. Options are “multiwindow” (use the multiwindow paired t-test method to detect equilibration) and “chodera” (use Chodera’s method to detect equilibration).
runtime_constant (float, Optional, default: 0.0005) – The runtime_constant (kcal**2 mol**-2 ns**-1) only affects behaviour if running adaptively, and must be supplied if running adaptively. This is used to calculate how long to run each simulation for based on the current uncertainty of the per-window free energy estimate, as discussed in the docstring of the run() method.
relative_simulation_cost (float, Optional, default: 1) – The relative cost of the simulation for a given runtime. This is used to calculate the predicted optimal runtime during adaptive simulations. The recommended use is to set this to 1 for the bound leg and to (speed of bound leg / speed of free leg) for the free leg.
ensemble_size (int, Optional, default: 5) – Number of simulations to run in the ensemble.
base_dir (str, Optional, default: None) – Path to the base directory in which to set up the stages. If None, this is set to the current working directory.
input_dir (str, Optional, default: None) – Path to directory containing input files for the simulations. If None, this is set to current_working_directory/input.
stream_log_level (int, Optional, default: logging.INFO) – Logging level to use for the stream handlers for the calculation object and its child objects.
slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler. If None, the default partition is used.
analysis_slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler for the analysis. This is helpful e.g. if you want to submit analysis to the CPU partition, but the main simulation to the GPU partition. If None,
engine_config (EngineConfig, default: None) – Configuration for the engine. If None, the default configuration is used.
engine_type (EngineType, default: EngineType.SOMD) – The type of engine to use for the production simulations.
update_paths (bool, optional, default: True) – If True and the simulation runner is loaded by unpickling, update_paths() is called.

Return type:

None
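The recommendation for relative_simulation_cost above can be sketched as follows. This is only an illustration: the helper name and the speeds are hypothetical and not part of a3fe, and speeds are assumed to be measured in ns/day.

```python
def relative_simulation_cost(bound_speed: float, free_speed: float) -> float:
    """Relative cost of the free leg: speed of bound leg / speed of free leg.

    Both speeds must be in the same units (ns/day is assumed here).
    """
    if bound_speed <= 0 or free_speed <= 0:
        raise ValueError("Speeds must be positive.")
    return bound_speed / free_speed


# Hypothetical speeds: the free leg is a smaller system, so it runs faster.
bound_cost = 1.0  # recommended value for the bound leg
free_cost = relative_simulation_cost(bound_speed=50.0, free_speed=200.0)
print(bound_cost, free_cost)  # 1.0 0.25
```

The resulting values would then be passed as relative_simulation_cost when constructing the bound and free legs respectively.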

Methods

`__init__`(leg_type[, equil_detection, ...])	Instantiate a calculation based on files in the input dir.
`analyse`([slurm, run_nos, subsampling, ...])	Analyse the leg and any sub-simulations, and return the overall free energy change.
`analyse_convergence`([slurm, run_nos, mode, ...])	Get a timeseries of the total free energy change of the Leg against total simulation time.
`clean`([clean_logs])	Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.
`create_stage_input_dirs`(sysprep_config)	Create the input directories for each stage.
`get_optimal_lam_vals`([simtime, er_type, ...])	Determine the optimal lambda windows for each stage of the leg by running short simulations at each lambda value and analysing them, using only a single run.
`get_results_df`([save_csv, add_sub_sim_runners])	Return the results in dataframe format
`get_tot_gpu_time`([run_nos])
`get_tot_simtime`([run_nos])
`is_equilibrated`([run_nos])
`kill`()
`lighten`()	Lighten the leg by deleting ensemble equilibration output and lightening all sub-simulation runners
`recursively_get_attr`(attr)	Get the values of the attribute for the simulation runner and any sub-simulation runners.
`recursively_set_attr`(attr, value[, force, ...])	Set the attribute to the value for the simulation runner and any sub-simulation runners.
`reset`([reset_sub_sims])	Reset all attributes changed by the runtime algorithms to their default values.
`run`([run_nos, adaptive, runtime, ...])	Run all stages and perform analysis once finished.
`run_ensemble_equilibration`(sysprep_config)	Run 5 ns simulations with SOMD for each of the ensemble_size runs and extract the final structures to use as diverse starting points for the production runs.
`save`()	Save the current state of the simulation object to a pickle file.
`set_equilibration_time`(equil_time)	Set the equilibration time for the simulation runner and any sub-simulation runners.
`setup`([sysprep_config])	Set up the leg. This involves:
`setup_stages`(pre_equilibrated_system, ...)	Set up the engine configurations for each stage of the leg.
`update_engine_config_option`(option, value)	Update an option in the engine configuration file.
`update_paths`(old_sub_path, new_sub_path)	Replace the old sub-path with the new sub-path in the base, input, and output directory paths.
`wait`()

Attributes

`class_count`
`delta_g`
`delta_g_er`
`equil_time`	The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.
`equilibrated`
`failed_simulations`	The failed sub-simulation runners
`input_dir`	The input directory for the simulation runner.
`leg_type`
`output_dir`
`prep_stage`
`required_input_files`
`run_files`
`running`
`runtime_attributes`
`stages`
`stream_log_level`	The log level for the stream handler.
`tot_gpu_time`
`tot_simtime`

analyse(slurm: bool = False, run_nos: List[int] | None = None, subsampling=False, fraction: float = 1, plot_rmsds: bool = False) → Tuple[ndarray, ndarray][source]

Analyse the leg and any sub-simulations, and return the overall free energy change.

Parameters:

slurm (bool, optional, default=False) – Whether to use SLURM to run the analysis.
run_nos (List[int], Optional, default=None) – A list of the run numbers to analyse. If None, all runs are analysed.
subsampling (bool, optional, default=False) – If True, the free energy will be calculated by subsampling using the methods contained within pymbar.
fraction (float, optional, default=1) – The fraction of the data to use for analysis. For example, if fraction=0.5, only the first half of the data will be used for analysis. If fraction=1, all data will be used. Note that unequilibrated data is discarded from the beginning of simulations in all cases.
plot_rmsds (bool, optional, default=False) – Whether to plot RMSDs. This is slow and so defaults to False.

Returns:

dg_overall (np.ndarray) – The overall free energy change for each of the ensemble size repeats.
er_overall (np.ndarray) – The overall error for each of the ensemble size repeats.
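A minimal sketch of how fraction interacts with equilibration, using a plain list in place of simulation output. The helper is hypothetical and not part of a3fe, and it assumes that fraction is applied to the data remaining after the unequilibrated portion is discarded; the documentation above does not state which of the two is applied first.

```python
from typing import List, Sequence


def select_samples(
    samples: Sequence[float], n_unequilibrated: int, fraction: float = 1.0
) -> List[float]:
    """Drop the unequilibrated start, then keep the first `fraction` of the rest."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1].")
    equilibrated = list(samples[n_unequilibrated:])
    # Assumption: the fraction refers to the equilibrated data only.
    n_keep = int(fraction * len(equilibrated))
    return equilibrated[:n_keep]


samples = [float(i) for i in range(10)]
all_data = select_samples(samples, n_unequilibrated=2, fraction=1.0)
first_half = select_samples(samples, n_unequilibrated=2, fraction=0.5)
print(first_half)  # [2.0, 3.0, 4.0, 5.0]
```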

analyse_convergence(slurm: bool = False, run_nos: List[int] | None = None, mode: str = 'cumulative', fraction: float = 1, equilibrated: bool = True) → Tuple[ndarray, ndarray][source]

Get a timeseries of the total free energy change of the Leg against total simulation time, and plot it. This is kept separate from analyse because it is expensive to run.

Parameters:

slurm (bool, optional, default=False) – Whether to use slurm for the analysis.
run_nos (Optional[List[int]], default=None) – If specified, only analyse the specified runs. Otherwise, analyse all runs.
mode (str, optional, default=“cumulative”) – “cumulative” or “block”. The type of averaging to use. In both cases, 20 MBAR evaluations are performed.
fraction (float, optional, default=1) – The fraction of the data to use for analysis. For example, if fraction=0.5, only the first half of the data will be used for analysis. If fraction=1, all data will be used. Note that unequilibrated data is discarded from the beginning of simulations in all cases.
equilibrated (bool, optional, default=True) – Whether to analyse only the equilibrated data (True) or all data (False)

Returns:

fracts (np.ndarray) – The fraction of the total (equilibrated) simulation time for each value of dg_overall.
dg_overall (np.ndarray) – The overall free energy change for the Leg for each value of total (equilibrated) simtime for each of the ensemble size repeats.
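The difference between the two modes can be illustrated with a toy timeseries. The sketch below is not the a3fe implementation: it replaces each MBAR evaluation with a plain mean, and it assumes that “cumulative” uses all data up to each fraction while “block” uses only the data within each of the 20 consecutive blocks.

```python
from typing import List, Sequence, Tuple


def convergence_series(
    values: Sequence[float], mode: str = "cumulative", n_points: int = 20
) -> Tuple[List[float], List[float]]:
    """Return (fractions, estimates), using plain means in place of MBAR."""
    if mode not in ("cumulative", "block"):
        raise ValueError("mode must be 'cumulative' or 'block'.")
    n = len(values)
    if n < n_points:
        raise ValueError("Need at least one sample per evaluation point.")
    fracts, estimates = [], []
    for i in range(1, n_points + 1):
        end = (i * n) // n_points
        # Cumulative: start from the beginning. Block: start at the previous cut.
        start = 0 if mode == "cumulative" else ((i - 1) * n) // n_points
        chunk = values[start:end]
        fracts.append(i / n_points)
        estimates.append(sum(chunk) / len(chunk))
    return fracts, estimates


values = [float(i) for i in range(40)]
fracts, cumulative = convergence_series(values, mode="cumulative")
_, block = convergence_series(values, mode="block")
print(cumulative[-1], block[-1])  # 19.5 38.5
```

With a drifting signal such as this one, the block estimates follow the drift while the cumulative estimates smooth it out, which is why the two modes can give different impressions of convergence.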

clean(clean_logs=False) → None

Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.

Parameters: clean_logs (bool, default=False) – If True, also delete the log files.

create_stage_input_dirs(sysprep_config: _BaseSystemPreparationConfig) → Dict[StageType, str][source]

Create the input directories for each stage.

Parameters: sysprep_config (BaseSystemPreparationConfig) – Configuration object for the setup of the leg.
Returns: stage_input_dirs – Dictionary mapping each stage type to the path to its input directory.
Return type: Dict[StageType, str]

property equil_time: float: The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.

property failed_simulations: List[SimulationRunner]: The failed sub-simulation runners

get_optimal_lam_vals(simtime: float | None = 0.1, er_type: str = 'root_var', delta_er: float = 2, set_relative_sim_cost: bool = True, reference_sim_cost: float = 0.21, run_nos: List[int] = [1]) → None[source]

Determine the optimal lambda windows for each stage of the leg by running short simulations at each lambda value and analysing them, using only a single run. Optionally, determine the simulation cost and recursively set the relative simulation cost according to reference_sim_cost.

Parameters:

simtime (float, Optional, default: 0.1) – The length of the short simulations to run, in ns. If None is provided, it is assumed that the simulations have already been run and the optimal lambda values are extracted from the output files.
er_type (str, Optional, default=“root_var”) – Whether to integrate the standard error of the mean (“sem”) or root variance of the gradients (“root_var”) to calculate the optimal lambda values.
delta_er (float, default=2) – If er_type == “root_var”, the desired integrated root variance of the gradients between each lambda value, in kcal mol^(-1). If er_type == “sem”, the desired integrated standard error of the mean of the gradients between each lambda value, in kcal mol^(-1) ns^(1/2). A sensible default for root_var is 2 kcal mol^(-1), and 0.1 kcal mol^(-1) ns^(1/2) for sem. This is referred to as ‘thermodynamic speed’ in the publication.
set_relative_sim_cost (bool, optional, default=True) – Whether to recursively set the relative simulation cost for the leg and all sub simulation runners according to the mean simulation cost of the leg.
reference_sim_cost (float, optional, default=0.21) – The reference simulation cost to use if set_relative_sim_cost is True, in hr / ns. The default of 0.21 is the average bound leg simulation cost from a test set of ligands of a range of system sizes on RTX 2080s. This is used to set the relative simulation cost according to average_sim_cost / reference_sim_cost.
run_nos (List[int], optional, default=[1]) – The run numbers to use for the calculation. Only 1 is run by default, so by default we only analyse 1. If using er_type = “sem”, more than one run must be specified.

Return type:

None
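The idea of spacing lambda windows by a fixed integrated root variance (delta_er) can be sketched as below. This is a simplified illustration under stated assumptions, not the a3fe implementation: the function names are hypothetical, the integral is approximated with the trapezoidal rule, and new lambda values are found by linear interpolation of the cumulative integral.

```python
import math
from typing import List, Sequence


def cumulative_trapezoid(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    total = [0.0]
    for i in range(1, len(x)):
        total.append(total[-1] + 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1]))
    return total


def equally_spaced_lambdas(
    lam_vals: Sequence[float], root_var: Sequence[float], delta_er: float = 2.0
) -> List[float]:
    """Place lambda values so each interval holds about delta_er of the integral."""
    if len(lam_vals) != len(root_var) or len(lam_vals) < 2:
        raise ValueError("Need matching inputs with at least two lambda values.")
    cum = cumulative_trapezoid(lam_vals, root_var)
    if cum[-1] <= 0:
        raise ValueError("The integrated root variance must be positive.")
    # Round the number of intervals up so no interval exceeds delta_er.
    n_intervals = max(1, math.ceil(cum[-1] / delta_er))
    targets = [cum[-1] * k / n_intervals for k in range(n_intervals + 1)]
    new_lams, j = [], 0
    for target in targets:
        # Advance to the input segment that contains the target value.
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        span = cum[j + 1] - cum[j]
        frac = 0.0 if span == 0 else (target - cum[j]) / span
        new_lams.append(lam_vals[j] + frac * (lam_vals[j + 1] - lam_vals[j]))
    return new_lams


# Hypothetical gradients that fluctuate more near lambda = 0 than near lambda = 1.
uniform = equally_spaced_lambdas([0.0, 0.5, 1.0], [4.0, 4.0, 4.0], delta_er=2.0)
skewed = equally_spaced_lambdas([0.0, 0.5, 1.0], [6.0, 6.0, 2.0], delta_er=2.0)
print(uniform)  # [0.0, 0.5, 1.0]
print(skewed)
```

In the skewed case the windows are packed more densely where the root variance is larger, which is the intended effect of spacing by integrated error rather than by lambda.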

get_results_df(save_csv: bool = True, add_sub_sim_runners: bool = True) → DataFrame[source]

Return the results in dataframe format

Parameters:

save_csv (bool, optional, default=True) – Whether to save the results as a csv file
add_sub_sim_runners (bool, optional, default=True) – Whether to show the results from the sub-simulation runners.

Returns:

results_df – A dataframe containing the results

Return type:

pd.DataFrame

property input_dir: str: The input directory for the simulation runner.

lighten() → None[source]: Lighten the leg by deleting ensemble equilibration output and lightening all sub-simulation runners

recursively_get_attr(attr: str) → Dict[SimulationRunner, Any]

Get the values of the attribute for the simulation runner and any sub-simulation runners. If the attribute is not present for a sub-simulation runner, None is returned.

Parameters: attr (str) – The name of the attribute to get the values of.
Returns: attr_values – A dictionary of the attribute values for the simulation runner and any sub-simulation runners.
Return type: Dict[SimulationRunner, Any]
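The recursive lookup can be illustrated with a toy class. ToyRunner is a hypothetical stand-in, not an a3fe class, and the flat dictionary keyed by runner name is a simplification chosen for readability.

```python
from typing import Any, Dict, List, Optional


class ToyRunner:
    """Hypothetical stand-in for a simulation runner with sub-runners."""

    def __init__(
        self,
        name: str,
        sub_runners: Optional[List["ToyRunner"]] = None,
        **attrs: Any,
    ) -> None:
        self.name = name
        self.sub_runners = sub_runners or []
        for key, value in attrs.items():
            setattr(self, key, value)

    def recursively_get_attr(self, attr: str) -> Dict[str, Any]:
        # A missing attribute is reported as None rather than raising.
        values = {self.name: getattr(self, attr, None)}
        for sub in self.sub_runners:
            values.update(sub.recursively_get_attr(attr))
        return values


leg = ToyRunner(
    "leg",
    sub_runners=[ToyRunner("stage_1", equil_time=1.0), ToyRunner("stage_2")],
    equil_time=2.0,
)
equil_times = leg.recursively_get_attr("equil_time")
print(equil_times)  # {'leg': 2.0, 'stage_1': 1.0, 'stage_2': None}
```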

recursively_set_attr(attr: str, value: Any, force: bool = False, silent: bool = False) → None

Set the attribute to the value for the simulation runner and any sub-simulation runners.

Parameters:

attr (str) – The name of the attribute to set the values of.
value (Any) – The value to set the attribute to.
force (bool, default=False) – If True, set the attribute even if it doesn’t exist.
silent (bool, default=False) – If True, don’t log the setting of the attribute or raise any warnings.
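A minimal sketch of how force and silent might behave, again using a hypothetical toy class rather than a3fe itself. It assumes that, without force, runners lacking the attribute are skipped with a warning.

```python
import logging
from typing import Any, List, Optional

logger = logging.getLogger("toy_runner")


class ToyRunner:
    """Hypothetical stand-in for a simulation runner with sub-runners."""

    def __init__(self, name: str, sub_runners: Optional[List["ToyRunner"]] = None) -> None:
        self.name = name
        self.sub_runners = sub_runners or []

    def recursively_set_attr(
        self, attr: str, value: Any, force: bool = False, silent: bool = False
    ) -> None:
        if hasattr(self, attr) or force:
            setattr(self, attr, value)
            if not silent:
                logger.info("Set %s=%r on %s", attr, value, self.name)
        elif not silent:
            # Assumption: a missing attribute is skipped, not treated as an error.
            logger.warning("%s has no attribute %s; skipping", self.name, attr)
        for sub in self.sub_runners:
            sub.recursively_set_attr(attr, value, force=force, silent=silent)


child = ToyRunner("stage_1")
parent = ToyRunner("leg", sub_runners=[child])
parent.ensemble_size = 5  # only the parent has this attribute initially

parent.recursively_set_attr("ensemble_size", 3, silent=True)
child_has_attr_without_force = hasattr(child, "ensemble_size")

parent.recursively_set_attr("ensemble_size", 3, force=True, silent=True)
child_has_attr_with_force = hasattr(child, "ensemble_size")
```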

reset(reset_sub_sims: bool = True) → None

Reset all attributes changed by the runtime algorithms to their default values.

Parameters: reset_sub_sims (bool, default=True) – If True, also reset any sub-simulation runners.

run(run_nos: List[int] | None = None, adaptive: bool = True, runtime: float | None = None, runtime_constant: float | None = None, parallel: bool = True) → None[source]

Run all stages and perform analysis once finished. If running adaptively, cycles of short runs then optimal runtime estimation are performed, where the optimal runtime is estimated according to

\[t_{\mathrm{Optimal, k}} = \sqrt{\frac{t_{\mathrm{Current}, k}}{C}}\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\]

where:
- \(t_{\mathrm{Optimal, k}}\) is the calculated optimal runtime for lambda window \(k\)
- \(t_{\mathrm{Current}, k}\) is the current runtime for lambda window \(k\)
- \(C\) is the runtime constant
- \(\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\) is the current uncertainty in the free energy change contribution for lambda window \(k\). This is estimated from inter-run deviations.
- \(\Delta \widehat{F}_k\) is the free energy change contribution for lambda window \(k\)

Parameters:

run_nos (Optional[List[int]], default=None) – If specified, only run the specified runs. Otherwise, run all runs.
adaptive (bool, Optional, default: True) – If True, the stages will run until the simulations are equilibrated and perform analysis afterwards. If False, the stages will run for the specified runtime and analysis will not be performed.
runtime (float, Optional, default: None) – If adaptive is False, runtime must be supplied and each stage will run for this number of nanoseconds.
runtime_constant (float, Optional, default: None) – The runtime_constant (kcal**2 mol**-2 ns**-1) only affects behaviour if running adaptively. This is used to calculate how long to run each simulation for based on the current uncertainty of the per-window free energy estimate.
parallel (bool, Optional, default: True) – If True, the stages will run in parallel. If False, the stages will run sequentially.

Return type:

None
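The optimal runtime expression given for run() can be evaluated directly. The sketch below is a standalone illustration with a hypothetical function name and hypothetical input values; it takes the current uncertainty as an input rather than estimating it from inter-run deviations.

```python
import math


def optimal_runtime(
    current_runtime: float, sigma_current: float, runtime_constant: float = 0.0005
) -> float:
    """Evaluate t_optimal = sqrt(t_current / C) * sigma_current.

    current_runtime is in ns, sigma_current in kcal mol**-1 and
    runtime_constant in kcal**2 mol**-2 ns**-1, so the result is in ns.
    """
    if current_runtime < 0 or sigma_current < 0:
        raise ValueError("Runtime and uncertainty must be non-negative.")
    if runtime_constant <= 0:
        raise ValueError("The runtime constant must be positive.")
    return math.sqrt(current_runtime / runtime_constant) * sigma_current


# Hypothetical window: 0.2 ns run so far, uncertainty of 0.05 kcal mol**-1.
t_opt = optimal_runtime(current_runtime=0.2, sigma_current=0.05)
print(f"{t_opt:.2f} ns")  # 1.00 ns
```

Because the uncertainty enters linearly, windows with noisier free energy contributions are allocated proportionally more time, and a smaller runtime constant lengthens every window.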

run_ensemble_equilibration(sysprep_config: _BaseSystemPreparationConfig) → System[source]

Run 5 ns simulations with SOMD for each of the ensemble_size runs and extract the final structures to use as diverse starting points for the production runs. If this is the bound leg, the restraints will also be extracted from the simulations and saved to a file. The simulations will be run in a subdirectory of the stage base directory called ensemble_equilibration, and the restraints and final coordinates will be saved here.

Parameters: sysprep_config (BaseSystemPreparationConfig) – Configuration object for the setup of the leg.

save() → None: Save the current state of the simulation object to a pickle file.

set_equilibration_time(equil_time: float) → None

Set the equilibration time for the simulation runner and any sub-simulation runners.

Parameters: equil_time (float) – The equilibration time to set, in ns per run per lambda window.

setup(sysprep_config: _BaseSystemPreparationConfig | None = None) → None[source]

Set up the leg. This involves:

Creating the input directories
Parameterising the input structures
Solvating the input structures
Minimising the input structures
Heating the input structures
Running pre-equilibration simulations (and extracting the restraints for the bound leg)
Creating the Stage objects

Parameters: sysprep_config (Optional[BaseSystemPreparationConfig], default: None) – Configuration object for the setup of the leg. If None, the default configuration is used.

setup_stages(pre_equilibrated_system: System, sys_prep_config: _BaseSystemPreparationConfig) → Dict[StageType, _EngineConfig][source]

Set up the engine configurations for each stage of the leg.

Parameters:

pre_equilibrated_system (_BSS._SireWrappers._system.System) – The equilibrated system to run further equilibration on. The final coordinates are then used as input for each of the individual runs.
sys_prep_config (BaseSystemPreparationConfig) – Configuration object for the setup of the leg.

Returns:

Dictionary mapping stage types to their Engine configurations

Return type:

Dict[StageType, EngineConfig]

property stream_log_level: int: The log level for the stream handler.

update_engine_config_option(option: str, value: str) → None: Update an option in the engine configuration file.

update_paths(old_sub_path: str, new_sub_path: str) → None

Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

Parameters:

old_sub_path (str) – The old sub-path to replace.
new_sub_path (str) – The new sub-path to replace the old sub-path with.
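The effect of replacing a sub-path can be illustrated with plain strings. The helper and the paths below are hypothetical, and simple substring replacement is an assumption about how the substitution is carried out.

```python
from typing import Dict


def replace_sub_path(
    paths: Dict[str, str], old_sub_path: str, new_sub_path: str
) -> Dict[str, str]:
    """Return a copy of paths with old_sub_path replaced by new_sub_path."""
    return {
        name: path.replace(old_sub_path, new_sub_path)
        for name, path in paths.items()
    }


# Hypothetical directories after a calculation was moved between file systems.
paths = {
    "base_dir": "/scratch/user/calc/bound",
    "input_dir": "/scratch/user/calc/bound/input",
    "output_dir": "/scratch/user/calc/bound/output",
}
updated = replace_sub_path(paths, "/scratch/user", "/home/user/projects")
print(updated["input_dir"])  # /home/user/projects/calc/bound/input
```

This is the situation the update_paths constructor argument addresses: a pickled runner stores absolute paths, so they need rewriting after the directory tree is moved.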