a3fe.run.CalcSet

class a3fe.run.CalcSet(calc_paths: List | None = None, calc_args: Dict[str, Dict] = {}, base_dir: str | None = None, input_dir: str | None = None, output_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, update_paths: bool = True)

Class to set up, run, and analyse sets of ABFE calculations (each represented by Calculation objects). This runs calculations sequentially to avoid overloading the system.

Attributes:

calcs
delta_g
delta_g_er
delta_g_err
equil_time: The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.
equilibrated
failed_simulations: The failed sub-simulation runners
input_dir: The input directory for the simulation runner.
output_dir
running
stream_log_level: The log level for the stream handler.
tot_gpu_time
tot_simtime

Methods

`analyse`([exp_dgs_path, offset, ...])	Analyse all calculations in the set and, if the experimental free energies are provided, plot the free energy changes with respect to experiment.
`analyse_convergence`([slurm, run_nos, mode, ...])	Not implemented for CalcSet objects as convergence analysis is expensive.
`clean`([clean_logs])	Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.
`get_optimal_lam_vals`([simtime, er_type, ...])	Determine the optimal lambda windows for each stage of each leg of each calculation by running short simulations at each lambda value and analysing them, using only a single run.
`get_results_df`([save_csv, add_sub_sim_runners])	Return the results in dataframe format
`recursively_get_attr`(attr)	Get the values of the attribute for the simulation runner and any sub-simulation runners.
`recursively_set_attr`(attr, value[, force, ...])	Set the attribute to the value for the simulation runner and any sub-simulation runners.
`reset`([reset_sub_sims])	Reset all attributes changed by the runtime algorithms to their default values.
`run`([run_nos, adaptive, runtime, ...])	Run all calculations.
`save`()	Save the current state of the simulation object to a pickle file.
`set_equilibration_time`(equil_time)	Set the equilibration time for the simulation runner and any sub-simulation runners.
`setup`([sysprep_config])	Set up all calculations sequentially.
`update_engine_config_option`(option, value)	Update an option in the engine configuration file.
`update_paths`(old_sub_path, new_sub_path)	Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

get_tot_gpu_time
get_tot_simtime
is_equilibrated
kill
lighten
wait

__init__(calc_paths: List | None = None, calc_args: Dict[str, Dict] = {}, base_dir: str | None = None, input_dir: str | None = None, output_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, update_paths: bool = True) → None

Instantiate a calculation based on files in the input dir. If calculation.pkl exists in the base directory, the calculation will be loaded from this file and any arguments supplied will be overwritten.

Parameters:

calc_paths (List, Optional, default: None) – List of paths to the Calculation base directories. If None, then all directories in the current directory will be assumed to be calculation base directories
calc_args (Dict[str, Dict], Optional, default: {}) – Dictionary of kwargs to pass to the Calculation objects.
base_dir (str, Optional, default: None) – Path to the base directory which contains all the Calculations. If None, this is set to the current working directory.
input_dir (str, Optional, default: None) – Path to the directory containing input files, for example experimental free energy changes. If None, this is set to current_working_directory/input.
output_dir (str, Optional, default: None) – Path to directory containing output files. If None, this is set to current_working_directory/output.
stream_log_level (int, Optional, default: logging.INFO) – Logging level to use for the stream handlers of the set object and its child objects.
slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler. If None, the default partition is used.
analysis_slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler for the analysis. This is helpful e.g. if you want to submit analysis to the CPU partition, but the main simulation to the GPU partition. If None,
engine_config (_EngineConfig, default: None) – Configuration for the engine. If None, the default configuration is used.
update_paths (bool, Optional, default: True) – If True, if the simulation runner is loaded by unpickling, then update_paths() is called.

Return type:

None
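Because calc_paths falls back to every directory in the current directory, it can be safer to build the list explicitly. The following is a minimal sketch, not part of a3fe: `find_calc_dirs` and `build_calc_set` are hypothetical helpers, and only the `CalcSet` keyword arguments come from the signature above.

```python
import logging
from pathlib import Path
from typing import List


def find_calc_dirs(base_dir: str) -> List[str]:
    """Hypothetical helper: return every sub-directory of base_dir, sorted by name."""
    return sorted(str(p) for p in Path(base_dir).iterdir() if p.is_dir())


def build_calc_set(base_dir: str):
    """Create a CalcSet with explicit calc_paths (requires a3fe to be installed)."""
    # Imported lazily so that find_calc_dirs can be used without a3fe.
    from a3fe.run import CalcSet

    return CalcSet(
        calc_paths=find_calc_dirs(base_dir),
        base_dir=base_dir,
        stream_log_level=logging.INFO,  # logging.INFO == 20, the documented default
    )
```

Filter the list returned by `find_calc_dirs` before passing it on if the base directory also holds directories that are not calculation base directories.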

Methods

`__init__`([calc_paths, calc_args, base_dir, ...])	Instantiate a calculation based on files in the input dir.
`analyse`([exp_dgs_path, offset, ...])	Analyse all calculations in the set and, if the experimental free energies are provided, plot the free energy changes with respect to experiment.
`analyse_convergence`([slurm, run_nos, mode, ...])	Not implemented for CalcSet objects as convergence analysis is expensive.
`clean`([clean_logs])	Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.
`get_optimal_lam_vals`([simtime, er_type, ...])	Determine the optimal lambda windows for each stage of each leg of each calculation by running short simulations at each lambda value and analysing them, using only a single run.
`get_results_df`([save_csv, add_sub_sim_runners])	Return the results in dataframe format
`get_tot_gpu_time`([run_nos])
`get_tot_simtime`([run_nos])
`is_equilibrated`([run_nos])
`kill`()
`lighten`([clean_logs])
`recursively_get_attr`(attr)	Get the values of the attribute for the simulation runner and any sub-simulation runners.
`recursively_set_attr`(attr, value[, force, ...])	Set the attribute to the value for the simulation runner and any sub-simulation runners.
`reset`([reset_sub_sims])	Reset all attributes changed by the runtime algorithms to their default values.
`run`([run_nos, adaptive, runtime, ...])	Run all calculations.
`save`()	Save the current state of the simulation object to a pickle file.
`set_equilibration_time`(equil_time)	Set the equilibration time for the simulation runner and any sub-simulation runners.
`setup`([sysprep_config])	Set up all calculations sequentially.
`update_engine_config_option`(option, value)	Update an option in the engine configuration file.
`update_paths`(old_sub_path, new_sub_path)	Replace the old sub-path with the new sub-path in the base, input, and output directory paths.
`wait`()

Attributes

`calcs`
`class_count`
`delta_g`
`delta_g_er`
`delta_g_err`
`equil_time`	The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.
`equilibrated`
`failed_simulations`	The failed sub-simulation runners
`input_dir`	The input directory for the simulation runner.
`output_dir`
`run_files`
`running`
`runtime_attributes`
`stream_log_level`	The log level for the stream handler.
`tot_gpu_time`
`tot_simtime`

analyse(exp_dgs_path: str | None = None, offset: bool = False, compare_to_exp: bool = True, reanalyse: bool = False, slurm: bool = False, run_nos: List[int] | None = None, subsampling=False, fraction: float = 1, plot_rmsds: bool = False) → None

Analyse all calculations in the set and, if the experimental free energies are provided, plot the free energy changes with respect to experiment.

Parameters:

exp_dgs_path (str, Optional, default = None) – The path to the file containing the experimental free energy changes. This must be a csv file with the columns calc_base_dir, name, exp_dg, exp_err.
offset (bool, default = False) – If True, the calculated dGs will be offset to match the average experimental free energies.
compare_to_exp (bool, optional, default=True) – Whether to compare the calculated free energies to experimental free energies. If False, only the calculated free energies will be analysed. If True, correlation statistics and plots will be generated.
reanalyse (bool, optional, default=False) – Whether to reanalyse the data. If False, any existing data will be used (e.g. if you have already analysed some calculations). If True, the data will be reanalysed using the options provided. Reanalysis is useful when you have changed the analysis options and want to apply them to existing data.
slurm (bool, optional, default=False) – Whether to use slurm for the analysis.
run_nos (List[int], Optional, default=None) – A list of the run numbers to analyse. If None, all runs are analysed.
subsampling (bool, optional, default=False) – If True, the free energy will be calculated by subsampling using the methods contained within pymbar.
fraction (float, optional, default=1) – The fraction of the data to use for analysis. For example, if fraction=0.5, only the first half of the data will be used for analysis. If fraction=1, all data will be used. Note that unequilibrated data is discarded from the beginning of simulations in all cases.
plot_rmsds (bool, optional, default=False) – Whether to plot RMSDs. This is slow and so defaults to False.
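The file expected by exp_dgs_path and the effect of offset can be illustrated with a small standalone sketch. The rows and calculated values below are made up, `read_exp_dgs` and `offset_to_experiment` are hypothetical helpers, and shifting by the difference of the means is an assumption about what "offset to match the average experimental free energies" amounts to, not a3fe's own code.

```python
import csv
import io
from statistics import mean
from typing import Dict

# Made-up example rows; the column names are the ones required for exp_dgs_path.
EXAMPLE_CSV = """calc_base_dir,name,exp_dg,exp_err
/path/to/calcs/lig_a,lig_a,-9.5,0.2
/path/to/calcs/lig_b,lig_b,-7.5,0.3
"""


def read_exp_dgs(csv_text: str) -> Dict[str, float]:
    """Map each ligand name to its experimental free energy."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: float(row["exp_dg"]) for row in reader}


def offset_to_experiment(
    calc_dgs: Dict[str, float], exp_dgs: Dict[str, float]
) -> Dict[str, float]:
    """Shift all calculated values so that their mean equals the experimental mean."""
    shift = mean(exp_dgs[name] for name in calc_dgs) - mean(calc_dgs.values())
    return {name: dg + shift for name, dg in calc_dgs.items()}


exp_dgs = read_exp_dgs(EXAMPLE_CSV)
calc_dgs = {"lig_a": -11.0, "lig_b": -8.0}  # hypothetical calculated values
shifted = offset_to_experiment(calc_dgs, exp_dgs)
```

A constant shift leaves the differences between ligands unchanged, so it affects absolute error statistics but not the correlation with experiment.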

analyse_convergence(slurm: bool = False, run_nos: List[int] | None = None, mode: str = 'cumulative', fraction: float = 1, equilibrated: bool = True)

Not implemented for CalcSet objects as convergence analysis is expensive. Call the analyse_convergence method on individual calculations instead (or call analyse_convergence on each calculation in the set, if you’re determined).

clean(clean_logs=False) → None

Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.

Parameters:

clean_logs (bool, default=False) – If True, also delete the log files.

property equil_time: float

The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.

property failed_simulations: List[SimulationRunner]

The failed sub-simulation runners.

get_optimal_lam_vals(simtime: float = 0.1, er_type: str = 'root_var', delta_er: float = 2, set_relative_sim_cost: bool = True, reference_sim_cost: float = 0.21, run_nos: List[int] = [1]) → None

Determine the optimal lambda windows for each stage of each leg of each calculation by running short simulations at each lambda value and analysing them, using only a single run. Optionally, determine the simulation cost and recursively set the relative simulation cost according to reference_sim_cost.

Parameters:

simtime (float, Optional, default: 0.1) – The length of the short simulations to run, in ns. If None is provided, it is assumed that the simulations have already been run and the optimal lambda values are extracted from the output files.
er_type (str, Optional, default="root_var") – Whether to integrate the standard error of the mean ("sem") or root variance of the gradients ("root_var") to calculate the optimal lambda values.
delta_er (float, default=2) – If er_type == "root_var", the desired integrated root variance of the gradients between each lambda value, in kcal mol^(-1). If er_type == "sem", the desired integrated standard error of the mean of the gradients between each lambda value, in kcal mol^(-1) ns^(1/2). A sensible default for root_var is 2 kcal mol^(-1), and 0.1 kcal mol^(-1) ns^(1/2) for sem. This is referred to as ‘thermodynamic speed’ in the publication.
set_relative_sim_cost (bool, optional, default=True) – Whether to recursively set the relative simulation cost for the leg and all sub simulation runners according to the mean simulation cost of the leg.
reference_sim_cost (float, optional, default=0.21) – The reference simulation cost to use if set_relative_sim_cost is True, in hr / ns. The default of 0.21 is the average bound leg simulation cost from a test set of ligands of a range of system sizes on RTX 2080s. This is used to set the relative simulation cost according to average_sim_cost / reference_sim_cost.
run_nos (List[int], optional, default=[1]) – The run numbers to use for the calculation. Only 1 is run by default, so by default we only analyse 1. If using er_type = "sem", more than one run must be specified.

Return type:

None
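To make delta_er and reference_sim_cost concrete, the sketch below places lambda values so that the integrated error between neighbours is as close as possible to a target, and computes a relative cost as average_sim_cost / reference_sim_cost. It is an illustration under stated assumptions, not a3fe's implementation: the trapezoidal integration, the rounding of the number of intervals, and all function names are hypothetical.

```python
from typing import List, Sequence


def cumulative_trapezoid(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Running trapezoidal integral of y over x, starting at 0."""
    totals = [0.0]
    for i in range(1, len(x)):
        totals.append(totals[-1] + 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1]))
    return totals


def interpolate(target: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation of ys at target, for non-decreasing xs."""
    for i in range(1, len(xs)):
        if target <= xs[i]:
            span = xs[i] - xs[i - 1]
            if span == 0:
                return ys[i]
            frac = (target - xs[i - 1]) / span
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return ys[-1]


def space_lambdas(
    lam_vals: Sequence[float], errors: Sequence[float], delta_er: float
) -> List[float]:
    """Place lambda values at equal steps of integrated error (assumed scheme)."""
    integrated = cumulative_trapezoid(lam_vals, errors)
    total = integrated[-1]
    n_intervals = max(1, round(total / delta_er))
    targets = [total * i / n_intervals for i in range(n_intervals + 1)]
    return [interpolate(t, integrated, lam_vals) for t in targets]


def relative_sim_cost(average_sim_cost: float, reference_sim_cost: float = 0.21) -> float:
    """Relative cost as described for reference_sim_cost (both in hr / ns)."""
    return average_sim_cost / reference_sim_cost


# Constant root variance of 8 over lambda in [0, 1] gives an integral of 8,
# so delta_er = 2 yields four equal intervals.
new_lams = space_lambdas([0.0, 0.5, 1.0], [8.0, 8.0, 8.0], delta_er=2.0)
```

Where the error profile is not flat, the same procedure places more windows in regions where the gradients fluctuate most.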

get_results_df(save_csv: bool = True, add_sub_sim_runners: bool = True)

Return the results in dataframe format

Parameters:

save_csv (bool, optional, default=True) – Whether to save the results as a csv file
add_sub_sim_runners (bool, optional, default=True) – Whether to show the results from the sub-simulation runners.

Returns:

results_df – A dataframe containing the results

Return type:

pd.DataFrame

property input_dir: str

The input directory for the simulation runner.

recursively_get_attr(attr: str) → Dict[SimulationRunner, Any]

Get the values of the attribute for the simulation runner and any sub-simulation runners. If the attribute is not present for a sub-simulation runner, None is returned.

Parameters:

attr (str) – The name of the attribute to get the values of.

Returns:

attr_values – A dictionary of the attribute values for the simulation runner and any sub-simulation runners.

Return type:

Dict[SimulationRunner, Any]

recursively_set_attr(attr: str, value: Any, force: bool = False, silent: bool = False) → None

Set the attribute to the value for the simulation runner and any sub-simulation runners.

Parameters:

attr (str) – The name of the attribute to set the values of.
value (Any) – The value to set the attribute to.
force (bool, default=False) – If True, set the attribute even if it doesn’t exist.
silent (bool, default=False) – If True, don’t log the setting of the attribute or raise any warnings.
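The recursive get and set pattern can be pictured with a toy hierarchy. `ToyRunner` is a hypothetical stand-in, not an a3fe class; the flat dictionary keyed by runner follows the documented return type, but the real return structure, logging, and handling of missing attributes may differ.

```python
from typing import Any, Dict, List, Optional


class ToyRunner:
    """Hypothetical stand-in for a simulation runner with nested sub-runners."""

    def __init__(self, name: str, sub_runners: Optional[List["ToyRunner"]] = None):
        self.name = name
        self.sub_runners = sub_runners or []

    def recursively_get_attr(self, attr: str) -> Dict["ToyRunner", Any]:
        # A missing attribute is reported as None, as described above.
        values: Dict["ToyRunner", Any] = {self: getattr(self, attr, None)}
        for sub in self.sub_runners:
            values.update(sub.recursively_get_attr(attr))
        return values

    def recursively_set_attr(self, attr: str, value: Any, force: bool = False) -> None:
        # Without force, only runners that already have the attribute are updated.
        if force or hasattr(self, attr):
            setattr(self, attr, value)
        for sub in self.sub_runners:
            sub.recursively_set_attr(attr, value, force=force)


leg = ToyRunner("leg")
calc = ToyRunner("calc", [leg])
leg.equil_time = 1.0  # only the sub-runner has this attribute initially
before = calc.recursively_get_attr("equil_time")
calc.recursively_set_attr("equil_time", 2.0)
after_default = calc.recursively_get_attr("equil_time")
calc.recursively_set_attr("equil_time", 3.0, force=True)
after_force = calc.recursively_get_attr("equil_time")
```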

reset(reset_sub_sims: bool = True) → None

Reset all attributes changed by the runtime algorithms to their default values.

Parameters:

reset_sub_sims (bool, default=True) – If True, also reset any sub-simulation runners.

run(run_nos: List[int] | None = None, adaptive: bool = True, runtime: float | None = None, runtime_constant: float | None = None, run_stages_parallel: bool = False) → None

Run all calculations. Analysis is not performed by default. If running adaptively, cycles of short runs then optimal runtime estimation are performed, where the optimal runtime is estimated according to

\[t_{\mathrm{Optimal}, k} = \sqrt{\frac{t_{\mathrm{Current}, k}}{C}}\,\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\]

where:

- \(t_{\mathrm{Optimal}, k}\) is the calculated optimal runtime for lambda window \(k\)
- \(t_{\mathrm{Current}, k}\) is the current runtime for lambda window \(k\)
- \(C\) is the runtime constant
- \(\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\) is the current uncertainty in the free energy change contribution for lambda window \(k\). This is estimated from inter-run deviations.
- \(\Delta \widehat{F}_k\) is the free energy change contribution for lambda window \(k\)

Parameters:

run_nos (List[int], Optional, default: None) – List of run numbers to run. If None, all runs will be run.
adaptive (bool, Optional, default: True) – If True, the stages will run until the simulations are equilibrated and perform analysis afterwards. If False, the stages will run for the specified runtime and analysis will not be performed.
runtime (float, Optional, default: None) – If adaptive is False, runtime must be supplied and stage will run for this number of nanoseconds.
runtime_constant (float, Optional, default: None) – The runtime_constant (kcal**2 mol**-2 ns**-1) only affects behaviour if running adaptively. This is used to calculate how long to run each simulation for based on the current uncertainty of the per-window free energy estimate.
run_stages_parallel (bool, Optional, default: False) – If True, the stages for each individual calculation will be run in parallel. This can cause issues with QOS limits on HPC clusters, as each stage might try to submit jobs at the same time, resulting in oversubmission of jobs. Each calculation will still be run sequentially.

Return type:

None
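The optimal runtime expression used for adaptive runs can be evaluated directly. This is a minimal sketch of the formula only; `optimal_runtime` and its argument names are hypothetical and it is not the code a3fe runs internally.

```python
import math


def optimal_runtime(
    current_runtime: float, runtime_constant: float, current_uncertainty: float
) -> float:
    """Evaluate t_optimal = sqrt(t_current / C) * sigma_current for one lambda window.

    current_runtime is in ns, runtime_constant in kcal**2 mol**-2 ns**-1 and
    current_uncertainty in kcal mol**-1, so the result is in ns.
    """
    if current_runtime < 0 or current_uncertainty < 0:
        raise ValueError("runtime and uncertainty must be non-negative")
    if runtime_constant <= 0:
        raise ValueError("runtime_constant must be positive")
    return math.sqrt(current_runtime / runtime_constant) * current_uncertainty


# A window that has run for 4 ns with an uncertainty of 0.5 kcal/mol and C = 1
example = optimal_runtime(4.0, 1.0, 0.5)
```

A smaller runtime constant gives longer optimal runtimes for the same uncertainty, and windows with larger uncertainty are allocated more time.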

save() → None

Save the current state of the simulation object to a pickle file.

set_equilibration_time(equil_time: float) → None

Set the equilibration time for the simulation runner and any sub-simulation runners.

Parameters:

equil_time (float) – The equilibration time to set, in ns per run per lambda window.

setup(sysprep_config: _BaseSystemPreparationConfig | None = None) → None

Set up all calculations sequentially.

Parameters:

sysprep_config (_BaseSystemPreparationConfig, optional, default = None) – The system preparation configuration to use for all calculations. If None, the default configuration is used.

property stream_log_level: int

The log level for the stream handler.

update_engine_config_option(option: str, value: str) → None

Update an option in the engine configuration file.

update_paths(old_sub_path: str, new_sub_path: str) → None

Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

Parameters:

old_sub_path (str) – The old sub-path to replace.
new_sub_path (str) – The new sub-path to replace the old sub-path with.
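Replacing a sub-path is useful when a pickled set is moved to a new location. The sketch below shows the idea on a toy object; `ToyPaths` is hypothetical, the paths are made up, and plain string replacement is an assumption about how the substitution is carried out.

```python
from dataclasses import dataclass


@dataclass
class ToyPaths:
    """Hypothetical holder for the three directories named above."""

    base_dir: str
    input_dir: str
    output_dir: str

    def update_paths(self, old_sub_path: str, new_sub_path: str) -> None:
        # Apply the same substitution to the base, input, and output directories.
        for attr in ("base_dir", "input_dir", "output_dir"):
            setattr(self, attr, getattr(self, attr).replace(old_sub_path, new_sub_path))


paths = ToyPaths(
    base_dir="/home/alice/calcs",
    input_dir="/home/alice/calcs/input",
    output_dir="/home/alice/calcs/output",
)
paths.update_paths("/home/alice", "/scratch/bob")
```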