a3fe.run.Stage

class a3fe.run.Stage(stage_type: StageType, equil_detection: str = 'multiwindow', runtime_constant: float | None = 0.0005, relative_simulation_cost: float = 1, ensemble_size: int = 5, base_dir: str | None = None, input_dir: str | None = None, output_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, engine_type: EngineType = EngineType.SOMD, update_paths: bool = True)

Class to hold and manipulate an ensemble of SOMD simulations for a single stage of a calculation.

__init__(stage_type: StageType, equil_detection: str = 'multiwindow', runtime_constant: float | None = 0.0005, relative_simulation_cost: float = 1, ensemble_size: int = 5, base_dir: str | None = None, input_dir: str | None = None, output_dir: str | None = None, stream_log_level: int = 20, slurm_config: SlurmConfig | None = None, analysis_slurm_config: SlurmConfig | None = None, engine_config: _EngineConfig | None = None, engine_type: EngineType = EngineType.SOMD, update_paths: bool = True) → None

Initialise an ensemble of SOMD simulations, constituting the Stage. If Stage.pkl exists in the output directory, the Stage will be loaded from this file and any arguments supplied will be overwritten.

Parameters:
  • stage_type (StageType) – The type of stage.

  • equil_detection (str, Optional, default: “multiwindow”) – Method to use for equilibration detection. Options are:

      - “multiwindow”: Use the multiwindow paired t-test method to detect equilibration.
      - “chodera”: Use Chodera’s method to detect equilibration.

  • runtime_constant (float, Optional, default: 0.0005) – The runtime constant (kcal**2 mol**-2 ns**-1). It only affects behaviour when running adaptively, in which case it must be supplied. It is used to calculate how long to run each simulation based on the current uncertainty of the per-window free energy estimate, as discussed in the docstring of the run() method.

  • relative_simulation_cost (float, Optional, default: 1) – The relative cost of the simulation for a given runtime. This is used to calculate the predicted optimal runtime during adaptive simulations. The recommended use is to set this to 1 for the bound leg and to (speed of bound leg / speed of free leg) for the free leg.

  • ensemble_size (int, Optional, default: 5) – Number of simulations to run in the ensemble.

  • base_dir (str, Optional, default: None) – Path to the base directory. If None, this is set to the current working directory.

  • input_dir (str, Optional, default: None) – Path to directory containing input files for the simulations. If None, this will be set to “current_working_directory/input”.

  • output_dir (str, Optional, default: None) – Path to directory to store output files from the simulations. If None, this will be set to “current_working_directory/output”.

  • stream_log_level (int, Optional, default: logging.INFO) – Logging level to use for the stream handlers of the Ensemble object and its child objects.

  • slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler. If None, the default partition is used.

  • analysis_slurm_config (SlurmConfig, default: None) – Configuration for the SLURM job scheduler for the analysis. This is helpful e.g. if you want to submit analysis to the CPU partition, but the main simulation to the GPU partition. If None,

  • engine_config (EngineConfig, default: None) – Configuration for the engine. If None, the default configuration is used.

  • engine_type (EngineType, default: EngineType.SOMD) – The type of engine to use for the production simulations.

  • update_paths (bool, Optional, default: True) – If True, if the simulation runner is loaded by unpickling, then update_paths() is called.

Return type:

None

Methods

__init__(stage_type[, equil_detection, ...])

Initialise an ensemble of SOMD simulations, constituting the Stage.

analyse([slurm, run_nos, get_frnrg, ...])

Analyse the results of the ensemble of simulations.

analyse_convergence([slurm, run_nos, mode, ...])

Get a timeseries of the total free energy change of the stage against total simulation time.

clean([clean_logs])

Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0.

get_optimal_lam_vals([er_type, delta_er, ...])

Get the optimal lambda values for the stage, based on the integrated SEM, and create plots.

get_results_df([save_csv])

Return the results in dataframe format

get_tot_gpu_time([run_nos])

get_tot_simtime([run_nos])

is_equilibrated([run_nos])

kill()

Kill all running simulations.

lighten([clean_logs])

recursively_get_attr(attr)

Get the values of the attribute for the simulation runner and any sub-simulation runners.

recursively_set_attr(attr, value[, force, ...])

Set the attribute to the value for the simulation runner and any sub-simulation runners.

reset([reset_sub_sims])

Reset all attributes changed by the runtime algorithms to their default values.

run([run_nos, adaptive, runtime, ...])

Run the ensemble of simulations constituting the stage (optionally with adaptive equilibration detection), and, if using adaptive equilibration detection, perform analysis once finished.

save()

Save the current state of the simulation object to a pickle file.

set_equilibration_time(equil_time)

Set the equilibration time for the simulation runner and any sub-simulation runners.

setup()

update([save_name])

Delete the current set of lambda windows and simulations, and create a new set of simulations based on the current state of the stage.

update_engine_config_option(option, value)

Update an option in the engine configuration file.

update_paths(old_sub_path, new_sub_path)

Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

wait()

Wait for the stage to finish running.

Attributes

class_count

delta_g

delta_g_er

equil_time

The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.

equilibrated

failed_simulations

The failed sub-simulation runners

input_dir

The input directory for the simulation runner.

lam_val_weights

Return the weights for each lambda window.

lam_vals

lam_windows

output_dir

run_files

running

Check if the stage is running.

runtime_attributes

stream_log_level

The log level for the stream handler.

tot_gpu_time

tot_simtime

analyse(slurm: bool = False, run_nos: List[int] | None = None, get_frnrg: bool = True, subsampling: bool = False, fraction: float = 1, plot_rmsds: bool = False) → Tuple[float, float] | Tuple[None, None]

Analyse the results of the ensemble of simulations. Requires that all lambda windows have equilibrated.

Parameters:
  • slurm (bool, optional, default=False) – Whether to use slurm for the analysis.

  • run_nos (List[int], Optional, default: None) – The run numbers to analyse. If None, all runs will be analysed.

  • get_frnrg (bool, optional, default=True) – If True, the free energy will be calculated with MBAR, otherwise this will be skipped.

  • subsampling (bool, optional, default=False) – If True, the free energy will be calculated by subsampling using the methods contained within pymbar.

  • fraction (float, optional, default=1) – The fraction of the data to use for analysis. For example, if fraction=0.5, only the first half of the data will be used for analysis. If fraction=1, all data will be used. Note that unequilibrated data is discarded from the beginning of simulations in all cases.

  • plot_rmsds (bool, optional, default=False) – Whether to plot RMSDs. This is slow and so defaults to False.

Returns:

  • free_energies (np.ndarray or None) – The free energy changes for the stage for each of the ensemble size runs, in kcal mol^(-1). If get_frnrg is False, this is None.

  • errors (np.ndarray or None) – The MBAR error estimates for the free energy changes for the stage for each of the ensemble size runs, in kcal mol^(-1). If get_frnrg is False, this is None.

analyse_convergence(slurm: bool = False, run_nos: List[int] | None = None, mode: str = 'cumulative', fraction: float = 1, equilibrated: bool = True) → Tuple[ndarray, ndarray]

Get a timeseries of the total free energy change of the stage against total simulation time. Also plot this. This is kept separate from the analyse method as it is expensive to run.

Parameters:
  • slurm (bool, optional, default=False) – Whether to use slurm for the analysis.

  • run_nos (List[int], Optional, default: None) – The run numbers to analyse. If None, all runs will be analysed.

  • mode (str, optional, default=”cumulative”) – “cumulative” or “block”. The type of averaging to use. In both cases, 20 MBAR evaluations are performed.

  • fraction (float, optional, default=1) – The fraction of the total simulation time to use for the analysis. For example, if fraction=0.5, only the first 50 % of the simulation time will be used for the analysis.

  • equilibrated (bool, optional, default=True) – Whether to analyse only the equilibrated data (True) or all data (False)

Returns:

  • fracts (np.ndarray) – The fraction of the total (equilibrated) simulation time for each value of dg_overall.

  • dg_overall (np.ndarray) – The overall free energy change for the stage for each value of total (equilibrated) simtime for each of the ensemble size repeats.
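The difference between the two averaging modes can be pictured as a choice of time windows. The sketch below is an illustration only: it assumes that "cumulative" evaluates the data from the start up to each successive fraction, and that "block" evaluates the data between consecutive fractions. The function name `analysis_windows` and its arguments are hypothetical and are not part of a3fe.

```python
from typing import List, Tuple


def analysis_windows(
    mode: str, n_points: int = 20, fraction: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start, end) fractions of the simulation time for each evaluation.

    Assumption: "cumulative" windows all start at 0, while "block" windows
    are consecutive and non-overlapping.
    """
    if mode not in ("cumulative", "block"):
        raise ValueError("mode must be 'cumulative' or 'block'")
    if n_points < 1:
        raise ValueError("n_points must be at least 1")

    # End point of each evaluation, evenly spaced up to `fraction`.
    ends = [fraction * (i + 1) / n_points for i in range(n_points)]
    if mode == "cumulative":
        return [(0.0, end) for end in ends]
    # Block mode: each window starts where the previous one ended.
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


cumulative = analysis_windows("cumulative")
block = analysis_windows("block")
```

Under these assumptions both modes yield 20 windows. Cumulative windows grow in length, whereas block windows all have the same length, so block estimates are noisier but show drift more clearly.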

clean(clean_logs=False) → None

Clean the simulation runner by deleting all files with extensions matching self.__class__.run_files in the base and output dirs, and resetting the total runtime to 0. Also flush the virtual queue to remove any remaining simulations.

Parameters:

clean_logs (bool, default=False) – If True, also delete the log files.

property equil_time: float

The equilibration time, per member of the ensemble, in ns, for the simulation runner and any sub-simulation runners.

property failed_simulations: List[SimulationRunner]

The failed sub-simulation runners

get_optimal_lam_vals(er_type: str = 'root_var', delta_er: float | None = None, n_lam_vals: int | None = None, run_nos: List[int] = [1]) → ndarray

Get the optimal lambda values for the stage, based on the integrated SEM, and create plots.

Parameters:
  • er_type (str, optional, default=”root_var”) – Whether to integrate the standard error of the mean (“sem”) or root variance of the gradients (“root_var”) to calculate the optimal lambda values.

  • delta_er (float, optional, default=None) – If er_type == “root_var”, the desired integrated root variance of the gradients between each lambda value, in kcal mol^(-1). If er_type == “sem”, the desired integrated standard error of the mean of the gradients between each lambda value, in kcal mol^(-1) ns^(1/2). A sensible default for sem is 0.1 kcal mol^(-1) ns^(1/2), and for root_var is 2 kcal mol^(-1). If not provided, the number of lambda windows must be provided with n_lam_vals. This is referred to as ‘thermodynamic speed’ in the publication.

  • n_lam_vals (int, optional, default=None) – The number of lambda values to sample. If not provided, delta_er must be provided.

  • run_nos (List[int], optional, default=[1]) – The run numbers to use for the calculation. Only 1 is run by default, so by default we only analyse 1. If using er_type = “sem”, more than one run must be specified.

Returns:

optimal_lam_vals – List of optimal lambda values for the stage.

Return type:

np.ndarray
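The idea of spacing lambda values by equal integrated error can be sketched in a few lines. This is a minimal illustration, not the a3fe implementation: it assumes the error measure (SEM or root variance of the gradients) is known at each existing lambda value, integrates it with the trapezoidal rule, and places new values at equal increments of that integral using linear interpolation. The function name `equal_integrated_error_lams` and the input numbers are hypothetical.

```python
from typing import List


def equal_integrated_error_lams(
    lam_vals: List[float], errs: List[float], n_lam_vals: int
) -> List[float]:
    """Place n_lam_vals lambda values at equal steps of the integrated error."""
    if len(lam_vals) != len(errs) or len(lam_vals) < 2:
        raise ValueError("need matching lam_vals and errs with at least 2 points")
    if n_lam_vals < 2:
        raise ValueError("n_lam_vals must be at least 2")

    # Cumulative trapezoidal integral of the error along lambda.
    cumulative = [0.0]
    for i in range(1, len(lam_vals)):
        width = lam_vals[i] - lam_vals[i - 1]
        cumulative.append(cumulative[-1] + 0.5 * (errs[i] + errs[i - 1]) * width)
    total = cumulative[-1]

    new_lams = []
    j = 0  # index of the interval [lam_vals[j], lam_vals[j + 1]] being searched
    for k in range(n_lam_vals):
        target = total * k / (n_lam_vals - 1)
        while j < len(cumulative) - 2 and cumulative[j + 1] < target:
            j += 1
        span = cumulative[j + 1] - cumulative[j]
        # Linear interpolation within the interval (an approximation).
        frac = 0.0 if span == 0 else (target - cumulative[j]) / span
        new_lams.append(lam_vals[j] + frac * (lam_vals[j + 1] - lam_vals[j]))
    return new_lams


# Hypothetical error profile: larger near lambda = 0, so windows cluster there.
optimal = equal_integrated_error_lams([0.0, 0.5, 1.0], [3.0, 1.0, 1.0], 4)
```

When delta_er is given instead of n_lam_vals, the number of windows would follow from dividing the total integrated error by delta_er; that step is omitted here.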

get_results_df(save_csv: bool = True) → DataFrame

Return the results in dataframe format

Parameters:

save_csv (bool, optional, default=True) – Whether to save the results as a csv file

Returns:

results_df – A dataframe containing the results

Return type:

pd.DataFrame

property input_dir: str

The input directory for the simulation runner.

kill() → None

Kill all running simulations.

property lam_val_weights: List[float]

Return the weights for each lambda window. These are calculated according to how each window contributes to the overall free energy estimate, as given by TI and the trapezoidal rule.
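As an illustration of trapezoidal-rule weights, the sketch below assigns each lambda value half the width of each interval adjacent to it, so that the TI estimate is the weighted sum of the mean gradients. The function name `trapezoid_weights` and the lambda values are hypothetical, and the property may differ in detail (for example in normalisation).

```python
from typing import List


def trapezoid_weights(lam_vals: List[float]) -> List[float]:
    """Weight of each lambda value in a trapezoidal-rule integral over lambda."""
    n = len(lam_vals)
    if n < 2:
        raise ValueError("need at least two lambda values")
    weights = []
    for i in range(n):
        # Width of the interval to the left and right (0 at the end points).
        left = lam_vals[i] - lam_vals[i - 1] if i > 0 else 0.0
        right = lam_vals[i + 1] - lam_vals[i] if i < n - 1 else 0.0
        weights.append(0.5 * (left + right))
    return weights


lam_vals = [0.0, 0.25, 0.5, 1.0]
weights = trapezoid_weights(lam_vals)
print(weights)  # [0.125, 0.25, 0.375, 0.25]
```

The weights sum to the total lambda range (1.0 here), and windows bordering wide gaps receive larger weights.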

recursively_get_attr(attr: str) → Dict[SimulationRunner, Any]

Get the values of the attribute for the simulation runner and any sub-simulation runners. If the attribute is not present for a sub-simulation runner, None is returned.

Parameters:

attr (str) – The name of the attribute to get the values of.

Returns:

attr_values – A dictionary of the attribute values for the simulation runner and any sub-simulation runners.

Return type:

Dict[SimulationRunner, Any]

recursively_set_attr(attr: str, value: Any, force: bool = False, silent: bool = False) → None

Set the attribute to the value for the simulation runner and any sub-simulation runners.

Parameters:
  • attr (str) – The name of the attribute to set the values of.

  • value (Any) – The value to set the attribute to.

  • force (bool, default=False) – If True, set the attribute even if it doesn’t exist.

  • silent (bool, default=False) – If True, don’t log the setting of the attribute or raise any warnings.
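The recursive get and set behaviour can be illustrated with a toy hierarchy. This is a sketch under stated assumptions, not the a3fe code: `ToyRunner` and `sub_runners` are hypothetical names, the returned dictionary is flat and keyed by name for readability (the real layout is not specified here), and logging (the silent flag) is omitted.

```python
from typing import Any, Dict, List, Optional


class ToyRunner:
    """Hypothetical stand-in for a simulation runner with sub-runners."""

    def __init__(self, name: str, sub_runners: Optional[List["ToyRunner"]] = None):
        self.name = name
        self.equil_time: Optional[float] = None
        self.sub_runners = list(sub_runners or [])

    def recursively_set_attr(self, attr: str, value: Any, force: bool = False) -> None:
        # Only set attributes that already exist, unless force is True.
        if force or hasattr(self, attr):
            setattr(self, attr, value)
        for sub in self.sub_runners:
            sub.recursively_set_attr(attr, value, force=force)

    def recursively_get_attr(self, attr: str) -> Dict[str, Any]:
        # Missing attributes are reported as None.
        values = {self.name: getattr(self, attr, None)}
        for sub in self.sub_runners:
            values.update(sub.recursively_get_attr(attr))
        return values


stage = ToyRunner("stage", [ToyRunner("lam_0.0"), ToyRunner("lam_1.0")])
stage.recursively_set_attr("equil_time", 0.5)
stage.recursively_set_attr("note", "x")  # ignored: attribute absent, force=False
```

Here the first call updates the stage and both sub-runners, while the second leaves every object unchanged because force is False.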

reset(reset_sub_sims: bool = True) → None

Reset all attributes changed by the runtime algorithms to their default values.

Parameters:

reset_sub_sims (bool, default=True) – If True, also reset any sub-simulation runners.

run(run_nos: List[int] | None = None, adaptive: bool = True, runtime: float | None = None, runtime_constant: float | None = None) → None

Run the ensemble of simulations constituting the stage (optionally with adaptive equilibration detection), and, if using adaptive equilibration detection, perform analysis once finished. If running adaptively, cycles of short runs then optimal runtime estimation are performed, where the optimal runtime is estimated according to

\[t_{\mathrm{Optimal, k}} = \sqrt{\frac{t_{\mathrm{Current}, k}}{C}}\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\]

where:

  • \(t_{\mathrm{Optimal, k}}\) is the calculated optimal runtime for lambda window \(k\)

  • \(t_{\mathrm{Current}, k}\) is the current runtime for lambda window \(k\)

  • \(C\) is the runtime constant

  • \(\sigma_{\mathrm{Current}}(\Delta \widehat{F}_k)\) is the current uncertainty in the free energy change contribution for lambda window \(k\). This is estimated from inter-run deviations.

  • \(\Delta \widehat{F}_k\) is the free energy change contribution for lambda window \(k\)

Parameters:
  • adaptive (bool, Optional, default: True) – If True, the stage will run until the simulations are equilibrated and perform analysis afterwards. If False, the stage will run for the specified runtime and analysis will not be performed.

  • runtime (float, Optional, default: None) – If adaptive is False, runtime must be supplied and stage will run for this number of nanoseconds.

  • runtime_constant (float, Optional, default: None) – The runtime constant (kcal**2 mol**-2 ns**-1). It only affects behaviour when running adaptively. It is used to calculate how long to run each simulation based on the current uncertainty of the per-window free energy estimate.

Return type:

None
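The optimal-runtime expression for run() can be evaluated directly. The sketch below only reproduces the formula for a single lambda window; the function name `optimal_runtime` and the input numbers are hypothetical, and the surrounding adaptive loop (short runs, re-estimation, resubmission) is not shown.

```python
import math


def optimal_runtime(
    t_current: float, sigma_current: float, runtime_constant: float
) -> float:
    """Return sqrt(t_current / C) * sigma_current, in ns.

    t_current: current runtime of the lambda window, in ns.
    sigma_current: current uncertainty of the window's free energy
        contribution, in kcal/mol.
    runtime_constant: C, in kcal**2 mol**-2 ns**-1.
    """
    if runtime_constant <= 0:
        raise ValueError("runtime_constant must be positive")
    if t_current < 0 or sigma_current < 0:
        raise ValueError("t_current and sigma_current must be non-negative")
    return math.sqrt(t_current / runtime_constant) * sigma_current


# Hypothetical window: 0.2 ns run so far, 0.05 kcal/mol uncertainty, C = 0.0005.
t_opt = optimal_runtime(t_current=0.2, sigma_current=0.05, runtime_constant=0.0005)
print(f"{t_opt:.2f} ns")  # 1.00 ns
```

A smaller runtime constant or a larger uncertainty both increase the predicted optimal runtime, so noisy windows are allocated more simulation time.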

property running: bool

Check if the stage is running.

save() → None

Save the current state of the simulation object to a pickle file.

set_equilibration_time(equil_time: float) → None

Set the equilibration time for the simulation runner and any sub-simulation runners.

Parameters:

equil_time (float) – The equilibration time to set, in ns per run per lambda window.

property stream_log_level: int

The log level for the stream handler.

update(save_name: str = 'output_saved') → None

Delete the current set of lambda windows and simulations, and create a new set of simulations based on the current state of the stage. This is useful if you want to change the number of simulations per lambda window, or the number of lambda windows.

Parameters:

save_name (str, default “output_saved”) – The name of the directory to save the old output directory to.

update_engine_config_option(option: str, value: str) → None

Update an option in the engine configuration file.

update_paths(old_sub_path: str, new_sub_path: str) → None

Replace the old sub-path with the new sub-path in the base, input, and output directory paths.

Parameters:
  • old_sub_path (str) – The old sub-path to replace.

  • new_sub_path (str) – The new sub-path to replace the old sub-path with.
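The effect of replacing a sub-path can be shown with plain string substitution. This sketch assumes a simple substring replacement applied to each of the three directory paths. The helper `replace_sub_path` and the example paths are hypothetical, and the real method may also update sub-simulation runners.

```python
from typing import Dict


def replace_sub_path(path: str, old_sub_path: str, new_sub_path: str) -> str:
    """Replace every occurrence of old_sub_path in path with new_sub_path."""
    return path.replace(old_sub_path, new_sub_path)


# Hypothetical directories for a stage that was moved to another machine.
dirs: Dict[str, str] = {
    "base_dir": "/scratch/alice/calc/bound/vanish",
    "input_dir": "/scratch/alice/calc/bound/vanish/input",
    "output_dir": "/scratch/alice/calc/bound/vanish/output",
}
updated = {
    name: replace_sub_path(path, "/scratch/alice", "/home/bob")
    for name, path in dirs.items()
}
print(updated["output_dir"])  # /home/bob/calc/bound/vanish/output
```

This kind of update is typically needed after a pickled stage has been copied to a new location, since the stored absolute paths would otherwise point at the old directories.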

wait() → None

Wait for the stage to finish running.