Run the modelling
Warning
Before running Assembline for your modelling target, make sure you have properly set up your modelling virtual environment and dependencies, as described in the System requirements section of this manual!
To run Assembline installed with Anaconda, activate the Assembline environment by:
source activate Assembline
or depending on your computer setup:
conda activate Assembline
Introduction
The modelling at any stage is run with the same command:
assembline.py <options> X_config.json params.py
The information in X_config.json and params.py determines which step is run (i.e. 1. Global optimization, 2. Recombinations or 3. Refinement).
This command can be used to execute a single “run”, which generates a single model, or multiple runs, each leading to a model.
Typically, hundreds to thousands of runs need to be executed (yielding as many models as there are runs) to ensure exhaustive sampling and find optimal models.
A single run
To generate a single model just run:
assembline.py --traj --prefix 0000000 -o <output directory> --models X_config.json params.py
It is always useful to do this before launching the multiple runs described below. Run it in an interactive session and use the log messages to check for errors and confirm that everything has been set up correctly.
Multiple runs
Method 1: Submit all runs to the computer cluster or run on a workstation in chunks of N according to the number of processors:
assembline.py --traj --models -o out --multi --start_idx 0 --njobs 1000 X_config.json params.py &>log&
On a cluster, this will submit 1000 modelling jobs to the queue, each job leading to one model (if ntasks in params.py is set to 1).
If ntasks in params.py is N, it will submit 1000/N cluster jobs, each running N modelling jobs.
On a multicore computer, it will run ntasks jobs at a time and keep running until all 1000 jobs are done.
Note
The number of processors, or the cluster submission commands and templates, are specified in params.py.
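The relationship between --njobs and ntasks described above can be sketched as simple arithmetic. This is only an illustration: the function name is hypothetical, and rounding up when ntasks does not divide the number of runs evenly is an assumption, not something confirmed by assembline.py.

```python
import math


def cluster_jobs_needed(njobs: int, ntasks: int) -> int:
    """Estimate how many cluster submissions cover `njobs` modelling runs.

    Assumes each cluster job runs `ntasks` modelling runs and that a final,
    partially filled job is submitted for any remainder (an assumption).
    """
    if njobs < 0 or ntasks < 1:
        raise ValueError("njobs must be >= 0 and ntasks must be >= 1")
    return math.ceil(njobs / ntasks)


# 1000 runs with ntasks = 1 means one cluster job per model.
print(cluster_jobs_needed(1000, 1))
# 1000 runs with ntasks = 4 means 1000/4 cluster jobs.
print(cluster_jobs_needed(1000, 4))
```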
Method 2: Dynamically adjust the number of concurrent runs (e.g. to not overload a cluster or annoy other users):
Warning
The following works out of the box only on the EMBL cluster. To adjust to your cluster, modify assembline.py for your cluster environment following the guidelines in the script.
assembline.py --traj --models --multi --daemon --min_concurrent_jobs 200 --max_concurrent_jobs 1000 -o out --start_idx 0 --njobs 1000 X_config.json params.py &>log&
Note
Check if jobs are being submitted to the cluster queue.
By default, the script runs as many jobs as there are free cluster slots, to avoid overloading the cluster with too many jobs and thus hindering the jobs/tasks of other users and/or cluster admins. For example,
--min_concurrent_jobs 200 --max_concurrent_jobs 2000
would make sure that you have at least 200 jobs in the queue and not more than 2000 (even if more than 2000 slots are free). For jobs that take longer than a few minutes it is OK to use
--max_concurrent_jobs 10000
(short jobs or crashing jobs can crash the queuing system; e.g. submitting 10000 jobs that crash after 1 minute because of an error crashes the Slurm scheduler).
Finally, log out and log back in (otherwise an automated session disconnect might kill your daemon).
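One way to picture the effect of the two limits is as a clamp on the number of free cluster slots. The sketch below is not taken from assembline.py; the function name and the exact decision rule are assumptions made only to illustrate the behaviour described in the note.

```python
def target_concurrent_jobs(free_slots: int, min_jobs: int, max_jobs: int) -> int:
    """Clamp the number of free slots to the range [min_jobs, max_jobs].

    Hypothetical helper: it mirrors the documented intent (at least
    `min_jobs` in the queue, never more than `max_jobs`), not the actual
    daemon implementation.
    """
    if min_jobs > max_jobs:
        raise ValueError("min_jobs must not exceed max_jobs")
    return max(min_jobs, min(free_slots, max_jobs))


# Few free slots: still keep the minimum number of jobs queued.
print(target_concurrent_jobs(50, 200, 2000))
# Many free slots: never exceed the maximum.
print(target_concurrent_jobs(5000, 200, 2000))
```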
Method 3: If none of the above solutions works for you, you could submit multiple jobs manually using a shell loop like the following e.g. on a computer cluster with the Slurm queuing system:
for i in $(seq -f "%07g" 0 999)
do
    srun assembline.py --traj --models --prefix $i -o out X_config.json params.py &>log&
done
Warning
Just remember to make the prefix unique for every run.
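If you prefer to generate the prefixes outside the shell, a minimal sketch such as the following produces the same zero-padded, seven-digit values as the seq call above and checks that no two runs share a prefix. The helper name and its parameters are hypothetical and not part of Assembline.

```python
def run_prefixes(start_idx: int, njobs: int, width: int = 7) -> list[str]:
    """Return zero-padded prefixes, e.g. 0000000, 0000001, ... (one per run)."""
    return [f"{i:0{width}d}" for i in range(start_idx, start_idx + njobs)]


prefixes = run_prefixes(0, 1000)

# Every run must get its own prefix, otherwise outputs would collide.
assert len(set(prefixes)) == len(prefixes)

print(prefixes[0], prefixes[-1])
```

When resuming or extending a set of runs, choosing a start index beyond the last prefix already used keeps the new prefixes distinct from the old ones.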
Assembline.py options
The --traj option is optional and determines whether to save the optimization trajectory.
Warning
Setting a low traj_frame_period in the Parameter file for many long modelling runs will slow down the modelling procedure significantly.
Checking if everything runs correctly
The above command (i.e. assembline.py --traj --models --multi --daemon ....) redirects the output to the log file. Check the log file for any errors.
Check if the output is being created:
The script should create the following directories in the <output directory>:
logs/ - directory with the logs for each run
models_rmf/ - models in the RMF format
models_txt/ - models in a special format that specifies transformation matrices for each rigid body
traj/ - (only if the --traj option was specified) optimization trajectories in the RMF format
Check the <output directory>/logs directory in case of crashes or to view diagnostic messages.
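The checks above can be automated with a short script. This is a minimal sketch, assuming the directory names listed above; the helper names are hypothetical, and the keywords used to flag suspicious logs ("error", "traceback") are an assumption about what a failing run prints, so adjust them to the messages you actually see.

```python
from pathlib import Path

# Directories the manual says should appear in the output directory.
EXPECTED_DIRS = ("logs", "models_rmf", "models_txt")


def missing_output_dirs(output_dir, with_traj=False):
    """Return the expected subdirectories that do not exist yet."""
    expected = list(EXPECTED_DIRS) + (["traj"] if with_traj else [])
    return [name for name in expected if not (Path(output_dir) / name).is_dir()]


def logs_mentioning(output_dir, keywords=("error", "traceback")):
    """Return log files under logs/ whose text contains any keyword.

    The keywords are an assumption, matched case-insensitively.
    """
    logs_dir = Path(output_dir) / "logs"
    if not logs_dir.is_dir():
        return []
    hits = []
    for log_file in sorted(p for p in logs_dir.rglob("*") if p.is_file()):
        text = log_file.read_text(errors="replace").lower()
        if any(keyword in text for keyword in keywords):
            hits.append(log_file)
    return hits


if __name__ == "__main__":
    out = "out"  # the directory passed to assembline.py with -o
    print("Missing directories:", missing_output_dirs(out, with_traj=True))
    for path in logs_mentioning(out):
        print("Check log:", path)
```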