Run the modelling

Warning

Before running Assembline on your modelling target, make sure you have properly set up your modelling virtual environment and dependencies, as described in the System requirements section of this manual!

To run Assembline installed with Anaconda, first activate the Assembline environment:

source activate Assembline

or depending on your computer setup:

conda activate Assembline

Introduction

The modelling at any stage is run with the same command:

assembline.py <options> X_config.json params.py

The information in X_config.json and params.py determines which step is run (i.e. 1. Global optimization, 2. Recombinations or 3. Refinement).

This command can be used to execute a single “run”, which generates a single model, or multiple runs, each leading to a model.

Typically, hundreds to thousands of runs need to be executed (yielding as many models as there are runs) to ensure exhaustive sampling and to find optimal models.

A single run

To generate a single model just run:

assembline.py --traj --prefix 0000000 -o <output directory> --models X_config.json params.py

It is always useful to run this before launching many runs as described below. Run it in an interactive session to check for errors and to confirm from the log messages that everything has been set up correctly.
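
As a minimal sketch, the single-run command can also be assembled from Python, for instance to launch the interactive check from a script. The helper name single_run_command and the output directory test_out are illustrative assumptions and not part of Assembline; only the flags shown above are used.

```python
import shlex


def single_run_command(config, params, outdir, prefix="0000000", traj=True):
    """Build the argument list for one Assembline run (illustrative helper)."""
    cmd = ["assembline.py"]
    if traj:
        cmd.append("--traj")  # optional: save the optimization trajectory
    cmd += ["--prefix", prefix, "-o", outdir, "--models", config, params]
    return cmd


cmd = single_run_command("X_config.json", "params.py", outdir="test_out")
print(shlex.join(cmd))
# To launch it, pass the list to subprocess.run(cmd, check=True) in an
# environment where Assembline is installed and activated.
```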

Multiple runs

  • Method 1: Submit all runs to the computer cluster or run on a workstation in chunks of N according to the number of processors:

    assembline.py --traj --models -o out --multi --start_idx 0 --njobs 1000 X_config.json params.py &>log&
    
    • on a cluster, this will submit 1000 modelling jobs to the queue, each job leading to one model (if ntasks in params.py is set to 1)

    • if ntasks in params.py is N, it will submit 1000/N cluster jobs, each running N modelling jobs

    • on a multicore computer, it will run ntasks jobs at a time, and keep running until all 1000 jobs are done.

    Note

    The number of processors or cluster submission commands and templates are specified in params.py

  • Method 2: Dynamically adjust the number of concurrent runs (e.g. to not overload a cluster or annoy other users):

    Warning

    The following works out of the box only on the EMBL cluster. To adapt it to your cluster, modify assembline.py for your cluster environment following the guidelines in the script.

    assembline.py --traj --models --multi --daemon --min_concurrent_jobs 200 --max_concurrent_jobs 1000 -o out --start_idx 0 --njobs 1000 X_config.json params.py &>log&
    

    Note

    Check if jobs are being submitted to the cluster queue. By default, the script runs as many jobs as there are free cluster slots, to avoid overloading the cluster with too many jobs and thus hindering the jobs/tasks of other users and/or cluster admins. For example:

    --min_concurrent_jobs 200 --max_concurrent_jobs 2000

    would make sure that you have at least 200 jobs in the queue and not more than 2000 (even if more than 2000 slots are free). For jobs that take longer than a few minutes it is OK to use --max_concurrent_jobs 10000 (short or crashing jobs can crash the queuing system; e.g. submitting 10000 jobs that each crash after 1 minute because of an error can crash the Slurm scheduler).

    Finally, log out and log back in (otherwise an automatic session disconnect might kill your daemon).

  • Method 3: If none of the above solutions works for you, you can submit multiple jobs manually using a shell loop like the following, e.g. on a computer cluster with the Slurm queuing system:

    for i in $(seq -f "%07g" 0 999)
    do
        # one run per iteration, each with a unique prefix and its own log file
        srun assembline.py --traj --models --prefix "$i" -o out X_config.json params.py &> "log_$i" &
    done
    

    Warning

    Just remember to make the prefix unique for every run.
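
The bookkeeping behind the three methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, rounding the number of cluster jobs up when the number of runs is not a multiple of ntasks is an assumption, and the clamping rule is one reading of how --min_concurrent_jobs and --max_concurrent_jobs bound the number of queued jobs, not the actual logic in assembline.py.

```python
import math


def cluster_jobs(njobs, ntasks):
    """Cluster jobs needed when each job runs ntasks modelling runs.

    Assumption: a final, partially filled job is still submitted (round up).
    """
    return math.ceil(njobs / ntasks)


def target_queue_size(free_slots, min_jobs, max_jobs):
    """Clamp the number of free cluster slots to [min_jobs, max_jobs]."""
    return max(min_jobs, min(free_slots, max_jobs))


def run_prefixes(start_idx, njobs, width=7):
    """Zero-padded, unique prefixes, like seq -f "%07g" in the shell loop."""
    return [f"{i:0{width}d}" for i in range(start_idx, start_idx + njobs)]


print(cluster_jobs(1000, 1))               # one run per cluster job
print(cluster_jobs(1000, 8))               # eight runs per cluster job
print(target_queue_size(50, 200, 2000))    # few free slots: keep the minimum
print(target_queue_size(5000, 200, 2000))  # many free slots: cap at the maximum
prefixes = run_prefixes(0, 1000)
print(prefixes[0], prefixes[-1])           # first and last prefix
```

Generating the prefixes from consecutive integers guarantees that every run gets a distinct prefix, which is the requirement stated in the warning above.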

Assembline.py options

  • The --traj option is optional and determines whether the optimization trajectory is saved.

    Warning

    Setting traj_frame_period in the Parameter file to a low value will significantly slow down the modelling procedure when there are many long modelling runs.

Checking if everything runs correctly

  1. The above command (i.e. assembline.py --traj --models --multi --daemon ....) redirects the output to a log file. Check the log file for any errors.

  2. Check if the output is being created:

    • The script should create the following directories in the <output directory>:

      • logs/ - directory with the logs for each run

      • models_rmf/ - models in the RMF format

      • models_txt/ - models in a special format that specifies transformation matrices for each rigid body

      • traj/ - (only if the --traj option was specified) optimization trajectories in the RMF format

  3. Check <output directory>/logs in case of crashes or to display diagnostic messages.
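
These checks can be scripted. The following is a minimal sketch, assuming the directory layout listed above and an output directory called out; the function names and the keywords searched for in the logs ("Error", "Traceback") are illustrative assumptions rather than messages guaranteed to be produced by Assembline.

```python
from pathlib import Path


def missing_output_dirs(outdir, traj=False):
    """Return the expected output subdirectories that do not exist yet."""
    expected = ["logs", "models_rmf", "models_txt"]
    if traj:
        expected.append("traj")  # only created when --traj is given
    return [name for name in expected if not (Path(outdir) / name).is_dir()]


def logs_with_problems(outdir, keywords=("Error", "Traceback")):
    """Return names of files in <outdir>/logs containing any keyword."""
    flagged = []
    logs_dir = Path(outdir) / "logs"
    if not logs_dir.is_dir():
        return flagged
    for log in sorted(logs_dir.iterdir()):
        if not log.is_file():
            continue
        text = log.read_text(errors="replace")
        if any(keyword in text for keyword in keywords):
            flagged.append(log.name)
    return flagged


print(missing_output_dirs("out", traj=True))  # directories not created yet
print(logs_with_problems("out"))              # per-run logs worth inspecting
```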