Run the modelling
=================

.. warning:: Before running Assembline for your modelling target, make sure you have properly set up your modelling virtual environment and dependencies, as described in the :doc:`installation` section of this manual!

To run Assembline installed with Anaconda, activate the Assembline environment by::

    source activate Assembline

or, depending on your computer setup::

    conda activate Assembline
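
To confirm from a script that the environment is active, one option is to inspect the ``CONDA_DEFAULT_ENV`` variable that conda exports on activation. The following is a minimal sketch; the function name ``assembline_env_active`` is hypothetical, and it assumes the environment is named ``Assembline`` as in the commands above:

.. code-block:: bash

    # Hypothetical helper: succeeds only if the active conda environment
    # is named "Assembline" (conda exports CONDA_DEFAULT_ENV on activation).
    assembline_env_active() {
        [ "${CONDA_DEFAULT_ENV:-}" = "Assembline" ]
    }

    if assembline_env_active; then
        echo "Assembline environment is active"
    else
        echo "Assembline environment is NOT active" >&2
    fi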


Introduction
------------

The modelling at any stage is run with the same command ::

    assembline.py <options> X_config.json params.py

The information in ``X_config.json`` and ``params.py`` determines which step (i.e. :doc:`combinations`, :doc:`recombinations` or :doc:`refinement`) is run.

This command can be used to execute a single "run", which generates a single model, or multiple runs, each leading to a model.

Typically, hundreds to thousands of runs need to be executed (yielding as many models as there are runs) to ensure exhaustive sampling and find optimal models.

A single run
------------

To generate a single model just run:

.. code-block:: bash

    assembline.py --traj --prefix 0000000 -o <output directory> --models X_config.json params.py

It is always useful to do this before launching many runs as described below. Run it in an interactive session and use the log messages to check for errors and to confirm that everything has been set up correctly.

Multiple runs
-------------

* Method 1: Submit all runs to the computer cluster or run on a workstation in chunks of N according to the number of processors:

    .. code-block:: bash

        assembline.py --traj --models -o out --multi --start_idx 0 --njobs 1000 X_config.json params.py &>log&

    * on a cluster, this will submit 1000 modelling jobs to the queue, each job leading to one model (if ``ntasks`` in ``params.py`` is set to 1)
    * if ``ntasks`` in ``params.py`` is N, it will submit 1000/N cluster jobs, each running N modelling jobs
    * on a multicore computer, it will run ``ntasks`` jobs at a time and keep running until all 1000 jobs are done.

    .. note:: The number of processors or cluster submission commands and templates are specified in ``params.py``
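
    The relation between ``--njobs`` and ``ntasks`` can be sketched with shell arithmetic. This is only an illustration: the variable names are hypothetical, and rounding up when ``njobs`` is not a multiple of ``ntasks`` is an assumption rather than documented Assembline behaviour:

    .. code-block:: bash

        njobs=1000   # value passed to --njobs
        ntasks=4     # hypothetical value of ntasks in params.py

        # Ceiling division: number of cluster jobs needed to cover all runs
        # (assumes a final, partially filled job when njobs % ntasks != 0).
        cluster_jobs=$(( (njobs + ntasks - 1) / ntasks ))
        echo "$cluster_jobs cluster jobs, each running up to $ntasks modelling runs"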


* Method 2: Dynamically adjust the number of concurrent runs (e.g. to not overload a cluster or annoy other users):
  
    .. warning:: The following works out of the box only on the EMBL cluster. To adjust it to your cluster, modify ``assembline.py`` for your cluster environment following the guidelines in the script.

    .. code-block:: bash

        assembline.py --traj --models --multi --daemon --min_concurrent_jobs 200 --max_concurrent_jobs 1000 -o out --start_idx 0 --njobs 1000 X_config.json params.py &>log&

    .. note:: Check whether jobs are being submitted to the cluster queue. By default, the script runs as many jobs as there are free cluster slots, to avoid overloading the cluster and hindering the jobs of other users and cluster admins. For example, ``--min_concurrent_jobs 200 --max_concurrent_jobs 2000`` makes sure that you have at least 200 jobs in the queue and no more than 2000 (even if more than 2000 slots are free). For jobs that take longer than a few minutes, it is OK to use ``--max_concurrent_jobs 10000``. Be careful with short or crashing jobs, which can crash the queuing system; e.g. submitting 10000 jobs that crash after 1 minute because of an error crashes the Slurm scheduler.

    Finally, log out and log back in (otherwise an automatic session disconnect might kill your daemon).


* Method 3: If none of the above solutions works for you, you can submit multiple jobs manually with a shell loop such as the following, e.g. on a computer cluster with the Slurm queuing system:
  
    .. code-block:: bash
  
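        for i in $(seq -f "%07g" 0 999)
        do
            # One log file per run (the name log_$i is illustrative), so that
            # background runs do not overwrite each other's output.
            srun assembline.py --traj --models --prefix "$i" -o out X_config.json params.py &> "log_$i" &
        done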

    .. warning:: Just remember to make the ``prefix`` unique for every run.
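
The ``seq -f "%07g"`` call in the loop above produces zero-padded, seven-digit prefixes, which keeps the prefix of every run unique. A quick way to preview the prefixes before submitting anything is sketched below, using only standard shell tools; the variable names are illustrative:

.. code-block:: bash

    # Preview the first three prefixes that the loop would use.
    prefixes=$(seq -f "%07g" 0 2)
    echo "$prefixes"

    # Count duplicates among all 1000 prefixes; uniq -d prints only
    # repeated lines, so 0 means that every prefix is unique.
    n_duplicates=$(seq -f "%07g" 0 999 | sort | uniq -d | wc -l | tr -d ' ')
    echo "duplicates: $n_duplicates"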


Assembline.py options
---------------------

* The ``--traj`` option is optional and determines whether the optimization trajectory is saved.

    .. warning:: Setting a low value of ``traj_frame_period`` in :doc:`params` for many long modelling runs will slow down the modelling procedure significantly.


Checking if everything runs correctly
-------------------------------------

#. The above command (i.e. ``assembline.py --traj --models --multi --daemon ....``) redirects the output to the ``log`` file. Check the ``log`` file for any errors.

#. Check if the output is being created:

    * The script should create the following directories in ``<output directory>``:

        * ``logs/`` - directory with the logs for each run

        * ``models_rmf/`` - models in the RMF format

        * ``models_txt/`` - models in a special format that specifies transformation matrices for each rigid body

        * ``traj/`` - (only if the ``--traj`` option was specified) optimization trajectories in the RMF format

#. Check ``<output directory>/logs`` in case of crashes or to display diagnostic messages.
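
The checks above can be scripted. The following is a minimal sketch, assuming the directory names listed above; the function name ``check_assembline_output`` and the ``error|traceback`` search pattern are illustrative assumptions, not part of Assembline:

.. code-block:: bash

    # Hypothetical helper: report missing output directories and list
    # per-run log files that mention an error.
    check_assembline_output() {
        local outdir="$1" status=0 d
        for d in logs models_rmf models_txt; do
            if [ ! -d "$outdir/$d" ]; then
                echo "missing directory: $outdir/$d" >&2
                status=1
            fi
        done
        # Assumed pattern; adjust it to the messages your runs actually produce.
        if [ -d "$outdir/logs" ] && grep -rliE "error|traceback" "$outdir/logs"; then
            echo "the log files listed above mention errors" >&2
            status=1
        fi
        return "$status"
    }

    # Example call; "out" is the directory passed to -o in the commands above.
    check_assembline_output out || echo "output check failed" >&2

The ``traj/`` directory is deliberately left out of the loop because it exists only when ``--traj`` was given.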