Parameter file
==============

The **parameter file** defines the modeling protocol, scoring functions, output parameters, and some restraints.

The parameter file is written in the Python programming language, but no programming experience is required.

Creating the parameter file
---------------------------

#. Define the protocol:
    
    For example, for :doc:`combinations`:

    .. code-block:: python

        protocol = 'denovo_MC-SA'

    See the Parameters section below for the available protocols, and the guidelines for :doc:`combinations`, :doc:`recombinations`, and :doc:`refinement`.

#. Define the Monte Carlo Simulated Annealing schedule:
   
    For example:

    .. code-block:: python

        SA_schedule = [
            (50000,   10000),
            (10000,   10000),
            (5000,   10000),
            (1000,   50000),
            (100,   50000)
        ]

    ``SA_schedule`` is a list of pairs. Each pair defines a Monte Carlo temperature and a number of steps.

    The Monte Carlo Simulated Annealing will run the specified number of steps at the first temperature,
    then switch to the second temperature in the list, and so on.
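
    As a minimal sketch (plain Python, independent of the modeling software), the schedule can be read as a sequence of stages. The loop below only prints each stage and sums the steps; it does not run any optimization, and the variable names other than ``SA_schedule`` are illustrative.

    .. code-block:: python

        # Example schedule from above: (temperature, number of steps) pairs
        SA_schedule = [
            (50000, 10000),
            (10000, 10000),
            (5000, 10000),
            (1000, 50000),
            (100, 50000),
        ]

        total_steps = 0
        for stage, (temperature, n_steps) in enumerate(SA_schedule, start=1):
            # Each stage keeps the temperature fixed for n_steps Monte Carlo steps
            print(f"stage {stage}: temperature={temperature}, steps={n_steps}")
            total_steps += n_steps

        print("total steps per run:", total_steps)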


    .. note:: **How to select the temperatures and number of steps?**
    
        Unfortunately, there is no universal recipe. One usually starts with a small number of test runs
        and evaluates their output to choose the values.

        The **temperature** should be of the same order of magnitude as the obtained scores (e.g. if scores
        are in the 100,000 range, the temperatures could be in the 10,000-100,000 range too) and should also
        take into account the acceptance rates of moves (listed in the output log files). If the acceptance rates are low, you may want
        to increase the temperatures.

        You can always calculate the Metropolis probability of accepting a move given your score scale. For example,
        if your score moved up from 100,000 to 110,000 (a difference of 10,000), you can calculate the probability of accepting this move with
        this equation:

        .. image:: images/metropolis.png
          :width: 300
          :alt: Metropolis

        substituting ``k=1``, ``T=<your temperature>``, and ``deltaScore=10,000``.
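
        As a rough illustration, the acceptance probability for a given score increase can be tabulated in plain Python. This sketch assumes the standard Metropolis criterion, ``exp(-deltaScore / (k * T))``, with ``k=1``; the function name ``metropolis_acceptance`` is hypothetical and is not part of the modeling software.

        .. code-block:: python

            import math

            def metropolis_acceptance(delta_score, temperature, k=1.0):
                """Probability of accepting a move that changes the score by delta_score.

                Assumes the standard Metropolis criterion; moves that do not
                increase the score are always accepted.
                """
                if delta_score <= 0:
                    return 1.0
                return math.exp(-delta_score / (k * temperature))

            # Score increase of 10,000 evaluated at the temperatures of the example schedule
            for temperature in (50000, 10000, 5000, 1000, 100):
                probability = metropolis_acceptance(10000, temperature)
                print(f"T={temperature}: P(accept)={probability:.3g}")

        With these assumptions, a score increase of 10,000 at ``T=10000`` is accepted with probability ``exp(-1)``, about 0.37, while at ``T=100`` it is practically never accepted.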

        The **number of steps** can be adjusted after evaluating the convergence of the initial runs. Also,
        the number of steps is always a compromise between the time needed for convergence and available computational resources.

#. Optionally, define an initial optimization:
   
    .. code-block:: python

        do_ini_opt = True
        ini_opt_SA_schedule = [
            (1000, 1000)
        ]

    The initial optimization can be used for example to equilibrate the system with different
    restraints than in the main optimization.

#. Optionally, adjust log files and frequency of saving logs and frames:
   
    .. code-block:: python

        traj_frame_period = 10
        print_frame_period = traj_frame_period

        print_log_scores_to_files = True
        print_log_scores_to_files_frame_period = 10

        print_total_score_to_files = True
        print_total_score_to_files_frame_period = 10

#. Adjust representation resolutions for your molecules:
   
   For example, to define two resolutions, 1 and 10 residues per bead:
   
    .. code-block:: python

        struct_resolutions = [1,10] 

#. Decide whether to define rigid bodies based on the :doc:`input_structures` definition in the :doc:`json`:
   
    .. code-block:: python

        add_rbs_from_pdbs = True

#. Set the weight for :doc:`connectivity_restraints` and whether they should be applied only to the first copy of each series (this should be ``True`` for symmetry **constrained** complexes):
   
    .. code-block:: python

        connectivity_restraint_weight = 1.0
        conn_first_copy_only = True

#. Optionally, add symmetry constraints and/or restraints:

    For symmetry **constraints**:

    .. code-block:: python

        add_symmetry_constraints = True

    For symmetry **restraints**:

    .. code-block:: python

        add_symmetry_restraints = True

#. Optionally, set parameters for cross-link restraints:

    .. code-block:: python

        add_xlink_restraints = True
        x_min_ld_score = 25
        x_weight = 100.0
        x_xlink_distance = 30
        x_xlink_score_type='HarmonicUpperBound'

#. For :doc:`combinations` and :doc:`recombinations`, set weight for "discrete restraints":

    "Discrete restraints" is just the name used for the restraints derived from the :doc:`fit_libs`.

    .. code-block:: python

        discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries

#. Define excluded volume (steric) restraints:

    .. code-block:: python

        ev_restraints = [
            {
                'name': 'ev_restraints_lowres',
                'weight': 10,
                'repr_resolution': 10
            },
            {
                'name': 'ev_restraints_highres',
                'weight': 10,
                'repr_resolution': 1
            }
        ]

    This block defines a list of available excluded volume restraints. Each gets a custom name, a weight, and the representation resolution it is applied to. You decide later, in the scoring functions, which of them will be used.

    The representation resolution is matched to the closest existing one. For example, if you have a mixed representation, with resolutions 10 and 1 for rigid bodies and only 1 for flexible beads, and
    specify a ``repr_resolution`` of 10, the restraint will use resolution 10 for the rigid bodies and resolution 1 for the flexible beads.
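
    The "closest existing resolution" rule can be pictured with the short sketch below. It is only an illustration of the idea in plain Python: the helper ``closest_resolution`` is hypothetical, and how the software itself breaks ties between equally close resolutions is not covered here.

    .. code-block:: python

        def closest_resolution(requested, available):
            """Return the available resolution closest to the requested one (illustrative helper)."""
            return min(available, key=lambda resolution: abs(resolution - requested))

        # Mixed representation from the example: rigid bodies at 1 and 10, flexible beads at 1 only
        rigid_body_resolutions = [1, 10]
        flexible_bead_resolutions = [1]

        print(closest_resolution(10, rigid_body_resolutions))     # 10
        print(closest_resolution(10, flexible_bead_resolutions))  # 1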


#. Define scoring functions:

    In the example below, two scoring functions are defined, ``score_func_lowres`` and ``score_func_highres``. These are arbitrary names; choose them to your liking.
    Each scoring function has a list of ``restraints``, listing the names defined above and any other restraints defined in the JSON.

    Names of some restraints are predefined, such as:

    * discrete_restraints - restraints derived from the pre-defined positions, e.g. precomputed libraries of fits

    * conn_restraints - connectivity restraints
      
    * xlink_restraints - cross-link restraints
      
    Other restraints bear the names defined above (e.g. ``ev_restraints_lowres`` and ``ev_restraints_highres``)
    or in the :doc:`json`.


    .. code-block:: python

        scoring_functions = {
            'score_func_lowres': {
                'restraints': [
                    'discrete_restraints',
                    'conn_restraints',
                    'ev_restraints_lowres'
                ]
            },
            'score_func_highres': {
                'restraints': [
                    'discrete_restraints',
                    'xlink_restraints',
                    'conn_restraints',
                    'ev_restraints_highres'
                ]
            }
        }

    In the example above, two scoring functions are defined. One uses the low resolution excluded volume restraint and does not use crosslinks.
    The second uses high resolution representation and turns on crosslinks.

#. Define which scoring function should be used for each modeling stage:

    .. code-block:: python

        score_func_ini_opt = 'score_func_lowres'
        score_func = 'score_func_highres'

    The ``score_func_ini_opt`` parameter defines which of the scoring functions defined above should be used for the **initial** optimization.

    The ``score_func`` parameter defines which of the scoring functions defined above should be used for the **main** optimization.
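
    Because the stage parameters refer to scoring functions by name, a typo in a name is easy to make. The following optional sanity check is a sketch in plain Python, not a feature of the software; the helper ``missing_scoring_functions`` is hypothetical.

    .. code-block:: python

        scoring_functions = {
            'score_func_lowres': {
                'restraints': ['discrete_restraints', 'conn_restraints', 'ev_restraints_lowres']
            },
            'score_func_highres': {
                'restraints': ['discrete_restraints', 'xlink_restraints',
                               'conn_restraints', 'ev_restraints_highres']
            }
        }
        score_func_ini_opt = 'score_func_lowres'
        score_func = 'score_func_highres'

        def missing_scoring_functions(selected, defined):
            """Return the selected names that are not keys of the scoring_functions dictionary."""
            return [name for name in selected if name not in defined]

        missing = missing_scoring_functions([score_func_ini_opt, score_func], scoring_functions)
        if missing:
            print("Undefined scoring functions:", missing)
        else:
            print("All selected scoring functions are defined.")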

#. Finally, define running parameters for modeling (multiprocessing, cluster):
   
    For example, for a multiprocessor workstation with 8 cores, just define:

    .. code-block:: python

        ntasks = 8

    For a cluster using the Slurm queuing system:

    .. code-block:: python

        cluster_submission_command = 'sbatch'
        from string import Template
        run_script_templ = Template("""#!/bin/bash
        #
        #SBATCH --ntasks=$ntasks
        #SBATCH --mem-per-cpu=2000
        #SBATCH --job-name=${prefix}fit
        #SBATCH --time=00:30:00
        #SBATCH -e $outdir/logs/${prefix}_err.txt
        #SBATCH -o $outdir/logs/${prefix}_out.txt

        echo "Running on:"
        srun hostname

        $cmd

        wait
        """)

In summary, the above parameter file looks like this:

    .. code-block:: python

        protocol = 'denovo_MC-SA'

        SA_schedule = [
            (50000,   10000),
            (10000,   10000),
            (5000,   10000),
            (1000,   50000),
            (100,   50000)
        ]

        do_ini_opt = True
        ini_opt_SA_schedule = [
            (1000, 1000)
        ]

        traj_frame_period = 10
        print_frame_period = traj_frame_period

        print_log_scores_to_files = True
        print_log_scores_to_files_frame_period = 10

        print_total_score_to_files = True
        print_total_score_to_files_frame_period = 10

        struct_resolutions = [1,10] 

        add_rbs_from_pdbs = True

        connectivity_restraint_weight = 1.0
        conn_first_copy_only = True

        add_symmetry_constraints = True

        add_xlink_restraints = True
        x_min_ld_score = 25
        x_weight = 100.0
        x_xlink_distance = 30
        x_xlink_score_type='HarmonicUpperBound'

        discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries

        ev_restraints = [
            {
                'name': 'ev_restraints_lowres',
                'weight': 10,
                'repr_resolution': 10
            },
            {
                'name': 'ev_restraints_highres',
                'weight': 10,
                'repr_resolution': 1
            }
        ]


        scoring_functions = {
            'score_func_lowres': {
                'restraints': [
                    'discrete_restraints',
                    'conn_restraints',
                    'ev_restraints_lowres'
                ]
            },
            'score_func_highres': {
                'restraints': [
                    'discrete_restraints',
                    'xlink_restraints',
                    'conn_restraints',
                    'ev_restraints_highres'
                ]
            }
        }

        score_func_ini_opt = 'score_func_lowres'
        score_func = 'score_func_highres'

        cluster_submission_command = 'sbatch'
        from string import Template
        run_script_templ = Template("""#!/bin/bash
        #
        #SBATCH --ntasks=$ntasks
        #SBATCH --mem-per-cpu=2000
        #SBATCH --job-name=${prefix}fit
        #SBATCH --time=00:30:00
        #SBATCH -e $outdir/logs/${prefix}_err.txt
        #SBATCH -o $outdir/logs/${prefix}_out.txt

        echo "Running on:"
        srun hostname

        $cmd

        wait
        """)


Parameters
----------

protocol
    Modeling protocol.
    Options: 

    * denovo_MC-SA - Monte Carlo Simulated Annealing global optimization moving rigid bodies according to pre-computed position libraries, if they are defined for the rigid bodies, otherwise moving with random rotations and translations. Flexible beads are moved through Monte Carlo with random rotations and translations.

    * denovo_MC-SA-CG - as denovo_MC-SA, but flexible beads are moved using Conjugate Gradient optimization

    * refine - both rigid bodies and flexible beads are moved with random rotations and translations, pre-computed libraries are ignored. Flexible beads are moved using Conjugate Gradient optimization.

    * all_combinations - generates all combinations of positions from pre-computed position libraries

    * custom - a custom protocol implemented by a user-defined function ``custom_protocol()`` defined in the parameter file.

    Default: **denovo_MC-SA**
do_ini_opt
    Perform initial optimization using the score_func_ini_opt scoring function?, default: **False**
SA_schedule
    Simulated Annealing schedule.
    List of (temperature, number of steps) pairs., default: **[(30000, 1000), (2000, 1000), (1000, 1000)]**
before_opt_fn
    A Python function with code that should be executed before optimization, default: **None**
number_of_cg_steps_for_flex_beads
    Number of Conjugate Gradient steps per round in denovo_MC-SA-CG and refine protocols, default: **100**
stop_on_convergence
    Whether to stop the current stage of Simulated Annealing once it has converged. This works well only for optimizations with a high number of steps., default: **False**
no_frames_for_convergence
    Number of frames for evaluating convergence when stop_on_convergence is set to True, default: **1000**
print_frame_period
    Print every Nth frame ID in progress reporting, default: **10**
traj_frame_period
    Save every Nth frame to the trajectory output file, default: **10**
print_total_score_to_files
    Save frame total scores to log files?, default: **False**
print_total_score_to_files_frame_period
    Print total score to log files every Nth frame, default: **10**
print_log_scores_to_files
    Save frame individual restraint scores to log files?, default: **False**
print_log_scores_to_files_frame_period
    Print individual restraint scores to log files every Nth frame, default: **10**
struct_resolutions
    Resolutions of bead representations., default: **[1, 10]**
add_missing
    Add missing regions as flexible beads? Either False or a list of selectors., default: **False**
missing_resolution
    Representation resolution for the missing regions (if add_missing is specified), default: **1**
add_rbs_from_pdbs
    Define rigid bodies automatically based on pdb_files specification in JSON?, default: **True**
ca_only
    Read only Calpha from PDB structures?, default: **True**
add_connectivity_restraints
    Add domain and bead connectivity restraints?, default: **True**
connectivity_restraint_weight
    Connectivity restraint weight, default: **1.0**
max_conn_gap
    Number of missing residues of the missing atomic region above which the restraint will not be added, default: **None**
connectivity_restraint_k
    Connectivity restraint spring constant k., default: **10.0**
conn_reweight_fn
    A function that accepts the following parameters:

    * mol - PMI molecule object

    * next_resi_idx - residue of the next rigid body or bead

    * prev_resi_idx - residue of the previous rigid body or bead

    * connectivity_restraint_weight - default weight passed to add_connectivity_restraints

    Default: **None**
conn_first_copy_only
    Whether to add the restraints only for the first copy of the molecule (useful for symmetrical assemblies), default: **False**
ca_ca_connectivity_scale
    Scale factor applied to the average CA-CA distance of 3.8 Å, accounting for the fact that
    a linker is unlikely to be fully stretched (0.526 corresponds to about 2 Å per peptide bond), default: **0.526**
ev_restraints
    Specification of excluded volume (clash score) restraints.
    Parameters:

    * 'name': a custom name

    * 'weight': weight

    * 'repr_resolution': which representation resolution to use for this restraint

    * 'copies': which molecule copies are included in this restraint

    * 'distance_cutoff': distance cutoff for the non-bonded list

    * 'slack': slack for the non-bonded list

    Read more about distance_cutoff and slack: https://integrativemodeling.org/2.14.0/doc/ref/classIMP_1_1container_1_1ClosePairContainer.html#aa7b183795bd28ab268e5e84a5ad0cd99

    Default: **[{'name': 'ev_restraints', 'weight': 1.0, 'repr_resolution': 10, 'copies': None, 'distance_cutoff': 0.0, 'slack': 5.0}]**
add_xlink_restraints
    Add crosslink restraints?, default: **False**
x_xlink_score_type
    A type of crosslink restraint.

    Options:

    'HarmonicUpperBound' - 0 below x_xlink_distance, harmonic above, distance calculated between centers of the beads

    'HarmonicUpperBoundSphereDistancePairScore' - 0 below x_xlink_distance, harmonic above, distance calculated between surfaces of the beads

    'XlinkScore' - 0 below x_xlink_distance, 1 above

    'LogHarmonic' - A log harmonic potential with the maximum at x_xlink_distance

    'CombinedHarmonic' - harmonic above a distance of 35 Å and log harmonic with the maximum at x_xlink_distance below 35 Å

    Default: **HarmonicUpperBound**
x_min_ld_score
    Only crosslinks with the confidence score above this threshold will be used as restraints. The score must be defined in "score" column of crosslink CSV files (see also Xlink Analyzer documentation), default: **30.0**
x_weight
    Weight of crosslink restraints, default: **1.0**
x_xlink_distance
    Target or maximal crosslink distance (depending on the implementation), default: **30.0**
x_k
    Spring constant for the harmonic potential, default: **1.0**
x_inter_xlink_scale
    Multiply the weight of inter-molecule crosslinks by this value, default: **1.0**
x_first_copy_only
    Apply crosslinks only to the first molecule copy of each series, default: **False**
x_skip_pair_fn
    A Python function to skip specific crosslinks.
    Arguments: p1, p2, component1, component2, xlink
    Return True to skip the crosslink, default: **None**
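
    For illustration only, a skip function could drop crosslinks whose two ends fall in the same component. The types of the arguments are not described here, so the equality comparison below is an assumption, and the function name ``skip_intra_component`` is hypothetical.

    .. code-block:: python

        def skip_intra_component(p1, p2, component1, component2, xlink):
            """Return True to skip the crosslink, False to keep it."""
            # ASSUMPTION: component1 and component2 compare equal when both
            # ends of the crosslink belong to the same component.
            return component1 == component2

        # Set the parameter by plain assignment
        x_skip_pair_fn = skip_intra_component

    In the parameter file, the function is passed by assigning it to ``x_skip_pair_fn``, as in the last line of the sketch.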
x_log_filename
    Optional log file name for printing more information about added and skipped crosslinks, default: **None**
x_score_weighting
    Scale crosslink weights by their score, default: **False**
xlink_reweight_fn
    A custom Python function to scale crosslink weights.
    Arguments: xlink, weight
    Return final weight (float), default: **None**
x_random_sample_fraction
    Take this random fraction of crosslinks for modeling, default: **1.0**
add_symmetry_constraints
    Add symmetry constraints?, default: **False**
add_symmetry_restraints
    Add symmetry restraints?, default: **False**
symmetry_restraints_weight
    Weight of symmetry restraints, default: **1.0**
add_parsimonious_states_restraints
    Add parsimonious states restraints?, default: **False**
parsimonious_states_weight
    Weight of parsimonious states restraints, default: **1.0**
parsimonious_states_distance_threshold
    Distance threshold for elastic network restraining the states, default: **0.0**
parsimonious_states_exclude_rbs
    Python function to exclude selected rigid bodies from the restraint.
    Arguments: IMP's rigid body object
    Return: True if the rigid body should be excluded., default: Some default function
parsimonious_states_representation_resolution
    Representation resolution for this restraint, default: **10**
parsimonious_states_restrain_rb_transformations
    Restrain rigid body transformations instead of using elastic network, default: **True**
create_custom_restraints
    Python function to create custom restraints.
    Arguments: imp_utils1.MultiRepresentation class.
    Return dictionary mapping restraint names to list of restraints, default: **None**
discrete_restraints_weight
    Weight for restraints derived from the pre-defined positions, e.g. precomputed libraries of fits., default: **1.0**
discrete_mover_weight_score_fn
    help message, default: **None**
scoring_functions
    A collection of scoring functions, default: **{}**
score_func
    Name of the scoring function in the scoring_functions collection to be used for the main optimization, default: **None**
score_func_for_CG
    Name of the scoring function in the scoring_functions collection to be used for conjugate gradient steps.
    Only restraints that have implemented derivative calculation can be used for Conjugate Gradient optimization., default: **None**
score_func_ini_opt
    Name of the scoring function in the scoring_functions collection to be used for the initial optimization, default: **None**
score_func_preconditioned_mc
    Name of the scoring function in the scoring_functions collection to be used for the preconditioned Monte Carlo, default: **None**
add_custom_movers
    A Python function to add custom Monte Carlo movers, default: **None**
custom_preprocessing
    A Python function with code that should be executed before modeling protocols are initiated and after adding and setting all restraints.
    Arguments: imp_utils1.MultiRepresentation class., default: **None**
rb_max_rot
    Maximal rotation of a rigid body in a single Monte Carlo move, in radians, default: **0.2**
rb_max_trans
    Maximal translation of a rigid body in a single Monte Carlo move, in Angstroms, default: **2**
beads_max_trans
    Maximal translation of a flexible bead in a single Monte Carlo move, in Angstroms, default: **1**
randomize_initial_positions
    Randomize initial positions of all particles?, default: **False**
randomize_initial_positions_remove_clashes
    Randomize initial positions of all particles and run short optimization to try removing steric clashes?, default: **False**
get_movers_for_main_opt
    Python function defining movers for the main optimization, override for custom implementations, default: Some default function
get_movers_for_ini_opt
    Python function defining movers for the initial optimization, override for custom implementations, default: Some default function
get_movers_for_refine
    Python function defining movers for the refinement, override for custom implementations, default: Some default function
debug
    Print debug messages?, default: **False**
ntasks
    Number of tasks to run for multiprocessor runs, default: **1**
cluster_submission_command
    Command to run the cluster submission script, default: **None**
run_script_templ
    A template for running the jobs on a computer cluster.
    Make sure your template includes $ntasks, ${prefix}, $outdir, and $cmd, default: 

    .. code-block:: python

        """

            #!/bin/bash
            #
            #SBATCH --ntasks=$ntasks
            #SBATCH --mem-per-cpu=2000
            #SBATCH --job-name=${prefix}fit
            #SBATCH --time=5-00:00:00
            #SBATCH -e $outdir/logs/${prefix}_err.txt
            #SBATCH -o $outdir/logs/${prefix}_out.txt

            echo "Running on:"
            srun hostname

            $cmd

            wait #necessary when ntasks > 1 and cmd are run in the background, otherwise job may end before the background processes end
    
        """