Parameter file

The parameter file defines the modeling protocol, scoring functions, output parameters, and some restraints.

The parameter file is written in the Python programming language, but no programming experience is required.

Creating the parameter file

Define the protocol:
For example, for 1. Global optimization:
protocol = 'denovo_MC-SA'
See parameters below for available protocols and guidelines for 1. Global optimization, 2. Recombinations, 3. Refinement
Define Monte Carlo Simulated Annealing schedule:
For example:
SA_schedule = [ (50000, 10000), (10000, 10000), (5000, 10000), (1000, 50000), (100, 50000) ]
SA_schedule is a list of pairs. Each pair defines a Monte Carlo temperature and number of steps.

The Monte Carlo Simulated Annealing will run the specified number of steps at the first temperature, then switch to the second temperature in the list, and so on.
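As a rough illustration of how such a schedule is read, the sketch below walks through the (temperature, number of steps) pairs in order and sums the total number of Monte Carlo steps. The helper name summarize_schedule is hypothetical and is not part of the modeling software.

```python
# The schedule from the example above: (temperature, number of steps) pairs.
SA_schedule = [(50000, 10000), (10000, 10000), (5000, 10000), (1000, 50000), (100, 50000)]


def summarize_schedule(schedule):
    """Print each annealing stage in order and return the total number of steps."""
    total_steps = 0
    for stage, (temperature, n_steps) in enumerate(schedule, start=1):
        print(f"stage {stage}: temperature={temperature}, steps={n_steps}")
        total_steps += n_steps
    return total_steps


total = summarize_schedule(SA_schedule)
print("total steps:", total)
```

The total is a quick way to compare the cost of alternative schedules before launching the runs.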

Note

How to select the temperatures and number of steps?

Unfortunately, there is no universal rule. A common approach is to first perform a small number of test runs and evaluate their output to choose the values.

Select temperatures of the same order of magnitude as the obtained scores (e.g. if scores are in the 100,000 range, the temperatures could be in the 10,000-100,000 range too), and adjust them based on the acceptance rates of the moves (listed in the output log files). If the acceptance rates are low, you may want to increase the temperatures.

You can always calculate the Metropolis probability of accepting a move given your score scales. For example, if your score moved up from 100,000 to 110,000 (a difference of 10,000), you can calculate the probability of accepting this move with this equation:

$$P = \exp\left(-\frac{\mathrm{deltaScore}}{k\,T}\right)$$

substituting k=1, T=<your temperature>, deltaScore=10,000

The number of steps can be adjusted after evaluating the convergence of the initial runs. Also, the number of steps is always a compromise between the time needed for convergence and available computational resources.
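To get a feel for the numbers, the sketch below evaluates the standard Metropolis criterion, exp(-deltaScore / (k*T)), for a score increase of 10,000 at each temperature of the example schedule. It assumes that lower scores are better and that k=1; the function name metropolis_acceptance is hypothetical and not part of the modeling software.

```python
import math


def metropolis_acceptance(delta_score, temperature, k=1.0):
    """Probability of accepting a move that changes the score by delta_score."""
    if delta_score <= 0:
        return 1.0  # moves that do not worsen the score are always accepted
    return math.exp(-delta_score / (k * temperature))


delta = 110000 - 100000  # score increase from the example above
for temperature in (50000, 10000, 5000, 1000, 100):
    probability = metropolis_acceptance(delta, temperature)
    print(f"T={temperature}: acceptance probability = {probability:.3g}")
```

If the probabilities at your early temperatures are already close to zero for typical score changes, the schedule is probably too cold to explore alternative solutions.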
Optionally, define an initial optimization:
do_ini_opt = True
ini_opt_SA_schedule = [ (1000, 1000) ]
The initial optimization can be used for example to equilibrate the system with different restraints than in the main optimization.

Optionally, adjust log files and frequency of saving logs and frames:

traj_frame_period = 10
print_frame_period = traj_frame_period

print_log_scores_to_files = True
print_log_scores_to_files_frame_period = 10

print_total_score_to_files = True
print_total_score_to_files_frame_period = 10

Adjust representation resolutions for your molecules:

For example, to define two resolutions, 1- and 10-residues per bead:
struct_resolutions = [1,10]
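To see what these resolutions mean for system size, the sketch below estimates the number of beads per chain at each resolution. It assumes that consecutive residues are simply grouped into beads, with the last bead possibly holding fewer residues; the chain length and the helper beads_per_chain are hypothetical.

```python
import math

struct_resolutions = [1, 10]


def beads_per_chain(n_residues, resolution):
    """Approximate bead count when `resolution` consecutive residues share one bead."""
    return math.ceil(n_residues / resolution)


n_residues = 250  # hypothetical chain length
for resolution in struct_resolutions:
    print(f"resolution {resolution}: {beads_per_chain(n_residues, resolution)} beads")
```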
Decide whether to define rigid bodies based on the Input structures definition in the JSON configuration file:
add_rbs_from_pdbs = True
Set weight for Connectivity restraints and whether they should be applied only to the first copy of each series (should be True for symmetry constrained complexes):
connectivity_restraint_weight = 1.0
conn_first_copy_only = True
Add symmetry constraints and/or restraints:
For symmetry constraints:
add_symmetry_constraints = True
For symmetry restraints:
add_symmetry_restraints = True

Optionally, set parameters for cross-link restraints:

add_xlink_restraints = True
x_min_ld_score = 25
x_weight = 100.0
x_xlink_distance = 30
x_xlink_score_type='HarmonicUpperBound'

For 1. Global optimization and 2. Recombinations, set weight for “discrete restraints”:
“Discrete restraints” is simply the name used for the restraint derived from the precomputed fitting libraries (see Adding precomputed fitting libraries to JSON).
discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries
Define excluded volume (steric) restraints:
ev_restraints = [
    {
        'name': 'ev_restraints_lowres',
        'weight': 10,
        'repr_resolution': 10
    },
    {
        'name': 'ev_restraints_highres',
        'weight': 10,
        'repr_resolution': 1
    }
]
In this block you define a list of possible excluded volume restraints. Each gets a custom name, weight and resolution of the representation it is assigned. You decide which will be used later in the scoring functions.

The representation resolution is matched to the closest existing one. For example, if you have mixed representation resolutions (10 and 1 for rigid bodies, and 1 for flexible beads) and specify a repr_resolution of 10, the restraint will use resolution 10 for the rigid bodies and resolution 1 for the flexible beads.
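The matching rule can be pictured with the following sketch, which picks the numerically closest available resolution for each kind of particle. This is only an illustration of the behavior described above, not the software's actual implementation; closest_resolution is a hypothetical helper.

```python
def closest_resolution(available, requested):
    """Pick the available resolution numerically closest to the requested one."""
    return min(available, key=lambda resolution: abs(resolution - requested))


rigid_body_resolutions = [1, 10]   # rigid bodies represented at two resolutions
flexible_bead_resolutions = [1]    # flexible beads represented at one resolution

requested = 10  # the repr_resolution of 'ev_restraints_lowres'
print("rigid bodies:", closest_resolution(rigid_body_resolutions, requested))
print("flexible beads:", closest_resolution(flexible_bead_resolutions, requested))
```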
Define scoring functions:
In the example below, two scoring functions are defined, score_func_lowres and score_func_highres. These are arbitrary names, so choose them to your liking. Each scoring function has a list of restraints, which refers to the names defined above and to any other restraints defined in the JSON.

Names of some restraints are predefined, such as:
- discrete_restraints - restraints derived from the pre-defined positions, e.g. precomputed libraries of fits
- conn_restraints - connectivity restraints
- xlink_restraints - cross-link restraints
Other restraints bear the names defined above (e.g. ev_restraints_lowres and ev_restraints_highres) or in the JSON configuration file.
scoring_functions = {
    'score_func_lowres': {
        'restraints': [
            'discrete_restraints',
            'conn_restraints',
            'ev_restraints_lowres'
        ]
    },
    'score_func_highres': {
        'restraints': [
            'discrete_restraints',
            'xlink_restraints',
            'conn_restraints',
            'ev_restraints_highres'
        ]
    }
}
In the example above, two scoring functions are defined. One uses the low resolution excluded volume restraint and does not use crosslinks. The second uses high resolution representation and turns on crosslinks.
Define which scoring function should be used for each modeling stage:
score_func_ini_opt = 'score_func_lowres'
score_func = 'score_func_highres'
The score_func_ini_opt parameter defines which of the scoring functions defined above should be used for the initial optimization.

The score_func parameter defines which of the scoring functions defined above should be used for the main optimization.
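Because the scoring functions are tied together only by names, a typo is easy to make. The sketch below is a hypothetical sanity check (not part of the modeling software) that verifies that the selected scoring functions exist and that every listed restraint is either one of the predefined names mentioned above or an excluded volume restraint name. Names defined in the JSON can be passed through the assumed extra_names argument.

```python
# Predefined names listed above; the real set may be larger (assumption).
PREDEFINED_RESTRAINTS = {'discrete_restraints', 'conn_restraints', 'xlink_restraints'}

ev_restraints = [
    {'name': 'ev_restraints_lowres', 'weight': 10, 'repr_resolution': 10},
    {'name': 'ev_restraints_highres', 'weight': 10, 'repr_resolution': 1},
]

scoring_functions = {
    'score_func_lowres': {
        'restraints': ['discrete_restraints', 'conn_restraints', 'ev_restraints_lowres']
    },
    'score_func_highres': {
        'restraints': ['discrete_restraints', 'xlink_restraints', 'conn_restraints',
                       'ev_restraints_highres']
    },
}

score_func_ini_opt = 'score_func_lowres'
score_func = 'score_func_highres'


def check_scoring_setup(scoring_functions, ev_restraints, selected, extra_names=()):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    known = PREDEFINED_RESTRAINTS | {r['name'] for r in ev_restraints} | set(extra_names)
    problems = []
    for name in selected:
        if name not in scoring_functions:
            problems.append(f"scoring function '{name}' is not defined")
    for func_name, spec in scoring_functions.items():
        for restraint in spec['restraints']:
            if restraint not in known:
                problems.append(f"'{restraint}' in '{func_name}' is not a known restraint name")
    return problems


problems = check_scoring_setup(scoring_functions, ev_restraints,
                               [score_func_ini_opt, score_func])
print(problems or "scoring setup looks consistent")
```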

Finally, define running parameters for modeling (multiprocessing, cluster):

For example, for a multiprocessor workstation with 8 cores, just define:

ntasks = 8

For a cluster using the Slurm queuing system:

cluster_submission_command = 'sbatch'
from string import Template
run_script_templ = Template("""#!/bin/bash
#
#SBATCH --ntasks=$ntasks
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name=${prefix}fit
#SBATCH --time=00:30:00
#SBATCH -e $outdir/logs/${prefix}_err.txt
#SBATCH -o $outdir/logs/${prefix}_out.txt

echo "Running on:"
srun hostname

$cmd

wait
""")
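To preview what the submission script might look like once the placeholders are filled in, you can substitute example values yourself with the standard string.Template.substitute method. The values below (prefix, output directory, command) are hypothetical; the modeling software supplies its own values at run time, and the template is shortened here for brevity.

```python
from string import Template

# Shortened version of the template above, with the same placeholders.
run_script_templ = Template("""#!/bin/bash
#SBATCH --ntasks=$ntasks
#SBATCH --job-name=${prefix}fit
#SBATCH -e $outdir/logs/${prefix}_err.txt
#SBATCH -o $outdir/logs/${prefix}_out.txt

$cmd

wait
""")

# Hypothetical values, for preview only.
script = run_script_templ.substitute(
    ntasks=8,
    prefix='run1',
    outdir='out',
    cmd='echo "modeling command goes here"',
)
print(script)
```

substitute raises a KeyError if a placeholder is left without a value, which makes it a convenient check that a customized template uses only the expected placeholder names.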

In summary, the above parameter file looks like this:

protocol = 'denovo_MC-SA'

SA_schedule = [
    (50000,   10000),
    (10000,   10000),
    (5000,   10000),
    (1000,   50000),
    (100,   50000)
]

do_ini_opt = True
ini_opt_SA_schedule = [
    (1000, 1000)
]

traj_frame_period = 10
print_frame_period = traj_frame_period

print_log_scores_to_files = True
print_log_scores_to_files_frame_period = 10

print_total_score_to_files = True
print_total_score_to_files_frame_period = 10

struct_resolutions = [1,10]

add_rbs_from_pdbs = True

connectivity_restraint_weight = 1.0
conn_first_copy_only = True

add_symmetry_constraints = True

add_xlink_restraints = True
x_min_ld_score = 25
x_weight = 100.0
x_xlink_distance = 30
x_xlink_score_type='HarmonicUpperBound'

discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries

ev_restraints = [
    {
        'name': 'ev_restraints_lowres',
        'weight': 10,
        'repr_resolution': 10
    },
    {
        'name': 'ev_restraints_highres',
        'weight': 10,
        'repr_resolution': 1
    }
]


scoring_functions = {
    'score_func_lowres': {
        'restraints': [
            'discrete_restraints',
            'conn_restraints',
            'ev_restraints_lowres'
        ]
    },
    'score_func_highres': {
        'restraints': [
            'discrete_restraints',
            'xlink_restraints',
            'conn_restraints',
            'ev_restraints_highres'
        ]
    }
}

score_func_ini_opt = 'score_func_lowres'
score_func = 'score_func_highres'

cluster_submission_command = 'sbatch'
from string import Template
run_script_templ = Template("""#!/bin/bash
#
#SBATCH --ntasks=$ntasks
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name=${prefix}fit
#SBATCH --time=00:30:00
#SBATCH -e $outdir/logs/${prefix}_err.txt
#SBATCH -o $outdir/logs/${prefix}_out.txt

echo "Running on:"
srun hostname

$cmd

wait
""")

Parameters

protocol

Modeling protocol. Options:

denovo_MC-SA - Monte Carlo Simulated Annealing global optimization moving rigid bodies according to pre-computed position libraries, if they are defined for the rigid bodies, otherwise moving with random rotations and translations. Flexible beads are moved through Monte Carlo with random rotations and translations.
denovo_MC-SA-CG - as denovo_MC-SA, but flexible beads are moved using Conjugate Gradient optimization
refine - both rigid bodies and flexible beads are moved with random rotations and translations, pre-computed libraries are ignored. Flexible beads are moved using Conjugate Gradient optimization.
all_combinations - generates all combinations of positions from pre-computed position libraries
custom - a custom protocol implemented in a user-defined function custom_protocol() defined in the parameter file.

default: denovo_MC-SA

do_ini_opt

Perform initial optimization using the score_func_ini_opt scoring function?, default: False

SA_schedule

Simulated Annealing schedule. List of (temperature, number of steps) pairs., default: [(30000, 1000), (2000, 1000), (1000, 1000)]

before_opt_fn

A Python function with code that should be executed before optimization, default: None

number_of_cg_steps_for_flex_beads

Number of Conjugate Gradient steps per round in denovo_MC-SA-CG and refine protocols, default: 100

stop_on_convergence

Whether to stop the current stage of Simulated Annealing once it has converged (works well only for optimizations with a high number of steps), default: False

no_frames_for_convergence

Number of frames for evaluating convergence when stop_on_convergence is set to True, default: 1000

print_frame_period

Print every Nth frame ID in progress reporting, default: 10

traj_frame_period

Save every Nth frame to the trajectory output file, default: 10

print_total_score_to_files

Save frame total scores to log files?, default: False

print_total_score_to_files_frame_period

Print total score to log files every Nth frame, default: 10

print_log_scores_to_files

Save frame individual restraint scores to log files?, default: False

print_log_scores_to_files_frame_period

Print individual restraint scores to log files every Nth frame, default: 10

struct_resolutions

Resolutions of bead representations., default: [1, 10]

add_missing

Add missing regions as flexible beads? Either False or a list of selectors., default: False

missing_resolution

Representation resolution for the missing regions (if add_missing is specified), default: 1

add_rbs_from_pdbs

Define rigid bodies automatically based on pdb_files specification in JSON?, default: True

ca_only

Read only Calpha from PDB structures?, default: True

add_connectivity_restraints

Add domain and bead connectivity restraints?, default: True

connectivity_restraint_weight

Connectivity restraint weight, default: 1.0

max_conn_gap

Maximum number of residues in a missing atomic region; above this number the connectivity restraint will not be added, default: None

connectivity_restraint_k

Connectivity restraint spring constant k., default: 10.0

conn_reweight_fn

A function that accepts the following parameters:
- mol - PMI molecule object
- next_resi_idx - residue of the next rigid body or bead
- prev_resi_idx - residue of the previous rigid body or bead
- connectivity_restraint_weight - default weight passed to add_connectivity_restraints

default: None

conn_first_copy_only

Whether to add the restraints only for the first copy of the molecule (useful for symmetrical assemblies), default: False

ca_ca_connectivity_scale

Scale factor applied to the average CA-CA distance of 3.8 A to account for the linker being unlikely to be fully stretched (0.526 corresponds to about 2 A per peptide bond), default: 0.526
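The arithmetic behind the default can be checked with a short sketch: 3.8 A multiplied by 0.526 gives roughly 2 A per peptide bond. The helper expected_linker_length is hypothetical and only shows the scaling; how the software turns this value into a restraint distance is not specified here.

```python
CA_CA_DISTANCE = 3.8            # average CA-CA distance in Angstroms
ca_ca_connectivity_scale = 0.526


def expected_linker_length(n_peptide_bonds, scale=ca_ca_connectivity_scale):
    """Scaled end-to-end length (in Angstroms) of a linker with n peptide bonds."""
    return n_peptide_bonds * CA_CA_DISTANCE * scale


print(f"per peptide bond: {expected_linker_length(1):.2f} A")
print(f"10-bond linker:   {expected_linker_length(10):.2f} A")
```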

ev_restraints

Specification of excluded volume (clash score) restraints. Parameters:
- ‘name’ - a custom name
- ‘weight’ - weight
- ‘repr_resolution’ - which representation resolution to use for this restraint
- ‘copies’ - which molecule copies are included in this restraint
- ‘distance_cutoff’ - distance cutoff for the non-bonded list
- ‘slack’ - slack for the non-bonded list

Read more about distance_cutoff and slack: https://integrativemodeling.org/2.14.0/doc/ref/classIMP_1_1container_1_1ClosePairContainer.html#aa7b183795bd28ab268e5e84a5ad0cd99

default: [{'name': 'ev_restraints', 'weight': 1.0, 'repr_resolution': 10, 'copies': None, 'distance_cutoff': 0.0, 'slack': 5.0}]

add_xlink_restraints

Add crosslink restraints?, default: False

x_xlink_score_type

A type of crosslink restraint.

Options:

‘HarmonicUpperBound’ - 0 below x_xlink_distance, harmonic above, distance calculated between centers of the beads

‘HarmonicUpperBoundSphereDistancePairScore’ - 0 below x_xlink_distance, harmonic above, distance calculated between surfaces of the beads

‘XlinkScore’ - 0 below x_xlink_distance, 1 above

‘LogHarmonic’ - A log harmonic potential with the maximum at x_xlink_distance

‘CombinedHarmonic’ - harmonic above a distance of 35 A and log harmonic with the maximum at x_xlink_distance below 35 A

default: HarmonicUpperBound
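To make the difference between the simplest options concrete, the sketch below writes out two of the score shapes described above as plain Python functions. It is an illustration only: the 0.5 prefactor of the harmonic term is an assumption borrowed from the usual harmonic form 0.5*k*x^2, the value exactly at x_xlink_distance is chosen arbitrarily as 0, and the function names are hypothetical.

```python
def harmonic_upper_bound(distance, x_xlink_distance=30.0, x_k=1.0):
    """Zero up to x_xlink_distance, harmonic penalty on the excess above it."""
    excess = distance - x_xlink_distance
    if excess <= 0:
        return 0.0
    return 0.5 * x_k * excess ** 2  # assumed harmonic form


def xlink_step_score(distance, x_xlink_distance=30.0):
    """Zero up to x_xlink_distance, one above it (shape described for 'XlinkScore')."""
    return 0.0 if distance <= x_xlink_distance else 1.0


for distance in (25.0, 35.0, 40.0):
    print(distance, harmonic_upper_bound(distance), xlink_step_score(distance))
```

The harmonic form penalizes strongly violated crosslinks more and more heavily, whereas the step form counts every violation equally regardless of its size.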

x_min_ld_score

Only crosslinks with the confidence score above this threshold will be used as restraints. The score must be defined in “score” column of crosslink CSV files (see also Xlink Analyzer documentation), default: 30.0

x_weight

Weight of crosslink restraints, default: 1.0

x_xlink_distance

Target or maximal crosslink distance (depending on the implementation), default: 30.0

x_k

Spring constant for the harmonic potential, default: 1.0

x_inter_xlink_scale

Multiply the weight of inter-molecule crosslinks by this value, default: 1.0

x_first_copy_only

Apply crosslinks only to the first molecule copy of each series, default: False

x_skip_pair_fn

A Python function to skip specific crosslinks. Arguments: p1, p2, component1, component2, xlink. Return True to skip the crosslink, default: None

x_log_filename

Optional log file name for printing more information about added and skipped crosslinks, default: None

x_score_weighting

Scale crosslink weights by their score, default: False

xlink_reweight_fn

A custom Python function to scale crosslink weights. Arguments: xlink, weight. Returns the final weight (float), default: None

x_random_sample_fraction

Take this random fraction of crosslinks for modeling, default: 1.0

add_symmetry_constraints

Add symmetry constraints?, default: False

add_symmetry_restraints

Add symmetry restraints?, default: False

symmetry_restraints_weight

Weight of symmetry restraints, default: 1.0

add_parsimonious_states_restraints

Add parsimonious states restraints?, default: False

parsimonious_states_weight

Weight, default: 1.0

parsimonious_states_distance_threshold

Distance threshold for elastic network restraining the states, default: 0.0

parsimonious_states_exclude_rbs

Python function to exclude selected rigid bodies from the restraint. Arguments: IMP’s rigid body object. Returns True if the rigid body should be excluded, default: Some default function

parsimonious_states_representation_resolution

Representation resolution for this restraint, default: 10

parsimonious_states_restrain_rb_transformations

Restrain rigid body transformations instead of using an elastic network, default: True

create_custom_restraints

Python function to create custom restraints. Arguments: imp_utils1.MultiRepresentation class. Returns a dictionary mapping restraint names to lists of restraints, default: None

discrete_restraints_weight

Weight for restraints derived from the pre-defined positions, e.g. precomputed libraries of fits., default: 1.0

discrete_mover_weight_score_fn

help message, default: None

scoring_functions

A collection of scoring functions, default: {}

score_func

Name of the scoring function in the scoring_functions collection to be used for the main optimization, default: None

score_func_for_CG

Name of the scoring function in the scoring_functions collection to be used for Conjugate Gradient steps. Only restraints that implement derivative calculation can be used for Conjugate Gradient optimization, default: None

score_func_ini_opt

Name of the scoring function in the scoring_functions collection to be used for the initial optimization, default: None

score_func_preconditioned_mc

Name of the scoring function in the scoring_functions collection to be used for the preconditioned Monte Carlo, default: None

add_custom_movers

A Python function to add custom Monte Carlo movers, default: None

custom_preprocessing

A Python function with code that should be executed before modeling protocols are initiated and after adding and setting all restraints. Arguments: imp_utils1.MultiRepresentation class., default: None

rb_max_rot

Maximal rotation of a rigid body in a single Monte Carlo move, in radians, default: 0.2

rb_max_trans

Maximal translation of a rigid body in a single Monte Carlo move, in Angstroms, default: 2

beads_max_trans

Maximal translation of a flexible bead in a single Monte Carlo move, in Angstroms, default: 1

randomize_initial_positions

Randomize initial positions of all particles?, default: False

randomize_initial_positions_remove_clashes

Randomize initial positions of all particles and run short optimization to try removing steric clashes?, default: False

get_movers_for_main_opt

Python function defining movers for the main optimization, override for custom implementations, default: Some default function

get_movers_for_ini_opt

Python function defining movers for the initial optimization, override for custom implementations, default: Some default function

get_movers_for_refine

Python function defining movers for the refinement, override for custom implementations, default: Some default function

debug

Print debug messages?, default: False

ntasks

Number of tasks to run for multiprocessor runs, default: 1

cluster_submission_command

Command to run the cluster submission script, default: None

run_script_templ

A template for running the jobs on a computer cluster. Make sure your template includes $ntasks, ${prefix}, $outdir, and $cmd, default:

"""

    #!/bin/bash
    #
    #SBATCH --ntasks=$ntasks
    #SBATCH --mem-per-cpu=2000
    #SBATCH --job-name=${prefix}fit
    #SBATCH --time=5-00:00:00
    #SBATCH -e $outdir/logs/${prefix}_err.txt
    #SBATCH -o $outdir/logs/${prefix}_out.txt

    echo "Running on:"
    srun hostname

    $cmd

    wait #necessary when ntasks > 1 and cmd are run in the background, otherwise job may end before the background processes end

"""