Parameter file
==============

The **parameter file** defines the modeling protocol, scoring functions, output parameters and some restraints.

The parameter file is written in the Python programming language, but really, no programming is required!

Creating the parameter file
---------------------------

#. Define the protocol:

   For example, for the :doc:`combinations`

   .. code-block:: python

       protocol = 'denovo_MC-SA'

   See parameters below for available protocols and guidelines for :doc:`combinations`, :doc:`recombinations`, :doc:`refinement`

#. Define Monte Carlo Simulated Annealing schedule:

   For example:

   .. code-block:: python

       SA_schedule = [
           (50000, 10000),
           (10000, 10000),
           (5000, 10000),
           (1000, 50000),
           (100, 50000)
       ]

   SA_schedule is a list of pairs. Each pair defines a Monte Carlo temperature and number of steps. The Monte Carlo Simulated Annealing will run the specified number of steps at the first temperature, then switch to the second temperature in the list and so on.

   .. note:: **How to select the temperatures and number of steps?**

      Unfortunately, there is no absolute way. One usually first performs a small number of test runs and evaluates the output to choose them.

      The **temperature** should be of a similar order of magnitude as the obtained scores (e.g. if scores are in the 100,000 range, the temperature could be in the 10,000-100,000 range too) and should take into account the acceptance rates of moves (listed in the output log files). If the acceptance rates are low, you may want to increase the temperatures.

      You can always calculate the Metropolis probability of accepting a move given your score scales. For example, if your score moved up from 100,000 to 110,000 (difference 10,000) you can calculate the probability of accepting this move by this equation:

      .. image:: images/metropolis.png
         :width: 300
         :alt: Metropolis

      substituting ``k=1``, ``T=``, ``deltaScore=10,000``

      The **number of steps** can be adjusted after evaluating the convergence of the initial runs.
      Also, the number of steps is always a compromise between the time needed for convergence and available computational resources.

#. Optionally, define an initial optimization:

   .. code-block:: python

       do_ini_opt = True
       ini_opt_SA_schedule = [
           (1000, 1000)
       ]

   The initial optimization can be used, for example, to equilibrate the system with different restraints than in the main optimization.

#. Optionally, adjust log files and frequency of saving logs and frames:

   .. code-block:: python

       traj_frame_period = 10
       print_frame_period = traj_frame_period

       print_log_scores_to_files = True
       print_log_scores_to_files_frame_period = 10

       print_total_score_to_files = True
       print_total_score_to_files_frame_period = 10

#. Adjust representation resolutions for your molecules:

   For example, to define two resolutions, 1 and 10 residues per bead:

   .. code-block:: python

       struct_resolutions = [1,10]

#. Decide whether to define rigid bodies based on the :doc:`input_structures` definition in the :doc:`json`:

   .. code-block:: python

       add_rbs_from_pdbs = True

#. Set the weight for :doc:`connectivity_restraints` and whether they should be applied only to the first copy of each series (should be True for symmetry **constrained** complexes):

   .. code-block:: python

       connectivity_restraint_weight = 1.0
       conn_first_copy_only = True

#. Add symmetry constraints and/or restraints:

   For symmetry **constraints**:

   .. code-block:: python

       add_symmetry_constraints = True

   For symmetry **restraints**:

   .. code-block:: python

       add_symmetry_restraints = True

#. Optionally, set parameters for cross-link restraints:

   .. code-block:: python

       add_xlink_restraints = True
       x_min_ld_score = 25
       x_weight = 100.0
       x_xlink_distance = 30
       x_xlink_score_type='HarmonicUpperBound'

#. For :doc:`combinations` and :doc:`recombinations`, set the weight for "discrete restraints":

   "Discrete restraints" is just the name used for the restraints derived from the :doc:`fit_libs`

   .. code-block:: python

       discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries

#. Define excluded volume (steric) restraints:

   .. code-block:: python

       ev_restraints = [
           {
               'name': 'ev_restraints_lowres',
               'weight': 10,
               'repr_resolution': 10
           },
           {
               'name': 'ev_restraints_highres',
               'weight': 10,
               'repr_resolution': 1
           }
       ]

   In this block you define a list of possible excluded volume restraints. Each gets a custom name, a weight and the representation resolution it is applied to. You decide later, in the scoring functions, which of them will be used.

   The representation resolution will match the closest existing one, e.g. if you have mixed representation resolutions, 10 and 1 for rigid bodies and 1 for flexible beads, and specify ``repr_resolution`` of 10, it will be matched to the resolution 10 of rigid bodies and 1 of flexible beads.

#. Define scoring functions:

   In the example below, two scoring functions are defined, ``score_func_lowres`` and ``score_func_highres``. These are arbitrary names, choose them to your liking.

   Each scoring function has a list of ``restraints``, listing names defined above and any other restraints defined in the JSON.

   Names of some restraints are predefined, such as:

   * discrete_restraints - restraints derived from the pre-defined positions, e.g. precomputed libraries of fits
   * conn_restraints - connectivity restraints
   * xlink_restraints - cross-link restraints

   Other restraints bear the names defined above (e.g. ``ev_restraints_lowres`` and ``ev_restraints_highres``) or in the :doc:`json`.

   .. code-block:: python

       scoring_functions = {
           'score_func_lowres': {
               'restraints': [
                   'discrete_restraints',
                   'conn_restraints',
                   'ev_restraints_lowres'
               ]
           },
           'score_func_highres': {
               'restraints': [
                   'discrete_restraints',
                   'xlink_restraints',
                   'conn_restraints',
                   'ev_restraints_highres'
               ]
           }
       }

   In the example above, two scoring functions are defined. One uses the low resolution excluded volume restraint and does not use crosslinks.
   The second uses the high resolution representation and turns on crosslinks.

#. Define which scoring function should be used for each modeling stage:

   .. code-block:: python

       score_func_ini_opt = 'score_func_lowres'
       score_func = 'score_func_highres'

   The ``score_func_ini_opt`` parameter defines which of the scoring functions defined above should be used for the **initial** optimization.

   The ``score_func`` parameter defines which of the scoring functions defined above should be used for the **main** optimization.

#. Finally, define running parameters for modeling (multiprocessing, cluster):

   For example, for a multiprocessor workstation with 8 cores, just define:

   .. code-block:: python

       ntasks = 8

   For a cluster using the Slurm queuing system:

   .. code-block:: python

       cluster_submission_command = 'sbatch'
       from string import Template
       run_script_templ = Template("""#!/bin/bash
       #
       #SBATCH --ntasks=$ntasks
       #SBATCH --mem-per-cpu=2000
       #SBATCH --job-name=${prefix}fit
       #SBATCH --time=00:30:00
       #SBATCH -e $outdir/logs/${prefix}_err.txt
       #SBATCH -o $outdir/logs/${prefix}_out.txt

       echo "Running on:"
       srun hostname

       $cmd

       wait
       """)

In summary, the above parameter file looks like this:

.. code-block:: python

    protocol = 'denovo_MC-SA'

    SA_schedule = [
        (50000, 10000),
        (10000, 10000),
        (5000, 10000),
        (1000, 50000),
        (100, 50000)
    ]

    do_ini_opt = True
    ini_opt_SA_schedule = [
        (1000, 1000)
    ]

    traj_frame_period = 10
    print_frame_period = traj_frame_period

    print_log_scores_to_files = True
    print_log_scores_to_files_frame_period = 10

    print_total_score_to_files = True
    print_total_score_to_files_frame_period = 10

    struct_resolutions = [1,10]
    add_rbs_from_pdbs = True

    connectivity_restraint_weight = 1.0
    conn_first_copy_only = True

    add_symmetry_constraints = True

    add_xlink_restraints = True
    x_min_ld_score = 25
    x_weight = 100.0
    x_xlink_distance = 30
    x_xlink_score_type='HarmonicUpperBound'

    discrete_restraints_weight=10000 # weight for the restraint derived from the fit libraries

    ev_restraints = [
        {
            'name': 'ev_restraints_lowres',
            'weight': 10,
            'repr_resolution': 10
        },
        {
            'name': 'ev_restraints_highres',
            'weight': 10,
            'repr_resolution': 1
        }
    ]

    scoring_functions = {
        'score_func_lowres': {
            'restraints': [
                'discrete_restraints',
                'conn_restraints',
                'ev_restraints_lowres'
            ]
        },
        'score_func_highres': {
            'restraints': [
                'discrete_restraints',
                'xlink_restraints',
                'conn_restraints',
                'ev_restraints_highres'
            ]
        }
    }

    score_func_ini_opt = 'score_func_lowres'
    score_func = 'score_func_highres'

    cluster_submission_command = 'sbatch'
    from string import Template
    run_script_templ = Template("""#!/bin/bash
    #
    #SBATCH --ntasks=$ntasks
    #SBATCH --mem-per-cpu=2000
    #SBATCH --job-name=${prefix}fit
    #SBATCH --time=00:30:00
    #SBATCH -e $outdir/logs/${prefix}_err.txt
    #SBATCH -o $outdir/logs/${prefix}_out.txt

    echo "Running on:"
    srun hostname

    $cmd

    wait
    """)

Parameters
----------

protocol
    Modeling protocol. Options:

    * denovo_MC-SA - Monte Carlo Simulated Annealing global optimization moving rigid bodies according to pre-computed position libraries, if they are defined for the rigid bodies, otherwise moving with random rotations and translations.
      Flexible beads are moved through Monte Carlo with random rotations and translations.
    * denovo_MC-SA-CG - as denovo_MC-SA, but flexible beads are moved using Conjugate Gradient optimization
    * refine - both rigid bodies and flexible beads are moved with random rotations and translations, pre-computed libraries are ignored. Flexible beads are moved using Conjugate Gradient optimization.
    * all_combinations - generates all combinations of positions from pre-computed position libraries
    * custom - a custom protocol based on a user-defined function custom_protocol() defined in params.

    default: **denovo_MC-SA**

do_ini_opt
    Perform initial optimization using the score_func_ini_opt scoring function?, default: **False**

SA_schedule
    Simulated Annealing schedule. List of (temperature, number of steps) pairs., default: **[(30000, 1000), (2000, 1000), (1000, 1000)]**

before_opt_fn
    A Python function with code that should be executed before optimization, default: **None**

number_of_cg_steps_for_flex_beads
    Number of Conjugate Gradient steps per round in denovo_MC-SA-CG and refine protocols, default: **100**

stop_on_convergence
    Whether to stop the current stage of Simulated Annealing when converged. Works well only for optimizations with a high number of steps., default: **False**

no_frames_for_convergence
    Number of frames for evaluating convergence when stop_on_convergence is set to True, default: **1000**

print_frame_period
    Print every Nth frame ID in progress reporting, default: **10**

traj_frame_period
    Save every Nth frame to the trajectory output file, default: **10**

print_total_score_to_files
    Save frame total scores to log files?, default: **False**

print_total_score_to_files_frame_period
    Print total score to log files every Nth frame, default: **10**

print_log_scores_to_files
    Save frame individual restraint scores to log files?, default: **False**

print_log_scores_to_files_frame_period
    Print individual restraint scores to log files every Nth frame, default:
    **10**

struct_resolutions
    Resolutions of bead representations., default: **[1, 10]**

add_missing
    Add missing regions as flexible beads? Either False or a list of selectors., default: **False**

missing_resolution
    Representation resolution for the missing regions (if add_missing is specified), default: **1**

add_rbs_from_pdbs
    Define rigid bodies automatically based on pdb_files specification in JSON?, default: **True**

ca_only
    Read only Calpha from PDB structures?, default: **True**

add_connectivity_restraints
    Add domain and bead connectivity restraints?, default: **True**

connectivity_restraint_weight
    Connectivity restraint weight, default: **1.0**

max_conn_gap
    Number of missing residues of the missing atomic region above which the restraint will not be added, default: **None**

connectivity_restraint_k
    Connectivity restraint spring constant k., default: **10.0**

conn_reweight_fn
    A function that accepts the following parameters:

    * mol - PMI molecule object
    * next_resi_idx - residue of the next rigid body or bead
    * prev_resi_idx - residue of the previous rigid body or bead
    * connectivity_restraint_weight - default weight passed to add_connectivity_restraints

    default: **None**

conn_first_copy_only
    Whether to add the restraints only for the first copy of the molecule (useful for symmetrical assemblies), default: **False**

ca_ca_connectivity_scale
    Scale the average CA-CA distance of 3.8 to account for the fact that the linker is unlikely to be fully stretched (0.526 corresponds to about 2A per peptide bond), default: **0.526**

ev_restraints
    Specification of excluded volume (clash score) restraint.
    Parameters:

    * 'name': a custom name
    * 'weight': weight
    * 'repr_resolution': which representation resolution to use for this restraint
    * 'copies': which molecule copies are included in this restraint
    * 'distance_cutoff': distance cutoff for non-bonded list
    * 'slack': slack for non-bonded lists

    Read more about distance_cutoff and slack: https://integrativemodeling.org/2.14.0/doc/ref/classIMP_1_1container_1_1ClosePairContainer.html#aa7b183795bd28ab268e5e84a5ad0cd99

    default: **[{'name': 'ev_restraints', 'weight': 1.0, 'repr_resolution': 10, 'copies': None, 'distance_cutoff': 0.0, 'slack': 5.0}]**

add_xlink_restraints
    Add crosslink restraints?, default: **False**

x_xlink_score_type
    A type of crosslink restraint. Options:

    * 'HarmonicUpperBound' - 0 below x_xlink_distance, harmonic above, distance calculated between centers of the beads
    * 'HarmonicUpperBoundSphereDistancePairScore' - 0 below x_xlink_distance, harmonic above, distance calculated between surfaces of the beads
    * 'XlinkScore' - 0 below x_xlink_distance, 1 above
    * 'LogHarmonic' - a log harmonic potential with the maximum at x_xlink_distance
    * 'CombinedHarmonic' - harmonic above distance of 35 A and log harmonic with the maximum at x_xlink_distance below 35 A

    default: **HarmonicUpperBound**

x_min_ld_score
    Only crosslinks with the confidence score above this threshold will be used as restraints. The score must be defined in the "score" column of crosslink CSV files (see also Xlink Analyzer documentation), default: **30.0**

x_weight
    Weight of crosslink restraints, default: **1.0**

x_xlink_distance
    Target or maximal crosslink distance (depending on the implementation), default: **30.0**

x_k
    Spring constant for the harmonic potential, default: **1.0**

x_inter_xlink_scale
    Multiply the weight of inter-molecule crosslinks by this value, default: **1.0**

x_first_copy_only
    Apply crosslinks only to the first molecule copy of each series, default: **False**

x_skip_pair_fn
    A Python function to skip specific crosslinks.
    Arguments: p1, p2, component1, component2, xlink. Return True to skip the crosslink, default: **None**

x_log_filename
    Optional log file name for printing more information about added and skipped crosslinks, default: **None**

x_score_weighting
    Scale crosslink weights by their score, default: **False**

xlink_reweight_fn
    A custom Python function to scale crosslink weights. Arguments: xlink, weight. Return the final weight (float), default: **None**

x_random_sample_fraction
    Take this random fraction of crosslinks for modeling, default: **1.0**

add_symmetry_constraints
    Add symmetry constraints?, default: **False**

add_symmetry_restraints
    Add symmetry restraints?, default: **False**

symmetry_restraints_weight
    Weight of symmetry restraints, default: **1.0**

add_parsimonious_states_restraints
    Add parsimonious states restraints?, default: **False**

parsimonious_states_weight
    Weight of parsimonious states restraints, default: **1.0**

parsimonious_states_distance_threshold
    Distance threshold for elastic network restraining the states, default: **0.0**

parsimonious_states_exclude_rbs
    Python function to exclude selected rigid bodies from the restraint. Arguments: IMP's rigid body object. Return: True if the rigid body should be excluded., default: Some default function

parsimonious_states_representation_resolution
    Representation resolution for this restraint, default: **10**

parsimonious_states_restrain_rb_transformations
    Restrain rigid body transformations instead of using elastic network, default: **True**

create_custom_restraints
    Python function to create custom restraints. Arguments: imp_utils1.MultiRepresentation class. Return a dictionary mapping restraint names to lists of restraints, default: **None**

discrete_restraints_weight
    Weight for restraints derived from the pre-defined positions, e.g.
    precomputed libraries of fits., default: **1.0**

discrete_mover_weight_score_fn
    help message, default: **None**

scoring_functions
    A collection of scoring functions, default: **{}**

score_func
    Name of the scoring function in the scoring_functions collection to be used for the main optimization, default: **None**

score_func_for_CG
    Name of the scoring function in the scoring_functions collection to be used for conjugate gradient steps. Only restraints that have implemented derivative calculation can be used for Conjugate Gradient optimization., default: **None**

score_func_ini_opt
    Name of the scoring function in the scoring_functions collection to be used for the initial optimization, default: **None**

score_func_preconditioned_mc
    Name of the scoring function in the scoring_functions collection to be used for the preconditioned Monte Carlo, default: **None**

add_custom_movers
    A Python function to add custom Monte Carlo movers, default: **None**

custom_preprocessing
    A Python function with code that should be executed before modeling protocols are initiated and after adding and setting all restraints.
    Arguments: imp_utils1.MultiRepresentation class., default: **None**

rb_max_rot
    Maximal rotation of a rigid body in a single Monte Carlo move, in radians, default: **0.2**

rb_max_trans
    Maximal translation of a rigid body in a single Monte Carlo move, in Angstroms, default: **2**

beads_max_trans
    Maximal translation of a flexible bead in a single Monte Carlo move, in Angstroms, default: **1**

randomize_initial_positions
    Randomize initial positions of all particles?, default: **False**

randomize_initial_positions_remove_clashes
    Randomize initial positions of all particles and run short optimization to try removing steric clashes?, default: **False**

get_movers_for_main_opt
    Python function defining movers for the main optimization, override for custom implementations, default: Some default function

get_movers_for_ini_opt
    Python function defining movers for the initial optimization, override for custom implementations, default: Some default function

get_movers_for_refine
    Python function defining movers for the refinement, override for custom implementations, default: Some default function

debug
    Print debug messages?, default: **False**

ntasks
    Number of tasks to run for multiprocessor runs, default: **1**

cluster_submission_command
    Command to run the cluster submission script, default: **None**

run_script_templ
    A template for running the jobs on a computer cluster. Make sure your template includes $ntasks, ${prefix}, $outdir, and $cmd, default:

    .. code-block:: python

        """
        #!/bin/bash
        #
        #SBATCH --ntasks=$ntasks
        #SBATCH --mem-per-cpu=2000
        #SBATCH --job-name=${prefix}fit
        #SBATCH --time=5-00:00:00
        #SBATCH -e $outdir/logs/${prefix}_err.txt
        #SBATCH -o $outdir/logs/${prefix}_out.txt

        echo "Running on:"
        srun hostname

        $cmd

        wait #necessary when ntasks > 1 and cmd are run in the background, otherwise job may end before the background processes end
        """
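
To check a custom ``run_script_templ`` before submitting jobs, it can help to render it once by hand. The sketch below uses only the standard library ``string.Template`` class and a shortened template; the values passed for ``ntasks``, ``prefix``, ``outdir`` and ``cmd`` are hypothetical placeholders, since how the modeling software fills them in is not described here.

```python
from string import Template

# Shortened version of a run_script_templ, with the four expected placeholders.
run_script_templ = Template("""#!/bin/bash
#SBATCH --ntasks=$ntasks
#SBATCH --job-name=${prefix}fit
#SBATCH -e $outdir/logs/${prefix}_err.txt
#SBATCH -o $outdir/logs/${prefix}_out.txt

$cmd
wait
""")

# Hypothetical values, for illustration only.
values = {
    'ntasks': 8,
    'prefix': 'run1',
    'outdir': 'out',
    'cmd': 'echo "modeling command goes here"',
}

# substitute() raises KeyError if the template uses a placeholder
# that is missing from `values`, which catches misspelled names early.
script = run_script_templ.substitute(values)
print(script)
```

Using ``substitute()`` rather than ``safe_substitute()`` is a deliberate choice in this sketch: a misspelled placeholder fails loudly instead of ending up verbatim in the submitted script.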
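
The acceptance probability mentioned in the note on selecting temperatures can be tabulated with a few lines of Python. This is a minimal sketch that assumes the standard Metropolis criterion, ``p = min(1, exp(-deltaScore / (k * T)))`` with ``k = 1``; the helper name ``metropolis_probability`` is hypothetical and not part of the modeling software, and the temperatures are simply those from the example ``SA_schedule``.

```python
import math


def metropolis_probability(delta_score, temperature, k=1.0):
    """Probability of accepting a move that changes the score by delta_score."""
    if delta_score <= 0:
        # Moves that do not worsen the score are always accepted.
        return 1.0
    return math.exp(-delta_score / (k * temperature))


# Score moved up from 100,000 to 110,000, as in the example in the note.
delta_score = 110000 - 100000

# Temperatures taken from the example SA_schedule (assumption: k = 1).
for temperature in (50000, 10000, 5000, 1000, 100):
    p = metropolis_probability(delta_score, temperature)
    print(f"T={temperature:>6}: p_accept={p:.3g}")
```

With a score difference of 10,000, a temperature of 10,000 gives ``exp(-1)``, about 0.37, while much lower temperatures make such an uphill move very unlikely to be accepted.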