Sampling exhaustiveness and precision
=====================================

Run the sampling performance analysis with the ``imp_sampcon`` tool (described by `Viswanath et al. 2017 `_).

#. Enter the output modelling directory.

#. Prepare the ``density.txt`` file:

   .. code-block:: bash

      create_density_file.py --project_dir config.json --by_rigid_body

#. For complexes containing multiple copies of the same subunit, prepare the ``symm_groups.txt`` file, which stores the information needed to properly align homo-oligomeric structures:

   .. code-block:: bash

      create_symm_groups_file.py --project_dir config.json params.py

   By default, all molecule copies of the same subunit are grouped together, which should be sufficient in most cases. In special cases where subunits are assigned to specific series, group by series with the ``--by series`` option:

   .. code-block:: bash

      create_symm_groups_file.py --project_dir --by series config.json params.py

   To additionally merge series into bigger groups, use the ``--extra_series_groups`` option, e.g.:

   .. code-block:: bash

      create_symm_groups_file.py \
          --project_dir \
          --by series \
          --extra_series_groups "NR_1,NR2;CR_1,CR_2" \
          config.json params.py

   This groups by series and additionally treats the selected series as ambiguous with each other. The value is quoted so that the shell does not interpret ``;`` as a command separator.

#. Run the ``setup_analysis.py`` script to prepare input files for the sampling exhaustiveness analysis based on your resulting models:

   .. code-block:: bash

      setup_analysis.py -s \
          -o \
          -d \
          -n \
          -k

   Example:

   .. code-block:: bash

      setup_analysis.py -s all_scores.csv -o analysis -d density.txt -n 20000

   To see the available options and default values, run:

   .. code-block:: bash

      setup_analysis.py -h

#. Run the ``imp_sampcon exhaust`` tool (a command-line tool provided with `IMP `_) to perform the actual analysis:

   .. code-block:: bash

      cd
      imp_sampcon exhaust -n \
          --rmfA sample_A/sample_A_models.rmf3 \
          --rmfB sample_B/sample_B_models.rmf3 \
          --scoreA scoresA.txt --scoreB scoresB.txt \
          -d /density.txt \
          -m \
          -c \
          -gp \
          -g

   To see the available options and default values for ``imp_sampcon exhaust``, run:

   .. code-block:: bash

      imp_sampcon exhaust -h

   If the analysis will run on a Slurm-based cluster, write a bash script like the following and submit it with ``sbatch``:

   .. code-block:: bash

      #!/bin/bash
      #SBATCH --job-name=master_sampling_20000.job
      #SBATCH --output=./master_sampling_20000.out
      #SBATCH --error=./master_sampling_20000.err
      #SBATCH --nodes=1
      #SBATCH --time=10:00:00
      #SBATCH --qos=highest
      #SBATCH --cpus-per-task=15
      #SBATCH --mem-per-cpu=4000

      imp_sampcon exhaust -n CR_Y_test --rmfA sample_A/sample_A_models.rmf3 --rmfB sample_B/sample_B_models.rmf3 --scoreA scoresA.txt --scoreB scoresB.txt -d density.txt -m cpu_omp -c 15 -gp -g 5.0

#. In the output you will get, among other files:

   * ``.Sampling_Precision_Stats.txt`` with the estimated sampling precision.
   * Clusters obtained by clustering at that sampling precision, in directories and files whose names start with ``cluster``, containing information about the models in each cluster and the cluster localization densities.
   * ``.Cluster_Precision.txt`` listing the precision of each cluster.
   * PDF files with plots of the exhaustiveness test results.

   See `Viswanath et al. 2017 `_ for a detailed explanation of these concepts.

#. Optimize the plots

   The fonts and value ranges on the X and Y axes of the default plots from ``imp_sampcon exhaust`` are frequently not optimal, so you have to adjust them manually.

   #. Copy the original ``gnuplot`` scripts to the current ``analysis`` directory by executing:

      .. code-block:: bash

         copy_sampcon_gnuplot_scripts.py

      This will copy four scripts to the current directory:

      * ``Plot_Cluster_Population.plt`` for the ``.Cluster_Population.pdf`` plot
      * ``Plot_Convergence_NM.plt`` for the ``.ChiSquare.pdf`` plot
      * ``Plot_Convergence_SD.plt`` for the ``.Score_Dist.pdf`` plot
      * ``Plot_Convergence_TS.plt`` for the ``.Top_Score_Conv.pdf`` plot

   #. Edit the scripts to suit your needs.

   #. Run the scripts again:

      .. code-block:: bash

         gnuplot -e "sysname=''" Plot_Cluster_Population.plt
         gnuplot -e "sysname=''" Plot_Convergence_NM.plt
         gnuplot -e "sysname=''" Plot_Convergence_SD.plt
         gnuplot -e "sysname=''" Plot_Convergence_TS.plt

      For example:

      .. code-block:: bash

         gnuplot -e "sysname='elongator'" Plot_Cluster_Population.plt
         gnuplot -e "sysname='elongator'" Plot_Convergence_NM.plt
         gnuplot -e "sysname='elongator'" Plot_Convergence_SD.plt
         gnuplot -e "sysname='elongator'" Plot_Convergence_TS.plt

#. Extract cluster models

   For example, to extract the 5 top-scoring models:

   .. code-block:: bash

      extract_cluster_models.py \
          --project_dir \
          --outdir cluster.0/ \
          --ntop 5 \
          --scores ../all_scores.csv \
          Identities_A.txt Identities_B.txt cluster.0.all.txt ../config.json

#. If the exhaustiveness is not met, run more jobs: :doc:`run_more`
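
When the plots are tuned iteratively, the four ``gnuplot`` invocations from the plot optimization step have to be repeated after every edit. The following is a minimal sketch of a wrapper for that, not part of IMP or of this package: the function name ``replot_sampcon`` and the ``DRY_RUN`` variable are hypothetical, and it assumes the four ``.plt`` scripts have already been copied to the current directory.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: re-render all four imp_sampcon plots for one system name.
   # replot_sampcon and DRY_RUN are hypothetical names used for illustration.
   set -euo pipefail

   replot_sampcon() {
       local sysname="$1"
       local plt
       for plt in Plot_Cluster_Population.plt Plot_Convergence_NM.plt \
                  Plot_Convergence_SD.plt Plot_Convergence_TS.plt; do
           if [ "${DRY_RUN:-0}" = "1" ]; then
               # Print the command instead of running it.
               printf 'gnuplot -e "sysname='\''%s'\''" %s\n' "$sysname" "$plt"
           else
               # Same invocation as in the manual step, one script at a time.
               gnuplot -e "sysname='${sysname}'" "$plt"
           fi
       done
   }

After sourcing the file, ``DRY_RUN=1 replot_sampcon elongator`` prints the four commands without executing them, and ``replot_sampcon elongator`` runs them.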