PartitionFinder2 on XSEDE 2.1.1 Selecting best-fit partitioning schemes and models of evolution Robert Lanfear R.Lanfear, P.B. Frandsen, A.M. Wright, T. Senfeld, B. Calcott (2016) PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol Biol Evol 2016 msw260. doi: 10.1093/molbev/msw260 Phylogenetics partitionfinder2_xsede pfinder2_comet perl "" 1 pfinder2_dna perl $datatype eq "DNA" perl "" 1 pfinder2_protein perl $datatype eq "Protein" perl "" 1 pfinder2_morphology perl $datatype eq "Morphology" perl "" 1 create_configfile1 partition_finder.cfg perl !defined $cfg_select 2 perl "# ALIGNMENT FILE #\\n" . "alignment = infile.phy;\\n" create_configfile1b partition_finder.cfg 2 perl $supply_usertreetop && !defined $cfg_select perl "user_tree_topology = tree.phy\\n" create_configfile2 partition_finder.cfg perl !defined $cfg_select 3 perl "# BRANCHLENGTHS #\\n" . "branchlengths = $branch_lengths;\\n" create_configfile3 partition_finder.cfg 4 perl defined $models_choice && !defined $cfg_select perl "# MODELS OF EVOLUTION #\\n" . "models=$models_choice;\\n" . "model_selection = $model_selection;\\n" create_configfile3b partition_finder.cfg 4 perl $models_choicestring && !defined $cfg_select perl "# MODELS OF EVOLUTION #\\n" . "models=$models_choice2;\\n" . "model_selection = $model_selection;\\n" create_configfile4 partition_finder.cfg 5 perl $num_datablocks > 0 perl "# DATA BLOCKS #\\n" . "[data_blocks]\\n" create_configfile6 partition_finder.cfg perl defined $search_choice 20 perl "# SCHEMES #\\n" . "[schemes]\\n" . "search = $search_choice;\\n" number_nodes 2 scheduler.conf perl "nodes=1\\n" . "node_exclusive=0\\n" . 
"threads_per_process=8\\n" num_processes perl "--processes=8" 2 infile Input File infile.phy all_results * runtime 1 scheduler.conf Maximum Hours to Run (up to 168 hours) 0.5 The maximum hours to run must be less than 168 perl $runtime > 168.0 The maximum hours to run must be greater than 0.05 perl $runtime < 0.05 perl "runhours=$value\\n" 8 The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 X $runtime cpu hours perl $runtime > 0 Estimate the maximum time your job will need to run. We recommend testing initially with a < 0.5hr test run because Jobs set for 0.5 h or less depedendably run immediately in the "debug" queue. Once you are sure the configuration is correct, you then increase the time. The reason is that jobs > 0.5 h are submitted to the "normal" queue, where jobs configured for 1 or a few hours times may run sooner than jobs configured for the full 168 hours. datatype DNA Protein Morphology DNA Please select a data type perl !defined $datatype cfg_select Select cgf file (you can also create one below) partition_finder.cfg 2 If you provide a.cfg file, you please be sure it specifies alignment=infile.phy perl defined $cfg_select chk_only Just check the config file 0 perl $value ? "-c ":"" 3 show_pyexceptions If errors occur, print the python exceptions 0 perl $value ? "--show-python-exceptions":"" 4 run_quick Avoid anything slow (--quick) 0 perl $value ? "-q":"" 5 Avoid anything slow (like writing schemes at each step),useful for very large datasets. use_raxml Use RAxML (rather than PhyML, -r). See the manual. 0 perl $value ? "--raxml":"" 6 use_tiger Use kmeans=tiger (morphological data only) 0 perl $value ? 
"--kmeans tiger":"--kmeans entropy" 7 --kmeans=type This defines which sitewise values to use: entropy or tiger --kmeans entropy: use entropies for sitewise values --kmeans tiger: use TIGER rates for sitewise values (only valid for Morphology) rcluster_percent Proportion of possible schemes that the relaxed clustering algorithm will consider (-rcluster-percent) 10.0 perl " --rcluster-percent=$value" 8 --rcluster-percent=N This defines the proportion of possible schemes that the relaxed clustering algorithm will consider before it stops looking. The default is 10%. e.g. --rcluster-percent 10.0 rcluster_max Number of possible schemes that the relaxed clustering algorithm will consider (-rcluster-max) perl (defined $value) ? "--rcluster-percent=$value":"" 9 This defines the number of possible schemes that the relaxed clustering algorithm will consider before it stops looking. The default is to look at the larger value out of 1000, and 10 times the number of data blocks you have. e.g. --rcluster-max 1000 min_subset Minimum subset size that the kmeans and rcluster algorithm will accept ( --min-subset-size) 100 perl ($value && $value ne $vdef) ? " --min-subset-size=$value":"" 10 --min-subset-size=N This defines the minimum subset size that the kmeans and rcluster algorithm will accept. Subsets smaller than this will be merged at with other subsets at the end of the algorithm (for kmeans) or at the start of the algorithm (for rcluster). See manual for details. The default value for kmeans is 100. The default value for rcluster is to ignore this option. e.g. --min- subset-size 100 debug_option Provide comma-separated debug regions to output extra information about what the program is doing. (--debug-output=) perl " --debug-output=$value" 11 --debug-output=REGION,REGION,... (advanced option) Provide a list of debug regions to output extra information about what the program is doing. 
Possible regions are 'all' or any of {subset,subset_ops,neighbour,raxml,parser,model_util,results,entropy,alignment,threadpool,progress,main,config,pandas,reporter,kmeans,pandas.io.gbq,pandas.io,morph_tige,analysis_m,util,scheme,submodels,database,analysis,phyml,raxml_mode,model_load,phyml_mode}. all_states PartitionFinder should not produce subsets that do not have all possible states present (--all-states) perl $value ? " --all-states":"" 12 --all-states In the kmeans and rcluster algorithms, this stipulates that PartitionFinder should not produce subsets that do not have all possible states present. E.g. for DNA sequence data, all subsets in the final scheme must have A, C, T, and G nucleotides present. This can occasionally be useful for downstream analyses, particularly concerning amino acid datasets. config_file Create Configuration file branch_lengths Linked branch lengths? perl !defined $cfg_select linked unlinked Please specify linked or unlinked branch lengths perl !defined $branch_lengths models_choice Select the model perl !defined $cfg_select all allx beast mrbayes gamma gammai models_choicestring Specify a model list perl !defined $models_choice && !defined $cfg_select Please select a model or enter a model list perl !defined $models_choice && !$models_choicestring For models = list; enter any list of models appropriate for the data type. If you are not sure which models are possible, you can either study the models.csv file (in the /partfinder folder) or just try out a list. If you include a model that won’t work, PF2 will tell you which models didn’t work in an error message before your analysis gets underway. Each model in the list should be separated by a comma. 
For example, if I were only interested in a few nucleotide models in PartitionFinder, I might do this: models = JC, JC+G, HKY, HKY+G, GTR, GTR+G; models_choice2 Enter the model list perl $models_choicestring model_selection Select the metric for model selection (model_selection) perl !defined $cfg_select aic aicc bic Please select a metric for the model perl !defined $model_selection In general, you should never use the AIC since the AICc is always preferable. However, it’s included in PartitionFinder mostly for historical reasons. perl $value eq "aic" search_choice Select the search algorithm perl !defined $cfg_select all greedy rcluster rclusterf hcluster kmeans Please select the search algorithm perl !defined $search_choice supply_usertreetop Use a tree file in Newick format tree.phy num_datablocks How many datablocks do you have? perl !defined $cfg_select Sorry, no more than 10 data blocks are allowed. Please contact us if you need more. perl $value > 10 datablock_1a Enter the name of your first datablock perl $num_datablocks >= 1 datablock1_range1 Enter the beginning of the range perl $num_datablocks >= 1 datablock1_range2 Enter the end of the range perl $num_datablocks >= 1 datablock_codon This is a codon analysis (will repeat the range /1, /2, and /3) perl $num_datablocks >= 1 datablock_hidden1 perl !$datablock_codon && $num_datablocks >= 1 partition_finder.cfg perl "$datablock_1a = $datablock1_range1 - $datablock1_range2;\\n" 6 datablock_codonname1 codon1 datablock_codonname2 codon2 datablock_codonname3 codon3 datablock_hidden2 perl $datablock_codon && $num_datablocks >= 1 partition_finder.cfg perl "$datablock_1a$datablock_codonname1 = $datablock1_range1 \- $datablock1_range2\\3;\\n" . "$datablock_1a$datablock_codonname2 = " . ($datablock1_range1 + 1) . " \- $datablock1_range2\\3;\\n" . "$datablock_1a$datablock_codonname3 = " . ($datablock1_range1 + 2) . 
" \- $datablock1_range2\\3;\\n" 6 datablock_2a Enter the name of your second datablock perl $num_datablocks >= 2 datablock2_range1 Enter the beginning of the range perl $num_datablocks >= 2 datablock2_range2 Enter the end of the range perl $num_datablocks >= 2 datablock2_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 2 datablock2_hidden1 perl !$datablock2_codon && $num_datablocks >= 2 partition_finder.cfg perl "$datablock_2a = $datablock2_range1 - $datablock2_range2;\\n" 7 datablock2_hidden2 perl $datablock2_codon && $num_datablocks >= 2 partition_finder.cfg perl "$datablock_2a$datablock_codonname1 = $datablock2_range1 \- $datablock2_range2\\3;\\n" . "$datablock_2a$datablock_codonname2 = " . ($datablock2_range1 + 1) . " \- $datablock2_range2\\3;\\n" . "$datablock_2a$datablock_codonname3 = " . ($datablock2_range1 + 2) . " \- $datablock2_range2\\3;\\n" 7 datablock_3a Enter the name of your third datablock perl $num_datablocks >= 3 datablock3_range1 Enter the beginning of the range perl $num_datablocks >= 3 datablock3_range2 Enter the end of the range perl $num_datablocks >= 3 datablock3_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 3 datablock3_hidden1 perl !$datablock3_codon && $num_datablocks >= 3 partition_finder.cfg perl "$datablock_3a = $datablock3_range1 - $datablock3_range2;\\n" 7 datablock3_hidden2 perl $datablock3_codon && $num_datablocks >= 3 partition_finder.cfg perl "$datablock_3a$datablock_codonname1 = $datablock3_range1 \- $datablock3_range2\\3;\\n" . "$datablock_3a$datablock_codonname2 = " . ($datablock3_range1 + 1) . " \- $datablock3_range2\\3;\\n" . "$datablock_3a$datablock_codonname3 = " . ($datablock3_range1 + 2) . 
" \- $datablock3_range2\\3;\\n" 7 datablock_4a Enter the name of your fourth datablock perl $num_datablocks >= 4 datablock4_range1 Enter the beginning of the range perl $num_datablocks >= 4 datablock4_range2 Enter the end of the range perl $num_datablocks >= 4 datablock4_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 4 datablock4_hidden1 perl !$datablock4_codon && $num_datablocks >= 4 partition_finder.cfg perl "$datablock_4a = $datablock4_range1 - $datablock4_range2;\\n" 7 datablock4_hidden2 perl $datablock4_codon && $num_datablocks >= 4 partition_finder.cfg perl "$datablock_4a$datablock_codonname1 = $datablock4_range1 \- $datablock4_range2\\3;\\n" . "$datablock_4a$datablock_codonname2 = " . ($datablock4_range1 + 1) . " \- $datablock4_range2\\3;\\n" . "$datablock_4a$datablock_codonname3 = " . ($datablock4_range1 + 2) . " \- $datablock4_range2\\3;\\n" 7 datablock_5a Enter the name of your fifth datablock perl $num_datablocks >= 5 datablock5_range1 Enter the beginning of the range perl $num_datablocks >= 5 datablock5_range2 Enter the end of the range perl $num_datablocks >= 5 datablock5_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 5 datablock5_hidden1 perl !$datablock5_codon && $num_datablocks >= 5 partition_finder.cfg perl "$datablock_5a = $datablock5_range1 - $datablock5_range2;\\n" 7 datablock5_hidden2 perl $datablock5_codon && $num_datablocks >= 5 partition_finder.cfg perl "$datablock_5a$datablock_codonname1 = $datablock5_range1 \- $datablock5_range2\\3;\\n" . "$datablock_5a$datablock_codonname2 = " . ($datablock5_range1 + 1) . " \- $datablock5_range2\\3;\\n" . "$datablock_5a$datablock_codonname3 = " . ($datablock5_range1 + 2) . 
" \- $datablock5_range2\\3;\\n" 7 datablock_6a Enter the name of your sixth datablock perl $num_datablocks >= 6 datablock6_range1 Enter the beginning of the range perl $num_datablocks >= 6 datablock6_range2 Enter the end of the range perl $num_datablocks >= 6 datablock6_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 6 datablock6_hidden1 perl !$datablock6_codon && $num_datablocks >= 6 partition_finder.cfg perl "$datablock_6a = $datablock6_range1 - $datablock6_range2;\\n" 7 datablock6_hidden2 perl $datablock6_codon && $num_datablocks >= 6 partition_finder.cfg perl "$datablock_6a$datablock_codonname1 = $datablock6_range1 \- $datablock6_range2\\3;\\n" . "$datablock_6a$datablock_codonname2 = " . ($datablock6_range1 + 1) . " \- $datablock6_range2\\3;\\n" . "$datablock_6a$datablock_codonname3 = " . ($datablock6_range1 + 2) . " \- $datablock6_range2\\3;\\n" 7 datablock_7a Enter the name of your seventh datablock perl $num_datablocks >= 7 datablock7_range1 Enter the beginning of the range perl $num_datablocks >= 7 datablock7_range2 Enter the end of the range perl $num_datablocks >= 7 datablock7_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 7 datablock7_hidden1 perl !$datablock7_codon && $num_datablocks >= 7 partition_finder.cfg perl "$datablock_7a = $datablock7_range1 - $datablock7_range2;\\n" 7 datablock7_hidden2 perl $datablock7_codon && $num_datablocks >= 7 partition_finder.cfg perl "$datablock_7a$datablock_codonname1 = $datablock7_range1 \- $datablock7_range2\\3;\\n" . "$datablock_7a$datablock_codonname2 = " . ($datablock7_range1 + 1) . " \- $datablock7_range2\\3;\\n" . "$datablock_7a$datablock_codonname3 = " . ($datablock7_range1 + 2) . 
" \- $datablock7_range2\\3;\\n" 7 datablock_8a Enter the name of your eigth datablock perl $num_datablocks >= 8 datablock8_range1 Enter the beginning of the range perl $num_datablocks >= 8 datablock8_range2 Enter the end of the range perl $num_datablocks >= 8 datablock8_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 8 datablock8_hidden1 perl !$datablock8_codon && $num_datablocks >= 8 partition_finder.cfg perl "$datablock_8a = $datablock8_range1 - $datablock8_range2;\\n" 7 datablock8_hidden2 perl $datablock8_codon && $num_datablocks >= 8 partition_finder.cfg perl "$datablock_8a$datablock_codonname1 = $datablock8_range1 \- $datablock8_range2\\3;\\n" . "$datablock_8a$datablock_codonname2 = " . ($datablock8_range1 + 1) . " \- $datablock8_range2\\3;\\n" . "$datablock_8a$datablock_codonname3 = " . ($datablock8_range1 + 2) . " \- $datablock8_range2\\3;\\n" 7 datablock_9a Enter the name of your ninth datablock perl $num_datablocks >= 9 datablock9_range1 Enter the beginning of the range perl $num_datablocks >= 9 datablock9_range2 Enter the end of the range perl $num_datablocks >= 9 datablock9_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 9 datablock9_hidden1 perl !$datablock9_codon && $num_datablocks >= 9 partition_finder.cfg perl "$datablock_9a = $datablock9_range1 - $datablock9_range2;\\n" 7 datablock9_hidden2 perl $datablock9_codon && $num_datablocks >= 9 partition_finder.cfg perl "$datablock_9a$datablock_codonname1 = $datablock9_range1 \- $datablock9_range2\\3;\\n" . "$datablock_9a$datablock_codonname2 = " . ($datablock9_range1 + 1) . " \- $datablock9_range2\\3;\\n" . "$datablock_9a$datablock_codonname3 = " . ($datablock9_range1 + 2) . 
" \- $datablock9_range2\\3;\\n" 7 datablock_10a Enter the name of your tenth datablock perl $num_datablocks >= 10 datablock10_range1 Enter the beginning of the range perl $num_datablocks >= 10 datablock10_range2 Enter the end of the range perl $num_datablocks >= 10 datablock10_codon This a codon analysis (will repeat the range /1,/2, and /3) perl $num_datablocks >= 10 datablock10_hidden1 perl !$datablock10_codon && $num_datablocks >= 10 partition_finder.cfg perl "$datablock_10a = $datablock10_range1 - $datablock10_range2;\\n" 7 datablock10_hidden2 perl $datablock10_codon && $num_datablocks >= 10 partition_finder.cfg perl "$datablock_10a$datablock_codonname1 = $datablock10_range1 \- $datablock10_range2\\3;\\n" . "$datablock_10a$datablock_codonname2 = " . ($datablock10_range1 + 1) . " \- $datablock10_range2\\3;\\n" . "$datablock_10a$datablock_codonname3 = " . ($datablock10_range1 + 2) . " \- $datablock10_range2\\3;\\n" 7