PartitionFinder2 on XSEDE2.1.1Selecting best-fit partitioning schemes and models of evolution Robert LanfearR.Lanfear, P.B. Frandsen, A.M. Wright, T. Senfeld, B. Calcott (2016) PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol Biol Evol 2016 msw260. doi: 10.1093/molbev/msw260 Phylogeneticspartitionfinder2_xsedepfinder2_cometperl""1pfinder2_dnaperl$datatype eq "DNA"perl""1pfinder2_proteinperl$datatype eq "Protein"perl""1pfinder2_morphologyperl$datatype eq "Morphology"perl""1create_configfile1partition_finder.cfgperl!defined $cfg_select2perl
"# ALIGNMENT FILE #\\n" .
"alignment = infile.phy;\\n"
create_configfile1bpartition_finder.cfg2perl$supply_usertreetop && !defined $cfg_selectperl
"user_tree_topology = tree.phy;\\n"
create_configfile2partition_finder.cfgperl!defined $cfg_select3perl
"# BRANCHLENGTHS #\\n" .
"branchlengths = $branch_lengths;\\n"
create_configfile3partition_finder.cfg4perldefined $models_choice && !defined $cfg_selectperl
"# MODELS OF EVOLUTION #\\n" .
"models=$models_choice;\\n" .
"model_selection = $model_selection;\\n"
create_configfile3bpartition_finder.cfg4perldefined $models_choice2 && !defined $cfg_selectperl
"# MODELS OF EVOLUTION #\\n" .
"models=$models_choice2;\\n" .
"model_selection = $model_selection;\\n"
create_configfile4partition_finder.cfg5perl$num_datablocks > 0perl
"# DATA BLOCKS #\\n" .
"[data_blocks]\\n"
create_configfile6partition_finder.cfgperldefined $search_choice20perl
"# SCHEMES #\\n" .
"[schemes]\\n" .
"search = $search_choice;\\n"
number_nodes2scheduler.confperl
"nodes=1\\n" .
"node_exclusive=0\\n" .
"threads_per_process=$num_processes\\n"
infileInput Fileinfile.phyall_results*runtime1scheduler.confMaximum Hours to Run (up to 168 hours)0.5The maximum hours to run must be no more than 168perl$runtime > 168.0The maximum hours to run must be at least 0.05perl$runtime < 0.05perl"runhours=$value\\n"8The job will run on $num_processes processors as configured. If it runs for the entire configured time, it will consume $num_processes X $runtime cpu hoursperl$runtime > 0Estimate the maximum time your job will need to run. We recommend testing initially with a run of less than 0.5 h, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
Once you are sure the configuration is correct, you can then increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
run sooner than jobs configured for the full 168 hours.
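The run-time rules above can be sketched as a short Python helper. This is only an illustration of the arithmetic and limits stated in the help text (0.05 to 168 hours, the "debug" queue for 0.5 h or less, CPU hours as processes times hours). The function name `describe_job` and its return format are hypothetical and are not part of the tool definition.

```python
def describe_job(num_processes: int, runtime_hours: float) -> dict:
    """Summarize queue and maximum CPU-hour cost for a job (illustrative only)."""
    # Limits taken from the validation messages in the form above.
    if runtime_hours < 0.05 or runtime_hours > 168.0:
        raise ValueError("runtime must be between 0.05 and 168 hours")
    if num_processes < 1:
        raise ValueError("at least one process is required")
    # Jobs of 0.5 h or less go to the "debug" queue, longer jobs to "normal".
    queue = "debug" if runtime_hours <= 0.5 else "normal"
    # Worst case: the job runs for the entire configured time.
    return {"queue": queue, "max_cpu_hours": num_processes * runtime_hours}


print(describe_job(4, 0.5))
print(describe_job(8, 24))
```

A short test run in the "debug" queue costs little, which is why the help text suggests starting there before requesting a long run.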
num_processesHow many concurrent processes?1perl"--processes=$value"2datatypeDNAProteinMorphologyDNAPlease select a data typeperl!defined $datatypecfg_selectSelect cfg file (you can also create one below)partition_finder.cfg2If you provide a .cfg file, please be sure it specifies alignment=infile.phyperldefined $cfg_selectchk_onlyJust check the config file0perl$value ? "-c ":""3show_pyexceptionsIf errors occur, print the Python exceptions0perl$value ? "--show-python-exceptions":""4run_quickAvoid anything slow (--quick)0perl$value ? "-q":""5Avoid anything slow (like writing schemes at each step); useful for very large datasets.use_raxmlUse RAxML (rather than PhyML, -r). See the manual.0perl$value ? "--raxml":""6use_tigerUse kmeans=tiger (morphological data only)0perl$value ? "--kmeans tiger":"--kmeans entropy"7--kmeans=type This defines which sitewise values to use: entropy or tiger --kmeans entropy: use entropies for sitewise values --kmeans
tiger: use TIGER rates for sitewise values (only valid for Morphology) rcluster_percentProportion of possible schemes that the relaxed clustering algorithm will consider (--rcluster-percent)10.0perl" --rcluster-percent=$value"8--rcluster-percent=N This defines the proportion of possible schemes that the relaxed clustering algorithm will consider before it stops looking.
The default is 10%. e.g. --rcluster-percent 10.0rcluster_maxNumber of possible schemes that the relaxed clustering algorithm will consider (--rcluster-max)perl(defined $value) ? "--rcluster-max=$value":""9This defines the number of possible schemes that the relaxed clustering algorithm will consider before it stops looking.
The default is to look at the larger of 1000 and 10 times the number of data blocks you have. e.g. --rcluster-max 1000min_subsetMinimum subset size that the kmeans and rcluster algorithms will accept (--min-subset-size)100perl($value && $value ne $vdef) ? " --min-subset-size=$value":""10--min-subset-size=N This defines the minimum subset size that the kmeans and rcluster algorithms will accept. Subsets smaller than this will be
merged with other subsets at the end of the algorithm (for kmeans) or at the start of the algorithm (for rcluster). See the manual for details. The default
value for kmeans is 100. The default for rcluster is to ignore this option. e.g. --min-subset-size 100debug_optionProvide comma-separated debug regions to output extra information about what the program is doing. (--debug-output=)perl" --debug-output=$value"11--debug-output=REGION,REGION,... (advanced option) Provide a list of debug regions to output extra information about what the program is doing.
Possible regions are 'all' or any of {subset,subset_ops,neighbour,raxml,parser,model_util,results,entropy,alignment,threadpool,progress,main,config,
pandas, reporter,kmeans,pandas.io.gbq,pandas.io,morph_tige,analysis_m,util,scheme,submodels,database,analysis,phyml,raxml_mode,model_load,phyml_mode}. all_statesPartitionFinder should not produce subsets that do not have all possible states present (--all-states)perl$value ? " --all-states":""12 --all-states In the kmeans and rcluster algorithms, this stipulates that PartitionFinder should not produce subsets that do not have
all possible states present. E.g. for DNA sequence data, all subsets in the final scheme must have A, C, T, and G nucleotides present. This can
occasionally be useful for downstream analyses, particularly concerning amino acid datasets.config_fileCreate Configuration filebranch_lengthsLinked branch lengths?perl!defined $cfg_selectlinkedunlinkedPlease specify linked or unlinked branch lengthsperl!defined $branch_lengthsmodels_choiceSelect the modelperl!defined $cfg_selectallallxbeastmrbayesgammagammaimodels_choice2Specify a model listperl!defined $models_choice && !defined $cfg_selectSorry, you can't specify a model list if you have already selected a modelperldefined $models_choice && defined $models_choice2model_selectionSelect the metric for model selection (model_selection)perl!defined $cfg_selectaicaiccbicIn general, you should never use the AIC since the AICc is always preferable. However, it’s included in PartitionFinder mostly for historical reasonsperl$value eq "aic"num_datablocksHow many datablocks do you have?perl!defined $cfg_selectSorry, no more than 10 data blocks are allowed. Please contact us if you need more.perl$value > 10datablock_1aEnter the name of your first datablockperl$num_datablocks >= 1datablock1_range1Enter the beginning of the rangeperl$num_datablocks >= 1datablock1_range2Enter the end of the rangeperl$num_datablocks >= 1datablock_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 1datablock_hidden1perl!$datablock_codon && $num_datablocks >= 1partition_finder.cfgperl"$datablock_1a = $datablock1_range1 - $datablock1_range2;\\n"6datablock_codonname1codon1datablock_codonname2codon2datablock_codonname3codon3datablock_hidden2perl$datablock_codon && $num_datablocks >= 1partition_finder.cfgperl"$datablock_1a$datablock_codonname1 = $datablock1_range1 \- $datablock1_range2\\3;\\n" .
"$datablock_1a$datablock_codonname2 = " . ($datablock1_range1 + 1) . " \- $datablock1_range2\\3;\\n" .
"$datablock_1a$datablock_codonname3 = " . ($datablock1_range1 + 2) . " \- $datablock1_range2\\3;\\n"
6datablock_2aEnter the name of your second datablockperl$num_datablocks >= 2datablock2_range1Enter the beginning of the rangeperl$num_datablocks >= 2datablock2_range2Enter the end of the rangeperl$num_datablocks >= 2datablock2_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 2datablock2_hidden1perl!$datablock2_codon && $num_datablocks >= 2partition_finder.cfgperl"$datablock_2a = $datablock2_range1 - $datablock2_range2;\\n"7datablock2_hidden2perl$datablock2_codon && $num_datablocks >= 2partition_finder.cfgperl"$datablock_2a$datablock_codonname1 = $datablock2_range1 \- $datablock2_range2\\3;\\n" .
"$datablock_2a$datablock_codonname2 = " . ($datablock2_range1 + 1) . " \- $datablock2_range2\\3;\\n" .
"$datablock_2a$datablock_codonname3 = " . ($datablock2_range1 + 2) . " \- $datablock2_range2\\3;\\n"
7datablock_3aEnter the name of your third datablockperl$num_datablocks >= 3datablock3_range1Enter the beginning of the rangeperl$num_datablocks >= 3datablock3_range2Enter the end of the rangeperl$num_datablocks >= 3datablock3_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 3datablock3_hidden1perl!$datablock3_codon && $num_datablocks >= 3partition_finder.cfgperl"$datablock_3a = $datablock3_range1 - $datablock3_range2;\\n"7datablock3_hidden2perl$datablock3_codon && $num_datablocks >= 3partition_finder.cfgperl"$datablock_3a$datablock_codonname1 = $datablock3_range1 \- $datablock3_range2\\3;\\n" .
"$datablock_3a$datablock_codonname2 = " . ($datablock3_range1 + 1) . " \- $datablock3_range2\\3;\\n" .
"$datablock_3a$datablock_codonname3 = " . ($datablock3_range1 + 2) . " \- $datablock3_range2\\3;\\n"
7datablock_4aEnter the name of your fourth datablockperl$num_datablocks >= 4datablock4_range1Enter the beginning of the rangeperl$num_datablocks >= 4datablock4_range2Enter the end of the rangeperl$num_datablocks >= 4datablock4_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 4datablock4_hidden1perl!$datablock4_codon && $num_datablocks >= 4partition_finder.cfgperl"$datablock_4a = $datablock4_range1 - $datablock4_range2;\\n"7datablock4_hidden2perl$datablock4_codon && $num_datablocks >= 4partition_finder.cfgperl"$datablock_4a$datablock_codonname1 = $datablock4_range1 \- $datablock4_range2\\3;\\n" .
"$datablock_4a$datablock_codonname2 = " . ($datablock4_range1 + 1) . " \- $datablock4_range2\\3;\\n" .
"$datablock_4a$datablock_codonname3 = " . ($datablock4_range1 + 2) . " \- $datablock4_range2\\3;\\n"
7datablock_5aEnter the name of your fifth datablockperl$num_datablocks >= 5datablock5_range1Enter the beginning of the rangeperl$num_datablocks >= 5datablock5_range2Enter the end of the rangeperl$num_datablocks >= 5datablock5_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 5datablock5_hidden1perl!$datablock5_codon && $num_datablocks >= 5partition_finder.cfgperl"$datablock_5a = $datablock5_range1 - $datablock5_range2;\\n"7datablock5_hidden2perl$datablock5_codon && $num_datablocks >= 5partition_finder.cfgperl"$datablock_5a$datablock_codonname1 = $datablock5_range1 \- $datablock5_range2\\3;\\n" .
"$datablock_5a$datablock_codonname2 = " . ($datablock5_range1 + 1) . " \- $datablock5_range2\\3;\\n" .
"$datablock_5a$datablock_codonname3 = " . ($datablock5_range1 + 2) . " \- $datablock5_range2\\3;\\n"
7datablock_6aEnter the name of your sixth datablockperl$num_datablocks >= 6datablock6_range1Enter the beginning of the rangeperl$num_datablocks >= 6datablock6_range2Enter the end of the rangeperl$num_datablocks >= 6datablock6_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 6datablock6_hidden1perl!$datablock6_codon && $num_datablocks >= 6partition_finder.cfgperl"$datablock_6a = $datablock6_range1 - $datablock6_range2;\\n"7datablock6_hidden2perl$datablock6_codon && $num_datablocks >= 6partition_finder.cfgperl"$datablock_6a$datablock_codonname1 = $datablock6_range1 \- $datablock6_range2\\3;\\n" .
"$datablock_6a$datablock_codonname2 = " . ($datablock6_range1 + 1) . " \- $datablock6_range2\\3;\\n" .
"$datablock_6a$datablock_codonname3 = " . ($datablock6_range1 + 2) . " \- $datablock6_range2\\3;\\n"
7datablock_7aEnter the name of your seventh datablockperl$num_datablocks >= 7datablock7_range1Enter the beginning of the rangeperl$num_datablocks >= 7datablock7_range2Enter the end of the rangeperl$num_datablocks >= 7datablock7_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 7datablock7_hidden1perl!$datablock7_codon && $num_datablocks >= 7partition_finder.cfgperl"$datablock_7a = $datablock7_range1 - $datablock7_range2;\\n"7datablock7_hidden2perl$datablock7_codon && $num_datablocks >= 7partition_finder.cfgperl"$datablock_7a$datablock_codonname1 = $datablock7_range1 \- $datablock7_range2\\3;\\n" .
"$datablock_7a$datablock_codonname2 = " . ($datablock7_range1 + 1) . " \- $datablock7_range2\\3;\\n" .
"$datablock_7a$datablock_codonname3 = " . ($datablock7_range1 + 2) . " \- $datablock7_range2\\3;\\n"
7datablock_8aEnter the name of your eighth datablockperl$num_datablocks >= 8datablock8_range1Enter the beginning of the rangeperl$num_datablocks >= 8datablock8_range2Enter the end of the rangeperl$num_datablocks >= 8datablock8_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 8datablock8_hidden1perl!$datablock8_codon && $num_datablocks >= 8partition_finder.cfgperl"$datablock_8a = $datablock8_range1 - $datablock8_range2;\\n"7datablock8_hidden2perl$datablock8_codon && $num_datablocks >= 8partition_finder.cfgperl"$datablock_8a$datablock_codonname1 = $datablock8_range1 \- $datablock8_range2\\3;\\n" .
"$datablock_8a$datablock_codonname2 = " . ($datablock8_range1 + 1) . " \- $datablock8_range2\\3;\\n" .
"$datablock_8a$datablock_codonname3 = " . ($datablock8_range1 + 2) . " \- $datablock8_range2\\3;\\n"
7datablock_9aEnter the name of your ninth datablockperl$num_datablocks >= 9datablock9_range1Enter the beginning of the rangeperl$num_datablocks >= 9datablock9_range2Enter the end of the rangeperl$num_datablocks >= 9datablock9_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 9datablock9_hidden1perl!$datablock9_codon && $num_datablocks >= 9partition_finder.cfgperl"$datablock_9a = $datablock9_range1 - $datablock9_range2;\\n"7datablock9_hidden2perl$datablock9_codon && $num_datablocks >= 9partition_finder.cfgperl"$datablock_9a$datablock_codonname1 = $datablock9_range1 \- $datablock9_range2\\3;\\n" .
"$datablock_9a$datablock_codonname2 = " . ($datablock9_range1 + 1) . " \- $datablock9_range2\\3;\\n" .
"$datablock_9a$datablock_codonname3 = " . ($datablock9_range1 + 2) . " \- $datablock9_range2\\3;\\n"
7datablock_10aEnter the name of your tenth datablockperl$num_datablocks >= 10datablock10_range1Enter the beginning of the rangeperl$num_datablocks >= 10datablock10_range2Enter the end of the rangeperl$num_datablocks >= 10datablock10_codonThis is a codon analysis (will repeat the range /1, /2, and /3) perl$num_datablocks >= 10datablock10_hidden1perl!$datablock10_codon && $num_datablocks >= 10partition_finder.cfgperl"$datablock_10a = $datablock10_range1 - $datablock10_range2;\\n"7datablock10_hidden2perl$datablock10_codon && $num_datablocks >= 10partition_finder.cfgperl"$datablock_10a$datablock_codonname1 = $datablock10_range1 \- $datablock10_range2\\3;\\n" .
"$datablock_10a$datablock_codonname2 = " . ($datablock10_range1 + 1) . " \- $datablock10_range2\\3;\\n" .
"$datablock_10a$datablock_codonname3 = " . ($datablock10_range1 + 2) . " \- $datablock10_range2\\3;\\n"
7search_choiceSelect the search algorithmperl!defined $cfg_selectallgreedyrclusterrclusterfhclusterkmeansPlease select a Search Choiceperl!defined $search_choicesupply_usertreetopUse a tree file in Newick formattree.phy
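
To make the effect of the configuration templates above easier to follow, here is a minimal Python sketch of the partition_finder.cfg text they appear to assemble. It mirrors the Perl string templates (alignment, branch lengths, models, data blocks, schemes) but is not the interface's own code. The function names `data_block_lines` and `build_cfg`, the gene names `COI` and `rRNA16S`, and the example ranges are all hypothetical, and the order in which the real interface writes its fragments is an assumption.

```python
def data_block_lines(name, start, end, codon=False):
    """Return the [data_blocks] entries for one block (illustrative sketch)."""
    if not codon:
        return [f"{name} = {start} - {end};"]
    # Codon analysis: one entry per codon position, starting at start,
    # start + 1 and start + 2, each taking every third site (\3).
    return [f"{name}codon{pos} = {start + pos - 1} - {end}\\3;" for pos in (1, 2, 3)]


def build_cfg(blocks, branch_lengths="linked", models="all",
              model_selection="aicc", search="greedy"):
    """Assemble a config string from (name, start, end, codon) tuples."""
    lines = [
        "# ALIGNMENT FILE #",
        "alignment = infile.phy;",
        "# BRANCHLENGTHS #",
        f"branchlengths = {branch_lengths};",
        "# MODELS OF EVOLUTION #",
        f"models={models};",
        f"model_selection = {model_selection};",
        "# DATA BLOCKS #",
        "[data_blocks]",
    ]
    for name, start, end, codon in blocks:
        lines.extend(data_block_lines(name, start, end, codon))
    lines += ["# SCHEMES #", "[schemes]", f"search = {search};"]
    return "\n".join(lines) + "\n"


# Hypothetical alignment: a protein-coding gene followed by a non-coding one.
cfg = build_cfg([("COI", 1, 657, True), ("rRNA16S", 658, 1200, False)])
print(cfg)
```

With the codon option set, a single range expands into three data blocks, one per codon position, which is what the three `codonname` templates for each data block do.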