PartitionFinder2 on XSEDE2.1.1Selecting best-fit partitioning schemes and models of evolution Robert LanfearR.Lanfear, P.B. Frandsen, A.M. Wright, T. Senfeld, B. Calcott (2016) PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol Biol Evol 2016 msw260. doi: 10.1093/molbev/msw260 Phylogeneticspartitionfinder2_xsedepfinder2_cometperl""1pfinder2_dnaperl$datatype eq "DNA"perl""1pfinder2_proteinperl$datatype eq "Protein"perl""1pfinder2_morphologyperl$datatype eq "Morphology"perl""1create_configfile1partition_finder.cfgperl!defined $cfg_select2perl
"# ALIGNMENT FILE #\\n" .
"alignment = infile.phy;\\n"
create_configfile1bpartition_finder.cfg2perl$supply_usertreetop && !defined $cfg_selectperl
"user_tree_topology = tree.phy;\\n"
create_configfile2partition_finder.cfgperl!defined $cfg_select3perl
"# BRANCHLENGTHS #\\n" .
"branchlengths = $branch_lengths;\\n"
create_configfile3partition_finder.cfg4perldefined $models_choice && !defined $cfg_selectperl
"# MODELS OF EVOLUTION #\\n" .
"models=$models_choice;\\n" .
"model_selection = $model_selection;\\n"
create_configfile3bpartition_finder.cfg4perldefined $morphological_models_choice && !defined $cfg_selectperl
"# MODELS OF EVOLUTION #\\n" .
"models=$morphological_models_choice;\\n" .
"model_selection = $model_selection;\\n"
create_configfile3bpartition_finder.cfg4perl$models_choicestring && !defined $cfg_selectperl
"# MODELS OF EVOLUTION #\\n" .
"models=$models_choice2;\\n" .
"model_selection = $model_selection;\\n"
create_configfile4partition_finder.cfg5perl$num_datablocks > 0perl
"# DATA BLOCKS #\\n" .
"[data_blocks]\\n"
create_configfile6partition_finder.cfgperldefined $search_choice20perl
"# SCHEMES #\\n" .
"[schemes]\\n" .
"search = $search_choice;\\n"
number_nodes2scheduler.confperl
"nodes=1\\n" .
"node_exclusive=0\\n" .
"threads_per_process=8\\n"
num_processesHow many concurrent processes?864218perl"--processes=$value"Please reduce the number of concurrent processes only if your job experiences an out of memory error. Otherwise, 8 is idealperl$value < 8 Reduce the number of concurrent processes if your job experiences an out of memory error. Otherwise, 8 is ideal2infileInput Fileinfile.phyall_results*runtime1scheduler.confMaximum Hours to Run (up to 168 hours)0.5The maximum hours to run must be less than 168perl$runtime > 168.0The maximum hours to run must be greater than 0.05perl$runtime < 0.05perl"runhours=$value\\n"8The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 X $runtime cpu hoursperl$runtime > 0Estimate the maximum time your job will need to run. We recommend testing initially with a < 0.5hr test run because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
Once you are sure the configuration is correct, you can then increase the time. The reason is that jobs > 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
run sooner than jobs configured for the full 168 hours.
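The run-time rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_run` and the returned dictionary keys are hypothetical and are not part of the tool definition. The bounds (0.05 and 168 hours), the 0.5 h debug-queue threshold, the 8 processors, and the `runhours=` line are taken from the text above.

```python
# Hypothetical helper; names are illustrative and not part of the tool definition.
CORES = 8            # the form states the job runs on 8 processors
MIN_HOURS = 0.05     # lower bound from the validation message
MAX_HOURS = 168.0    # upper bound from the validation message
DEBUG_LIMIT = 0.5    # runs of 0.5 h or less go to the "debug" queue


def plan_run(runtime_hours: float) -> dict:
    """Validate a requested run time and estimate its maximum cost."""
    if runtime_hours < MIN_HOURS:
        raise ValueError("The maximum hours to run must be greater than 0.05")
    if runtime_hours > MAX_HOURS:
        raise ValueError("The maximum hours to run must be less than 168")
    queue = "debug" if runtime_hours <= DEBUG_LIMIT else "normal"
    return {
        "queue": queue,
        # Upper bound: 8 x runtime, reached only if the job uses all its time.
        "max_cpu_hours": CORES * runtime_hours,
        # Line written to scheduler.conf by the runtime parameter.
        "scheduler_line": f"runhours={runtime_hours}\n",
    }
```

For example, `plan_run(0.5)` selects the "debug" queue, while `plan_run(2.0)` selects the "normal" queue with a ceiling of 16 cpu hours.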
datatypePlease choose your data typeDNAProteinMorphologyDNAPlease select a data typeperl!defined $datatypecfg_selectSelect cfg file (you can also create one below)partition_finder.cfg2If you provide a .cfg file, please be sure it specifies alignment=infile.phy. If you provide a .cfg file with a tree specified, please be sure it specifies user_tree_topology = tree.phyperldefined $cfg_selectchk_onlyJust check the config file0perl$value ? "-c ":""3show_pyexceptionsIf errors occur, print the Python exceptions0perl$value ? "--show-python-exceptions":""4run_quickAvoid anything slow (--quick)0perl$value ? "-q":""5Avoid anything slow (like writing schemes at each step), useful for very large datasets.use_raxmlUse RAxML (rather than PhyML, -r). See the manual.0perl$value ? "--raxml":""You must use RAxML for Morphological Data Setsperl$datatype eq "Morphology" && !$use_raxml 6use_tigerUse kmeans=tiger (morphological data only)perl$datatype eq "Morphology"0perl$value ? "--kmeans tiger":"--kmeans entropy"7--kmeans=type This defines which sitewise values to use: entropy or tiger --kmeans entropy: use entropies for sitewise values --kmeans
tiger: use TIGER rates for sitewise values (only valid for Morphology) rcluster_percentProportion of possible schemes that the relaxed clustering algorithm will consider (-rcluster-percent)10.0perl" --rcluster-percent=$value"8--rcluster-percent=N This defines the proportion of possible schemes that the relaxed clustering algorithm will consider before it stops looking.
The default is 10%. e.g. --rcluster-percent 10.0rcluster_maxNumber of possible schemes that the relaxed clustering algorithm will consider (-rcluster-max)perl(defined $value) ? "--rcluster-max=$value":""9This defines the number of possible schemes that the relaxed clustering algorithm will consider before it stops looking.
The default is to look at the larger value out of 1000, and 10 times the number of data blocks you have. e.g. --rcluster-max 1000min_subsetMinimum subset size that the kmeans and rcluster algorithm will accept ( --min-subset-size)100perl($value && $value ne $vdef) ? " --min-subset-size=$value":""10--min-subset-size=N This defines the minimum subset size that the kmeans and rcluster algorithm will accept. Subsets smaller than this will be
merged with other subsets at the end of the algorithm (for kmeans) or at the start of the algorithm (for rcluster). See manual for details. The default
value for kmeans is 100. The default value for rcluster is to ignore this option. e.g. --min-subset-size 100debug_optionProvide comma-separated debug regions to output extra information about what the program is doing. (--debug-output=)perl" --debug-output=$value"11--debug-output=REGION,REGION,... (advanced option) Provide a list of debug regions to output extra information about what the program is doing.
Possible regions are 'all' or any of {subset,subset_ops,neighbour,raxml,parser,model_util,results,entropy,alignment,threadpool,progress,main,config,
pandas, reporter,kmeans,pandas.io.gbq,pandas.io,morph_tige,analysis_m,util,scheme,submodels,database,analysis,phyml,raxml_mode,model_load,phyml_mode}. all_statesPartitionFinder should not produce subsets that do not have all possible states present(--all-states)perl$value ? " --all-states":""12 --all-states In the kmeans and rcluster algorithms, this stipulates that PartitionFinder should not produce subsets that do not have
all possible states present. E.g. for DNA sequence data, all subsets in the final scheme must have A, C, T, and G nucleotides present. This can
occasionally be useful for downstream analyses, particularly concerning amino acid datasets.config_fileCreate Configuration filebranch_lengthsLinked branch lengths?perl!defined $cfg_selectlinkedunlinkedPlease specify linked or unlinked branch lengthsperl!defined $branch_lengthsmodels_choiceSelect the modelperl!defined $cfg_select && $datatype ne "Morphology"allallxbeastmrbayesgammagammaimorphological_models_choiceSelect the model for Morphological dataperl!defined $cfg_select && $datatype eq "Morphology"BINARY+GBINARYG+AMULTISTATE+GMULTISTATE+G+Amodels_choicestringSpecify a model listperl!defined $models_choice && !defined $cfg_selectPlease select a model or enter a model listperl!defined $models_choice && !$models_choicestring && !defined $morphological_models_choicefor models = list; enter any list of models appropriate for the data type. If you are not sure which models are possible, you can either study the
models.csv file (in the /partfinder folder) or just try out a list. If you include a model that won’t work, PF2 will tell you which models didn’t work
in an error message before your analysis gets underway. Each model in the list should be separated by a comma. For example, if I was only interested in a few nucleotide models in PartitionFinder, I might do this:
models = JC, JC+G, HKY, HKY+G, GTR, GTR+G;models_choice2Enter the model listperl$models_choicestringmodel_selectionSelect the metric for model (model_selection)perl!defined $cfg_selectaicaiccbicPlease select a metric for the modelperl!defined $model_selectionIn general, you should never use the AIC since the AICc is always preferable. However, it’s included in PartitionFinder mostly for historical reasonsperl$value eq "aic"search_choiceSelect the search algorithmperl!defined $cfg_selectallgreedyrclusterrclusterfhclusterkmeansPlease select the search algorithmperl!defined $search_choicesupply_usertreetopUse a tree file in Newick formattree.phynum_datablocksHow many datablocks do you have?perl!defined $cfg_selectPlease specify the number of data blocks.perl!defined $valueSorry, no more than 14 data blocks are allowed. Please contact us if you need more.perl$value > 14datablock_1aEnter the name of your first datablockperl$num_datablocks >= 1datablock1_range1Enter the beginning of the rangeperl$num_datablocks >= 1datablock1_range2Enter the end of the rangeperl$num_datablocks >= 1datablock_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 1datablock_hidden1perl!$datablock_codon && $num_datablocks >= 1partition_finder.cfgperl"$datablock_1a = $datablock1_range1 - $datablock1_range2;\\n"6datablock_codonname1codon1datablock_codonname2codon2datablock_codonname3codon3datablock_hidden2perl$datablock_codon && $num_datablocks >= 1partition_finder.cfgperl"$datablock_1a$datablock_codonname1 = $datablock1_range1 \- $datablock1_range2\\3;\\n" .
"$datablock_1a$datablock_codonname2 = " . ($datablock1_range1 + 1) . " \- $datablock1_range2\\3;\\n" .
"$datablock_1a$datablock_codonname3 = " . ($datablock1_range1 + 2) . " \- $datablock1_range2\\3;\\n"
6datablock_2aEnter the name of your second datablockperl$num_datablocks >= 2datablock2_range1Enter the beginning of the rangeperl$num_datablocks >= 2datablock2_range2Enter the end of the rangeperl$num_datablocks >= 2datablock2_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 2datablock2_hidden1perl!$datablock2_codon && $num_datablocks >= 2partition_finder.cfgperl"$datablock_2a = $datablock2_range1 - $datablock2_range2;\\n"7datablock2_hidden2perl$datablock2_codon && $num_datablocks >= 2partition_finder.cfgperl"$datablock_2a$datablock_codonname1 = $datablock2_range1 \- $datablock2_range2\\3;\\n" .
"$datablock_2a$datablock_codonname2 = " . ($datablock2_range1 + 1) . " \- $datablock2_range2\\3;\\n" .
"$datablock_2a$datablock_codonname3 = " . ($datablock2_range1 + 2) . " \- $datablock2_range2\\3;\\n"
7datablock_3aEnter the name of your third datablockperl$num_datablocks >= 3datablock3_range1Enter the beginning of the rangeperl$num_datablocks >= 3datablock3_range2Enter the end of the rangeperl$num_datablocks >= 3datablock3_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 3datablock3_hidden1perl!$datablock3_codon && $num_datablocks >= 3partition_finder.cfgperl"$datablock_3a = $datablock3_range1 - $datablock3_range2;\\n"7datablock3_hidden2perl$datablock3_codon && $num_datablocks >= 3partition_finder.cfgperl"$datablock_3a$datablock_codonname1 = $datablock3_range1 \- $datablock3_range2\\3;\\n" .
"$datablock_3a$datablock_codonname2 = " . ($datablock3_range1 + 1) . " \- $datablock3_range2\\3;\\n" .
"$datablock_3a$datablock_codonname3 = " . ($datablock3_range1 + 2) . " \- $datablock3_range2\\3;\\n"
7datablock_4aEnter the name of your fourth datablockperl$num_datablocks >= 4datablock4_range1Enter the beginning of the rangeperl$num_datablocks >= 4datablock4_range2Enter the end of the rangeperl$num_datablocks >= 4datablock4_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 4datablock4_hidden1perl!$datablock4_codon && $num_datablocks >= 4partition_finder.cfgperl"$datablock_4a = $datablock4_range1 - $datablock4_range2;\\n"7datablock4_hidden2perl$datablock4_codon && $num_datablocks >= 4partition_finder.cfgperl"$datablock_4a$datablock_codonname1 = $datablock4_range1 \- $datablock4_range2\\3;\\n" .
"$datablock_4a$datablock_codonname2 = " . ($datablock4_range1 + 1) . " \- $datablock4_range2\\3;\\n" .
"$datablock_4a$datablock_codonname3 = " . ($datablock4_range1 + 2) . " \- $datablock4_range2\\3;\\n"
7datablock_5aEnter the name of your fifth datablockperl$num_datablocks >= 5datablock5_range1Enter the beginning of the rangeperl$num_datablocks >= 5datablock5_range2Enter the end of the rangeperl$num_datablocks >= 5datablock5_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 5datablock5_hidden1perl!$datablock5_codon && $num_datablocks >= 5partition_finder.cfgperl"$datablock_5a = $datablock5_range1 - $datablock5_range2;\\n"7datablock5_hidden2perl$datablock5_codon && $num_datablocks >= 5partition_finder.cfgperl"$datablock_5a$datablock_codonname1 = $datablock5_range1 \- $datablock5_range2\\3;\\n" .
"$datablock_5a$datablock_codonname2 = " . ($datablock5_range1 + 1) . " \- $datablock5_range2\\3;\\n" .
"$datablock_5a$datablock_codonname3 = " . ($datablock5_range1 + 2) . " \- $datablock5_range2\\3;\\n"
7datablock_6aEnter the name of your sixth datablockperl$num_datablocks >= 6datablock6_range1Enter the beginning of the rangeperl$num_datablocks >= 6datablock6_range2Enter the end of the rangeperl$num_datablocks >= 6datablock6_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 6datablock6_hidden1perl!$datablock6_codon && $num_datablocks >= 6partition_finder.cfgperl"$datablock_6a = $datablock6_range1 - $datablock6_range2;\\n"7datablock6_hidden2perl$datablock6_codon && $num_datablocks >= 6partition_finder.cfgperl"$datablock_6a$datablock_codonname1 = $datablock6_range1 \- $datablock6_range2\\3;\\n" .
"$datablock_6a$datablock_codonname2 = " . ($datablock6_range1 + 1) . " \- $datablock6_range2\\3;\\n" .
"$datablock_6a$datablock_codonname3 = " . ($datablock6_range1 + 2) . " \- $datablock6_range2\\3;\\n"
7datablock_7aEnter the name of your seventh datablockperl$num_datablocks >= 7datablock7_range1Enter the beginning of the rangeperl$num_datablocks >= 7datablock7_range2Enter the end of the rangeperl$num_datablocks >= 7datablock7_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 7datablock7_hidden1perl!$datablock7_codon && $num_datablocks >= 7partition_finder.cfgperl"$datablock_7a = $datablock7_range1 - $datablock7_range2;\\n"7datablock7_hidden2perl$datablock7_codon && $num_datablocks >= 7partition_finder.cfgperl"$datablock_7a$datablock_codonname1 = $datablock7_range1 \- $datablock7_range2\\3;\\n" .
"$datablock_7a$datablock_codonname2 = " . ($datablock7_range1 + 1) . " \- $datablock7_range2\\3;\\n" .
"$datablock_7a$datablock_codonname3 = " . ($datablock7_range1 + 2) . " \- $datablock7_range2\\3;\\n"
7datablock_8aEnter the name of your eighth datablockperl$num_datablocks >= 8datablock8_range1Enter the beginning of the rangeperl$num_datablocks >= 8datablock8_range2Enter the end of the rangeperl$num_datablocks >= 8datablock8_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 8datablock8_hidden1perl!$datablock8_codon && $num_datablocks >= 8partition_finder.cfgperl"$datablock_8a = $datablock8_range1 - $datablock8_range2;\\n"7datablock8_hidden2perl$datablock8_codon && $num_datablocks >= 8partition_finder.cfgperl"$datablock_8a$datablock_codonname1 = $datablock8_range1 \- $datablock8_range2\\3;\\n" .
"$datablock_8a$datablock_codonname2 = " . ($datablock8_range1 + 1) . " \- $datablock8_range2\\3;\\n" .
"$datablock_8a$datablock_codonname3 = " . ($datablock8_range1 + 2) . " \- $datablock8_range2\\3;\\n"
7datablock_9aEnter the name of your ninth datablockperl$num_datablocks >= 9datablock9_range1Enter the beginning of the rangeperl$num_datablocks >= 9datablock9_range2Enter the end of the rangeperl$num_datablocks >= 9datablock9_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 9datablock9_hidden1perl!$datablock9_codon && $num_datablocks >= 9partition_finder.cfgperl"$datablock_9a = $datablock9_range1 - $datablock9_range2;\\n"7datablock9_hidden2perl$datablock9_codon && $num_datablocks >= 9partition_finder.cfgperl"$datablock_9a$datablock_codonname1 = $datablock9_range1 \- $datablock9_range2\\3;\\n" .
"$datablock_9a$datablock_codonname2 = " . ($datablock9_range1 + 1) . " \- $datablock9_range2\\3;\\n" .
"$datablock_9a$datablock_codonname3 = " . ($datablock9_range1 + 2) . " \- $datablock9_range2\\3;\\n"
7datablock_10aEnter the name of your tenth datablockperl$num_datablocks >= 10datablock10_range1Enter the beginning of the rangeperl$num_datablocks >= 10datablock10_range2Enter the end of the rangeperl$num_datablocks >= 10datablock10_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 10datablock10_hidden1perl!$datablock10_codon && $num_datablocks >= 10partition_finder.cfgperl"$datablock_10a = $datablock10_range1 - $datablock10_range2;\\n"7datablock10_hidden2perl$datablock10_codon && $num_datablocks >= 10partition_finder.cfgperl"$datablock_10a$datablock_codonname1 = $datablock10_range1 \- $datablock10_range2\\3;\\n" .
"$datablock_10a$datablock_codonname2 = " . ($datablock10_range1 + 1) . " \- $datablock10_range2\\3;\\n" .
"$datablock_10a$datablock_codonname3 = " . ($datablock10_range1 + 2) . " \- $datablock10_range2\\3;\\n"
7datablock_11aEnter the name of your eleventh datablockperl$num_datablocks >= 11datablock11_range1Enter the beginning of the rangeperl$num_datablocks >= 11datablock11_range2Enter the end of the rangeperl$num_datablocks >= 11datablock11_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 11datablock11_hidden1perl!$datablock11_codon && $num_datablocks >= 11partition_finder.cfgperl"$datablock_11a = $datablock11_range1 - $datablock11_range2;\\n"7datablock11_hidden2perl$datablock11_codon && $num_datablocks >= 11partition_finder.cfgperl"$datablock_11a$datablock_codonname1 = $datablock11_range1 \- $datablock11_range2\\3;\\n" .
"$datablock_11a$datablock_codonname2 = " . ($datablock11_range1 + 1) . " \- $datablock11_range2\\3;\\n" .
"$datablock_11a$datablock_codonname3 = " . ($datablock11_range1 + 2) . " \- $datablock11_range2\\3;\\n"
7datablock_12aEnter the name of your twelfth datablockperl$num_datablocks >= 12datablock12_range1Enter the beginning of the rangeperl$num_datablocks >= 12datablock12_range2Enter the end of the rangeperl$num_datablocks >= 12datablock12_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 12datablock12_hidden1perl!$datablock12_codon && $num_datablocks >= 12partition_finder.cfgperl"$datablock_12a = $datablock12_range1 - $datablock12_range2;\\n"7datablock12_hidden2perl$datablock12_codon && $num_datablocks >= 12partition_finder.cfgperl"$datablock_12a$datablock_codonname1 = $datablock12_range1 \- $datablock12_range2\\3;\\n" .
"$datablock_12a$datablock_codonname2 = " . ($datablock12_range1 + 1) . " \- $datablock12_range2\\3;\\n" .
"$datablock_12a$datablock_codonname3 = " . ($datablock12_range1 + 2) . " \- $datablock12_range2\\3;\\n"
7datablock_13aEnter the name of your thirteenth datablockperl$num_datablocks >= 13datablock13_range1Enter the beginning of the rangeperl$num_datablocks >= 13datablock13_range2Enter the end of the rangeperl$num_datablocks >= 13datablock13_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 13datablock13_hidden1perl!$datablock13_codon && $num_datablocks >= 13partition_finder.cfgperl"$datablock_13a = $datablock13_range1 - $datablock13_range2;\\n"7datablock13_hidden2perl$datablock13_codon && $num_datablocks >= 13partition_finder.cfgperl"$datablock_13a$datablock_codonname1 = $datablock13_range1 \- $datablock13_range2\\3;\\n" .
"$datablock_13a$datablock_codonname2 = " . ($datablock13_range1 + 1) . " \- $datablock13_range2\\3;\\n" .
"$datablock_13a$datablock_codonname3 = " . ($datablock13_range1 + 2) . " \- $datablock13_range2\\3;\\n"
7datablock_14aEnter the name of your fourteenth datablockperl$num_datablocks >= 14datablock14_range1Enter the beginning of the rangeperl$num_datablocks >= 14datablock14_range2Enter the end of the rangeperl$num_datablocks >= 14datablock14_codonThis is a codon analysis (will repeat the range /1,/2, and /3) perl$num_datablocks >= 14datablock14_hidden1perl!$datablock14_codon && $num_datablocks >= 14partition_finder.cfgperl"$datablock_14a = $datablock14_range1 - $datablock14_range2;\\n"7datablock14_hidden2perl$datablock14_codon && $num_datablocks >= 14partition_finder.cfgperl"$datablock_14a$datablock_codonname1 = $datablock14_range1 \- $datablock14_range2\\3;\\n" .
"$datablock_14a$datablock_codonname2 = " . ($datablock14_range1 + 1) . " \- $datablock14_range2\\3;\\n" .
"$datablock_14a$datablock_codonname3 = " . ($datablock14_range1 + 2) . " \- $datablock14_range2\\3;\\n"
7
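The data block templates above all follow one pattern, which can be sketched in Python as a reading aid. This is an illustration only: the function `datablock_lines` and the example block name `COI` with its range are hypothetical. The output format (a plain `name = start - end;` line, or three `codon1`/`codon2`/`codon3` lines starting at successive positions with a `\3` stride) mirrors the Perl format strings in the tool definition.

```python
from typing import List


def datablock_lines(name: str, start: int, end: int, codon: bool = False) -> List[str]:
    """Return the partition_finder.cfg lines for one data block.

    Hypothetical helper mirroring the Perl templates: without the codon
    option a single range is written; with it, the block is split into
    three codon positions that start at start, start + 1, and start + 2
    and each take every third site up to end.
    """
    if not codon:
        return [f"{name} = {start} - {end};"]
    return [
        f"{name}codon{pos + 1} = {start + pos} - {end}\\3;"
        for pos in range(3)
    ]


# Example with a made-up block name and range.
for line in datablock_lines("COI", 1, 657, codon=True):
    print(line)
```

With the made-up values shown, the loop prints `COIcodon1 = 1 - 657\3;`, `COIcodon2 = 2 - 657\3;`, and `COIcodon3 = 3 - 657\3;`.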