RAxML-HPC BlackBox8.2.12Phylogenetic tree inference using maximum likelihood/rapid bootstrapping on XSEDE.Alexandros StamatakisStamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.Bioinformatics. 2006 Nov 1;22(21):2688-90Phylogeny / Alignmenthttp://icwww.epfl.ch/~stamatak/index-Dateien/countManual7.0.0.php raxmlhpc2bb_expanseraxmlhpc_hybridlogicdhge10perl$specify_nchar < 500000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl"raxmlHPC-HYBRID_8.2.12_expanse"0raxmlhpc_hybridlogicdpge10perl$specify_nchar > 499999 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl"raxmlHPC-PTHREADS_8.2.12_expanse"0raxmlhpc_hybridlogicphge10perl$specify_nchar < 200000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl"raxmlHPC-HYBRID_8.2.12_expanse"0raxmlhpc_hybridlogicppge10perl$specify_nchar > 199999 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl"raxmlHPC-PTHREADS_8.2.12_expanse"0raxmlhpc_hybridlogicallplt10perl!$use_bootstopping && $specify_bootstraps < 10perl"raxmlHPC-PTHREADS_8.2.12_expanse"0raxmlhpc_hybridlogicdhge10_scheduler1scheduler.confperl$specify_nchar < 10000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=10\\n" .
"threads_per_process=4\\n" .
"cpus-per-task=4\\n" .
"mem=77G\\n" .
"node_exclusive=0\\n" .
"nodes=1\\n"
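Each scheduler block is a Perl expression. It concatenates `key=value` lines into `scheduler.conf`. The Python sketch below shows the text the first DNA bootstrapping block would produce. It assumes that every `\\n` in the source becomes a real newline in the generated file. The function names are hypothetical helpers, not part of CIPRES.

```python
def render_scheduler_conf(settings):
    """Join key=value pairs into newline-terminated lines, keeping dict order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def parse_scheduler_conf(text):
    """Split key=value lines back into a dict (all values come back as strings)."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            result[key] = value
    return result


# Hypothetical dict mirroring the first DNA bootstrapping block (nchar < 10000).
dna_small = {
    "jobtype": "mpi",
    "mpi_processes": 10,
    "threads_per_process": 4,
    "cpus-per-task": 4,
    "mem": "77G",
    "node_exclusive": 0,
    "nodes": 1,
}

conf_text = render_scheduler_conf(dna_small)
print(conf_text, end="")
```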
0raxmlhpc_hybridlogicdhge10_scheduler2scheduler.confperl$specify_nchar > 9999 && $specify_nchar < 40000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=5\\n" .
"threads_per_process=25\\n" .
"cpus-per-task=25\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdhge10_scheduler3scheduler.confperl$specify_nchar > 39999 && $specify_nchar < 500000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=2\\n" .
"threads_per_process=64\\n" .
"cpus-per-task=64\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdpge10_schedulerscheduler.confperl$specify_nchar > 499999 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"threads_per_process=128\\n" .
"cpus-per-task=128\\n" .
"node_exclusive=1\\n" .
"mem=243G\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdplt10_scheduler1scheduler.confperl$specify_nchar < 4000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=8\\n" .
"cpus-per-task=8\\n" .
"node_exclusive=0\\n" .
"mem=15G\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdplt10_scheduler2scheduler.confperl$specify_nchar > 3999 && $specify_nchar < 16000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=16\\n" .
"cpus-per-task=16\\n" .
"node_exclusive=0\\n" .
"mem=30G\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdplt10_scheduler3scheduler.confperl$specify_nchar > 15999 && $specify_nchar < 60000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=24\\n" .
"cpus-per-task=24\\n" .
"node_exclusive=0\\n" .
"mem=46G\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdplt10_scheduler4scheduler.confperl$specify_nchar > 59999 && $specify_nchar < 300000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=48\\n" .
"cpus-per-task=48\\n" .
"node_exclusive=0\\n" .
"mem=92G\\n" .
"nodes=1\\n"
0raxmlhpc_hybridlogicdplt10_schedulerscheduler.confperl$specify_nchar > 299999 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=128\\n" .
"cpus-per-task=128\\n" .
"node_exclusive=1\\n" .
"mem=243G\\n" .
"nodes=1\\n"
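The DNA blocks above choose an executable and a core layout from two inputs: the pattern count, and whether bootstrapping is requested (bootstopping, or at least 10 replicates). The Python sketch below restates that lookup, with the thresholds copied from the Perl conditions. The function name and the return tuple are illustrative only. Pthreads-only layouts are reported as a single process, which is an assumption, because those blocks set no `mpi_processes` at all.

```python
HYBRID = "raxmlHPC-HYBRID_8.2.12_expanse"
PTHREADS = "raxmlHPC-PTHREADS_8.2.12_expanse"


def dna_layout(nchar, use_bootstopping, n_bootstraps=0):
    """Return (executable, mpi_processes, threads_per_process) for a DNA job."""
    bootstrapping = use_bootstopping or n_bootstraps > 9
    if bootstrapping:
        # (exclusive upper bound on nchar, MPI processes, threads per process)
        for upper, procs, threads in ((10000, 10, 4), (40000, 5, 25), (500000, 2, 64)):
            if nchar < upper:
                return HYBRID, procs, threads
        return PTHREADS, 1, 128
    # No bootstopping and fewer than 10 replicates: one Pthreads process.
    for upper, threads in ((4000, 8), (16000, 16), (60000, 24), (300000, 48)):
        if nchar < upper:
            return PTHREADS, 1, threads
    return PTHREADS, 1, 128


exe, procs, threads = dna_layout(25000, use_bootstopping=True)
print(exe, procs * threads)  # total cores, as quoted in the CPU-hour messages
```

The product of processes and threads reproduces the processor counts in the CPU-hour messages (40, 125, 128, and so on).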
0raxmlhpc_hybridlogicphge10_scheduler1scheduler.confperl$specify_nchar < 3000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=10\\n" .
"threads_per_process=4\\n" .
"cpus-per-task=4\\n" .
"mem=77G\\n" .
"node_exclusive=0\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicphge10_scheduler2scheduler.confperl$specify_nchar > 2999 && $specify_nchar < 12000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=10\\n" .
"threads_per_process=12\\n" .
"cpus-per-task=12\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicphge10_scheduler3scheduler.confperl$specify_nchar > 11999 && $specify_nchar < 30000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=5\\n" .
"threads_per_process=25\\n" .
"cpus-per-task=25\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicphge10_scheduler4scheduler.confperl$specify_nchar > 29999 && $specify_nchar < 200000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"jobtype=mpi\\n" .
"mpi_processes=2\\n" .
"threads_per_process=64\\n" .
"cpus-per-task=64\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicppge10_schedulerscheduler.confperl$specify_nchar > 199999 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)perl
"threads_per_process=128\\n" .
"cpus-per-task=128\\n" .
"node_exclusive=1\\n" .
"mem=243G\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicpplt10_scheduler1scheduler.confperl$specify_nchar < 3000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=12\\n" .
"cpus-per-task=12\\n" .
"node_exclusive=0\\n" .
"mem=23G\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicpplt10_scheduler2scheduler.confperl$specify_nchar > 2999 && $specify_nchar < 4500 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=24\\n" .
"cpus-per-task=24\\n" .
"node_exclusive=0\\n" .
"mem=46G\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicpplt10_scheduler3scheduler.confperl$specify_nchar > 4499 && $specify_nchar < 15000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=32\\n" .
"cpus-per-task=32\\n" .
"node_exclusive=0\\n" .
"mem=61G\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicpplt10_scheduler4scheduler.confperl$specify_nchar > 14999 && $specify_nchar < 100000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=64\\n" .
"cpus-per-task=64\\n" .
"node_exclusive=0\\n" .
"mem=123G\\n" .
"nodes=1\\n"
raxmlhpc_hybridlogicpplt10_schedulerscheduler.confperl$specify_nchar > 99999 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl
"threads_per_process=128\\n" .
"cpus-per-task=128\\n" .
"node_exclusive=1\\n" .
"mem=243G\\n" .
"nodes=1\\n"
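The protein blocks follow the same pattern as the DNA blocks but use different pattern-count thresholds. The Python sketch below restates them. As in the DNA case, the function name is hypothetical, and Pthreads-only layouts are reported as one process by assumption.

```python
HYBRID = "raxmlHPC-HYBRID_8.2.12_expanse"
PTHREADS = "raxmlHPC-PTHREADS_8.2.12_expanse"


def protein_layout(nchar, use_bootstopping, n_bootstraps=0):
    """Return (executable, mpi_processes, threads_per_process) for a protein job."""
    bootstrapping = use_bootstopping or n_bootstraps > 9
    if bootstrapping:
        # (exclusive upper bound on nchar, MPI processes, threads per process)
        tiers = ((3000, 10, 4), (12000, 10, 12), (30000, 5, 25), (200000, 2, 64))
        for upper, procs, threads in tiers:
            if nchar < upper:
                return HYBRID, procs, threads
        return PTHREADS, 1, 128
    for upper, threads in ((3000, 12), (4500, 24), (15000, 32), (100000, 64)):
        if nchar < upper:
            return PTHREADS, 1, threads
    return PTHREADS, 1, 128


exe, procs, threads = protein_layout(5000, use_bootstopping=False, n_bootstraps=200)
print(exe, procs * threads)  # 10 x 12 = 120 cores
```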
infileSequences File (relaxed PHYLIP format) (-s)perl" -s infile"1infileparsimony_seed_valEnter a random seed value for parsimony inferences (gives reproducible results from random starting tree)perl " -p 12345"2outsuffixperl" -n result"1bootstrap_seedperl"-x 12345"2specify_bootstrapsSet the number of bootstrapsperl!$use_bootstoppingperl" -N $value"2Please enter the number of bootstraps desired (-N) (e.g., 100)perl!$use_bootstopping && !defined $specify_bootstrapsSorry, the number of bootstraps cannot exceed 1,000 (-N) perl$specify_bootstraps > 1000use_bootstoppingLet RAxML halt bootstrapping automatically (HIGHLY recommended)1perl($value)? "-N autoMRE":""Sorry, you can't use automatic bootstopping with a constraint treeperl$use_bootstopping && defined $constraintThis option instructs RAxML to automatically halt bootstrapping when certain criteria are met, instead of specifying the number of bootstraps for an analysis. The exact criteria are specified/configured using subsequent entry fields.dna_gtrcatUse GTRCAT for the bootstrapping phase, and GTRGAMMA for the final tree inference (default)perl"-m GTRCAT$invariable"2perl$datatype eq "dna"prot_sub_modelperl$datatype eq "protein"perl"-m PROTCAT$invariable$prot_matrix_spec$empirical"2runtime1scheduler.confMaximum Hours to Run (click here for help setting this correctly)0.25Estimate the maximum time your job will need to run. We recommend testing initially with a short test run (0.5 h or less), because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
run sooner than jobs configured for the full 168 hours.
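The runtime rules above can be summarized in a small check. The check validates the requested hours, picks a queue, and computes the worst-case CPU-hour cost quoted in the messages that follow (cores multiplied by hours). The Python sketch below is only a restatement of this help text. The function name is hypothetical, and the queue rule (0.5 h or less goes to "debug") comes from the prose, not from the actual scheduler code.

```python
def check_runtime(runtime_hours, cores):
    """Validate Maximum Hours to Run and return (queue, worst-case CPU hours)."""
    if runtime_hours > 168.0:
        raise ValueError("Maximum Hours to Run must be 168 or less")
    if runtime_hours < 0.1:
        raise ValueError("Maximum Hours to Run must be at least 0.1")
    # Queue rule restated from the help text, not read from the scheduler.
    queue = "debug" if runtime_hours <= 0.5 else "normal"
    return queue, cores * runtime_hours


print(check_runtime(0.25, 40))  # short test run on the 40-core DNA layout
print(check_runtime(12, 128))
```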
Maximum Hours to Run must be 168 or lessperl$runtime > 168.0Maximum Hours to Run must be at least 0.1perl$runtime < 0.1The job will run on 40 processors as configured. If it runs for the entire configured time, it will consume 40 x $runtime cpu hoursperl$specify_nchar < 10000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 125 processors as configured. If it runs for the entire configured time, it will consume 125 x $runtime cpu hoursperl$specify_nchar > 9999 && $specify_nchar < 40000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 39999 && $specify_nchar < 500000 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 499999 && $datatype ne "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 x $runtime cpu hoursperl$specify_nchar < 4000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume 16 x $runtime cpu hoursperl$specify_nchar > 3999 && $specify_nchar < 16000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 24 processors as configured. If it runs for the entire configured time, it will consume 24 x $runtime cpu hoursperl$specify_nchar > 15999 && $specify_nchar < 60000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 48 processors as configured. 
If it runs for the entire configured time, it will consume 48 x $runtime cpu hoursperl$specify_nchar > 59999 && $specify_nchar < 300000 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 299999 && $datatype ne "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 40 processors as configured. If it runs for the entire configured time, it will consume 40 x $runtime cpu hoursperl$specify_nchar < 3000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 120 processors as configured. If it runs for the entire configured time, it will consume 120 x $runtime cpu hoursperl$specify_nchar > 2999 && $specify_nchar < 12000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 125 processors as configured. If it runs for the entire configured time, it will consume 125 x $runtime cpu hoursperl$specify_nchar > 11999 && $specify_nchar < 30000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 29999 && $specify_nchar < 200000 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 199999 && $datatype eq "protein" && ($use_bootstopping || $specify_bootstraps > 9)The job will run on 12 processors as configured. If it runs for the entire configured time, it will consume 12 x $runtime cpu hoursperl$specify_nchar < 3000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 24 processors as configured. 
If it runs for the entire configured time, it will consume 24 x $runtime cpu hoursperl$specify_nchar > 2999 && $specify_nchar < 4500 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume 32 x $runtime cpu hoursperl$specify_nchar > 4499 && $specify_nchar < 15000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume 64 x $runtime cpu hoursperl$specify_nchar > 14999 && $specify_nchar < 100000 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hoursperl$specify_nchar > 99999 && $datatype eq "protein" && !$use_bootstopping && $specify_bootstraps < 10perl"runhours=$value\\n"specify_ncharNumber of patterns in your dataset
Knowing the number of characters in your dataset helps us determine the most efficient way to run RAxML.
We need to know the number of characters per row in the input data matrix. You can find this by doing a quick pre-run using RAxML.
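You can also estimate both numbers locally before submitting. The number of characters per row is read from the PHYLIP header. The number of patterns is the number of distinct alignment columns. The Python sketch below assumes a sequential, relaxed-PHYLIP file with one sequence per line, and `phylip_stats` is a hypothetical helper. RAxML's own pattern count may differ slightly (for example, with partitioned data), so treat the result as an estimate.

```python
def phylip_stats(text):
    """Return (ntax, nchar, npatterns) for a sequential relaxed-PHYLIP alignment."""
    lines = [line for line in text.splitlines() if line.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    seqs = []
    for line in lines[1:1 + ntax]:
        _name, rest = line.split(None, 1)
        seqs.append("".join(rest.split()).upper())  # drop any internal spaces
    if len(seqs) != ntax or any(len(s) != nchar for s in seqs):
        raise ValueError("alignment does not match the PHYLIP header")
    # A pattern is a distinct column, read top to bottom across all taxa.
    patterns = {"".join(s[i] for s in seqs) for i in range(nchar)}
    return ntax, nchar, len(patterns)


example = """4 6
Human ACGTAC
Cow   ACGTAC
Mouse ACGAAC
Frog  ACGAAC
"""
print(phylip_stats(example))  # (4, 6, 4): two of the six columns repeat earlier ones
```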
1000Please enter a value for the number of characters in your data matrixperl!defined $specify_ncharThe number of characters in the matrix must be 1 or greater.perl$specify_nchar < 1Knowing the number of characters in your dataset helps us determine the most efficient way to run RAxML.
The number of patterns is the number of unique columns in the multiple sequence alignment matrix. You can get this number from the intermediate results once a job begins. Entering the number of characters per taxon in your matrix (or 1000) as the number of patterns is a reasonable start.
Check the intermediate results to see whether that value is reasonably close. If it is not, kill the job and adjust the number.15datatypeSequence Typeproteindnadna2outgroupOutgroup (one or more comma-separated outgroups, see comment for syntax)perl(defined $value)? " -o $value " : "" 10The correct syntax for the box is outgroup1,outgroup2,outgroupn. If whitespace is introduced (e.g., outgroup1, outgroup2, outgroupn), the program will fail with the message
"Error, you must specify a model of substitution with the '-m' option"
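Because whitespace after the commas breaks the `-o` option, it helps to normalize the outgroup list before it is substituted into `" -o $value "`. The Python sketch below uses a hypothetical helper, `outgroup_option`. Unlike the Perl expression, it returns an empty string for blank input.

```python
def outgroup_option(raw):
    """Build the " -o ..." fragment with whitespace stripped around each name."""
    names = [part.strip() for part in raw.split(",")]
    names = [name for name in names if name]  # drop empty entries such as "a,,b"
    return f" -o {','.join(names)} " if names else ""


print(repr(outgroup_option("Loach, Carp")))  # ' -o Loach,Carp '
```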
constraintperl!defined $binary_backbone && !$use_bootstoppingConstraint (-g)constraint.treperl(defined $value) ? "-g constraint.tre" : ""2 This option allows you to specify an incomplete or comprehensive multifurcating constraint tree for the RAxML
search in NEWICK format. Initially, multifurcations are resolved randomly. If the tree is incomplete (does not contain
all taxa) the remaining taxa are added by using the MP criterion. Once a comprehensive (containing all taxa) bifurcating
tree is computed, it is further optimized under ML respecting the given constraints. Important: If you specify a
non-comprehensive constraint, i.e., a constraint tree that does not contain all taxa, RAxML will assume that any taxa
that are not found in the constraint topology are unconstrained, i.e., these taxa can be placed in any part of the tree. As
an example, consider an alignment with 10 taxa: Loach, Chicken, Human, Cow, Mouse, Whale, Seal, Carp, Rat, Frog. If, for
example, you would like Loach, Chicken, Human, Cow to be monophyletic, you would specify the constraint tree as follows: ((Loach, Chicken, Human, Cow),(Mouse, Whale, Seal, Carp, Rat, Frog)); Moreover, if you would like Loach, Chicken, Human, Cow to be monophyletic and, in addition, Human, Cow to be
monophyletic within that clade, you could specify: ((Loach, Chicken, (Human, Cow)),(Mouse, Whale, Seal, Carp, Rat, Frog)); If you specify an incomplete constraint: ((Loach, Chicken, Human, Cow),(Mouse, Whale, Seal, Carp)); the two groups Loach, Chicken, Human, Cow and Mouse, Whale, Seal, Carp will be monophyletic, while Rat and Frog can
end up anywhere in the tree. binary_backboneperl!defined $constraintBinary Backbone (-r)binary_backbone.treperl(defined $value) ? " -r binary_backbone.tre" : ""2This option allows you to pass a binary/bifurcating constraint/backbone tree in NEWICK format to RAxML. Note that using this option only makes sense if this tree contains fewer taxa than the input alignment. The remaining taxa will initially be added by using the MP criterion. Once a comprehensive tree with all taxa has been obtained it will be optimized under ML respecting the restrictions of the constraint tree.
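Before uploading a constraint (-g) or backbone (-r) tree, you can list which alignment taxa the tree leaves unconstrained. You can also confirm that a backbone has fewer taxa than the alignment. The Python sketch below uses the 10-taxon example above. The helper names are hypothetical. The simple Newick tokenizer assumes unquoted leaf labels and no internal-node labels or support values.

```python
import re


def newick_taxa(newick):
    """Collect leaf labels from a plain Newick string (unquoted, no internal labels)."""
    without_lengths = re.sub(r":[0-9.eE+-]+", "", newick)  # drop branch lengths
    tokens = re.split(r"[(),;]", without_lengths)
    return [token.strip() for token in tokens if token.strip()]


def unconstrained_taxa(alignment_taxa, constraint_newick):
    """Alignment taxa missing from the constraint, i.e. free to go anywhere."""
    constrained = set(newick_taxa(constraint_newick))
    return [taxon for taxon in alignment_taxa if taxon not in constrained]


def backbone_is_useful(alignment_taxa, backbone_newick):
    """A backbone only makes sense if it has fewer taxa than the alignment."""
    return len(set(newick_taxa(backbone_newick))) < len(set(alignment_taxa))


taxa = ["Loach", "Chicken", "Human", "Cow", "Mouse",
        "Whale", "Seal", "Carp", "Rat", "Frog"]
incomplete = "((Loach, Chicken, Human, Cow),(Mouse, Whale, Seal, Carp));"
print(unconstrained_taxa(taxa, incomplete))  # ['Rat', 'Frog']
```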
partitionUse a mixed/partitioned model? (-q)perl" -q part"2partThe multipleModelFileName parameter (-q) allows you to upload a file that specifies the regions of your alignment for which an individual model of nucleotide substitution should be estimated. This will typically be used to infer trees for long (in terms of base pairs) multi-gene alignments. For example, if -m GTRGAMMA is used, individual alpha-shape parameters, GTR rates, and empirical base frequencies will be estimated and optimized for each partition. Since RAxML can now handle mixed amino acid and DNA alignments, you must specify the data type in the partition file before the partition name. For DNA, this means you have to add DNA to each line of the partition file. For AA data, you must specify the substitution matrix for each partition:
The AA substitution model must be the first entry in each line and must be separated from the gene name by a comma, just like the DNA token above. You cannot assign different models of rate heterogeneity to different partitions, i.e., the same model (e.g., CAT or CATI) is used for all partitions. Finally, if you have a concatenated DNA and AA alignment, with DNA data at positions 1-500 and AA data at positions 501-1000 under the WAG model, the partition file should contain one line per partition: DNA, gene1 = 1-500 followed by WAG, gene2 = 501-1000. For more help see http://phylobench.vital-it.ch/raxml-bb/index.php?help=model.
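The partition-file layout described above (`model, name = start-end`, one partition per line) can be parsed and sanity-checked before upload. The Python sketch below handles only one contiguous range per line. Real RAxML partition files also accept lists of ranges and codon strides, which this sketch ignores. The coverage check is a hypothetical convenience, not a documented RAxML requirement.

```python
def parse_partition_file(text):
    """Parse lines like 'DNA, gene1 = 1-500' into (model, name, start, end)."""
    partitions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        model, rest = (part.strip() for part in line.split(",", 1))
        name, span = (part.strip() for part in rest.split("=", 1))
        start, end = (int(x) for x in span.split("-"))
        partitions.append((model, name, start, end))
    return partitions


def covers_alignment(partitions, nchar):
    """Hypothetical check that columns 1..nchar are each assigned exactly once."""
    expected = 1
    for _, _, start, end in sorted(partitions, key=lambda p: p[2]):
        if start != expected:  # a gap or an overlap
            return False
        expected = end + 1
    return expected == nchar + 1


part = "DNA, gene1 = 1-500\nWAG, gene2 = 501-1000\n"
parts = parse_partition_file(part)
print(parts)
print(covers_alignment(parts, 1000))
```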
exclude_fileCreate an input file that excludes the range of positions specified in this file (-E)perl" -E excl"2exclThis option is used to exclude specific positions in the matrix. For example, uploading a file
that contains the text 100-200 300-400 will create a file that excludes all columns between positions
100 and 200 as well as all columns between positions 300 and 400. Note that the boundary numbers (positions 100, 200, 300,
and 400) will also be excluded. To exclude a single column, write 100-100. This option does not
run an analysis; it only prints an alignment file without the excluded columns. Save this file to your
data area, and then run the real analysis. If you use a mixed model, an appropriately adapted model file
will also be written. The header of the PHYLIP file is automatically corrected. Example: raxmlHPC -E excl
-s infile -m GTRCAT -q part -n TEST. In this case the files with columns excluded will be named
infile.excl and part.excl. invariableEstimate proportion of invariable sites (GTRGAMMA + I)perl$mlsearchI2prot_matrix_specProtein Substitution Matrixperl$datatype eq "protein"DAYHOFFDCMUTJTTMTREVWAGRTREVCPREVVTBLOSUM62MTMAMLGGTRJTTNote: FLOAT and invariable sites (I) options are not exposed here. If you require these options, please contact mmiller@sdsc.edu.-m PROTCATmatrixName: analyses using the specified AA matrix + Optimization of substitution rates + Optimization of site-specific evolutionary rates which are categorized into numberOfCategories distinct rate categories for greater computational efficiency. The final tree may be evaluated automatically under PROTGAMMAmatrixName[F], depending on the tree search option.
-m PROTGAMMAmatrixName[F]: analyses use the specified AA matrix + Optimization of substitution rates + GAMMA model of rate heterogeneity (alpha parameter will be estimated)Available AA substitution models: DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, GTR. You can specify whether you want to use empirical base frequencies. Please note that for mixed models you can, in addition, specify the per-gene AA model in the mixed model file (see manual for details). Also note that if you estimate AA GTR parameters on a partitioned dataset, they will be linked (estimated jointly) across all partitions to avoid over-parametrization.empiricalUse empirical base frequencies?perl$datatype eq "protein"F2The empirical base frequency option applies only to the protein datatype; it appends F as a suffix to the -m model string (e.g., PROTCATWAGF).
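The -E exclusion ranges described earlier are inclusive at both ends and 1-based, so `100-100` removes a single column. The Python sketch below applies that rule to one sequence string. The helper names are hypothetical. RAxML itself writes the trimmed alignment (and an adapted partition file) rather than calling anything like this.

```python
def parse_exclusions(text):
    """Turn '100-200 300-400' into inclusive, 1-based (start, end) pairs."""
    pairs = []
    for token in text.split():
        start, end = (int(x) for x in token.split("-"))
        pairs.append((start, end))
    return pairs


def apply_exclusions(sequence, pairs):
    """Drop every 1-based column that falls inside any inclusive range."""
    excluded = set()
    for start, end in pairs:
        excluded.update(range(start, end + 1))  # both boundaries are excluded
    return "".join(c for i, c in enumerate(sequence, start=1) if i not in excluded)


print(apply_exclusions("ABCDEFGHIJ", parse_exclusions("2-4 7-7")))  # AEFHIJ
```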
mlsearchFind the best tree using a maximum likelihood searchperl ($value)?" -f a ":""12Tells RAxML to conduct a rapid bootstrap analysis (-x) and search for the best-scoring ML tree in a single program run.
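The fields above combine into one RAxML argument list. The list includes the fixed `-s infile -n result -p 12345 -x 12345`, the `-m` model string, `-N` (either a count or `autoMRE`), and `-f a` when the ML search is enabled. The Python sketch below assembles that list and restates the form's validation messages. The function name and keyword arguments are hypothetical. The sketch assumes that `$invariable` expands to `I` and that the default protein matrix is JTT. It does not reproduce the exact spacing that CIPRES uses when joining the Perl fragments.

```python
def build_raxml_args(datatype, use_bootstopping, n_bootstraps=None,
                     prot_matrix="JTT", empirical=False, invariable=False,
                     mlsearch=True, outgroup=None, constraint=False):
    """Assemble a RAxML argument list from the form fields described above."""
    # Validation rules restated from this interface's error messages.
    if use_bootstopping and constraint:
        raise ValueError("Sorry, you can't use automatic bootstopping with a constraint tree")
    if not use_bootstopping:
        if n_bootstraps is None:
            raise ValueError("Please enter the number of bootstraps desired (-N)")
        if n_bootstraps > 1000:
            raise ValueError("Sorry, the number of bootstraps cannot exceed 1,000 (-N)")
    inv = "I" if invariable else ""  # assumption: $invariable expands to "I"
    if datatype == "protein":
        model = "PROTCAT" + inv + prot_matrix + ("F" if empirical else "")
    else:
        model = "GTRCAT" + inv
    args = ["-s", "infile", "-n", "result", "-m", model,
            "-p", "12345", "-x", "12345"]
    args += ["-N", "autoMRE" if use_bootstopping else str(n_bootstraps)]
    if mlsearch:
        args += ["-f", "a"]  # rapid bootstraps plus best-scoring ML tree
    if outgroup:
        args += ["-o", outgroup]
    if constraint:
        args += ["-g", "constraint.tre"]
    return args


print(" ".join(build_raxml_args("dna", use_bootstopping=True)))
```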
no_bfgsDon't use the BFGS optimization algorithm (--no-bfgs)0perl($value)? "--no-bfgs":""Sorry, you can't use automatic bootstopping with a constraint treeperl$use_bootstopping && defined $constraint BFGS is a more efficient optimization algorithm for optimizing
branch lengths and GTR parameters simultaneously. You can disable it using this option.printbrlengthPrint branch lengths (-k)perl ($value)?" -k":""02 The -k option causes bootstrapped trees to be printed with branch lengths.
The bootstraps will require a bit longer to run under this option because model parameters will be optimized at
the end of each run under GAMMA or GAMMA+P-Invar respectively.
all_outputfiles*raxml_outputfilesRAXML_*