RAxML-HPC BlackBox 8.2.12 Phylogenetic tree inference using maximum likelihood/rapid bootstrapping on XSEDE. Alexandros Stamatakis Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006 Nov 1;22(21):2688-90 Phylogeny / Alignment http://icwww.epfl.ch/~stamatak/index-Dateien/countManual7.0.0.php raxmlhpc2bb_expanse raxmlhpc_hybridlogicdhge10 perl $nchars < 500000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "raxmlHPC-HYBRID_8.2.12_expanse" 0 raxmlhpc_hybridlogicdpge10 perl $nchars > 499999 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "raxmlHPC-PTHREADS_8.2.12_expanse" 0 raxmlhpc_hybridlogicphge10 perl $nchars < 200000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "raxmlHPC-HYBRID_8.2.12_expanse" 0 raxmlhpc_hybridlogicppge10 perl $nchars > 199999 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "raxmlHPC-PTHREADS_8.2.12_expanse" 0 raxmlhpc_hybridlogicallplt10 perl !$use_bootstopping && $bootstop < 10 perl "raxmlHPC-PTHREADS_8.2.12_expanse" 0 raxmlhpc_hybridlogicdhge10_scheduler1 scheduler.conf perl $nchars < 10000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=10\\n" . "cpus-per-task=4\\n" . "threads_per_process=4\\n" . "mem=77G\\n" . "node_exclusive=0\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdhge10_scheduler2 scheduler.conf perl $nchars > 9999 && $nchars < 40000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=5\\n" . "cpus-per-task=25\\n" . "threads_per_process=25\\n" . "node_exclusive=1\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdhge10_scheduler3 scheduler.conf perl $nchars > 39999 && $nchars < 500000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=2\\n" . "cpus-per-task=64\\n" . "threads_per_process=64\\n" . 
"node_exclusive=1\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdpge10_scheduler scheduler.conf perl $nchars > 499999 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) perl "cpus-per-task=128\\n" . "threads_per_process=128\\n" . "node_exclusive=1\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdplt10_scheduler1 scheduler.conf perl $nchars < 4000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=8\\n" . "threads_per_process=8\\n" . "node_exclusive=0\\n" . "mem=15G\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdplt10_scheduler2 scheduler.conf perl $nchars > 3999 && $nchars < 16000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=16\\n" . "threads_per_process=16\\n" . "node_exclusive=0\\n" . "mem=31G\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdplt10_scheduler3 scheduler.conf perl $nchars > 15999 && $nchars < 60000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=24\\n" . "threads_per_process=24\\n" . "node_exclusive=0\\n" . "mem=46G\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdplt10_scheduler4 scheduler.conf perl $nchars > 59999 && $nchars < 300000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=48\\n" . "threads_per_process=48\\n" . "node_exclusive=0\\n" . "mem=92G\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicdplt10_scheduler scheduler.conf perl $nchars > 299999 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=128\\n" . "threads_per_process=128\\n" . "node_exclusive=1\\n" . "nodes=1\\n" 0 raxmlhpc_hybridlogicphge10_scheduler1 scheduler.conf perl $nchars < 3000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=10\\n" . "cpus-per-task=4\\n" . "threads_per_process=4\\n" . "mem=77G\\n" . "node_exclusive=0\\n" . 
"nodes=1\\n" raxmlhpc_hybridlogicphge10_scheduler2 scheduler.conf perl $nchars > 2999 && $nchars < 12000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=10\\n" . "cpus-per-task=12\\n" . "threads_per_process=12\\n" . "node_exclusive=1\\n" . "nodes=1\\n" raxmlhpc_hybridlogicphge10_scheduler3 scheduler.conf perl $nchars > 11999 && $nchars < 30000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=5\\n" . "cpus-per-task=25\\n" . "threads_per_process=25\\n" . "node_exclusive=1\\n" . "nodes=1\\n" raxmlhpc_hybridlogicphge10_scheduler4 scheduler.conf perl $nchars > 29999 && $nchars < 200000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "jobtype=mpi\\n" . "mpi_processes=2\\n" . "cpus-per-task=64\\n" . "threads_per_process=64\\n" . "node_exclusive=1\\n" . "nodes=1\\n" raxmlhpc_hybridlogicppge10_scheduler scheduler.conf perl $nchars > 199999 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) perl "cpus-per-task=128\\n" . "threads_per_process=128\\n" . "node_exclusive=1\\n" . "nodes=1\\n" raxmlhpc_hybridlogicpplt10_scheduler1 scheduler.conf perl $nchars < 3000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=12\\n" . "threads_per_process=12\\n" . "node_exclusive=0\\n" . "mem=23G\\n" . "nodes=1\\n" raxmlhpc_hybridlogicpplt10_scheduler2 scheduler.conf perl $nchars > 2999 && $nchars < 4500 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=24\\n" . "threads_per_process=24\\n" . "node_exclusive=0\\n" . "mem=46G\\n" . "nodes=1\\n" raxmlhpc_hybridlogicpplt10_scheduler3 scheduler.conf perl $nchars > 4499 && $nchars < 15000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=32\\n" . "threads_per_process=32\\n" . "node_exclusive=0\\n" . "mem=61G\\n" . 
"nodes=1\\n" raxmlhpc_hybridlogicpplt10_scheduler4 scheduler.conf perl $nchars > 14999 && $nchars < 100000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=64\\n" . "threads_per_process=64\\n" . "node_exclusive=0\\n" . "mem=120G\\n" . "nodes=1\\n" raxmlhpc_hybridlogicpplt10_scheduler scheduler.conf perl $nchars > 99999 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "cpus-per-task=128\\n" . "threads_per_process=128\\n" . "node_exclusive=1\\n" . "nodes=1\\n" infile Sequences File (relaxed phylip format) (-s) perl " -s infile" 1 infile parsimony_seed_val Enter a random seed value for parsimony inferences (gives reproducible results from random starting tree) perl " -p 12345" 2 outsuffix perl " -n result" 1 bootstrap_seed perl "-x 12345" 2 bootstop Set the number of bootstraps perl " -N $value" 2 perl !$use_bootstopping use_bootstopping Let RAxML halt bootstrapping automatically (HIGHLY recommended) 1 perl ($value)? "-N autoMRE":"" Sorry, you cant use automatic bootstopping with a constraint tree perl $use_bootstopping && defined $constraint This option instructs Raxml to automatically halt bootstrapping when certain criteria are met, instead of specifying the number of bootstraps for an analysis. The exact criteria are specified/configured using subsequent entry fields. dna_gtrcat Use GTRCAT for the bootstrapping phase, and GTRGAMMA for the final tree inference (default) perl "-m GTRCAT$invariable" 2 perl $datatype eq "dna" prot_sub_model perl $datatype eq "protein" perl "-m PROTCAT$invariable$prot_matrix_spec$empirical" 2 runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) 0.25 Estimate the maximum time your job will need to run. We recommend testimg initially with a < 0.5hr test run because Jobs set for 0.5 h or less depedendably run immediately in the "debug" queue. Once you are sure the configuration is correct, you then increase the time. 
The reason is that jobs > 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours. Maximum Hours to Run must not exceed 168 perl $runtime > 168.0 Maximum Hours to Run must be at least 0.1 perl $runtime < 0.1 The job will run on 40 processors as configured. If it runs for the entire configured time, it will consume 40 x $runtime cpu hours perl $nchars < 10000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 125 processors as configured. If it runs for the entire configured time, it will consume 125 x $runtime cpu hours perl $nchars > 9999 && $nchars < 40000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 39999 && $nchars < 500000 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 499999 && $datatype ne "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 x $runtime cpu hours perl $nchars < 4000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume 16 x $runtime cpu hours perl $nchars > 3999 && $nchars < 16000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 24 processors as configured. If it runs for the entire configured time, it will consume 24 x $runtime cpu hours perl $nchars > 15999 && $nchars < 60000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 48 processors as configured. 
If it runs for the entire configured time, it will consume 48 x $runtime cpu hours perl $nchars > 59999 && $nchars < 300000 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 299999 && $datatype ne "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 40 processors as configured. If it runs for the entire configured time, it will consume 40 x $runtime cpu hours perl $nchars < 3000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 120 processors as configured. If it runs for the entire configured time, it will consume 120 x $runtime cpu hours perl $nchars > 2999 && $nchars < 12000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 125 processors as configured. If it runs for the entire configured time, it will consume 125 x $runtime cpu hours perl $nchars > 11999 && $nchars < 30000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 29999 && $nchars < 200000 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 199999 && $datatype eq "protein" && ($use_bootstopping || $bootstop > 9) The job will run on 12 processors as configured. If it runs for the entire configured time, it will consume 12 x $runtime cpu hours perl $nchars < 3000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 24 processors as configured. 
If it runs for the entire configured time, it will consume 24 x $runtime cpu hours perl $nchars > 2999 && $nchars < 4500 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume 32 x $runtime cpu hours perl $nchars > 4499 && $nchars < 15000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume 64 x $runtime cpu hours perl $nchars > 14999 && $nchars < 100000 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 x $runtime cpu hours perl $nchars > 99999 && $datatype eq "protein" && !$use_bootstopping && $bootstop < 10 perl "runhours=$value\\n" nchars Number of patterns in your dataset Knowing the number of characters in your dataset helps us determine the most efficient way to run RAxML. We need to know the number of characters per row in the input data matrix. You can find this by doing a quick pre-run using RAxML. Please enter a value for the number of characters in your data matrix perl !defined $nchars The number of characters in the matrix must be 1 or greater. perl $nchars < 1 15 datatype Sequence Type protein dna dna 2 outgroup Outgroup (one or more comma-separated outgroups, see comment for syntax) perl (defined $value)? " -o $value " : "" 10 The correct syntax for the box is outgroup1,outgroup2,outgroupn. If white space is introduced (e.g. outgroup1, outgroup2, outgroupn) the program will fail with the message "Error, you must specify a model of substitution with the '-m' option" constraint perl !defined $binary_backbone && !$use_bootstopping Constraint (-g) constraint.tre perl (defined $value) ? 
"-g constraint.tre" : "" 2 This option allows you to specify an incomplete or comprehensive multifurcating constraint tree for the RAxML search in NEWICK format. Initially, multifurcations are resolved randomly. If the tree is incomplete (does not contain all taxa) the remaining taxa are added by using the MP criterion. Once a comprehensive (containing all taxa) bifurcating tree is computed, it is further optimized under ML respecting the given constraints. Important: If you specify a non-comprehensive constraint, e.g., a constraint tree that does not contain all taxa, RAxML will assume that any taxa that not found in the constraint topology are unconstrained, i.e., these taxa can be placed in any part of the tree. As an example consider an alignment with 10 taxa: Loach, Chicken, Human, Cow, Mouse, Whale, Seal, Carp, Rat, Frog. If, for example you would like Loach, Chicken, Human, Cow to be monophyletic you would specify the constraint tree as follows: ((Loach, Chicken, Human, Cow),(Mouse, Whale, Seal, Carp, Rat, Frog)); Moreover, if you would like Loach, Chicken, Human, Cow to be monophyletic and in addition Human, Cow to be monophyletic within that clade you could specify: ((Loach, Chicken, (Human, Cow)),(Mouse, Whale, Seal, Carp, Rat, Frog)); If you specify an incomplete constraint: ((Loach, Chicken, Human, Cow),(Mouse, Whale, Seal, Carp)); the two groups Loach, Chicken, Human, Cow and Mouse, Whale, Seal, Carp will be monophyletic, while Rat and Frog can end up anywhere in the tree. binary_backbone perl !defined $constraint Binary Backbone (-r) binary_backbone.tre perl (defined $value) ? " -r binary_backbone.tre" : "" 2 This option allows you to pass a binary/bifurcating constraint/backbone tree in NEWICK format to RAxML. Note that using this option only makes sense if this tree contains fewer taxa than the input alignment. The remaining taxa will initially be added by using the MP criterion. 
Once a comprehensive tree with all taxa has been obtained it will be optimized under ML respecting the restrictions of the constraint tree. partition Use a mixed/partitioned model? (-q) perl " -q part" 2 part The multipleModelFileName parameter (-q) allows you to upload a file that specifies the regions of your alignment for which an individual model of nucleotide substitution should be estimated. This will typically be used to infer trees for long (in terms of base pairs) multi-gene alignments. For example, if -m GTRGAMMA is used, individual alpha-shape parameters, GTR-rates, and empirical base frequencies will be estimated and optimized for each partition. Since RAxML can now handle mixed Amino Acid and DNA alignments, you must specify the data type in the partition file, before the partition name. For DNA, this means you have to add DNA to each line in the partition file. For AA data you must specify the transition matrices for each partition: The AA substitution model must be the first entry in each line and must be separated by a comma from the gene name, just like the DNA token above. You cannot assign different models of rate heterogeneity to different partitions, i.e., it will be CAT or CATI for all partitions. Finally, if you have a concatenated DNA and AA alignment, with DNA data at positions 1 - 500 and AA data at 501-1000 with the WAG model, the partition file should look as follows: DNA, gene1 = 1-500; WAG, gene2 = 501-1000. For more help see http://phylobench.vital-it.ch/raxml-bb/index.php?help=model. exclude_file Create an input file that excludes the range of positions specified in this file (-E) perl " -E excl" 2 excl This option is used to exclude specific positions in the matrix. For example, uploading a file that contains the text: 100-200 300-400 will create a file that excludes all columns between positions 100 and 200 as well as all columns between positions 300 and 400. 
Note that the boundary numbers (positions 100, 200, 300, and 400) will also be excluded. To exclude a single column write (100-100). This option does not run an analysis but just prints an alignment file without the excluded columns. Save this file to your data area, and then run the real analysis. If you use a mixed model, an appropriately adapted model file will also be written. The ntax element of the phylip files is automatically corrected. Example: raxmlHPC -E excl -s infile -m GTRCAT -q part -n TEST. In this case the files with columns excluded will be named infile.excl and part.excl. invariable Estimate proportion of invariable sites (GTRGAMMA + I) perl $mlsearch I 2 prot_matrix_spec Protein Substitution Matrix perl $datatype eq "protein" DAYHOFF DCMUT JTT MTREV WAG RTREV CPREV VT BLOSUM62 MTMAM LG GTR JTT Note: FLOAT and invariable sites (I) options are not exposed here. If you require these options, please contact mmiller@sdsc.edu. -m PROTCATmatrixName: analyses using the specified AA matrix + Optimization of substitution rates + Optimization of site-specific evolutionary rates which are categorized into numberOfCategories distinct rate categories for greater computational efficiency. Final tree might be evaluated automatically under PROTGAMMAmatrixName[F], depending on the tree search option. -m PROTGAMMAmatrixName[F]: analyses using the specified AA matrix + Optimization of substitution rates + GAMMA model of rate heterogeneity (alpha parameter will be estimated). Available AA substitution models: DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, GTR. You can specify if you want to use empirical base frequencies. Please note that for mixed models you can in addition specify the per-gene AA model in the mixed model file (see manual for details). Also note that if you estimate AA GTR parameters on a partitioned dataset, they will be linked (estimated jointly) across all partitions to avoid over-parametrization. 
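The way the -m model string is assembled from the form fields can be sketched in Python. This is only an illustrative transliteration of the Perl templates "-m GTRCAT$invariable" and "-m PROTCAT$invariable$prot_matrix_spec$empirical" given in this interface; the function name build_model_flag and its keyword arguments are hypothetical, and the sketch assumes the invariable field contributes "I" and the empirical field contributes "F" when selected.

```python
def build_model_flag(datatype, matrix="JTT", invariable=False, empirical=False):
    """Compose the RAxML -m argument the way the interface templates do.

    Hypothetical helper: mirrors "-m GTRCAT$invariable" for DNA and
    "-m PROTCAT$invariable$prot_matrix_spec$empirical" for protein data.
    """
    inv = "I" if invariable else ""       # value of the 'invariable' field
    if datatype == "dna":
        return f"-m GTRCAT{inv}"
    if datatype == "protein":
        emp = "F" if empirical else ""    # value of the 'empirical' field
        return f"-m PROTCAT{inv}{matrix}{emp}"
    raise ValueError(f"unsupported datatype: {datatype!r}")


if __name__ == "__main__":
    print(build_model_flag("dna"))
    print(build_model_flag("protein", matrix="WAG", empirical=True))
```

The matrix default of JTT matches the default listed for the Protein Substitution Matrix field; everything else about how the real framework joins the fragments is an assumption.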
empirical Use empirical base frequencies? perl $datatype eq "protein" F 2 The empirical base frequency command is relevant for the protein datatype, and is used as a suffix to the -m model string PROTGAMMA_____ mlsearch Find best tree using maximum likelihood search perl ($value)?" -f a ":"" 1 2 Tell RAxML to conduct a Rapid Bootstrap analysis (-x) and search for the best-scoring ML tree in one single program run. no_bfgs Don't use BFGS searching algorithm (--no-bfgs) 0 perl ($value)? "--no-bfgs":"" Sorry, you cannot use automatic bootstopping with a constraint tree perl $use_bootstopping && defined $constraint BFGS is a more efficient optimization algorithm for optimizing branch lengths and GTR parameters simultaneously. You can disable it using this option. printbrlength Print branch lengths (-k) perl ($value)?" -k":"" 0 2 The -k option causes bootstrapped trees to be printed with branch lengths. The bootstraps will require a bit longer to run under this option because model parameters will be optimized at the end of each run under GAMMA or GAMMA+P-Invar respectively. all_outputfiles * raxml_outputfiles RAXML_*
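The binary and scheduler selection rules in this interface can be summarized in a short Python sketch. It restates the Perl conditions on $nchars, $datatype, $use_bootstopping and $bootstop as lookup tables; the names TIERS and plan_job are hypothetical, and the sketch assumes that tiers without an mpi_processes entry run as a single process.

```python
import math

# (upper bound on nchars, mpi_processes, threads_per_process), taken from
# the scheduler.conf templates. Key: (datatype, many_bootstraps).
TIERS = {
    ("dna", True): [(10_000, 10, 4), (40_000, 5, 25),
                    (500_000, 2, 64), (math.inf, 1, 128)],
    ("dna", False): [(4_000, 1, 8), (16_000, 1, 16), (60_000, 1, 24),
                     (300_000, 1, 48), (math.inf, 1, 128)],
    ("protein", True): [(3_000, 10, 4), (12_000, 10, 12), (30_000, 5, 25),
                        (200_000, 2, 64), (math.inf, 1, 128)],
    ("protein", False): [(3_000, 1, 12), (4_500, 1, 24), (15_000, 1, 32),
                         (100_000, 1, 64), (math.inf, 1, 128)],
}


def plan_job(nchars, datatype, use_bootstopping, bootstop, runtime=0.25):
    """Return (binary, cores, max_cpu_hours) for one submission."""
    # "ge10" branches: ($use_bootstopping || $bootstop > 9)
    many = bool(use_bootstopping) or bootstop > 9
    kind = "protein" if datatype == "protein" else "dna"
    for upper, procs, threads in TIERS[(kind, many)]:
        if nchars < upper:
            break
    # Tiers with more than one MPI process use the hybrid binary.
    suffix = "HYBRID" if procs > 1 else "PTHREADS"
    cores = procs * threads
    return f"raxmlHPC-{suffix}_8.2.12_expanse", cores, cores * runtime


if __name__ == "__main__":
    print(plan_job(5_000, "dna", True, 0, runtime=2.0))
    print(plan_job(20_000, "protein", False, 5))
```

The third element of the result corresponds to the "cores x $runtime cpu hours" warning shown for each tier; it is an upper bound that applies only if the job runs for the entire configured time.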