EPA-NG on ACCESS 0.3.8
Massively Parallel Evolutionary Placement of Genetic Sequences - run on XSEDE
Pierre Barbera, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, Alexandros Stamatakis
Pierre Barbera, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, Alexandros Stamatakis, EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences, Systematic Biology, Volume 68, Issue 2, March 2019, Pages 365–369, https://doi.org/10.1093/sysbio/syy054
Phylogeny / Alignment
epa_ng_xsede exabayes_15 perl "epa_ng_expanse" 0
conf_fileregmem 2 scheduler.conf perl $set_memory == 96 perl "ChargeFactor=1.0\\n" . "mem=96G\\n" . "nodes=1\\n" . "node_exclusive=0\\n" . "cpus-per-task=48\\n" . "threads_per_process=48\\n"
conf_file243mem 2 scheduler.conf perl $set_memory == 243 perl "ChargeFactor=1.0\\n" . "mem=243G\\n" . "nodes=1\\n" . "node_exclusive=1\\n" . "cpus-per-task=128\\n" . "threads_per_process=128\\n"
conf_file500mem 2 scheduler.conf perl $set_memory == 500 perl "ChargeFactor=1.0\\n" . "mem=500G\\n" . "large_data=1\\n" . "nodes=1\\n" . "node_exclusive=0\\n" . "cpus-per-task=32\\n" .
"threads_per_process=32\\n" specify_threads_96mem perl $set_memory == 96 perl "-T 48" 99 specify_threads_243mem perl $set_memory == 243 perl "-T 128" 99 specify_threads_500mem perl $set_memory == 500 perl "-T 32" 99 infile Input Tree File (must be in Newick format) perl "--tree infile_tree.txt" 5 infile_tree.txt set_outdir perl "--out-dir ./" all_results * runtime 1 scheduler.conf Maximum Hours to Run (up to 168 hours) perl "runhours=$value\\n" 0.5 The maximum hours to run must be less than 168 perl $set_memory < 500 && $runtime > 168.0 For high memory jobs, the maximum hours to run is 48 hours perl $runtime > 48 && $set_memory == 500 For high memory jobs, the runhours request must be greater than 6, but you will only be charged for the time your run actually uses perl $runtime <= 6 && $set_memory == 500 The maximum hours to run must be greater than 0.05 perl $runtime < 0.05 The job will run on 48 processors as configured. If it runs for the entire configured time, it will consume 48 X $runtime cpu hours perl $runtime > 0 && $set_memory == 96 The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 X $runtime cpu hours perl $runtime > 0 && $set_memory == 243 The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume 64 X $runtime cpu hours perl $runtime > 0 && $set_memory == 500 Estimate the maximum time your job will need to run. We recommend testing initially with a < 0.5hr test run because Jobs set for 0.5 h or less depedendably run immediately in the "debug" queue. Once you are sure the configuration is correct, you then increase the time. The reason is that jobs > 0.5 h are submitted to the "normal" queue, where jobs configured for 1 or a few hours times may run sooner than jobs configured for the full 168 hours. ref_file 12 Select ref-msa file pise (defined $ref_file) ? 
"--ref-msa ref_msa.fasta":"" ref_msa.fasta binary_file 12 Select binary file perl !defined $ref_file pise (defined $binary_file) ? "--binary ref_binary.bin":"" ref_binary.bin query_file 12 Select query file pise (defined $query_file) ? "--query query.txt":"" query.txt Please select either a text reference file or a binary perl !defined $ref_file && !defined $binary_file Please select a query file perl !defined $query_file raxquery_file 12 Select RAxML info file pise (defined $raxquery_file) ? "--model model.txt":"" model.txt As of version 0.2.0, GTRGAMMA model parameters have to be specified explicitly. There are currently two ways of doing this: Either specify a raxml-ng-style model descriptor (elaborated here), like so: epa-ng -*model GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47} ... or pass a file containing the relevant information, coming from one of the supported tree inference programs. RECOMMENDED In the case of raxml-ng, pass the [...].bestModel file resulting from an evaluation run to EPA-ng: This method has support for pretty much every model that raxml-ng supports, so it is highly recommended you do it this way. Alternatively we also support parsing the model parameters either from RAxML 8.x info files, or from IQ-TREE report files, though there may be parsing problems as not all models are covered. model_string 12 Specify the model with a text string pise (defined $model_string) ? "--model $model_string":"" Please specify the model parameters with a file or a text string perl !defined $raxquery_file && !defined $model_string Sorry, you cannot select a parameter file AND add these values as a string perl defined $raxquery_file && defined $model_string As of version 0.2.0, GTRGAMMA model parameters have to be specified explicitly. There are currently two ways of doing this: Either specify a raxml-ng-style model descriptor (elaborated here), like so: epa-ng -*model GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47} ... 
or pass a file containing the relevant information, coming from one of the supported tree inference programs. RECOMMENDED: In the case of raxml-ng, pass the [...].bestModel file resulting from an evaluation run to EPA-ng. This method supports nearly every model that raxml-ng supports, so it is highly recommended. Alternatively, model parameters can be parsed from RAxML 8.x info files or from IQ-TREE report files, though parsing problems may occur because not all models are covered.
set_memory Select the memory required 96 243 500 96
compute_opts Compute Options
choose_heuristic Choose your heuristic dyn-heur fix-heur baseball-heur no-heur dyn-heur
dyn-heur "--dyn-heur $specify_dynamic"
fix-heur "--fix-heur $specify_fixed"
baseball-heur "--baseball-heur"
no-heur "--no-heur"
-g,--dyn-heur FLOAT:FLOAT in [0 - 1]=0.99999 Excludes: --fix-heur --baseball-heur --no-heur Two-phase heuristic, determination of candidate edges using accumulative threshold. Enabled by default! See --no-heur for disabling it
-G,--fix-heur FLOAT:FLOAT in [0 - 1] Excludes: --dyn-heur --baseball-heur --no-heur Two-phase heuristic, determination of candidate edges by specified percentage of total edges.
--baseball-heur Excludes: --dyn-heur --fix-heur --no-heur Baseball heuristic as known from pplacer. strike_box=3,max_strikes=6,max_pitches=40.
--no-heur Excludes: --dyn-heur --fix-heur --baseball-heur
specify_dynamic Provide a value for the dynamic heuristic perl $choose_heuristic eq "dyn-heur" 0.9999
Please enter a value for the dynamic heuristic perl $choose_heuristic eq "dyn-heur" && !defined $specify_dynamic
Value for the dynamic heuristic must be between 0 and 1 perl $choose_heuristic eq "dyn-heur" && ($specify_dynamic > 1 || $specify_dynamic < 0 )
specify_fixed Provide a value for the fixed heuristic perl $choose_heuristic eq "fix-heur"
Please enter a value for the fixed heuristic perl $choose_heuristic eq "fix-heur" && !defined $specify_fixed
The value for the fixed heuristic must be between 0 and 1 perl $choose_heuristic eq "fix-heur" && ($specify_fixed > 1 || $specify_fixed < 0)
specify_queryseq Number of query sequences to be read in at a time. (--chunk-size) perl (defined $specify_queryseq) ? "--chunk-size $specify_queryseq":""
employ_queryseq Employ old style of branch length optimization during thorough insertion. (--raxml-blo) perl ($value) ? "--raxml-blo":""
no_premask Do not pre-mask sequences. (--no-pre-mask) perl ($value) ? "--no-pre-mask":""
rate_scalers Use individual rate scalers? (--rate-scalers) off on auto perl "--rate-scalers $value"
output_opts Output Options
likelihood_weight Likelihood weight filter-acc-lwr filter-min-lwr
specify_acclw Accumulated likelihood weight perl $likelihood_weight eq "filter-acc-lwr" perl "--filter-acc-lwr $specify_acclw"
specify_minlw Minimum likelihood weight perl $likelihood_weight eq "filter-min-lwr" perl "--filter-min-lwr $specify_minlw" 0.01
specify_minplace Minimum number of placements per sequence to include (--filter-min) perl (defined $specify_minplace) ? "--filter-min $specify_minplace":"" 1
specify_maxplace Maximum number of placements per sequence to include (--filter-max) perl (defined $specify_maxplace) ? 
"--filter-max $specify_maxplace":"" 7 specify_precision Output decimal point precision for floating point (--precision) perl (defined $specify_precision) ? "--precision $specify_precision":"" 10 preserve_rooting Preserve rooting (--preserve-rooting) perl ($value) ? "--preserve-rooting on":"--preserve-rooting off" Preserve the rooting of rooted trees. When disabled, EPA-ng will print the result as an unrooted tree.