ASTRAL on XSEDE 5.15.4
Coalescent-based species tree estimation
Siavash Mirarab and Tandy Warnow
Mirarab, S., and Warnow, T. (2015) ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31, i44-i52. 10.1093/bioinformatics/btv234
Phylogeny / Alignment
astral_xsede
astral_invoke perl $num_gtrees < "4000" perl "astral_5.15.4_expanse" 0
astral_invoke_gpu perl $num_gtrees > "3999" perl "astral_5.15.4_expanse.gpu" 0
number_cores1 2 scheduler.conf perl $num_gtrees < 1000 perl "ChargeFactor=1.0\\n" . "nodes=1\\n" . "mem=15G\\n" . "node_exclusive=0\\n" . "cpus-per-task=8\\n" . "threads_per_process=8\\n"
number_cores2 2 scheduler.conf perl $num_gtrees > 999 && $num_gtrees < 4000 perl "ChargeFactor=1.0\\n" . "nodes=1\\n" . "mem=30G\\n" . "node_exclusive=0\\n" . "cpus-per-task=16\\n" . "threads_per_process=16\\n"
number_cores3 2 scheduler.conf perl $num_gtrees > 3999 && $num_gtrees < 8000 perl "ChargeFactor=1.0\\n" . "gpu=1\\n" . "nodes=1\\n" . "mem=92G\\n" . "node_exclusive=0\\n" . "threads_per_process=10\\n"
number_cores4 2 scheduler.conf perl $num_gtrees > 7999 && $num_gtrees < 16000 perl "ChargeFactor=1.0\\n" . "gpu=2\\n" . "nodes=1\\n" . "mem=184G\\n" . "node_exclusive=0\\n" . "threads_per_process=20\\n"
number_cores5 2 scheduler.conf perl $num_gtrees > 15999 perl "ChargeFactor=1.0\\n" . "gpu=4\\n" . "nodes=1\\n" . "mem=368G\\n" . "node_exclusive=1\\n" . "threads_per_process=40\\n"
infile Input infile.tre 1
infile_invoke Input perl "-i infile.tre" 1
number_threads1 4 perl $num_gtrees < 1000 perl "-T 8"
number_threads2 4 perl $num_gtrees > 999 && $num_gtrees < 4000 perl "-T 16"
number_threads3 4 perl $num_gtrees > 3999 && $num_gtrees < 8000 perl "-T 10"
number_gpus1 4 perl $num_gtrees > 3999 && $num_gtrees < 8000 perl "-G 1"
number_threads4 4 perl $num_gtrees > 7999 && $num_gtrees < 16000 perl "-T 20"
number_gpus2 4 perl $num_gtrees > 7999 && $num_gtrees < 16000 perl "-G 1,2"
number_threads5 4 perl $num_gtrees > 15999 perl "-T 40"
number_gpus4 4 perl $num_gtrees > 15999 perl "-G 1,2,3,4"
write_log 99 perl "2> output.log"
runtime 1 scheduler.conf Maximum Hours to Run (up to 168 hours) 0.5
The maximum hours to run must be less than or equal to 168 perl $runtime > 168.0 && $num_gtrees < 4000
The maximum hours to run must be at least 0.05 perl $runtime < 0.05
The maximum hours to run must be less than or equal to 120 perl $runtime > 120.0 && $num_gtrees > 3999
perl "runhours=$value\\n"
The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 x $runtime cpu hours perl $num_gtrees < 1000
The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume 16 x $runtime cpu hours perl $num_gtrees > 999 && $num_gtrees < 4000
The job will run on 10 processors and 1 GPU as configured. If it runs for the entire configured time, it will consume 21 x $runtime cpu hours perl $num_gtrees > 3999 && $num_gtrees < 8000
The job will run on 20 processors and 2 GPUs as configured. If it runs for the entire configured time, it will consume 42 x $runtime cpu hours perl $num_gtrees > 7999 && $num_gtrees < 16000
The job will run on 40 processors and 4 GPUs as configured. If it runs for the entire configured time, it will consume 84 x $runtime cpu hours perl $num_gtrees > 15999
Estimate the maximum time your job will need to run.
We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours.
num_gtrees Number of gene trees in your input file
outfile_name Output file name perl (defined $value) ? "-o $value":"" Specify a filename for storing the output species tree.
score_only Score the provided species tree and exit (-q) species_tree.tre perl ($value) ? "-q species_tree.tre":"" 8
use_internodedist Use NJst-like internode distances perl ($value) ? "-A":"" 6 Use NJst-like internode distances instead of quartet distance for building the search space (X). Unpublished work.
branch_annotate How many annotations should be added to each branch (-t) perl ($value) ? "-t $value":"" 0 1 2 3 4 8 10 3 8
specify_lambda Set the lambda parameter for the Yule prior. (-c) perl (defined $value) ? "-c $value":"" 0.5 12 Set the lambda parameter for the Yule prior used in the calculations of branch lengths and posterior probabilities. Set to zero to get ML branch lengths instead of MAP. Higher values tend to shorten estimated branch lengths and very high values can give inaccurate results (or even result in underflow). (default: 0.5)
upload_namemapfile Select a name mapping file 30 namemapfile.txt perl (defined $value) ? "-a namemapfile.txt":"" Select a file containing the mapping between names in the gene trees and names in the species tree. The mapping file has one line per species, with one of two formats:
species: gene1,gene2,gene3,gene4
species 4 gene1 gene2 gene3 gene4
specify_minleaves Remove genes with fewer than the specified number of leaves (-m) perl (defined $value) ? "-m $value":"" 26
specify_samplingrounds Perform this many rounds of individual sampling for building the set X (--samplingrounds) perl (defined $value) ? "--samplingrounds $value":"" 26 For multi-individual datasets, perform this many rounds of individual sampling for building the set X. The program automatically picks this parameter if it is not provided or is below one.
specify_samplenumtrees Sample this number of trees for each locus (-w) perl (defined $value) ? "-w $value":"" 1 26
specify_exactsoln Find the exact solution by looking at all clusters (-x) perl ($value) ? "-x ":"" 0 32 Please do not use the exact (-x) option when there are more than 18 taxa perl $specify_exactsoln
extra_biparts How many extra bipartitions should be added? (-p) perl (defined $value) ? "-p $value":"" 0 1 2 1 Quadratic distance-based method can be quite slow perl $extra_biparts == 2 38 0: adds nothing extra. 1 (default): adds to X but not excessively (greedy resolutions). 2: adds a potentially large number and therefore can be slow (quadratic distance-based).
provide_extratrees Provide extra trees (with gene labels) (-e) perl (defined $value) ? "-e extratrees.tre ":"" extratrees.tre 32 Provide extra trees (with gene labels) used to enrich the set of clusters searched
extra_speciestrees Provide extra trees (with species labels) (-f) perl (defined $value) ? "-f extraspeciestrees.tre":"" extraspeciestrees.tre 32 Provide extra trees (with species labels) used to enrich the set of clusters searched
remove_biparts Remove bipartitions of the provided extra trees (--remove-bipartition) perl ($value) ? "--remove-bipartition ":"" extraspeciestrees.tre 32 Removes bipartitions of the provided extra trees (with species labels)
specify_trimthreshold Trimming threshold (-d) perl (defined $value) ? "-d $value":"" 0 26 The trimming threshold is the user's estimate of the normalized score; the closer the user's estimate is, the faster ASTRAL runs.
(default: 0)
boot_onlygtr Perform bootstrapping but only with gene tree resampling. (--gene-only) perl !$use_bootstrapping perl ($value) ? "--gene-only":"" 14
results *
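
The resource tiers in this definition are spread across several conditional parameters (number_cores*, number_threads*, number_gpus*, and the runtime checks), so it can help to see them in one place. The following is a minimal Python sketch of that selection logic, not the code the portal actually runs. The names Tier, select_tier, executable, command_flags, max_runtime_hours and estimated_charge are hypothetical; the thresholds, thread counts, GPU lists, memory sizes, and charge multipliers are copied from the entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One resource tier, keyed by an exclusive upper bound on num_gtrees."""
    upper: float   # tier applies when num_gtrees < upper
    threads: int   # value passed to -T (threads_per_process)
    gpus: int      # number of GPUs requested (0 means CPU-only)
    mem: str       # mem= value written to scheduler.conf
    charge: int    # multiplier in the "N x $runtime cpu hours" messages


# Values transcribed from the number_cores*/number_threads*/number_gpus* entries.
TIERS = [
    Tier(1000, 8, 0, "15G", 8),
    Tier(4000, 16, 0, "30G", 16),
    Tier(8000, 10, 1, "92G", 21),
    Tier(16000, 20, 2, "184G", 42),
    Tier(float("inf"), 40, 4, "368G", 84),
]


def select_tier(num_gtrees: int) -> Tier:
    """Return the first tier whose upper bound exceeds num_gtrees."""
    for tier in TIERS:
        if num_gtrees < tier.upper:
            return tier
    raise ValueError("unreachable: the last tier has no upper bound")


def executable(tier: Tier) -> str:
    """The GPU wrapper is used from 4000 gene trees upward."""
    suffix = ".gpu" if tier.gpus else ""
    return "astral_5.15.4_expanse" + suffix


def command_flags(tier: Tier) -> list:
    """Build the input, thread, and GPU flags for the chosen tier."""
    flags = ["-i", "infile.tre", "-T", str(tier.threads)]
    if tier.gpus:
        # The definition lists GPUs as 1, then 1,2, then 1,2,3,4.
        gpu_ids = ",".join(str(i) for i in range(1, tier.gpus + 1))
        flags += ["-G", gpu_ids]
    return flags


def max_runtime_hours(num_gtrees: int) -> float:
    """CPU tiers allow up to 168 hours; GPU tiers up to 120 hours."""
    return 168.0 if num_gtrees < 4000 else 120.0


def estimated_charge(num_gtrees: int, runtime_hours: float) -> float:
    """Upper bound on hours charged if the job uses its whole time limit."""
    if runtime_hours < 0.05 or runtime_hours > max_runtime_hours(num_gtrees):
        raise ValueError("runtime outside the allowed range for this tier")
    return select_tier(num_gtrees).charge * runtime_hours
```

For example, under these assumptions a file with 5000 gene trees selects the single-GPU tier, so command_flags(select_tier(5000)) yields the flags -i infile.tre -T 10 -G 1, and a 2-hour limit is charged as at most 42 hours.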