APPLES on XSEDE 2.0.11 Accurate Phylogenetic Placement using LEast Squares (Distance based) Balaban, M., Sarmashghi, S., and Mirarab, S. Balaban, M., Sarmashghi, S., and Mirarab, S. (2019) APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Systematic Biology 69, 566-578 10.1093/sysbio/syz063 Phylogeny / Alignment apples_xsede apples_1 perl "" 0 apples_2 perl $specify_runtype ne 5 perl "" 0 apples_2b perl $specify_runtype eq 5 perl "" 1 conf_file 2 scheduler.conf perl "ChargeFactor=1.0\\n" . "mem=48G\\n" . "nodes=1\\n" . "node_exclusive=0\\n" . "threads_per_process=24\\n" infile Input infile.txt 1 specify_run1or4 perl $specify_runtype == 1 || $specify_runtype == 4 perl "-q infile.txt" 4 specify_run2 perl $specify_runtype == 2 perl "-x infile.txt" 4 specify_run3 perl $specify_runtype == 3 perl "-d infile.txt" 4 specify_run5 perl $specify_runtype == 5 perl "-s infile.txt" 4 number_cores perl "-T 24" 2 results * runtime 1 scheduler.conf Maximum Hours to Run (up to 168 hours) 0.5 The maximum hours to run must be less than or equal to 168 perl $runtime > 168.0 && $num_gtrees < 4000 The maximum hours to run must be greater than or equal to 0.05 perl $runtime < 0.05 The maximum hours to run must be less than or equal to 120 perl $runtime > 120.0 && $num_gtrees > 3999 perl "runhours=$value\\n" The job will run on 24 processors as configured. If it runs for the entire configured time, it will consume 24 x $runtime CPU hours. perl $runtime > 0 Estimate the maximum time your job will need to run. We recommend starting with a test run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may start sooner than jobs configured for the full 168 hours. specify_runtype What analysis do you want to run? 
1 2 3 4 5 1 Please select a run type perl !defined $specify_runtype To query a reference dataset, your input must be a set of query sequences; specify the reference alignment and reference tree in the parameter interface. To analyze an extended dataset, your input must contain both the reference sequences and the query sequences. To analyze a distance matrix, your input must be a distance matrix produced by another program, such as DEPP. To query an existing APPLES database, your input must be a set of query sequences; select the APPLES database in the parameter interface. To create an APPLES database, your input must be a reference dataset, and you must specify the reference tree in the parameter interface. specify_reftree Specify your reference tree (-t) reference.nwk perl $specify_runtype == 1 || $specify_runtype == 3 || $specify_runtype == 5 perl ($value) ? "-t reference.nwk":"" 8 specify_refalignment Specify your reference alignment (-s) ref.fa perl $specify_runtype == 1 perl ($value) ? "-s ref.fa":"" 8 specify_applesdb Select the APPLES database apples.db perl $specify_runtype == 4 perl ($value) ? "-a apples.db":"" 8 outfile_name Output file name perl (defined $value) ? "-o $value":"" 10 Specify a filename for storing the APPLES output. specify_protein Input is protein sequences (-p) perl ($value) ? "-p":"" 10 specify_distthresh Distance threshold (-f) perl (defined $value) ? "-f $value":"" 26 Distances higher than the given threshold are ignored. This improves accuracy when long distances have high bias or variance. specify_nominbl I already re-estimated the backbone branch lengths (-D) perl ($value) ? "-D":"" 26 By default, APPLES runs FastTree before placement to re-estimate the backbone branch lengths using a distance-based algorithm. This is required for good results. 
If you have already re-estimated the backbone branch lengths, use the "-D" option to skip this step in APPLES and speed up the run. specify_method Least squares method (-m) perl $specify_runtype ne 5 OLS FM BME BE perl "-m $value" 28 Name of the weighted least squares method specify_criteria Placement selection criterion (-c) perl $specify_runtype ne 5 perl "-c $value" MLSE HYBRID ME 28 Name of the placement selection criterion allow_negbl Allow negative branch lengths (-n) perl $specify_runtype ne 5 perl ($value) ? "-n":"" 26 Relax the positivity constraint on new branch lengths. specify_minobs Minimum number of observations (-b) perl $specify_runtype ne 5 perl (defined $value) ? "-b $value":"" 26 Minimum number of observations kept for each query, ignoring the filter threshold. specify_minnongaps Minimum fraction of non-gap sites (-V) perl $specify_runtype ne 5 perl (defined $value) ? "-V $value":"" 26 Minimum fraction of non-gap sites needed for a valid pairwise distance. use_internodedist Mask low-confidence characters (-X) perl $specify_runtype ne 5 perl ($value) ? "-X":"" 36 exclude_internalnode Exclude queries placed on internal nodes (--exclude) perl $specify_runtype ne 5 perl ($value) ? "--exclude":"" 36
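The runtime limits, the scheduler.conf contents, and the CPU-hour estimate in this interface can be sketched in Perl, the language of its parameter expressions. This is a minimal, non-authoritative sketch: the subroutine names are hypothetical, the order of lines in scheduler.conf is assumed, and because `$num_gtrees` is not a parameter of this interface, the sketch treats it as 0 when absent (its numeric value when undefined in Perl). That leaves only the 0.05 h and 168 h limits in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the error messages the runtime checks would raise.
# Assumption: $num_gtrees is not defined by this interface, so it defaults to 0.
sub validate_runtime {
    my ($runtime, $num_gtrees) = @_;
    $num_gtrees //= 0;
    my @errors;
    push @errors, "The maximum hours to run must be less than or equal to 168"
        if $runtime > 168.0 && $num_gtrees < 4000;
    push @errors, "The maximum hours to run must be greater than or equal to 0.05"
        if $runtime < 0.05;
    push @errors, "The maximum hours to run must be less than or equal to 120"
        if $runtime > 120.0 && $num_gtrees > 3999;
    return @errors;    # in scalar context this is the number of errors
}

# Hypothetical helper: scheduler.conf text, fixed settings first (order assumed),
# followed by the runhours line generated from the runtime parameter.
sub scheduler_conf {
    my ($runtime) = @_;
    return "ChargeFactor=1.0\n" . "mem=48G\n" . "nodes=1\n"
         . "node_exclusive=0\n" . "threads_per_process=24\n"
         . "runhours=$runtime\n";
}

# Worst-case charge: 24 processors busy for the whole configured time.
sub max_cpu_hours {
    my ($runtime) = @_;
    return 24 * $runtime;
}
```

For example, a 0.5 h debug run passes every check and is charged at most 24 x 0.5 = 12 CPU hours, while a 200 h request triggers the 168 h message.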
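The way the run type and the optional parameters combine into APPLES arguments can be sketched as a Perl reconstruction of this interface's precondition and format expressions. It is hypothetical, not the CIPRES implementation: the subroutine name and the hash of parameter values are assumptions, the executable name is omitted because it does not appear here, arguments are ordered by ascending group number (2, 4, 8, 10, 26, 28, 36) with the order inside a group assumed, and unset `-m`/`-c` values are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the argument list from parameter values keyed by
# the parameter names used in this interface. File names are the ones the
# interface stages (infile.txt, reference.nwk, ref.fa, apples.db).
sub apples_args {
    my ($p) = @_;
    my $rt = $p->{specify_runtype};
    die "Please select a run type\n" unless defined $rt;

    # Group 4: how infile.txt is passed depends on the run type.
    my %input_flag = (1 => '-q', 2 => '-x', 3 => '-d', 4 => '-q', 5 => '-s');

    my @args = ('-T', 24);                                  # group 2
    push @args, $input_flag{$rt}, 'infile.txt';             # group 4

    # Group 8: reference tree, reference alignment, or APPLES database.
    push @args, '-t', 'reference.nwk'
        if $p->{specify_reftree} && ($rt == 1 || $rt == 3 || $rt == 5);
    push @args, '-s', 'ref.fa'    if $p->{specify_refalignment} && $rt == 1;
    push @args, '-a', 'apples.db' if $p->{specify_applesdb} && $rt == 4;

    # Group 10: output name and protein flag (all run types).
    push @args, '-o', $p->{outfile_name} if defined $p->{outfile_name};
    push @args, '-p' if $p->{specify_protein};

    # Group 26: options available for all run types.
    push @args, '-f', $p->{specify_distthresh} if defined $p->{specify_distthresh};
    push @args, '-D' if $p->{specify_nominbl};

    # Options whose precondition is $specify_runtype ne 5.
    if ($rt != 5) {
        push @args, '-n' if $p->{allow_negbl};                                      # 26
        push @args, '-b', $p->{specify_minobs} if defined $p->{specify_minobs};
        push @args, '-V', $p->{specify_minnongaps} if defined $p->{specify_minnongaps};
        push @args, '-m', $p->{specify_method} if defined $p->{specify_method};     # 28
        push @args, '-c', $p->{specify_criteria} if defined $p->{specify_criteria};
        push @args, '-X' if $p->{use_internodedist};                                # 36
        push @args, '--exclude' if $p->{exclude_internalnode};
    }
    return @args;
}
```

For a run type 1 job with a reference tree and reference alignment, the sketch yields `-T 24 -q infile.txt -t reference.nwk -s ref.fa`; for run type 5, the least squares options are dropped because of their `ne 5` precondition.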