APPLES on XSEDE2.0.11Accurate Phylogenetic Placement using LEast Squares (Distance based)Balaban, M., Sarmashghi, S., and Mirarab, S.Balaban, M., Sarmashghi, S., and Mirarab, S. (2019) APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Systematic Biology 69, 566-578 10.1093/sysbio/syz063Phylogeny / Alignmentapples_xsedeapples_1perl""0apples_2perl$specify_runtype ne 5perl""0apples_2bperl$specify_runtype eq 5perl""1conf_file2scheduler.confperl
"ChargeFactor=1.0\\n" .
"mem=48G\\n" .
"nodes=1\\n" .
"node_exclusive=0\\n" .
"threads_per_process=24\\n"
infileInputinfile.txt1specify_run1or4perl$specify_runtype == 1 || $specify_runtype == 4 perl"-q infile.txt"4specify_run2perl$specify_runtype == 2perl"-x infile.txt"4specify_run3perl$specify_runtype == 3perl"-d infile.txt"4specify_run5perl$specify_runtype == 5perl"-s infile.txt"4number_coresperl"-T 24"2results*runtime1scheduler.confMaximum Hours to Run (up to 168 hours)0.5The maximum hours to run must be less than 168perl$runtime > 168.0 && $num_gtrees < 4000The maximum hours to run must be greater than 0.05perl$runtime < 0.05The maximum hours to run must be less than or equal to 120perl$runtime > 120.0 && $num_gtrees > 3999perl"runhours=$value\\n"The job will run on 24 processors as configured. If it runs for the entire configured time, it will consume 24 x $runtime cpu hoursperl$runtime > 0 Estimate the maximum time your job will need to run. We recommend testing initially with a short run (0.5 h or less), because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
Once you are sure the configuration is correct, you can then increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
run sooner than jobs configured for the full 168 hours.
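
The runtime rules described in the help text can be sketched as a small function. This is only an illustration, not the gateway's code: the function name `plan_job` is hypothetical, and the sketch deliberately ignores the `$num_gtrees` conditions that appear in the validation expressions, keeping just the 0.05 h and 168 h bounds, the 0.5 h queue cutoff, and the 24-processor charge.

```python
CORES = 24  # matches "-T 24" and threads_per_process=24 in the configuration


def plan_job(runtime_hours):
    """Return (queue, max_cpu_hours) for a requested runtime in hours.

    Hypothetical helper: mirrors the limits stated in the help text only.
    """
    if runtime_hours < 0.05:
        raise ValueError("The maximum hours to run must be greater than 0.05")
    if runtime_hours > 168.0:
        raise ValueError("The maximum hours to run must be less than 168")
    # Jobs of 0.5 h or less go to the "debug" queue; longer jobs go to "normal".
    queue = "debug" if runtime_hours <= 0.5 else "normal"
    # A job that uses its full allocation consumes 24 x runtime CPU hours.
    return queue, CORES * runtime_hours


if __name__ == "__main__":
    print(plan_job(0.5))
    print(plan_job(2))
```

A short test run is therefore cheap as well as quick to start: the charge grows linearly with the requested hours only if the job actually runs that long.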
specify_runtypeWhat analysis do you want to run?123451Please select a run typeperl!defined $specify_runtypeTo query a reference dataset, your input must be a set of query sequences; the reference dataset and reference tree are then specified in the parameter interface.
To analyze an extended dataset, your input data must contain both the reference sequences and the query sequences.
To analyze a distance matrix, your input must be a distance matrix produced by another program, such as DEPP.
To query an existing APPLES database, your input should be a set of queries; then specify your APPLES DB here.
To create an APPLES database, your input must be a reference dataset, and you must specify the reference tree in the parameter interface.specify_reftreeSpecify your reference tree (-t)reference.nwkperl$specify_runtype == 1 || $specify_runtype == 3 || $specify_runtype == 5 perl($value) ? "-t reference.nwk":"" 8specify_refalignmentSpecify your reference alignment (-s)ref.faperl$specify_runtype == 1perl($value) ? "-s ref.fa":"" 8specify_applesdbSelect the APPLES databaseapples.dbperl$specify_runtype == 4perl($value) ? "-a apples.db":"" 8outfile_nameOutput file name perl(defined $value) ? "-o $value":"" 10Specify a filename for storing the output species tree.specify_proteinInput is Protein Sequences (-p)perl($value) ? "-p":"" 10specify_distthreshDistance threshold (-f)perl(defined $value) ? "-f $value":"" 26This parameter ignores distances higher than the given threshold. It improves accuracy when long distances have a high bias or variance.specify_nominblI already re-estimated the backbone branch lengths (-D)perl($value) ? "-D":"" 26 By default, APPLES runs FastTree prior to placement to re-estimate branch lengths using a distance-based algorithm. This is required for good results. If you already re-estimated the backbone branch lengths,
you can skip this step in APPLES using the option "-D" to speed up the run.specify_methodLeast squares method (-m)perl$specify_runtype ne 5OLSFMBMEBEperl"-m $value"28Name of the weighted least squares methodspecify_criteriaPlacement selection criterion (-c)perl$specify_runtype ne 5perl"-c $value"MLSEHYBRIDME28Name of the placement selection criterionallow_negblAllow negative branch lengths (-n)perl$specify_runtype ne 5perl($value) ? "-n":"" 26Relax positivity constraint on new branch lengthsspecify_minobsMinimum number of observations (-b)perl$specify_runtype ne 5perl(defined $value) ? "-b $value":"" 26Minimum number of observations kept for each query ignoring the filter threshold.specify_minnongapsMinimum fraction of nongap sites (-V)perl$specify_runtype ne 5perl(defined $value) ? "-V $value":"" 26Minimum fraction of nongap sites needed for a valid pairwise distance.use_internodedistMask low confidence characters (-X)perl$specify_runtype ne 5perl($value) ? "-X":"" 36exclude_internalnodeExclude queries placed on the internal nodes (--exclude)perl$specify_runtype ne 5perl($value) ? "--exclude":"" 36
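
To make the run-type logic above easier to follow, here is a minimal sketch of how the argument list might be assembled from the selected options. It is an illustration under stated assumptions, not the gateway's implementation: the function `build_args` and its parameter names are hypothetical, the trailing numbers after each Perl expression (2, 4, 8, 10, 28) are assumed to be ordering groups, only a subset of the options is covered, and the executable name is left out because the configuration shown here does not state it.

```python
# Input flag chosen by run type, as given by the specify_run* entries.
INPUT_FLAG = {1: "-q", 2: "-x", 3: "-d", 4: "-q", 5: "-s"}


def build_args(run_type, reftree=False, refalignment=False, applesdb=False,
               outfile=None, method=None, criterion=None):
    """Return the argument list for one run type (hypothetical helper)."""
    if run_type not in INPUT_FLAG:
        raise ValueError("Please select a run type")

    # Each entry is (assumed ordering group, tokens).
    parts = [
        (2, ["-T", "24"]),
        (4, [INPUT_FLAG[run_type], "infile.txt"]),
    ]
    if reftree and run_type in (1, 3, 5):
        parts.append((8, ["-t", "reference.nwk"]))
    if refalignment and run_type == 1:
        parts.append((8, ["-s", "ref.fa"]))
    if applesdb and run_type == 4:
        parts.append((8, ["-a", "apples.db"]))
    if outfile is not None:
        parts.append((10, ["-o", outfile]))
    # Placement options are only offered when not building a database (type 5).
    if run_type != 5:
        if method is not None:
            parts.append((28, ["-m", method]))
        if criterion is not None:
            parts.append((28, ["-c", criterion]))

    # sorted() is stable, so entries within a group keep their insertion order.
    ordered = sorted(parts, key=lambda part: part[0])
    return [token for _, tokens in ordered for token in tokens]


if __name__ == "__main__":
    print(" ".join(build_args(1, reftree=True, refalignment=True,
                              method="FM", criterion="MLSE")))
```

Keeping the conditions in one place shows which options are silently dropped for a given run type: for example, `-m` and `-c` never appear when creating an APPLES database.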