Muscle Parallel on ACCESS 5.0 Create Multiple Alignments from Sequences or Profiles Robert C. Edgar Edgar, RC (2021), MUSCLE v5 enables improved estimates of phylogenetic tree confidence by ensemble bootstrapping, bioRxiv 2021.06.20.449169. https://doi.org/10.1101/2021.06.20.449169. Phylogeny/Alignment muscle_parallel_xsede invoke_muscle perl "muscle_5.1_expanse" 0 cores_input2 perl $select_inputtype eq "alignment" perl "-threads 32" 99 scheduler_input2b scheduler.conf perl $select_inputtype eq "alignment" && !$more_memory perl "ChargeFactor=1.0\\n" . "nodes=1\\n" . "mem=64G\\n" . "node_exclusive=0\\n" . "cpus-per-task=32\\n" scheduler_input3b scheduler.conf perl $select_inputtype eq "alignment" && $more_memory perl "ChargeFactor=1.0\\n" . "nodes=1\\n" . "large_data=1\\n" . "mem=500G\\n" . "node_exclusive=0\\n" . "cpus-per-task=32\\n" scheduler_input3 scheduler.conf perl $select_inputtype ne "alignment" perl "ChargeFactor=1.0\\n" . "nodes=1\\n" . "mem=2G\\n" . "node_exclusive=0\\n" . "cpus-per-task=1\\n" infile_type1 Sequences File (must be in FASTA format) (-infile) infile.fasta ALL_FILES * runtime scheduler.conf Maximum Hours to Run (click here for help setting this correctly) 1.0 Estimate the maximum time your job will need to run (up to 166 hours, or up to 48 hours for large-memory runs). Your job will be killed if it doesn't finish within the time you specify; however, jobs with shorter maximum run times are often scheduled sooner than longer jobs. Maximum Hours to Run must be between 0.1 and 166 perl $runtime > 166 && !$more_memory Maximum Hours to Run must be between 7 and 48 perl $runtime > 48 && $more_memory Large memory runs must request at least 7 hours, but don't worry: you won't be charged for time you don't use perl $runtime < 7 && $more_memory perl "runhours=$value\\n" The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume 32 X $runtime CPU hours perl $select_inputtype eq "alignment" && !$more_memory The job will run on 256 processors as configured.
If it runs for the entire configured time, it will consume 256 X $runtime CPU hours perl $select_inputtype eq "alignment" && $more_memory The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume 1 X $runtime CPU hours perl $select_inputtype ne "alignment" select_inputtype I want to: alignment explode resample efastats disperse maxcc letterconf columnconf alignment Please confirm the input is a set of sequences in FASTA format perl $select_inputtype eq "alignment" Please confirm the input is an ensemble of alignments perl $select_inputtype ne "alignment" See the manual if you need help choosing select_algorithm Select the algorithm to use perl $select_inputtype eq "alignment" align super5 align Use the align command if you just want one high-quality MSA. If your sequence set is so large that the align command takes too much time or memory, try the super5 command instead. Super5 does not directly support generating complete ensembles, so the -stratified, -diversified and -replicates options are not supported. input_string1 perl $select_inputtype eq "alignment" perl "-$select_algorithm infile.fasta" 4 specify_output Specify an output name perl $select_inputtype ne "explode" perl defined $specify_output ? "-output $specify_output":"" Please specify a name for your output file perl $select_inputtype eq "alignment" && !defined $specify_output 15 Consider using a descriptive name so you can easily tell ensembles apart from individual FASTA alignments. If the output filename contains @, one FASTA file is generated for each replicate, with @ replaced by the replicate name (e.g. resample.43); otherwise, all replicates are written to one EFA file. Typically, you will want to make one tree from each MSA in the resampled ensemble. To get separate FASTA files suitable for input to tree estimation software, you can use the efa_explode command. specify_prefix Specify a prefix perl $select_inputtype eq "explode" perl (defined $value) ? "-prefix $specify_prefix":"" Please specify a prefix perl !defined $specify_prefix 15 The -prefix and -suffix options specify a fixed prefix or suffix that is added to each replicate label to form the output filenames. specify_suffix Specify a suffix perl $select_inputtype eq "explode" perl (defined $value) ? "-suffix $specify_suffix":"" Please specify a suffix perl !defined $specify_suffix 15 The -prefix and -suffix options specify a fixed prefix or suffix that is added to each replicate label to form the output filenames. more_memory I need more memory perl $select_inputtype eq "alignment" alignsuper5_options Align/Super5 Options specify_seed Random number seed for generating HMM perturbations (-perturb) perl $select_inputtype eq "alignment" perl (defined $value) ? "-perturb $value":"" 10 Integer random number seed for generating HMM perturbations. Default SEED=0, which uses the default HMM parameters. specify_perm Specify the guide tree permutation (-perm) perl $select_inputtype eq "alignment" abc acb bca all perl (defined $value) ? "-perm $value":"" 10 The output name must contain @ if -perm all is chosen perl $specify_perm eq "all" -perm PERM Specifies the guide tree permutation. PERM can be none, abc, acb, bca, or all; the default is none. specify_consiters Specify the number of consistency iterations (-consiters) perl $select_inputtype eq "alignment" perl (defined $value) ? "-consiters $value":"" 2 10 Number of consistency iterations. specify_refineiters Specify the number of refinement iterations (-refineiters) perl $select_inputtype eq "alignment" perl (defined $value) ? "-refineiters $value":"" 100 10 -refineiters N Number of refinement iterations.
Default 100. specify_datatype Select the data type perl $select_inputtype eq "alignment" 1 2 3 1 "" 2 "-nt" 3 "-amino" 10 alignOnly Align Options make_stratified Make a stratified ensemble (-stratified) perl $select_inputtype eq "alignment" && $select_algorithm eq "align" perl ($value) ? "-stratified":"" 0 5 Sorry, you can't use the -perturb option with -stratified perl $make_stratified && defined $specify_seed Sorry, you can't use the -perm option with -stratified perl $make_stratified && defined $specify_perm The -stratified option of align creates an ensemble FASTA (EFA) file containing 16 alignments with the default settings (N = 4 replicates for each of the 4 guide tree permutations). make_diversified Make a diversified ensemble (-diversified) perl $select_inputtype eq "alignment" && $select_algorithm eq "align" perl ($value) ? "-diversified":"" 0 5 Sorry, you can't use the -perturb option with -diversified perl $make_diversified && defined $specify_seed Sorry, you can't use the -perm option with -diversified perl $make_diversified && defined $specify_perm The diversified ensemble uses a different perturbation random number seed for every replicate, with the goal of maximizing variation. specify_alignreplicates Specify the number of replicates (-replicates) perl $select_inputtype eq "alignment" && $select_algorithm eq "align" 30 perl (defined $value) ? "-replicates $specify_alignreplicates":"" Number of replicates; default 4 for -stratified and 100 for -diversified. With -stratified, one replicate is generated for each guide tree permutation, so the total number of replicates is 4 × N. measure_dispersion Measure dispersion of the ensemble perl $select_inputtype eq "disperse" 30 perl "-disperse infile.fasta -log disperse.log" If the dispersion is zero, then all the MSAs in the ensemble are the same and your alignment is robust; quite likely, it has no errors. If you see a large dispersion, say greater than 0.05, then there is significant variation between the alignments, and this is necessarily explained by alignment errors.
You should then start to consider whether these differences affect your downstream analysis, e.g. making trees. You can check this by doing the analysis separately for each replicate. explode_ensemble Explode the ensemble perl $select_inputtype eq "explode" perl "-efa_explode infile.fasta" 3 This command breaks an ensemble into individual alignments. resample_ensemble Resample the ensemble perl $select_inputtype eq "resample" perl "-resample infile.fasta" A resampled ensemble is a set of new MSAs generated from an existing ensemble (call it E) by selecting columns at random from MSAs in E, with replacement. "With replacement" means that selecting a column does not remove it, so a given column can be selected any number of times. If the input ensemble has just one MSA, or all MSAs are identical, this is exactly equivalent to the Felsenstein bootstrap, which is widely supported by phylogenetic tree software. You don't usually see the ensemble used to calculate Felsenstein bootstrap values because the replicate alignments are generated internally by phylogenetic tree software and then saved to obscure files or discarded. The resample command creates a resampled ensemble from an existing ensemble (usually a diversified ensemble). replicate_options Replicates specify_replicates Specify the number of resample replicates (-replicates) perl $select_inputtype eq "resample" perl (defined $value) ? "-replicates $value":"" The replicates value must be between 0 and 100 perl $specify_replicates > 100 || $specify_replicates < 0 10 If the output filename contains @, one FASTA file is generated for each replicate, with @ replaced by the replicate name (e.g. resample.43); otherwise, all replicates are written to one EFA file. Typically, you will want to make one tree from each MSA in the resampled ensemble.
To get separate FASTA files suitable for input to tree estimation software, you can use the efa_explode command. specify_minconf Minimum column confidence (-minconf) perl $select_inputtype eq "resample" perl (defined $value) ? "-minconf $value":"" The -minconf value must be between 0 and 1 perl $specify_minconf > 1 || $specify_minconf < 0 13 -minconf CC Minimum column confidence, value in the range zero to one, default 0.5. specify_gapfrac Maximum fraction of gapped positions in a column (-gapfract) perl $select_inputtype eq "resample" perl (defined $value) ? "-gapfract $value":"" The -gapfract value must be between 0 and 1 perl $specify_gapfrac > 1 || $specify_gapfrac < 0 15 -gapfract F Maximum fraction of gapped positions in a column, value in the range zero to one, default 0.5. confidence_options Confidence Calculations report_stats Report information about the MSAs stored in an ensemble perl $select_inputtype eq "efastats" perl "-efastats infile.fasta -log efastats.log" The reversified ensemble). add_confseqs Add confidence digits to each replicate in an ensemble (-addconfseqs) perl $select_inputtype eq "columnconf" perl "-addconfseqs infile.fasta" 15 -addconfseqs adds two sequences of digits to each MSA replicate in an ensemble. The digits specify column confidence values to two decimal places. The sequence labels are _conf_ and _conf_2. In a given column, _conf_ gives the first decimal place and _conf_2 gives the second decimal place, so e.g. 73 means 0.73. A confidence of 1.0 is indicated as ++. find_maxcc Extract the MSA with the highest column confidence from an ensemble (-maxcc) perl $select_inputtype eq "maxcc" perl "-maxcc infile.fasta -output maxcc.afa" 3 -maxcc extracts the MSA with the highest column confidence from an ensemble. add_letterconf Calculate letter confidence values for an ensemble (-letterconf) perl $select_inputtype eq "letterconf" perl "-letterconf infile.fasta -ref aln.afa -output letterconf.afa" 10 The letterconf command calculates letter confidence values.
Output is in FASTA format, where each letter is replaced by a letter confidence value from 0 to 9. See also the addconfseqs command, which generates column confidence values. ref_upload Specify a reference alignment perl $select_inputtype eq "letterconf" aln.afa 15 Please specify a reference alignment for which to calculate letter confidence perl $select_inputtype eq "letterconf" && !defined $ref_upload The -ref option specifies the reference alignment whose letters are assigned confidence values. use_html Make an HTML output file (-html) perl $select_inputtype eq "letterconf" perl ($value) ? "-html letterconf.html":"" 80 The -html option specifies an HTML output file where the alignment is colored by confidence. use_jalview Create a Jalview features file (-jalview) perl $select_inputtype eq "letterconf" perl ($value) ? "-jalview letterconf.jalview.features":"" 85 The -jalview option specifies a features file for Jalview. See the Jalview documentation on features files for more information.
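The resample command builds new MSAs by drawing columns at random, with replacement, from the MSAs of an existing ensemble. The Python sketch below shows only that idea and is not MUSCLE's implementation. The names `MSA`, `msa_columns` and `resample_ensemble` are hypothetical. MSAs are modelled as dicts that map each sequence label to its gapped row, and every MSA must share the same label set. Three further choices are assumptions: the columns of all MSAs are pooled and drawn uniformly, each replicate keeps the width of the first input MSA, and the -minconf and -gapfract column filters are left out.

```python
import random
from typing import Dict, List, Optional

# Hypothetical representation: sequence label -> gapped, aligned row.
MSA = Dict[str, str]


def msa_columns(msa: MSA, labels: List[str]) -> List[str]:
    """Return the MSA's columns; column i holds row[i] for each label, in order."""
    width = len(msa[labels[0]])
    return ["".join(msa[label][i] for label in labels) for i in range(width)]


def resample_ensemble(ensemble: List[MSA], replicates: int,
                      seed: int = 0, width: Optional[int] = None) -> List[MSA]:
    """Build new MSAs by drawing columns at random, with replacement,
    from the pooled columns of every MSA in the ensemble (assumption: pooling)."""
    rng = random.Random(seed)
    labels = sorted(ensemble[0])
    pool = [col for msa in ensemble for col in msa_columns(msa, labels)]
    if width is None:
        # Assumption: each replicate keeps the width of the first input MSA.
        width = len(ensemble[0][labels[0]])
    result = []
    for _ in range(replicates):
        # random.choices samples with replacement, so one column can be
        # drawn any number of times, or not at all.
        cols = rng.choices(pool, k=width)
        result.append({label: "".join(col[row] for col in cols)
                       for row, label in enumerate(labels)})
    return result


if __name__ == "__main__":
    msa = {"seqA": "AC-GT", "seqB": "ACTGT", "seqC": "A--GA"}
    for i, rep in enumerate(resample_ensemble([msa], replicates=2, seed=43)):
        print(i, rep)
```

With a single input MSA, the sketch draws `width` columns with replacement from that MSA's own columns, which is the classic Felsenstein bootstrap. This matches the equivalence stated in the resample help text.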
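According to the -addconfseqs help text, two extra digit rows encode each column's confidence to two decimal places, and ++ means 1.0. The Python sketch below decodes those rows back into numbers. It is illustrative only. `column_confidence` is a hypothetical helper, and the row labels default to the _conf_ and _conf_2 names given in the help text. The sketch assumes every column carries either two digits or ++, and it raises ValueError for anything else.

```python
from typing import Dict, List

DIGITS = set("0123456789")


def column_confidence(msa: Dict[str, str],
                      first: str = "_conf_",
                      second: str = "_conf_2") -> List[float]:
    """Decode the two digit rows added by -addconfseqs into per-column values."""
    tenths, hundredths = msa[first], msa[second]
    if len(tenths) != len(hundredths):
        raise ValueError("confidence rows must have the same length")
    values = []
    for d1, d2 in zip(tenths, hundredths):
        if d1 == "+" and d2 == "+":
            values.append(1.0)  # "++" encodes a confidence of 1.0
        elif d1 in DIGITS and d2 in DIGITS:
            values.append(int(d1 + d2) / 100)  # e.g. "7" over "3" -> 0.73
        else:
            raise ValueError(f"unexpected confidence characters {d1!r}{d2!r}")
    return values


if __name__ == "__main__":
    replicate = {
        "seqA": "AC-G",
        "_conf_": "7+09",
        "_conf_2": "3+59",
    }
    print(column_confidence(replicate))  # prints [0.73, 1.0, 0.05, 0.99]
```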