Muscle Parallel on ACCESS5.0Create Multiple Alignments from Sequences or ProfilesRobert C. Edgar
Edgar, RC (2021), MUSCLE v5 enables improved estimates of phylogenetic tree confidence by ensemble bootstrapping, bioRxiv 2021.06.20.449169. https://doi.org/10.1101/2021.06.20.449169.
Phylogeny/Alignmentmuscle_parallel_xsedeinvoke_muscleperl"muscle_5.1_expanse"0cores_input2perl$select_inputtype eq "alignment" perl"-threads 32"99scheduler_input2bscheduler.confperl$select_inputtype eq "alignment" && !$more_memoryperl
"ChargeFactor=1.0\\n" .
"nodes=1\\n" .
"mem=64G\\n" .
"node_exclusive=0\\n" .
"cpus-per-task=32\\n"
scheduler_input3bscheduler.confperl$select_inputtype eq "alignment" && $more_memoryperl
"ChargeFactor=1.0\\n" .
"nodes=1\\n" .
"large_data=1\\n" .
"mem=500G\\n" .
"node_exclusive=0\\n" .
"cpus-per-task=32\\n"
scheduler_input3scheduler.confperl$select_inputtype ne "alignment"perl
"ChargeFactor=1.0\\n" .
"nodes=1\\n" .
"mem=2G\\n" .
"node_exclusive=0\\n" .
"cpus-per-task=1\\n"
infile_type1Sequences File (must be in fasta format) (-infile)infile.fastaALL_FILES*runtimescheduler.confMaximum Hours to Run (click here for help setting this correctly)1.0
Estimate the maximum time your job will need to run (up to 166 hours; high memory jobs are limited to 48 hours). Your job will be killed if it does not finish within the time you specify; however, jobs with shorter maximum run times are often scheduled sooner than longer jobs.
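This form defines run-time limits and charge messages for Maximum Hours to Run:

- at most 166 hours for standard jobs;
- more than 6 and at most 48 hours for high memory jobs;
- 32, 256 or 1 processors, depending on the task.

The short Python sketch below summarizes that arithmetic. It mirrors the conditions and processor counts stated in the interface messages. The function names are hypothetical and are not part of CIPRES or MUSCLE.

```python
# Sketch of the run-time checks and worst-case CPU-hour estimate used by this form.
# Limits and processor counts are copied from the interface messages;
# the function names are hypothetical, not part of CIPRES or MUSCLE.


def runtime_errors(runtime: float, more_memory: bool) -> list:
    """Return the messages the form would raise for a requested runtime in hours."""
    errors = []
    if not more_memory and runtime > 166:
        errors.append("Maximum Hours to Run must be between 0.1 and 166")
    if more_memory and runtime > 48:
        errors.append("High memory jobs may not request more than 48 hours")
    if more_memory and runtime <= 6:
        errors.append("High memory jobs must request more than 6 hours")
    return errors


def max_cpu_hours(task: str, more_memory: bool, runtime: float) -> float:
    """Worst-case charge if the job runs for the whole requested time."""
    if task == "alignment":
        processors = 256 if more_memory else 32
    else:
        processors = 1  # explode, resample, efastats, disperse, ... run on one processor
    return processors * runtime


print(max_cpu_hours("alignment", False, 2.0))  # 64.0
print(runtime_errors(5, more_memory=True))
```

You are only charged for the time the run actually uses, so `max_cpu_hours` is an upper bound, not the final charge.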
Maximum Hours to Run must be between 0.1 and 166perl$runtime > 166 && !$more_memoryFor high memory jobs, Maximum Hours to Run must not exceed 48perl$runtime > 48 && $more_memoryFor high memory jobs, the runhours request must be greater than 6, but you will only be charged for the time your run actually usesperl$runtime <= 6 && $more_memoryperl"runhours=$value\\n"The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume 32 X $runtime CPU hoursperl$select_inputtype eq "alignment" && !$more_memoryThe job will run on 256 processors as configured. If it runs for the entire configured time, it will consume 256 X $runtime CPU hoursperl$select_inputtype eq "alignment" && $more_memoryThe job will run on 1 processor as configured. If it runs for the entire configured time, it will consume 1 X $runtime CPU hoursperl$select_inputtype ne "alignment"select_inputtypeI want to:alignmentexploderesampleefastatsdispersemaxccletterconfcolumnconfalignmentPlease confirm the input is a set of sequences in FASTA formatperl$select_inputtype eq "alignment"Please confirm the input is an ensemble of alignmentsperl$select_inputtype ne "alignment"See the manual if you need help choosingselect_algorithmSelect the algorithm to useperl$select_inputtype eq "alignment"alignsuper5alignUse the align command if you just want a single high-quality MSA. If your sequence set is so large that the
align command takes too much time or memory, try the super5 command instead. Super5 does not directly support
generating complete ensembles, so the -stratified, -diversified and -replicates options are not supported.
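The interface restricts which alignment options can be combined:

- super5 rejects the ensemble options;
- -perturb and -perm cannot be combined with -stratified or -diversified;
- -perm all needs an @ in the output name.

The Python sketch below shows one way those options could map onto a MUSCLE argument list. `build_align_args` is hypothetical, the flag order is only illustrative, and the executable name `muscle` in the usage line is an assumption (this interface invokes a site-specific `muscle_5.1_expanse`). Only flags that appear in this interface are used.

```python
# Hypothetical helper: maps the form's alignment options onto MUSCLE flags.
# Flags and restrictions are those listed in this interface; ordering is illustrative.


def build_align_args(algorithm="align", output=None, perturb=None, perm=None,
                     consiters=None, refineiters=None, stratified=False,
                     diversified=False, replicates=None):
    """Return an argument list such as ['-align', 'infile.fasta', ...]."""
    if algorithm not in ("align", "super5"):
        raise ValueError("algorithm must be 'align' or 'super5'")
    ensemble = stratified or diversified
    if algorithm == "super5" and (ensemble or replicates is not None):
        raise ValueError("super5 does not support -stratified, -diversified or -replicates")
    if ensemble and (perturb is not None or perm is not None):
        raise ValueError("-perturb and -perm cannot be used with -stratified or -diversified")
    if perm == "all" and (output is None or "@" not in output):
        raise ValueError("-perm all needs an output name containing @")

    args = [f"-{algorithm}", "infile.fasta"]
    # Optional valued flags are only emitted when the user set them.
    for flag, value in (("-output", output), ("-perturb", perturb), ("-perm", perm),
                        ("-consiters", consiters), ("-refineiters", refineiters)):
        if value is not None:
            args += [flag, str(value)]
    if stratified:
        args.append("-stratified")
    if diversified:
        args.append("-diversified")
    if replicates is not None:
        args += ["-replicates", str(replicates)]
    args += ["-threads", "32"]  # alignment jobs are configured for 32 threads
    return args


print(" ".join(["muscle"] + build_align_args(output="ens.efa", diversified=True)))
# muscle -align infile.fasta -output ens.efa -diversified -threads 32
```

Raising an error early mirrors how the form refuses to submit incompatible combinations rather than letting MUSCLE fail at run time.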
input_string1perl$select_inputtype eq "alignment"perl"-$select_algorithm infile.fasta"4specify_outputSpecify an output nameperl$select_inputtype ne "explode"perldefined $specify_output ? "-output $specify_output":""Please specify a name for your output fileperl$select_inputtype eq "alignment" && !defined $specify_output15Consider using a descriptive name so you can easily tell ensembles from individual FASTA alignments. If the output filename contains @, then one FASTA file is generated for each replicate, with @ replaced by the replicate name, e.g. resample.43,
otherwise all replicates are written to one EFA file. Typically, you will want to make one tree from each MSA in the resampled ensemble. To get separate FASTA files suitable for input to tree estimation software, you can use the efa_explode command. specify_prefixSpecify a prefix perl$select_inputtype eq "explode"perl(defined $value) ? "-prefix $specify_prefix":""Please specify a prefixperl!defined $specify_prefix15The -prefix and -suffix options can be used to specify a fixed prefix or suffix added to the label to make the output filenames. specify_suffixSpecify a suffix perl$select_inputtype eq "explode"perl(defined $value) ? "-suffix $specify_suffix":""Please specify a suffixperl!defined $specify_suffix15The -prefix and -suffix options can be used to specify a fixed prefix or suffix added to the label to make the output filenames. more_memoryI need more memoryperl$select_inputtype eq "alignment"alignsuper5_optionsAlign/Super5 Optionsspecify_seedRandom number seed for generating HMM perturbations (-perturb)perl$select_inputtype eq "alignment"perl(defined $value) ? "-perturb $value":""10Integer random number seed for generating HMM perturbations. Default SEED=0,
which uses default HMM parameters.specify_permSpecify the guide tree permutation (-perm)perl$select_inputtype eq "alignment"abcacbbcaallperl(defined $value) ? "-perm $value":""10The output name must contain @ if -perm all is chosenperl$specify_perm eq "all"-perm PERM specifies the guide tree permutation. PERM can be none, abc, acb, bca or all. Default is none.specify_consitersSpecify the number of consistency iterations (-consiters)perl$select_inputtype eq "alignment"perl(defined $value) ? "-consiters $value":""210-consiters N Number of consistency iterations. Default 2specify_refineitersSpecify the number of refinement iterations (-refineiters)perl$select_inputtype eq "alignment"perl(defined $value) ? "-refineiters $value":""10010-refineiters N Number of refinement iterations. Default 100specify_datatypeSelect the data typeperl$select_inputtype eq "alignment"1231""2"-nt"3"-amino"10alignOnlyAlign Optionsmake_stratifiedMake a stratified ensemble (-stratified)perl$select_inputtype eq "alignment" && $select_algorithm eq "align"perl($value) ? "-stratified":""05Sorry, you can't use the -perturb option with -stratifiedperl$make_stratified && defined $specify_seedSorry, you can't use the -perm option with -stratifiedperl$make_stratified && defined $specify_permThe -stratified option of align creates an ensemble FASTA (EFA) file containing 16 alignments.make_diversifiedMake a diversified ensemble (-diversified)perl$select_inputtype eq "alignment" && $select_algorithm eq "align"perl($value) ? "-diversified":""05Sorry, you can't use the -perturb option with -diversifiedperl$make_diversified && defined $specify_seedSorry, you can't use the -perm option with -diversifiedperl$make_diversified && defined $specify_permThe diversified ensemble uses a different perturbation random number seed for every replicate, with the goal of maximizing variation. specify_alignreplicatesSpecify the number of replicates (-replicates)perl$select_inputtype eq "alignment" && $select_algorithm eq "align"30perl(defined $value) ? 
"-replicates $specify_alignreplicates":""Number of replicates (N); default 4 for -stratified and 100 for -diversified. With -stratified, N replicates are generated for each of the four guide tree permutations, so the total number of alignments is 4 × N. measure_dispersionMeasure dispersion of the ensembleperl$select_inputtype eq "disperse"30perl"-disperse infile.fasta -log disperse.log"If the dispersion is zero, then all the MSAs in the ensemble are the same and your alignment is robust.
Quite likely, it has no errors. If you see a large dispersion, say bigger than 0.05, then there is significant
variation between the alignments, which necessarily indicates alignment errors. You should then start
to consider whether these differences affect your downstream analysis, e.g. making trees.
You can check this by doing the analysis separately for each replicate. explode_ensembleExplode the ensembleperl$select_inputtype eq "explode"perl"-efa_explode infile.fasta"3This command splits an ensemble into individual alignments.resample_ensembleResample the ensembleperl$select_inputtype eq "resample"perl"-resample infile.fasta"A resampled ensemble is a set of new MSAs generated from an existing ensemble (call it E) by selecting
columns at random from MSAs in E, with replacement. "With replacement" means that selecting a column does not delete it,
so a given column can be selected any
number of times. If the input ensemble has just one MSA, or all MSAs are identical, this is exactly equivalent
to the Felsenstein bootstrap which is widely supported by phylogenetic tree software. You don't usually see
the ensemble used to calculate Felsenstein bootstrap values because the replicate alignments are
calculated internally by phylogenetic tree software and then saved to obscure files or discarded.
The resample command creates a resampled ensemble from an existing ensemble (usually a diversified ensemble).
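The resampling idea described above can be illustrated with a minimal Python sketch. It pools every column of every MSA in E and draws new columns from that pool with replacement. This is an illustration of the concept, not MUSCLE's implementation. It assumes that:

- each MSA is a dict mapping labels to aligned sequences;
- all MSAs share the same labels;
- each replicate is as long as the first MSA in E.

The sketch also omits the -minconf and -gapfract column filters offered by the interface.

```python
import random


def resample_ensemble(ensemble, n_replicates, seed=0):
    """Sketch of column resampling with replacement from an ensemble E.

    ensemble: list of MSAs, each a dict {label: aligned_sequence} with the same labels.
    Returns n_replicates new MSAs, each as long as the first MSA (an assumption).
    """
    rng = random.Random(seed)
    labels = list(ensemble[0])
    # Pool every column of every MSA in E as (msa_index, column_index).
    pool = [(m, c) for m, msa in enumerate(ensemble)
            for c in range(len(msa[labels[0]]))]
    length = len(ensemble[0][labels[0]])
    replicates = []
    for _ in range(n_replicates):
        # Sampling with replacement: the same column may be drawn many times.
        picks = [rng.choice(pool) for _ in range(length)]
        replicates.append({lab: "".join(ensemble[m][lab][c] for m, c in picks)
                           for lab in labels})
    return replicates


# With a single MSA this reduces to the classic Felsenstein bootstrap.
msa = {"s1": "AC-GT", "s2": "A-TGT", "s3": "ACTG-"}
for rep in resample_ensemble([msa], n_replicates=2, seed=1):
    print(rep)
```

Every column of a replicate is an intact column of some MSA in E, so the letters within a column stay aligned exactly as they were in the source alignment.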
replicate_optionsReplicatesspecify_replicatesSpecify the number of resample replicates (-replicates)perl$select_inputtype eq "resample"perl(defined $value) ? "-replicates $value":""The replicates value must be between 0 and 100perl$specify_replicates > 100 || $specify_replicates < 010If the output filename has a @, then one FASTA file is generated for each replicate where @ is replaced by the replicate name, e.g. resample.43,
otherwise all replicates are written to one EFA file. Typically, you will want to make one tree from each MSA in the resampled ensemble.
To get separate FASTA files suitable for input to tree estimation software, you can use the efa_explode command. specify_minconfMinimum column confidence (-minconf)perl$select_inputtype eq "resample"perl(defined $value) ? "-minconf $value":""The minconf value must be between 0 and 1perl$specify_minconf > 1 || $specify_minconf < 013-minconf CC Minimum column confidence, value in range zero to one, default 0.5.specify_gapfracMaximum fraction of gapped positions in a column (-gapfract)perl$select_inputtype eq "resample"perl(defined $value) ? "-gapfract $value":""The -gapfract value must be between 0 and 1perl$specify_gapfrac > 1 || $specify_gapfrac < 015-gapfract F Maximum fraction of gapped positions in a column, value in range zero to one, default 0.5.confidence_optionsConfidence Calculationsreport_statsReport information about MSAs stored in an ensembleperl$select_inputtype eq "efastats"perl"-efastats infile.fasta -log efastats.log"The reversified ensemble).
add_confseqsAdd digits to each replicate in an ensemble (-addconfseqs)perl$select_inputtype eq "columnconf"perl"-addconfseqs"15-addconfseqs adds two sequences of digits to each MSA replicate in an ensemble. The digits specify
column confidence values to two decimal places. The sequence labels are _conf_ and _conf_2. In a given column,
_conf_ is the first decimal place and _conf_2 is the second decimal place, so e.g. 73 means 0.73. A confidence
of 1.0 is indicated as ++.find_maxccExtract the MSA with the highest column confidence from an ensemble (-maxcc)perl$select_inputtype eq "maxcc"perl"-maxcc infile.fasta -output maxcc.afa"3-maxcc extracts the MSA with the highest column confidence from an ensemble.add_letterconfCalculate letter confidence values from an ensemble (-letterconf)perl$select_inputtype eq "letterconf"perl"-letterconf infile.fasta -ref aln.afa -output letterconf.afa"10The letterconf command calculates letter confidence values. Output is in FASTA format where each letter is replaced by a letter confidence value 0..9.
See also the addconfseqs command, which generates column confidence values.ref_uploadSpecify a reference alignmentperl$select_inputtype eq "letterconf"aln.afa15Please specify a reference alignment to calculate letter confidenceperl $select_inputtype eq "letterconf" && !defined $ref_uploadThe reference alignment (-ref) is the MSA whose letters are assigned confidence values.use_htmlMake an HTML output file (-html)perl$select_inputtype eq "letterconf"perl($value) ? "-html letterconf.html":""80The -html option specifies an HTML output file where the alignment is colored by confidence.use_jalviewCreate a Jalview features file (-jalview)perl$select_inputtype eq "letterconf"perl($value) ? "-jalview letterconf.jalview.features":""85The -jalview option specifies a features file for Jalview. See Jalview features for more information.
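The two-row digit encoding described for -addconfseqs works as follows:

- _conf_ holds the first decimal place;
- _conf_2 holds the second decimal place;
- a confidence of 1.0 is shown as ++.

The Python sketch below encodes and decodes this scheme. The rounding rule is an assumption, not taken from MUSCLE: values below 1.0 are rounded to the nearest hundredth and capped at 0.99, so ++ is reserved for exactly 1.0. The function names are hypothetical.

```python
def conf_digit_rows(confidences):
    """Encode per-column confidences (0..1) as the two digit rows _conf_ and _conf_2.

    Rounding rule is an assumption: values below 1.0 are rounded to the nearest
    hundredth and capped at 0.99, so '++' is reserved for exactly 1.0.
    """
    conf1, conf2 = [], []
    for cc in confidences:
        if not 0.0 <= cc <= 1.0:
            raise ValueError(f"confidence out of range: {cc}")
        if cc == 1.0:
            conf1.append("+")
            conf2.append("+")
            continue
        tens, units = divmod(min(round(cc * 100), 99), 10)
        conf1.append(str(tens))   # first decimal place
        conf2.append(str(units))  # second decimal place
    return "".join(conf1), "".join(conf2)


def decode_column(d1, d2):
    """Read one column back, e.g. ('7', '3') -> 0.73 and ('+', '+') -> 1.0."""
    return 1.0 if d1 == "+" else int(d1 + d2) / 100


print(conf_digit_rows([0.73, 1.0, 0.05, 0.29]))  # ('7+02', '3+59')
```

Rounding with `round(cc * 100)` rather than truncating with `int()` avoids floating-point surprises such as `0.29 * 100` evaluating to slightly less than 29.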