MAFFT on XSEDE

MAFFT on XSEDE 7.402
Multiple alignment program for amino acid or nucleotide sequences; parallel version
Kazutaka Katoh, Kei-ichi Kuma, Hiroyuki Toh, and Takashi Miyata
Katoh K, Toh H. (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9(4):286-98
Katoh K, Toh H. (2010) Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 26(15):1899-1900
Nucleic Acid Sequence, Protein Sequence, Phylogeny / Alignment
http://align.bmr.kyushu-u.ac.jp/mafft/software/
mafft_xsede &runtime_tg; mafft7402 perl $which_mafft eq "7402" perl "mafft_7402_comet" 0 mafft7394 perl $which_mafft eq "7394" perl "mafft_7394_comet" 0 mafft7305 perl $which_mafft eq "7305" perl "mafft_7305_comet" 0 mafft7187 perl $which_mafft eq "7187" perl "mafft68" 0 mafft_tg_scheduler scheduler.conf perl !$more_memory perl


	
				 "threads_per_process=8\\n" .
				 "node_exclusive=0\\n" .
				 "nodes=1\\n"

1 mafft_tg_scheduler_big scheduler.conf perl $more_memory perl


	
				 "threads_per_process=12\\n" .
				 "node_exclusive=0\\n" .
				 "nodes=1\\n"

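The two scheduler.conf fragments above differ only in the thread count (8 for the regular configuration, 12 when more memory is requested). A minimal Python sketch of that selection follows; the function name `scheduler_conf` is hypothetical and is not part of the interface definition.

```python
def scheduler_conf(more_memory: bool) -> str:
    """Return scheduler.conf text for the regular or big-memory configuration.

    Hypothetical helper: it only reproduces the three key=value lines shown
    in this section, switching the thread count on more_memory.
    """
    threads = 12 if more_memory else 8
    return (
        f"threads_per_process={threads}\n"
        "node_exclusive=0\n"
        "nodes=1\n"
    )


if __name__ == "__main__":
    print(scheduler_conf(more_memory=False), end="")
```

Keeping the thread count in one place makes it easier to keep scheduler.conf consistent with the thread count passed to MAFFT itself.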
1 mafft_regular_memory perl !$more_memory perl "--thread 8" 1 mafft_big_memory perl $more_memory perl "--thread 12" 1 infile Sequences File (FASTA format) input.fasta perl "input.fasta" 90 all_outputfiles * outputfile output.mafft perl " > output.mafft" 99 tree_outputfile *.tree perl defined $outputGuideTree && $outputGuideTree eq "1" overall Set the Basic Run Parameters which_mafft Which version of MAFFT are you using? 7402 V 7.402 7394 V 7.394 7305 V 7.305 7187 V 7.187 7402 There is a change in reverse complement notation for MAFFT 7.305 (see comments) perl $which_mafft > 7187 Please select a MAFFT version perl !defined $which_mafft There is a change in reverse complement notation for MAFFT > 7.305. In the previous MAFFT 7.187, reverse-complemented strains are marked like this _R__os_[0-9]+_oe_, where [0-9]+ is part of regular expression, but in MAFFT 7.305, reverse complemented strains are marked just _R_ datatype Input type 0 Determine Automatically dna Nucleotide protein Amino Acid 0 0 "" dna " --nuc" protein "--amino" 29 The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume 8 x $runtime cpu hours perl ($datatype == 0 || $datatype eq dna || $datatype eq protein) && !$more_memory The job will run on 12 processors as configured. If it runs for the entire configured time, it will consume 12 x $runtime cpu hours perl ($datatype == 0 || $datatype eq dna || $datatype eq protein) && $more_memory more_memory I need 60 GB of memory, a previous run ran out of memory 0 auto_analysis Automatically select an appropriate strategy from L-INS-i, FFT-NS-i and FFT-NS-2 (based on data size) (--auto) perl !$configure_analysis perl ($value) ? 
" $outputOrder --ep 0.0 --auto":"" 1 2 analysis_type Use a preconfigured MAFFT strategy that: perl !$configure_analysis && !$auto_analysis accurate Favors accuracy fast Favors speed rna Considers RNA structure accurate Please choose an analysis type, or opt to create an analysis from scratch perl

!$configure_analysis && !$auto_analysis && $analysis_type ne "accurate" && $analysis_type ne "rna" && $analysis_type ne "fast"

configure_analysis I want to configure my own analysis from scratch: (--mafft) perl !$auto_analysis 0 2 accurate_executable Choose a MAFFT accurate executable perl $analysis_type eq "accurate" && !$auto_analysis && !$configure_analysis linsi68 L-INS-i ginsi68 G-INS-i einsi68 E-INS-i ftnsi FFT-NS-i linsi68 "--localpair --maxiterate 1000 $outputOrder --ep 0.0 --retree 1" ginsi68 "--globalpair --maxiterate 1000 $outputOrder --ep 0.0 --retree 1" einsi68 "--ep 0.0 --genafpair --maxiterate 1000 $outputOrder --retree 1" ftnsi "$outputOrder --ep 0.0 --maxiterate 1000" linsi68 Please choose an accurate MAFFT executable perl $analysis_type eq "accurate" && !defined $accurate_executable 2 Accuracy-oriented methods:
* L-INS-i (probably most accurate; recommended for < 200 sequences; iterative refinement method incorporating local pairwise alignment information); equivalent to: mafft --localpair --maxiterate 1000 $outputOrder --ep 0.0 --retree 1 input [> output]
* G-INS-i (suitable for sequences of similar lengths; recommended for < 200 sequences; iterative refinement method incorporating global pairwise alignment information); equivalent to: mafft --globalpair --maxiterate 1000 $outputOrder --ep 0.0 --retree 1 input [> output]
* E-INS-i (suitable for sequences containing large unalignable regions; recommended for < 200 sequences); equivalent to: mafft --ep 0.0 --genafpair --maxiterate 1000 $outputOrder --retree 1 input [> output] For E-INS-i, the --ep 0 option is recommended to allow large gaps.
* FFT-NS-i (slow; iterative refinement method); equivalent to: mafft $outputOrder --ep 0.0 --maxiterate 1000 input [> output]
fast_executables Choose a fast MAFFT executable perl $analysis_type eq "fast" && !$auto_analysis && !$configure_analysis fftnsi2 FFT-NS-i (2 cycles) fftnsi1000 FFT-NS-i (1000 cycles) fftns168 FFT-NS-1 (0 cycles) fftns268 FFT-NS-2 (0 cycles) nwnsi68 NW-NS-i (2 cycles) nwns268 NW-NS-2 nwns268part NW-NS-PartTree fftnsi2 " $outputOrder --maxiterate 2 --retree 2" fftnsi1000 " $outputOrder --maxiterate 1000 --retree 2" fftns168 " $outputOrder --maxiterate 0 --retree 1" fftns268 " $outputOrder --maxiterate 0 --retree 2 " nwnsi68 " $outputOrder --maxiterate 2 --retree 2 --nofft" nwns268 " $outputOrder --maxiterate 0 --retree 2 --nofft" nwns268part " --thread 8 $outputOrder --maxiterate 0 --retree 1 --nofft --parttree" Please choose a fast MAFFT executable perl $analysis_type eq "fast" && !defined $fast_executables 2 Speed-oriented methods:
* FFT-NS-i (iterative refinement method; two cycles only); equivalent to: mafft --retree 2 --maxiterate 2 input [> output]
* FFT-NS-i (iterative refinement method; 1000 cycles); equivalent to: mafft --retree 2 --maxiterate 1000 input [> output]
* FFT-NS-1 (very fast; recommended for > 2000 sequences; progressive method with a rough guide tree); equivalent to: mafft --retree 1 --maxiterate 0 input [> output]
* FFT-NS-2 (fast; progressive method); equivalent to: mafft --retree 2 $outputOrder --maxiterate 0 input [> output]
* NW-NS-i (nwnsi) (iterative refinement without FFT, two cycles only); equivalent to: mafft --retree 2 --maxiterate 2 --nofft input [> output]
* NW-NS-2 (fast; progressive method without the FFT approximation); equivalent to: mafft --retree 2 --maxiterate 0 --nofft input [> output]
* NW-NS-PartTree-1 (recommended for ~10,000 to ~50,000 sequences; progressive method with the PartTree algorithm); equivalent to: mafft --retree 1 --maxiterate 0 --nofft --parttree input [> output]
rna_executable Choose a MAFFT RNA structure executable perl $analysis_type
eq "rna" && !$auto_analysis qinsi Q-INS-i xinsi X-INS-i qinsi " -qinsi " xinsi " -xinsi " qinsi 1 RNA structure methods
* Q-INS-i (suitable for sequences containing large unalignable regions; recommended for < 200 sequences x 1,000 nt); equivalent to: mafft-qinsi $outputOrder --ep 0.0 [--objective function] Secondary structure information of RNA is considered. Uses the Four-way Consistency objective function (Katoh and Toh, submitted) for incorporating structural information. These methods are suitable for a global alignment of highly diverged ncRNA sequences. For relatively conserved RNAs, such as SSU and LSU rRNA, the advantage of these methods is small.
* X-INS-i (suitable for sequences containing large unalignable regions; recommended for < 50 sequences x 1,000 nt); equivalent to: mafft-xinsi $outputOrder --ep 0.0 [--algorithm] CONTRAfold or McCaskill (default) algorithm. X-INS-i is a framework based on the Four-way Consistency objective function to build a multiple structural alignment by combining pairwise structural alignments given by an external program. At present, the external program can be selected from MXSCARNA (default), LaRA and FOLDALIGN (the local and global options).
use_contrafold Use Contrafold rather than McCaskill algorithm (--contrafold) perl $analysis_type eq "rna" perl ($value) ? "--contrafold":"" xinsi_option Which X-INS-i option should be used perl $rna_executable eq "xinsi" scarnapair larapair larapair foldalignlocalpair foldalignlocalpair --foldalignglobalpair foldalignglobalpair scarnapair MXSCARNA larapair "--larapair" foldalignlocalpair "--foldalignlocalpair" foldalignglobalpair "--foldalignglobalpair" scarnapair "--scarnapair" anysymbol There are unusual characters in the dataset (--anysymbol) perl ($value) ? " --anysymbol" : "" 0 Check this box if unusual characters appear in the input sequences 3 para_algorithm Algorithm Options distanceMetric Distance metric perl $configure_analysis 0 6merpair 1 globalpair 2 localpair 3 genafpair 4 fastapair 0 " --6merpair" 1 " --globalpair" 2 " --localpair" 3 " --genafpair" 4 " --fastapair" 0 2
* 6merpair: Distance is calculated based on the number of shared 6mers.
* globalpair: All pairwise alignments are computed with the Needleman-Wunsch algorithm. More accurate but slower than using 6merpair. Suitable for a set of globally alignable sequences. Applicable to up to ~200 sequences. A combination with --maxiterate 1000 is recommended (G-INS-i).
* localpair: All pairwise alignments are computed with the Smith-Waterman algorithm. More accurate but slower than using 6merpair. Suitable for a set of locally alignable sequences. Applicable to up to ~200 sequences. A combination with --maxiterate 1000 is recommended (L-INS-i).
* genafpair: All pairwise alignments are computed with a local algorithm with the generalized affine gap cost (Altschul 1998). More accurate but slower than using 6merpair. Suitable when large internal gaps are expected. Applicable to up to ~200 sequences. A combination with --maxiterate 1000 is recommended (E-INS-i).
* fastapair: All pairwise alignments are computed with FASTA (Pearson and Lipman 1988). FASTA is required.
weighting_factor Weighting factor for the consistency term calculated from pairwise alignments (--weighti) perl $configure_analysis && $distanceMetric ne "0" perl (defined $value && $value ne $vdef) ? " --weighti $value" : "" 2.7 Weighting factor for the consistency term calculated from pairwise alignments. Valid when any of --globalpair, --localpair, --genafpair, --fastapair or --blastpair is selected. 3 retrees Number of times guide tree is built in progressive stage (--retree) perl ($distanceMetric eq "0") && $configure_analysis perl (defined $value && $value ne $vdef) ? " --retree $value" : "" 2 Valid only with 6-mer distances 4 iterativeRefinements Number of cycles of iterative refinement (--maxiterate) perl $configure_analysis perl (defined $value && $value ne $vdef) ? " --maxiterate $value" : "" 0 5 useFFT FFT approximation in group-to-group alignment perl $configure_analysis 0 off 1 on 1 perl (defined $value && $value ne $vdef) ? " --nofft" : "" 6 noScore Check alignment score in iterative refinement stage (--noscore) 0 no 1 yes 1 perl (defined $value && $value ne $vdef) ? " --noscore" : "" 7 adjust_direction Adjust direction according to the first sequence (accurate enough for most cases) (--adjustdirection) perl ($value) ? "--adjustdirection ":"" 0 80 This option generates reverse complement sequences, as necessary, and aligns them together with the remaining sequences. This option works well unless the sequences are highly diverged. accuratelyadjust_direction Adjust direction according to the first sequence (very slow) (--adjustdirectionaccurately) perl ($value) ? "--adjustdirectionaccurately ":"" Sorry, you cannot choose both adjust direction options in a single run perl $adjust_direction && $accuratelyadjust_direction 0 80 This option generates reverse complement sequences, as necessary, and aligns them together with the remaining sequences. This option is only for use when sequences are highly diverged. It is very slow. memSave Use the Myers-Miller (1988) algorithm (--memsave) 0 auto-select 1 yes 0 perl (defined $value && $value ne $vdef) ? " --memsave" : "" By default, this is automatically turned on when the alignment length exceeds 10,000. 8 usePartTree Use the PartTree algorithm for tree building. 0 no 1 yes 0 Uses a fast tree-building method (PartTree, Katoh and Toh 2007). Recommended if a large number (> ~10,000) of sequences are input.
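Most of the "configure my own analysis" options above share one pattern: a flag is emitted only when the value is defined and differs from its default, i.e. `(defined $value && $value ne $vdef) ? " --flag $value" : ""`. The sketch below restates that pattern in Python under stated assumptions: the function names are hypothetical, the defaults are the ones listed in this section (retree 2, maxiterate 0, weighti 2.7, FFT on), and Python's `==` stands in for Perl's string comparison `ne`, which can differ for values such as `2` and `2.0`.

```python
# Defaults as listed in this section for the custom-analysis options.
DEFAULTS = {"retree": 2, "maxiterate": 0, "weighti": 2.7}


def flag_if_changed(name, value, default):
    """Emit ' --name value' only when value is set and differs from default."""
    if value is None or value == default:
        return ""
    return f" --{name} {value}"


def custom_analysis_flags(distance_metric="6merpair", retree=None,
                          maxiterate=None, weighti=None, use_fft=True):
    """Assemble the option string for a custom analysis (hypothetical helper)."""
    parts = [f" --{distance_metric}"]
    if distance_metric == "6merpair":
        # --retree is only offered with 6-mer distances.
        parts.append(flag_if_changed("retree", retree, DEFAULTS["retree"]))
    else:
        # --weighti is only offered with the pairwise-alignment metrics.
        parts.append(flag_if_changed("weighti", weighti, DEFAULTS["weighti"]))
    parts.append(flag_if_changed("maxiterate", maxiterate, DEFAULTS["maxiterate"]))
    if not use_fft:
        parts.append(" --nofft")
    return "".join(parts)


if __name__ == "__main__":
    print(custom_analysis_flags("localpair", maxiterate=1000))
```

Emitting only non-default flags keeps the generated command line short and leaves MAFFT's own defaults in charge wherever the user made no choice.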
9 partTreeMetric PartTree distance metric perl $usePartTree eq "1" 0 6-mer 1 DP 2 FASTA 0 " --parttree " 1 " --dpparttree " 2 " --fastaparttree " 0 10
* 6-mer: default distance metric
* DP: distances are based on dynamic programming. Slightly more accurate and slower than using 6-mers.
* FASTA: distances based on FASTA. Slightly more accurate and slower than using 6-mers.
partTreePartitions Number of partitions in the PartTree algorithm (--partsize) perl $usePartTree eq "1" perl (defined $value && $value ne $vdef) ? " --partsize $value" : "" 50 11 maxAlignment Maximum alignment size (--groupsize) perl $usePartTree eq "1" perl (defined $value) ? " --groupsize $value" : "" Valid only with the --*parttree options. Default: the number of input sequences 12 unalignlevel Set unalignlevel option (includes --allowshift, --unalignlevel, 0.8 is a reasonable choice) perl $distanceMetric eq "1" || $accurate_executable eq "ginsi68" perl (defined $value) ? "--allowshift --unalignlevel $value " : "" Please select a value between 0 and 0.9 for unalignlevel perl defined $value && ($value < 0 || $value > 0.9) Allowed values are 0-0.9. Default: 0.8. The --unalignlevel option only works if an option using the globalpair method is selected 12 para_parameters Seed/Profile/Merge Options use_seed Use a Seed Alignment (--seed) perl ($value) ? "--seed seed_alignment.fasta":"" 0 80 The --seed option can be used for adding unaligned sequences into a highly reliable alignment (seed) consisting of a small number of sequences. In this option, the aligned letters in the seed alignment are preserved but gaps are not necessarily preserved. If the given alignment (including the gap pattern) has to be completely preserved, use the --add or --addfragments option. Please select a seed alignment file perl $use_seed && !defined $seed_alignment1 seed_alignment1 Select the Seed Alignment perl $use_seed seed_alignment.fasta use_add Use Add Alignment (--add) perl ($value) ? "--add":"" 0 89 Sorry, you cannot use the --add option with the --seed or --addfragments options; please uncheck one of these boxes perl $use_add && ($use_addprof || $use_addfrag || $use_seed) Please select an alignment file to --add to perl $use_add && !defined $add_alignment1 This option allows the user to upload an alignment to add to. Use the --add option if the number of unaligned sequences is much smaller than the number of sequences in the skeleton alignment. add_alignment1 Select the Alignment to add unaligned sequences to perl $use_add existing_alignment.fasta perl (defined $use_add) ? "existing_alignment.fasta":"" 93 use_addfrag Use Addfragments Alignment (--addfragments) perl ($value) ? "--addfragments":"" 0 89 Sorry, you cannot use the --addfragments option with the --seed, --addprofile, or --add options; please uncheck one of these boxes perl $use_addfrag && ($use_addprof || $use_add || $use_seed) Please select a reference alignment file perl $use_addfrag && !defined $ref_alignment1 The --addfragments option allows the user to add unaligned fragmentary sequence(s) into an existing alignment. reorder_add Reorder Output Alignment (--reorder) perl $use_addfrag || $use_add perl ($value) ? "--reorder":"" 1 91 Omit --reorder to preserve the original sequence order. use_keeplength Preserve Alignment length while adding sequences (--keeplength) perl $use_addfrag || $use_add perl ($value) ? "--keeplength":"" 0 92 If the --keeplength option is given, then the alignment length is unchanged. Insertions at the new sequences are deleted. use_mapout See a correspondence table of positions after adding (--mapout) perl $use_addfrag || $use_add perl ($value) ? "--mapout":"" 0 92 Add --mapout to see a correspondence table of positions, new_sequences.map, between before and after the calculation.
The --mapout option automatically turns on the --keeplength option, to keep the numbering of sites in the reference alignment. large_align Fast Alignment for Large data set (--6merpair) perl $use_addfrag perl ($value) ? "--6merpair":"" 0 92 Use the --6merpair option for large data. ref_alignment1 Select the Reference Alignment perl $use_addfrag perl "ref_alignment.fasta" ref_alignment.fasta 93 Sorry, you cannot use both --add and --addfragments in the same run perl $use_addfrag && $use_add use_addprof Add Aligned Sequences to an Existing Alignment (--addprofile) perl ($value) ? "--addprofile":"" 0 81 Sorry, you cannot use the --addprofile option with the --seed, --add, or --addfragments options; please uncheck one of these boxes perl $use_addprof && ($use_addfrag || $use_add || $use_seed) Please select an existing alignment file for --addprofile perl $use_addprof && !defined $existing_alignment1 This option allows the user to upload an alignment for addprofile. Use the --addprofile option to add sequences to an existing alignment. existing_alignment1 Select the Existing Alignment perl $use_addprof perl "existing_alignment.fasta" existing_alignment.fasta 93 use_merge Merge Two or more sub-MSAs into a single file (--merge) perl !$use_addprof && !$use_addfrag && !$use_add perl ($value) ? "--merge submsa_table.fasta":"" 0 81 This option allows the user to merge two or more sub-MSAs (and unaligned sequences) into a single MSA by the --merge option. Each sub-MSA is preserved. submsa_table Select the SUBMSA Table perl $use_merge submsa_table.fasta Please select a SUBMSA table for the --merge option perl $use_merge && !defined $submsa_table treein_tree Provide a Guide Tree for Merge perl $use_merge perl (defined $value) ? "--treein guidetree.tre":"" 84 guidetree.tre para_parameters Algorithm Parameters dnaMatrix Nucleic Acid matrix selection (--kimura) perl $datatype eq "dna" 200 200PAM / kappa=2 20 20PAM / kappa=2 1 1PAM / kappa=2 200 "" 20 "--kimura 20" 1 "--kimura 1" perl "--kimura $value" 200 20 aaMatrix Amino Acid matrix selection perl $datatype eq "protein" 0 BLOSUM (Henikoff and Henikoff 1992) 1 JTT PAM (Jones et al. 1992) 2 Transmembrane PAM (Jones et al. 1994) 3 User-defined 0 20 jtt JTT PAM matrix (Jones et al. 1992) selection (--jtt) perl (defined $aaMatrix && $aaMatrix eq "1") perl (defined $value) ? " --jtt $value" : "" Please enter a value greater than 0. perl $value < 1 Valid entries must be greater than 0. 22 tm Transmembrane PAM matrix (Jones et al. 1994) selection (--tm) perl (defined $aaMatrix && $aaMatrix eq "2") perl (defined $value) ? " --tm $value" : "" Please enter a value greater than 0. perl $value < 1 Valid entries must be greater than 0. 23 userMatrix User-defined amino acid scoring matrix in BLAST format (--aamatrix) perl (defined $aaMatrix && $aaMatrix eq "3") userMatrixFile.blast perl (defined $value) ? " --aamatrix userMatrixFile.blast" : "" The format of the matrix file is the same as that of BLAST. Ignored when nucleotide sequences are input. 24 opPenaltyGroupToGroup Gap opening penalty for group-to-group alignment (--op) perl (defined $value && $value ne $vdef) ? " --op $value" : "" 1.53 13 extendPenaltyGroupToGroup perl $configure_analysis Offset value (gap extension penalty) for group-to-group alignment (--ep) perl (defined $value && $value ne $vdef) ? " --ep $value" : "" 0.123 14 opPenaltyPairwise Gap open penalty for pairwise alignment (--lop) perl ($distanceMetric eq "2") || ($distanceMetric eq "3") perl (defined $value && $value ne $vdef) ? " --lop $value" : "" 15 -2.00 Valid when the --localpair or --genafpair distance metric options are selected.
offsetValuePairwise Offset value for pairwise alignment (--lep) perl ($distanceMetric eq "2") || ($distanceMetric eq "3") perl (defined $value && $value ne $vdef) ? " --lep $value" : "" 0.1 Valid when the --localpair or --genafpair distance metric options are selected. 16 extendPenaltyPairwise Gap extension penalty for pairwise alignment (--lexp) perl ($distanceMetric eq "2") || ($distanceMetric eq "3") perl (defined $value && $value ne $vdef) ? " --lexp $value" : "" -0.1 Valid when the --localpair or --genafpair distance metric options are selected. 17 opPenaltySkip Gap open penalty for skipping the alignment (--LOP) perl ($distanceMetric eq "3") perl (defined $value && $value ne $vdef) ? " --LOP $value" : "" -6.00 Valid when the --genafpair distance metric option is selected. 18 extendPenaltySkip Gap extension penalty for skipping the alignment (--LEXP) perl ($distanceMetric eq "3") perl (defined $value && $value ne $vdef) ? " --LEXP $value" : "" 0.00 Valid when the --genafpair distance metric option is selected. 19 fmodel Incorporate AA/nucleotide composition information into the scoring matrix (--fmodel) 0 no 1 yes 0 perl (defined $value && $value ne $vdef) ? " --fmodel" : "" 25 para_io Input/Output Options preservecase Preserve case (--preservecase) 0 perl ($value) ? " --preservecase" : "" 26 outputFormat Output format 0 FASTA 1 ClustalW 0 perl (defined $value && $value ne $vdef) ? " --clustalout" : "" 26 outputOrder Output order --inputorder same as input --reorder aligned --inputorder 27 outputGuideTree Output guide tree (--treeout) 0 no 1 yes 0 perl (defined $value && $value ne $vdef) ? " --treeout" : "" 28
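To tie the pieces of this definition together, the sketch below assembles a complete command line from a preset strategy, the data type, the memory setting and the output order. It is illustrative only: the executable name `mafft`, the function name `build_command`, the preset keys and the ordering of flags are assumptions (the interface itself calls site-specific wrappers and orders arguments by its own group numbers); the flag strings are the ones listed earlier in this definition.

```python
import shlex

# Preset strategies, using the flag strings given in this definition.
PRESETS = {
    "linsi": "--localpair --maxiterate 1000 {order} --ep 0.0 --retree 1",
    "ginsi": "--globalpair --maxiterate 1000 {order} --ep 0.0 --retree 1",
    "einsi": "--ep 0.0 --genafpair --maxiterate 1000 {order} --retree 1",
    "fftns2": "{order} --maxiterate 0 --retree 2",
}

# Data type selection: automatic detection adds no flag.
DATATYPE_FLAGS = {"auto": "", "dna": "--nuc", "protein": "--amino"}


def build_command(preset, datatype="auto", more_memory=False,
                  order="--inputorder", infile="input.fasta",
                  outfile="output.mafft"):
    """Return a shell command string for one MAFFT run (hypothetical helper)."""
    threads = 12 if more_memory else 8
    tokens = ["mafft", "--thread", str(threads)]
    if DATATYPE_FLAGS[datatype]:
        tokens.append(DATATYPE_FLAGS[datatype])
    tokens += PRESETS[preset].format(order=order).split()
    tokens.append(shlex.quote(infile))
    # The alignment is written to standard output, so redirect it to a file.
    return " ".join(tokens) + " > " + shlex.quote(outfile)


if __name__ == "__main__":
    print(build_command("linsi"))
    print(build_command("fftns2", datatype="dna", more_memory=True,
                        order="--reorder"))
```

Storing each preset as a template with an `{order}` placeholder mirrors how `$outputOrder` is substituted into the strategy strings in this definition.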