Noisy on ACCESS
1.5.12
Identify homoplastic characters in multiple sequence alignments - run on XSEDE
Christoph Flamm, Sonja J Prohaska, Guido Fritzsch, Peter F Stadler
Andreas W. M. Dress, Christoph Flamm, Guido Fritzsch, Stefan Grünewald, Matthias Kruspe, Sonja J. Prohaska, Peter F. Stadler. Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol Biol, 3:7 (2008). doi:10.1186/1748-7188-3-7
Stefan Grünewald, Kristoffer Forslund, Andreas W. M. Dress, Vincent Moulton. QNet: An Agglomerative Method for the Construction of Phylogenetic Networks from Weighted Quartets. Mol Biol Evol, 24(2):532-538 (2007). doi:10.1093/molbev/msl180
David Bryant, Vincent Moulton. Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks. Mol Biol Evol, 21:255-265 (2004).
Phylogeny / Alignment
noisy_xsede

noisy_comet
perl: ""
0

number_nodes
2
scheduler.conf
perl:
"threads_per_process=1\\n" .
"node_exclusive=0\\n" .
"mem=15G\\n" .
"nodes=1\\n"

infile
Input File (AFA format)
perl: "input.afa"
3
input.afa
all_results
*

runtime
Maximum Hours to Run (up to 168 hours)
scheduler.conf
0.5
The maximum hours to run must be at most 168
perl: $runtime > 168.0
The maximum hours to run must be at least 0.05
perl: $runtime < 0.05
perl: "runhours=$value\\n"
The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume $runtime cpu hours
perl: $runtime > 0
Estimate the maximum time your job will need to run. We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours.
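
The run-time rules above can be sketched in a few lines of Python. This is only an illustration, not the gateway's own code: the function names are hypothetical, the escaped `\\n` in the Perl fragments is assumed to end up as a line break in scheduler.conf, and the `runhours` line is assumed to follow the four fixed settings.

```python
MIN_HOURS = 0.05              # lower limit checked by the form
MAX_HOURS = 168.0             # upper limit checked by the form
DEBUG_QUEUE_MAX_HOURS = 0.5   # jobs up to this length go to the "debug" queue


def check_runtime(runtime):
    """Return None if the value passes both form checks, else a short reason."""
    if runtime > MAX_HOURS:
        return "too_long"
    if runtime < MIN_HOURS:
        return "too_short"
    return None


def expected_queue(runtime):
    """Queue the help text says a job of this length is submitted to."""
    return "debug" if runtime <= DEBUG_QUEUE_MAX_HOURS else "normal"


def scheduler_conf(runtime):
    """Assemble the scheduler.conf text (line order is an assumption)."""
    problem = check_runtime(runtime)
    if problem is not None:
        raise ValueError(f"invalid runtime: {problem}")
    lines = [
        "threads_per_process=1",
        "node_exclusive=0",
        "mem=15G",
        "nodes=1",
        f"runhours={runtime}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(scheduler_conf(0.5), end="")
    print(expected_queue(0.5), expected_queue(2))
```

Because the job runs on one processor, the CPU hours charged for a run that uses its full allowance equal the configured hours.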

specify_ntaxa
5
How many taxa in your input file?
Please specify the number of taxa in your input file
perl: !defined $specify_ntaxa
Sorry, Noisy cannot process more than 338 taxa
perl: $specify_ntaxa > 338

specify_cutoff
5
Set the lower bound of the reliability score
perl: defined $value ? "--cutoff $value":""
Set the lower bound of the reliability score for an alignment column to FLOAT. Columns with a score below FLOAT are removed from the output alignment. The name of the output MSA is constructed from the base name of the input MSA by adding the postfix _out.fas.

specify_distcalc
Set distance calculation of NeighborNet (--distance)
HAMMING
GTR
HAMMING
perl: " --distance $value "
6
Set distance calculation of NeighborNet to HAMMING or GTR.

distance_matrix
Select Substitution matrix file for NeighborNet (--matrix)
perl: !defined $specify_distcalc
perl: defined $value ? " --matrix distance_matrix.txt":""
7
The matrix file is not compatible with using the --distance option
perl: defined $distance_matrix && defined $specify_distcalc
distance_matrix.txt
Read the distance matrix used by NeighborNet to generate the cyclic order from FILE instead of letting NeighborNet calculate the distance matrix by one of the methods given to option --distance.
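
The rule that --distance and --matrix exclude each other can be illustrated with a short Python sketch. The function name and its parameters are hypothetical; only the option names, the allowed values HAMMING and GTR, and the staged file name distance_matrix.txt come from the form definition above.

```python
ALLOWED_DISTANCES = ("HAMMING", "GTR")


def distance_args(distance=None, matrix_uploaded=False):
    """Return the distance-related command-line arguments as a list.

    distance        -- "HAMMING", "GTR", or None if no method was chosen
    matrix_uploaded -- True if a matrix file was supplied by the user
    """
    if distance is not None and matrix_uploaded:
        # Mirrors the form's control: the two options cannot be combined.
        raise ValueError(
            "The matrix file is not compatible with using the --distance option"
        )
    if distance is not None:
        if distance not in ALLOWED_DISTANCES:
            raise ValueError(f"unknown distance method: {distance}")
        return ["--distance", distance]
    if matrix_uploaded:
        # The uploaded file is staged under a fixed name.
        return ["--matrix", "distance_matrix.txt"]
    return []


if __name__ == "__main__":
    print(distance_args("GTR"))
    print(distance_args(matrix_uploaded=True))
```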

specify_string
Treat this character(s) as missing data (--missing)
perl: defined $value ? "--missing $value":""
N
8
Each character of STRING is treated as missing data and is removed from a column before changes between character states are calculated.

set_nogap
Add the gap symbol to the set of missing characters (--nogap)
perl: ($value) ? "--nogap":""
9
Add the gap symbol to the set of missing characters.

suppress_constant
Suppress constant columns in the output MSA. (--noconstant)
perl: ($value) ? "--noconstant":""
10
Suppress constant columns in the output MSA.

specify_ordering
Set the method to calculate the cyclic order (--ordering)
nnet
qnet
rand
all
INT
nnet: "--ordering nnet"
qnet: "--ordering qnet"
rand: "--ordering rand,$specify_randint"
all: "--ordering all"
INT: "--ordering $specify_intint "
11
If the number of taxa for the all option is greater than 8, the run can become quite lengthy
perl: $specify_ordering eq "all" && $specify_ntaxa > 8
More than 120 taxa can cause you to run out of memory. Consider using the more memory option
perl: $specify_ntaxa > 120 && $specify_ntaxa < 339

specify_randint
Specify an integer for the ordering (rand or INT)
perl: $specify_ordering eq "rand"
12
Please enter an integer for the ordering
perl: $specify_ordering eq "rand" && !defined $specify_randint
With rand, the reliability score is calculated for a random sample of all possible orderings of the TAXA. The size of the random sample (default is 1000) can be set by adding an integer after a comma to rand, i.e. rand,42. (All orderings with a smaller reliability than cutoff are singled out to a text file with "_best.gr" as postfix.)

specify_intint
Specify a cyclic ordering (INT)
perl: $specify_ordering eq "INT"
13
Please enter an integer for the ordering
perl: $specify_ordering eq "INT" && !defined $specify_intint
Specified by a comma-separated list of TAXA indices in the range [0, NumberOfTAXA[ (no spaces are allowed), e.g. 3,0,4,1,2 as ordering for the 5 TAXA in the input MSA.

specify_shuffles
Specify number of random shufflings per column of the MSA (--shuffles)
perl: (defined $value) ? "--shuffles $value":""
14
Perform INT random shufflings per column of the MSA.
9

specify_smoothing
Calculate a running average over the reliability score of x columns (--smooth)
perl: (defined $value) ? "--smooth $value":""
15
Calculate a running average over the reliability score of INT columns and use these smoothed values to remove unreliable columns from the MSA.

specify_datatype
Set sequence type of input MSA (--seqtype)
D
P
R
D
perl: "--seqtype $value"
16
Set sequence type of input MSA to DNA (the default), Protein, or RNA. This information is used by NeighborNet during distance matrix calculation.

increase_verbosity
Increase the verbosity level (--verbose)
7
perl: ($value) ? "--verbose":""
17
Provide more verbose output.
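
As an illustration of the --ordering choices described above, the following Python sketch builds the argument for each method and checks an explicit ordering before it is passed on. The function name and parameters are hypothetical. The check that an explicit ordering lists every taxon exactly once is an assumption suggested by the example 3,0,4,1,2; the form itself only states the index range and forbids spaces.

```python
def ordering_args(method, ntaxa, sample_size=None, order=None):
    """Return ["--ordering", value] for the chosen method.

    method      -- one of "nnet", "qnet", "rand", "all", "INT"
    ntaxa       -- number of taxa in the input MSA
    sample_size -- size of the random sample, required for "rand"
    order       -- comma-separated taxon indices, required for "INT"
    """
    if method in ("nnet", "qnet", "all"):
        return ["--ordering", method]

    if method == "rand":
        if sample_size is None or sample_size < 1:
            raise ValueError("Please enter an integer for the ordering")
        return ["--ordering", f"rand,{sample_size}"]

    if method == "INT":
        if not order or " " in order:
            raise ValueError("ordering must be comma-separated without spaces")
        indices = [int(token) for token in order.split(",")]
        # Assumption: a valid cyclic ordering is a permutation of 0..ntaxa-1.
        if sorted(indices) != list(range(ntaxa)):
            raise ValueError("ordering must list each index in [0, ntaxa) once")
        return ["--ordering", order]

    raise ValueError(f"unknown ordering method: {method}")


if __name__ == "__main__":
    print(ordering_args("INT", 5, order="3,0,4,1,2"))
    print(ordering_args("rand", 5, sample_size=42))
```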