<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE pise SYSTEM "PARSER/pise.dtd">

<pise>

  <head>
    <title>HMMER</title>
    <version>2.2g</version>
    <description>sreformat - convert sequence file to different format</description>
    <authors>S. Eddy</authors>
    <category>format</category>
  </head>

  <command>sreformat</command>

  <parameters>

    <parameter iscommand="1" ishidden="1" type="String">
      <name>sreformat</name>
      <attributes>
	<format>
	  <language>perl</language>
	  <code>"sreformat "</code>
	</format>
	<group>0</group>
      </attributes>
    </parameter>

    <parameter ismandatory="1" type="InFile">
      <name>seqfile</name>
      <attributes>
	<prompt>Sequence(s) file</prompt>
	<format>
	  <language>perl</language>
	  <code>" $value"</code>
	</format>
	<group>3</group>
	<comment>
	  <value>Supported input formats include (but are not limited to) the unaligned formats FASTA, Genbank, EMBL, SWISS-PROT, PIR, and GCG, and the aligned formats SELEX, Clustal, and GCG MSF.</value>
	  <value>Unaligned format files cannot be reformatted to aligned formats. However, aligned formats can be reformatted to unaligned formats -- gap characters are simply stripped out.</value>
	</comment>
	<pipe>
	  <pipetype>hmmer_alig</pipetype>
	  <language>perl</language>
	  <code>1</code>
	</pipe>
      </attributes>
    </parameter>

    <parameter ishidden="1" type="Results">
      <name>single_seq_file</name>
      <attributes>
	<format>
	  <language>perl</language>
	  <code>" &gt; seqfile"</code>
	</format>
	<group>10</group>
	<precond>
	  <language>perl</language>
	  <code>$output_format eq "embl" || $output_format eq "GCG" || $output_format eq "pir" || $output_format eq "zuker"</code>
	</precond>
	<filenames>"seqfile"</filenames>
	<pipe>
	  <pipetype>seqfile</pipetype>
	  <language>perl</language>
	  <code>1</code>
	</pipe>
      </attributes>
    </parameter>

    <parameter ishidden="1" type="Results">
      <name>multi_seq_file</name>
      <attributes>
	<format>
	  <language>perl</language>
	  <code>" &gt; seqsfile"</code>
	</format>
	<group>10</group>
	<precond>
	  <language>perl</language>
	  <code>$output_format eq "fasta" || $output_format eq "gcgdata" || $output_format eq "raw"</code>
	</precond>
	<filenames>"seqsfile"</filenames>
	<pipe>
	  <pipetype>seqsfile</pipetype>
	  <language>perl</language>
	  <code>1</code>
	</pipe>
      </attributes>
    </parameter>

    <parameter type="Results">
      <name>alig_file</name>
      <attributes>
	<format>
	  <language>perl</language>
	  <code>" &gt; aligfile"</code>
	</format>
	<group>10</group>
	<precond>
	  <language>perl</language>
	  <code>$output_format eq "msf" || $output_format eq "a2m"</code>
	</precond>
	<filenames>"aligfile"</filenames>
	<pipe>
	  <pipetype>readseq_ok_alig</pipetype>
	  <language>perl</language>
	  <code>1</code>
	</pipe>
      </attributes>
    </parameter>

    <parameter type="Results">
      <name>selex_alig_file</name>
      <attributes>
	<format>
	  <language>perl</language>
	  <code>" &gt; selex_aligfile"</code>
	</format>
	<group>10</group>
	<precond>
	  <language>perl</language>
	  <code>$output_format eq "selex"</code>
	</precond>
	<filenames>"selex_aligfile"</filenames>
	<pipe>
	  <pipetype>hmmer_alig</pipetype>
	  <language>perl</language>
	  <code>1</code>
	</pipe>
      </attributes>
    </parameter>

    <parameter ismandatory="1" type="Excl">
      <name>output_format</name>
      <attributes>
	<prompt>output format</prompt>
	<format>
	  <language>perl</language>
	  <code>" $value "</code>
	</format>
	<group>2</group>
	<vlist>
	  <value>fasta</value>
	  <label>FASTA format</label>
	  <value>embl</value>
	  <label>EMBL/SWISSPROT format</label>
	  <value>genbank</value>
	  <label>GENBANK format</label>
	  <value>GCG</value>
	  <label>GCG single sequence format</label>
	  <value>gcgdata</value>
	  <label>GCG flatfile database format</label>
	  <value>strider</value>
	  <label>MacStrider format</label>
	  <value>zuker</value>
	  <label>ZUKER MFOLD format</label>
          <value>ig</value>
          <label>Intelligenetics format</label>
          <value>pir</value>
          <label>PIR/CODATA flatfile format</label>
          <value>squid</value>
          <label>undocumented St Louis format</label>
	  <value>raw</value>
	  <label>raw sequence, no other information</label>
	  <value>stockholm</value>
	  <label>PFAM/Stockholm format</label>
          <value>msf</value>
          <label>GCG MSF format</label>
          <value>a2m</value>
          <label>aligned FASTA format</label>
	  <value>phylip</value>
	  <label>Felsenstein's PHYLIP format</label>
	  <value>selex</value>
	  <label>old SELEX/HMMER/Pfam annotated alignment format</label>
	</vlist>
	<comment>
	  <value>Available unaligned output file format codes include fasta (FASTA format); embl (EMBL/SWISSPROT format); genbank (Genbank format); gcg (GCG single sequence format); gcgdata (GCG flatfile database format); strider (MacStrider format); zuker (Zuker MFOLD format); ig (Intelligenetics format); pir (PIR/CODATA flatfile format); squid (an undocumented St. Louisformat); raw (raw sequence, no other information). The available aligned output file format codes include stockholm (PFAM/Stockholm format); msf (GCG MSF format); a2m (aligned FASTA format, called A2M by the UC Santa Cruz HMM group); PHYLIP (Felsenstein's PHYLIP format); and selex (old SELEX/HMMER/Pfam annotated alignment format).</value>
	</comment>
      </attributes>
    </parameter>

    <parameter type="Switch">
      <name>dna</name>
      <attributes>
	<prompt>DNA (-d)</prompt>
	<format>
	  <language>perl</language>
	  <code>($value) ? " -d" : ""</code>
	</format>
	<group>1</group>
	<comment>
	  <value>convert U's to T's, to make sure a nucleic acid sequence is shown as DNA not RNA.</value>
	</comment>
      </attributes>
    </parameter>

    <parameter type="Switch">
      <name>lowercase</name>
      <attributes>
	<prompt>Convert all sequence residues to lower case (-l)</prompt>
	<format>
	  <language>perl</language>
	  <code>($value) ? " -l" : ""</code>
	</format>
	<group>1</group>
	<comment>
	  <value>convert all sequence residues to lower case.</value>
	</comment>
      </attributes>
    </parameter>

    <parameter type="Switch">
      <name>rna</name>
      <attributes>
	<prompt>RNA (-r)</prompt>
	<format>
	  <language>perl</language>
	  <code>($value) ? " -r" : ""</code>
	</format>
	<group>1</group>
	<comment>
	  <value>convert T's to U's, to make sure a nucleic acid sequence is shown as RNA not DNA.</value>
	</comment>
      </attributes>
    </parameter>

    <parameter type="Switch">
      <name>uppercase</name>
      <attributes>
	<prompt>Convert all sequence residues to upper case (-u)</prompt>
	<format>
	  <language>perl</language>
	  <code>($value) ? " -u" : ""</code>
	</format>
	<group>1</group>
	<comment>
	  <value>convert all sequence residues to upper case.</value>
	</comment>
      </attributes>
    </parameter>

    <parameter type="Switch">
      <name>iupac_convert</name>
      <attributes>
	<prompt>Convert DNA non-IUPAC characters (such as X's) to N's (-x)</prompt>
	<format>
	  <language>perl</language>
	  <code>($value) ? " -x" : ""</code>
	</format>
	<group>1</group>
      </attributes>
    </parameter>

    <parameter type="Paragraph">
      <paragraph>
	<name>expert_options</name>
	<prompt>Expert options</prompt>
	<parameters>

	  <parameter type="Switch">
	    <name>pfam</name>
	    <attributes>
	      <prompt>(--pfam)</prompt>
	      <format>
		<language>perl</language>
		<code>($value) ? " --pfam" : ""</code>
	      </format>
	      <group>1</group>
	      <comment>
		<value>For SELEX alignment output format only, put the entire alignment in one block (don't wrap into multiple blocks).</value>
	      </comment>
	    </attributes>
	  </parameter>

	  <parameter type="Switch">
	    <name>sam</name>
	    <attributes>
	      <prompt>(--sam)</prompt>
	      <format>
		<language>perl</language>
		<code>($value)? " --sam " : ""</code>
	      </format>
	      <group>1</group>
	      <comment>
		<value>Try to convert gap characters to UC Santa Cruz SAM style, where a . means a gap in an insert column, and a - means a deletion in a consensus/match column. This only works for converting aligned file formats, and only if the alignment already adheres to the SAM convention of upper case for residues consensus/match columns, and lower case for residues in insert columns. This is true, for instance, of all alignments produced by old versions of HMMER. (HMMER2 produces alignments that adhere to SAM's conventions even in gap character choice.) This option was added to allow Pfam alignments to be reformatted into something more suitable for profile HMM construction using the UCSC SAM software.</value>
	      </comment>
	    </attributes>
	  </parameter>

	  <parameter type="Float">
	    <name>samfrac</name>
	    <attributes>
	      <prompt>(--samfrac x)</prompt>
	      <format>
		<language>perl</language>
		<code>($value)? " --samfrac " : ""</code>
	      </format>
	      <group>1</group>
	      <comment>
		<value>Try to convert the alignment gap characters and residue cases to UC Santa Cruz SAM style, where a . means a gap in an insert column and a - means a deletion in a consensus/match column, and upper case means match/consensus residues and lower case means inserted resiudes. This will only work for converting aligned file formats, but unlike the -sam option, it will work regardless of whether the file adheres to the upper/lower case residue convention. Instead, any column containing more than a fraction x of gap characters is interpreted as an insert column, and all other columns are interpreted as match columns. This option was added to allow Pfam alignments to be reformatted into something more suitable for profile HMM construction using the UCSC SAM software.</value>
	      </comment>
	    </attributes>
	  </parameter>

	</parameters>
      </paragraph>

    </parameter>

  </parameters>
</pise>
