OUTPUT

The output from FastX is a list file and is suitable as input to any GCG program that allows indirect file specifications. (For information about indirect file specifications, see Chapter 2, Using Sequence Files and Databases, of the User's Guide.)

Here is some of the output file:

!!SEQUENCE_LIST 1.0

(Nucleotide) FASTX of: singlepass.seq from: 1 to: 102 September 25, 1998 13:36

A fragment of ggammacod.seq with simulated frameshift errors.
Human fetal beta globins G and A gamma from Shen, Slightom and Smithies, Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

TO: PIR:* Sequences: 109,075 Symbols: 34,814,664 Word Size: 2

Databases searched:
NBRF, Release 57.0, Released on 30Jun1998, Formatted on 18Aug1998

Searching with both strands of the query.
Scoring matrix: GenRunData:Blosum50.Cmp
Constant pamfactor used
Gap creation penalty: 15 Gap extension penalty: 2 Frameshift penalty: 20

Histogram Key:
Each histogram symbol represents 327 search set sequences
Each inset symbol represents 7 search set sequences
z-scores computed from opt scores

z-score obs    exp
        (=)    (*)

< 20   1731      0:======
  22      6      0:=
  24     44      0:=
  26     93      5:*
  28    246     49:*
  30    664    300:*==
  32   1804   1160:===*==
  34   4373   3146:=========*====
  36   7914   6461:===================*=====
  38  12508  10677:================================*======
  40  17296  14893:=============================================*=======
  42  19386  18205:=======================================================*====
  44  19567  20082:===========================================================*
  46  19089  20454:===========================================================*
  48  17629  19583:====================================================== *
  50  15834  17869:================================================= *
  52  14363  15710:============================================ *
  54  12398  13419:====================================== *
  56  10183  11209:================================ *
  58   8834   9202:============================*
  60   7074   7455:======================*
  62   5747   5976:==================*
  64   4660   4753:==============*
  66   3865   3757:===========*
  68   3084   2955:=========*
  70   2187   2316:=======*
  72   1735   1809:=====*
  74   1306   1411:====*
  76   1079   1098:===*
  78    868    853:==*
  80    559    663:==*
  82    467    507:=*
  84    352    402:=*
  86    239    311:*
  88    178    240:*
  90    131    186:*
  92     92    144:* :============== *
  94     80    111:* :============ *
  96     33     86:* :===== *
  98     28     67:* :==== *
 100     27     52:* :==== *
 102      8     40:* :== *
 104     15     31:* :=== *
 106      5     24:* := *
 108      7     18:* := *
 110      4     14:* :=*
 112      7     11:* :=*
 114      3      9:* :=*
 116      2      7:* :*
 118      2      5:* :*
>120    344      4:*= :*=======================================

Joining threshold: 36, opt. threshold: 24, opt. width: 16, reg.-scaled

The best scores are: init1 initn opt z-sc E(217753)..

PIR2:A30213 ! hemoglobin epsilon chain - North Am... 130 130 141 223.3 2.7e-05
PIR1:HGMQP ! hemoglobin gamma chain - pig-tailed... 129 129 140 221.8 3.3e-05
PIR1:HGBAY ! hemoglobin gamma chain - yellow baboon 129 129 140 221.8 3.3e-05
//////////////////////////////////////////////////////////////////////////
\\End of List

singlepass.seq
PIR2:A30213

P1;A30213 - hemoglobin epsilon chain - North American opossum
C;Species: Didelphis virginiana, Didelphis marsupialis virginiana (North American opossum)
C;Date: 18-Oct-1989 #sequence_revision 18-Oct-1989 #text_change 21-Nov-1997
C;Accession: A30213
R;Koop, B.F.; Goodman, M.
Proc. Natl. Acad. Sci. U.S.A. 85, 3893-3897, 1988
. . .

SCORES Init1: 130 Initn: 130 Opt: 141 z-score: 223.3 E(): 2.7e-05
Smith-Waterman score: 141; 76.5% identity in 34 aa overlap

             10 20 30 39
singlepass.s LVVYP/WTQRFV\DSFGNLSSASASWATPXVKAH
             ||||| |||||  |||||||||||  ::| ||||
A30213       LVVYP-WTQRFF-DSFGNLSSASAVMGNPKVKAH
             40 50 60
//////////////////////////////////////////////////////////////////////////
! Distributed over 1 thread.
! Start time: Fri Sep 25 13:32:20 1998
! Completion time: Fri Sep 25 13:39:08 1998
! CPU time used:
! Database scan: 0:02:29.6
! Post-scan processing: 0:00:04.5
! Total CPU time: 0:02:34.1
! Output File: singlepass.fastx

What is the Output?

The first part of the output file contains a histogram showing the distribution of the z-scores between the query and search set sequences. (See the ALGORITHM topic for an explanation of z-score.) The histogram is composed of bins of size 2 that are labeled according to the higher score for that bin (the leftmost column of the histogram). For example, the bin labeled 24 stores the number of sequence pairs that had scores of 23 or 24.

The next two columns of the histogram list the number of z-scores that fell within each bin. The second column lists the number of z-scores observed in the search and the third column lists the number of z-scores that were expected.

The body of the histogram displays a graphical representation of the score distributions. Equal signs (=) indicate the number of scores of that magnitude that were observed during the search, while asterisks (*) plot the number of scores of that magnitude that were expected. At the bottom of the histogram is a list of some of the parameters pertaining to the search.

Below the histogram, FastX displays a listing of the best scores. Strand:- after the sequence name in this list indicates that the match was found between the search set sequence and the reverse complement of the query sequence.

Following the list of best scores, FastX displays the alignments of the regions of best overlap between the query and search sequences. /rev following the query sequence name indicates that the search sequence is aligned with the reverse complement of the query sequence. This program displays only the region of overlap between the two aligned sequences (plus some residues on either side of the region to provide context for the alignment) unless you use -SHOWall. The display of identities and conservative replacements between the aligned sequences depends on the value of -MARKx.
By default (-MARKx=3), the pipe character (|) denotes identities and the colon (:) denotes conservative replacements.
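The histogram conventions described above (bins of size 2 labeled by their higher score, and a fixed number of search set sequences per symbol) can be illustrated with a short Python sketch. This is not FastX's own code. The function names are hypothetical, treating a bin as the interval above (label - 2) up to and including the label is an assumption for non-integer z-scores, and rounding symbol counts up is inferred from the sample rows rather than stated in the documentation. Any cap on line width is ignored.

```python
BIN_SIZE = 2  # the text states that histogram bins have size 2


def bin_label(z_score, bin_size=BIN_SIZE):
    """Return the label of the bin holding z_score.

    Bins are labeled by their higher score, so scores of 23 and 24
    both fall in the bin labeled 24.
    Assumption: a bin covers (label - bin_size, label].
    """
    # Ceiling division written with floor division on the negated value.
    return int(-(-z_score // bin_size) * bin_size)


def symbol_count(n_sequences, per_symbol):
    """Number of histogram symbols needed for n_sequences.

    Assumption: counts are rounded up, so any non-zero count
    gets at least one symbol.
    """
    return -(-n_sequences // per_symbol)


def parse_histogram_row(row):
    """Split a row such as '40 17296 14893:====*==' into its parts.

    Returns (label, observed, expected, bars), where bars is the raw
    text after the first colon (including any inset histogram).
    """
    counts, _, bars = row.partition(":")
    fields = counts.split()
    observed, expected = int(fields[-2]), int(fields[-1])
    label = " ".join(fields[:-2])  # keeps labels like '< 20' intact
    return label, observed, expected, bars


if __name__ == "__main__":
    label, obs, exp, _ = parse_histogram_row("40 17296 14893:=====*==")
    # 327 sequences per symbol is taken from the Histogram Key in the sample.
    print(label, symbol_count(obs, 327), symbol_count(exp, 327))
    print(bin_label(23), bin_label(24))
```

With the sample's key of 327 sequences per symbol, the rounded-up counts for the bin labeled 40 appear consistent with the length of the bar and the position of the asterisk printed for that row.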
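Because each entry in the list of best scores follows the "name ! description" layout shown in the sample, the entries can be separated with simple string handling. The following Python sketch is only an illustration: `BestScore` and `parse_best_score` are hypothetical names, the five trailing columns (init1, initn, opt, z-sc, E()) are assumed to be present exactly as in the sample, and any text between the sequence name and the exclamation point (such as Strand:-) is assumed to be an attribute of the entry.

```python
from typing import NamedTuple


class BestScore(NamedTuple):
    """One entry from the list of best scores (hypothetical container)."""
    name: str
    attributes: str
    description: str
    init1: int
    initn: int
    opt: int
    z_score: float
    e_value: float


def parse_best_score(line):
    """Parse one 'name ! description scores...' entry.

    Assumptions: the first '!' separates the sequence specification
    from its documentation, and the last five whitespace-separated
    fields are init1, initn, opt, z-score, and E().
    """
    head, _, rest = line.partition("!")
    head_fields = head.split()
    name = head_fields[0]
    attributes = " ".join(head_fields[1:])  # e.g. a strand marker, if present
    *description, init1, initn, opt, z_score, e_value = rest.split()
    return BestScore(
        name,
        attributes,
        " ".join(description),
        int(init1),
        int(initn),
        int(opt),
        float(z_score),
        float(e_value),
    )


if __name__ == "__main__":
    entry = ("PIR2:A30213 ! hemoglobin epsilon chain - North Am... "
             "130 130 141 223.3 2.7e-05")
    print(parse_best_score(entry))
```

Splitting on the first exclamation point keeps the sequence specification separate from the free-text description, so a description that itself contains digits or punctuation does not disturb the numeric columns taken from the end of the line.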