- sample.sam is a small subset of the AU004904.sam dataset. It contains reads with inserts, deletes and mods.
- sample1.sam and sample2.sam are the sample.sam reads split into two files (concatenate them to get sample.sam).
- single_sample.out is the result of running "samerrors.py -o single_sample.out -l 74 -f 2 sample.sam".
- two_samples.out is the result of running "samerrors.py -o two_samples.out -l 74 -f 2 sample1.sam sample2.sam". It is identical to single_sample.out, as it should be.
- single_sample_debug.out is the result of running with "debug=True" set in samerrors.py. It prints the position of each error of each type in each read, so you can verify the percentages by hand.
- single_sample.out.jpg is the JPG output of "samerrors.py -r -o single_sample.out -l 74 -f 2 sample.sam".

If you want to test the -m argument, which specifies the column that contains the MD, use column 21 (i.e. -m 21) for these SAM files.

- bfast_sample.sam is a subset of Andrew's 1m.bfast.rmdup.sam. Its MD is in column 19, its read length is 75, and it uses a different MD format than the samples above (hence "-f 2" is omitted from the command line, since this is the default MD format for samerrors.py).
- bfast_sample.out is the result of running "samerrors.py -o bfast_sample.out -m 19 -l 75 bfast_sample.sam".
- novo.clip.sam is a subset of Andrew's novo.rmdup.sam. Some of its CIGARs contain an "S" to indicate soft clipping at the beginning or end of the alignment. I treat "S" the same as "M" when parsing the CIGAR, but keep track of the number of bases clipped at the start and at the end. When we then parse the MD, we add the leading clip to startpos and the trailing clip to the count at the end, so that the full readLen is accounted for.
- novo.clip.out is the result of running "samerrors.py -m 16 -l 75 -f 2 -r -o novo.clip.out novo.clip.sam".
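The soft-clip handling described for novo.clip.sam can be sketched roughly as below. This is not the actual samerrors.py code: `parse_cigar` and `read_length` are hypothetical helper names chosen for illustration, and the sketch only covers the CIGAR side (reporting "S" as "M" while recording the leading and trailing clip sizes). The final check shows why that bookkeeping works: with "S" counted like "M", the read-consuming operations add up to the full read length, for example 75 for the novo data.

```python
import re

# A CIGAR string is a series of <length><op> pairs, e.g. "3S70M2S".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Hypothetical helper: split a CIGAR into (op, length) pairs.

    Soft clips ("S") are reported as "M" so later code can walk them like
    aligned bases. The number of bases clipped at the start and at the end
    of the read is returned separately.
    """
    ops = [(op, int(n)) for n, op in CIGAR_RE.findall(cigar)]
    lead_clip = ops[0][1] if ops and ops[0][0] == "S" else 0
    trail_clip = ops[-1][1] if len(ops) > 1 and ops[-1][0] == "S" else 0
    merged = [("M" if op == "S" else op, n) for op, n in ops]
    return merged, lead_clip, trail_clip


def read_length(ops):
    """Count bases consumed from the read: M (including converted S), I, =, X.

    D, N, H and P do not consume read bases, so they are skipped.
    """
    return sum(n for op, n in ops if op in ("M", "I", "=", "X"))


if __name__ == "__main__":
    ops, lead, trail = parse_cigar("3S70M2S")
    # The MD walk would start at offset `lead` in the read; the trailing
    # clip is added at the end so that every base of the read is covered.
    print(ops, lead, trail, read_length(ops))
```

Running it prints `[('M', 3), ('M', 70), ('M', 2)] 3 2 75`: the MD only describes the 70 aligned bases, and the leading 3 and trailing 2 clipped bases make up the rest of the 75.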