PICARD on XSEDE 2.2.8
Tools for manipulating high-throughput sequencing (HTS) data and formats
http://broadinstitute.github.io/picard/
Category: Assembly: Assemble_reads
Tool id: picard_xsede

Command blocks

  picard_sortsam (group 1)
    runs if (perl): $run_sortsam
    format (perl): "picard_expanse picard SortSam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=7500000 INPUT=A1.sam OUTPUT=A1.bam.sorted.bam $select_sortorder"

  picard_sortsam_and (group 2)
    runs if (perl): $run_sortsam && $run_markduplicates
    format (perl): "&&"

  picard_sammarkduplicates (group 3)
    runs if (perl): $run_sortsam && $run_markduplicates
    format (perl): "picard_expanse picard MarkDuplicates INPUT=A1.bam.sorted.bam OUTPUT=A1.bam.sorted_marked.bam METRICS_FILE=metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 CREATE_INDEX=true TMP_DIR=/tmp"

  picard_markduplicates (group 3)
    runs if (perl): !$run_sortsam && $run_markduplicates
    format (perl): "picard_expanse picard MarkDuplicates INPUT=A1.bam.sorted.bam OUTPUT=A1.bam.sorted_marked.bam METRICS_FILE=metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 CREATE_INDEX=true TMP_DIR=/tmp"

  picard_readgroups_and (group 4)
    runs if (perl): $run_addreadgroups && $run_markduplicates
    format (perl): "&&"

  picard_addreadgroups (group 5)
    runs if (perl): $run_addreadgroups
    format (perl): "picard_expanse picard AddOrReplaceReadGroups I=A1.bam.sorted_marked.bam O=A1.bam.sorted_marked_readgroups.bam TMP_DIR=tmp SORT_ORDER=coordinate RGID=$sample RGLB=$sample RGPL=illumina RGPU=$sample RGSM=$sample CREATE_INDEX=True VALIDATION_STRINGENCY=LENIENT"

Parameters

  infile -- Input fasta file
    example: A1.sam

  picard_scheduler -- scheduler.conf (group 0)
    format (perl):
      "threads_per_process=1\\n" .
      "node_exclusive=0\\n" .
      "mem=2G\\n" .
      "nodes=1\\n"
  allresults -- all result files
    filenames: *

  runtime -- Maximum Hours to Run (click here for help setting this correctly) (group 1, written to scheduler.conf)
    default: 0.25
    help: Estimate the maximum time your job will need to run. We recommend testing initially with a < 0.5 h run, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
    Once you are sure the configuration is correct, you can then increase the time. The reason is that jobs > 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
    run sooner than jobs configured for the full 168 hours.
    error if (perl) $runtime > 168.0: Maximum Hours to Run must be less than 168
    error if (perl) $runtime < 0.1: Maximum Hours to Run must be greater than 0.1
    error if (perl) $run_addreadgroups && $run_sortsam && !$run_markduplicates: Sorry, you cannot run SortSam and AddReadGroups unless you also run MarkDuplicates
    note if (perl) defined $runtime: The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume 1 x $runtime CPU hours.
    format (perl): "runhours=$value\\n"

  run_sortsam -- Run SortSam command
    default: 1

  select_sortorder -- Sort Order of Output File
    options: unsorted, queryname, coordinate, duplicate, unknown
    default: coordinate
    format (perl): ($value ne $vdef) ? "SO=$value" : ''

  run_addreadgroups -- Run AddReadgroups command
    default: 1

  mark_duplicates -- Mark Duplicates (parameter group)

  run_markduplicates -- Run MarkDuplicates command
    default: 1

  infile2 -- Select Input for MarkDuplicates
    shown if (perl): $run_markduplicates && !$run_sortsam
    example: markdups.bam
    error if (perl) !defined $infile2 && $run_markduplicates && !$run_sortsam: Please enter a file for the MarkDuplicates stage
    note if (perl) $run_markduplicates && !$run_sortsam: To use the MarkDuplicates command without SortSam, the input file must be "coordinate" sorted

  infile3 -- Select Input for Addreadgroups command
    shown if (perl): $run_addreadgroups && !$run_markduplicates && !$run_sortsam
    example: A1.bam.sorted_marked.bam
    error if (perl) !defined $infile3 && $run_addreadgroups && !$run_markduplicates && !$run_sortsam: Please enter a file for the AddReadGroups stage

  outfile -- Outfile Name
    help: This is not the fasta file. This is the sam/bam file.

  sample -- sample name
    help: Name of the sample (e.g. A3)
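The following is a minimal, illustrative Perl sketch of how these pieces could be strung together when all three stages are enabled. It is not the gateway's actual code; it simply reuses the toggle names, command strings, and group order documented above, and the sample name A3 comes from the example given for the sample parameter.

  #!/usr/bin/env perl
  # Illustrative only: mimics concatenating the command blocks above in group
  # order, with "&&" separators emitted only when consecutive stages are enabled.
  use strict;
  use warnings;

  # Stage toggles (all default to 1 in the interface).
  my $run_sortsam        = 1;
  my $run_markduplicates = 1;
  my $run_addreadgroups  = 1;

  # Example sample name from the "sample" parameter (e.g. A3).
  my $sample = 'A3';

  # select_sortorder contributes "SO=<value>" only when it differs from the
  # default of "coordinate", so with the default it is empty.
  my $select_sortorder = '';

  my @blocks;
  push @blocks, "picard_expanse picard SortSam VALIDATION_STRINGENCY=LENIENT "
              . "MAX_RECORDS_IN_RAM=7500000 INPUT=A1.sam OUTPUT=A1.bam.sorted.bam $select_sortorder"
      if $run_sortsam;                                              # group 1
  push @blocks, "&&" if $run_sortsam && $run_markduplicates;        # group 2
  push @blocks, "picard_expanse picard MarkDuplicates INPUT=A1.bam.sorted.bam "
              . "OUTPUT=A1.bam.sorted_marked.bam METRICS_FILE=metrics.txt "
              . "OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 CREATE_INDEX=true TMP_DIR=/tmp"
      if $run_markduplicates;                                       # group 3
  push @blocks, "&&" if $run_addreadgroups && $run_markduplicates;  # group 4
  push @blocks, "picard_expanse picard AddOrReplaceReadGroups I=A1.bam.sorted_marked.bam "
              . "O=A1.bam.sorted_marked_readgroups.bam TMP_DIR=tmp SORT_ORDER=coordinate "
              . "RGID=$sample RGLB=$sample RGPL=illumina RGPU=$sample RGSM=$sample "
              . "CREATE_INDEX=True VALIDATION_STRINGENCY=LENIENT"
      if $run_addreadgroups;                                        # group 5

  # Print the assembled one-line pipeline command.
  print join(" ", @blocks), "\n";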