PICARD on XSEDE 2.3.0
Tools for manipulating high-throughput sequencing (HTS) data and formats
http://broadinstitute.github.io/picard/
Category: Assembly: Assemble_reads
Tool id: picard_xsede

Commands

picard_sortsam (condition): perl $run_sortsam
  perl "picard_comet java -Xmx4g -jar /opt/biotools/picard/picard.jar SortSam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=7500000 INPUT=A1.sam OUTPUT=A1.bam.sorted.bam SO=coordinate"
  order: 1

picard_sortsam_and (condition): perl $run_sortsam && $run_markduplicates
  perl "&&"
  order: 2

picard_markduplicates (condition): perl $run_markduplicates
  perl "picard_comet java -Xmx4g -jar /opt/biotools/picard/picard.jar MarkDuplicates INPUT=A1.bam.sorted.bam OUTPUT=A1.bam.sorted_marked.bam METRICS_FILE=metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 CREATE_INDEX=true TMP_DIR=/tmp"
  order: 3

picard_readgroups_and (condition): perl $run_addreadgroups && $run_markduplicates
  perl "&&"
  order: 4

picard_addreadgroups (condition): perl $run_addreadgroups
  perl "picard_comet java -Xmx4g -jar /opt/biotools/picard/picard.jar AddOrReplaceReadGroups I=A1.bam.sorted_marked.bam O=A1.bam.sorted_marked_readgroups.bam TMP_DIR=tmp SORT_ORDER=coordinate RGID=$sample RGLB=$sample RGPL=illumina RGPU=$sample RGSM=$sample CREATE_INDEX=True VALIDATION_STRINGENCY=LENIENT"
  order: 5

Parameters

infile
  Input SAM file (e.g. A1.sam)

picard_scheduler (scheduler.conf)
  perl "threads_per_process=1\\n" . "node_exclusive=0\\n" . "nodes=1\\n"
  order: 0

allresults: *

runtime (scheduler.conf)
  Maximum Hours to Run (click here for help setting this correctly)
  default: 0.25
  Estimate the maximum time your job will need to run. We recommend testing initially with a run of less than 0.5 h, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours.
  order: 1
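When all three run flags are set, the gateway concatenates the fragments above with "&&" into one three-stage pipeline. The following sketch re-creates that pipeline as a plain shell script; the Picard path is the one used on the XSEDE host and the sample name A1 is a hypothetical example. The commands are printed rather than executed so the sketch can be inspected without Java or Picard installed; pipe the output to sh (or drop the echoes) to run it for real.

```shell
#!/bin/sh
# Sketch of the generated SortSam && MarkDuplicates && AddOrReplaceReadGroups
# pipeline. PICARD and sample are assumptions for local use; adjust as needed.
PICARD=/opt/biotools/picard/picard.jar   # path on the XSEDE host
sample=A1                                # hypothetical "sample" parameter value

sortsam="java -Xmx4g -jar $PICARD SortSam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=7500000 INPUT=${sample}.sam OUTPUT=${sample}.bam.sorted.bam SO=coordinate"
markdup="java -Xmx4g -jar $PICARD MarkDuplicates INPUT=${sample}.bam.sorted.bam OUTPUT=${sample}.bam.sorted_marked.bam METRICS_FILE=metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 CREATE_INDEX=true TMP_DIR=/tmp"
readgrp="java -Xmx4g -jar $PICARD AddOrReplaceReadGroups I=${sample}.bam.sorted_marked.bam O=${sample}.bam.sorted_marked_readgroups.bam TMP_DIR=tmp SORT_ORDER=coordinate RGID=$sample RGLB=$sample RGPL=illumina RGPU=$sample RGSM=$sample CREATE_INDEX=True VALIDATION_STRINGENCY=LENIENT"

# "&&" chaining mirrors picard_sortsam_and / picard_readgroups_and above:
# each stage runs only if the previous one succeeded.
echo "$sortsam" && echo "$markdup" && echo "$readgrp"
```

Note that each stage's INPUT is the previous stage's OUTPUT, which is why the interface only asks for a separate input file (infile2/infile3) when an earlier stage is switched off.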
Validation

  Maximum Hours to Run must be less than 168: perl $runtime > 168.0
  Maximum Hours to Run must be greater than 0.1: perl $runtime < 0.1
  Sorry, you cannot run SortSam and Addreadgroups unless you also run Markduplicates: perl $run_addreadgroups && $run_sortsam && !$run_markduplicates

  The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume 1 x $runtime CPU hours: perl defined $runtime
  perl "runhours=$value\\n"

Run flags

  run_sortsam: Run SortSam command (default: 1)
  run_markduplicates: Run Markduplicates command (default: 1)
  run_addreadgroups: Run Addreadgroups command (default: 1)

Conditional inputs

  infile2: Select Input for Markduplicates
    shown when: perl $run_markduplicates && !$run_sortsam
    example: A1.bam.sorted.bam
    Please enter a file for the Markduplicates stage: perl !defined $infile2 && $run_markduplicates && !$run_sortsam

  infile3: Select Input for Addreadgroups command
    shown when: perl $run_addreadgroups && !$run_markduplicates && !$run_sortsam
    example: A1.bam.sorted_marked.bam
    Please enter a file for the Addreadgroups stage: perl !defined $infile3 && $run_addreadgroups && !$run_markduplicates && !$run_sortsam

  sample: sample name. Name of the sample (e.g. A3)
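The validation rules above are Perl expressions evaluated by the gateway; each fires when its condition is true and reports the associated message. A minimal sketch of the same logic in plain shell (parameter values here are the defaults, chosen for illustration):

```shell
#!/bin/sh
# Re-expression of the gateway's validation rules in shell (the originals are
# Perl). Default parameter values are assumed for illustration.
runtime=0.25
run_sortsam=1; run_markduplicates=1; run_addreadgroups=1

err=""
# perl $runtime > 168.0  ->  "must be less than 168"
awk "BEGIN{exit !($runtime > 168.0)}" && err="Maximum Hours to Run must be less than 168"
# perl $runtime < 0.1    ->  "must be greater than 0.1"
awk "BEGIN{exit !($runtime < 0.1)}"   && err="Maximum Hours to Run must be greater than 0.1"
# perl $run_addreadgroups && $run_sortsam && !$run_markduplicates
if [ "$run_addreadgroups" = 1 ] && [ "$run_sortsam" = 1 ] && [ "$run_markduplicates" = 0 ]; then
  err="Sorry, you cannot run SortSam and Addreadgroups unless you also run Markduplicates"
fi

if [ -z "$err" ]; then echo "configuration OK"; else echo "$err"; fi
```

The dependency rule exists because SortSam feeds Markduplicates, which feeds Addreadgroups: skipping only the middle stage would leave Addreadgroups without its expected input.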