Cutadapt on XSEDE 3.4 Remove adapter sequences from high-throughput sequencing reads Marcel Martin Marcel Martin (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17(1):10-12. http://dx.doi.org/10.14806/ej.17.1.200 Assembly:Assemble_reads cutadapt_xsede cutadapt_invoke perl "cutadapt_expanse" 1 infile_placeholder 97 placeholder.txt infile_notcompressed2 Input fasta file 97 perl !$compressed_input perl "input.fastq" infile_compressed2 97 perl $compressed_input perl "input.fastq.gz" cutadapt_scheduler scheduler.conf perl $num_cores <24 perl "threads_per_process=$num_cores\\n" . "mem=" . (int($num_cores*(248/32))) . "G\\n" . "node_exclusive=0\\n" . "nodes=1\\n" 0 cutadapt_scheduler2 scheduler.conf perl $num_cores == 24 perl "threads_per_process=24\\n" . "mem=" . (int(24*(248/32))) . "G\\n" . "node_exclusive=0\\n" . "nodes=1\\n" 0 allresults * runtime 1 scheduler.conf Maximum Hours to Run 0.25 Estimate the maximum time your job will need to run. We recommend testing first with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs set for more than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may start sooner than jobs configured for the full 168 hours. Maximum Hours to Run must be at most 168 perl $runtime > 168.0 Maximum Hours to Run must be at least 0.1 perl $runtime < 0.1 The job will run on $num_cores processors as configured. If it runs for the entire configured time, it will consume $num_cores x $runtime CPU hours. perl defined $runtime perl "runhours=$value\\n" num_cores How many cores? 3 perl (defined $value) ?
"-j $num_cores":"" 1 The number of cores must be less than or equal to 24 perl $num_cores > 24 paired_ends I have paired-end reads compressed_input My input is compressed infile_compressed Select your first input file (compressed) perl $compressed_input 97 input.fastq.gz Please select your first input file (compressed) perl $compressed_input && !defined $infile_compressed infile_notcompressed Select your first input file (not compressed) perl !$compressed_input 97 input.fastq Please select your first input file (not compressed) perl !$compressed_input && !defined $infile_notcompressed paired_endfilecompr Select your second paired-end reads file (compressed) 99 perl $paired_ends && $compressed_input perl "input2.fastq.gz" input2.fastq.gz Please select your second paired-end reads file (compressed) perl $paired_ends && $compressed_input && !defined $paired_endfilecompr paired_endfilenotcompr Select your second paired-end reads file (not compressed) 99 perl !$compressed_input && $paired_ends perl "input2.fastq" input2.fastq Please select your second paired-end reads file (not compressed) perl $paired_ends && !$compressed_input && !defined $paired_endfilenotcompr specify_3adapter Enter the 3' adapter (-a) 2 perl (defined $value) ? "-a $value":"" Sequence of an adapter ligated to the 3' end (paired data: of the first read). The adapter and subsequent bases are trimmed. If a $ character is appended (anchoring), the adapter is only found if it is a suffix of the read. specify_3adapterpaired Enter the 3' adapter for paired-end reads (-A) 2 perl $paired_ends perl (defined $value) ? "-A $value":"" Sequence of an adapter ligated to the 3' end of the second read. The adapter and subsequent bases are trimmed. If a $ character is appended (anchoring), the adapter is only found if it is a suffix of the read. specify_5adapter Enter the 5' adapter (-g) 2 perl (defined $value) ? "-g $value":"" 5_adapter Sequence of an adapter ligated to the 5' end (paired data: of the first read).
The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a ^ character is prepended (anchoring), the adapter is only found if it is a prefix of the read. specify_5adapterpaired Enter the 5' adapter for paired-end reads (-G) 2 perl $paired_ends perl (defined $value) ? "-G $value":"" 5_adapter Sequence of an adapter ligated to the 5' end of the second read. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a ^ character is prepended (anchoring), the adapter is only found if it is a prefix of the read. specify_anywhichwayadapter Enter an adapter that may be 3' or 5' (-b) 2 perl (defined $value) ? "-b $value":"" 35_adapter Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the first read). Both types of matches as described under -a and -g are allowed. If the first base of the read is part of the match, the behavior is as with -g, otherwise as with -a. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to! specify_anywhichwayadapterpaired Enter an adapter that may be 3' or 5' for paired-end reads (-B) 2 perl $paired_ends perl (defined $value) ? "-B $value":"" 35_adapter Sequence of an adapter that may be ligated to the 5' or 3' end of the second read. Both types of matches as described under -A and -G are allowed. If the first base of the read is part of the match, the behavior is as with -G, otherwise as with -A. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to! specify_maxerrorrate Maximum allowed error rate (-e, --error-rate) 9 perl ($value ne $vdef) ? "-e $specify_maxerrorrate":"" 0.1 Maximum allowed error rate as a value between 0 and 1 (number of errors divided by the length of the matching region). Default: 0.1 (=10%). only_mismatches Allow only mismatches in alignments (--no-indels) 11 perl ($value) ?
"--no-indels":"" specify_removenadapters Remove up to COUNT adapters from each read (-n, --times) 13 perl ($value ne $vdef) ? "-n $specify_removenadapters":"" 1 overlap_minlength Require MINLENGTH overlap between read and adapter (-O, --overlap) 15 perl ($value ne $vdef) ? "-O $overlap_minlength":"" 3 Require an overlap of at least MINLENGTH bases between read and adapter for an adapter to be found (-O, --overlap). Default: 3 interpret_wildcards Interpret IUPAC wildcards in reads (--match-read-wildcards) 17 perl ($value) ? "--match-read-wildcards":"" donot_interpretwildcards Do not interpret IUPAC wildcards in adapters (--no-match-adapter-wildcards) 19 perl ($value) ? "--no-match-adapter-wildcards":"" specify_adapterhandling What to do with found adapters? (--action) 21 trim mask lowercase none perl (defined $value) ? "--action $value":"" trim check_reversecomp Check read AND reverse complement for adapter matches (--rc) 23 perl $paired_ends perl ($value) ? "--rc":"" Check both the read and its reverse complement for adapter matches. If the match is on the reverse-complemented version, output that one. Default: check only the read. additional_readmods Additional read modifications cut_length Remove this many bases from each read (-u) 25 perl (defined $value) ? "-u $cut_length":"" Remove LENGTH bases from each read (first read only if paired). If LENGTH is positive, remove bases from the beginning. If LENGTH is negative, remove bases from the end. Can be used twice if LENGTHs have different signs. This is applied *before* adapter trimming. nextseq_trim Quality cutoff for NextSeq-specific quality trimming (each read; --nextseq-trim) 27 perl (defined $value) ? "--nextseq-trim=$value":"" NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. quality_cutoff Trim low-quality bases from 5' and/or 3' ends (-q [5'CUTOFF,]3'CUTOFF) 27 perl (defined $value) ? "-q $value":"" Trim low-quality bases from the 5' and/or 3' ends of each read before adapter removal. Applied to both reads if data is paired.
If one value is given, only the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second. quality_base Assume that quality values in FASTQ are encoded as ASCII (quality + N) (--quality-base) 27 perl (defined $value) ? "--quality-base $value":"" 33 Assume that quality values in FASTQ are encoded as ASCII (quality + N). This needs to be set to 64 for some old Illumina FASTQ files. Default: 33 specify_length Shorten reads to LENGTH (-l, --length) 29 perl (defined $value) ? "-l $value":"" Shorten reads to LENGTH. Positive values remove bases at the end while negative ones remove bases at the beginning. This and the following modifications are applied after adapter trimming. specify_trimn Trim N's on ends of reads (--trim-n) 29 perl ($value) ? "--trim-n":"" specify_lengthtag Search for TAG followed by a decimal number (--length-tag) 31 perl (defined $value) ? "--length-tag $value":"" Search for TAG followed by a decimal number in the description field of the read. Replace the decimal number with the correct length of the trimmed read. For example, use --length-tag 'length=' to correct fields like 'length=123'. specify_stripsuffix Remove this suffix from read names if present (--strip-suffix) 31 perl (defined $value) ? "--strip-suffix $value":"" specify_addprefix Add this prefix to read names (-x) 31 perl (defined $value) ? "-x $value":"" specify_addsuffix Add this suffix to read names (-y) 31 perl (defined $value) ? "-y $value":"" specify_negtozero Change negative quality values to zero (-z) 31 perl ($value) ? "-z":"" filtering_processedreads Filtering of processed reads specify_discardlength Discard reads shorter than LEN (-m, --minimum-length) perl !$paired_ends 55 perl (defined $value) ? "-m $value":"" specify_maxdiscardlength Discard reads longer than LEN (-M, --maximum-length) 57 perl !$paired_ends perl (defined $value) ? "-M $value":"" specify_maxnbases Discard reads with more than COUNT 'N' bases.
(--max-n) 59 perl !$paired_ends perl (defined $value) ? "--max-n $value":"" Discard reads with more than COUNT 'N' bases. If COUNT is a number between 0 and 1, it is interpreted as a fraction of the read length. discard_maxerrs Discard reads whose expected number of errors exceeds ERRORS (--max-expected-errors) 61 perl !$paired_ends perl (defined $value) ? "--max-expected-errors $value":"" Discard reads whose expected number of errors (computed from quality values) exceeds ERRORS. discard_trimmed Discard reads that contain an adapter (--discard-trimmed) 63 perl !$paired_ends perl ($value) ? "--discard-trimmed":"" Discard reads that contain an adapter. Also use -O to avoid discarding too many randomly matching reads. discard_untrimmed Discard untrimmed reads (--discard-untrimmed) 65 perl !$paired_ends perl ($value) ? "--discard-untrimmed":"" Discard reads that do not contain an adapter. discard_casava Discard reads that did not pass CASAVA filtering (--discard-casava) 67 perl !$paired_ends perl ($value) ? "--discard-casava":"" Discard reads that did not pass CASAVA filtering (the read header contains :Y:). filtering_pairedreads Filtering of paired-end reads specify_paireddiscardlength Discard paired-end reads shorter than LEN (-m, --minimum-length LEN:LEN) perl $paired_ends 55 perl (defined $value) ? "-m $value":"" When trimming paired-end reads, the minimum lengths for R1 and R2 can be specified separately by separating them with a colon (:). If the colon syntax is not used, the same minimum length applies to both reads. Also, one of the values can be omitted to impose no restriction on that read. For example, with -m 17:, the length of R1 must be at least 17, but the length of R2 is ignored.
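The colon syntax for paired-end minimum lengths can be illustrated with a short Perl sketch. The subroutine `min_length_checks` is hypothetical (it is not part of cutadapt or of this interface definition) and assumes a well-formed spec and integer read lengths; it only reports, per read, whether R1 and R2 meet their limits. Whether the pair is then discarded depends on the --pair-filter setting.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of cutadapt or this interface).
# Given a -m value such as "17", "17:20", "17:" or ":20" and the trimmed
# lengths of R1 and R2, return one flag per read: 1 if that read meets its
# minimum length, 0 if it is too short.
sub min_length_checks {
    my ($spec, $len1, $len2) = @_;
    # Colon syntax gives separate limits for R1 and R2; a plain number
    # applies the same limit to both reads. LIMIT -1 keeps empty fields.
    my @limits = ($spec =~ /:/) ? split(/:/, $spec, -1) : ($spec, $spec);
    my @lens   = ($len1, $len2);
    my @ok;
    for my $i (0, 1) {
        # An empty limit (e.g. the R2 side of "17:") imposes no restriction.
        push @ok, ($limits[$i] eq '' || $lens[$i] >= $limits[$i]) ? 1 : 0;
    }
    return @ok;
}

# With -m 17:, a 12-nt R1 is too short, while R2 is not checked at all.
my @checks = min_length_checks('17:', 12, 5);
print "R1 ok: $checks[0], R2 ok: $checks[1]\n";
```

For `-m 17:` with a 12-nt R1, the R1 flag is 0 and the R2 flag is 1 regardless of R2's length.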
specify_pairedmaxlength Discard paired-end reads longer than LEN (-M, --maximum-length LEN:LEN) 57 perl $paired_ends perl (defined $value) ? "-M $value":"" When trimming paired-end reads, the maximum lengths for R1 and R2 can be specified separately by separating them with a colon (:). If the colon syntax is not used, the same maximum length applies to both reads. Also, one of the values can be omitted to impose no restriction on that read. For example, with -M 17:, the length of R1 must be at most 17, but the length of R2 is ignored. specify_pairedreadfilter Which reads in a paired-end read have to match the filtering criterion (--pair-filter) 67 perl $paired_ends any both first perl ($value) ? "--pair-filter=$value":"" Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered. output_options Output options specify_fullreport Print full report (unchecked gives minimal) 81 perl ($value) ? "--report full":"--report minimal" specify_outputfile Write trimmed reads to FILE (-o) 81 perl (defined $value) ? "-o $value":"" Please enter a name for the trimmed reads output file perl !defined $specify_outputfile Write trimmed reads to FILE. FASTQ or FASTA format is chosen depending on input. Summary report is sent to standard output. Use '{name}' for demultiplexing (see docs). Default: write to standard output. specify_pairedoutputfile Write the second read of each trimmed pair to FILE (-p) perl $paired_ends 81 perl (defined $value) ? "-p $value":"" Please enter a name for the second paired-end trimmed reads output file perl $paired_ends && !defined $specify_pairedoutputfile Write the second read of each pair to FILE. FASTQ or FASTA format is chosen depending on input. Use together with -o. specify_fastaout Output FASTA to standard output even on FASTQ input (--fasta) 83 perl ($value) ?
"--fasta":"" specify_compression Use compression level 1 for gzipped output files (-Z) 85 perl ($value) ? "-Z":"" Use compression level 1 for gzipped output files (faster, but produces larger files). specify_infofile Write information about each read and its adapter matches into FILE (--info-file) 87 perl (defined $value) ? "--info-file $value":"" Write information about each read and its adapter matches into FILE. See the documentation for the file format. specify_restfile Write the rest of each read after an adapter match into FILE (-r, --rest-file) 89 perl (defined $value) ? "-r $value":"" When the adapter matches in the middle of a read, write the rest (after the adapter) to FILE. specify_tooshortfile Write reads that are too short into FILE (--too-short-output) 89 perl (defined $value) ? "--too-short-output $value":"" Write reads that are too short (according to length specified by -m) to FILE. Default: discard reads. specify_untooshortpairedreadoutfile Write the second read in a pair to this file if the pair is too short (--too-short-paired-output) 67 perl $paired_ends perl ($value) ? "--too-short-paired-output too-short-paired-output.txt":"" too-short-paired-output.txt Write the second read in a pair to this file if the pair is too short. Use together with --too-short-output. specify_toolongfile Write reads that are too long into FILE (--too-long-output) 89 perl (defined $value) ? "--too-long-output $value":"" Write reads that are too long (according to length specified by -M) to FILE. Default: discard reads. specify_toolongpairedreadoutfile Write the second read in a pair to this file if the pair is too long (--too-long-paired-output) 67 perl $paired_ends perl ($value) ? "--too-long-paired-output toolong_paired_outfile.txt":"" toolong_paired_outfile.txt Write the second read in a pair to this file if the pair is too long. Use together with --too-long-output.
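The three --pair-filter choices (any, both, first) decide when a pair counts as filtered. The following is a hypothetical Perl illustration, not cutadapt's implementation: `pair_is_filtered` takes per-read results for one criterion (for example "too short" under -m) and returns whether the pair as a whole is filtered. Pairs are handled as a unit, so when the single and paired too-short (or too-long) outputs are both set, a filtered pair is written to them together rather than discarded.

```perl
use strict;
use warnings;

# Hypothetical sketch (not cutadapt's implementation) of the three
# --pair-filter modes. $r1_hit and $r2_hit are 1 when that read meets a
# filtering criterion (for example "too short" under -m), else 0.
# Returns 1 if the whole pair is filtered, 0 if it is kept.
sub pair_is_filtered {
    my ($mode, $r1_hit, $r2_hit) = @_;
    my $filtered;
    if    ($mode eq 'any')   { $filtered = $r1_hit || $r2_hit }  # either read is enough
    elsif ($mode eq 'both')  { $filtered = $r1_hit && $r2_hit }  # both reads must match
    elsif ($mode eq 'first') { $filtered = $r1_hit }             # only R1 is considered
    else                     { die "unknown --pair-filter mode: $mode\n" }
    return $filtered ? 1 : 0;
}

# R1 is too short, R2 is long enough.
for my $mode (qw(any both first)) {
    my $verdict = pair_is_filtered($mode, 1, 0) ? 'filtered' : 'kept';
    print "$mode: $verdict\n";
}
```

With R1 too short and R2 long enough, `any` and `first` filter the pair while `both` keeps it.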
specify_untrimmedfile Write reads that do not contain any adapter into FILE (--untrimmed-output) 89 perl (defined $value) ? "--untrimmed-output $value":"" Write reads that do not contain any adapter to FILE. Default: output to the same file as trimmed reads. specify_untrimmedpairedreadoutfile Write the second read of untrimmed pairs into a file (--untrimmed-paired-output) 67 perl $paired_ends perl ($value) ? "--untrimmed-paired-output untrimmed_paired_outfile.txt":"" untrimmed_paired_outfile.txt
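To show how the pieces of this definition fit together, here is a hypothetical Perl sketch of what it generates for a job: the scheduler.conf text (mirroring the cutadapt_scheduler expressions plus the runhours line from the runtime parameter) and a cutadapt command line built from a subset of the parameters. The subroutine names `scheduler_conf` and `build_command`, the option order, and the chosen subset of options are assumptions for illustration; the real ordering presumably follows the numeric group values in the definition.

```perl
use strict;
use warnings;

# Hypothetical sketch: mirrors the scheduler.conf expressions. Both variants
# allot int(cores * 248/32) GB of memory, i.e. 7.75 GB per core, truncated.
sub scheduler_conf {
    my ($num_cores, $runtime) = @_;
    return "threads_per_process=$num_cores\n"
         . "mem=" . int($num_cores * (248 / 32)) . "G\n"
         . "node_exclusive=0\n"
         . "nodes=1\n"
         . "runhours=$runtime\n";
}

# Hypothetical sketch: emulates a few of the format expressions.
# Keys mirror parameter names in the definition; order is an assumption.
sub build_command {
    my (%p) = @_;
    my @args = ('cutadapt');
    push @args, "-j $p{num_cores}" if defined $p{num_cores};
    push @args, "-a $p{specify_3adapter}" if defined $p{specify_3adapter};
    push @args, "-A $p{specify_3adapterpaired}"
        if $p{paired_ends} && defined $p{specify_3adapterpaired};
    # Like the ($value ne $vdef) test, -e is only emitted when it differs
    # (as a string) from the default of 0.1.
    push @args, "-e $p{specify_maxerrorrate}"
        if defined $p{specify_maxerrorrate} && $p{specify_maxerrorrate} ne '0.1';
    push @args, $p{specify_fullreport} ? '--report full' : '--report minimal';
    push @args, "-o $p{specify_outputfile}" if defined $p{specify_outputfile};
    push @args, "-p $p{specify_pairedoutputfile}"
        if $p{paired_ends} && defined $p{specify_pairedoutputfile};
    # Inputs are staged under fixed names that depend on compression.
    my $ext = $p{compressed_input} ? '.fastq.gz' : '.fastq';
    push @args, "input$ext";
    push @args, "input2$ext" if $p{paired_ends};
    return join ' ', @args;
}

print scheduler_conf(3, 0.25);
print build_command(
    num_cores                => 3,
    paired_ends              => 1,
    compressed_input         => 1,
    specify_3adapter         => 'AGATCGGAAGAGC',   # example adapter sequence
    specify_3adapterpaired   => 'AGATCGGAAGAGC',
    specify_outputfile       => 'trimmed.1.fastq.gz',
    specify_pairedoutputfile => 'trimmed.2.fastq.gz',
), "\n";
```

With 3 cores, int(3 * 248/32) = int(23.25) gives mem=23G; with the maximum of 24 cores the same formula gives mem=186G, so the separate 24-core scheduler entry produces the same memory request as the general one.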