ProtHint on ACCESS 2.6.0
ProtHint pipeline for predicting and scoring hints (in the form of introns, start and stop codons) in the genome
Tomas Bruna, Alexandre Lomsadze, Mark Borodovsky
GeneMark-ES: Gene identification in novel eukaryotic genomes by self-training algorithm Nucl Acids Res (2005) 33, 6494-6506
Assembly / Genecalling
prothint_access
infile_fasta Input File (must be in fasta format) perl "genome.fasta" 98 genome.fasta
prothint_invocation perl "prothint_2.6.0_expanse" 0
prothint_scheduler1 scheduler.conf perl $specify_cores < 128 perl "threads_per_process=$specify_cores\\n" . "mem=" . ($specify_cores * 2) . "G\\n" . "node_exclusive=0\\n" . "nodes=1\\n"
prothint_scheduler2 scheduler.conf perl $specify_cores > 64 perl "threads_per_process=$specify_cores\\n" . "mem=243G\\n" . "node_exclusive=1\\n" . "nodes=1\\n"
txt_output Txt output *.txt
targz_output Txt output *.tar.gz
tar_output Txt output *.tar
zip_output Zip output *.zip
jobinfo_output Jobinfo output _JOBINFO.TXT
batch_output Batch output _batch-command.*
runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) perl "runhours=$value\\n" 0.25
Maximum Hours to Run must be 168 or less perl $runtime > 168.0
Maximum Hours to Run must be 0.1 or greater perl $runtime < 0.1
The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 1
The job will run on 2 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 2
The job will run on 4 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 4
The job will run on 8 processors as configured.
If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 8
The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 16
The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 32
The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 64
The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime cpu hours perl $runtime ne 0 && $specify_cores == 128
Estimate the maximum time your job will need to run. We recommend testing initially with a run of less than 0.5 h, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, you can increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours.
specify_cores How many cores? 1 2 4 8 16 32 64 128 perl defined $value ? "--threads $value":"" 30
infile_fasta2 Protein Input File (must be in fasta format) perl defined $value ? "proteins.fasta":"" 99 proteins.fasta
Please select a protein input file perl !defined $infile_fasta2
specify_genemarkfile Specify a genemark.gtf file genemark.gtf perl (defined $specify_genemarkfile) ? "--geneMarkGtf genemark.gtf":"" 30
specify_outdirname Specify the output directory name perl defined $value ? "--workdir $value":"" 30
specify_fungus Run GeneMark-ES in fungus mode perl $specify_fungus ?
"--fungus":"" 30 specify_diamondpairs File with "seed gene-protein" hits generated by DIAMOND perl defined $specify_diamondpairs ? " --diamondPairs diamondpairs.txt":"" diamondpairs.txt 30 When a diamond pairs file is included, the value of nmaxProteinsPerSeed is ignored. perl defined $specify_diamondpairs && defined $specify_maxprotsperseed If this file is provided, DIAMOND search for protein hits is skipped. The seed genes in this file must correspond to seed genespassed by "-*geneSeeds" option. All pairs in the file are used -* option "-*nmaxProteinsPerSeed" is ignored. specify_maxprotsperseed Maximum number of protein hits per seed gene perl defined $specify_maxprotsperseed ? "--maxProteinsPerSeed $value":"" 25 30 Increasing this number leads to increased runtime and may improve the sensitivity of hints. Decreasing has an opposite effect. Default is set to 25. specify_evalue Maximum e-value for DIAMOND alignments hits perl defined $specify_evalue ? "--evalue $value":"" 0.001 30 specify_minexonscore Specify the Minimum Exon Score perl defined $specify_minexonscore ? "--minExonScore $value":"" 25 30 Discard all hints inside/neighboring exons with score lower than this value. Default=25 specify_cleanup Remove intermediate results perl $specify_cleanup ? " --cleanup ":"" 0 30 Cleanup is turned off by default as it is useful to keep these files for troubleshooting and the intermediate results might be useful on their own. specify_prevgeneseeds File with previous GeneSeeds perl defined $specify_prevgeneseeds ? "--prevGeneSeeds prevGeneSeeds.txt":"" prevGeneSeeds.txt 30 File with gene seed used in the previous iteration. Next iteration of ProtHint is only executed for -*geneSeeds which differ from -*prevGeneSeeds. -*prevSpalnGff is required when this option is used since results from the previous iteration are reused for seeds which do not differ (Gene ids of such hints are updated to match the new seed genes). 
specify_revspalngff File with scored hints from previous iteration perl defined $specify_prevgeneseeds perl defined $specify_revspalngff ? " --prevSpalnGff revSpaln.gff":"" revSpaln.gff
To use a file with previous GeneSeeds, you must also include a file with scored hints from the previous iteration perl defined $specify_prevgeneseeds && !defined $specify_revspalngff 30
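
The scheduler.conf logic defined by prothint_scheduler1, prothint_scheduler2 and runtime can be hard to follow as separate Perl fragments. The following is a minimal Python sketch of the same rules, not the portal's actual implementation. The function names `scheduler_conf` and `cpu_hours` are hypothetical, and the order in which the lines are written to scheduler.conf is an assumption; the values themselves (2 GB per core below 128 cores, 243 GB and an exclusive node at 128 cores, a 0.1 to 168 hour range) are taken from the definitions above.

```python
ALLOWED_CORES = (1, 2, 4, 8, 16, 32, 64, 128)  # choices offered by specify_cores


def scheduler_conf(cores: int, run_hours: float) -> str:
    """Return scheduler.conf text following the rules in the tool definition.

    Hypothetical helper: line order is an assumption, values are from the source.
    """
    if cores not in ALLOWED_CORES:
        raise ValueError(f"cores must be one of {ALLOWED_CORES}")
    # Mirrors the two runtime checks ($runtime > 168.0 and $runtime < 0.1).
    if run_hours > 168.0 or run_hours < 0.1:
        raise ValueError("run_hours must be between 0.1 and 168")

    lines = [f"runhours={run_hours}", f"threads_per_process={cores}"]
    if cores < 128:
        # prothint_scheduler1: shared node, 2 GB of memory per core
        lines += [f"mem={cores * 2}G", "node_exclusive=0"]
    else:
        # prothint_scheduler2: whole node, fixed 243 GB
        lines += ["mem=243G", "node_exclusive=1"]
    lines.append("nodes=1")
    return "\n".join(lines) + "\n"


def cpu_hours(cores: int, run_hours: float) -> float:
    """Upper bound on consumed CPU hours: $specify_cores x $runtime."""
    return cores * run_hours


if __name__ == "__main__":
    print(scheduler_conf(8, 0.25), end="")
    print("max cpu hours:", cpu_hours(8, 0.25))
```

For example, an 8-core job with the default 0.25 hour limit requests 16G on a shared node and can consume at most 2 CPU hours, while a 128-core job always requests the full node.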