ProtHint on ACCESS
2.6.0
ProtHint pipeline for predicting and scoring hints (in the form of introns, start and stop codons) in the genome
Tomas Bruna, Alexandre Lomsadze, Mark Borodovsky
GeneMark-ES: Gene identification in novel eukaryotic genomes by self-training algorithm. Nucl Acids Res (2005) 33, 6494-6506
Assembly / Genecalling
prothint_access
infile_fasta Input File (must be in fasta format) perl "genome.fasta" 98 genome.fasta
prothint_invocation perl "prothint_2.6.0_expanse" 0
prothint_scheduler1 scheduler.conf perl $specify_cores < 128 perl
"threads_per_process=$specify_cores\\n" .
"mem=" . ($specify_cores * 2) . "G\\n" .
"node_exclusive=0\\n" .
"nodes=1\\n"
prothint_scheduler2 scheduler.conf perl $specify_cores > 64 perl
"threads_per_process=$specify_cores\\n" .
"mem=243G\\n" .
"node_exclusive=1\\n" .
"nodes=1\\n"
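The two scheduler entries above choose between a shared-node request (fewer than 128 cores, 2 GB of memory per core) and an exclusive whole-node request (the 128-core option, with a fixed 243 GB). The Python sketch below restates that selection so the resulting scheduler.conf lines are easy to check. `scheduler_conf` and `ALLOWED_CORES` are hypothetical names, and the sketch assumes each `\\n` in the Perl format strings ends one line of scheduler.conf; the portal itself evaluates the Perl snippets, not this code.

```python
# Hypothetical restatement of the prothint_scheduler1 / prothint_scheduler2 entries.
ALLOWED_CORES = (1, 2, 4, 8, 16, 32, 64, 128)  # choices offered by specify_cores


def scheduler_conf(specify_cores: int) -> str:
    """Return the scheduler.conf text the matching entry would produce."""
    if specify_cores not in ALLOWED_CORES:
        raise ValueError(f"unsupported core count: {specify_cores}")
    if specify_cores < 128:
        # prothint_scheduler1: shared node, 2 GB of memory per core
        return (
            f"threads_per_process={specify_cores}\n"
            f"mem={specify_cores * 2}G\n"
            "node_exclusive=0\n"
            "nodes=1\n"
        )
    # prothint_scheduler2 ($specify_cores > 64, i.e. only 128 among the allowed
    # values): exclusive node with a fixed memory request
    return (
        f"threads_per_process={specify_cores}\n"
        "mem=243G\n"
        "node_exclusive=1\n"
        "nodes=1\n"
    )


if __name__ == "__main__":
    print(scheduler_conf(16), end="")
```

For 64 cores this gives `mem=128G` on a shared node; only the 128-core choice switches to `node_exclusive=1`. Because the allowed core counts jump from 64 straight to 128, the two preconditions (`< 128` and `> 64`) never match the same value.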
txt_output Txt output *.txt
targz_output Tar.gz output *.tar.gz
tar_output Tar output *.tar
zip_output Zip output *.zip
jobinfo_output Jobinfo output _JOBINFO.TXT
batch_output Batch output _batch-command.*
runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) perl "runhours=$value\\n" 0.25
Maximum Hours to Run must be 168 or less perl $runtime > 168.0
Maximum Hours to Run must be at least 0.1 perl $runtime < 0.1
The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 1
The job will run on 2 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 2
The job will run on 4 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 4
The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 8
The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 16
The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 32
The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 64
The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 128
Estimate the maximum time your job will need to run. 
We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less reliably start immediately in the "debug" queue.
Once you are sure the configuration is correct, increase the time. Jobs set for more than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may start sooner than jobs configured for the full 168 hours.
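The runtime checks and per-processor messages above reduce to a bounds check plus one multiplication: the worst-case charge is `$specify_cores x $runtime` CPU hours. A minimal Python sketch, assuming the bounds implied by the Perl conditions (0.1 to 168 hours, both inclusive) and the 0.5 h debug-queue cut-off from the help text. `validate_runtime`, `cpu_hours`, and `queue_for` are hypothetical helpers; actual queue placement is decided by the cluster scheduler, not by this code.

```python
# Hypothetical helpers mirroring the runtime parameter's checks and help text.
MIN_HOURS = 0.1    # the "$runtime < 0.1" check above
MAX_HOURS = 168.0  # the "$runtime > 168.0" check above
DEBUG_LIMIT = 0.5  # help text: runs of 0.5 h or less go to the "debug" queue


def validate_runtime(runtime: float) -> None:
    """Reject values outside the range the Perl checks allow (bounds inclusive)."""
    if runtime < MIN_HOURS:
        raise ValueError(f"runtime {runtime} h is below {MIN_HOURS} h")
    if runtime > MAX_HOURS:
        raise ValueError(f"runtime {runtime} h is above {MAX_HOURS} h")


def cpu_hours(specify_cores: int, runtime: float) -> float:
    """Worst-case charge: every core busy for the whole configured time."""
    validate_runtime(runtime)
    return specify_cores * runtime


def queue_for(runtime: float) -> str:
    """Queue that the help text says a job of this length is sent to."""
    validate_runtime(runtime)
    return "debug" if runtime <= DEBUG_LIMIT else "normal"


if __name__ == "__main__":
    print(cpu_hours(8, 0.25), queue_for(0.25))
```

For example, 8 cores for 0.25 h is at most 2 CPU hours and qualifies for the debug queue, while 16 cores for 10 h can consume up to 160 CPU hours in the normal queue.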
specify_cores How many cores? 1 2 4 8 16 32 64 128 perl defined $value ? "--threads $value":"" 30
infile_fasta2 Protein Input File (must be in fasta format) perl defined $value ? "proteins.fasta":"" 99 proteins.fasta
Please select a protein input file perl !defined $infile_fasta2
specify_genemarkfile Specify a genemark.gtf file genemark.gtf perl (defined $specify_genemarkfile) ? "--geneMarkGtf genemark.gtf":"" 30
specify_outdirname Specify the output directory name perl defined $value ? "--workdir $value":"" 30
specify_fungus Run GeneMark-ES in fungus mode perl $specify_fungus ? "--fungus":"" 30
specify_diamondpairs File with "seed gene-protein" hits generated by DIAMOND perl defined $specify_diamondpairs ? " --diamondPairs diamondpairs.txt":"" diamondpairs.txt 30
When a DIAMOND pairs file is included, the value of maxProteinsPerSeed is ignored. perl defined $specify_diamondpairs && defined $specify_maxprotsperseed
If this file is provided, the DIAMOND search for protein hits is skipped. The seed genes in this file must correspond to the seed genes passed by the "--geneSeeds" option.
All pairs in the file are used; the "--maxProteinsPerSeed" option is ignored.
specify_maxprotsperseed Maximum number of protein hits per seed gene perl defined $specify_maxprotsperseed ? "--maxProteinsPerSeed $value":"" 25 30
Increasing this number increases runtime and may improve the sensitivity of hints; decreasing it has the opposite effect. Default is 25.
specify_evalue Maximum e-value for DIAMOND alignment hits perl defined $specify_evalue ? "--evalue $value":"" 0.001 30
specify_minexonscore Specify the Minimum Exon Score perl defined $specify_minexonscore ? "--minExonScore $value":"" 25 30
Discard all hints inside or neighboring exons with a score lower than this value. Default is 25.
specify_cleanup Remove intermediate results perl $specify_cleanup ? " --cleanup ":"" 0 30
Cleanup is turned off by default because keeping these files is useful for troubleshooting, and the intermediate results may be useful on their own.
specify_prevgeneseeds File with previous GeneSeeds perl defined $specify_prevgeneseeds ? "--prevGeneSeeds prevGeneSeeds.txt":"" prevGeneSeeds.txt 30
File with gene seeds used in the previous iteration. The next iteration of ProtHint is executed only for --geneSeeds that differ from --prevGeneSeeds. --prevSpalnGff is required when this option is used, since results from the previous iteration are reused for seeds that do not differ (gene IDs of such hints are updated to match the new seed genes).
specify_revspalngff File with scored hints from previous iteration perl defined $specify_prevgeneseeds perl defined $specify_revspalngff ? " --prevSpalnGff revSpaln.gff":"" revSpaln.gff
To use the prevGeneSeeds file, you must also include a file with scored hints from the previous iteration. perl defined $specify_prevgeneseeds && !defined $specify_revspalngff 30
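Each parameter above contributes a fragment of the ProtHint command line only when it is set. The hypothetical Python sketch below assembles those fragments the same way, assuming arguments are ordered by the group numbers in the definitions (0 for the invocation, 30 for options, 98 and 99 for the genome and protein files). `build_command` and its keyword names are illustrative, not part of the portal. Flag spellings come from the definitions above, except that the previous-hints option is written `--prevSpalnGff`, the name used in the prevGeneSeeds help text.

```python
from typing import List, Optional


def build_command(
    proteins: bool,
    threads: Optional[int] = None,
    genemark_gtf: bool = False,
    workdir: Optional[str] = None,
    fungus: bool = False,
    diamond_pairs: bool = False,
    max_proteins_per_seed: Optional[int] = None,
    evalue: Optional[float] = None,
    min_exon_score: Optional[int] = None,
    cleanup: bool = False,
    prev_gene_seeds: bool = False,
    prev_spaln_gff: bool = False,
) -> List[str]:
    """Return the argument list for one hypothetical ProtHint submission."""
    # Both input checks are treated as hard errors in this sketch.
    if not proteins:
        raise ValueError("a protein FASTA file is required")
    if prev_gene_seeds and not prev_spaln_gff:
        raise ValueError("prevGeneSeeds needs the scored hints from the previous iteration")

    cmd = ["prothint_2.6.0_expanse"]  # invocation (group 0)
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if genemark_gtf:
        cmd += ["--geneMarkGtf", "genemark.gtf"]
    if workdir is not None:
        cmd += ["--workdir", workdir]
    if fungus:
        cmd.append("--fungus")
    if diamond_pairs:
        cmd += ["--diamondPairs", "diamondpairs.txt"]
    if max_proteins_per_seed is not None:
        # Passed through as-is; ProtHint ignores it when --diamondPairs is given.
        cmd += ["--maxProteinsPerSeed", str(max_proteins_per_seed)]
    if evalue is not None:
        cmd += ["--evalue", str(evalue)]
    if min_exon_score is not None:
        cmd += ["--minExonScore", str(min_exon_score)]
    if cleanup:
        cmd.append("--cleanup")
    if prev_gene_seeds:
        # The previous-hints file is only offered when prevGeneSeeds is set,
        # and the check above guarantees it was supplied.
        cmd += ["--prevGeneSeeds", "prevGeneSeeds.txt",
                "--prevSpalnGff", "revSpaln.gff"]
    cmd += ["genome.fasta", "proteins.fasta"]  # positional inputs (groups 98, 99)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(proteins=True, threads=16, fungus=True)))
```

Keeping the dependency checks before any arguments are built means an inconsistent submission (prevGeneSeeds without the previous scored hints) fails early instead of producing a command that ProtHint would reject.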