GeneMark on ACCESS 4.371 Novel eukaryotic genomes can be analyzed by the self-training GeneMark-ES. Alexandre Lomsadze, Vardges Ter-Hovhannisyan, Yury O. Chernoff, Mark Borodovsky GeneMark-ES: Gene identification in novel eukaryotic genomes by self-training algorithm Nucl Acids Res (2005) 33, 6494-6506 Assembly / Genecalling genemark_xsede infile_fasta Input File (must be in FASTA format) perl "--sequence infile.fa" 99 infile.fa genemark_invocation perl "genemark_4.371_expanse" 0 genemark_scheduler1 scheduler.conf perl $specify_cores < 128 perl "threads_per_process=$specify_cores\\n" . "mem=" . ($specify_cores * 2) . "G\\n" . "node_exclusive=0\\n" . "nodes=1\\n" genemark_scheduler2 scheduler.conf perl $specify_cores > 64 perl "threads_per_process=$specify_cores\\n" . "mem=243G\\n" . "node_exclusive=1\\n" . "nodes=1\\n" specify_cores2 scheduler.conf perl --cores $specify_cores 3 all_output All output * runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) perl "runhours=$value\\n" 0.25 Maximum Hours to Run must be 168 or less perl $runtime > 168.0 Maximum Hours to Run must be at least 0.1 perl $runtime < 0.1 The job will run on 1 processor as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 1 The job will run on 2 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 2 The job will run on 4 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 4 The job will run on 8 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 8 The job will run on 16 processors as configured. 
If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 16 The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 32 The job will run on 64 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 64 The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume $specify_cores x $runtime CPU hours perl $runtime ne 0 && $specify_cores == 128 Estimate the maximum time your job will need to run. We recommend starting with a test run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for 1 or a few hours may start sooner than jobs configured for the full 168 hours. specify_cores How many cores? 1 2 4 8 16 32 64 128 perl "--cores $value" specify_analysis Which analysis? EP ES ET ES "--ES" ET "--ET RNAseq_alignment.gff" EP "" ES 5 specify_etfile Specify a file with intron coordinates perl $specify_analysis eq "ET" RNAseq_alignment.gff specify_etscore Minimum score of the intron for ET? perl $specify_analysis eq "ET" perl "--etscore $value" 10 5 specify_epfile1 Specify a file with protein database in FASTA format perl $specify_analysis eq "EP" protein_db.fa perl (defined $specify_epfile1) ? "--EP protein_db.fa" : "" specify_epfile2 Specify a file with intron coordinates perl $specify_analysis eq "EP" protein_splice_alignment.gff perl (defined $specify_epfile2) ? "--EP protein_splice_alignment.gff" : "" specify_epscore Minimum score of the intron for EP? 
perl $specify_analysis eq "EP" perl "--ep_score $value" 4 Please specify a value for ep_score perl $specify_analysis eq "EP" && !defined $specify_epscore 5