AnABlast is an algorithm for discovering signals of protein-coding
sequences within genomic regions. You can analyze a short nucleotide
sequence (up to 25 Kb in length, or up to 1 Mb if you upload the
BLAST report). It highlights genomic regions with stacked
non-significant alignments (protomotifs), which may represent
present or ancient protein-coding sequences. This makes it possible
to discover new genes in bacteria or exons in eukaryotic organisms.
AnABlast Input website
You only have to paste
your nucleotide sequence or upload it as a FASTA file, and click
on submit. The execution time depends on the length of the input
sequence: approximately 1 minute per Kb.
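As a rough aid for planning a submission, the run time can be estimated from the sequence length before uploading. The sketch below is only an illustration: the function name estimate_minutes is hypothetical, and it assumes the approximate rate quoted above (1 minute per Kb) with 1 Kb taken as 1000 nucleotides.

```shell
# Hypothetical helper (not part of AnABlast): estimate the run time of a
# submission from a FASTA file, assuming about 1 minute per Kb (1 Kb = 1000 nt).
estimate_minutes() {
    # Skip header lines, strip whitespace, and sum the remaining line lengths.
    awk '!/^>/ { gsub(/[[:space:]]/, ""); n += length($0) }
         END { printf "%d nt, about %.1f min\n", n, n / 1000 }' "$1"
}

# Usage (sequence.fasta is a placeholder file name):
# estimate_minutes sequence.fasta
```

The reported length can also be compared against the 25 Kb limit for pasted or uploaded sequences.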
To obtain AnABlast results in less time, you can optionally
upload a BLAST output file. To do that, you will need to run
BLAST as follows:
gzip -d uniref50.fasta.gz
makeblastdb -in uniref50.fasta -dbtype prot
blastx -db uniref50.fasta -query sequence.fasta -evalue 50000 \
    -outfmt '6 sseqid qseqid qstart qend evalue bitscore qframe' \
    -matrix BLOSUM90 -seg no -max_target_seqs 10000000
The sensitivity score
lets you make more sensitive (though less specific) predictions
by changing it to bit-score=28, or more specific (though less
sensitive) predictions by changing it to bit-score=35.
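To get a feel for how the two thresholds mentioned above change the number of alignments available, you could count the rows of a BLAST tabular file at each cut-off. This is a minimal sketch, not the way AnABlast applies the score internally: the function name count_at_or_above and the file name blast_output.tsv are hypothetical, and it assumes the tab-separated column order of the -outfmt option shown above, where the bit score is the sixth column.

```shell
# Hypothetical helper (not part of AnABlast): count alignments whose bit score
# (column 6 in the -outfmt layout shown above) is at least the given threshold.
count_at_or_above() {
    awk -F '\t' -v min="$2" '$6 + 0 >= min + 0 { n++ } END { print n + 0 }' "$1"
}

# Usage (blast_output.tsv is a placeholder file name):
# count_at_or_above blast_output.tsv 28
# count_at_or_above blast_output.tsv 35
```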
The results are shown in a
genome browser (JBrowse)
and are divided into two sections:
The left block lists the track
names that you can activate or deactivate:
Reference sequence: Track
with the nucleotide sequence.
AnABlast: pile-up
of protomotifs along the query sequence. Protomotifs from the
forward strand are shown in green, and those from the
reverse strand in red. Peaks of piled-up protomotifs commonly
match protein-coding regions.
significant protomotif accumulation (track highlighted in pink).
If you click on a peak you will have:
Primary Data: name, type, score (peak height), position and
Attributes: amino acid sequence, frame, peak ID, sequence ID,
source, number of stop codons inside the peak, nucleotide
sequence and subfeatures.
Subfeatures: functional annotation predicted by the Sma3s annotator,
and the best BLAST hit.
reading frames (longer than five aa) found in peak regions
(highlighted in blue).
ab initio gene predictions using Prodigal (for prokaryotic
sequences) and Augustus (for eukaryotic sequences).
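The pile-up shown in the AnABlast track can be approximated offline from a BLAST tabular file, which may help when inspecting a region before uploading. The sketch below is an illustration under stated assumptions, not AnABlast's own implementation: the function name pileup is hypothetical, it assumes the tab-separated column order of the -outfmt option shown earlier (qstart, qend and qframe in columns 3, 4 and 7), it treats a negative qframe as the reverse strand, and it applies no bit-score filtering or peak detection.

```shell
# Hypothetical helper (not part of AnABlast): count, for every query position,
# how many alignments cover it on the forward and on the reverse strand.
# Output columns: position, forward count, reverse count (tab-separated).
pileup() {
    awk -F '\t' '
        {
            s = $3 + 0; e = $4 + 0
            # Reverse-strand hits are reported with qstart > qend, so swap them.
            if (s > e) { t = s; s = e; e = t }
            if (e > max) max = e
            for (p = s; p <= e; p++) {
                if ($7 + 0 < 0) rev[p]++; else fwd[p]++
            }
        }
        END {
            for (p = 1; p <= max; p++)
                printf "%d\t%d\t%d\n", p, fwd[p] + 0, rev[p] + 0
        }' "$1"
}

# Usage (file names are placeholders):
# pileup blast_output.tsv > pileup.tsv
```

Positions where either count rises well above its surroundings correspond loosely to the peaks described above.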
The right block shows all
the tracks described above and lets you move along the
sequence by dragging the mouse and zoom in or out (so you can
see the nucleotides of the sequence). You can also extract your
sequence of interest by zooming to it, clicking on the
"Reference sequence" header and selecting "save track data".
Rubio A, Casimiro-Soriguer CS, Mier P, Andrade-Navarro MA,
Garzón A, Jimenez J, Pérez-Pulido AJ (2019) AnABlast: Re-searching
for Protein-Coding Sequences in Genomic Regions.
Methods in Molecular Biology (Clifton, N.J.) 1962:207-214.
Jimenez J, Duncan CD, Gallardo M, Mata J, Perez-Pulido AJ (2015)
AnABlast: a new in silico strategy for the genome-wide search of
novel genes and fossil regions. DNA Res. 2015 Oct 21.