Running orthoFind is straightforward. The only mandatory input is one protein sequence in FASTA format,
although more than one protein is also accepted. The sequence can be pasted or uploaded as a file. A sequence in FASTA format
begins with a description line, which includes its accession (AC), gene name, organism, etc. The description is followed by one or more lines of
sequence data. An example of a protein sequence in FASTA format is the following:
>sp|Q16637|SMN_HUMAN Survival motor neuron protein OS=Homo sapiens GN=SMN1 PE=1 SV=1
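As an illustration of how the description line is organised, the following Python sketch splits a UniProt-style header into its fields. It is not part of orthoFind; the function name parse_header and the regular expression are assumptions made for this example, and other header layouts would need a different pattern.

```python
import re

# Hypothetical helper, not part of orthoFind: splits a UniProt-style
# FASTA description line into database, AC, entry name and tagged fields.
HEADER_RE = re.compile(
    r"^>(?P<db>\w+)\|(?P<ac>[^|]+)\|(?P<entry>\S+)\s+(?P<rest>.*)$"
)

def parse_header(line):
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a UniProt-style FASTA header")
    fields = match.groupdict()
    rest = fields.pop("rest")
    # The free-text protein name comes first, followed by KEY=value tags
    # such as OS (organism), GN (gene name), PE and SV.
    parts = re.split(r"\s(?=[A-Z]{2}=)", rest)
    fields["name"] = parts[0]
    for part in parts[1:]:
        key, value = part.split("=", 1)
        fields[key] = value
    return fields

header = (">sp|Q16637|SMN_HUMAN Survival motor neuron protein "
          "OS=Homo sapiens GN=SMN1 PE=1 SV=1")
info = parse_header(header)
print(info["ac"], info["GN"], info["OS"])
```

For the header shown above, this yields the AC Q16637, the gene name SMN1 and the organism Homo sapiens.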
After submitting the sequence, orthoFind checks if it is correct (it must be a protein sequence). If so, the
tool will start its execution. It will look for homologs and orthologs to the initial sequence using the default parameters. The
default parameters are:
Minimum identity required: 53%
Low complexity filter: OFF
Results will be available within 2-10 minutes from the beginning of the execution.
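The input check mentioned above (the sequence must be a protein) can be approximated locally before submitting. The Python sketch below is only a guess at such a check, not the rule orthoFind applies: the letter sets, the 0.9 threshold and the names read_fasta and looks_like_protein are assumptions, and the two sequences are made-up toy data.

```python
# Assumed alphabets: 20 standard amino acids plus common ambiguity codes.
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")
NUCLEOTIDE_LETTERS = set("ACGTUN")

def read_fasta(text):
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def looks_like_protein(seq, max_nucleotide_fraction=0.9):
    """Heuristic: valid protein letters, and not almost entirely A/C/G/T/U/N."""
    if not seq or not set(seq) <= PROTEIN_LETTERS:
        return False
    nucleotide_like = sum(ch in NUCLEOTIDE_LETTERS for ch in seq)
    return nucleotide_like / len(seq) < max_nucleotide_fraction

# Toy input, invented for this example.
toy = ">toy_protein\nMKTLLVAGWE\nRRPQ\n>toy_dna\nACGTACGTAC\n"
results = {name: looks_like_protein(seq) for name, seq in read_fasta(toy)}
print(results)
```

With this heuristic the toy protein passes and the toy nucleotide sequence is rejected, because all of its letters belong to the nucleotide set.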
The initial search for homologs uses the Swiss-Prot database by default. Alternatively, a set of completely sequenced proteomes from
each Kingdom (Animalia, Archaea, Bacteria, Fungi, Plantae) can be selected, or all of them at the same time ("Reference_proteomes"). With the homologs collected in the first search, a second search
can be performed. This second search can be run against a selection of 75 proteomes:
Animalia: Aedes aegypti, Apis mellifera, Bombyx mori, Bos taurus, Branchiostoma floridae, Caenorhabditis elegans,
Callithrix jacchus, Ciona intestinalis, Danio rerio, Daphnia pulex, Drosophila melanogaster, Equus caballus, Gallus gallus,
Gasterosteus aculeatus, Homo sapiens, Ixodes scapularis, Latimeria chalumnae, Macaca mulatta, Mus musculus, Oryctolagus cuniculus,
Oryzias latipes, Pan troglodytes, Pongo abelii, Rattus norvegicus, Tetraodon nigroviridis, Xenopus tropicalis.
Archaea: Cenarchaeum symbiosum, Halobacterium salinarum, Methanothermobacter thermautotrophicus, Pyrococcus furiosus,
Sulfolobus solfataricus, Thermoplasma acidophilum.
Bacteria: Escherichia coli, Agrobacterium tumefaciens, Bacillus subtilis, Bifidobacterium longum, Clostridium botulinum,
Corynebacterium glutamicum, Deinococcus radiodurans, Desulfovibrio vulgaris, Enterococcus faecalis, Flavobacterium psychrophilum,
Haemophilus influenzae, Helicobacter pylori, Lactococcus lactis, Listeria monocytogenes, Mycobacterium tuberculosis, Pseudomonas aeruginosa,
Salmonella typhimurium, Staphylococcus aureus, Streptococcus pneumoniae, Streptomyces coelicolor, Thermus thermophilus, Vibrio cholerae,
Xanthomonas campestris, Yersinia pestis.
Fungi: Ajellomyces capsulata, Candida albicans, Coprinopsis cinerea, Cryptococcus neoformans, Emericella nidulans,
Encephalitozoon cuniculi, Gibberella zeae, Neosartorya fumigata, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe,
Ustilago maydis, Yarrowia lipolytica.
Plantae: Arabidopsis thaliana, Brachypodium distachyon, Chlamydomonas reinhardtii, Vitis vinifera, Oryza sativa subsp japonica,
Alternatively, the search can be performed against a protein set uploaded by the user.
This set will be assumed to represent the complete proteome of a given organism.
Finally, the user can provide a proteome as an EST dataset. This dataset must be a file with transcripts from one organism.
The uploaded files must be in FASTA format and smaller than 250Mb.
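The file constraints above can be checked locally before uploading. The Python sketch below assumes that the limit means 250 × 1024 × 1024 bytes and that a FASTA file is recognised by a first non-empty line starting with ">". Both are assumptions, and check_upload is a hypothetical helper rather than anything orthoFind provides.

```python
import os
import tempfile

# Assumption: the stated limit is read as 250 MiB.
MAX_UPLOAD_BYTES = 250 * 1024 * 1024

def check_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if os.path.getsize(path) >= max_bytes:
        problems.append("file is not smaller than the size limit")
    first = ""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.strip():
                first = line
                break
    if not first.startswith(">"):
        problems.append("first non-empty line is not a FASTA header")
    return problems

def _write_temp(text):
    """Write toy content to a temporary file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False)
    handle.write(text)
    handle.close()
    return handle.name

# Toy files, invented for this example.
good = _write_temp(">toy_entry\nMKTLLVAG\n")
bad = _write_temp("MKTLLVAG\n")
good_problems = check_upload(good)
bad_problems = check_upload(bad)
tiny_limit_problems = check_upload(good, max_bytes=1)
os.remove(good)
os.remove(bad)
print(good_problems, bad_problems, tiny_limit_problems)
```

The check only looks at the file size and the first non-empty line, so it will not catch a malformed record further down the file.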
If a valid email address is entered, a link to the results will be sent there when they are ready.
Some of the running parameters of the tool can be modified, such as:
- The minimum identity required: the minimum identity that a protein found in the search must share with the query to be considered a homolog. A
high value of this parameter usually leads to fewer homologs being found (increasing specificity), whereas reducing it results in a
greater number of them.
- The low complexity filter: it is off by default, but it can be turned on to avoid low complexity regions. Be careful if the
initial sequence contains low complexity regions, since the search for homologs could be affected.
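The effect of the minimum identity parameter can be illustrated with a small Python sketch. It is not orthoFind's implementation: the way identity is computed here (identical columns divided by alignment length, gap columns counted as mismatches), the inclusive comparison, and the names percent_identity and filter_hits are assumptions, and the hits are invented toy values.

```python
DEFAULT_MIN_IDENTITY = 53.0  # default value stated in this section, in percent

def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue.

    Assumed definition: gap columns ('-') count as mismatches and the
    denominator is the full alignment length.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equally long")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

def filter_hits(hits, min_identity=DEFAULT_MIN_IDENTITY):
    """Keep the names of hits whose identity reaches the threshold."""
    return [name for name, identity in hits if identity >= min_identity]

# Toy hits (name, percent identity), invented for this example.
hits = [("hit_A", 91.2), ("hit_B", 60.0), ("hit_C", 48.5)]

default_hits = filter_hits(hits)        # default threshold
strict_hits = filter_hits(hits, 80.0)   # higher threshold, fewer homologs
relaxed_hits = filter_hits(hits, 40.0)  # lower threshold, more homologs
toy_identity = percent_identity("MKT-LV", "MKTALI")
print(default_hits, strict_hits, relaxed_hits, round(toy_identity, 1))
```

Raising the threshold from the default to 80 keeps only hit_A, while lowering it to 40 also admits hit_C, which mirrors the trade-off described in the first item above.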