A BLAST search returns a hit with an E-value of 1e-50. What does this E-value tell you?
A. There is a 1e-50 probability the hit is a true homolog
B. You would expect 1e-50 alignments of this score or better by chance in a database this size
C. The alignment covers 50% of the query sequence
D. The sequences share 50% identity
Answer: B
The E-value (expect value) is the number of alignments with an equal or better score that you would expect to see purely by chance when searching a database of that particular size. An E-value of 1e-50 means such a score would essentially never arise by chance, providing very strong evidence that the similarity reflects true homology. The E-value is not a probability of homology, not a measure of coverage, and not a percent identity.
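To make the difference between an expected count and a probability concrete, here is a minimal Python sketch of the simplified Karlin-Altschul relationship E = m·n·2^(-S'). In this formula S' is the bit score, and m and n are the query and database lengths in residues. The function names are hypothetical. Real BLAST uses effective lengths with edge-effect corrections, so treat the numbers as illustrative. The second function converts E into the probability of at least one chance hit, P = 1 - e^(-E), assuming chance hits follow a Poisson distribution.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring >= bit_score.

    Simplified form E = m * n * 2**(-S'); real BLAST substitutes effective
    (edge-corrected) lengths for m and n, which is omitted here.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance hit this good, assuming Poisson counts.

    Computes 1 - exp(-E); expm1 keeps precision when E is tiny (e.g. 1e-50).
    """
    return -math.expm1(-e)


print(evalue_from_bits(20, 16, 1024))  # 0.015625
print(pvalue_from_evalue(3.0))         # E = 3 expected chance hits, yet P < 1
print(pvalue_from_evalue(1e-50))       # practically equal to E itself
```

E can exceed 1, but P never can. When E is small (say below 0.01) the two are nearly identical, which is why a value like 1e-50 reads almost the same either way. Even so, it still describes chance matches, not the probability of homology.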
Question 2 True / False
BLAST is guaranteed to find the optimal local alignment between a query and every sequence in the database.
True
False
Answer: False
BLAST uses a heuristic seed-and-extend strategy. It first finds short word hits between the query and database sequences: exact word matches in nucleotide searches, and in protein searches, words that score at least a threshold T against the query under the substitution matrix. It then extends these seeds into longer alignments. This makes BLAST fast enough for large databases, but it can miss alignments that lack a sufficiently high-scoring seed, particularly weak homologies between distantly related sequences. The exact Smith-Waterman algorithm guarantees the optimal local alignment, but it is too slow for routine searches of large databases.
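The following toy nucleotide example in Python shows how seed-and-extend works and how it can miss a real similarity. It is a deliberately simplified sketch, not BLAST's implementation. It uses exact k-letter words only, with no protein neighborhood words, no two-hit triggering, and no gapped extension. The scoring values, the X-drop cutoff, the word size k = 11, and all function names are hypothetical choices made for illustration. `smith_waterman` is a plain dynamic-programming version of the exact algorithm with a linear gap penalty, included as the reference.

```python
from typing import Dict, List, Tuple

# Hypothetical scoring scheme for this toy (not BLAST's defaults).
MATCH, MISMATCH, GAP = 2, -3, -5


def word_index(query: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word of the query to its start positions."""
    index: Dict[str, List[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return index


def find_seeds(query: str, subject: str, k: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact k-word is shared."""
    index = word_index(query, k)
    return [(i, j)
            for j in range(len(subject) - k + 1)
            for i in index.get(subject[j:j + k], [])]


def extend_ungapped(query: str, subject: str, i: int, j: int, k: int,
                    xdrop: int = 10) -> int:
    """Extend a seed right and then left without gaps, stopping a direction
    once the running score falls more than `xdrop` below its best."""
    total = k * MATCH  # score of the exact seed itself
    for step in (1, -1):  # +1 extends right, -1 extends left
        qi, sj = (i + k, j + k) if step == 1 else (i - 1, j - 1)
        run = best = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            run += MATCH if query[qi] == subject[sj] else MISMATCH
            best = max(best, run)
            if best - run > xdrop:
                break
            qi += step
            sj += step
        total += best
    return total


def seeded_search(query: str, subject: str, k: int = 11) -> int:
    """Best seed-and-extend score; 0 means no seed was found at all."""
    return max((extend_ungapped(query, subject, i, j, k)
                for i, j in find_seeds(query, subject, k)), default=0)


def smith_waterman(query: str, subject: str) -> int:
    """Exact optimal local alignment score (linear gap penalty), O(m*n)."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for a in query:
        cur = [0]
        for j, b in enumerate(subject, start=1):
            diag = prev[j - 1] + (MATCH if a == b else MISMATCH)
            cell = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


query = "ACGT" * 6
distant = "ACGA" * 6  # every 4th base changed: no shared 11-letter word
print(seeded_search(query, query))     # 48: identical sequences are seeded
print(seeded_search(query, distant))   # 0: no seed, so the hit is missed
print(smith_waterman(query, distant))  # 21: the alignment exists anyway
```

In the second pair, every fourth base differs, so the sequences never share an 11-letter word and the seeded search never starts. The exhaustive Smith-Waterman pass still finds the alignment. The cost is that Smith-Waterman fills an m × n table for every database sequence, which is what makes it too slow for routine large-scale searches.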
Question 3 Short Answer
Why does the E-value of a BLAST hit depend on the size of the database being searched?
Model answer: A larger database contains more sequences, which means more opportunities for random matches to achieve high scores by chance. The E-value scales roughly linearly with database size: the same alignment score will have a higher (worse) E-value in a larger database because the expected number of chance hits increases. This is why E-values from searches against different databases cannot be directly compared without accounting for database size.
This is analogous to multiple testing in statistics. Searching 10 million sequences instead of 10 thousand gives a thousand times more chances for a spurious match, so the threshold for significance must account for that. BLAST's statistical model, based on Karlin-Altschul statistics, formalizes this relationship.
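The scaling can be written out with the standard Karlin-Altschul formula. The worked equations below assume, for illustration only, that both databases have the same average sequence length \bar{L}. Under that assumption, the total database length n is proportional to the number of sequences. K and λ are constants of the scoring system, and m is the query length.

```latex
% Expected number of chance alignments with score >= S
E = K\,m\,n\,e^{-\lambda S}

% Same query, same score S; database grows from n_1 to n_2
\frac{E_2}{E_1} = \frac{n_2}{n_1} = \frac{10^{7}\,\bar{L}}{10^{4}\,\bar{L}} = 10^{3}

% Holding E fixed instead: in bit-score form E = m\,n\,2^{-S'}, so
\Delta S' = \log_2 10^{3} \approx 9.97 \ \text{bits}
```

The same raw score is therefore 1000 times less significant in the larger database. Equivalently, an alignment needs roughly 10 more bits to reach the same E-value.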