Explain the concept of annotation error propagation and why it is a significant problem in genomic databases.
Think about your answer, then reveal below.
Model answer: When a gene is functionally annotated by homology to another gene, any error in the source annotation is transferred to the new gene. As databases grow, these errors are amplified: a single misannotated gene can serve as the basis for annotating hundreds of homologs across genomes, each of which may then serve as evidence for still more annotations. Studies estimate that 5-40% of functional annotations in public databases contain errors, depending on the organism and the specificity of the annotation. The problem is compounded by transitive annotation (A annotated from B, B annotated from C), which links genes through chains in which each additional transfer step lowers reliability.
This is why experimental evidence codes matter and why curated databases like Swiss-Prot (manually reviewed) are more reliable than TrEMBL (automatically annotated). It is also why careful orthology assignment is critical: annotating a gene based on a paralog rather than the true ortholog increases the risk of functional misprediction, since paralogs may have diverged in function after duplication.
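The transitive chains described in the answer can be made concrete with a small sketch. Everything in it is illustrative: the gene names, the `Annotation` record, and the assumption that each homology transfer is independently correct with probability 0.9 are hypothetical choices, not values taken from any database or study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    function: str
    # Gene this label was copied from; None means direct experimental evidence.
    source: Optional[str]


def chain_to_evidence(gene: str, db: Dict[str, Annotation]) -> List[str]:
    """Follow 'copied from' links back to the experimentally characterized gene."""
    chain = [gene]
    seen = {gene}
    while db[chain[-1]].source is not None:
        nxt = db[chain[-1]].source
        if nxt in seen:
            raise ValueError(f"circular annotation chain at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


def chain_reliability(gene: str, db: Dict[str, Annotation], per_step: float = 0.9) -> float:
    """Toy model (assumption): every transfer is independently correct with
    probability per_step, so reliability decays geometrically with chain length."""
    transfers = len(chain_to_evidence(gene, db)) - 1
    return per_step ** transfers


def affected_by(bad_gene: str, db: Dict[str, Annotation]) -> List[str]:
    """Genes whose annotation traces back through bad_gene and would inherit its error."""
    return sorted(
        g for g in db
        if g != bad_gene and bad_gene in chain_to_evidence(g, db)
    )


# Hypothetical toy database: only geneC has experimental evidence.
db = {
    "geneC": Annotation("kinase", None),
    "geneB": Annotation("kinase", "geneC"),
    "geneA": Annotation("kinase", "geneB"),
    "geneD": Annotation("kinase", "geneB"),
    "geneE": Annotation("kinase", "geneC"),
}

print(chain_to_evidence("geneA", db))  # A was annotated from B, B from C
print(chain_reliability("geneA", db))  # two transfers under the toy model
print(affected_by("geneB", db))        # genes that inherit an error in geneB
```

In this toy setup, an error introduced at `geneB` reaches `geneA` and `geneD` but not `geneE`, which was annotated directly from the experimentally characterized gene. Real databases rarely record the full chain, which is part of why such errors are hard to trace.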