In Journal of molecular biology ; h5-index 65.0
Numerous viruses infecting bacteria and archaea encode CRISPR-Cas system inhibitors, known as anti-CRISPR proteins (Acr). The Acrs typically are highly specific for particular CRISPR variants, resulting in remarkable sequence and structural diversity and complicating accurate prediction and identification of Acrs. In addition to their intrinsic interest for understanding the coevolution of defense and counter-defense systems in prokaryotes, Acrs could be natural, potent on-off switches for CRISPR-based biotechnological tools, so their discovery, characterization and application are of major importance. Here we discuss the computational approaches for Acr prediction. Due to the enormous diversity and likely multiple origins of the Acrs, sequence similarity searches are of limited use. However, multiple features of protein and gene organization have been successfully harnessed to this end including small protein size and distinct amino acid compositions of the Acrs, association of acr genes in virus genomes with genes encoding helix-turn-helix proteins that regulate Acr expression (Acr-associated proteins, Aca), and presence of self-targeting CRISPR spacers in bacterial and archaeal genomes containing Acr-encoding proviruses. Productive approaches for Acr prediction also involve genome comparison of closely related viruses, of which one is resistant and the other one is sensitive to a particular CRISPR variant, and "guilt by association" whereby genes adjacent to a homolog of a known Aca are identified as candidate Acrs. The distinctive features of Acrs are employed for Acr prediction both by developing dedicated search algorithms and through machine learning. New approaches will be needed to identify novel types of Acrs that are likely to exist.
Makarova Kira S, I Wolf Yuri, V Koonin Eugene
2023-Mar-01
anti-CRISPR proteins, comparative genomics, guilt-by-association, machine learning, self-targeting