In Bioinformatics (Oxford, England)
SUMMARY : We propose a new spectral framework for reliable training, scalable inference and interpretable explanation of the DNA repair outcome following a Cas9 cutting. Our framework, dubbed CRISPRL and, relies on an unexploited observation about the nature of the repair process: the landscape of the DNA repair is highly sparse in the (Walsh-Hadamard) spectral domain. This observation enables our framework to address key shortcomings that limit the interpretability and scaling of current deep-learning-based DNA repair models. In particular, CRISPRL and reduces the time to compute the full DNA repair landscape from a striking 5230 years to 1 week and the sampling complexity from 1012 to 3 million guide RNAs with only a small loss in accuracy (R2R2 ∼ 0.9). Our proposed framework is based on a divide-and-conquer strategy that uses a fast peeling algorithm to learn the DNA repair models. CRISPRL and captures lower-degree features around the cut site, which enrich for short insertions and deletions as well as higher-degree microhomology patterns that enrich for longer deletions.
AVAILABILITY AND IMPLEMENTATION : The CRISPRL and software is publicly available at https://github.com/UCBASiCS/CRISPRLand.
Aghazadeh Amirali, Ocal Orhan, Ramchandran Kannan