In Methods in molecular biology (Clifton, N.J.)

Machine learning is revolutionizing molecular biology and bioengineering by providing powerful insights and predictions. Massively parallel reporter assays (MPRAs) have emerged as a particularly valuable class of high-throughput techniques to support such algorithms. MPRAs enable the simultaneous characterization of thousands or even millions of genetic constructs and provide the large amounts of data needed to train models. However, while the scale of this approach is impressive, designing effective MPRA experiments is challenging because many factors can be varied and it is difficult to predict how they will affect the quality and quantity of the data obtained. Here, we present a computational tool called FORECAST, which can simulate MPRA experiments based on fluorescence-activated cell sorting and subsequent sequencing (commonly referred to as Flow-seq or Sort-seq experiments) and carry out rigorous statistical estimation of construct performance from this type of experimental data. FORECAST can be used to develop workflows that aid the design of MPRA experiments and to reanalyze existing MPRA data sets.
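
To make the idea of simulating a Flow-seq/Sort-seq experiment concrete, the following is a minimal toy sketch in Python. It is not FORECAST's code or API: the function names (`simulate_sort_seq`, `estimate_means`), the log-normal fluorescence model, the equally spaced sorting gates, and the simple weighted-mean estimator are all assumptions chosen for illustration, and the estimator shown is a naive baseline rather than the statistical estimation method the tool provides.

```python
import numpy as np

def simulate_sort_seq(mu, sigma, bin_edges, cells_per_construct, reads_per_bin, rng):
    """Toy Sort-seq simulation (illustrative assumptions, not FORECAST's model).

    mu, sigma  : per-construct mean and s.d. of log10 fluorescence.
    bin_edges  : interior gate boundaries in log10 fluorescence,
                 giving len(bin_edges) + 1 sorting bins.
    Returns (sorted_cells, reads), each of shape (n_constructs, n_bins).
    """
    n_constructs = len(mu)
    n_bins = len(bin_edges) + 1
    sorted_cells = np.zeros((n_constructs, n_bins), dtype=np.int64)
    for i in range(n_constructs):
        # Draw single-cell log10 fluorescence and assign each cell to a gate.
        log_f = rng.normal(mu[i], sigma[i], size=cells_per_construct)
        bins = np.searchsorted(bin_edges, log_f)
        sorted_cells[i] = np.bincount(bins, minlength=n_bins)

    # Sequencing step: each non-empty bin is sequenced to a fixed depth and
    # reads are split among constructs in proportion to their sorted cells.
    reads = np.zeros_like(sorted_cells)
    for j in range(n_bins):
        total = sorted_cells[:, j].sum()
        if total > 0:
            reads[:, j] = rng.multinomial(reads_per_bin, sorted_cells[:, j] / total)
    return sorted_cells, reads

def estimate_means(reads, sorted_cells, bin_edges):
    """Naive estimate of each construct's mean log10 fluorescence.

    Reads are rescaled to cell numbers using the per-bin sorted cell totals
    (which a sorter reports), then averaged over bin centres. Assumes
    equally spaced bin_edges; outer bins get a centre half a width beyond.
    """
    width = bin_edges[1] - bin_edges[0]
    centres = np.concatenate((
        [bin_edges[0] - width / 2],
        (bin_edges[:-1] + bin_edges[1:]) / 2,
        [bin_edges[-1] + width / 2],
    ))
    bin_cells = sorted_cells.sum(axis=0).astype(float)
    bin_reads = reads.sum(axis=0).astype(float)
    # Cells represented by each read in a bin (0 for bins with no reads).
    scale = np.divide(bin_cells, bin_reads,
                      out=np.zeros_like(bin_cells), where=bin_reads > 0)
    est_cells = reads * scale
    totals = est_cells.sum(axis=1)
    # Constructs with no reads at all get NaN rather than a spurious value.
    return np.where(totals > 0, (est_cells @ centres) / np.maximum(totals, 1e-12), np.nan)

# Example: three hypothetical constructs sorted into eight bins.
rng = np.random.default_rng(0)
mu = np.array([1.7, 2.5, 3.2])
sigma = np.full(3, 0.3)
edges = np.linspace(1.0, 4.0, 7)
cells, reads = simulate_sort_seq(mu, sigma, edges, 5000, 20000, rng)
print(estimate_means(reads, cells, edges))
```

Varying `cells_per_construct`, `reads_per_bin`, or the number of gates in a sketch like this shows how experimental design choices change the accuracy of the recovered means, which is the kind of question such a simulation is meant to explore before running an experiment.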

Gilliot Pierre-Aurelien, Gorochowski Thomas E


Bioinformatics, Cell sorting, Experimental design, Inference, Massively parallel reporter assay, Sequencing, Synthetic biology