In Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing ; h5-index 0.0
Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
Hocking Toby Dylan, Bourque Guillaume