Globally, about 11% of infants each year are born preterm, defined as birth before 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants, we received 148 and 121 submissions for the two prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.

Golob Jonathan L, Oskotsky Tomiko T, Tang Alice S, Roldan Alennie, Chung Verena, Ha Connie W Y, Wong Ronald J, Flynn Kaitlin J, Parraga-Leo Antonio, Wibrand Camilla, Minot Samuel S, Andreoletti Gaia, Kosti Idit, Bletz Julie, Nelson Amber, Gao Jifan, Wei Zhoujingpeng, Chen Guanhua, Tang Zheng-Zheng, Novielli Pierfrancesco, Romano Donato, Pantaleo Ester, Amoroso Nicola, Monaco Alfonso, Vacca Mirco, Angelis Maria De, Bellotti Roberto, Tangaro Sabina, Kuntzleman Abigail, Bigcraft Isaac, Techtmann Stephen, Bae Daehun, Kim Eunyoung, Jeon Jongbum, Joe Soobok, Theis Kevin R, Ng Sherrianne, Lee Li Yun S, Bennett Phillip R, MacIntyre David A, Stolovitzky Gustavo, Lynch Susan V, Albrecht Jake, Gomez-Lopez Nardhy, Romero Roberto, Stevenson David K, Aghaeepour Nima, Tarca Adi L, Costello James C, Sirota Marina