
General

From sonority hierarchy to posterior probability as a measure of lenition: The case of Spanish stops.

In The Journal of the Acoustical Society of America

A deep learning Phonet model was evaluated as a method to measure lenition. In contrast to quantitative acoustic methods, this approach trained recurrent networks to estimate the posterior probabilities of sonorant and continuant phonological features in a corpus of Argentinian Spanish. When applied to intervocalic and post-nasal voiced and voiceless stops, the approach yielded lenition patterns similar to those previously reported, and additional patterns also emerged. The results suggest the validity of the approach as an alternative or addition to quantitative acoustic measures of lenition.
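One way to read a posterior probability as a gradient lenition score is to average the frame-level posterior of a feature over the frames of a segment. The sketch below is only an illustration under that assumption: the function name, the frame indices, and the posterior values are hypothetical, and the abstract does not state how the authors aggregate frames.

```python
def mean_posterior(frame_posteriors, start, end):
    """Average a feature's posterior probability over frames [start, end).

    Assumption: frame_posteriors holds one probability per analysis frame for
    a single phonological feature (e.g. continuant), as a recognizer might
    output. A higher mean continuant posterior on a stop would be read as
    more lenition.
    """
    segment = frame_posteriors[start:end]
    if not segment:
        raise ValueError("segment contains no frames")
    return sum(segment) / len(segment)


# Hypothetical continuant posteriors for a vowel-stop-vowel sequence;
# the stop is assumed to occupy frames 2, 3, and 4.
continuant = [0.9, 0.8, 0.2, 0.1, 0.3, 0.9]
stop_score = mean_posterior(continuant, 2, 5)
```

Under this reading, a fully occluded stop would score near 0 and an approximant-like realization near 1.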

Tang Kevin, Wayland Ratree, Wang Fenqi, Vellozzi Sophia, Sengupta Rahul, Altmann Lori

2023-Feb

Public Health

Precision information extraction for rare disease epidemiology at scale.

In Journal of translational medicine

BACKGROUND : The United Nations recently made a call to address the challenges of an estimated 300 million persons worldwide living with a rare disease through the collection, analysis, and dissemination of disaggregated data. Epidemiologic Information (EI) regarding the prevalence and incidence of rare diseases is sparse, and current paradigms of identifying, extracting, and curating EI rely upon time-intensive, error-prone manual processes. These limitations hamper a clear understanding of the variation in epidemiology and outcomes for rare disease patients. This challenges the public health of rare disease patients through a lack of the information necessary to prioritize research, policy decisions, therapeutic development, and health system allocations.

METHODS : In this study, we developed a newly curated epidemiology corpus for Named Entity Recognition (NER), a deep learning framework, and a novel rare disease epidemiologic information pipeline named EpiPipeline4RD consisting of a web interface and RESTful API. For the corpus creation, we programmatically gathered a representative sample of rare disease epidemiologic abstracts, utilized weakly-supervised machine learning techniques to label the dataset, and manually validated the labeled dataset. For the deep learning framework development, we fine-tuned the BioBERT model on our dataset, adapting it for NER. We measured the performance of our BioBERT model for epidemiology entity recognition quantitatively with precision, recall, and F1 and qualitatively through a comparison with Orphanet. We demonstrated the ability of our pipeline to gather, identify, and extract epidemiology information from rare disease abstracts through three case studies.

RESULTS : We developed a deep learning model to extract EI with overall F1 scores of 0.817 and 0.878, evaluated at the entity level and token level respectively, and achieved qualitative results comparable to Orphanet's collection paradigm. Additionally, case studies of the rare diseases classic homocystinuria, GRACILE syndrome, and phenylketonuria demonstrated adequate recall of abstracts with epidemiology information, high precision of epidemiology information extraction through our deep learning model, and the increased efficiency of EpiPipeline4RD compared to a manual curation paradigm.

CONCLUSIONS : EpiPipeline4RD demonstrated high performance of EI extraction from rare disease literature to augment manual curation processes. This automated information curation paradigm will not only effectively empower development of the NIH Genetic and Rare Diseases Information Center (GARD), but also support the public health of the rare disease community.
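The gap between the entity-level and token-level F1 scores reported above comes from how matches are counted: an entity counts only if its whole span and label are correct, while tokens are scored one by one. The following is a minimal sketch assuming BIO-style tags; the tag names (`LOC`, `EPI`) and helper functions are hypothetical and are not taken from the paper's corpus or code.

```python
def extract_entities(tags):
    """Return the set of (start, end, label) spans in a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf(true_pos, n_pred, n_gold):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def entity_level_scores(gold, pred):
    """An entity is correct only if its boundaries AND label match exactly."""
    g, p = extract_entities(gold), extract_entities(pred)
    return prf(len(g & p), len(p), len(g))


def token_level_scores(gold, pred):
    """Each non-O token is scored on its own."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    return prf(true_pos, n_pred, n_gold)


# Hypothetical example: the second entity is predicted one token too short.
gold = ["B-LOC", "I-LOC", "O", "B-EPI", "I-EPI", "I-EPI", "O"]
pred = ["B-LOC", "I-LOC", "O", "B-EPI", "I-EPI", "O", "O"]
entity_scores = entity_level_scores(gold, pred)  # one of two entities matches
token_scores = token_level_scores(gold, pred)    # four of five gold tokens match
```

In this toy case a single boundary error halves the entity-level F1 while the token-level F1 stays high, which is the usual reason token-level scores run above entity-level ones.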

Kariampuzha William Z, Alyea Gioconda, Qu Sue, Sanjak Jaleal, Mathé Ewy, Sid Eric, Chatelaine Haley, Yadaw Arjun, Xu Yanji, Zhu Qian

2023-Feb-28

General

Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset.

In Scientific reports ; h5-index 158.0

The majority of early prediction scores and methods to predict COVID-19 mortality are bound by methodological flaws and technological limitations (e.g., the use of a single prediction model). Our aim is to provide a thorough comparative study that tackles those methodological issues, considering multiple techniques to build mortality prediction models, including modern machine learning (neural) algorithms and traditional statistical techniques, as well as meta-learning (ensemble) approaches. This study used a dataset from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted from March 2020 to November 2021 [median age 60 (interquartile range 48-71), 46% women]. We also proposed new population-based meta-features that had not previously been devised in the literature. Stacking was shown to achieve the best results reported in the literature for the death prediction task, improving over the previous state of the art by more than 46% in Recall for predicting death, with an AUROC of 0.826 and a MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but fell short of producing large improvements in final prediction performance, suggesting that we are possibly at the limits of the prediction capabilities achievable with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform on different hospitals, showing that there are indeed large differences in classifier performance between hospitals, which further supports the case that errors are produced by factors that cannot be modeled with the current predictors.
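Stacking, the meta-learning approach highlighted above, trains a second-level model on the predictions of first-level models, using out-of-fold predictions so the meta-learner never sees a base model's output on its own training rows. The sketch below illustrates the idea in plain Python under strong simplifying assumptions: the base learners are one-feature soft thresholds, the meta-learner is logistic regression, and the data are synthetic. None of these choices, names, or values come from the study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, ys):
    """Base learner on ONE feature: soft threshold halfway between class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mu_pos + mu_neg) / 2.0
    sign = 1.0 if mu_pos >= mu_neg else -1.0
    return lambda x: sigmoid(sign * (x - threshold))


def fit_logistic(rows, ys, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(rows, ys):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wi - lr * gj / n for wi, gj in zip(w, grad_w)]
    return lambda row: sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


def fit_stacking(X, y, k=5):
    """Build out-of-fold base predictions, then fit the meta-learner on them."""
    n, d = len(X), len(X[0])
    meta_rows = [[0.0] * d for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for j in range(d):
            stump = fit_stump([X[i][j] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):  # rows held out of this fold's training
                meta_rows[i][j] = stump(X[i][j])
    meta = fit_logistic(meta_rows, y)
    # Refit each base learner on all rows for use at prediction time.
    bases = [fit_stump([row[j] for row in X], y) for j in range(d)]
    return lambda row: meta([bases[j](row[j]) for j in range(d)])


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data (NOT the study's cohort): two noisy predictors."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if x0 + x1 + rng.gauss(0, 0.5) > 0 else 0 for x0, x1 in X]
    return X, y
```

A call such as `model = fit_stacking(*make_synthetic())` returns a function that maps a feature row to a predicted probability; in practice the base learners would be full models of different families rather than single-feature thresholds.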

de Paiva Bruno Barbosa Miranda, Pereira Polianna Delfino, de Andrade Claudio Moisés Valiense, Gomes Virginia Mara Reis, Souza-Silva Maira Viana Rego, Martins Karina Paula Medeiros Prado, Sales Thaís Lorenna Souza, de Carvalho Rafael Lima Rodrigues, Pires Magda Carvalho, Ramos Lucas Emanuel Ferreira, Silva Rafael Tavares, de Freitas Martins Vieira Alessandra, Nunes Aline Gabrielle Sousa, de Oliveira Jorge Alzira, de Oliveira Maurílio Amanda, Scotton Ana Luiza Bahia Alves, da Silva Carla Thais Candida Alves, Cimini Christiane Corrêa Rodrigues, Ponce Daniela, Pereira Elayne Crestani, Manenti Euler Roberto Fernandes, Rodrigues Fernanda d’Athayde, Anschau Fernando, Botoni Fernando Antônio, Bartolazzi Frederico, Grizende Genna Maira Santos, Noal Helena Carolina, Duani Helena, Gomes Isabela Moraes, Costa Jamille Hemétrio Salles Martins, di Sabatino Santos Guimarães Júlia, Tupinambás Julia Teixeira, Rugolo Juliana Machado, Batista Joanna d’Arc Lyra, de Alvarenga Joice Coutinho, Chatkin José Miguel, Ruschel Karen Brasil, Zandoná Liege Barella, Pinheiro Lílian Santos, Menezes Luanna Silva Monteiro, de Oliveira Lucas Moyses Carvalho, Kopittke Luciane, Assis Luisa Argolo, Marques Luiza Margoto, Raposo Magda Cesar, Floriani Maiara Anschau, Bicalho Maria Aparecida Camargos, Nogueira Matheus Carvalho Alves, de Oliveira Neimy Ramos, Ziegelmann Patricia Klarmann, Paraiso Pedro Gibson, de Lima Martelli Petrônio José, Senger Roberta, Menezes Rochele Mosmann, Francisco Saionara Cristina, Araújo Silvia Ferreira, Kurtz Tatiana, Fereguetti Tatiani Oliveira, de Oliveira Thainara Conceição, Ribeiro Yara Cristina Neves Marques Barbosa, Ramires Yuri Carlotto, Lima Maria Clara Pontello Barbosa, Carneiro Marcelo, Bezerra Adriana Falangola Benjamin, Schwarzbold Alexandre Vargas, de Moura Costa André Soares, Farace Barbara Lopes, Silveira Daniel Vitorio, de Almeida Cenci Evelin Paola, Lucas Fernanda Barbosa, Aranha Fernando Graça, Bastos Gisele Alsina Nader, Vietta Giovanna Grunewald, Nascimento Guilherme Fagundes, Vianna Heloisa Reniers, Guimarães Henrique Cerqueira, de Morais Julia Drumond Parreiras, Moreira Leila Beltrami, de Oliveira Leonardo Seixas, de Deus Sousa Lucas, de Souza Viana Luciano, de Souza Cabral Máderson Alvares, Ferreira Maria Angélica Pires, de Godoy Mariana Frizzo, de Figueiredo Meire Pereira, Guimarães-Junior Milton Henriques, de Paula de Sordi Mônica Aparecida, da Cunha Severino Sampaio Natália, Assaf Pedro Ledic, Lutkmeier Raquel, Valacio Reginaldo Aparecido, Finger Renan Goulart, de Freitas Rufino, Guimarães Silvana Mangeon Meirelles, Oliveira Talita Fischer, Diniz Thulio Henrique Oliveira, Gonçalves Marcos André, Marcolino Milena Soriano

2023-Mar-01

Pathology

Cluster-Guided Semi-Supervised Domain Adaptation for Imbalanced Medical Image Classification

ArXiv Preprint

Semi-supervised domain adaptation is a technique to build a classifier for a target domain by modifying a classifier trained in another (source) domain, using many unlabeled samples and a small number of labeled samples from the target domain. In this paper, we develop a semi-supervised domain adaptation method that is robust to class imbalance, which is common in medical image classification tasks. For robustness, we propose a weakly-supervised clustering pipeline to obtain high-purity clusters and utilize the clusters in representation learning for domain adaptation. The proposed method showed state-of-the-art performance in an experiment using severely class-imbalanced pathological image patches.
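Cluster purity, the property the clustering pipeline above aims for, is conventionally measured as the fraction of labeled samples that carry the majority label of their cluster. The sketch below computes that standard metric while skipping unlabeled samples, as a semi-supervised setting requires; the function name, cluster ids, and class labels are hypothetical, and the abstract does not say how the authors themselves assess purity.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Fraction of labeled samples that match their cluster's majority label.

    Unlabeled samples are passed as None and ignored, since only a small
    number of target-domain samples are assumed to have labels.
    """
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        if label is not None:
            per_cluster[cluster][label] += 1
    total = sum(sum(counts.values()) for counts in per_cluster.values())
    if total == 0:
        raise ValueError("no labeled samples to evaluate")
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / total


# Hypothetical patches: cluster 0 is pure, cluster 1 mixes two classes.
clusters = [0, 0, 0, 1, 1, 1, 1]
labels = ["tumor", "tumor", None, "normal", "normal", "tumor", None]
purity = cluster_purity(clusters, labels)
```

Under severe class imbalance a single overall purity value can hide impure minority-class clusters, so inspecting the per-cluster counts alongside it is a reasonable precaution.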

Shota Harada, Ryoma Bise, Kengo Araki, Akihiko Yoshizawa, Kazuhiro Terada, Mariyo Kurata, Naoki Nakajima, Hiroyuki Abe, Tetsuo Ushiku, Seiichi Uchida

2023-03-02

General

BOTAN: BOnd TArgeting Network for prediction of slow glassy dynamics by machine learning relative motion.

In The Journal of chemical physics

Recent developments in machine learning have enabled accurate predictions of the dynamics of slow structural relaxation in glass-forming systems. However, existing machine learning models for these tasks are mostly designed to learn a single dynamic quantity and relate it to the structural features of glassy liquids. In this study, we propose a graph neural network model, "BOnd TArgeting Network," that learns the relative motion between neighboring pairs of particles in addition to the self-motion of particles. By relating the structural features to these two different dynamical variables, the model autonomously acquires the ability to discern how the self-motion of particles undergoing slow relaxation is affected by different dynamical processes (strain fluctuations and particle rearrangements), and thus can predict with high precision how slow structural relaxation develops in space and time.
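The two dynamical variables described above can be illustrated with a small computation: a per-particle displacement (self-motion) and, for each pair of initially neighboring particles, the change in their separation (relative motion). This is only a sketch of plausible training targets; the cutoff value, the function names, and the omission of periodic boundary conditions are assumptions, not details from the paper.

```python
import math
from itertools import combinations


def self_displacements(pos0, pos1):
    """Distance each particle moved between the initial and later snapshot."""
    return [math.dist(a, b) for a, b in zip(pos0, pos1)]


def neighbor_pairs(pos0, cutoff):
    """Pairs (i, j), i < j, closer than `cutoff` in the initial snapshot."""
    return [
        (i, j)
        for i, j in combinations(range(len(pos0)), 2)
        if math.dist(pos0[i], pos0[j]) < cutoff
    ]


def pair_distance_changes(pos0, pos1, cutoff):
    """Change in separation for every initially neighboring pair.

    Assumption: no periodic boundaries, which a real simulation box would need.
    """
    return {
        (i, j): math.dist(pos1[i], pos1[j]) - math.dist(pos0[i], pos0[j])
        for i, j in neighbor_pairs(pos0, cutoff)
    }


# Hypothetical 2D snapshots: only particle 1 moves, away from particle 0.
before = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
after = [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
node_targets = self_displacements(before, after)
edge_targets = pair_distance_changes(before, after, cutoff=1.5)
```

In a graph neural network these would naturally sit on nodes and edges respectively, which is one reason a pairwise target fits that architecture.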

Shiba Hayato, Hanai Masatoshi, Suzumura Toyotaro, Shimokawabe Takashi

2023-Feb-28

General

A flexible proton beam imaging energy spectrometer (PROBIES) for high repetition rate or single-shot high energy density (HED) experiments (invited).

In The Review of scientific instruments

The PROBIES diagnostic is a new, highly flexible imaging and energy spectrometer designed for laser-accelerated protons. The diagnostic can detect low-mode spatial variations in the proton beam profile while resolving multiple energies on one or more detectors. When a radiochromic film stack is employed for "single-shot mode," the energy resolution of the stack can be greatly increased while reducing the need for large numbers of films; for example, a recently deployed version allowed for 180 unique energy measurements spanning ∼3 to 75 MeV with <0.4 MeV resolution using just 20 films vs 180 for a comparable traditional film and filter stack. When utilized with a scintillator, the diagnostic can be run in high-rep-rate (>Hz rate) mode to recover nine proton energy bins. We also demonstrate a deep learning-based method that analyzes synthetic PROBIES images with greater than 95% accuracy on sub-millisecond timescales and that, once retrained with experimental data, analyzes real-world images on sub-millisecond timescales with comparable accuracy.

Mariscal D A, Djordjević B Z, Anirudh R, Bremer T, Campbell P C, Feister S, Folsom E, Grace E S, Hollinger R, Jacobs S A, Kailkhura B, Kalantar D, Kemp A J, Kim J, Kur E, Liu S, Ludwig J, Morrison J, Nedbailo R, Ose N, Park J, Rocca J J, Scott G G, Simpson R A, Song H, Spears B, Sullivan B, Swanson K K, Thiagarajan J, Wang S, Williams G J, Wilks S C, Wyatt M, Van Essen B, Zacharias R, Zeraouli G, Zhang J, Ma T

2023-Feb-01