
General

Image-Based Artificial Intelligence Methods for Product Control of Tablet Coating Quality.

In Pharmaceutics

Mimicking the human decision-making process is challenging. In particular, many process control situations during the manufacturing of pharmaceuticals are based on visual observations and related experience-based actions. The aim of the present work was to investigate the use of image analysis to classify the quality of coated tablets. Tablets with an increasing amount of coating solution were imaged by fast scanning using a conventional office scanner. A segmentation routine was applied to the images, allowing the extraction of numeric image-based information from individual tablets. Image preprocessing was performed before four different classification techniques were applied to the individual tablet images. The support vector machine (SVM) technique outperformed a convolutional neural network (CNN) in computational time, and was also slightly better at classifying the tablets correctly. The fastest multivariate method was partial least squares (PLS) regression, but this method was hampered by inferior classification accuracy. Finally, it was possible to create a numerical threshold classification model with an accuracy comparable to the SVM approach, so multiple valid options evidently exist for classifying coated tablets.

Hirschberg Cosima, Edinger Magnus, Holmfred Else, Rantanen Jukka, Boetker Johan


artificial intelligence, image analysis, in silico modelling, multivariate analysis, neural networks

General

Screening for obstructive sleep apnea with novel hybrid acoustic smartphone app technology.

In Journal of thoracic disease ; h5-index 52.0

Background : Obstructive sleep apnea (OSA) has a high prevalence, with an estimated 425 million adults having an apnea-hypopnea index (AHI) of ≥15 events/hour, and is significantly underdiagnosed. This presents a significant pain point for both sufferers and healthcare systems, particularly in a post-COVID-19 pandemic world. As such, it presents an opportunity for new technologies that can enable screening in both developing and developed countries. In this work, the performance of a non-contact OSA screener app that can run on both Apple and Android smartphones is presented.

Methods : The subtle breathing patterns of a person in bed can be measured via a smartphone using the "Firefly" app technology platform [and underpinning software development kit (SDK)], which utilizes advanced digital signal processing (DSP) technology and artificial intelligence (AI) algorithms to identify detailed sleep stages, respiration rate, snoring, and OSA patterns. The smartphone is simply placed adjacent to the subject, such as on a bedside table, night stand or shelf, during the sleep session. The system was trained on a set of 128 overnight recordings made at a sleep laboratory, where volunteers underwent simultaneous full polysomnography (PSG) and "Firefly" smartphone app analysis. A separate independent test set of 120 recordings was collected across a range of Apple iOS and Android smartphones, and withheld for performance evaluation by a different team. An operating point tuned for mid-sensitivity (i.e., balancing sensitivity and specificity) was chosen for the screener.

Results : The performance on the test set is comparable to ambulatory OSA screeners, and other smartphone screening apps, with a sensitivity of 88.3% and specificity of 80.0% [with receiver operating characteristic (ROC) area under the curve (AUC) of 0.92], for a clinical threshold for the AHI of ≥15 events/hour of detected sleep time.
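For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity and specificity can be computed by comparing a screener's binary decisions with a PSG-derived reference AHI at the ≥15 events/hour threshold. The function name and the toy values are illustrative assumptions and are not data from the study.

```python
def sensitivity_specificity(reference_ahi, screener_positive, threshold=15.0):
    """Score binary screener decisions against a reference AHI threshold."""
    tp = fn = tn = fp = 0
    for ahi, predicted in zip(reference_ahi, screener_positive):
        actual = ahi >= threshold  # reference-standard positive
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # fraction of true OSA cases flagged
    specificity = tn / (tn + fp)  # fraction of non-cases correctly cleared
    return sensitivity, specificity


# Hypothetical reference AHI values and screener outputs (not study data).
reference_ahi = [3.2, 22.0, 15.0, 8.9, 41.5, 12.0, 30.1, 1.0]
screener_positive = [False, True, True, True, True, False, False, False]

sens, spec = sensitivity_specificity(reference_ahi, screener_positive)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Moving the operating point trades one metric against the other, which is why the abstract reports the ROC AUC alongside a single sensitivity/specificity pair.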

Conclusions : The "Firefly" app-based sensing technology offers the potential to significantly lower the barrier to entry for OSA screening, as no hardware (other than the user's personal smartphone) is required. Additionally, multi-night analysis is possible in the home environment, without requiring the wearing of a portable PSG or other home sleep test (HST).

Tiron Roxana, Lyon Graeme, Kilroy Hannah, Osman Ahmed, Kelly Nicola, O’Mahony Niall, Lopes Cesar, Coffey Sam, McMahon Stephen, Wren Michael, Conway Kieran, Fox Niall, Costello John, Shouldice Redmond, Lederer Katharina, Fietze Ingo, Penzel Thomas


Sleep-disordered breathing (SDB), apnea hypopnea index (AHI), obstructive sleep apnea (OSA), screening, smartphone

General

An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter.

In Data in brief

Despite the prevalence in the United States of miscarriage [1], stillbirth [2], and infant mortality associated with preterm birth and low birthweight [3], their causes remain largely unknown [4], [5], [6]. To advance the use of social media data as a complementary resource for epidemiology of adverse pregnancy outcomes, we present a data set of 6487 tweets that mention miscarriage, stillbirth, preterm birth or premature labor, low birthweight, neonatal intensive care, or fetal/infant loss in general. These tweets are a subset of 22,912 tweets retrieved by applying hand-written regular expressions to a database containing more than 400 million public tweets posted by more than 100,000 women who have announced their pregnancy on Twitter [7]. Two professional annotators labeled the 6487 tweets in a binary fashion, distinguishing those potentially reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). The tweets annotated as "outcome" include 1318 women reporting miscarriage, 94 stillbirth, 591 preterm birth or premature labor, 171 low birthweight, 453 neonatal intensive care, and 356 fetal/infant loss in general. These "outcome" tweets can be used to explore patient experiences and perceptions of adverse pregnancy outcomes, and can direct researchers to the users' broader timelines (tweets posted by a user over time) for observational studies. Our past work demonstrates the analysis of timelines for selecting a study population [8] and conducting a case-control study [9] of users reporting that their child has a birth defect. For larger-scale studies, the full annotated corpus can be used to train supervised machine learning algorithms to automatically identify additional users reporting adverse pregnancy outcomes on Twitter.
We used the annotated corpus to train feature-engineered and deep learning-based classifiers presented in "A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes" [10].
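The inter-annotator agreement statistic mentioned above can be illustrated with a minimal sketch of Cohen's kappa for two annotators. The function name and the ten toy labels are assumptions made for illustration; they are not drawn from the annotated corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both annotators use a single identical label.
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten tweets (not taken from the data set).
annotator_1 = ["outcome"] * 4 + ["non-outcome"] * 6
annotator_2 = ["outcome"] * 3 + ["non-outcome"] * 7

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (here "non-outcome") dominates.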

Klein Ari Z, Gonzalez-Hernandez Graciela


Data mining, Epidemiology, Machine learning, Natural language processing, Pregnancy, Social media

Ophthalmology

Characterization of the retinal vasculature in fundus photos using the PanOptic iExaminer system.

In Eye and vision (London, England)

Background : The goal was to characterize retinal vasculature by quantitative analysis of arteriole-to-venule (A/V) ratio and vessel density in fundus photos taken with the PanOptic iExaminer System.

Methods : The PanOptic ophthalmoscope equipped with a smartphone was used to acquire fundus photos centered on the optic nerve head. Two fundus photos were taken of each eye, for a total of 19 eyes from 10 subjects. Retinal vessels were analyzed to obtain the A/V ratio. In addition, the vessel tree was extracted using a deep learning U-Net, and vessel density was calculated as the percentage of vessel pixels over the entire image.
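The vessel density measure described above (vessel pixels as a percentage of all image pixels) can be sketched as follows, assuming the segmentation step has already produced a binary mask. The function name and the toy mask are hypothetical and stand in for real U-Net output.

```python
def vessel_density_percent(mask):
    """Percentage of pixels marked as vessel in a binary segmentation mask.

    `mask` is a 2D list of 0/1 values, where 1 marks a vessel pixel.
    """
    total_pixels = sum(len(row) for row in mask)
    if total_pixels == 0:
        raise ValueError("mask must contain at least one pixel")
    vessel_pixels = sum(1 for row in mask for pixel in row if pixel)
    return 100.0 * vessel_pixels / total_pixels


# Hypothetical 4x5 mask standing in for a segmented fundus photo.
toy_mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
]

density = vessel_density_percent(toy_mask)
print(f"vessel density = {density:.1f}%")
```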

Results : All images were successfully processed for the A/V ratio and vessel density. There was no significant difference in the average A/V ratio between the first (0.77 ± 0.09) and second (0.77 ± 0.10) measurements (P = 0.53). There was no significant difference in the average vessel density (%) between the first (6.11 ± 1.39) and second (6.12 ± 1.40) measurements (P = 0.85).

Conclusions : Quantitative analysis of the retinal vasculature was feasible in fundus photos taken using the PanOptic ophthalmoscope. The device appears to provide sufficient image quality for analyzing A/V ratio and vessel density, with the benefits of portability, easy data transfer, and low cost, and could be used for pre-clinical screening of systemic, cerebral and ocular diseases.

Hu Huiling, Wei Haicheng, Xiao Mingxia, Jiang Liqiong, Wang Huijuan, Jiang Hong, Rundek Tatjana, Wang Jianhua


Arteriovenous ratio, Deep learning, Image analysis, Retina, Smartphone ophthalmoscope, Vessel density

Public Health

Applying machine learning on health record data from general practitioners to predict suicidality.

In Internet interventions

Background : Suicidal behaviour is difficult to detect in general practice. Machine learning (ML) algorithms using routinely collected data might support General Practitioners (GPs) in the detection of suicidal behaviour. In this paper, we applied machine learning techniques to support GPs in recognizing suicidal behaviour in primary care patients using routinely collected general practice data.

Methods : This case-control study used data from a nationally representative primary care database including over 1.5 million patients (Nivel Primary Care Database). Patients with a suicide (attempt) in 2017 were selected as cases (N = 574) and an at-risk control group (N = 207,308) was selected from patients with psychological vulnerability but without a suicide attempt in 2017. RandomForest was trained on a small subsample of the data (training set), and evaluated on unseen data (test set).

Results : Almost two-thirds (65%) of the cases visited their GP within the last 30 days before the suicide (attempt). RandomForest showed a positive predictive value (PPV) of 0.05 (0.04-0.06), with a sensitivity of 0.39 (0.32-0.47) and area under the curve (AUC) of 0.85 (0.81-0.88). Almost all controls were accurately labeled as controls (specificity = 0.98 (0.97-0.98)). Among a sample of 650 at-risk primary care patients, the algorithm would label 20 patients as high-risk. Of those, one would be an actual case, and one additional case would be missed.
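The low PPV despite a high specificity follows from the rarity of cases. A minimal sketch of the relationship, using Bayes' rule, is given below. It assumes that the case prevalence in the evaluated data matches the overall ratio of cases to controls reported in the Methods, which the abstract does not state; the function name is also an illustrative choice.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from sensitivity, specificity and prevalence via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Counts from the abstract; treating their ratio as the test-set
# prevalence is an assumption made for this illustration.
cases, controls = 574, 207_308
prevalence = cases / (cases + controls)

ppv = positive_predictive_value(sensitivity=0.39, specificity=0.98,
                                prevalence=prevalence)
print(f"prevalence = {prevalence:.4f}, PPV = {ppv:.2f}")
```

Under that assumption the sketch gives a PPV of about 0.05, in line with the reported value: with fewer than 3 cases per 1000 at-risk patients, even a 2% false-positive rate produces far more false alarms than true detections.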

Conclusion : In this study, we applied machine learning to predict suicidal behaviour using general practice data. Our results showed that these techniques can be used as a complementary step in the identification and stratification of patients at risk of suicidal behaviour. The results are encouraging and provide a first step toward using automated screening directly in clinical practice. Additional data from different social domains, such as employment and education, might improve accuracy.

van Mens Kasper, Elzinga Elke, Nielen Mark, Lokkerbol Joran, Poortvliet Rune, Donker Gé, Heins Marianne, Korevaar Joke, Dückers Michel, Aussems Claire, Helbich Marco, Tiemens Bea, Gilissen Renske, Beekman Aartjan, de Beurs Derek


Electronic health records, General practice, Machine learning, Suicide

General

Generative-Discriminative Complementary Learning.

In Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence

The majority of state-of-the-art deep learning methods are discriminative approaches, which model the conditional distribution of labels given input features. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, which we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high-quality instances in spite of weak supervision. In addition to the extensive empirical studies, we also theoretically show that our model can retrieve the true conditional distribution from the complementarily-labeled data.
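The annotation protocol described above can be illustrated with a small simulation. This is only a sketch of how complementary labels might be collected, under the assumption that the candidate class is drawn uniformly at random; it is not the CCGAN method, and the function name, class count and sample size are hypothetical.

```python
import random


def query_annotator(true_label, num_classes, rng):
    """Simulate one yes/no question about a randomly chosen candidate class.

    Returns ("ordinary", c) if the annotator answers yes (c is the true
    class), or ("complementary", c) if the answer is no (c is a class the
    instance does NOT belong to).
    """
    candidate = rng.randrange(num_classes)  # assumed uniform choice
    if candidate == true_label:
        return ("ordinary", candidate)
    return ("complementary", candidate)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
num_classes = 10
true_labels = [rng.randrange(num_classes) for _ in range(1000)]
annotations = [query_annotator(y, num_classes, rng) for y in true_labels]

num_complementary = sum(kind == "complementary" for kind, _ in annotations)
print(f"{num_complementary} of {len(annotations)} answers were 'no'")
```

With many candidate classes, most answers are "no", so most instances end up with only a complementary label, which is the weak supervision setting the abstract refers to.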

Xu Yanwu, Gong Mingming, Chen Junxiang, Liu Tongliang, Zhang Kun, Batmanghelich Kayhan