General General

Use statistical machine learning to detect nutrient thresholds in Microcystis blooms and microcystin management.

In Harmful algae

The frequency of toxin-producing cyanobacterial blooms has increased in recent decades due to nutrient enrichment and climate change. Because Microcystis blooms are related to different environmental conditions, identifying potential nutrient control targets can help water quality managers reduce the risk of microcystins (MCs). However, complex biotic interactions and field data limitations have constrained our understanding of the nutrient-microcystin relationship. This study develops a Bayesian modelling framework with intracellular and extracellular MCs that characterizes the relationships between different environmental and biological factors. The model was fit to an across-lake dataset covering three bloom-plagued lakes in China and used to estimate the putative thresholds of total nitrogen (TN) and total phosphorus (TP). Lake-specific nutrient thresholds were estimated using a Bayesian updating process. Our results supported dual N and P reduction for controlling cyanotoxin risks. Total Microcystis biomass can be substantially suppressed by achieving the putative TP threshold of 0.10 mg/L in Lakes Taihu and Chaohu, whereas Dianchi Lake requires a stricter TP target of 0.05 mg/L. To maintain MC concentrations below 1.0 μg/L, the estimated TN threshold in the three lakes was 1.8 mg/L, but the effect can be counteracted by rising temperature. Overall, the present approach provides an efficient way to integrate empirical knowledge into a data-driven model and is helpful for the management of water resources.

Shan Kun, Wang Xiaoxiao, Yang Hong, Zhou Botian, Song Lirong, Shang Mingsheng


Bayesian modelling, Cyanobacterial blooms, Eutrophication, Microcystin, Microcystis, Nutrient thresholds
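
The abstract above mentions a Bayesian updating process for moving from an across-lake estimate to lake-specific thresholds. As a rough illustration of the general idea only, the sketch below applies the textbook conjugate normal-normal update with a known observation variance. It is not the authors' model: the function name `normal_update`, the known-variance assumption, and all numbers are hypothetical and unitless, chosen purely to make the arithmetic easy to follow.

```python
def normal_update(prior_mean, prior_var, observations, obs_var):
    """Conjugate normal-normal update with a known observation variance.

    Returns the posterior mean and variance of the unknown quantity.
    """
    n = len(observations)
    if n == 0:
        # No new data: the posterior is just the prior.
        return prior_mean, prior_var
    # Precisions (inverse variances) add up.
    post_precision = 1.0 / prior_var + n / obs_var
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = (prior_mean / prior_var + sum(observations) / obs_var) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical pooled (across-group) estimate used as the prior.
pooled_mean, pooled_var = 10.0, 4.0
# Hypothetical observations from a single group.
group_obs = [6.0, 7.0, 8.0]

post_mean, post_var = normal_update(pooled_mean, pooled_var, group_obs, obs_var=4.0)
print(post_mean, post_var)
```

The posterior mean lands between the pooled prior and the group's own sample mean, and the posterior variance is smaller than the prior variance, which is the qualitative behaviour one would expect from updating a shared estimate with local data.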

General General

Machine learning for the diagnosis of early stage diabetes using temporal glucose profiles

ArXiv Preprint

Machine learning shows remarkable success in recognizing patterns in data. Here we apply machine learning (ML) to the diagnosis of early stage diabetes, which is known to be a challenging task in medicine. Blood glucose levels are tightly regulated by two counter-regulatory hormones, insulin and glucagon, and the failure of glucose homeostasis leads to the common metabolic disease, diabetes mellitus. It is a chronic disease with a long latent period that complicates detection at an early stage. The vast majority of diabetes cases result from the diminished effectiveness of insulin action. Insulin resistance must modify the temporal profile of blood glucose. Thus we propose to use ML to detect subtle changes in the temporal pattern of glucose concentration. Time series data of blood glucose with sufficient resolution are currently unavailable, so we test the proposal using synthetic glucose profiles produced by a biophysical model that accounts for glucose regulation and hormone action. Multi-layered perceptrons, convolutional neural networks, and recurrent neural networks all identified the degree of insulin resistance with high accuracy, above 85%.

Woo Seok Lee, Junghyo Jo, Taegeun Song


General General

Applications of machine learning methods for the discovery of NDM-1 inhibitors.

In Chemical biology & drug design ; h5-index 32.0

The emergence of New Delhi metallo-β-lactamase (NDM-1)-producing bacteria and their worldwide spread pose great challenges for the treatment of drug-resistant bacterial infections. These bacteria can hydrolyze most β-lactam antibacterials. Unfortunately, there are no clinically useful NDM-1 inhibitors. In the current work, we manually collected NDM-1 inhibitors reported in the past decade and established the first NDM-1 inhibitor database. Four machine learning models were constructed, using the structural and property characteristics of the collected compounds as the training input, to discover potential NDM-1 inhibitors. To distinguish between highly active inhibitors and putative positive drugs, a three-class classification strategy was introduced in our study. In detail, the commonly used division into positive and negative classes is replaced by three classes: strongly active, weakly active and inactive. The accuracy of the best prediction model based on this strategy reached 90.5%, compared with 69.14% achieved by the traditional docking-based virtual screening method. Consequently, the best model was used to virtually screen a natural product library. The safety of the selected compounds was analyzed with a machine learning-based ADMET prediction model. Seven novel NDM-1 inhibitors were identified, which will provide valuable clues for the discovery of NDM-1 inhibitors.

Shi Cheng, Dong Fanyi, Zhao Guiling, Zhu Ning, Lao Xingzhen, Zheng Heng


Bacterial resistance, Drug discovery, Machine learning, NDM-1 inhibitors, Virtual screening
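
The three-class strategy described in the abstract above replaces a binary active/inactive label with strongly active, weakly active and inactive. A minimal sketch of such a labelling step follows. The abstract does not say which measurement or cut-offs define the classes, so the use of IC50 values, the function name `activity_class`, and both cut-offs are hypothetical placeholders.

```python
def activity_class(ic50_um, strong_cutoff=10.0, weak_cutoff=100.0):
    """Map an IC50 value (in micromolar) to one of three activity classes.

    Both cut-offs are hypothetical placeholders, not values from the paper.
    Lower IC50 means a more potent inhibitor.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_um < strong_cutoff:
        return "strongly active"
    if ic50_um < weak_cutoff:
        return "weakly active"
    return "inactive"


# Made-up compounds and IC50 values, for illustration only.
compounds = {"compound_a": 1.0, "compound_b": 50.0, "compound_c": 500.0}
labels = {name: activity_class(ic50) for name, ic50 in compounds.items()}
print(labels)
```

Labels produced this way would then serve as the target for a multi-class model in place of the usual positive/negative target.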

Radiology Radiology

Computer-aided diagnosis in the era of deep learning.

In Medical physics ; h5-index 59.0

Computer-aided diagnosis (CAD) has been a major field of research for the past few decades. CAD uses machine learning methods to analyze imaging and/or nonimaging patient data and assess the patient's condition; the assessment can then be used to assist clinicians in their decision-making process. The recent success of deep learning in machine learning has spurred new research and development efforts to improve CAD performance and to develop CAD for many other complex clinical tasks. In this paper, we discuss the potential and challenges of developing CAD tools using deep learning technology, or artificial intelligence (AI) in general, the pitfalls and lessons learned from CAD in screening mammography, and the considerations needed for future implementation of CAD or AI in clinical use. It is hoped that past experience and deep learning technology will lead to successful advancement and lasting growth in this new era of CAD, thereby enabling CAD to deliver intelligent aids to improve health care.

Chan Heang-Ping, Hadjiiski Lubomir M, Samala Ravi K


artificial intelligence, computer-aided diagnosis, deep learning

General General

Can accelerometer ear tags identify behavioural changes in sheep associated with parturition?

In Animal reproduction science

On-animal sensor systems provide an opportunity to monitor ewes during parturition, potentially reducing ewe and lamb mortality risk. This study investigated the capacity of machine learning (ML) behaviour classification to monitor changes in sheep behaviour around the time of lambing using ear-borne accelerometers. Accelerometers were attached to 27 ewes grazing a 4.4 ha paddock. Data were then classified based on three different ethograms: (i) detection of grazing, lying, standing, walking; (ii) detection of active behaviour; and (iii) detection of body posture. The proportion of time devoted to each behaviour and activity was then calculated at a daily and hourly scale. The frequency of posture change was also calculated on an hourly scale. Each metric was assessed using a linear mixed-effects model for the 7 days (day scale) or 12 h (hour scale) before and after lambing. For all physical movements, regardless of the ethogram, there was a change in the days surrounding lambing. This involved either a decrease (grazing, lying, active behaviour) or a peak (standing, walking) on the day of parturition, with most values returning to either pre-partum or near-pre-partum levels (all P < 0.001). Hourly changes also occurred for all behaviours (all P < 0.001), the most marked being increased walking behaviour and frequency of posture change. These findings indicate ewes were more restless around the time of parturition. Further application of this research should focus on the development of algorithms that can be used to identify the onset of lambing and/or time of parturition in pasture-based ewes.

Fogarty E S, Swain D L, Cronin G M, Moraes L E, Trotter M


Accelerometer, MEMS, Machine learning, Parturition, Remote monitoring
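
The two hourly metrics named in the abstract above, the proportion of time spent in each behaviour and the frequency of posture change, can be derived from a sequence of classified labels. The sketch below shows one plausible way to compute them. It assumes equal-length classification epochs, and the function name `hourly_metrics` and the example labels are hypothetical; the study's actual epoch length and processing pipeline are not given in the abstract.

```python
from collections import Counter


def hourly_metrics(labels):
    """Summarise one hour of per-epoch behaviour labels.

    Returns (proportions, changes): the share of epochs spent in each
    behaviour, and the number of times the label differs between
    consecutive epochs.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    proportions = {behaviour: n / len(labels) for behaviour, n in counts.items()}
    changes = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return proportions, changes


# Hypothetical posture labels for six consecutive epochs within one hour.
hour = ["lying", "lying", "standing", "lying", "standing", "standing"]
proportions, changes = hourly_metrics(hour)
print(proportions)
print(changes)
```

Repeating this for each hour in the 12 h before and after lambing would give the per-ewe series that a mixed-effects model could then be fit to.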

General General

Classification of Spam Emails through Hierarchical Clustering and Supervised Learning

ArXiv Preprint

Spammers take advantage of the popularity of email to indiscriminately send unsolicited emails. Although researchers and organizations continuously develop anti-spam filters based on binary classification, spammers bypass them through new strategies, such as word obfuscation or image-based spam. For the first time in the literature, we propose classifying spam emails into categories, instead of just using a binary model, to improve the handling of already detected spam. First, we applied a hierarchical clustering algorithm to create SPEMC-11K (SPam EMail Classification), the first multi-class dataset, which contains three types of spam emails: Health and Technology, Personal Scams, and Sexual Content. Then, we used SPEMC-11K to evaluate the combination of TF-IDF and BOW encodings with Naïve Bayes, Decision Trees and SVM classifiers. Finally, for the task of multi-class spam classification we recommend (i) TF-IDF combined with SVM for the best micro F1 score, 95.39%, and (ii) TF-IDF along with NB for the fastest spam classification, analyzing an email in 2.13 ms.

Francisco Jáñez-Martino, Eduardo Fidalgo, Santiago González-Martínez, Javier Velasco-Mata
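
To make the encode-then-classify pipeline from the abstract above concrete, here is a small standard-library sketch of TF-IDF encoding followed by multi-class prediction. It is not the paper's setup: a nearest-centroid rule with cosine similarity stands in for the SVM and Naïve Bayes classifiers so that the example needs no external libraries, and the training emails are made-up toy strings rather than SPEMC-11K data. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Learn smoothed inverse document frequencies from tokenised documents."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def l2_normalise(vec):
    """Scale a sparse vector (dict) to unit length; empty if all-zero."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def tfidf(doc, idf):
    """Encode one tokenised document as a unit-length TF-IDF vector."""
    counts = Counter(term for term in doc if term in idf)
    return l2_normalise({term: c * idf[term] for term, c in counts.items()})


def fit_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each class into one centroid per class."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            sums[label][term] += weight
    return {label: l2_normalise(vec) for label, vec in sums.items()}


def predict(doc, idf, centroids):
    """Return the class whose centroid is most similar to the document."""
    vec = tfidf(doc, idf)

    def similarity(label):
        return sum(w * centroids[label].get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=similarity)


# Made-up toy emails, one list of tokens per email.
train = [
    ("cheap pills online pharmacy".split(), "Health and Technology"),
    ("discount software download".split(), "Health and Technology"),
    ("urgent bank transfer inheritance".split(), "Personal Scams"),
    ("lottery winner claim bank".split(), "Personal Scams"),
    ("adult dating site".split(), "Sexual Content"),
    ("adult video chat".split(), "Sexual Content"),
]
docs = [tokens for tokens, _ in train]
labels = [label for _, label in train]

idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)
print(predict("urgent bank transfer".split(), idf, centroids))
```

Swapping the centroid rule for a linear SVM or Naïve Bayes model would leave the encoding step unchanged; only the classifier fitted on the TF-IDF vectors would differ.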