Doctor Penguin

General

Automated extraction of pod phenotype data from micro-computed tomography.

In Frontiers in plant science

INTRODUCTION : Plant image datasets have the potential to greatly improve our understanding of the phenotypic response of plants to environmental and genetic factors. However, manual data extraction from such datasets is known to be time-consuming and resource-intensive. Therefore, the development of efficient and reliable machine learning methods for extracting phenotype data from plant imagery is crucial.

METHODS : In this paper, a current gold-standard computer vision method for detecting and segmenting objects in three-dimensional imagery (StarDist-3D) is applied to X-ray micro-computed tomography scans of mature oilseed rape (Brassica napus) pods.

RESULTS : With relatively minimal training effort, this fine-tuned StarDist-3D model accurately detected seeds (validation F1-score = 96.3%, testing F1-score = 99.3%) and predicted their shape (mean matched score = 90%).

DISCUSSION : This method then allowed rapid extraction of data on the number, size, shape, spacing and location of seeds in specific valves, which can be integrated into models of plant development or crop yield. Additionally, the fine-tuned StarDist-3D provides an efficient way to create a dataset of segmented images of individual seeds that could be used to further explore the factors affecting seed development, abortion and maturation synchrony within the pod. There is also potential for the fine-tuned StarDist-3D method to be applied to imagery of seeds from other plant species, as well as imagery of similarly shaped plant structures such as beans or wheat grains, provided the structures targeted for detection and segmentation can be described as star-convex polygons.
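
The detection results above are reported as F1-scores. As a minimal sketch of how such a score is computed from matched detections (the function name and the counts are hypothetical and not taken from the paper):

```python
def detection_f1(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1-score for object detection from matched/unmatched counts.

    true_pos:  predicted objects matched to a ground-truth object
    false_pos: predicted objects with no ground-truth match
    false_neg: ground-truth objects that were missed
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
score = detection_f1(true_pos=90, false_pos=10, false_neg=10)
print(f"F1 = {score:.3f}")
```

How a predicted seed is matched to a ground-truth seed (for example, by an overlap threshold) is a separate choice that this sketch leaves out.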

Corcoran Evangeline, Siles Laura, Kurup Smita, Ahnert Sebastian

2023

computer vision, machine learning, micro-computed tomography, phenotyping, plant development

General

Fungal identification in peanuts seeds through multispectral images: Technological advances to enhance sanitary quality.

In Frontiers in plant science
The sanitary quality of seed is essential in agriculture because pathogenic fungi compromise seed physiological quality and prevent the establishment of plants in the field, which causes losses to farmers. Multispectral imaging technologies coupled with machine learning algorithms can optimize the identification of healthy peanut seeds, greatly improving sanitary quality. The objective was to verify whether multispectral imaging technologies and artificial intelligence tools are effective for discriminating pathogenic fungi in tropical peanut seeds. For this purpose, dry peanut seeds infected by fungi (A. flavus, A. niger, Penicillium sp., and Rhizopus sp.) were used to acquire images at different wavelengths (365 to 970 nm). Multispectral markers of peanut seed health quality were found. The incubation period of 216 h contributed most to discriminating healthy seeds from those containing fungi through multispectral images. Texture (Percent Run), color (CIELab L) and reflectance (490 nm) were highly effective in discriminating the sanitary quality of peanut seeds. Machine learning algorithms (LDA, MLP, RF, and SVM) demonstrated high accuracy in autonomous detection of seed health status (90 to 100%). Thus, multispectral images coupled with machine learning algorithms are effective for screening peanut seeds with superior sanitary quality.
Sudki Julia Marconato, Fonseca de Oliveira Gustavo Roberto, de Medeiros André Dantas, Mastrangelo Thiago, Arthur Valter, Amaral da Silva Edvaldo Aparecido, Mastrangelo Clíssia Barboza

2023

Arachis hypogaea L., Aspergillus spp., machine learning, seed health, support vector machine

General

Data Flush.

In Harvard data science review
Data perturbation is a technique for generating synthetic data by adding "noise" to raw data, with an array of applications in science and engineering, primarily in data security and privacy. One challenge for data perturbation is that the synthetic data it produces usually trade information loss for privacy protection. The information loss, in turn, reduces the accuracy of any statistical or machine learning method based on the synthetic data, weakening downstream analysis and degrading machine learning performance. In this article, we introduce and advocate a fundamental principle of data perturbation, which requires the preservation of the distribution of raw data. To achieve this, we propose a new scheme, named data flush, which ascertains the validity of the downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating the requirement of strict privacy protection, for instance, differential privacy. We highlight multiple facets of data flush through examples.
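
To make the information-loss problem concrete, the following is a minimal sketch of plain additive perturbation, not of the data flush scheme itself. It assumes Laplace noise with a hypothetical scale of 1.0 added to simulated standard-normal data, and shows that the perturbed sample no longer follows the raw distribution because its variance is inflated:

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, scale: float, rng: random.Random):
    # Additive perturbation: each raw value gets independent noise.
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
raw = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # simulated raw data
noisy = perturb(raw, scale=1.0, rng=rng)

raw_var = statistics.pvariance(raw)
noisy_var = statistics.pvariance(noisy)

# Laplace(0, b) has variance 2 * b**2, so in expectation the perturbed
# variance exceeds the raw variance by about 2 * scale**2.
print(f"raw variance:   {raw_var:.3f}")
print(f"noisy variance: {noisy_var:.3f}")
```

Any downstream estimate that depends on the spread of the data would be biased by this inflation, which is the kind of distortion a distribution-preserving scheme aims to avoid.
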
Shen Xiaotong, Bi Xuan, Shen Rex

2022

Census, data integration, differential privacy, distribution preservation, statistical inference

General

Leveraging explanations in interactive machine learning: An overview.

In Frontiers in artificial intelligence
Explanations have gained an increasing level of interest in the AI and Machine Learning (ML) communities as a way to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and serve as a mechanism to elicit user control, because once users understand, they can then provide feedback. The goal of this paper is to present an overview of research where explanations are combined with interactive capabilities as a means to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state of the art, grouping relevant approaches based on their intended purpose and on how they structure the interaction, highlighting similarities and differences between them. We also discuss open research issues and outline possible directions forward, with the hope of spurring further research on this blooming topic.
Teso Stefano, Alkan Öznur, Stammer Wolfgang, Daly Elizabeth

2023

explainable AI, human-in-the-loop, interactive machine learning, model debugging, model editing

General

Optimal blending of multiple independent prediction models.

In Frontiers in artificial intelligence
We derive blending coefficients for the optimal blend of multiple independent prediction models with normal (Gaussian) distributions, as well as the variance of the final blend. We also provide lower and upper bound estimates for the final variance, and we compare these results with machine learning with counts, where only binary information (each feature says only yes or no) is used for every feature and the majority of features agreeing together make the decision.
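
As a rough illustration of what blending coefficients can look like, the sketch below uses the standard inverse-variance weighting result for independent, unbiased Gaussian predictions. This is an assumption made for illustration and may differ from the paper's exact formulation; the function name and the input numbers are hypothetical.

```python
def blend_independent(means, variances):
    """Blend independent predictions by inverse-variance weighting.

    Returns (weights, blended_mean, blended_variance). Assumes every
    prediction is unbiased, independent, and has a known variance > 0.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    # Each weight is proportional to 1 / variance; weights sum to 1.
    weights = [p / total_precision for p in precisions]
    blended_mean = sum(w * m for w, m in zip(weights, means))
    # Variance of the weighted sum of independent terms.
    blended_variance = 1.0 / total_precision
    return weights, blended_mean, blended_variance


# Hypothetical predictions from two models for the same target.
weights, mean, variance = blend_independent(means=[1.0, 3.0],
                                            variances=[1.0, 4.0])
print(weights, mean, variance)
```

Under these assumptions the blended variance is never larger than the smallest individual variance, which gives one simple upper bound on the variance of the blend.
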
Taraba Peter

2023

Gaussians, blending of independent models, going wider, machine learning with counts, normal distributions

Pathology

SMPD1 expression profile and mutation landscape help decipher genotype-phenotype association and precision diagnosis for acid sphingomyelinase deficiency.

In Hereditas

BACKGROUND : Acid sphingomyelinase deficiency (ASMD), also known as Niemann-Pick disease (NPD), is a rare genetic disease caused by mutations in the SMPD1 gene, which encodes sphingomyelin phosphodiesterase (ASM). Apart from liver and spleen enlargement and lung disease, the two subtypes (type A and B) of NPD differ in onset time, survival time, ASM activity, and neurological abnormalities. To comprehensively explore NPD's genotype-phenotype association and pathophysiological characteristics, we collected 144 NPD cases with strict quality control through literature mining.

RESULTS : The difference in ASM activity can differentiate NPD type A from other subtypes, with the ratio of ASM activity to the reference values being lower in type A (threshold 0.045 (4.45%)). Severe variations, such as deletions and insertions, can cause complete loss of ASM function, leading to type A, whereas relatively mild missense mutations generally result in type B. Among reported mutations, the p.Arg3AlafsX76 mutation is highly prevalent in the Chinese population, and the p.R608del mutation is common in Mediterranean countries. The expression profiles of SMPD1 from GTEx and single-cell RNA sequencing data of multiple fetal tissues showed that SMPD1 is highly expressed in the liver, spleen, and brain tissues of adults and in hepatoblasts, hematopoietic stem cells, STC2_TLX1-positive cells, mesothelial cells of the spleen, and vascular endothelial cells of the cerebellum and the cerebrum of fetuses, indicating that SMPD1 dysfunction is highly likely to have a significant effect on the function of those cell types during development and that clinicians need to pay attention to these organs or tissues as well during diagnosis. In addition, we predicted 21 new pathogenic mutations in the SMPD1 gene that potentially cause NPD, suggesting that more rare cases with those mutations in SMPD1 will be detected. Finally, we also analysed the function of the NPD type A cells following the extracellular milieu.

CONCLUSIONS : Our study is the first to elucidate the effects of SMPD1 mutation on cell types and at the tissue level, which provides new insights into the genotype-phenotype association and can help in the precise diagnosis of NPD.

Wang Ruisong, Qin Ziyi, Huang Long, Luo Huiling, Peng Han, Zhou Xinyu, Zhao Zhixiang, Liu Mingyao, Yang Pinhong, Shi Tieliu

2023-Mar-13

Acid sphingomyelinase deficiency, Genotype, Niemann-Pick disease type A and B, Novel target for the subtypes, Phenotype