Doctor Penguin

Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

Algal cell viability assessment: The role of environmental factors in phytoplankton population dynamics.

In Marine pollution bulletin
Algal cell viability is one of the most fundamental issues in marine ecological research. In this work, a method based on digital holography and deep learning was designed to assess algal cell viability, classifying cells into three categories: active, weak, and dead. The method was applied to algal cells in surface waters of the East China Sea in spring, revealing about 4.34%-23.29% weak cells and 3.98%-19.47% dead cells. Nitrate and chlorophyll a levels were the main factors affecting algal cell viability. Furthermore, changes in algal viability during heating and cooling were observed in laboratory experiments: high temperatures led to an increase in weak algal cells. This may help explain why most harmful algal blooms occur in warmer months. The study provides a novel way to identify the viability of algal cells and to understand their significance in the ocean.
Wang Yanyan, Zhai Wei-Dong, Wu Chi

2023-Mar-08

Cell viability, Deep learning, Digital holography, East China Sea, Harmful algal blooms

General

Human-guided deep learning with ante-hoc explainability by convolutional network from non-image data for pregnancy prognostication.

In Neural networks : the official journal of the International Neural Network Society

BACKGROUND AND OBJECTIVE : Deep learning is applied in medicine mostly because of its state-of-the-art performance in diagnostic imaging. Supervisory authorities also require models to be explainable, but most approaches explain the model after development (post hoc) instead of incorporating explanation into the design (ante hoc). This study aimed to demonstrate human-guided deep learning with ante-hoc explainability, using a convolutional network on non-image data, to develop, validate, and deploy a prognostic prediction model for prelabor rupture of membranes (PROM) and an estimator of time of delivery using a nationwide health insurance database.

METHODS : To guide modeling, we constructed association diagrams from the literature and verified them with electronic health records. Non-image data were transformed into meaningful images using predictor-to-predictor similarities, harnessing the power of convolutional neural networks, which are mostly used for diagnostic imaging. The network architecture was also inferred from the similarities.

RESULTS : This resulted in the best model for prelabor rupture of membranes (n=883,376), with areas under the curve of 0.73 (95% CI 0.72 to 0.75) and 0.70 (95% CI 0.69 to 0.71) in internal and external validation, respectively. It outperformed previous models found by systematic review and was explainable through knowledge-based diagrams and model representation.

CONCLUSIONS : This allows prognostication with actionable insights for preventive medicine.

Sufriyana Herdiantri, Wu Yu-Wei, Su Emily Chia-Yu

2023-Feb-24

Causal diagram, Deep learning, Electronic health records, Explainable artificial intelligence, Prelabor rupture of membranes
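
The abstract above reports model quality as the area under the curve (AUC). As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are assumptions for illustration only; they are not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. Hypothetical helper for
    illustration; quadratic in the number of samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a rough scale for reading the 0.73 and 0.70 reported above.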

General

Explore traffic conflict risks considering motion constraint degree in the diverging area of toll plazas.

In Accident; analysis and prevention
In the diverging area of toll plazas, the absence of lane markings, the gradual widening of lanes, and the crossing of vehicles with different tolling methods increase the likelihood of collisions. This study proposed the concept of motion constraint degree to investigate traffic conflict risks in the toll plaza diverging area. On the basis of the motion constraint degree, a two-step method was developed in which all potentially influencing factors were separated into two parts. The first part was used to analyze the association between the motion constraint degree and some factors, while the remaining factors were used for risk regression and prediction together with the motion constraint degree. A random parameters logit model was applied for regression analysis, and four prevalent machine learning models were employed for risk prediction. Results indicate that (1) the proposed approach considering motion constraint degree outperforms the conventional direct method in both conflict risk regression and prediction; (2) the motion constraint degree is not monotonically correlated with the risk level of vehicles; (3) due to the layout of the toll plaza, ETC vehicles are less likely to be at risk in the diverging area; and (4) lane-changing behaviors in the restricted space increase the conflict risk.
Xing Lu, Yu Le, Zheng Ou, Abdel-Aty Mohamed

2023-Mar-08

Driving behavior, Machine learning, Motion constraint degree, Random parameters logit, Toll plaza diverging area, Traffic conflict

General

Contributions of various driving factors to air pollution events: Interpretability analysis from Machine learning perspective.

In Environment international
Air quality in China has improved substantially; however, fine particulate matter (PM2.5) still remains at a high level in many areas. PM2.5 pollution is a complex process attributed to gaseous precursors and to chemical and meteorological factors. Quantifying the contribution of each variable to air pollution can facilitate the formulation of effective policies to precisely eliminate air pollution. In this study, we first used a decision plot to map out the decision process of the Random Forest (RF) model for a single hourly data set and constructed a framework for analyzing the causes of air pollution using multiple interpretable methods. Permutation importance was used to qualitatively analyze the effect of each variable on PM2.5 concentrations. The sensitivity of secondary inorganic aerosols (SIA; SO₄²⁻, NO₃⁻, and NH₄⁺) to PM2.5 was verified using partial dependence plots (PDP). SHapley Additive exPlanations (SHAP) was used to quantify the contribution of the drivers behind ten air pollution events (APs). The RF model can accurately predict PM2.5 concentrations, with a coefficient of determination (R²) of 0.94 and a root mean square error (RMSE) and mean absolute error (MAE) of 9.4 μg/m³ and 5.7 μg/m³, respectively. This study revealed that the order of sensitivity of SIA to PM2.5 was NH₄⁺ > NO₃⁻ > SO₄²⁻. Fossil fuel and biomass combustion may be contributing factors to air pollution events in Zibo in autumn-winter 2021. NH₄⁺ contributed 19.9-65.4 μg/m³ across the ten APs. K, NO₃⁻, EC, and OC were the other main drivers, contributing 8.7 ± 2.7 μg/m³, 6.8 ± 7.5 μg/m³, 3.6 ± 5.8 μg/m³, and 2.5 ± 2.0 μg/m³, respectively. Lower temperature and higher humidity were vital factors that promoted the formation of NO₃⁻. Our study may provide a methodological framework for precise air pollution management.
Li Tianshuai, Zhang Qingzhu, Peng Yanbo, Guan Xu, Li Lei, Mu Jiangshan, Wang Xinfeng, Yin Xianwei, Wang Qiao

2023-Mar-04

Air pollution, Machine learning, PDP, PM(2.5), Permutation importance, SHAP
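
The abstract above evaluates its model with R², RMSE, and MAE and ranks variables by permutation importance. The following is a minimal, standard-library sketch of those quantities under stated assumptions: the helper names, the toy linear `predict` function, and the toy data are all hypothetical stand-ins, not the authors' Random Forest or measurements. Permutation importance is taken here as the increase in RMSE after shuffling one feature column.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, seed=0):
    """Increase in RMSE after shuffling one feature column (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    return rmse(y, [predict(row) for row in X_perm]) - baseline


# Toy stand-in model (assumption): the target depends only on feature 0.
def predict(row):
    return 2.0 * row[0]


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]

print(r2_score([1, 2, 3], [1, 2, 4]))           # 0.5
print(permutation_importance(predict, X, y, 1))  # 0.0, feature 1 is never used
```

A feature the model ignores has zero importance by construction, while shuffling a feature the model relies on can only raise the error from the perfect-fit baseline used here.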

Oncology

Investigation of radiomics and deep convolutional neural networks approaches for glioma grading.

In Biomedical physics & engineering express
To determine glioma grading by applying radiomic analysis or deep convolutional neural networks (DCNN) and to benchmark both approaches on broader validation sets.

Methods: Seven public datasets were considered: 1) low-grade glioma or high-grade glioma (369 patients, BraTS'20); 2) well-differentiated liposarcoma or lipoma (115, LIPO); 3) desmoid-type fibromatosis or extremity soft-tissue sarcomas (203, Desmoid); 4) primary solid liver tumors, either malignant or benign (186, LIVER); 5) gastrointestinal stromal tumors (GISTs) or intra-abdominal gastrointestinal tumors radiologically resembling GISTs (246, GIST); 6) colorectal liver metastases (77, CRLM); and 7) lung metastases of metastatic melanoma (103, Melanoma). Radiomic analysis was performed on 464 radiomic features for the BraTS'20 dataset and 2016 radiomic features for the other datasets. Random forests (RF), Extreme Gradient Boosting (XGBOOST), and a voting algorithm comprising both classifiers were tested. The parameters of the classifiers were optimized using a repeated nested stratified cross-validation process. The feature importance of each classifier was computed using the Gini index or permutation feature importance. DCNN was performed on 2D axial and sagittal slices encompassing the tumor. A balanced database was created, when necessary, using smart slice selection. ResNet50, Xception, EfficientNetB0, and EfficientNetB3 were transferred from the ImageNet application to tumor classification and fine-tuned. Five-fold stratified cross-validation was performed to evaluate the models. The classification performance of the models was measured using multiple indices, including the area under the receiver operating characteristic curve (AUC).

Results: The best radiomic approach was based on XGBOOST for all datasets; AUC was 0.934 (BraTS'20), 0.86 (LIPO), 0.73 (LIVER), 0.844 (Desmoid), 0.76 (GIST), 0.664 (CRLM), and 0.577 (Melanoma), respectively. The best DCNN was based on EfficientNetB0; AUC was 0.99 (BraTS'20), 0.982 (LIPO), 0.977 (LIVER), 0.961 (Desmoid), 0.926 (GIST), 0.901 (CRLM), and 0.89 (Melanoma), respectively.

Conclusion: Tumor classification can be accurately determined by adapting state-of-the-art machine learning algorithms to the medical context.
Aouadi Souha, Torfeh Tarraf, Yoganathan S A, Paloor Satheesh, Riyas Mohamed, Hammoud Rabih, Al-Hammadi Noora

2023-Mar-10

Benchmarking, CT, Deep learning, Glioma grading, Multi-contrast MRI, Radiomics
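
The abstract above evaluates its models with five-fold stratified cross-validation. As a rough sketch of the idea, stratification assigns the samples of each class to folds in turn so that every fold keeps approximately the overall class ratio. The function `stratified_k_fold` and the toy labels are assumptions for illustration, not the authors' pipeline; a real evaluation would typically also shuffle within each class, which is omitted here to keep the output deterministic.

```python
from collections import defaultdict


def stratified_k_fold(labels, k=5):
    """Return k lists of test indices, spreading each class round-robin over the folds.

    Hypothetical helper: no shuffling, so the assignment is deterministic.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels (made up): 10 samples of class 0 and 5 of class 1.
toy_labels = [0] * 10 + [1] * 5
folds = stratified_k_fold(toy_labels, k=5)

for fold_id, test_idx in enumerate(folds):
    # Every index outside the test fold is used for training in that round.
    train_idx = [i for i in range(len(toy_labels)) if i not in test_idx]
    n_pos = sum(toy_labels[i] for i in test_idx)
    print(fold_id, len(train_idx), len(test_idx), n_pos)
```

With this toy input, each fold holds three samples, two of class 0 and one of class 1, matching the 2:1 ratio of the full set.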

General

De novo design of small beta barrel proteins.

In Proceedings of the National Academy of Sciences of the United States of America
Small beta barrel proteins are attractive targets for computational design because of their considerable functional diversity despite their very small size (<70 amino acids). However, there are considerable challenges to designing such structures, and there has been little success thus far. Because of the small size, the hydrophobic core stabilizing the fold is necessarily very small, and the conformational strain of barrel closure can oppose folding; in addition, intermolecular aggregation through free beta strand edges can compete with proper monomer folding. Here, we explore the de novo design of small beta barrel topologies using both Rosetta energy-based methods and deep learning approaches to design four small beta barrel folds: Src homology 3 (SH3) and oligonucleotide/oligosaccharide-binding (OB) topologies found in nature, and five and six up-and-down-stranded barrels rarely if ever seen in nature. Both approaches yielded successful designs with high thermal stability and experimentally determined structures with less than 2.4 Å rmsd from the designed models. Using deep learning for backbone generation and Rosetta for sequence design yielded higher design success rates and greater structural diversity than Rosetta alone. The ability to design a large and structurally diverse set of small beta barrel proteins greatly increases the protein shape space available for designing binders to protein targets of interest.
Kim David E, Jensen Davin R, Feldman David, Tischer Doug, Saleem Ayesha, Chow Cameron M, Li Xinting, Carter Lauren, Milles Lukas, Nguyen Hannah, Kang Alex, Bera Asim K, Peterson Francis C, Volkman Brian F, Ovchinnikov Sergey, Baker David

2023-Mar-14

high-throughput screening, machine learning, protein design, small beta barrels