Doctor Penguin

Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference.

In IEEE transactions on information theory
We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model - an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.
Cai T Tony, Zhang Anru R, Zhou Yuchen

2022-Sep

approximate dual certificate, convex optimization, simultaneously structured model, sparse group Lasso, sparsity
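
For readers unfamiliar with the estimator, the sparse group Lasso penalty combines an element-wise l1 term with a group-wise l2 term. The sketch below is a minimal illustration, not the paper's method: it assumes non-overlapping groups that cover every coordinate, and the names `lam` and `lam_g` and the unweighted group term are illustrative choices (the abstract does not state the exact weights used). The proximal step relies on the standard fact that, for this penalty, element-wise soft-thresholding followed by group-wise shrinkage gives the exact proximal operator.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, lam_g):
    """lam * ||beta||_1 + lam_g * sum over groups of ||beta_group||_2."""
    l1_term = sum(abs(b) for b in beta)
    group_term = sum(math.sqrt(sum(beta[i] ** 2 for i in g)) for g in groups)
    return lam * l1_term + lam_g * group_term


def soft_threshold(x, t):
    """Shrink x toward zero by t, clipping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def prox_sparse_group_lasso(v, groups, lam, lam_g):
    """Proximal operator of the penalty above (non-overlapping groups assumed)."""
    out = [0.0] * len(v)
    for g in groups:
        # Step 1: element-wise soft-thresholding (l1 part).
        shrunk = [soft_threshold(v[i], lam) for i in g]
        # Step 2: shrink the whole group toward zero (group l2 part).
        norm = math.sqrt(sum(x * x for x in shrunk))
        scale = max(0.0, 1.0 - lam_g / norm) if norm > 0 else 0.0
        for i, x in zip(g, shrunk):
            out[i] = scale * x
    return out


# Hypothetical example: two groups of two coordinates each.
groups = [[0, 1], [2, 3]]
result = prox_sparse_group_lasso([3.0, -4.0, 0.5, 0.2], groups, lam=1.0, lam_g=1.0)
# The second group is zeroed out entirely; the first group is shrunk but kept.
```

The example shows the "double sparsity" the abstract describes: a weak group vanishes as a whole, while a strong group survives with its entries shrunk.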

General

Ant Colony Optimization-Enabled CNN Deep Learning Technique for Accurate Detection of Cervical Cancer.

In BioMed research international; h5-index 102.0
Cancer is characterized by abnormal cell growth and proliferation, which are both diagnostic indicators of the disease. When cancerous cells enter one organ, there is a risk that they may spread to adjacent tissues and eventually to other organs. Cervical cancer initially manifests in the uterine cervix, which is located at the very bottom of the uterus. Both the growth and death of cervical cells are characteristic features of this condition. False-negative results pose a significant moral dilemma since they may cause a woman's cancer to go undiagnosed, which in turn can result in her premature death from the disease. False-positive results do not raise any significant ethical concerns, but they do require a patient to go through an expensive and time-consuming treatment process, and they also cause the patient to experience tension and anxiety that is not warranted. In order to detect cervical cancer in its earliest stages in women, a screening procedure known as a Pap test is often performed. This article describes a technique for improving images using Brightness Preserving Dynamic Fuzzy Histogram Equalization. The images are segmented using the fuzzy c-means method to isolate individual components and find the right area of interest. Features are selected with the ant colony optimization (ACO) algorithm. Following that, categorization is carried out utilizing the CNN, MLP, and ANN algorithms.
Kavitha R, Jothi D Kiruba, Saravanan K, Swain Mahendra Pratap, Gonzáles José Luis Arias, Bhardwaj Rakhi Joshi, Adomako Elijah

2023
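
The fuzzy c-means step mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it clusters a flat list of grayscale intensities rather than a full image, and the function names, the fuzzifier `m=2.0`, the iteration count, and the initial centers are all hypothetical choices.

```python
def memberships(values, centers, m=2.0):
    """Fuzzy membership of each value in each cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The value sits exactly on a center: full membership there.
            hit = dists.index(0.0)
            rows.append([1.0 if j == hit else 0.0 for j in range(len(centers))])
        else:
            rows.append([
                1.0 / sum((dj / dk) ** exponent for dk in dists)
                for dj in dists
            ])
    return rows


def fuzzy_c_means(values, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships(values, centers, m)
        new_centers = []
        for j, old in enumerate(centers):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            # Keep the old center if no value has any membership in it.
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total if total > 0 else old
            )
        centers = new_centers
    return centers, memberships(values, centers, m)


# Hypothetical intensities: a dark region and a bright region.
pixels = [10, 12, 11, 200, 205, 198]
centers, u = fuzzy_c_means(pixels, initial_centers=[0.0, 255.0])
# Hard segmentation: assign each pixel to its highest-membership cluster.
labels = [row.index(max(row)) for row in u]
```

Unlike k-means, each pixel keeps a graded membership in every cluster, which is why the method is often preferred when region boundaries are blurry.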

General

A scalable thin-film defect quantify model under imbalanced regression and classification task based on computer vision.

In Heliyon
Optical coating damage detection is a part of both industrial production and scientific research. Traditional methods require sophisticated expert systems or experienced front-line producers, and the cost of these methods rises dramatically when film types or inspection environments change. In practice, it has been found that customized expert systems imply a significant investment of time and money. We therefore seek a method that can perform this task automatically and quickly, while remaining adaptable to the later addition of coating types and able to identify damage kinds. In this paper, we propose a deep neural network-based detection tool that splits the task into two parts: damage classification and damage degree regression. The model introduces attention mechanisms and embedding operations to enhance its performance. It was found that the damage type detection accuracy of our model reached 93.65%, and the regression loss was kept within 10% on different data sets. We believe that deep neural networks have great potential to tackle industrial defect detection by significantly reducing the design cost and time of traditional expert systems, while gaining the ability to detect entirely new damage types at a fraction of the cost.
Yang Guoliang, Zhou Gaohao, Wang Changyuan, Mu Jing, Yang Zhenhu, Li Yuan, Su Junhong

2023-Feb

Deep learning, Image processing, Thin-film

General

Prospective predictors of electronic nicotine delivery system initiation in tobacco naive young adults: A machine learning approach.

In Preventive medicine reports
The use of electronic nicotine delivery systems (ENDS) is increasing among young adults. However, there are few studies regarding predictors of ENDS initiation in tobacco-naive young adults. Identifying the risk and protective factors of ENDS initiation that are specific to tobacco-naive young adults will enable the creation of targeted policies and prevention programs. This study used machine learning (ML) to create predictive models, identify risk and protective factors for ENDS initiation among tobacco-naive young adults, and examine the relationship between these predictors and the prediction of ENDS initiation. We used nationally representative data of tobacco-naive young adults in the U.S. drawn from the Population Assessment of Tobacco and Health (PATH) longitudinal cohort survey. Respondents were young adults (18-24 years) who had never used any tobacco products in Wave 4 and who completed Waves 4 and 5 interviews. ML techniques were used to create models and determine predictors at 1-year follow-up from Wave 4 data. Among the 2,746 tobacco-naive young adults at baseline, 309 initiated ENDS use at 1-year follow-up. The top five prospective predictors of ENDS initiation were susceptibility to ENDS, increased days of physical exercise specifically designed to strengthen muscles, frequency of social media use, marijuana use and susceptibility to cigarettes. This study identified previously unreported and emerging predictors of ENDS initiation that warrant further investigation and provided comprehensive information on the predictors of ENDS initiation. Furthermore, this study showed that ML is a promising technique that can aid ENDS monitoring and prevention programs.
Atuegwu Nkiruka C, Mortensen Eric M, Krishnan-Sarin Suchitra, Laubenbacher Reinhard C, Litt Mark D

2023-Apr

E-cigarette, ENDS, Electronic nicotine delivery systems, Machine learning, Never tobacco users, PATH, Population Assessment of Tobacco and Health survey, Prospective predictors, Tobacco naïve, Vaping, Young adults

General

Development and Multi-Site External Validation of a Generalizable Risk Prediction Model for Bipolar Disorder.

In medRxiv : the preprint server for health sciences
Bipolar disorder is a leading contributor to disability, premature mortality, and suicide. Early identification of risk for bipolar disorder using generalizable predictive models trained on diverse cohorts around the United States could improve targeted assessment of high-risk individuals, reduce misdiagnosis, and improve the allocation of limited mental health resources. This observational case-control study aimed to develop and validate generalizable predictive models of bipolar disorder as part of the multisite, multinational PsycheMERGE Consortium across diverse and large biobanks with linked electronic health records (EHRs) from three academic medical centers: in the Northeast (Massachusetts General Brigham), the Mid-Atlantic (Geisinger) and the Mid-South (Vanderbilt University Medical Center). Predictive models were developed and validated with multiple algorithms at each study site: random forests, gradient boosting machines, penalized regression, and stacked ensemble learning algorithms combining them. Predictors were limited to widely available EHR-based features agnostic to a common data model including demographics, diagnostic codes, and medications. The main study outcome was bipolar disorder diagnosis as defined by the International Cohort Collection for Bipolar Disorder, 2015. In total, the study included records for 3,529,569 patients including 12,533 cases (0.3%) of bipolar disorder. After internal and external validation, algorithms demonstrated optimal performance in their respective development sites. The stacked ensemble achieved the best combination of overall discrimination (AUC = 0.82 - 0.87) and calibration performance with positive predictive values above 5% in the highest risk quantiles at all three study sites. In conclusion, generalizable predictive models of risk for bipolar disorder can be feasibly developed across diverse sites to enable precision medicine.
Comparison of a range of machine learning methods indicated that an ensemble approach provides the best performance overall but required local retraining. These models will be disseminated via the PsycheMERGE Consortium website.
Walsh Colin G, Ripperger Michael A, Hu Yirui, Sheu Yi-Han, Wilimitis Drew, Zheutlin Amanda B, Rocha Daniel, Choi Karmel W, Castro Victor M, Kirchner H Lester, Chabris Christopher F, Davis Lea K, Smoller Jordan W

2023-Feb-26
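
The two metrics quoted in the abstract, AUC and positive predictive value (PPV) in the highest-risk group, can be computed as in the sketch below. This is an illustration only: the function names, the toy scores and labels, and the 25% cutoff are hypothetical, and the study's own quantile definition is not given in the abstract.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def ppv_top_fraction(scores, labels, fraction):
    """Share of true cases among the highest-scoring `fraction` of patients."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(y for _, y in ranked[:k]) / k


# Toy data (not from the study): 2 cases among 8 patients.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0]
model_auc = auc(scores, labels)
top_quartile_ppv = ppv_top_fraction(scores, labels, 0.25)
```

With a prevalence near 0.3%, as in the study, a PPV above 5% in the top-risk group still means most flagged patients are not cases, which is why the abstract reports PPV alongside AUC.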

General

Cyclic peptide structure prediction and design using AlphaFold.

In bioRxiv : the preprint server for biology
Deep learning networks offer considerable opportunities for accurate structure prediction and design of biomolecules. While cyclic peptides have gained significant traction as a therapeutic modality, developing deep learning methods for designing such peptides has been slow, mostly due to the small number of available structures for molecules in this size range. Here, we report approaches to modify the AlphaFold network for accurate structure prediction and design of cyclic peptides. Our results show this approach can accurately predict the structures of native cyclic peptides from a single sequence, with 36 out of 49 cases predicted with high confidence (pLDDT > 0.85) matching the native structure with root mean squared deviation (RMSD) less than 1.5 Å. Further extending our approach, we describe computational methods for designing sequences of peptide backbones generated by other backbone sampling methods and for de novo design of new macrocyclic peptides. We extensively sampled the structural diversity of cyclic peptides of 7-13 amino acids, and identified around 10,000 unique design candidates predicted to fold into the designed structures with high confidence. X-ray crystal structures for seven sequences with diverse sizes and structures designed by our approach match very closely with the design models (RMSD < 1.0 Å), highlighting the atomic-level accuracy of our approach. The computational methods and scaffolds developed here provide the basis for custom-designing peptides for targeted therapeutic applications.
Rettie Stephen A, Campbell Katelyn V, Bera Asim K, Kang Alex, Kozlov Simon, De La Cruz Joshmyn, Adebomi Victor, Zhou Guangfeng, DiMaio Frank, Ovchinnikov Sergey, Bhardwaj Gaurav

2023-Feb-26
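
The RMSD thresholds quoted in the abstract (1.5 Å and 1.0 Å) compare atom positions between two structures. The sketch below shows the basic calculation under simplifying assumptions: the two coordinate lists are already superimposed and list the same atoms in the same order. Real comparisons normally add an optimal alignment step first, and the abstract does not say which atoms were included. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two pre-aligned sets of 3D points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is displaced by 1 along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
deviation = rmsd(predicted, native)
within_threshold = deviation < 1.5  # the cutoff quoted in the abstract
```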