
General

DeepKhib: A Deep-Learning Framework for Lysine 2-Hydroxyisobutyrylation Sites Prediction.

In Frontiers in cell and developmental biology

As a novel type of post-translational modification, lysine 2-hydroxyisobutyrylation (Khib) plays an important role in gene transcription and signal transduction. Recognizing Khib sites is the essential first step toward understanding its regulatory mechanism. Thousands of Khib sites have been experimentally verified across five different species. However, only a couple of traditional machine-learning algorithms have been developed to predict Khib sites, and only for a limited number of species; a general prediction algorithm is lacking. We constructed a deep-learning algorithm based on a convolutional neural network with the one-hot encoding approach, dubbed CNN_OH. It compares favorably with traditional machine-learning models and other deep-learning models across different species, in terms of both cross-validation and independent tests. The area under the ROC curve (AUC) values for CNN_OH ranged from 0.82 to 0.87 for different organisms, which is superior to the currently available Khib predictors. Moreover, we developed a general model based on integrated data from multiple species, and it showed great universality and effectiveness, with AUC values in the range of 0.79-0.87. Accordingly, we constructed an online prediction tool, dubbed DeepKhib, for easily identifying Khib sites, which includes both species-specific and general models. DeepKhib is available at
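The one-hot encoding step mentioned in the abstract can be sketched roughly as follows. This is only an illustration, not the DeepKhib implementation: the flank size, the padding symbol, and the function names `extract_window`, `one_hot`, and `lysine_windows` are all assumptions made for the example.

```python
# Illustrative sketch only: window size and padding are assumptions,
# not details taken from the DeepKhib paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical symbol for positions beyond the sequence ends


def extract_window(sequence, center, flank):
    """Return residues from center-flank to center+flank, padded with PAD."""
    window = []
    for i in range(center - flank, center + flank + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else PAD)
    return "".join(window)


def one_hot(window):
    """Encode each residue as a 20-element 0/1 row; PAD or unknown residues stay all zeros."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    return matrix


def lysine_windows(sequence, flank=3):
    """Yield (position, window) for every lysine (K) in the sequence."""
    for pos, residue in enumerate(sequence):
        if residue == "K":
            yield pos, extract_window(sequence, pos, flank)
```

Each window becomes a (2 * flank + 1) by 20 matrix, which is the kind of input a convolutional layer could consume; the network architecture itself is not shown here.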

Zhang Luna, Zou Yang, He Ningning, Chen Yu, Chen Zhen, Li Lei


deep learning, lysine 2-hydroxyisobutyrylation, machine learning, modification site prediction, post-translational modification

General

Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA.

In Frontiers in bioengineering and biotechnology

Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. The advancement of sequencing technology has caused DNA sequence data to grow at an explosive rate, which has pushed the study of DNA sequences into the era of big data. Machine learning is a powerful technique for analyzing large-scale data and learning from it automatically. It has been widely used in DNA sequence data analysis and has produced many research achievements. First, the review introduces the development of sequencing technology and expounds on the concepts of DNA sequence data structure and sequence similarity. Then we analyze the basic process of data mining, summarize several major machine learning algorithms, and put forward the challenges faced by machine learning algorithms in the mining of biological sequence data, along with possible future solutions. Next, we review four typical applications of machine learning in DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze their biological application background and significance, and systematically summarize the development and potential problems in the field of DNA sequence data mining in recent years. Finally, we summarize the content of the review and look ahead to some future research directions.
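As a small illustration of the sequence similarity idea the review mentions, one common alignment-free approach is to compare k-mer count profiles. The sketch below is not taken from the review; the choice of k = 3, the cosine measure, and the function names are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequence_similarity(seq1, seq2, k=3):
    """Alignment-free similarity of two sequences based on shared k-mers."""
    return cosine_similarity(kmer_counts(seq1, k), kmer_counts(seq2, k))
```

The same k-mer count vectors could also serve as input features for the classification and clustering tasks listed above, though the review does not prescribe this particular representation.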

Yang Aimin, Zhang Wei, Wang Jiahao, Yang Ke, Han Yang, Zhang Limin


DNA pattern mining, DNA sequence, DNA sequence alignment, DNA sequence classification, DNA sequence clustering, data mining, machine learning

Public Health

Develop and Evaluate a New and Effective Approach for Predicting Dyslipidemia in Steel Workers.

In Frontiers in bioengineering and biotechnology

The convolutional neural network (CNN) has made progress in image processing, language processing, medical information processing and other areas, but there have been few studies on its application in disease risk prediction. Dyslipidemia is a major and modifiable risk factor for cardiovascular disease; early detection of dyslipidemia and early intervention can effectively reduce the occurrence of cardiovascular diseases. Risk prediction models can effectively identify high-risk groups and are widely used in public health and clinical medicine. Steel workers are a special occupational group. Their particular occupational hazards, such as high temperatures, noise and shift work, make them more susceptible to disease than the general population, so risk prediction models built for the general population are no longer applicable to steel workers. Therefore, it is necessary to establish a new model dedicated to predicting dyslipidemia in steel workers. In this study, the physical examination information of thousands of steel workers was collected, and the risk factors for dyslipidemia in steel workers were screened out. Then, based on the data characteristics, the corresponding parameters were set for the convolutional neural network model, and the risk of dyslipidemia in steel workers was predicted using the convolutional neural network. Finally, the predictive performance of the convolutional neural network model was compared with that of existing predictive models of dyslipidemia, namely the logistic regression model and the BP neural network model. The results show that the convolutional neural network has good predictive performance in the risk prediction of dyslipidemia in steel workers, and is superior to the logistic regression model and the BP neural network model.

Wu Jianhui, Qin Sheng, Wang Jie, Li Jing, Wang Han, Li Huiyuan, Chen Zhe, Li Chao, Wang Jiaojiao, Yuan Juxiang


convolutional neural network, deep learning, disease model prediction, dyslipidemia, model performance comparison, steel worker

General

Evaluation of Artificial Intelligence in Participating Structure-Based Virtual Screening for Identifying Novel Interleukin-1 Receptor Associated Kinase-1 Inhibitors.

In Frontiers in oncology

Interleukin-1 receptor associated kinase-1 (IRAK1) plays important roles in inflammation, infection, and autoimmune diseases; however, only a few inhibitors have been discovered. In this study, a discriminatory structure-based virtual screening (SBVS) was employed first, but only one active compound (compound 1, IC50 = 2.25 μM) was identified. The low hit rate (2.63%) observed in our virtual screening (VS) process for IRAK1 inhibitors derives from the weak discriminatory power of docking among high-scored molecules. To enhance the traditional IRAK1-VS protocol, an artificial intelligence (AI) method was then constructed that employs a support vector machine (SVM) model integrating information from molecular docking, pharmacophore scoring and molecular descriptors. Using AI, the VS of IRAK1 inhibitors excluded over 50% of the inactive compounds, which could significantly improve the prediction accuracy of the SBVS model. Moreover, four active molecules (two of which exhibited IC50 values comparable to that of compound 1) were accurately identified from a set of highly similar candidates. Among them, the compounds with better activity exhibited good selectivity against IRAK4. The AI-assisted workflow could serve as an effective tool for enhancing SBVS.

Che Jinxin, Feng Ruiwei, Gao Jian, Yu Hongyun, Weng Qinjie, He Qiaojun, Dong Xiaowu, Wu Jian, Yang Bo


IRAK1, artificial intelligence, inhibitors, machine learning, virtual screening

General

Artificial intelligence in in vitro fertilization: a computer decision support system for day-to-day management of ovarian stimulation during in vitro fertilization.

In Fertility and sterility ; h5-index 78.0

OBJECTIVE : To describe a computer algorithm designed for in vitro fertilization (IVF) management and to assess the algorithm's accuracy in the day-to-day decision making during ovarian stimulation for IVF when compared to evidence-based decisions by the clinical team.

DESIGN : Descriptive and comparative study of new technology.

SETTING : Private fertility practice.


PATIENT(S) : Data were derived from monitoring during ovarian stimulation from IVF cycles. The database consisted of 2,603 cycles (1,853 autologous and 750 donor cycles) incorporating 7,376 visits for training. An additional 556 unique cycles were used for challenge and to calculate accuracy. There were 59,706 data points. Input variables included estradiol concentrations in picograms per milliliter; ultrasound measurements of follicle diameters in two dimensions in millimeters; cycle day during stimulation and dose of recombinant follicle-stimulating hormone during ovarian stimulation for IVF.

MAIN OUTCOME MEASURE(S) : Accuracy of the algorithm in predicting four critical clinical decisions during ovarian stimulation for IVF: [1] stop stimulation or continue stimulation. If the decision was to stop, then the next automated decision was to [2] trigger or cancel. If the decision was to continue (that is, to return for further monitoring), then the next key decisions were [3] number of days to follow-up and [4] whether any dosage adjustment was needed.

RESULT(S) : Algorithm accuracies for these four decisions are as follows: continue or stop treatment: 0.92; trigger and schedule oocyte retrieval or cancel cycle: 0.96; dose of medication adjustment: 0.82; and number of days to follow-up: 0.87. These accuracies are for the first iteration of the algorithm.

CONCLUSION(S) : We describe a first iteration of a predictive analytic algorithm that is highly accurate and in agreement with evidence-based decisions by expert teams during ovarian stimulation for IVF. These tools offer a potential platform to optimize clinical decision making during IVF.
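The four-decision cascade and the accuracy measure described in this abstract can be sketched as below. This is a hedged illustration of the structure only: the abstract does not describe the predictive model, so `visit_decisions` simply arranges already-made choices into the cascade, and accuracy is assumed to mean the fraction of visits where the algorithm matches the clinical team. All names are hypothetical.

```python
def visit_decisions(stop, trigger=None, follow_up_days=None, adjust_dose=None):
    """Arrange the four decisions into the cascade from the abstract.

    Decision 1: stop or continue stimulation.
    If stopping, decision 2: trigger or cancel.
    If continuing, decisions 3 and 4: days to follow-up and dose adjustment.
    The inputs are placeholders for whatever a model would predict.
    """
    if stop:
        return {"stimulation": "stop", "next": "trigger" if trigger else "cancel"}
    return {
        "stimulation": "continue",
        "follow_up_days": follow_up_days,
        "adjust_dose": bool(adjust_dose),
    }


def accuracy(algorithm_decisions, team_decisions):
    """Fraction of cases where the algorithm agrees with the clinical team (assumed definition)."""
    if len(algorithm_decisions) != len(team_decisions):
        raise ValueError("decision lists must have the same length")
    if not team_decisions:
        raise ValueError("at least one decision is required")
    matches = sum(a == t for a, t in zip(algorithm_decisions, team_decisions))
    return matches / len(team_decisions)
```

Under this reading, each of the four reported accuracies would be computed separately over the challenge cycles for its own decision type.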

Letterie Gerard, Mac Donald Andrew


IVF, artificial intelligence, decision support systems, predictive analytics

General

Sentiment Analysis of COVID-19 tweets by Deep Learning Classifiers-A study to show how popularity is affecting accuracy in social media.

In Applied soft computing

COVID-19, originally known as Coronavirus Disease of 2019, was declared a pandemic by the World Health Organization (WHO) on 11th March 2020. Unprecedented pressure has mounted on each country to impose compelling requirements for controlling the population by assessing cases and properly utilizing available resources. The exponentially rising number of cases globally has caused panic, fear and anxiety among people. The mental and physical health of the global population has been directly affected by this pandemic. More than twenty four million people had tested positive worldwide as of 27th August, 2020. Therefore, it is the need of the hour to implement different measures to safeguard countries by demystifying the pertinent facts and information. This paper aims to bring out the fact that tweets containing all handles related to COVID-19 and WHO have been unsuccessful in guiding people appropriately through this pandemic outbreak. This study analyses two types of tweets gathered during the pandemic. In one case, around twenty three thousand of the most re-tweeted tweets from 1st Jan 2019 to 23rd March 2020 were analysed, and the observation is that most of these tweets portray neutral or negative sentiments. On the other hand, a dataset containing 226,668 tweets collected between December 2019 and May 2020 was analysed, and it contrastingly shows that most tweets posted by netizens were positive or neutral. The research demonstrates that although people tweeted mostly positively regarding COVID-19, netizens were busy re-tweeting the negative tweets, and that no useful words could be found in the WordCloud or in computations using word frequency in tweets. The claims have been validated through a proposed model using deep learning classifiers with an admissible accuracy of up to 81%. Apart from these, the authors have proposed the implementation of a Gaussian membership function based fuzzy rule base to correctly identify sentiments from tweets. The accuracy of this model reaches a permissible rate of up to 79%.
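For readers unfamiliar with the Gaussian membership function named in the abstract, a minimal sketch follows. The function itself is the standard form exp(-(x - c)^2 / (2 * sigma^2)); everything else is an assumption for illustration: the polarity score in [-1, 1] is presumed to come from some upstream sentiment scorer, and the centers, widths, and the simple pick-the-largest rule are hypothetical, not the authors' rule base.

```python
import math


def gaussian_membership(x, center, sigma):
    """Standard Gaussian membership: 1.0 at the center, decaying with distance."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Hypothetical fuzzy sets over a polarity score in [-1, 1]: (center, sigma).
FUZZY_SETS = {
    "negative": (-1.0, 0.4),
    "neutral": (0.0, 0.4),
    "positive": (1.0, 0.4),
}


def classify(score):
    """Return the label with the highest membership, plus all membership degrees."""
    memberships = {
        label: gaussian_membership(score, center, sigma)
        for label, (center, sigma) in FUZZY_SETS.items()
    }
    best = max(memberships, key=memberships.get)
    return best, memberships
```

A score near a set's center receives a membership close to 1 for that set, while scores between centers receive partial membership in both neighbors, which is what lets a fuzzy rule base handle borderline tweets more gradually than a hard threshold.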

Chakraborty Koyel, Bhatia Surbhi, Bhattacharyya Siddhartha, Platos Jan, Bag Rajib, Hassanien Aboul Ella


00-01, 99-00, COVID-19, Deep learning, Emotional intelligence, Fuzzy rule, Gaussian membership function, Sentiment analysis, Tweets, WHO