Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

In International journal of biological macromolecules

The cell surface proteins of gram-positive bacteria are involved in many important biological functions, including the infection of host cells. Owing to their virulent nature, these proteins are also considered strong candidates for potential drug or vaccine targets. Among the various cell surface proteins of gram-positive bacteria, LPXTG-like proteins form a major class. These proteins have a highly conserved C-terminal cell wall sorting signal, which consists of an LPXTG sequence motif, a hydrophobic domain, and a positively charged tail. These surface proteins are targeted to the cell envelope by a sortase enzyme via transpeptidation. A variety of LPXTG-like proteins have been experimentally characterized; however, their number in public databases has increased owing to extensive bacterial genome sequencing without proper annotation. In the absence of experimental characterization, identifying and annotating these sequences is extremely challenging. Therefore, in this study, we developed the first machine learning-based predictor called GPApred, which can identify LPXTG-like proteins from their primary sequences. Using a newly constructed benchmark dataset, we explored different classifiers and five feature encodings and their hybrids. Optimal features were derived using the recursive feature elimination (RFE) method, and these features were then trained using a support vector machine algorithm. The performance of different models was evaluated using independent datasets, and a final model (GPApred) was selected based on consistency during cross-validation and independent assessment. GPApred can be an effective tool for predicting LPXTG-like sequences and can be further employed for functional characterization or drug targeting. Availability: https://procarb.org/gpapred/.

Malik Adeel, Shoombuatong Watshara, Kim Chang-Bae, Manavalan Balachandran

2022-Dec-31

Cell wall sorting signal, Feature selection, Machine learning, Sortase, Support vector machine, Surface proteins