bioRxiv Preprint

Artificial intelligence (AI) programs that train on large amounts of data require powerful compute infrastructure. JupyterLab notebooks provide an excellent framework for developing AI programs, but they must be hosted on powerful infrastructure for those programs to train on large data. An open-source, Docker-based, GPU-enabled JupyterLab notebook infrastructure has been developed that runs on the public compute infrastructure of Galaxy Europe for rapid prototyping and development of end-to-end AI projects. Using this notebook, long-running AI model training programs can be executed remotely. Trained models, represented in the standard Open Neural Network Exchange (ONNX) format, and other resulting datasets are created in Galaxy. Other features include GPU support for faster training, git integration for version control, the option to create and execute pipelines of notebooks, and multiple dashboards for monitoring compute resources. These features make the JupyterLab notebook well suited to creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 CT scan images is reproduced using several features of this notebook. In addition, ColabFold, a faster implementation of AlphaFold2, can be accessed in this notebook to predict the 3D structure of protein sequences. The JupyterLab notebook is accessible in two ways: as an interactive Galaxy tool, or by running the underlying Docker container. Either way, long-running training can be executed on Galaxy's compute infrastructure.

Kumar, A.; Cuccuru, G.; Gruening, B.; Backofen, R.

2022-07-11