Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

Radiology

Tubular Shape Aware Data Generation for Semantic Segmentation in Medical Imaging

ArXiv Preprint

Chest X-ray is one of the most widespread examinations of the human body. In interventional radiology, its use is frequently associated with the need to visualize various tube-like objects, such as puncture needles, guiding sheaths, wires, and catheters. Detection and precise localization of these tube-like objects in X-ray images are, therefore, of utmost value, catalyzing the development of accurate target-specific segmentation algorithms. As in other medical imaging tasks, the manual pixel-wise annotation of the tubes is a resource-consuming process. In this work, we aim to alleviate the lack of annotated images by using artificial data. Specifically, we present an approach for synthetic data generation of tube-shaped objects, with a generative adversarial network regularized by a prior shape constraint. Our method eliminates the need for paired image–mask data and requires only a weakly labeled dataset (10–20 images) to reach the accuracy of fully supervised models. We report the applicability of the approach for the task of segmenting tubes and catheters in X-ray images, although the results should also hold for other imaging modalities.

Ilyas Sirazitdinov, Heinrich Schulz, Axel Saalbach, Steffen Renisch, Dmitry V. Dylov


General

Answerable and Unanswerable Questions in Risk Analysis with Open-World Novelty.

In Risk analysis : an official publication of the Society for Risk Analysis

Decision analysis and risk analysis have grown up around a set of organizing questions: what might go wrong, how likely is it to do so, how bad might the consequences be, what should be done to maximize expected utility and minimize expected loss or regret, and how large are the remaining risks? In probabilistic causal models capable of representing unpredictable and novel events, probabilities for what will happen, and even what is possible, cannot necessarily be determined in advance. Standard decision and risk analysis questions become inherently unanswerable ("undecidable") for realistically complex causal systems with "open-world" uncertainties about what exists, what can happen, what other agents know, and how they will act. Recent artificial intelligence (AI) techniques enable agents (e.g., robots, drone swarms, and automatic controllers) to learn, plan, and act effectively despite open-world uncertainties in a host of practical applications, from robotics and autonomous vehicles to industrial engineering, transportation and logistics automation, and industrial process control. This article offers an AI/machine learning perspective on recent ideas for making decision and risk analysis (even) more useful. It reviews undecidability results and recent principles and methods for enabling intelligent agents to learn what works and how to complete useful tasks, adjust plans as needed, and achieve multiple goals safely and reasonably efficiently when possible, despite open-world uncertainties and unpredictable events. In the near future, these principles could contribute to the formulation and effective implementation of more effective plans and policies in business, regulation, and public policy, as well as in engineering, disaster management, and military and civil defense operations. 
They can extend traditional decision and risk analysis to deal more successfully with open-world novelty and unpredictable events in large-scale real-world planning, policymaking, and risk management.

Cox Louis Anthony


Decision analysis, risk analysis

General

Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks.

In International journal of computer assisted radiology and surgery

PURPOSE : Most historical surgical skill research analyzes holistic, summary task-level metrics to classify the skill of a performance. Recent advances in machine learning allow time-series classification at the sub-task level, enabling predictions on segments of tasks, which could improve task-level technical skill assessment.

METHODS : A bidirectional long short-term memory (LSTM) network was used with 8-s windows of multidimensional time-series data from the Basic Laparoscopic Urologic Skills dataset. The network was trained on experts and novices from four common surgical tasks. Stratified cross-validation with regularization was used to avoid overfitting. The misclassified cases were re-submitted to crowds on Amazon Mechanical Turk for re-evaluation of surgical technical skill and to analyze the level of agreement with the previous scores.
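
Before a recurrent network can classify skill on segments of a task, the continuous multidimensional recording has to be cut into fixed-length windows. The sketch below shows that windowing step in pure Python; the window length and stride are illustrative assumptions (the paper specifies 8-s windows but not the sampling rate or overlap), not the authors' exact preprocessing.

```python
# Hypothetical sketch: slicing a multidimensional time series into fixed-size
# windows, as a sub-task-level classifier such as a BiLSTM would consume.

def make_windows(series, window_len, stride):
    """Split a list of per-timestep feature vectors into overlapping windows."""
    windows = []
    for start in range(0, len(series) - window_len + 1, stride):
        windows.append(series[start:start + window_len])
    return windows

# Toy example: 20 timesteps of 3-dimensional tool-motion data.
series = [[t, t * 0.5, t * 0.1] for t in range(20)]
windows = make_windows(series, window_len=8, stride=4)
# With stride = window_len // 2, consecutive windows overlap by half.
```

Each window would then be fed to the classifier independently, and window-level predictions aggregated back up to a task-level skill score.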

RESULTS : Performance was best for the suturing task, with 96.88% accuracy at predicting whether a performance came from an expert or a novice, with 1 misclassification, when compared to previously obtained crowd evaluations. When compared with expert surgeon ratings, the LSTM predictions resulted in a Spearman coefficient of 0.89 for suturing tasks. When crowds re-evaluated misclassified performances, it was found that for all 5 misclassified cases from the peg transfer and suturing tasks, the crowds agreed more with our LSTM model than with the previously obtained crowd scores.

CONCLUSION : The technique presented shows results not incomparable with crowd-sourced labels of surgical tasks. However, these results raise questions about the reliability of crowd-sourced labels in videos of surgical tasks. We, as a research community, should examine crowd labeling with higher scrutiny, systematically look at biases, and quantify label noise.

Kelly Jason D, Petersen Ashley, Lendvay Thomas S, Kowalewski Timothy M


Bidirectional LSTM, Crowd sourcing, Machine learning, Surgical skill, Surgical technical skill

Radiology

Development of machine learning models for predicting postoperative delayed remission of patients with Cushing's disease.

In The Journal of clinical endocrinology and metabolism

CONTEXT : Postoperative hypercortisolemia mandates further therapy in patients with Cushing's disease (CD). Delayed remission (DR) is defined as not achieving postoperative immediate remission (IR), but having spontaneous remission during long-term follow-up.

OBJECTIVE : We aimed to develop and validate machine learning (ML) models for predicting DR in non-IR patients with CD.

METHODS : We enrolled 201 CD patients, and randomly divided them into training and test datasets. We then used the recursive feature elimination (RFE) algorithm to select features, and applied five ML algorithms to construct DR prediction models. We used permutation importance and local interpretable model-agnostic explanation (LIME) algorithms to determine the importance of the selected features and interpret the ML models.
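
The core of RFE is simple: repeatedly rank the surviving features and drop the weakest until the desired count remains. The sketch below shows that loop in plain Python; the feature names and the importance function are toy stand-ins, not the paper's actual clinical variables or model.

```python
# Illustrative sketch of recursive feature elimination (RFE): re-rank the
# surviving features each round, then drop the least important one.

def rfe(features, importance_fn, n_keep):
    """Return the n_keep features that survive recursive elimination."""
    surviving = list(features)
    while len(surviving) > n_keep:
        scores = {f: importance_fn(f, surviving) for f in surviving}
        weakest = min(surviving, key=scores.get)
        surviving.remove(weakest)
    return surviving

# Toy importance function (string length) purely for demonstration; in
# practice this would come from refitting the model on the surviving set.
features = ["age", "bmi", "knosp_grade", "poc_cortisol", "ufc_24h"]
selected = rfe(features, lambda f, ctx: len(f), n_keep=3)
```

The key design point is that importances are recomputed after every elimination, so a feature that looked weak in the full set can survive once correlated competitors are removed.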

RESULTS : Eighty-eight (43.8%) of the 201 CD patients met the criteria for DR. Overall, patients who were younger, had a lower body-mass index, had Knosp grade III-IV, or had a tumor not found on pathological examination tended to achieve a lower rate of DR. After RFE feature selection, the Adaboost model, which comprised 18 features, had the greatest discriminatory ability, and its predictive ability was significantly better than that of Knosp grade and postoperative immediate morning serum cortisol (PoC). The results obtained from the permutation importance and LIME algorithms showed that preoperative 24-hour urine free cortisol, PoC, and age were the most important features, and demonstrated the reliability and clinical practicability of the Adaboost model in DR prediction.
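
Permutation importance, used above to rank the predictors, measures how much a model's score drops when one feature's values are shuffled across patients. A minimal sketch in pure Python, with a toy threshold model standing in for the paper's Adaboost classifier:

```python
import random

# Hypothetical sketch of permutation importance: shuffle one feature column,
# re-score the model, and report the drop in accuracy. Model and data are
# toy stand-ins for the paper's fitted classifier and clinical features.

def permutation_importance(predict, X, y, col, seed=0):
    """Accuracy drop after shuffling column `col` of dataset X."""
    base = sum(predict(row) == label for row, label in zip(X, y)) / len(X)
    rng = random.Random(seed)
    shuffled_col = [row[col] for row in X]
    rng.shuffle(shuffled_col)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled_col)]
    perm = sum(predict(row) == label for row, label in zip(X_perm, y)) / len(X)
    return base - perm

# Toy model: uses only feature 0; feature 1 is pure noise.
X = [[0.1, 5], [0.9, 2], [0.2, 7], [0.8, 1]]
y = [0, 1, 0, 1]
model = lambda row: int(row[0] > 0.5)
drop_informative = permutation_importance(model, X, y, col=0)
drop_noise = permutation_importance(model, X, y, col=1)
```

Shuffling a feature the model never uses leaves accuracy unchanged, so its importance is zero; in practice the shuffle is repeated several times and the drops averaged.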

CONCLUSIONS : ML-based models could serve as an effective non-invasive approach to predicting DR, and could aid in determining individual treatment and follow-up strategies for CD patients.

Fan Yanghua, Li Yichao, Bao Xinjie, Zhu Huijuan, Lu Lin, Yao Yong, Li Yansheng, Su Mingliang, Feng Feng, Feng Shanshan, Feng Ming, Wang Renzhi


Cushing's disease, Delayed remission, Local interpretable model-agnostic explanation, Machine learning

General

Predicting cardiac arrest in the emergency department.

In Journal of the American College of Emergency Physicians open

In-hospital cardiac arrest remains a leading cause of death: roughly 300,000 in-hospital cardiac arrests occur each year in the United States, ≈10% of which occur in the emergency department. ED-based cardiac arrest may represent a subset of in-hospital cardiac arrest with a higher proportion of reversible etiologies and a higher potential for neurologically intact survival. Patients presenting to the ED have become increasingly complex, have a high burden of critical illness, and face crowded departments with thinly stretched resources. As a result, patients in the ED are vulnerable to unrecognized clinical deterioration that may lead to ED-based cardiac arrest. Efforts to identify patients who may progress to ED-based cardiac arrest have traditionally been approached through identification of critically ill patients at triage and the identification of patients who unexpectedly deteriorate during their stay in the ED. Interventions to facilitate appropriate triage and resource allocation, as well as earlier identification of patients at risk of deterioration in the ED, could potentially allow for both prevention of cardiac arrest and optimization of outcomes from ED-based cardiac arrest. This review will discuss the epidemiology of ED-based cardiac arrest, as well as commonly used approaches to predict ED-based cardiac arrest and highlight areas that require further research to improve outcomes for this population.

Mitchell Oscar J L, Edelson Dana P, Abella Benjamin S


cardiac arrest, deterioration, early warning scores, machine learning, prediction, quality improvement, triage

General

DIY AI, deep learning network development for automated image classification in a point-of-care ultrasound quality assurance program.

In Journal of the American College of Emergency Physicians open

Background : Artificial intelligence (AI) is increasingly a part of daily life and offers great possibilities to enrich health care. Imaging applications of AI have been mostly developed by large, well-funded companies and currently are inaccessible to the comparatively small market of point-of-care ultrasound (POCUS) programs. Given this absence of commercial solutions, we sought to create and test a do-it-yourself (DIY) deep learning algorithm to classify ultrasound images to enhance the quality assurance workflow for POCUS programs.

Methods : We created a convolutional neural network using publicly available software tools and pre-existing convolutional neural network architecture. The convolutional neural network was subsequently trained using ultrasound images from seven ultrasound exam types: pelvis, heart, lung, abdomen, musculoskeletal, ocular, and central vascular access from 189 publicly available POCUS videos. Approximately 121,000 individual images were extracted from the videos, 80% were used for model training and 10% each for cross validation and testing. We then tested the algorithm for accuracy against a set of 160 randomly extracted ultrasound frames from ultrasound videos not previously used for training and that were performed on different ultrasound equipment. Three POCUS experts blindly categorized the 160 random images, and results were compared to the convolutional neural network algorithm. Descriptive statistics and Krippendorff alpha reliability estimates were calculated.
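
The 80/10/10 division of the extracted frames can be done with a seeded shuffle so the split is reproducible across runs. A minimal sketch, with illustrative frame names rather than the study's actual files:

```python
import random

# Minimal sketch of an 80/10/10 train/validation/test split of image frames,
# using a seeded shuffle for reproducibility. Frame names are hypothetical.

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle items deterministically and cut into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

frames = [f"frame_{i:05d}.png" for i in range(1000)]
train_set, val_set, test_set = split_dataset(frames)
```

One caveat worth noting when frames are extracted from videos: a purely frame-level split can put near-identical frames from the same clip into both training and test sets, so grouping frames by their source video before splitting is a common safeguard against leakage.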

Results : Cross-validation accuracy of the convolutional neural network approached 99%. The algorithm accurately classified 98% of the test ultrasound images. In the new POCUS program simulation phase, the algorithm accurately classified 70% of 160 new images, for moderate correlation with the ground truth, α = 0.64. The three blinded POCUS experts correctly classified 93%, 94%, and 98% of the images, respectively. There was excellent agreement among the experts, with α = 0.87. Agreement between experts and algorithm was good, with α = 0.74. The most common error was misclassifying musculoskeletal images, for both the algorithm (40%) and the POCUS experts (40.6%). The algorithm took 7 minutes 45 seconds to review and classify the 160 new images. The three expert reviewers took 27, 32, and 45 minutes to classify the images, respectively.

Conclusions : Our algorithm accurately classified 98% of new images by body scan area when they were related to its training pool, simulating POCUS program workflow. Performance diminished with exam images from an unrelated image pool and different ultrasound equipment, suggesting that additional images and further convolutional neural network training are necessary for fine-tuning when the algorithm is used across different POCUS programs. The algorithm showed theoretical potential to improve workflow for POCUS program directors, if fully implemented. The implications of our DIY AI approach for POCUS are scalable, and further work to maximize the collaboration between AI and POCUS programs is warranted.

Blaivas Michael, Arntfield Robert, White Matthew


artificial intelligence, deep learning, emergency medicine, emergency ultrasound, point-of-care ultrasound