The field of Machine Learning (ML), a subset of Artificial Intelligence (AI), has
led to remarkable advances in many areas, including medicine. ML algorithms
require large datasets to train computer models successfully.
Although some medical image datasets are available, more are needed across a
variety of medical entities, especially in cancer pathology. ML-ready image
datasets are scarcer still. To address this need, we created an
image dataset (LC25000) with 25,000 color images in 5 classes. Each class
contains 5,000 images of the following histologic entities: colon
adenocarcinoma, benign colonic tissue, lung adenocarcinoma, lung squamous cell
carcinoma, and benign lung tissue. All images are de-identified, HIPAA-compliant,
validated, and freely available for download to AI researchers.
Andrew A. Borkowski, Marilyn M. Bui, L. Brannon Thomas, Catherine P. Wilson, Lauren A. DeLand, Stephen M. Mastorides