Skin Cancer Detection Using Machine Learning
The purpose of this project is to create a tool that, given an image of a mole, calculates the probability that the mole is malignant.
Skin cancer is a common disease that affects a large number of people. Some facts about skin cancer:
Every year there are more new cases of skin cancer than the combined incidence of cancers of the breast, prostate, lung and colon.
An estimated 87,110 new cases of invasive melanoma will be diagnosed in the U.S. in 2017.
The estimated 5-year survival rate for patients whose melanoma is detected early is about 98 percent in the U.S. The survival rate falls to 62 percent when the disease reaches the lymph nodes, and 18 percent when the disease metastasizes to distant organs.
Development process and Data
The idea of this project is to build a CNN model that predicts the probability that a specific mole is malignant.
Data: Skin cancer Detection using Machine learning
To train this model I'm planning to use a set of images from the International Skin Imaging Collaboration:
Melanoma Project ISIC https://isic-archive.com.
The specific datasets to use are:
ISICUDA-21: Moles and melanomas. Biopsy-confirmed melanocytic lesions. Both malignant and benign lesions are included.
ISICUDA-11: Moles and melanomas. Biopsy-confirmed melanocytic lesions. Both malignant and benign lesions are included.
ISICMSK-21: Benign and malignant skin lesions. Biopsy-confirmed melanocytic and non-melanocytic lesions.
Benign: 1167 (Not used)
ISICMSK-12: Both malignant and benign melanocytic and non-melanocytic lesions. Almost all images confirmed by histopathology. Images not taken with modern digital cameras.
ISICMSK-11: Moles and melanomas. Biopsy-confirmed melanocytic lesions, both malignant and benign.
Benign: 448 Malign: 224
In summary, the total number of images to use is:
|Benign Images||Malign Images|
Some sample images are shown below:
1. Sample images of benign moles:
2. Sample images of malign moles:
The following preprocessing tasks will be applied to each image:
1. Visual inspection to detect low-quality or unrepresentative images
2. Image resizing: transform images to 128x128x3
3. Image cropping: automatic or manual crop
4. Other steps, to be defined later, to improve model quality
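The cropping and resizing steps could be sketched as follows. This is a minimal illustration and not the project's actual pipeline: it assumes each image is already loaded as a uint8 RGB NumPy array, uses a centered square crop as one possible "automatic crop", and resizes with nearest-neighbour sampling to avoid extra dependencies. The function names and the scaling of pixel values to [0, 1] are assumptions made for this example.

```python
import numpy as np

TARGET_SIZE = 128  # side length taken from the text (128x128x3)


def center_crop_square(image: np.ndarray) -> np.ndarray:
    """Crop the largest centered square from an (H, W, 3) image."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return image[top:top + side, left:left + side]


def resize_nearest(image: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Resize an (H, W, 3) image to (size, size, 3) by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    # Map each output row/column back to a source row/column.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols[None, :]]


def preprocess(image: np.ndarray) -> np.ndarray:
    """Center-crop, resize to 128x128x3, and scale pixels to [0, 1] (assumed convention)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    cropped = center_crop_square(image)
    resized = resize_nearest(cropped)
    return resized.astype(np.float32) / 255.0
```

In practice an image library with proper interpolation (bilinear or bicubic) would likely give better results than nearest-neighbour sampling. The crop is applied before the resize here so that the aspect ratio of the lesion is preserved.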
The idea is to develop a simple CNN model from scratch and evaluate its performance to set a baseline. The following steps will then be used to improve the model:
1. Data augmentation: rotations, added noise, and scaling, to avoid overfitting
2. Transfer learning: take a pre-trained network (VGG-16 or another) and add some layers at the end to fine-tune it for our task
3. Others, to be defined.
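As an illustration of the data augmentation step only, the sketch below applies a random rotation, a random zoom (scaling), and Gaussian noise to a preprocessed image. It is a simplified example under several assumptions: images are square float arrays in [0, 1], rotations are restricted to multiples of 90 degrees so that no interpolation is needed, and the function names and default parameters (`min_scale`, `noise_std`) are hypothetical choices for this example.

```python
import numpy as np


def random_zoom(image: np.ndarray, rng: np.random.Generator,
                min_scale: float = 0.8) -> np.ndarray:
    """Crop a random square covering min_scale..1.0 of the side, then resize back."""
    size = image.shape[0]
    side = int(round(rng.uniform(min_scale, 1.0) * size))
    top = int(rng.integers(0, size - side + 1))
    left = int(rng.integers(0, size - side + 1))
    crop = image[top:top + side, left:left + side]
    # Nearest-neighbour upsampling back to the original side length.
    idx = np.arange(size) * side // size
    return crop[idx[:, None], idx[None, :]]


def augment(image: np.ndarray, rng: np.random.Generator,
            noise_std: float = 0.02) -> np.ndarray:
    """Return a randomly rotated, zoomed, and noised copy of a square image in [0, 1]."""
    quarter_turns = int(rng.integers(0, 4))  # 0, 90, 180, or 270 degrees
    out = np.rot90(image, k=quarter_turns, axes=(0, 1))
    out = random_zoom(out, rng)
    noise = rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out + noise, 0.0, 1.0).astype(np.float32)


# Example: generate four augmented variants of one (random placeholder) image.
rng = np.random.default_rng(seed=0)
image = rng.random((128, 128, 3)).astype(np.float32)
variants = [augment(image, rng) for _ in range(4)]
```

Deep learning frameworks usually ship their own augmentation utilities, including arbitrary-angle rotations, and those may be a better fit once the model code exists.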
To evaluate the different models we will use ROC curves and the AUC score. To choose the final model we will evaluate precision and accuracy, and set the classification threshold at a level that represents a good tradeoff between TPR (true positive rate) and FPR (false positive rate).
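The evaluation described above can be sketched with plain NumPy. The example computes the ROC curve points, the AUC by the trapezoidal rule, and one candidate threshold. Picking the threshold that maximizes TPR minus FPR (Youden's J statistic) is an assumption made here for illustration; the text does not fix a specific rule, and for a medical screening tool a threshold that favors a higher TPR may be preferable. Function names are hypothetical, and labels are assumed to be 1 for malign and 0 for benign.

```python
import numpy as np


def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr, thresholds), with thresholds in descending order."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    thresholds = np.unique(y_score)[::-1]  # distinct scores, highest first
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    # A sample is predicted positive when its score is >= the threshold.
    tpr = np.array([(y_score[y_true] >= t).sum() / n_pos for t in thresholds])
    fpr = np.array([(y_score[~y_true] >= t).sum() / n_neg for t in thresholds])
    # Prepend the (0, 0) point, reached by a threshold above every score.
    fpr = np.concatenate(([0.0], fpr))
    tpr = np.concatenate(([0.0], tpr))
    thresholds = np.concatenate(([np.inf], thresholds))
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def youden_threshold(fpr, tpr, thresholds):
    """Threshold maximizing TPR - FPR (assumed selection rule)."""
    return float(thresholds[np.argmax(tpr - fpr)])


# Toy example: 1 = malign, 0 = benign, scores are predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve_points(labels, scores)
print("AUC:", auc(fpr, tpr))
print("Threshold:", youden_threshold(fpr, tpr, thresholds))
```

With real predictions, the same functions would be called on a held-out validation set for each candidate model, and the AUC values compared.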