--- a
+++ b/README.md
@@ -0,0 +1,35 @@
+## About Dataset
+### Relevant information:
+
+Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The separating plane was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method that uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes. The linear program used to obtain the separating plane in the 3-dimensional space is described in [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
+
+Number of instances: 569
+
+Number of attributes: 32 (ID, diagnosis, 30 real-valued input features)
+
+Diagnosis (M = malignant, B = benign)
+
+Ten real-valued features are computed for each cell nucleus; the mean, standard error, and "worst" value (mean of the three largest values) of each feature are recorded per image, giving the 30 input features:
+
+a) radius (mean of distances from center to points on the perimeter)
+b) texture (standard deviation of gray-scale values)
+c) perimeter
+d) area
+e) smoothness (local variation in radius lengths)
+f) compactness (perimeter^2 / area - 1.0)
+g) concavity (severity of concave portions of the contour)
+h) concave points (number of concave portions of the contour)
+i) symmetry
+j) fractal dimension ("coastline approximation" - 1)
+
+Missing attribute values: none
+
+Class distribution: 357 benign, 212 malignant
+
+### Creators:
+
+Dr. William H. Wolberg, General Surgery Dept., University of Wisconsin.
+
+W. Nick Street, Computer Sciences Dept., University of Wisconsin.
+
+Olvi L. Mangasarian, Computer Sciences Dept., University of Wisconsin.
\ No newline at end of file
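
The compactness definition and the counts quoted in the README can be checked with a short sketch. The Python below is illustrative only: the function name `compactness`, the constants, and the circle used as input are assumptions made for this example and do not come from the dataset or its original tooling.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as written in the feature list: perimeter^2 / area - 1.0."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area - 1.0


# Hypothetical input: an ideal circle of radius r, not a row from the dataset.
r = 3.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)  # 4*pi - 1 for any r

# Counts quoted in the README.
N_BENIGN, N_MALIGNANT = 357, 212
N_INSTANCES = N_BENIGN + N_MALIGNANT        # should match the stated 569
benign_share = N_BENIGN / N_INSTANCES       # accuracy of always predicting "B"

print(f"circle compactness: {circle:.4f}")
print(f"instances: {N_INSTANCES}, benign share: {benign_share:.3f}")
```

A circle minimizes perimeter^2 / area, so `4 * math.pi - 1` is the smallest value this formula can produce for any planar outline; more irregular contours score higher. The values stored in the dataset may be on a different scale, so this sketch follows only the formula as written. The benign share (about 0.627) is a useful baseline when judging any classifier trained on these 569 instances.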