Automated histological classification of whole slide images of colorectal biopsy specimens

Background: An automated image analysis system, e-Pathologist, was developed to improve the quality of colorectal biopsy diagnostics in routine pathology practice.
Objective: The aim of the study was to evaluate the classification accuracy of the e-Pathologist image analysis software in the setting of routine pathology practice at two institutions.
Materials and methods: In total, 1328 colorectal tissue specimens were consecutively obtained from two hospitals (1077 tissues from Tokyo hospital and 251 tissues from East hospital), and the stained specimen slides were anonymized and digitized. At least two experienced gastrointestinal pathologists evaluated each slide for pathological diagnosis. We compared the 3-tier classification results (carcinoma or suspicion of carcinoma; adenoma; and negative for a neoplastic lesion) between the human pathologists and the e-Pathologist.
Results: For the Tokyo hospital specimens, all carcinoma tissues were correctly classified (n=112), and 9.9% (80/810) of the adenoma tissues were incorrectly classified as negative. For the East hospital specimens, none of the 51 adenoma tissues were incorrectly classified as negative, whereas 9.3% (11/118) of the carcinoma tissues were incorrectly classified as either adenoma or negative. For the Tokyo and East hospital datasets, respectively, the undetected rate of carcinoma was 0% and 9.3%, the undetected rate of adenoma was 9.9% and 0%, and the over-detection proportion was 36.1% and 27.1%.
Conclusions: This image analysis system requires some improvements; however, it has the potential to assist pathologists in the quality improvement of routine pathological practice in the not too distant future.


Supplementary Material 2. Color normalization
Our procedure was as follows. First, regions of nuclei and background were roughly extracted from an image according to color and shape, and the average RGB color of the nuclear regions and of the background was computed. Second, in order to transform the computed average colors of the nuclear regions and background to a predefined nuclear color and to white, respectively, a transformation curve was computed for each of the RGB components; the transformation curve was a gamma curve fitted to these constraints. Finally, the input RGB color image was corrected using the transformation curves calculated above. This method can be applied to image datasets collected under various experimental setups. Supplementary Figure S4 shows an example of color normalization and the resulting color distribution.
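The per-channel correction described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes the gamma curve is anchored so that the observed background average maps to white (or a chosen target) and the observed nuclear average maps to the predefined nuclear value; the function name and parameters are our own.

```python
import math

def gamma_normalize_channel(values, obs_nuclear, obs_background,
                            target_nuclear, target_background=255.0):
    """Normalize one RGB channel with a single gamma curve (sketch).

    The curve is chosen so that obs_background maps to target_background
    and obs_nuclear maps to target_nuclear, mirroring the two anchor
    colors (nuclear average and background) used in the paper.
    """
    # Linear scale so the observed background lands on the target background.
    scale = target_background / obs_background
    # Pick gamma so the scaled nuclear average lands on the nuclear target.
    gamma = (math.log(target_nuclear / target_background)
             / math.log((obs_nuclear * scale) / target_background))
    out = []
    for v in values:
        # Normalize to [0, 1], apply the gamma curve, rescale.
        x = min(max(v * scale / target_background, 0.0), 1.0)
        out.append(target_background * x ** gamma)
    return out
```

By construction, a pixel at the measured background average is mapped exactly to the target background, and one at the measured nuclear average to the target nuclear value; intermediate intensities follow the gamma curve.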

Supplementary Material 3. Structural atypia analysis
In pre-processing (Step 1), a mask image was created to eliminate non-target areas such as lymph nodules and crush artifacts. We proposed a method for automated segmentation of lymph nodules. The algorithm involved three steps: enhancement of the hematoxylin color, binarization, and dilation of the binary image to expand the extracted objects corresponding to lymph nodules.
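The three-step masking pipeline can be sketched as below. This is a simplified stand-in, not the authors' code: the hematoxylin "enhancement" is approximated here by a blue-minus-red score (hematoxylin-rich nuclei are blue-dominant), and the threshold value and structuring element are our own assumptions.

```python
def lymph_nodule_mask(rgb, threshold=120, dilate_iters=1):
    """Rough lymph-nodule mask (sketch): hematoxylin emphasis,
    binarization, then binary dilation. `rgb` is a 2-D grid of
    (r, g, b) tuples; returns a 2-D grid of 0/1 values.
    """
    h, w = len(rgb), len(rgb[0])
    # 1. Hematoxylin enhancement: blue-dominant pixels score high
    #    (assumed proxy for the paper's color enhancement).
    score = [[rgb[y][x][2] - rgb[y][x][0] for x in range(w)] for y in range(h)]
    # 2. Binarization at a fixed threshold.
    mask = [[1 if score[y][x] >= threshold else 0 for x in range(w)]
            for y in range(h)]
    # 3. Dilation with a 3x3 structuring element to expand the objects.
    for _ in range(dilate_iters):
        new = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w:
                                new[ny][nx] = 1
        mask = new
    return mask
```

In practice the enhancement and dilation would be done with a stain-deconvolution step and morphological operators from an image library; the loop structure above only makes the three stages explicit.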
In low-magnification analysis (Step 2), the thickness of glandular nuclei was quantified to evaluate glandular atypia on the low-magnification (1.25x) image, at which the thickness of normal glandular nuclei is approximately one pixel. We proposed a method for quantifying the thickness of glandular nuclei, which corresponds to the glandular atypia level. The algorithm involved eight steps: (1) reduction of the image size using bicubic interpolation [2], (2) conversion of RGB values (24-bit) into grayscale values (8-bit), (3) suppression of low maxima to remove less significant objects [3], (4) enhancement of contrast using morphological operations [4], (5) binarization, (6) deletion of the regions superposed on the lymph nodules extracted in Step 1, (7) calculation of a level value in the range of 1 to 10 for each glandular nucleus component from the ratios R_S = S_o / S_bb and R_L = S_o / L_bb, where S_o is the area of an object, S_bb is the area of its bounding box, and L_bb is the perimeter of its bounding box, and (8) classification of an input tissue image as thin-class or thick-class by a rule-based classifier using the following features: the number of objects per level, the cumulative area of objects per level, and the maximum area of objects per level. When a tissue image was classified as thick-class, the process proceeded to the high-magnification analysis (Step 3). Two examples of the image processing for glandular segmentation are shown in Supplementary Figure S5.
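The bounding-box ratios defined in step (7) can be computed as follows. This is only a sketch of the two ratios (the exact mapping from R_S and R_L to the 1–10 level value is not given in the text, so it is not reproduced here); the function name and input representation are our own.

```python
def bbox_shape_ratios(pixels):
    """Compute R_S = S_o / S_bb and R_L = S_o / L_bb for one
    glandular-nucleus component, where S_o is the object's area
    (pixel count), S_bb its bounding-box area, and L_bb its
    bounding-box perimeter. `pixels` is a list of (x, y) coordinates.
    """
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    # Bounding box in pixel units (inclusive of both endpoints).
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    s_o = len(pixels)          # object area
    s_bb = w * h               # bounding-box area
    l_bb = 2 * (w + h)         # bounding-box perimeter
    return s_o / s_bb, s_o / l_bb
```

R_S measures how completely a component fills its bounding box, while R_L grows with absolute size relative to the box's perimeter, so together they separate thin, elongated nuclear strands from thick, compact clusters.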
In high-magnification analysis (Step 3), the local arrangement of glandular components was evaluated. We proposed a method for segmenting glandular nuclei, cytoplasm, lumen, and stroma, and for evaluating the distribution of glandular nuclei and cytoplasm. The algorithm involved six steps: (1) segmentation of a tissue image into glandular nuclei, glandular cytoplasm (including goblet cells), lumen, and stroma using color information, (2) extraction of glandular components by combining glandular nuclei and cytoplasm, (3) deletion of the regions superposed on the lymph nodules extracted in Step 1, (4) calculation of six features for each gland: f1, the number of nuclear components; f2, the ratio of nuclear component area to glandular area; f3, the average area of the nuclear components (nuclear component area / the number of nuclear components); f4, the ratio of the number of nuclear components to glandular area; f5, the ratio of the nuclear component area within the lumen-side half of the gland thickness to that within the stroma-side half, determined from the lumen location (Supplementary Figure S6); and f6, the same ratio determined from the stroma location (Supplementary Figure S6), (5) classification of each gland as disturbed-class, middle-class, or regular-class based on a rule constructed by the rule-based classifier and modified in consideration of its morphological meaning, and (6) classification of the input tissue image as high, middle, or low atypia level: high if at least one gland is disturbed-class, otherwise middle if at least one gland is middle-class, and low otherwise.
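The per-gland features f1–f4 and the image-level aggregation rule of step (6) can be sketched as below. This is a hypothetical illustration with our own function names; f5 and f6 are omitted because they require the lumen/stroma geometry of Supplementary Figure S6.

```python
def gland_features(nuclear_areas, gland_area):
    """Features f1-f4 from step (4) for one gland (sketch).

    nuclear_areas: list of per-component nuclear areas within the gland.
    gland_area:    total area of the gland.
    """
    f1 = len(nuclear_areas)            # number of nuclear components
    total = sum(nuclear_areas)
    f2 = total / gland_area            # nuclear area / glandular area
    f3 = total / f1                    # mean nuclear-component area
    f4 = f1 / gland_area               # component count / glandular area
    return f1, f2, f3, f4

def image_atypia_level(gland_classes):
    """Image-level label from per-gland classes, following step (6):
    any disturbed-class gland -> high atypia; otherwise any
    middle-class gland -> middle atypia; otherwise low atypia.
    """
    if "disturbed" in gland_classes:
        return "high"
    if "middle" in gland_classes:
        return "middle"
    return "low"
```

The aggregation is deliberately conservative: a single disturbed gland is enough to flag the whole tissue image as high atypia, which matches the screening intent of the system.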