Lesion detection and segmentation using a convolutional network of 3D patches (MICCAI-MSSEG 2016)

Files for the MSSEG challenge of MICCAI 2016. This repository contains the code and the weights for the two nets.

Project maintained by marianocabezas.


Automatic multiple sclerosis (MS) lesion segmentation in magnetic resonance (MR) imaging is a challenging task due to the small size of the lesions, their heterogeneous shape and distribution, overlapping tissue intensity distributions, and the inherent artifacts of MR images. Here, we propose a convolutional neural network trained with 3D patches of candidate lesion voxels. The method uses 4 anatomical MR images: T1-weighted, T2-weighted, PD-weighted and T2-FLAIR-weighted.


Magnetic resonance (MR) imaging of the brain has been widely used in clinical practice in recent years. This imaging modality provides high contrast for soft tissues, including white matter lesions (WML). Expert tracing of these lesions is a time-consuming task prone to observer errors. On the other hand, intensity inhomogeneities and image artifacts make it difficult to obtain an automatic and reliable segmentation of these lesions based only on intensity features.

Common supervised approaches rely on the use of a classification algorithm. These algorithms involve a first stage in which a model is estimated on training data composed of a set of features and their corresponding ground truth, and a second stage in which the model is tested on a new dataset to provide the desired classification. These features can include information from an atlas, context, spatial coordinates or even texture. However, classic machine learning methods require hand-crafting these feature vectors to extract appearance information using, for instance, Gaussian or Haar-like kernels. In contrast, convolutional neural networks (CNNs) learn sets of convolution kernels that are specifically created for the task at hand.

CNNs have demonstrated superior performance in several computer vision tasks, including handwriting recognition, classification of 2D images into 1000 classes, segmentation of crowds in surveillance videos and the application of a painting’s style to other pictures. Recently, CNNs have also gained popularity in medical imaging in general, and in brain imaging specifically.

Different architectures have been published in the literature to tackle the above-mentioned problems. For instance, Zhang et al. proposed a deep convolutional neural network for segmenting isointense brain tissues using multi-modality MR images. Its multiple intermediate layers applied convolution, pooling, normalization and other operations on 2D patches to capture highly nonlinear mappings between the inputs and the outputs. Moeskops et al. also presented a CNN architecture based on 2D patches of a single anatomical MR image. However, they used several patch sizes, each processed by its own convolutional layers and average pooling, and combined the results with a fully connected layer and a softmax to obtain a 9-class segmentation including background. On the other hand, Brosch et al. defined a multiscale fully convolutional encoder network with shortcuts to segment lesions using the whole brain image. However, as pointed out in their work, this kind of network needs a large number of cases in order to train a deep network of more than one layer for the convolutional and deconvolutional pathways.

Here, we present a 3D CNN that uses 3D candidate voxel patches to train an architecture that incorporates convolutional layers, max pooling and dense layers to obtain, for each candidate voxel, the probability of being a lesion. The resulting probability map is then post-processed to obtain a final lesion mask.



We decided to use the pre-processed dataset to focus exclusively on the CNN implementation. In this dataset, each image has been denoised with the NL-means algorithm and rigidly registered to the FLAIR image. Moreover, the skull has been stripped using the volBrain platform on the T1 image, and the result applied to the other modalities with sinc interpolation. Finally, bias correction was applied using the N4 algorithm.

In order to train our CNN architecture, we also normalised the intensities for each image using the mean and standard deviation of the brain voxels.
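The normalisation step can be sketched with NumPy as follows. This is a minimal illustration, not the repository code: the function name `normalise_brain` and the synthetic image and mask are hypothetical, and the sketch leaves background voxels untreated because the text does not say how they are handled.

```python
import numpy as np


def normalise_brain(image, brain_mask):
    """Z-score an image with the mean and standard deviation of its brain voxels."""
    brain_voxels = image[brain_mask]
    return (image - brain_voxels.mean()) / brain_voxels.std()


# Synthetic stand-in for one MR image and its brain mask (hypothetical data).
rng = np.random.default_rng(0)
image = rng.normal(100.0, 20.0, size=(8, 8, 8))
brain_mask = np.zeros(image.shape, dtype=bool)
brain_mask[2:6, 2:6, 2:6] = True

normalised = normalise_brain(image, brain_mask)
# Inside the mask the intensities now have zero mean and unit standard deviation.
```

After this step, a value of 1.5 in a normalised image corresponds to 1.5 standard deviations above the mean brain intensity.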

CNN architecture

Most of the CNNs from the literature use 2D images or patches to segment tissues or lesions. However, when using MR images, such approaches are prone to false positives, because lesions and artifacts can look alike within a single slice. By using 3D patches we can discard those false positives that are clearly not lesions when analysed in 3D. Furthermore, we decided to use patches instead of the whole image as input to obtain a higher number of training samples (positive and negative) while also reducing the number of network weights to optimise. For each candidate voxel, we define a patch of size 15 × 15 × 15 for each image i ∈ [T1, T2, PD, FLAIR]. Therefore, for N candidate voxels, our input has a size of N × 4 × 15 × 15 × 15.
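As a rough sketch of how such an input could be assembled, the snippet below stacks one 15 × 15 × 15 patch per modality around each candidate voxel. The helper `extract_patches` and the random volumes are hypothetical, and zero-padding at the image borders is an assumption made here so that every candidate gets a full patch; the text does not say how borders are treated.

```python
import numpy as np

PATCH_SIZE = 15


def extract_patches(images, centres, patch_size=PATCH_SIZE):
    """Return an array of shape (N, len(images), patch_size, patch_size, patch_size).

    images:  list of 3D arrays of identical shape, one per modality.
    centres: (N, 3) integer array with the candidate voxel coordinates.
    """
    half = patch_size // 2
    # Assumption: pad with zeros so that patches near the border stay full size.
    padded = [np.pad(im, half, mode="constant") for im in images]
    shape = (len(centres), len(images)) + (patch_size,) * 3
    patches = np.empty(shape, dtype=np.float32)
    for n, (x, y, z) in enumerate(centres):
        for c, im in enumerate(padded):
            # After padding, voxel (x, y, z) sits at (x + half, y + half, z + half),
            # so the patch centred on it starts at index (x, y, z).
            patches[n, c] = im[x:x + patch_size, y:y + patch_size, z:z + patch_size]
    return patches


# Random stand-ins for the T1, T2, PD and FLAIR volumes (hypothetical data).
rng = np.random.default_rng(0)
modalities = [rng.normal(size=(20, 24, 22)) for _ in range(4)]
centres = np.array([[0, 0, 0], [10, 12, 11], [19, 23, 21]])

patches = extract_patches(modalities, centres)  # shape (3, 4, 15, 15, 15)
```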

With this input, we designed the CNN architecture detailed in figure 1. The first convolutional layer contained 32 filters of size 5 × 5 × 5, followed by a max pooling of size 2 with stride 2. The following convolutional layer had 64 filters of the same size, also followed by a max pooling with the same parameters. Afterwards, we applied dropout with a probability of 0.5 to reduce overfitting, and we finished the architecture with 2 dense layers. The first one had 256 outputs and the last one was a 2-way softmax to obtain the probabilities for the 2 possible classes (lesion and not lesion).
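To get a feel for the size of this network, the following sketch counts the weights of each layer and traces the spatial size of a patch through it. The helper functions are hypothetical. The padding of the convolutions is not stated in the text, so the trace assumes 'same' padding (the convolution keeps the spatial size); the convolutional weight counts do not depend on that assumption, but the dense layer counts do.

```python
def conv_params(in_channels, n_filters, kernel=5):
    """Weights plus biases of a 3D convolution with cubic kernels."""
    return n_filters * in_channels * kernel ** 3 + n_filters


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def pool_out(size, pool=2, stride=2):
    """Output length of a pooling layer that ignores incomplete borders."""
    return (size - pool) // stride + 1


side = 15
side = pool_out(side)  # conv1 keeps 15 (assumed 'same' padding), pooling gives 7
side = pool_out(side)  # conv2 keeps 7 (assumed 'same' padding), pooling gives 3
flat = 64 * side ** 3  # features entering the first dense layer

layer_params = {
    "conv1": conv_params(4, 32),
    "conv2": conv_params(32, 64),
    "dense1": dense_params(flat, 256),  # depends on the padding assumption
    "softmax": dense_params(256, 2),
}
total_params = sum(layer_params.values())
```

With unpadded convolutions the same trace would go 15 → 11 → 5 → 1, leaving a single voxel before the second pooling, which is why 'same' padding is assumed in this sketch.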


Since the dataset is unbalanced, with far more voxels belonging to normal-appearing tissue than to lesions, we decided to apply an iterative process during training. This process has two main steps that train the same CNN architecture with different data.

During the initial step, we select a random subset of negative voxels. First, we apply an empirical threshold of 1.5 standard deviations on the normalised FLAIR image to obtain all hyperintense candidate voxels. From this set of voxels, we use all the voxels defined as lesion in the consensus and, in order to balance the dataset, a random sample of the same size drawn from the candidates that are not lesion. Since this initial selection is suboptimal, some of the tissue voxels from each image have a high probability of belonging to lesion and would be classified as false positives. Therefore, we must train our network again with a new subset of negative voxels to better classify these voxels.
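A minimal NumPy sketch of this initial selection is shown below. The function name, the seed handling and the synthetic volume are hypothetical, and the guard for images with fewer negative than positive candidates is an addition of this sketch.

```python
import numpy as np


def initial_training_voxels(flair_norm, lesion_mask, threshold=1.5, seed=0):
    """Return the coordinates of positive and (balanced) negative candidate voxels."""
    rng = np.random.default_rng(seed)
    candidates = flair_norm > threshold  # hyperintense voxels on the normalised FLAIR
    positives = np.argwhere(candidates & lesion_mask)
    negatives = np.argwhere(candidates & ~lesion_mask)
    # Guard (assumption): never ask for more negatives than are available.
    n = min(len(positives), len(negatives))
    chosen = rng.permutation(len(negatives))[:n]
    return positives, negatives[chosen]


# Synthetic normalised FLAIR with one bright cubic "lesion" (hypothetical data).
rng = np.random.default_rng(1)
flair_norm = rng.normal(size=(16, 16, 16))
lesion_mask = np.zeros(flair_norm.shape, dtype=bool)
lesion_mask[4:8, 4:8, 4:8] = True
flair_norm[lesion_mask] += 3.0

positives, negatives = initial_training_voxels(flair_norm, lesion_mask)
```

Both arrays hold voxel coordinates with one row per voxel, so they can be used directly as patch centres.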

During the second and final step, we test the training images with the first network to obtain a probabilistic map. In order to select challenging false positives, we apply a threshold of 0.5 to this map and randomly select, from the negative voxels inside the resulting mask, a sample of the same size as the lesion voxel dataset.
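The selection of challenging negatives can be sketched in the same way. The function name and the random probability map are hypothetical; in practice the map would be the output of the first network on a training image.

```python
import numpy as np


def hard_negative_voxels(prob_map, lesion_mask, n_samples, threshold=0.5, seed=0):
    """Sample false positives of the first network to use as new negatives."""
    rng = np.random.default_rng(seed)
    false_positives = np.argwhere((prob_map > threshold) & ~lesion_mask)
    # Guard (assumption): keep every false positive if there are fewer than requested.
    n = min(n_samples, len(false_positives))
    chosen = rng.permutation(len(false_positives))[:n]
    return false_positives[chosen]


# Random stand-in for the probabilistic map of the first network (hypothetical data).
rng = np.random.default_rng(2)
prob_map = rng.random((10, 10, 10))
lesion_mask = np.zeros(prob_map.shape, dtype=bool)
lesion_mask[3:6, 3:6, 3:6] = True
n_lesion = int(lesion_mask.sum())

hard_negatives = hard_negative_voxels(prob_map, lesion_mask, n_lesion)
```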

Finally, we train the same CNN architecture again with this new training set. Since both networks use the same positive voxels, both classify all lesions correctly when tested. However, since we used different negative voxels, the probabilistic maps present different false positive detections that do not overlap. Therefore, we decided to multiply the outputs of both networks to maximise the true positives and minimise the false positives.
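The effect of multiplying the two outputs can be seen on a toy example. The probabilities below are made up for illustration: the first voxel is scored highly by both networks, while the other two are false positives of only one network each.

```python
import numpy as np

# Hypothetical lesion probabilities for three voxels from each network.
prob_first = np.array([0.9, 0.9, 0.1])
prob_second = np.array([0.8, 0.1, 0.9])

# Voxel-wise product: only voxels that both networks agree on keep a high value.
combined = prob_first * prob_second  # approximately [0.72, 0.09, 0.09]
```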

CNN parameter tuning

Our CNN architecture was developed in Python using the nolearn and Lasagne modules for Theano. The batch size of the net for training was set to 4096 and the maximum number of epochs was set to 50 for the initial iteration and 2000 for the final iteration (although training stops automatically if there is no improvement after 50 iterations). To update the weights we used the Adam learning algorithm.
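The stopping rule can be illustrated without any deep learning library. The sketch below is not the nolearn mechanism; it is a plain Python patience loop. It assumes that "improvement" means a lower validation loss and that the 50 iterations are counted in epochs, neither of which is stated in the text. The loss curve is made up.

```python
def epochs_until_stop(losses, max_epochs, patience=50):
    """Return (last epoch run, best epoch) for a sequence of per-epoch losses."""
    best_loss, best_epoch = float("inf"), 0
    epoch = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return epoch, best_epoch


# Hypothetical loss curve: improves for 100 epochs, then gets worse and stays flat.
losses = [1.0 / e for e in range(1, 101)] + [0.5] * 1900

final_step = epochs_until_stop(losses, max_epochs=2000)  # stops early
initial_step = epochs_until_stop(losses, max_epochs=50)  # runs all 50 epochs
```

With a maximum of 50 epochs and a patience of 50, the rule can never trigger during the initial step, so it only matters for the final one.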

In order to evaluate the performance of the net, we used the training dataset with a leave-one-out strategy.
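A leave-one-out split over the training cases can be sketched as below. The case names are placeholders; for each fold, both networks would be trained on the remaining cases and tested on the held-out one.

```python
def leave_one_out(cases):
    """Yield (training cases, held-out case) pairs, one per case."""
    for i, held_out in enumerate(cases):
        yield cases[:i] + cases[i + 1:], held_out


# Placeholder case identifiers (hypothetical names).
cases = ["case_01", "case_02", "case_03", "case_04"]
folds = list(leave_one_out(cases))
```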

Authors and Contributors

In 2016, Sergi Valverde (@sergivalverde), Mariano Cabezas (@marianocabezas), Eloy Roura (@eloyroura), Sandra González-Villà, Arnau Oliver and Xavier Lladó. Strategy developed by the NIC-VICOROB team (@NIC-VICOROB)

Support or Contact

Having trouble with the code? Contact me.


This work has been partially supported by “La Fundació la Marató de TV3”, by Retos de Investigación TIN2014-55710-R, and by the MPC UdG 2016/022 grant. The authors gratefully acknowledge the support of the NVIDIA Corporation with their donation of the Tesla K40 GPU used in this research.