
AI Got Your Back Segmented (PyTorch)

UNet segmentation for spinal columns

Part of the image by this source and part by the author

Artificial intelligence (AI) will increasingly be used in healthcare as we gather more data every day. One of the key categories of AI applications in healthcare is diagnosis, where AI helps with decision making, management, automation, and more.

The spine is an important part of the musculoskeletal system, supporting the body and its organ structure while playing a major role in our mobility and load transfer. It also protects the spinal cord from injuries and mechanical shocks due to impacts.

In an automated spine processing pipeline, vertebral labeling and segmentation are two fundamental tasks. Reliable and accurate processing of spine images is expected to benefit clinical decision support systems for diagnosis, surgery planning, and population-based analysis of spine and bone health. Designing automated algorithms for spine processing is challenging, predominantly because of considerable variations in anatomy and acquisition protocols and a severe shortage of publicly available data.

In this blog, I will focus only on segmenting the vertebral column from the given CT scan dataset. Labeling each vertebra and performing further diagnosis are not covered here and can be done as a continuation of this task.


Spine or vertebral segmentation is a crucial step in all applications concerning automated quantification of spinal morphology and pathology. With the advent of deep learning, big and varied data is the primary sought-after resource for such a task on computed tomography (CT) scans; however, large-scale public datasets have long been unavailable. VerSe is a large-scale, multi-detector, multi-site CT spine dataset consisting of 374 scans from 355 patients, released in 2019 and 2020 editions. For this blog, I have combined both editions into one dataset to benefit from more data.

GitHub – anjany/verse: Everything about the ‘Large Scale Vertebrae Segmentation Challenge’ @ MICCAI…

The data is provided under the CC BY-SA 4.0 license, making it fully open.

NIfTI (Neuroimaging Informatics Technology Initiative) is a file format for neuroimaging. NIfTI files are commonly used in imaging informatics for neuroscience and even neuroradiology research. Each NIfTI file contains metadata and a voxel array of up to seven dimensions, and supports a variety of data types. The first three dimensions are reserved for the three spatial dimensions x, y, and z, while the fourth dimension defines time points t. The remaining dimensions, fifth through seventh, are for other uses, although the fifth dimension can still have predefined uses, such as storing voxel-specific distributional parameters or holding vector-based data. The VerSe dataset is distributed as zip archives of NIfTI files.
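
As a quick illustration, here is a minimal sketch of loading a NIfTI file with NiBabel and inspecting its dimensions and metadata; the file name is a placeholder for any .nii.gz file from the dataset:

```python
import nibabel as nib

# Load a compressed NIfTI file (the file name here is a placeholder)
img = nib.load("verse_scan.nii.gz")

print(img.shape)                    # voxel grid, e.g. (x, y, z) for a CT volume
print(img.header.get_zooms())       # voxel spacing per axis, typically in mm
print(img.header.get_data_dtype())  # on-disk data type

data = img.get_fdata()              # the voxel data as a NumPy array
```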

ITK-SNAP is a software application used to segment structures in 3D medical images. It is open-source software that can be installed on different platforms. I used it to visualize the NIfTI files in 3D view and to load and overlay 3D masks on the raw images. I highly recommend it for this task.

CT scan machine – Image source

Computed tomography (CT) is an x-ray imaging procedure in which a narrow beam of x-rays is aimed at a patient while rotating quickly around the body. The signals gathered by the machine are stored in a computer to generate cross-sectional images, also known as "slices", of the body. These slices are called tomographic images and contain more detailed information than conventional x-rays. A series of slices can be digitally "stacked" together to form a 3D image of the patient, which allows easier identification and localization of basic structures as well as possible tumors or abnormalities.

The action steps are as follows. I started by downloading both the 2019 and 2020 datasets. Then I combined them into shared train, validation, and test folders. The next step was reading the CT scans and converting each slice into a pair of PNG raw image and mask. Finally, I used the UNet model from this GitHub repo and trained a segmentation model.


Data understanding: Before starting with data processing and training, I wanted to load a couple of NIfTI files to get more familiar with their 3D data structure, visualize them, and extract metadata from the images.

After downloading the VerSe dataset, I opened one .nii.gz file using the NiBabel library (explained below). By reading one file and using NumPy's transpose function, I could view a single slice in three different planes: axial, sagittal, and coronal.

One slice of a CT scan in three different angles – Image by author
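
A minimal sketch of producing such a three-view figure is shown below; the axis order assumed here depends on the scan's orientation metadata, so the specific transpose axes are an assumption rather than a rule:

```python
import nibabel as nib
import matplotlib.pyplot as plt

data = nib.load("verse_scan.nii.gz").get_fdata()  # placeholder file name

# Assuming an (x, y, z) axis order; the correct permutation depends on the
# scan's orientation metadata and may differ per file.
views = {
    "Sagittal": data,                     # slices along x
    "Coronal":  data.transpose(1, 0, 2),  # slices along y
    "Axial":    data.transpose(2, 0, 1),  # slices along z
}

fig, axes = plt.subplots(1, 3)
for ax, (name, vol) in zip(axes, views.items()):
    ax.imshow(vol[vol.shape[0] // 2], cmap="gray")  # middle slice of each view
    ax.set_title(name)
    ax.axis("off")
plt.show()
```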

After getting more familiar with the raw images and extracting a single slice from the 3D volume, it is time to look at the mask file for the same slice. As you can see in the image below, I overlaid the mask slice on the raw image slice. The reason we see a gradient of colors is that the mask files do not merely outline each vertebra: each vertebra also carries its own label (shown as a different color), the number of that vertebra in the column. To better understand vertebral column labeling, you can refer to this page.

Overlay of mask data on one slice of the raw image – Image by author
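
One simple way to produce that kind of overlay is to draw the raw slice in grayscale and the labeled mask slice on top of it with some transparency; a sketch, with file names and slice index as placeholders:

```python
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt

raw  = nib.load("verse_scan.nii.gz").get_fdata()        # placeholder names
mask = nib.load("verse_scan_mask.nii.gz").get_fdata()

i = raw.shape[0] // 2                # middle slice along the first axis
plt.imshow(raw[i], cmap="gray")
# Hide the background (label 0) so only vertebrae are colored; each vertebra
# keeps its own label value, which is what produces the color gradient.
plt.imshow(np.where(mask[i] > 0, mask[i], np.nan), cmap="jet", alpha=0.5)
plt.axis("off")
plt.show()
```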

Data Preparation: The task of data preparation is to generate slices of images from each 3D CT scan file, both for the raw images and the mask files. It starts with reading the compressed raw and mask images using the NiBabel library and converting them into NumPy arrays. Then I went through each 3D image, checked its view orientation, and converted most of them to the sagittal view. Next, I generated PNG files from each slice and stored them in mode "L", which is 8-bit grayscale. In this case, we did not need to generate RGB images, because each CT slice holds only single-channel intensity values.
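
The slice-to-PNG step can be sketched roughly as follows; the min-max intensity scaling and the output paths are my assumptions, not necessarily the exact preprocessing used in the post:

```python
import nibabel as nib
import numpy as np
from PIL import Image

def save_slices(nifti_path, out_prefix, is_mask=False):
    """Save every slice of a 3D volume as an 8-bit grayscale PNG (mode "L")."""
    data = nib.load(nifti_path).get_fdata()
    for i in range(data.shape[0]):        # assuming sagittal slices on axis 0
        sl = data[i]
        if not is_mask:
            # Scale raw CT intensities to 0-255 (a simple min-max choice)
            sl = (sl - sl.min()) / (sl.max() - sl.min() + 1e-8) * 255
        Image.fromarray(sl.astype(np.uint8), mode="L").save(
            f"{out_prefix}_{i:03d}.png")

save_slices("verse_scan.nii.gz", "slices/raw")            # placeholder paths
save_slices("verse_scan_mask.nii.gz", "slices/mask", is_mask=True)
```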

In this task, I used the UNet architecture to apply semantic segmentation to the dataset. To learn more about both UNet and semantic segmentation, I recommend checking out this blog.

I also used PyTorch and torchvision for this task. As I mentioned, this repo has a good implementation of UNet in PyTorch, and I have reused some of its code.

Since I am working with NIfTI files, I used the NiBabel library to read them in Python. NiBabel is a Python library for reading and writing common medical and neuroimaging file formats such as NIfTI.

Dice Score: To evaluate how the model is doing on a semantic segmentation task, we can use the Dice score. The Dice coefficient is twice the area of overlap between the predicted and true masked areas, divided by the total number of pixels in both masks.

Dice Score – Image source
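
In code, the Dice score for a pair of binary masks takes only a few lines; a minimal NumPy sketch, with a small epsilon to guard against empty masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice = 2 * |pred & target| / (|pred| + |target|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: masks of 4 pixels each, overlapping in 2 -> 2*2 / (4+4) = 0.5
```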

Training: First I defined the UNet class, and then the PyTorch dataset class, which handles reading and preprocessing the images. The preprocessing consists of loading the PNG files, resizing them all to one size (in this case 250×250), converting them to NumPy arrays, and then to PyTorch tensors. By calling the dataset class (VerSeDataset) we can prepare our data in the batches I defined. To make sure the mapping between raw image and mask image was correct, I called next(iter(valid_dataloader)) once to fetch a batch and visualize it.
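
The dataset class looks roughly like the sketch below; the class name VerSeDataset comes from the post, but the exact file layout, transforms, and binarization of the mask are my assumptions:

```python
from pathlib import Path
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class VerSeDataset(Dataset):
    """Pairs each raw PNG slice with its mask PNG and returns tensors."""
    def __init__(self, raw_paths, mask_paths, size=(250, 250)):
        self.raw_paths, self.mask_paths, self.size = raw_paths, mask_paths, size

    def __len__(self):
        return len(self.raw_paths)

    def __getitem__(self, idx):
        raw = Image.open(self.raw_paths[idx]).convert("L").resize(self.size)
        mask = Image.open(self.mask_paths[idx]).convert("L").resize(self.size)
        raw = torch.from_numpy(np.array(raw, dtype=np.float32) / 255.0)
        mask = torch.from_numpy((np.array(mask) > 0).astype(np.float32))
        return raw.unsqueeze(0), mask.unsqueeze(0)   # add a channel dimension

raw_files = sorted(Path("slices").glob("raw_*.png"))     # placeholder paths
mask_files = sorted(Path("slices").glob("mask_*.png"))
valid_dataloader = DataLoader(VerSeDataset(raw_files, mask_files),
                              batch_size=4, shuffle=False)
images, masks = next(iter(valid_dataloader))   # sanity-check one batch
```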

Later I defined the model as model = UNet(n_channels=1, n_classes=1). The number of channels is 1 because the images are grayscale rather than RGB; if your images are RGB, you would change the number of channels to 3. The number of classes is 1 because there is only one class: whether a pixel is part of a vertebra or not. If your problem is multiclass segmentation, set the number of classes to however many classes you have. I then trained the model for a number of epochs. For each batch, I first calculated the loss value and updated the parameters through backpropagation. I then went over all the batches again, calculating only the loss on the validation dataset and storing those loss values. Finally, I plotted the loss curves for both training and validation to track the model's performance.
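
A condensed version of that loop might look like the following; the loss function (BCEWithLogitsLoss), optimizer settings, and epoch count are my assumptions for the single-class case, and UNet and train_dataloader come from the repo and the dataset code above:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet(n_channels=1, n_classes=1).to(device)   # UNet from the repo
criterion = nn.BCEWithLogitsLoss()                   # assumed loss choice
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

train_losses, valid_losses = [], []
for epoch in range(20):                              # assumed epoch count
    model.train()
    for images, masks in train_dataloader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()                              # backpropagation
        optimizer.step()
        train_losses.append(loss.item())

    model.eval()
    with torch.no_grad():                            # validation pass, loss only
        for images, masks in valid_dataloader:
            images, masks = images.to(device), masks.to(device)
            valid_losses.append(criterion(model(images), masks).item())
```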

After saving the model, I was able to take one of the images, pass it through the trained model, and receive a predicted mask. By plotting the raw image, the true mask, and the predicted mask side by side, I could visually evaluate the results.
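
Inference on a single image is then a forward pass followed by a threshold; a sketch, assuming the binary setup above and a 0.5 cutoff:

```python
import matplotlib.pyplot as plt
import torch

model.eval()
with torch.no_grad():
    image, true_mask = next(iter(valid_dataloader))   # grab one batch
    pred = torch.sigmoid(model(image.to(device)))     # logits -> [0, 1]
    pred_mask = (pred > 0.5).float().cpu()            # assumed 0.5 threshold

fig, axes = plt.subplots(1, 3)
panels = (image[0, 0], true_mask[0, 0], pred_mask[0, 0])
titles = ("Raw image", "True mask", "Predicted mask")
for ax, img, title in zip(axes, panels, titles):
    ax.imshow(img, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.show()
```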

Raw image (left), True Mask (middle), and Predicted Mask (right) – Image by author

As you can see from the image above, the model did very well on both sagittal and axial views: the predicted mask closely matches the true mask.

You can find the entire code here:

Future work: This task could also be done with a 3D UNet, which might be a better approach to learning the structure of the vertebrae. Since we have a label for each vertebra's mask area, we could go further and do multiclass mask segmentation. Also, the model performs best when the image view is sagittal, so converting all slices to the sagittal view might yield the best model.

Thanks for reading!

