Hello everyone. My name is Zongwei Zhou. Our paper provides pre-trained 3D models, which learn representations directly from a large number of unlabeled images and generate powerful target models through transfer learning. We therefore nickname these generic models Models Genesis.
Nowadays, ImageNet-based transfer learning is a common practice in medical image analysis. In contrast, our Models Genesis are different from ImageNet models in three ways:
First, we pre-train models directly on medical images, while ImageNet models are pre-trained on natural images. We believe that transfer learning from medical images to medical images should be more powerful than from natural images to medical images.
Second, Models Genesis are trained in 3D directly, while ImageNet models have to be trained in 2D. The most prominent medical imaging modalities, such as CT and MRI, are inherently 3D. To fit the ImageNet-based transfer learning paradigm, we have to solve a 3D problem in 2D, which loses 3D spatial information and inevitably compromises performance. We believe that 3D imaging tasks should be solved in 3D.
Most importantly, ImageNet demands a huge amount of annotation effort, whereas we pre-train Models Genesis by self-supervised learning, without using any expert annotation.
Here is the diagram of our self-supervised learning framework. We design it as a simple image restoration task: given an image, we first deform it, then feed it into a model, and let the model restore the original image. We adopt the V-Net structure; the input is the deformed image, and the ground truth is the original image. To deform an image, we propose four different approaches.
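For concreteness, here is a minimal sketch of this restoration objective, assuming a PyTorch setting with a small stand-in encoder-decoder rather than the exact V-Net architecture and training configuration used in the paper:

```python
# Minimal sketch of the self-supervised restoration objective (illustrative only;
# not the paper's exact V-Net or training setup). A 3D sub-volume is deformed,
# fed to an encoder-decoder, and the network is trained to reconstruct the original.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Stand-in for the V-Net used in the paper; for illustration only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def restoration_step(model, optimizer, original, deform_fn):
    """One training step: restore the original sub-volume from its deformed copy."""
    deformed = deform_fn(original)                      # any of the four deformations
    restored = model(deformed)
    loss = nn.functional.mse_loss(restored, original)   # pixel-wise reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```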
First, non-linear transformation. In CT scans, the pixel intensities of air, organs, and bones fall within specific ranges of Hounsfield Units, which means a CT scan naturally comes with pixel-wise annotation. Therefore, we apply a non-linear translating function to the CT images. By restoring the original intensity values, the model must learn organ appearance, including shape and intensity distribution.
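As one possible illustration of such a deformation, here is a simple random monotonic intensity mapping, assuming a sub-volume already normalized to [0, 1]; the exact translating function used in the paper may differ:

```python
import numpy as np

def nonlinear_intensity_transform(volume, rng=None):
    """Apply a random monotonic non-linear mapping to normalized intensities.

    Illustrative only: `volume` is assumed to be a float array normalized to [0, 1];
    the gamma range and optional inversion are hypothetical choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = rng.uniform(0.5, 2.0)   # random exponent bends the intensity curve
    flip = rng.random() < 0.5       # optionally use a decreasing mapping
    mapped = volume ** gamma
    return 1.0 - mapped if flip else mapped
```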
Second, local pixel shuffling. We randomly shuffle the pixel positions within small regions and then let the model recover the original image. By doing so, the model must learn organ texture and local boundaries.
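A minimal sketch of local pixel shuffling on a 3D sub-volume might look like the following; the block count and block sizes here are hypothetical hyper-parameters:

```python
import numpy as np

def local_pixel_shuffle(volume, num_blocks=1000, max_block=8, rng=None):
    """Shuffle voxel positions inside many small random sub-blocks.

    Illustrative sketch: assumes `volume` is a 3D array larger than `max_block`
    along every axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = volume.copy()
    d, h, w = volume.shape
    for _ in range(num_blocks):
        bd, bh, bw = rng.integers(2, max_block + 1, size=3)
        z = rng.integers(0, d - bd + 1)
        y = rng.integers(0, h - bh + 1)
        x = rng.integers(0, w - bw + 1)
        block = out[z:z+bd, y:y+bh, x:x+bw].ravel()
        rng.shuffle(block)                                # permute voxels within the block
        out[z:z+bd, y:y+bh, x:x+bw] = block.reshape(bd, bh, bw)
    return out
```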
We also have image out-painting and in-painting, where some regions are hidden from the model by replacing them with random numbers. In out-painting, to restore the original image, the model must learn the spatial layout and global geometry of organs by extrapolation; in in-painting, the model must learn the local continuities of organs by interpolation.
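Here are hedged sketches of the two painting deformations, assuming a normalized 3D sub-volume large enough to carve regions from (for example, 64 voxels per side); the region counts and sizes are hypothetical:

```python
import numpy as np

def inpaint(volume, num_boxes=5, rng=None):
    """In-painting: overwrite a few inner boxes with random values (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    out = volume.copy()
    d, h, w = volume.shape
    for _ in range(num_boxes):
        bd = rng.integers(d // 8, d // 4)
        bh = rng.integers(h // 8, h // 4)
        bw = rng.integers(w // 8, w // 4)
        z, y, x = rng.integers(0, d - bd), rng.integers(0, h - bh), rng.integers(0, w - bw)
        out[z:z+bd, y:y+bh, x:x+bw] = rng.random((bd, bh, bw))  # hide region with noise
    return out

def outpaint(volume, rng=None):
    """Out-painting: keep only one inner window, replace everything else with noise (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    d, h, w = volume.shape
    out = rng.random(volume.shape).astype(volume.dtype)
    bd, bh, bw = d // 2, h // 2, w // 2                  # size of the kept window
    z, y, x = rng.integers(0, d - bd), rng.integers(0, h - bh), rng.integers(0, w - bw)
    out[z:z+bd, y:y+bh, x:x+bw] = volume[z:z+bd, y:y+bh, x:x+bw]
    return out
```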
We combine these four deformations to let the model learn from multiple perspectives. Our ablation study shows that the combined approach is more robust than each individual deformation. Also, our self-supervised learning framework is scalable because it is easy to incorporate any other meaningful image deformation.
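Composing the sketches above, a combined deformation could be applied roughly as follows; the sampling probabilities are hypothetical and may differ from the paper's actual schedule:

```python
import numpy as np

def combined_deformation(volume, rng=None):
    """Compose the illustrative deformations defined in the earlier sketches."""
    rng = np.random.default_rng() if rng is None else rng
    v = nonlinear_intensity_transform(volume, rng)
    if rng.random() < 0.5:
        v = local_pixel_shuffle(v, rng=rng)
    # apply either in-painting or out-painting, not both
    v = inpaint(v, rng=rng) if rng.random() < 0.5 else outpaint(v, rng=rng)
    return v
```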
After pre-training, the encoder could be used for target classification tasks, and the encoder-decoder together could be used for target segmentation tasks. We have evaluated Models Genesis on seven different medical applications, including classification and segmentation on CT, MRI, Ultrasound, and X-ray images, across diseases and organs.
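As a rough illustration of this transfer step, here is a sketch that reuses the stand-in encoder-decoder defined in the earlier restoration sketch; the checkpoint path, head sizes, and class count are hypothetical and do not correspond to the released models:

```python
import torch
import torch.nn as nn

pretrained = TinyEncoderDecoder()  # stand-in architecture from the earlier sketch
# pretrained.load_state_dict(torch.load("models_genesis.pt"))  # hypothetical checkpoint path

# Target classification: reuse only the pre-trained encoder, add a classification head.
classifier = nn.Sequential(
    pretrained.encoder,
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(32, 2),              # e.g., nodule vs. non-nodule (hypothetical head)
)

# Target segmentation: reuse the full encoder-decoder, add a 1x1x1 output layer.
segmenter = nn.Sequential(
    pretrained,
    nn.Conv3d(1, 1, kernel_size=1),
    nn.Sigmoid(),
)
```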
I will present the major conclusions here. First things first, 3D models are critical for utilizing the spatial information offered by 3D medical images. However, training 3D models directly is not easy, because they usually have more parameters to optimize. In two out of three target applications, 3D models trained from scratch perform equivalently to or even worse than 2D ImageNet models. But our pre-trained Models Genesis significantly outperform 3D models trained from scratch. Here, the red bar is our Models Genesis, and the grey bar is a 3D model trained from scratch. Furthermore, 3D Models Genesis consistently outperform any 2D approach, including state-of-the-art ImageNet models and our Models Genesis in 2D. We introduce these degraded 2D Models Genesis to have an apples-to-apples comparison with ImageNet models. As seen, Models Genesis 2D offer performance equivalent to ImageNet models. This result is unprecedented because no self-supervised method has thus far performed as well as ImageNet-based transfer learning.
Through all seven medical applications, we envision that Models Genesis may serve as a primary source of transfer learning for 3D medical imaging. In this paper, we pre-trained Models Genesis only on LUNA16 and NIH Chest X-ray, without using the labels provided with these datasets. We plan to pre-train Models Genesis on other body regions and other modalities such as MRI, and eventually pre-train Models Genesis from all the available medical images on the Internet. We make the development of Models Genesis open science and invite researchers around the world to contribute to this effort. We hope that our collective efforts will lead to the Holy Grail of Models Genesis, effective across diseases, organs, and modalities.
For more information, please join us tomorrow for the poster session.
Talk at Mila
Hello everyone. My name is Zongwei Zhou. Our paper provides pre-trained 3D models, which learn representations directly from a large number of unlabeled images and generate powerful target models through transfer learning. We therefore nickname these generic models Models Genesis.
Nowadays, ImageNet-based transfer learning is a common practice in medical image analysis. In contrast, our Models Genesis are different from ImageNet models in three ways:
First, we pre-train models directly on medical images, while ImageNet models are pre-trained on natural images. We believe that transfer learning from medical images to medical images should be more powerful than from natural images to medical images.
Second, Models Genesis are trained in 3D directly, while ImageNet models have to be trained in 2D. The most prominent medical imaging modalities, such as CT and MRI, are inherently 3D. To fit the ImageNet-based transfer learning paradigm, we have to solve a 3D problem in 2D, which loses 3D spatial information and inevitably compromises performance. We believe that 3D imaging tasks should be solved in 3D.
Most importantly, ImageNet demands a huge amount of annotation effort, whereas we pre-train Models Genesis by self-supervised learning, without using any expert annotation. And for the very first time, we are going to show you that our Models Genesis, even with zero expert annotation, outperform ImageNet-based transfer learning across diseases and organs.
Here is the diagram of our self-supervised learning framework. We design it as a simple image restoration task: given an image, we first deform it, then feed it into a model, and let the model restore the original image. We adopt the V-Net structure; the input is the deformed image, and the ground truth is the original image. To deform an image, we propose four different approaches.
First, non-linear transformation. The intensity values in CT scans have practical meanings of their own, which is different from natural images. For example, in natural images a flower can be any color, but in CT scans the pixel intensities of air, organs, and bones must fall within specific Hounsfield Unit ranges. In other words, the intensity values in CT can be considered a form of pixel-wise annotation.
Therefore, we apply a non-linear translating function to the CT images. By restoring the original intensity values, the model must learn organ appearance, including shape and intensity distribution.
Second, local pixel shuffling. We randomly shuffle the pixel positions within small regions and then let the model recover the original image. By doing so, the model must learn organ texture and local boundaries.
We also have image out-painting and in-painting, where some regions are hidden from the model by replacing them with random numbers. In out-painting, to restore the original image, the model must learn the spatial layout and global geometry of organs by extrapolation; in in-painting, the model must learn the local continuities of organs by interpolation.
We combine these four deformations to let the model learn from multiple perspectives. As shown in the figures, the combination does not always offer the best performance compared with each individual approach, but when it is not the best, it performs as well as the best, statistically. The combined approach is more robust across all five target tasks. Also, our self-supervised learning framework is scalable because it is easy to incorporate any other meaningful image deformation.
Here comes the question: what counts as a meaningful image deformation for our framework? You may think, okay, these guys just did some sort of fancy data augmentation to the image and asked the model to restore the original one.
...
After pre-training, the encoder could be used for target classification tasks, and the encoder-decoder together could be used for target segmentation tasks. We have evaluated Models Genesis on seven different medical applications, including classification and segmentation on CT, MRI, Ultrasound, and X-ray images, across diseases and organs.
I will present the major conclusions here. First things first, 3D models are critical for utilizing the spatial information offered by 3D medical images. However, training 3D models directly is not easy, because they usually have more parameters to optimize. In two out of three target applications, 3D models trained from scratch perform equivalently to or even worse than 2D ImageNet models. But our pre-trained Models Genesis significantly outperform 3D models trained from scratch. Here, the red bar is our Models Genesis, and the grey bar is a 3D model trained from scratch. Furthermore, 3D Models Genesis consistently outperform any 2D approach, including state-of-the-art ImageNet models and our Models Genesis in 2D. We introduce these degraded 2D Models Genesis to have an apples-to-apples comparison with ImageNet models. As seen, Models Genesis 2D offer performance equivalent to ImageNet models. This result is unprecedented because no self-supervised method has thus far performed as well as ImageNet-based transfer learning.
Do you really need to build a large-scale Medical ImageNet?
...
Therefore, considering the domain gap between medical imaging and natural imaging, we conclude that a large-scale, systematically labeled Medical ImageNet is necessary. Our Models Genesis are not designed to replace such a large, strongly annotated dataset for medical image analysis, like ImageNet for computer vision, but rather to help create one.
Through all seven medical applications, we envision that Models Genesis may serve as a primary source of transfer learning for 3D medical imaging. In this paper, we pre-trained Models Genesis only on LUNA16 and NIH Chest X-ray, without using the labels provided with these datasets. We plan to pre-train Models Genesis on other body regions and other modalities such as MRI, and eventually pre-train Models Genesis from all the available medical images on the Internet. We make the development of Models Genesis open science and invite researchers around the world to contribute to this effort. We hope that our collective efforts will lead to the Holy Grail of Models Genesis, effective across diseases, organs, and modalities.
I would like to thank all the co-authors: Vatsal, Mahfuzur, Ruibin, Nima, Dr. Gotway, and Dr. Liang.