Authors: Jiahong Ouyang (Stanford University)*; Ehsan Adeli (Stanford University); Kilian Pohl (Stanford University); Qingyu Zhao (Stanford University); Greg Zaharchuk (Stanford University)
Multi-modal MRIs are widely used in neuroimaging applications since different MR sequences provide complementary information about brain structures. Recent works have suggested that multi-modal deep learning analysis can benefit from explicitly disentangling anatomical (shape) and modality (appearance) information into separate image presentations. In this work, we challenge mainstream strategies by showing that they do not naturally lead to representation disentanglement both in theory and in practice. To address this issue, we propose a margin loss that regularizes the similarity in relationships of the representations across subjects and modalities. To enable robust training, we further use a conditional convolution to design a single model for encoding images of all modalities. Lastly, we propose a fusion function to combine the disentangled anatomical representations as a set of modality-invariant features for downstream tasks. We evaluate the proposed method on three multi-modal neuroimaging datasets. Experiments show that our proposed method can achieve superior disentangled representations compared to existing disentanglement strategies. Results also indicate that the fused anatomical representation has potential in the downstream task of zero-dose PET reconstruction and brain tumor segmentation.
The code is available at https://github.com/ouyangjiahong/representation-disentanglement.