In this chapter, we present a manifold learning viewpoint on the analysis of data arising from multiple modalities. We assume that the high-dimensional multimodal data lie on underlying low-dimensional manifolds and devise a new data-driven representation that accommodates this inherent structure. Based on diffusion geometry, we present three composite operators that facilitate different aspects of fusing information from multiple modalities across a range of settings. These operators are shown to recover both the common structures and the differences between modalities in terms of their intrinsic geometry, and they allow for the construction of data-driven representations that capture these characteristics. The properties of these operators are demonstrated in four applications: recovery of the common variable in two camera views, shape analysis, foetal heart rate identification and sleep dynamics assessment.