Noise pollution in modern cities is getting worse, yet sound sensors are sparse and costly. There is strong demand for a system that can reason about and present noise pollution for any region of an urban area. In this work, we leverage multimodal geo-social media data from Foursquare, Twitter, Flickr, and Gowalla in New York City (NYC) to infer and visualize the volume and composition of noise pollution for every region in NYC. Using NYC 311 noise complaint records as an approximation of noise pollution for validation, we develop a joint inference and visualization system that integrates multimodal features, including geographical, mobility, visual, and social features, with a graph-based learning model to infer the noise composition of regions. Experimental results show that our model achieves promising results with substantially less training data than state-of-the-art methods. We also develop the NYC Urban Noise Diagnotor, a system that allows users to explore the noise composition of any region of NYC interactively.