In the recent years, multicore processor designs have become increasingly popular for embedded applications, but diversified inter-core communication mechanisms have led to the difficulties in software development, integration and migration. A unified, portable, and efficient inter-core communication mechanism would have helped reduce these difficulties significantly, but such a solution did not exist today. We proposed a scheme called MSG, which provides users with a set of essential message-passing programming interfaces adopted from MPI and MCAPI, including blocking and non-blocking point-to-point communications, one-sided communications, and collective operations. We experimented and evaluated our design methodology with the case study on the IBM CELL, a popular heterogeneous multicore platform. On the CELL platform, our MSG library fitted in the 256KB local memory on each individual processor core and outperformed two existing communication libraries, DaCS and CML. With a systematic approach, we showed how optimization could be done on the CELL platform to improve the performance of the MSG library. Hopefully, our experiences help the design and development of communication libraries for existing and future multicore platforms and embedded applications.