Embedded multicore platforms have recently become popular, but developing software for them remains very challenging. Although message passing is a popular programming model for parallel applications, it is not adequately supported on current embedded multicore platforms. The situation resembles that of the 1980s and 1990s, before the advent of MPI, when applications were hardly portable across parallel computers. Unfortunately, MPI is too large for most of today's embedded platforms. Moreover, message-passing functions need to exploit architectural features to deliver optimized performance, but such platform-specific optimizations often hurt portability. This paper addresses both portability and performance by designing a new message-passing library with a three-layer modular design: the top two layers are mostly platform-independent, and the bottom layer enables platform-specific optimizations. We discuss the performance issues and evaluate them with experimental results.
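To make the layering idea concrete, the C sketch below shows one way such a split might look. It is not the library described in this paper: every name (`transport_ops`, `proto_send`, `proto_recv`, `msg_send`, `msg_recv`), the message header format, and the ring-buffer "shared memory" backend are hypothetical, and two ranks are simulated inside one process. The top layer is a small MPI-like API, the middle layer frames messages and matches them by source and tag (stashing unexpected ones), and only the bottom layer, reached through a table of function pointers, touches anything platform-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration: two "cores" simulated in one address space. */
#define NUM_RANKS 2
#define SLOTS 8          /* ring capacity per rank (SLOTS - 1 usable) */
#define SLOT_BYTES 64    /* maximum frame size: header + payload */

/* ---- Bottom layer: platform-specific transport behind a function table ---- */
typedef struct {
    int (*put)(int dest, const void *frame, size_t len);  /* 0 or -1 */
    int (*get)(int self, void *frame, size_t max);        /* bytes or -1 */
} transport_ops;

/* Example backend: per-rank ring buffers standing in for shared memory. */
static unsigned char ring[NUM_RANKS][SLOTS][SLOT_BYTES];
static size_t ring_len[NUM_RANKS][SLOTS];
static int head[NUM_RANKS], tail[NUM_RANKS];

static int shm_put(int dest, const void *frame, size_t len) {
    assert(dest >= 0 && dest < NUM_RANKS);
    if (len > SLOT_BYTES || (tail[dest] + 1) % SLOTS == head[dest]) return -1;
    memcpy(ring[dest][tail[dest]], frame, len);
    ring_len[dest][tail[dest]] = len;
    tail[dest] = (tail[dest] + 1) % SLOTS;
    return 0;
}

static int shm_get(int self, void *frame, size_t max) {
    assert(self >= 0 && self < NUM_RANKS);
    if (head[self] == tail[self]) return -1;              /* empty */
    size_t len = ring_len[self][head[self]];
    if (len > max) return -1;
    memcpy(frame, ring[self][head[self]], len);
    head[self] = (head[self] + 1) % SLOTS;
    return (int)len;
}

static const transport_ops shm_ops = { shm_put, shm_get };
static const transport_ops *backend = &shm_ops;  /* swap per platform */

/* ---- Middle layer: framing and (source, tag) matching, platform-independent ---- */
typedef struct { int src, tag; size_t len; } msg_header;

static unsigned char stash[NUM_RANKS][SLOTS][SLOT_BYTES];  /* unexpected frames */
static int stash_n[NUM_RANKS];

static int proto_send(int self, int dest, int tag, const void *buf, size_t len) {
    unsigned char frame[SLOT_BYTES];
    msg_header h = { self, tag, len };
    if (len > SLOT_BYTES - sizeof h) return -1;
    memcpy(frame, &h, sizeof h);
    memcpy(frame + sizeof h, buf, len);
    return backend->put(dest, frame, sizeof h + len);
}

/* Copy the payload out if the frame matches (src, tag) and fits; else return -1. */
static int try_match(const unsigned char *frame, int src, int tag, void *buf, size_t max) {
    msg_header h;
    memcpy(&h, frame, sizeof h);
    if (h.src != src || h.tag != tag || h.len > max) return -1;
    memcpy(buf, frame + sizeof h, h.len);
    return (int)h.len;
}

static int proto_recv(int self, int src, int tag, void *buf, size_t max) {
    for (int i = 0; i < stash_n[self]; i++) {           /* oldest stashed first */
        int n = try_match(stash[self][i], src, tag, buf, max);
        if (n >= 0) {
            for (int j = i; j + 1 < stash_n[self]; j++)
                memcpy(stash[self][j], stash[self][j + 1], SLOT_BYTES);
            stash_n[self]--;
            return n;
        }
    }
    unsigned char frame[SLOT_BYTES] = {0};
    while (stash_n[self] < SLOTS && backend->get(self, frame, sizeof frame) >= 0) {
        int n = try_match(frame, src, tag, buf, max);
        if (n >= 0) return n;
        memcpy(stash[self][stash_n[self]++], frame, SLOT_BYTES);  /* keep for later */
    }
    return -1;                                           /* non-blocking: no match yet */
}

/* ---- Top layer: small MPI-like API, platform-independent ---- */
int msg_send(int self, int dest, int tag, const void *buf, size_t len) {
    if (self < 0 || self >= NUM_RANKS || dest < 0 || dest >= NUM_RANKS) return -1;
    return proto_send(self, dest, tag, buf, len);
}

int msg_recv(int self, int src, int tag, void *buf, size_t max) {
    if (self < 0 || self >= NUM_RANKS || src < 0 || src >= NUM_RANKS) return -1;
    return proto_recv(self, src, tag, buf, max);
}
```

For example, `msg_send(0, 1, 7, "hello", 6)` followed by `msg_recv(1, 0, 7, buf, sizeof buf)` delivers the six-byte payload to rank 1, even if a message with a different tag arrived in between. Under these assumptions, porting to another platform would mean writing a new `transport_ops` table (say, one backed by a hardware mailbox or DMA engine) and pointing `backend` at it, while the framing, matching, and API layers stay unchanged. The explicit `self` argument exists only because both ranks share one process in this sketch.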