3D facial animation has been widely used in many multimedia applications and can be applied to articulatory training for people with articulation disorders. In this paper, a mixed-model-driven 3D facial synthesis method, including lip and tongue animation, is proposed to provide multimodal feedback such as the speech signal, lip motion, and tongue motion. Text-to-speech generates the speech signal for arbitrary text and provides syllable boundaries. Contextual-knowledge-based phoneme segmentation is then applied to estimate the phoneme boundaries within each syllable, which effectively reduces the number of 3D facial models required. Parametric 3D tongue and lip movement models are smoothed with B-splines to eliminate jerkiness and to synthesize the 3D face with tongue and lip animation. By integrating the boundary information, speech synchronization is easily accomplished. The multimodal feedback in the 3D facial animation is used to improve the efficiency of articulatory training. Preliminary experimental results show that the method is feasible.
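As a rough illustration of the B-spline smoothing step, the sketch below fits a cubic B-spline to one hypothetical lip-parameter trajectory defined at phoneme-aligned keyframes and resamples it at the animation frame rate; all timings, values, and the scipy-based approach are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Hypothetical keyframe values for a single lip parameter
# (e.g., mouth opening), one per phoneme boundary; times in seconds.
t_key = np.array([0.00, 0.08, 0.15, 0.27, 0.40, 0.52])
v_key = np.array([0.00, 0.60, 0.90, 0.30, 0.70, 0.00])

# Fit a cubic B-spline; a small positive smoothing factor s rounds off
# the abrupt keyframe-to-keyframe transitions that cause jerkiness.
tck = splrep(t_key, v_key, k=3, s=0.01)

# Resample at the animation frame rate (e.g., 30 fps) to drive the model.
t_frames = np.arange(t_key[0], t_key[-1], 1.0 / 30.0)
v_frames = splev(t_frames, tck)
```

Per-frame values such as v_frames would then deform the corresponding lip or tongue parameter of the 3D model, with the phoneme boundaries keeping the animation aligned to the synthesized speech.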