A complete emotional expression in conversation unfolds over a complex temporal course. Prior work on utterance- and segment-level processing largely overlooks subtle differences in these temporal characteristics and the associated historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed frequency-domain energy distributions than conventional Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features derived from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems using LLDs or DSS alone.
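The dimensionality-reduction step can be illustrated with a minimal sketch. The snippet below trains a linear autoencoder by plain gradient descent on random data standing in for concatenated LLD + DSS feature vectors; the input dimension (64), bottleneck size (8), learning rate, and iteration count are all illustrative assumptions, not values from the paper, and the paper's actual autoencoder architecture and training procedure are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 feature vectors of dimension 64, standing in
# for concatenated LLD + DSS speech features (not real speech data).
X = rng.standard_normal((200, 64))

d_in, d_bn = X.shape[1], 8                    # assumed bottleneck of 8 units
W1 = rng.standard_normal((d_in, d_bn)) * 0.1  # encoder weights
W2 = rng.standard_normal((d_bn, d_in)) * 0.1  # decoder weights
lr = 1e-3

for _ in range(500):          # plain gradient descent on reconstruction MSE
    H = X @ W1                # bottleneck activations (linear encoder)
    R = H @ W2                # reconstruction from the bottleneck
    E = R - X                 # reconstruction error
    gW2 = H.T @ E / len(X)    # gradient w.r.t. decoder weights
    gW1 = X.T @ (E @ W2.T) / len(X)  # gradient w.r.t. encoder weights
    W2 -= lr * gW2
    W1 -= lr * gW1

# Low-dimensional bottleneck features that would feed the LSTM classifier.
bottleneck = X @ W1
print(bottleneck.shape)
```

In the full system, sequences of such bottleneck vectors (one per frame or segment) would be fed to the LSTM so it can model the temporal variation of emotion across the utterance.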