Analog over-the-air computing enables a swarm of end-user devices to conduct distributed learning efficiently: the intermediate parameters of the users, such as gradients, are modulated onto and transmitted via a group of orthogonal waveforms, and are mixed directly at the server without detecting each user's parameters individually. Nonetheless, the scarcity of orthogonal waveforms and the limited communication resources of end-user devices hinder this paradigm from adopting complex deep learning models. To balance the tradeoff between communication efficiency and accuracy, we study model pruning for analog over-the-air distributed learning. First, we propose a model pruning scheme that improves the communication efficiency of analog over-the-air training, together with an importance measure for parameter pruning based on the analog over-the-air aggregated gradient, which characterizes the contribution of each parameter without removing channel fading or electromagnetic interference. Second, we derive an analytical upper bound on the training error, which shows that the proposed scheme converges even when the aggregated gradient is corrupted by heavy-tailed electromagnetic interference with infinite variance. Finally, experimental results demonstrate the performance gains of the proposed scheme and verify the analytical results.
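
To make the setting concrete, the following is a minimal illustrative sketch, not the scheme proposed in this paper. It simulates one round of over-the-air gradient aggregation and then prunes parameters using the corrupted aggregate directly. Several choices here are assumptions made only for illustration: Rayleigh-distributed fading with one gain per device, symmetric alpha-stable interference (infinite variance for alpha < 2) sampled with the Chambers-Mallows-Stuck method, and a first-order saliency score |w_i * g_i| as a stand-in for the importance measure. The function names `sample_sas`, `ota_aggregate`, and `prune_mask` are hypothetical.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Draw symmetric alpha-stable samples (Chambers-Mallows-Stuck method).

    Assumed interference model: for alpha < 2 the variance is infinite.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


def ota_aggregate(local_grads, rng, alpha=1.5, noise_scale=0.1):
    """Simulate the superposed signal the server observes in one round.

    local_grads has shape (n_devices, dim). The server never sees the
    individual rows, only their faded sum plus interference.
    """
    local_grads = np.asarray(local_grads, dtype=float)
    n_devices, dim = local_grads.shape
    # Assumption: one random Rayleigh gain per device, left uncorrected.
    fading = rng.rayleigh(scale=1.0, size=(n_devices, 1))
    interference = noise_scale * sample_sas(alpha, dim, rng)
    return (fading * local_grads).sum(axis=0) / n_devices + interference


def prune_mask(weights, agg_grad, keep_ratio=0.5):
    """Keep the highest-scoring parameters; return a boolean keep-mask.

    The score |w * g| is a generic stand-in, computed from the noisy
    aggregate without removing fading or interference.
    """
    score = np.abs(np.asarray(weights) * np.asarray(agg_grad))
    k = max(1, int(round(keep_ratio * score.size)))
    mask = np.zeros(score.size, dtype=bool)
    mask[np.argsort(score)[-k:]] = True
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=16)
    grads = rng.normal(size=(8, 16))  # 8 devices, 16 parameters
    agg = ota_aggregate(grads, rng)
    mask = prune_mask(weights, agg, keep_ratio=0.25)
    print("parameters kept:", int(mask.sum()), "of", mask.size)
```

In this sketch, only the parameters selected by the mask would need a waveform in later rounds, which is where the communication saving would come from under these assumptions.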