This paper proposes an acoustic DSP processor with a neural network core for speech enhancement. The processor embeds accelerators for a convolutional neural network (CNN) and the fast Fourier transform (FFT). The CNN-based speech enhancement algorithm takes the spectrogram of the speech signal as the model's input and, after passing it through the CNN model, predicts a speech mask that is used to enhance speech intelligibility. Arrays of multiply-accumulator (MAC) and coordinate rotation digital computer (CORDIC) engines are deployed to efficiently compute linear and nonlinear functions. Hardware sharing reduces area by exploiting the high similarity between CNN and FFT computations. The proposed DSP processor chip is fabricated in a 40-nm CMOS technology with a core area of 4.3 mm². The chip dissipates 2.17 mW at an operating frequency of 5 MHz. The CNN accelerator supports both convolutional and fully connected layers and achieves an energy efficiency of 1200 to 2180 GOPS/W, despite the added flexibility of also computing the FFT. Speech intelligibility is enhanced by up to 41% under low-SNR conditions.
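To illustrate how a CORDIC engine evaluates nonlinear functions with only shifts, additions, and a small angle table, the following Python sketch implements the two classic CORDIC modes in floating point. It is not the chip's datapath: the iteration count `N_ITER`, the function names, and the choice of sin/cos and magnitude/phase as example functions are assumptions, since the abstract does not state which nonlinear functions are mapped to the CORDIC array. Multiplications by `2.0 ** -i` stand in for the arithmetic shifts that a fixed-point implementation would use.

```python
import math

N_ITER = 16  # assumed number of CORDIC iterations (about 1 bit of accuracy each)

# Micro-rotation angles atan(2^-i) and the aggregate gain correction K.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K = 1.0
for i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(theta):
    """Rotation mode: return (cos(theta), sin(theta)) for |theta| <= ~1.74 rad."""
    x, y, z = K, 0.0, theta  # pre-scale by K so the final vector has unit length
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, y


def cordic_vector(x, y):
    """Vectoring mode: return (magnitude, phase) of the vector (x, y), x > 0."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0  # rotate so that y is driven toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]  # accumulate the total angle rotated
    return x * K, z  # remove the CORDIC gain from the magnitude
```

For example, `cordic_vector(re, im)` gives the magnitude of one complex FFT bin, which is the kind of quantity a magnitude spectrogram is built from.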
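The spectrogram-in, mask-out enhancement flow can be sketched in NumPy as follows, assuming a short-time Fourier transform with a periodic Hann analysis window at 50% overlap and overlap-add resynthesis. The frame length `FRAME`, the hop `HOP`, and the `predict_mask` callable (a hypothetical stand-in for the CNN) are illustrative assumptions rather than parameters of the chip, and clipping the mask to [0, 1] assumes a ratio-style mask.

```python
import numpy as np

FRAME = 256       # assumed frame / FFT length in samples
HOP = FRAME // 2  # assumed hop: 50% overlap
# Periodic Hann window: copies shifted by HOP sum to exactly 1.
WIN = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)


def enhance(x, predict_mask):
    """Mask the spectrogram of x frame by frame and resynthesize by overlap-add.

    predict_mask is a hypothetical stand-in for the CNN: it maps a magnitude
    spectrogram of shape (frames, FRAME // 2 + 1) to a mask of the same shape.
    Requires len(x) >= FRAME.
    """
    n_frames = 1 + (len(x) - FRAME) // HOP
    # Analysis: windowed real FFT of each frame -> complex spectrogram.
    spec = np.stack([np.fft.rfft(WIN * x[t * HOP : t * HOP + FRAME])
                     for t in range(n_frames)])
    # Model input is the magnitude spectrogram; output is a per-bin gain.
    mask = np.clip(predict_mask(np.abs(spec)), 0.0, 1.0)
    # Synthesis: apply the mask, inverse FFT, and overlap-add the frames.
    y = np.zeros(len(x))
    for t in range(n_frames):
        y[t * HOP : t * HOP + FRAME] += np.fft.irfft(mask[t] * spec[t], n=FRAME)
    return y
```

With an all-ones mask, samples covered by two frames are reconstructed exactly because the shifted periodic Hann windows sum to one; a trained model would instead attenuate noise-dominated time-frequency bins.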