This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance between distortion and rate. They are faced with the zero gradient issue due to quantization and the difficulty of estimating the rate accurately. Inspired by soft quantization, we represent quantization indices of feature maps with differentiable soft bits. This allows us to couple tightly the rate estimation with context-adaptive binary arithmetic coding. It also provides a differentiable distortion objective function. Experimental results show that our approach achieves the state-of-the-art compression performance among the learning-based schemes in terms of MS-SSIM and PSNR.