We consider multiple devices with local datasets that collaboratively learn a global model through device-to-device (D2D) communications. The conventional decentralized stochastic gradient descent (DSGD) solution to this problem assumes error-free orthogonal links among the devices, that is, it relies on an underlying communication protocol to handle the noise, fading, and interference of the wireless medium. In this work, we show that this separation is suboptimal by designing the communication and learning protocols jointly. We first consider a point-to-point (P2P) communication scheme in which D2D transmissions are scheduled orthogonally to minimize interference. We then propose a novel over-the-air consensus (OAC) scheme that exploits the signal superposition property of wireless transmission rather than avoiding interference. In the proposed OAC-MAC scheme, multiple nodes align their transmissions toward a single receiver node. For both schemes, we cast the scheduling problem as a graph coloring problem. We then numerically compare the two approaches on a distributed MNIST image classification task under various network conditions. We show that the OAC-MAC scheme converges faster and reaches a higher final accuracy, owing to its improved robustness against channel fading and noise. We also introduce a noise-aware version of the OAC-MAC scheme that further improves both convergence speed and accuracy.
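To illustrate the idea of casting scheduling as graph coloring, the following minimal Python sketch assigns P2P transmissions to time slots. It is not the scheduling algorithm of this work: the conflict rule (two directed links conflict whenever they share a device, i.e., a half-duplex assumption), the greedy heuristic, the 4-device ring topology, and the names `build_conflict_graph` and `greedy_coloring` are all assumptions made for illustration.

```python
from itertools import combinations


def build_conflict_graph(links):
    """Map each directed link to the set of links it cannot share a slot with.

    Assumed rule (hypothetical): two links conflict if they share any device.
    """
    conflicts = {link: set() for link in links}
    for a, b in combinations(links, 2):
        if set(a) & set(b):
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def greedy_coloring(conflicts):
    """Give each link the smallest slot index unused by its conflicting links."""
    slots = {}
    # Process high-degree links first; a common heuristic, not optimal in general.
    for link in sorted(conflicts, key=lambda l: (-len(conflicts[l]), l)):
        used = {slots[n] for n in conflicts[link] if n in slots}
        slot = 0
        while slot in used:
            slot += 1
        slots[link] = slot
    return slots


# Hypothetical 4-device ring; each undirected edge yields two directed links.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
links = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
schedule = greedy_coloring(build_conflict_graph(links))

for link, slot in sorted(schedule.items(), key=lambda item: item[1]):
    print(f"slot {slot}: device {link[0]} -> device {link[1]}")
```

Each color corresponds to one time slot, so links with the same color transmit simultaneously without conflicting under the assumed rule. A different conflict rule, for instance one based on interference range, only changes `build_conflict_graph`.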