Motivated by the widespread use of Kinect-like depth cameras, we investigate the problem of human action recognition and retrieval in videos using depth data alone. We propose simple depth descriptors that require no learning optimization, yet achieve promising performance comparable to that of leading methods based on color images and videos, and that can be applied effectively in real-time applications. Because depth cameras rely on infrared sensing, the proposed approach is especially useful under poor lighting conditions, e.g., surveillance environments without sufficient lighting. We also introduce a large Depth-included Human Action video dataset, DHA, which contains 357 videos of performed human actions belonging to 17 categories. To the best of our knowledge, DHA is one of the largest depth-included video datasets of human actions.