Accurate human detection remains challenging because real-world environments are complex. Meanwhile, RGB-D cameras such as the Microsoft Kinect sensor, which provide both RGB and depth data, have become widely available at a reasonable price. Depth information is often helpful for detection. In this paper we adopt the R-CNN method, which combines Selective Search for generating region proposals with convolutional neural networks (CNNs) for learning features. A depth-map encoding technique (HHA) is used to convert depth data into a format that matches the CNN input for feature learning. The HHA and RGB images are our inputs. We propose several algorithms that combine their information to construct various human detectors. Our information-fusion structures use CNNs and SVMs, together with PCA for feature reduction. The results show that human detection is more accurate with the aid of depth information.
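To make the idea of fusing RGB and HHA information concrete, the following is a minimal sketch of one possible feature-level fusion: per-region feature vectors from the two streams are concatenated, reduced with PCA, and classified by a linear SVM. It is an illustration under stated assumptions, not the fusion structures proposed in this paper. The function names (`fuse_and_train`, `classify`), the feature dimensions, and the hyperparameters are hypothetical, random vectors stand in for CNN features, and the SVM is a small hand-written hinge-loss trainer rather than a library implementation.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA by SVD on centered data; return (mean, principal directions)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def train_linear_svm(X, y, lam=1e-2, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Labels y must be in {-1, +1}. Hyperparameters are illustrative only.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples that violate the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fuse_and_train(rgb_feats, hha_feats, labels, n_components=8):
    """Concatenate RGB and HHA features, reduce with PCA, train a linear SVM."""
    fused = np.hstack([rgb_feats, hha_feats])
    mean, comps = fit_pca(fused, n_components)
    reduced = (fused - mean) @ comps.T
    w, b = train_linear_svm(reduced, labels)
    return {"mean": mean, "comps": comps, "w": w, "b": b}

def classify(rgb_feats, hha_feats, model):
    """Return +1 (human) or -1 (background) for each region proposal."""
    fused = np.hstack([rgb_feats, hha_feats])
    reduced = (fused - model["mean"]) @ model["comps"].T
    return np.where(reduced @ model["w"] + model["b"] >= 0, 1, -1)

# Synthetic stand-in for CNN features of 200 region proposals (assumption:
# real features would come from the RGB and HHA networks, not from noise).
rng = np.random.default_rng(0)
labels = np.repeat([1, -1], 100)
rgb_feats = rng.normal(size=(200, 32)) + labels[:, None]
hha_feats = rng.normal(size=(200, 32)) + labels[:, None]

model = fuse_and_train(rgb_feats, hha_feats, labels)
accuracy = (classify(rgb_feats, hha_feats, model) == labels).mean()
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Concatenating before PCA lets the reduction exploit correlations between the two modalities. Reducing each stream separately and then concatenating is an equally plausible alternative that this sketch does not cover.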