Recent advances in imaging devices have created new opportunities for solving computer vision tasks. Next-generation cameras, such as depth or binocular cameras, capture diverse information that complements conventional 2D RGB cameras. Thus, exploiting the resulting multi-modal images generally facilitates the related applications. However, limitations of these devices, such as short effective ranges, high costs, or long response times, restrict their practical applicability. To address this problem, we aim in this work at action recognition in RGB videos with the aid of Kinect. We improve recognition accuracy by leveraging information derived from an offline-collected database in which not only the RGB but also the depth and skeleton images of actions are available. Our approach adapts to inter-database variations and enables the sharing of visual knowledge across different image modalities. Each action instance to be recognized, given in RGB representation, is then augmented with the borrowed depth and skeleton features.