This research investigates real-time fingertip detection in RGB images and video frames captured by wearable devices such as smart glasses. A modified Mask Region-based Convolutional Neural Network (Mask R-CNN) is proposed that combines a region-based CNN for hand detection with a separate three-layer CNN for locating the fingertip within the detected hand region. The pipeline runs fast enough for real-time use, which makes several applications practical. One such application traces the location of a user's fingertip from a first-person perspective to form writing trajectories. A text input mechanism for smart glasses can thus be implemented, enabling a user to write letters or characters in the air as input and to interact with the system using simple gestures. Experimental results demonstrate the feasibility of this new text input method.
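To illustrate how per-frame fingertip locations could be assembled into writing trajectories, the following is a minimal sketch and not the method used in this work. It assumes a detector that returns one `(x, y)` fingertip coordinate per frame, or `None` when no fingertip is found. The class `TrajectoryTracer` and its parameters `max_gap` and `alpha` are hypothetical names introduced only for this example; the exponential smoothing and the gap-based stroke splitting are likewise illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class TrajectoryTracer:
    """Groups per-frame fingertip points into strokes (hypothetical helper)."""

    max_gap: int = 5     # assumed: consecutive missed frames that end a stroke
    alpha: float = 0.5   # assumed: smoothing weight given to the newest point
    strokes: List[List[Point]] = field(default_factory=list)
    _current: List[Point] = field(default_factory=list)
    _missed: int = 0

    def update(self, fingertip: Optional[Point]) -> None:
        """Consume the detector output for one frame."""
        if fingertip is None:
            # No fingertip in this frame: close the stroke once the gap is long enough.
            self._missed += 1
            if self._missed >= self.max_gap and self._current:
                self.strokes.append(self._current)
                self._current = []
            return

        self._missed = 0
        x, y = float(fingertip[0]), float(fingertip[1])
        if self._current:
            # Exponential smoothing against the previous point to reduce jitter.
            px, py = self._current[-1]
            x = self.alpha * x + (1.0 - self.alpha) * px
            y = self.alpha * y + (1.0 - self.alpha) * py
        self._current.append((x, y))

    def finish(self) -> List[List[Point]]:
        """Flush any open stroke and return all strokes collected so far."""
        if self._current:
            self.strokes.append(self._current)
            self._current = []
        return self.strokes
```

In use, `update` would be called once per frame with the detector output, and `finish` would be called when the user signals the end of a character; the resulting strokes could then be passed to a handwriting recognizer.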