We propose a weakly supervised object detection network based on eye-tracking data. A large number of training samples cannot be used for two reasons: (1) not all training samples for object detection carry pixel-level labels, and (2) the cost of labeling is too high. We therefore introduce a framework whose input combines images that have only image-level labels with eye-tracking data. Because the eye-tracking data indicate where objects are located, the framework performs effectively even when sample annotations are incomplete. We use an eye tracker to record the regions of the sample images that attract the most attention and represent these data as fixations. The bounding boxes produced from the fixation data, together with the original image-level labels, then serve as the input to the object detection network. In this way, the eye-tracking data help select the bounding boxes and provide detailed location information. Experimental results verify that the framework is effective with the support of eye-tracking data.
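The step from fixations to bounding boxes can be illustrated with a minimal sketch. The abstract does not state how the boxes are derived, so the rule used here (a padded, axis-aligned box around all fixation points, clipped to the image) is only an assumption, and the function name `fixations_to_box`, the `margin` parameter, the sample coordinates, and the label `"dog"` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) fixation position in pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def fixations_to_box(fixations: List[Point], width: int, height: int,
                     margin: float = 20.0) -> Box:
    """Return a padded axis-aligned box around the fixations, clipped to the image.

    Assumed rule for illustration only; not the method described by the authors.
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    # Expand the tight box by `margin` pixels, then clip to the image borders.
    x_min = max(0.0, min(xs) - margin)
    y_min = max(0.0, min(ys) - margin)
    x_max = min(float(width), max(xs) + margin)
    y_max = min(float(height), max(ys) + margin)
    return (x_min, y_min, x_max, y_max)


# Hypothetical fixations recorded on a 640x480 image with image-level label "dog".
fixations = [(300.0, 200.0), (340.0, 220.0), (320.0, 260.0)]
box = fixations_to_box(fixations, width=640, height=480)

# The box is paired with the image-level label to form one training sample.
sample = {"box": box, "label": "dog"}
```

A box obtained this way is only as tight as the spread of the fixations, so in practice the margin (or a different rule, such as clustering fixations first) would need to be tuned to the eye-tracking setup.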