In recent years, several studies have collected context data from various sensors for activity inference. Users perform many actions on their mobile phones, such as taking photos, checking in, and accessing Wi-Fi networks. These actions generate spatial-temporal data that can be used to capture user activities: such data can indicate that a user stays at a certain location at a particular time for a certain activity. Social media data can likewise be used to infer user activities. We extract three types of features for activity inference: 1) a geographical feature, indicating where a user performs activities; 2) a temporal feature, indicating when a user performs activities; and 3) a semantic feature, capturing the semantic concept of a place as derived from location-based social networks. We propose the Spatial-Temporal Activity Inference Model (STAIM) to infer user activities from data with these three features. To determine the weight of each feature, we further propose three methods based on frequency, entropy, and entropy-frequency. Experimental results show that STAIM infers user activities effectively, achieving 75% accuracy on average. Moreover, STAIM can infer user activities even when no training data is available, at the cost of some performance loss. Finally, a sensitivity analysis of the parameters is conducted to select the optimal parameter settings.
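
To make the idea of frequency-, entropy-, and entropy-frequency-based feature weighting concrete, the following is a minimal illustrative sketch, not the STAIM formulation itself. The abstract does not give the actual formulas, so everything here is an assumption: the function names (`conditional_entropy`, `feature_weights`), the input layout (each feature maps to a list of `(feature value, activity)` pairs), the frequency score (number of observations), the entropy score `1 / (1 + H)` where `H` is the conditional entropy of the activity given the feature value, and the combined score (their product).

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(pairs):
    """Return H(activity | feature value) in bits for (value, activity) pairs."""
    by_value = defaultdict(Counter)
    for value, activity in pairs:
        by_value[value][activity] += 1
    total = len(pairs)
    entropy = 0.0
    for counts in by_value.values():
        n = sum(counts.values())
        # Entropy of the activity distribution observed for this feature value.
        h_value = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropy += (n / total) * h_value
    return entropy


def feature_weights(observations, method="entropy-frequency"):
    """Assign a normalized weight to each feature (hypothetical scoring rules).

    observations: dict mapping a feature name to a list of
    (feature value, activity) pairs.
    """
    scores = {}
    for feature, pairs in observations.items():
        freq = len(pairs)  # assumed frequency score: observation count
        # Assumed entropy score: lower entropy means a more discriminative feature.
        ent = 1.0 / (1.0 + conditional_entropy(pairs)) if pairs else 0.0
        if method == "frequency":
            scores[feature] = freq
        elif method == "entropy":
            scores[feature] = ent
        elif method == "entropy-frequency":
            scores[feature] = freq * ent
        else:
            raise ValueError(f"unknown method: {method}")
    total = sum(scores.values())
    if total == 0:
        # No evidence at all: fall back to uniform weights.
        return {f: 1.0 / len(scores) for f in scores}
    return {f: s / total for f, s in scores.items()}
```

Under these assumptions, a feature whose values almost always co-occur with a single activity receives a larger entropy-based weight than a feature whose values are spread evenly across activities, while the frequency-based variant favors features that are observed more often.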