Scene recognition has a wide range of applications, such as object recognition and detection, content-based image indexing and retrieval, and intelligent vehicle and robot navigation. Natural scene images in particular tend to be very complex and are difficult to analyze because of changes in illumination and transformation. In this study, we investigate a novel end-to-end model that learns and recognizes natural scenes by combining locality-constrained sparse coding (LCSP), Spatial Pyramid Pooling, and a linear SVM. First, the interest points of each image in the training set are characterized by a collection of local features, known as codewords, obtained using the dense SIFT descriptor. Each codeword is represented as part of a topic. Then, we employ the LCSP algorithm to learn the codeword distribution of those local features from the training images. Next, a modified Spatial Pyramid Pooling model is employed to encode the spatial distribution of the local features. In the final stage, a linear SVM classifies the local features encoded by Spatial Pyramid Pooling. Experimental evaluations on several benchmarks demonstrate the effectiveness and robustness of the proposed method compared with several state-of-the-art visual descriptors.
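The coding and pooling stages described above can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in this study. It assumes that the locality-constrained coding step can be approximated by the well-known k-nearest-codeword least-squares solution, and it uses plain max pooling over 1x1, 2x2, and 4x4 grids because the modification to Spatial Pyramid Pooling is not specified here. The function names (`llc_encode`, `spatial_pyramid_max_pool`), the parameters `k` and `reg`, and the random stand-in data are all hypothetical.

```python
import numpy as np


def llc_encode(descriptors, codebook, k=5, reg=1e-4):
    """Encode each descriptor over its k nearest codewords (codes sum to 1).

    Assumed stand-in for the LCSP step: approximated locality-constrained coding.
    """
    n, m = descriptors.shape[0], codebook.shape[0]
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    codes = np.zeros((n, m))
    for i in range(n):
        idx = nearest[i]
        z = codebook[idx] - descriptors[i]   # codewords shifted to the descriptor
        cov = z @ z.T                        # local k x k covariance
        cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)  # keep it invertible
        w = np.linalg.solve(cov, np.ones(k))
        codes[i, idx] = w / w.sum()          # enforce the sum-to-one constraint
    return codes


def spatial_pyramid_max_pool(codes, xy, image_size, levels=(1, 2, 4)):
    """Max-pool codes in each cell of every pyramid level and concatenate."""
    width, height = image_size
    pooled = []
    for g in levels:
        # Grid cell of each keypoint at this level (clamped to the last cell).
        cx = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        cy = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        cell = cy * g + cx
        for c in range(g * g):
            mask = cell == c
            if mask.any():
                pooled.append(codes[mask].max(axis=0))
            else:
                pooled.append(np.zeros(codes.shape[1]))  # empty cell
    feature = np.concatenate(pooled)
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature


# Random stand-ins for a learned codebook and dense SIFT descriptors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 128))       # 16 hypothetical codewords
descriptors = rng.normal(size=(200, 128))   # 200 local descriptors
xy = rng.uniform(0, 64, size=(200, 2))      # keypoint positions, 64x64 image
codes = llc_encode(descriptors, codebook)
feature = spatial_pyramid_max_pool(codes, xy, (64, 64))
```

With 16 codewords and three pyramid levels, the pooled vector has 16 x (1 + 4 + 16) = 336 entries. One such vector per image could then be passed to a linear classifier, for example scikit-learn's `LinearSVC`, to complete the final stage.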