Automated human action recognition has the potential to play an important role in public security, for example, in relation to the multiview surveillance videos taken in public places, such as train stations or airports. This paper compares three practical, reliable, and generic systems for multiview video-based human action recognition, namely, the nearest neighbor classifier, Gaussian mixture model classifier, and the nearest mean classifier. To describe the different actions performed in different views, view-invariant features are proposed to address multiview action recognition. These features are obtained by extracting the holistic features from different temporal scales which are modeled as points of interest which represent the global spatial-temporal distribution. Experiments and cross-data testing are conducted on the KTH, WEIZMANN, and MuHAVi datasets. The system does not need to be retrained when scenarios are changed which means the trained database can be applied in a wide variety of environments, such as view angle or background changes. The experiment results show that the proposed approach outperforms the existing methods on the KTH and WEIZMANN datasets.