Semantic context detection is one of the key techniques for efficient multimedia retrieval. A semantic context is a scene that represents a complete, meaningful segment of information to human viewers. In this paper, we propose a novel hierarchical approach to semantic context detection that models the statistical characteristics of several audio events over time. The approach consists of two stages: audio event detection and semantic context detection. In the first stage, hidden Markov models (HMMs) model basic audio events and perform event detection. In the second stage, semantic contexts are detected with Gaussian mixture models (GMMs), which model the temporal correlations among several audio events. This framework bridges the gap between low-level features and semantic contexts that persist over time. Experimental evaluations indicate that the approach is effective in detecting high-level semantics.
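The second stage can be illustrated with a minimal sketch. It assumes the first stage (HMM-based event detection, not shown) already produces, for each frame, a vector of audio-event confidence scores. The event names (`gunshot`, `explosion`, `engine`), the context labels (`gunplay`, `car_chase`), the helper names, and all GMM parameters below are hypothetical placeholders, not values or code from the paper. The sketch also scores the frames in a window as independent, which is a simplification of the temporal modeling the approach describes.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, mean vector, diagonal variance vector).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log sum_k w_k * N(x; mean_k, var_k), using log-sum-exp for numerical stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def detect_context(window: List[List[float]], models: Dict[str, List[Component]]) -> str:
    """Return the context label whose GMM assigns the highest total log-likelihood
    to the window. Frames are treated as independent (a simplifying assumption)."""
    scores = {
        label: sum(gmm_log_likelihood(frame, gmm) for frame in window)
        for label, gmm in models.items()
    }
    return max(scores, key=lambda label: scores[label])


# Hypothetical models over 3-dimensional confidence vectors
# [gunshot, explosion, engine]; parameters are illustrative only.
EXAMPLE_MODELS: Dict[str, List[Component]] = {
    "gunplay": [
        (0.6, [0.8, 0.6, 0.1], [0.05, 0.05, 0.05]),
        (0.4, [0.5, 0.9, 0.2], [0.05, 0.05, 0.05]),
    ],
    "car_chase": [
        (1.0, [0.1, 0.2, 0.9], [0.05, 0.05, 0.05]),
    ],
}

if __name__ == "__main__":
    # Two frames dominated by gunshot/explosion confidences.
    window = [[0.9, 0.5, 0.1], [0.7, 0.7, 0.2]]
    print(detect_context(window, EXAMPLE_MODELS))  # prints: gunplay
```

Because per-frame log-likelihoods are summed, the raw score grows with window length; comparing all candidate contexts over the same window keeps the decision fair.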