In this paper, we describe a novel video shot retrieval method in which each shot is partitioned into multiple video cubes. Every video shot can thus be represented as a linear combination of video cubes, and the similarity between video shots is measured in terms of the similarity between their cubes. The position of a voxel in a cube is specified by its 3D coordinates (x, y, t), and the spatial-temporal features within each video cube are extracted with a set of analytical formulas derived from the proposed 3D moment-preserving technique. The content of a video cube is then approximated by three blocks generated by projecting the cube onto the xy, yt, and tx planes. Based on the visual patterns of the xy, yt, and tx blocks, a fast video shot retrieval scheme is proposed. Compared with key-frame-based representations, the proposed cube-based video retrieval improves retrieval accuracy without sacrificing execution speed. Experimental results demonstrate the efficiency and effectiveness of the proposed video retrieval method.
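The cube partitioning and plane projection described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the shot is a grayscale volume indexed as `shot[t][y][x]`, cubes are non-overlapping with equal side length, and a projection is taken to be the mean along the collapsed axis. The function names `split_into_cubes` and `project_cube` are hypothetical, and the sketch does not implement the 3D moment-preserving formulas themselves.

```python
def split_into_cubes(shot, size):
    """Partition shot[t][y][x] into non-overlapping size x size x size cubes.

    Assumption: trailing voxels that do not fill a whole cube are dropped.
    """
    T, H, W = len(shot), len(shot[0]), len(shot[0][0])
    cubes = []
    for t0 in range(0, T - size + 1, size):
        for y0 in range(0, H - size + 1, size):
            for x0 in range(0, W - size + 1, size):
                cubes.append([[row[x0:x0 + size]
                               for row in frame[y0:y0 + size]]
                              for frame in shot[t0:t0 + size]])
    return cubes


def project_cube(cube):
    """Collapse cube[t][y][x] onto the xy, yt and tx planes.

    Assumption: each projection is the mean along the collapsed axis.
    """
    T, H, W = len(cube), len(cube[0]), len(cube[0][0])
    # xy block: average over t, indexed [y][x]
    xy = [[sum(cube[t][y][x] for t in range(T)) / T for x in range(W)]
          for y in range(H)]
    # yt block: average over x, indexed [t][y]
    yt = [[sum(cube[t][y]) / W for y in range(H)] for t in range(T)]
    # tx block: average over y, indexed [t][x]
    tx = [[sum(cube[t][y][x] for y in range(H)) / H for x in range(W)]
          for t in range(T)]
    return xy, yt, tx
```

Under these assumptions, a shot would be described by the list of `(xy, yt, tx)` triples of its cubes, and two shots could be compared block by block with any suitable distance.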