Recent research shows that the stream processing model is well suited to portable media applications. However, previous stream processor implementations suffer from high power consumption and chip area cost. Consequently, those designs target supercomputer architectures and scientific computation rather than real-time media applications. This paper proposes an arithmetic logic unit (ALU) cluster with an Advanced Microcontroller Bus Architecture (AMBA) platform interface, which serves as a reconfigurable hardware accelerator for portable media applications. The proposed design is implemented and fabricated in TSMC 0.15 µm technology with backend Magnetic RAM (MRAM) process integration. The floating point unit (FPU) delivers 3.2 times higher performance on average while adding only 10.8% area overhead. Measurement results also show twice the power efficiency of previous designs based on traditional architectures. The area-performance trade-off achieved by the FPU and homogeneous cores, together with the power efficiency and design methodology of this work, provides a turnkey solution for modern portable multimedia devices.