Network researchers and system developers commonly run packet processing algorithms on UNIX-like operating systems. For ease of development, complex packet processing algorithms are often implemented in user space. As a result, performance benchmarks for these algorithms often show a large gap depending on the packet source: an algorithm that performs well when reading packets from a raw packet trace file may perform worse when it reads packets directly from a network interface. The gap widens further when the algorithm processes packets in-line. In this paper, we identify the performance bottleneck of existing in-line packet processing implementations in the Linux operating system. Based on this observation, we propose and implement a new software architecture, named Fast Queue, and show that it effectively eliminates the identified bottleneck. Experiments show that the proposed architecture reduces CPU utilization by 30%. In addition, when applied to the well-known open source intrusion detection system snort-inline, it improves overall system throughput by a factor of 1.6.