Network processors (NPs) are emerging as a programmable alternative to traditional ASIC-based solutions for scaling up the data-plane processing of network services. Rather than proposing new algorithms, this work illustrates the process of prototyping a DiffServ edge router on the IXP1200 and examines the performance issues that arise. External benchmarks reveal that although the system can scale to the wire speed of 1.8 Gb/s for simple IP forwarding, throughput declines to 180-290 Mb/s when DiffServ is performed, owing to the dual bottlenecks of SRAM and the microengines. Internal benchmarks further show that the performance bottleneck can shift from one component to another depending on the network service and algorithm. Most of the results reported here should be applicable to other NPs, since they have similar architectures and components.