Testing networking devices before releasing them to the market is a way of ensuring quality and robustness. Replaying artificial or real-world traffic is one method of testing such devices. Using real-world traffic is desirable because it exercises more realistic properties. The main challenges of testing with real-world traffic are the high volume of the captured traces and the prolonged time required for replay testing. To reproduce the failures of networking devices efficiently and reduce the replay time, it is necessary to shrink the traces that triggered those failures. In this work, two algorithms that downsize the traces while still retaining the failures they triggered, Binary Downsizing (BD) and Linear Downsizing (LD), are proposed. A metric called the downsizing ratio (DR), the ratio between the size of the downsized trace and that of the original trace, is defined to evaluate the efficiency of trace downsizing. Three kinds of probes following the basic RFC benchmarking requirements, ARP, ICMP, and HTTP requests, are sent regularly to diagnose the devices during testing. ARP and ICMP probes test the reachability of a networking device hosted on the local network, and HTTP probes check whether the device still responds to users' requests. The evaluation of the failure distribution shows that 70 percent of the failures involved a device failing to respond to one of the three probes, 23 percent involved failing to respond to two probes, and 7 percent involved failing to respond to all three. From the downsizing experiments, LD was inferred to achieve a slightly higher DR than BD, but BD generally requires fewer iterations than LD.
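The abstract does not spell out how BD operates, but the idea of halving a trace while preserving the failure it triggers can be sketched as follows. This is a minimal illustration under assumptions of our own: the trace is modeled as a list of packets, and `triggers_failure` is a hypothetical oracle that would replay a candidate trace against the device under test and report whether the failure recurs. The DR computation follows the definition given above (size of the downsized trace over size of the original), so a smaller DR indicates a greater reduction.

```python
def binary_downsize(trace, triggers_failure):
    """Sketch of a binary-style trace reduction: repeatedly keep only the
    half of the trace that still triggers the failure, until neither half
    does (the failure needs packets from both halves) or one packet remains.
    `triggers_failure` is an assumed replay oracle, not part of the paper."""
    current = list(trace)
    iterations = 0
    while len(current) > 1:
        half = len(current) // 2
        first, second = current[:half], current[half:]
        iterations += 1
        if triggers_failure(first):
            current = first
        elif triggers_failure(second):
            current = second
        else:
            break  # failure is not reproducible from either half alone
    return current, iterations

def downsizing_ratio(downsized, original):
    """DR = size of the downsized trace / size of the original trace."""
    return len(downsized) / len(original)

# Example: a 16-packet trace whose failure is triggered by packet 3 alone.
reduced, iters = binary_downsize(list(range(16)), lambda t: 3 in t)
print(reduced, iters, downsizing_ratio(reduced, list(range(16))))
# → [3] 4 0.0625
```

In this toy example the failure depends on a single packet, so halving converges to it in four replay iterations; when a failure depends on packets scattered across the trace, a halving strategy stops early, which is one reason a linear, chunk-by-chunk reduction like LD might reach a different DR at the cost of more iterations.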