Traffic management is known to be important to effectively utilize the high bandwidth provided by datacenters. Recent works have focused on identifying elephant flows and rerouting them to improve network utilization. These approaches however require either a significant monitoring overhead or hardware/end-host modifications. In this paper, we propose FlowSeer, a fast, low-overhead elephant flow detection and scheduling system using data stream mining. Our key idea is that the features from flows' first few packets allow us to train the streaming classification models that can accurately and quickly predict the rate and duration of any initiated flow. With these predicted information, FlowSeercan adapt routing polices of elephant flows to their demands and dynamic network conditions. Another nice property of FlowSeeris its capability of enabling the controller and switches to perform cooperative prediction. Most of decisions can be made by switches locally, thereby reducing both detection latency and signaling overhead. FlowSeerrequires less than 100 flow table entries at each switch to enable cooperative prediction, and hence can be implemented on off-the-shelf switches. The evaluation via both experiments in realistic virtual networks and trace-driven simulations shows that FlowSeerimproves the throughput by multiple times over Hedera, which pulls flow statistics, and performs comparably to Mahout, which needs end-host modification.
- Flow classification
- software defined data center networks
- streaming mining