Most existing datacenters still use Equal-Cost Multi-Path (ECMP) routing for network load balancing. However, ECMP is stateless and cannot adapt when network congestion occurs, so datacenters cannot make good use of network resources when traffic is unbalanced. The main reason most datacenters still use ECMP is that deploying state-of-the-art load balancing technologies is too costly: all switches in the datacenter must be replaced with programmable switches, such as P4 switches. In this paper, we propose a cost-effective congestion-aware load balancing (CCLB) scheme that achieves congestion-aware load balancing while replacing only a portion of the switches with programmable ones. CCLB uses Explicit Congestion Notification (ECN) in the IP layer to detect network congestion, and uses flowlet switching, which slices large flows into small sub-flows, to balance load. Experimental results show that the average flow completion time of CCLB is 27% shorter than that of ECMP for large flows. Compared with HULA, a classical congestion-aware mechanism, the average flow completion time of CCLB is slightly longer, but its switch replacement cost is much lower.
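
To illustrate how ECN feedback and flowlet switching can be combined, the following is a minimal Python sketch and not the CCLB algorithm itself. The class name `FlowletBalancer`, the `gap` inactivity threshold, and the per-path ECN mark counter are hypothetical choices made for illustration. The sketch assumes that a packet arriving more than `gap` seconds after the previous packet of the same flow starts a new flowlet, and that a new flowlet is sent on the path with the fewest observed ECN marks.

```python
class FlowletBalancer:
    """Illustrative flowlet switch that steers new flowlets by ECN feedback."""

    def __init__(self, num_paths, gap):
        self.num_paths = num_paths
        self.gap = gap                      # assumed flowlet inactivity gap (seconds)
        self.ecn_marks = [0] * num_paths    # ECN marks observed per path (assumed metric)
        self.table = {}                     # flow_id -> (path, time of last packet)

    def report_ecn(self, path):
        """Record one ECN congestion mark seen on the given path."""
        self.ecn_marks[path] += 1

    def select_path(self, flow_id, now):
        """Return the path for a packet of flow_id arriving at time now."""
        entry = self.table.get(flow_id)
        if entry is not None and now - entry[1] <= self.gap:
            # Same flowlet: keep the current path to avoid packet reordering.
            path = entry[0]
        else:
            # New flowlet: pick the path with the fewest ECN marks
            # (ties go to the lowest path index).
            path = min(range(self.num_paths), key=lambda p: self.ecn_marks[p])
        self.table[flow_id] = (path, now)
        return path
```

In this sketch, a flow stays on its path while its packets arrive back to back, and only moves to a less congested path at a flowlet boundary, which is what lets a large flow be split without reordering packets inside a burst.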