In this paper, we propose a load balancing strategy with the following features: (1) estimation of the potential load of real gateways with low computation overhead and no communication overhead, (2) an asynchronous alarm sent when the utilization of a real gateway exceeds a critical threshold, and (3) WAP-awareness. We also propose a scalable WAP gateway (SWG) that consists of a WAP dispatcher and a cluster of real gateways. The WAP dispatcher is a front-end distributor that implements our load balancing strategy. To keep it from becoming a bottleneck, the WAP dispatcher distributes mobile clients' requests in kernel space and does not process outgoing gateway-to-client responses. Experimental results show that our SWG achieves better load balancing, higher throughput, and lower delay than LVS and the Kannel gateway. Although WAP services are not as popular as expected, our load balancing strategy can be easily adapted to other distributed services.
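
To make the three features concrete, the following is a minimal user-space sketch of how such a dispatcher might choose a real gateway. It is an illustration under stated assumptions, not the SWG implementation: the names `Dispatcher`, `Gateway`, and `REQUEST_COST`, the cost values, and the decay step are all hypothetical. The sketch assumes that the dispatcher estimates load locally from the requests it forwards (so no polling of the gateways is needed), that a gateway raises an alarm on its own when it is overloaded, and that WAP-awareness can be approximated by a per-request-type cost.

```python
from dataclasses import dataclass

# Hypothetical relative costs per WAP request type (illustrative values only).
REQUEST_COST = {"wsp_connectionless": 1.0, "wsp_session": 2.0, "wtls": 4.0}


@dataclass
class Gateway:
    name: str
    estimated_load: float = 0.0  # dispatcher-side estimate, never polled from the gateway
    alarmed: bool = False        # set when the gateway sends an asynchronous alarm


class Dispatcher:
    def __init__(self, names):
        self.gateways = {n: Gateway(n) for n in names}

    def dispatch(self, request_type):
        """Pick the non-alarmed gateway with the lowest estimated load."""
        cost = REQUEST_COST[request_type]
        candidates = [g for g in self.gateways.values() if not g.alarmed]
        if not candidates:
            # Every gateway is alarmed: fall back to the full set.
            candidates = list(self.gateways.values())
        target = min(candidates, key=lambda g: g.estimated_load)
        target.estimated_load += cost  # update the estimate locally
        return target.name

    def on_alarm(self, name):
        """Called when a gateway reports utilization above its threshold."""
        self.gateways[name].alarmed = True

    def on_clear(self, name):
        """Called when a previously alarmed gateway recovers."""
        self.gateways[name].alarmed = False

    def tick(self, decay=0.5):
        """Assumed aging step: the dispatcher never sees responses,
        so estimates are decayed over time instead of decremented."""
        for g in self.gateways.values():
            g.estimated_load *= decay
```

In this sketch the only gateway-to-dispatcher traffic is the alarm itself, which mirrors the asynchronous design described above; how the real dispatcher estimates load and ages its estimates is not specified here and may differ.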