GPGPUs have emerged as one of the most widely used throughput processors. Deep multithreading and cache hierarchy are the two effective implementations to achieve high throughput computing in modern GPGPUs. However, these are two conflicting design options. Finding a proper design point between the two has become a significant performance factor to GPGPUs. This paper proposes a distributed thread scheduler for dynamic multithreading on GPGPUs. By demonstrating the trade-off issue between the multithreading and cache contention, the proposed scheduler dynamically adjusts the multithreading degree to achieve superior performance. With the proposed scheduler, the cache misses can be decreased by 20.6% and 37.9% on the L1 and L2 cache respectively. The overall performance can be enhanced by an average of 16.4%.