Consistent with the trend towards the use of many cores in SOC and 3D Chip techniques, this paper proposes a "single-cycle ring" interconnection (SC-Ring) with ultra-low latency and minimal complexity. The proposed SC-Ring allows multiple single-cycle transactions in parallel. The main features of the circuit-switched design include a set of 3-ported circuitswitched routers (4∼16) and a performance/timing effective arbiter. The arbiter, called "BTPC", features single-cycle arbitration and routing-control by means of the novel Binary-Tree paths convergence and path-prediction mechanisms, to provide a highly reduced time complexity. By combining this with the integration of 3D chips, the proposed ring-based interconnection offers several advantages for hierarchical clustering in future many-core systems, in terms of cost, latency, and power reductions. Moreover, based on the proposed SC-Ring, this work realizes a "level-1 non-uniform cache architecture" (L1-NUCA) for fast data communication without cache-coherency in facilitating multithreading/multi-core as a case study. Finally, experimental results show that our approach yields promising performance.