Conventional buffer insertion in timing ECO involves only minimizing the arrival time of the most critical sink in one multi-pin net and neglects the obstacles and the topology of routed wire segments, which may worsen the arrival times of other sinks and burden subsequent timing ECO. This work develops a topology-aware ECO timing optimization (TOPO) flow that comprises three phases - buffering pair scoring, edge breaking and buffer connection, and topology restructuring. TOPO effectively improves the arrival times of violation sinks without worsening those of other sinks. Experimental results indicate that TOPO improves the worst negative slack (WNS) and total negative slack (TNS) of benchmarks by an average of 79.2% and 84.3%, respectively. The proposed algorithm improves the arrival time that is achieved using conventional two-pin net-based buffer insertion by an average of 40.4%, at the cost of consuming 19x runtime. To speed up routing and further improve sink slack, a highly scalable massively parallel maze routing on Graphics Processing Unit (GPU) platform is also developed to enable the proposed flow to explore more solution candidates. High scalability and parallelism are realized by block partitioning and staggering. Experiments reveal that the proposed GPU-based parallel maze routing can achieve near 12x runtime speedup for two-pin routings. With parallelized maze routing, WNS violations in four out of five cases can be resolved.