Utilizing multi-bit flip-flops (MBFFs) is one of the most effective power optimization techniques in modern nanometer integrated circuit (IC) design. Most of the previous work apply MBFFs without doing placement refinement of combinational logic cells. Such problem formulation may result in less power reduction due to tight timing constraints with fixed combinational logic cells. This paper introduces a novel placement flow with clock-tree aware flip-flop merging and MBFF generation, and proposes the corresponding algorithms to simultaneously minimize flip-flop power and clock latency when applying MBFFs during placement. Experimental results based on the IWLS-2005 benchmark show that our approach is very effective in not only flip-flop power but also clock latency minimization without degrading circuit performance. To our best knowledge, this is also the first work in the literature which considers clock trees during flip-flop merging and MBFF generation.