This paper addresses two issues that hinder progress in accurate image alignment. First, the performance of descriptor-based approaches to image alignment relies on the chosen descriptor, but the optimal descriptor typically varies from image to image, or even from pixel to pixel. Second, the neighborhood structure for smoothness enforcement is usually predefined before alignment, whereas object boundaries are often better discovered during alignment. The proposed approach tackles the two issues by adaptive descriptor selection and dynamic neighborhood construction. Specifically, we associate each pixel to be aligned with an affine transformation, and integrate the learning of the pixel-specific transformations into image alignment. The transformations serve as the common domain for descriptor fusion, since the local consensus of each descriptor can be estimated by accessing the corresponding affine transformation. It allows us to pick the most plausible descriptor for aligning each pixel. On the other hand, more object-aware neighborhoods can be produced by referencing the consistency between the learned affine transformations of neighboring pixels. The promising results on popular image alignment benchmarks manifest the effectiveness of our approach.