Indoor localization is a fundamental problem in the Internet of Things (IoT); at the same time, IoT provides many networked devices that can help improve the precision of indoor localization. The Particle Filter (PF) is widely used in indoor localization because of its flexibility in adapting to diverse, and often complex, indoor floorplans and furniture placements. In this work, we consider the fusion of multi-sensory data using a PF. We focus on three popular sensor types: IM (inertial measurement) sensors, RF (radio frequency) sensors, and environmental visual sensors. Notably, environmental visual sensors require no extra device to be attached to the localized targets. We propose a PF model that can incorporate all three types of sensory input. We show that in scenarios where visual sensory inputs are available, sub-meter precision can be achieved, and in places without visual coverage, the other sensors support seamless localization with reasonable precision. Field trial results show that our model is well suited to areas such as lobbies, corridors, and meeting rooms.
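The fusion described above can be illustrated with a minimal particle-filter sketch: an IM-style motion model drives the predict step, while an RF or visual position fix reweights the particles. The motion noise, Gaussian likelihood, sensor values, and function names below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, step, heading, noise=0.1):
    # IM-style motion model (assumed): each particle advances by a noisy
    # step length along a noisy heading.
    n = len(particles)
    steps = step + rng.normal(0.0, noise, n)
    headings = heading + rng.normal(0.0, noise, n)
    particles[:, 0] += steps * np.cos(headings)
    particles[:, 1] += steps * np.sin(headings)
    return particles

def update(particles, weights, measured_pos, sigma=1.0):
    # Reweight particles against an RF or visual position fix; a Gaussian
    # likelihood with standard deviation `sigma` is an assumption here.
    d = np.linalg.norm(particles - measured_pos, axis=1)
    weights *= np.exp(-0.5 * (d / sigma) ** 2)
    weights += 1e-300                 # guard against all-zero weights
    weights /= weights.sum()
    return weights

def resample(particles, weights):
    # Systematic resampling: duplicate likely particles, drop unlikely ones,
    # then reset to uniform weights.
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx].copy(), np.full(n, 1.0 / n)

# One fusion cycle: IM predict, then a hypothetical visual fix at (1, 0).
particles = rng.normal([0.0, 0.0], 0.5, size=(1000, 2))
weights = np.full(1000, 1.0 / 1000)
particles = predict(particles, step=1.0, heading=0.0)
weights = update(particles, weights, np.array([1.0, 0.0]), sigma=0.3)
particles, weights = resample(particles, weights)
estimate = particles.mean(axis=0)     # posterior mean position
```

A tighter `sigma` (as with a visual fix) pulls the posterior sharply toward the measurement, which is why visual coverage yields sub-meter precision; with only coarser RF or IM input, the same loop still runs, just with a wider likelihood.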