Knowledge-Based Visual Question Answering (KB-VQA) relies on external knowledge for cross-modal scene understanding and reasoning. Existing methods still suffer from limited reasoning capability due to two major drawbacks: (1) the visual entity anchoring issue, where current methods fail to accurately anchor the visual entities mentioned in questions, leading to irrelevant knowledge retrieval and misleading reasoning; and (2) the visual-aware reasoning issue, where prior approaches rely too heavily on text-only reasoning and ignore visual cues, resulting in unreliable reasoning chains. To this end, we propose VAMER, a Visual-Anchored Multimodal Evidence Reasoning framework with two components. First, to address the visual entity anchoring issue, we introduce a Visual Entity Linking (VEL) module that uses the reasoning capability of a Visual-Language Model (VLM) to extract semantic and spatial information from questions; this information guides semantic-spatial contrastive learning for entity localization. Second, to address the visual-aware reasoning issue, we propose a Multimodal Evidence Chain Reasoning (MECR) module that adopts a hierarchical two-phase approach, handling evidence chain construction and answer generation separately, which enables iterative integration of visual and textual information and improves reasoning reliability. Extensive experiments on the OK-VQA, A-OKVQA, and F-VQA datasets demonstrate the effectiveness of the proposed method for KB-VQA.
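
To make the idea of contrastive learning for entity localization concrete, the following is a minimal sketch of a generic InfoNCE-style loss over candidate image regions. It is not the VEL objective itself, which this abstract does not specify. The names `fuse`, `cosine`, and `info_nce`, the concatenation of semantic and spatial features, and the temperature value are all assumptions made for illustration only.

```python
import math


def fuse(semantic, spatial):
    # Assumption: semantic and spatial cues are combined by simple
    # concatenation; the actual fusion used by VEL is not given here.
    return list(semantic) + list(spatial)


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def info_nce(query, regions, positive_index, temperature=0.1):
    # Generic InfoNCE: negative log-softmax of the positive region's
    # similarity to the query, with all other regions as negatives.
    logits = [cosine(query, r) / temperature for r in regions]
    top = max(logits)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[positive_index]


# Hypothetical toy data: a question-derived query and three candidate regions.
query = fuse([1.0, 0.0], [0.0, 1.0])
regions = [
    fuse([1.0, 0.0], [0.0, 1.0]),  # matches in both semantics and position
    fuse([0.0, 1.0], [1.0, 0.0]),  # matches in neither
    fuse([1.0, 0.0], [1.0, 0.0]),  # matches in semantics only
]
loss_correct = info_nce(query, regions, positive_index=0)
loss_wrong = info_nce(query, regions, positive_index=1)
```

In this toy setup the loss is small when the region labelled positive agrees with the query in both its semantic and spatial components, and large when the labelled region agrees in neither. This is the behavior a localization objective of this kind is meant to reward.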