VICI: VLM-Instructed Cross-view Image-localisation

Xiaohan Zhang; Tavis Shore; Chen Chen; Oscar Mendez; Simon Hadfield; Safwan Wshah

doi:10.1145/3728482.3757386

What is it about?

This paper introduces VICI, a two-stage AI system that geolocates street photos with a limited field of view (FoV) by matching them to satellite images. The first stage retrieves likely satellite matches using visual features. The second stage uses a vision-language model (VLM) such as Gemini to rerank those candidates and justify the best match, boosting accuracy and interpretability in real-world settings where panoramic views aren't available.
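The retrieve-then-rerank idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (cosine, retrieve_top_k, rerank), the toy embeddings, and the stub judge are all hypothetical. In particular, the judge is a plain callable standing in for a VLM; no real Gemini API call is made here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: Vector, gallery: Sequence[Vector], k: int) -> List[int]:
    """Stage 1 (assumed form): rank satellite embeddings by similarity to the
    street-photo embedding and return the indices of the k best candidates."""
    order = sorted(range(len(gallery)),
                   key=lambda i: cosine(query, gallery[i]),
                   reverse=True)
    return order[:k]


def rerank(candidates: Sequence[int],
           judge: Callable[[int], Tuple[float, str]]) -> List[Tuple[int, float, str]]:
    """Stage 2 (assumed form): ask a judge for a (score, reason) pair per
    candidate and sort by score, best first. The judge is a hypothetical
    stand-in for a vision-language model."""
    judged = [(idx, *judge(idx)) for idx in candidates]
    return sorted(judged, key=lambda item: item[1], reverse=True)


# Toy data: one street-photo embedding and four satellite-tile embeddings.
query = [1.0, 0.0]
gallery = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3], [-1.0, 0.0]]

shortlist = retrieve_top_k(query, gallery, k=2)


def stub_judge(idx: int) -> Tuple[float, str]:
    # Hypothetical scores and reasons; a real system would query a VLM here.
    scores = {0: 0.4, 2: 0.9}
    return scores[idx], f"placeholder reason for tile {idx}"


ranking = rerank(shortlist, stub_judge)
print(shortlist)      # indices chosen by stage 1
print(ranking[0][0])  # index preferred after stage 2
```

In this toy run the second stage overturns the first stage's ordering, which is the point of reranking: the shortlist only needs to contain the right tile, and the judge's attached reason is what makes the final choice inspectable.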


Why is it important?

This work improves real-world visual geolocation by matching everyday narrow-angle street photos to satellite images without relying on GPS. That matters for robots, navigation, and rescue operations where GPS fails, and it improves both accuracy and trust because the AI explains its decisions.

This page is a summary of: VICI: VLM-Instructed Cross-view Image-localisation, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3728482.3757386.