The Dawn of Visual Intelligence
On Thursday, 25th September 2025, our very own Talia Bender and Jeremy Michaels officially launched the new whitepaper, proudly announcing the dawn of Visual General Intelligence (VGI) to the world.
This is the third of six articles, one for each section of that whitepaper – The Future of Visual Intelligence: AI Vision Through the Looking Glass – which sets out our belief that VGI will be the quintessential proof point of Artificial General Intelligence (AGI). Crucially, it also outlines the role VGI plays in accelerating the paths to zero downtime and zero harm.
You can now download the whitepaper and enjoy our on-demand webinar with our founders – Nico Klingler and Gaudenz Boesch – stepping us through what VGI is and why it matters. In this article, we summarize the third section of that whitepaper.

“The future of AI is not just about making computers smarter but about making them see and understand the world like humans do.”
– Andrej Karpathy (Computer Scientist and co-founder of OpenAI)
See more:
- VGI Whitepaper
- Official VGI Whitepaper Launch Interview
- Q&A with the Founders: Introducing VGI Webinar
Visual General Intelligence: through the looking glass of AI
Lewis Carroll’s ‘Through the Looking-Glass’ invited readers to step into a world where logic followed new rules, and familiar things became extraordinary. Today, we find ourselves in a similar moment with Artificial Intelligence (AI). The mirror we are stepping through is not fiction but Visual General Intelligence (VGI) – a leap beyond what machines have ever been able to perceive, interpret, and understand.
Computer Vision (CV) and Visual Intelligence (VI) have already proven powerful: detecting defects in factories, monitoring safety risks, and interpreting medical scans. But VGI promises something different. It is not about incremental accuracy or faster pattern recognition. It is about machines redefining what it means to see – surpassing human vision in depth, adaptability, and contextual awareness.
In this blog, adapted from our whitepaper – The Future of Visual Intelligence: AI Vision Through the Looking Glass – we explore why VGI may become the quintessential proof point of Artificial General Intelligence (AGI), and why businesses must prepare now.

Setting the scene: AI today
At its core, AI refers to systems capable of performing tasks that once required human intelligence: reasoning, problem-solving, perception, language. Within AI, Machine Learning (ML) and Deep Learning (DL) have driven rapid progress.
Computer Vision, a key branch of AI, mimics human sight – interpreting images and video to detect objects, recognize faces, or identify anomalies. Over time, this evolved into VI, which goes beyond recognition to understand context, relationships, and meaning.
Yet these systems remain narrow. A model trained to identify a defective circuit board cannot instantly pivot to detecting hazards on a construction site. Each use case demands retraining, new data, and costly configuration. That’s where VGI shifts the paradigm.

Defining Visual General Intelligence (VGI)
VGI is the capability of AI systems to understand and reason about visual information across all domains and contexts, achieving human-level comprehension and beyond.
Imagine a system that:
- Recognizes not just what is visible but infers why it matters
- Adapts to new environments without retraining
- Performs open-world reasoning across diverse industries
- Learns continuously from real-world feedback
If AGI represents human-level intelligence across domains, VGI represents human-level vision across all contexts – an intelligence specific to one modality, vision, but general across every setting in which it is applied.
This makes VGI an ideal proof point for AGI. Unlike abstract reasoning, visual intelligence can be measured objectively against medical and cognitive benchmarks such as perception speed, recall accuracy, and contextual reasoning. When machines surpass these thresholds, we may know we’ve crossed into AGI territory.

Why vision matters most
Among the five human senses, vision dominates. By a widely cited estimate, up to 80% of human perception and learning comes through sight. This makes vision not only our richest data stream but also the most natural testbed for general intelligence.
As Yann LeCun (Chief AI Scientist at Meta) has noted, training models on visual data is critical to achieving AGI. Compared to text, which is limited by grammar and structure, visual data is vast, continuous, and multi-dimensional. By LeCun’s estimate, a four-year-old child has already taken in more data through the optic nerves than the largest large language models (LLMs) have seen in their entire training.
If machines can learn to see as humans do – and eventually better – this is the clearest, most tangible milestone toward general intelligence.

When might AGI arrive?
Timelines vary, but the trajectory is accelerating. Analysts suggest AGI could arrive by 2040, while entrepreneurs predict much sooner: Sam Altman (OpenAI) points to 2027, Demis Hassabis (DeepMind) estimates 2030–2035, and Ray Kurzweil suggests 2029.
VGI may emerge first. Just as specialized milestones (e.g. protein folding) marked breakthroughs in AI before broader capabilities, VGI could become the measurable indicator that AGI is within reach.
Benchmarks might include:
- Latency – human neural signals distinguish targets in ~150 ms, and people categorize images in <300 ms; machines matching or beating these times would be a key milestone
- Accuracy – recognition, recall, and reasoning across domains without retraining
- Adaptability – self-improving systems capable of generalizing from minimal data
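As a rough illustration of how the latency benchmark above could be checked in practice, the Python sketch below times a placeholder classifier and compares its median latency with the ~300 ms human categorization figure. The `classify` function, the number of runs, and the choice of the median are assumptions made for illustration and are not taken from the whitepaper.

```python
import statistics
import time

HUMAN_CATEGORIZATION_MS = 300.0  # human image-categorization figure quoted above


def classify(image):
    """Hypothetical stand-in for a vision model; replace with real inference."""
    return "defect" if sum(image) % 2 else "ok"


def median_latency_ms(model, image, runs=50):
    """Return the median wall-clock latency of model(image) in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        model(image)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def meets_human_latency(latency_ms, threshold_ms=HUMAN_CATEGORIZATION_MS):
    """True when the measured latency is at or below the human threshold."""
    return latency_ms <= threshold_ms


if __name__ == "__main__":
    dummy_image = [0, 1, 2, 3]  # placeholder pixel data, not a real image
    latency = median_latency_ms(classify, dummy_image)
    within = meets_human_latency(latency)
    print(f"median latency: {latency:.3f} ms, within human range: {within}")
```

A real evaluation would time the full pipeline (capture, preprocessing, inference) on representative hardware, and would pair latency with the accuracy and adaptability measures listed above.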

The starting gun: market forces driving adoption
Technological revolutions don’t happen in isolation – they are accelerated by market dynamics. Several forces are converging to make VGI not only possible but inevitable.
1. Foundation model race
Open-source vision models are accelerating innovation. Businesses can now focus less on building from scratch and more on scaling applications.
2. Hardware democratization
GPUs and edge devices are becoming more affordable, enabling real-time inference everywhere – from factories to smartphones.
3. Reduced data dependency
New training methods slash the need for labeled datasets, enabling rapid iteration and lower costs. Closed-loop learning with human-in-the-loop (HITL) feedback ensures continuous improvement.
Together, these forces create a tipping point where VGI is no longer just a research goal but a commercial inevitability.
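To make the closed-loop idea in point 3 more concrete, here is a minimal Python sketch of one human-in-the-loop cycle: confident predictions are accepted automatically, uncertain ones go to a reviewer, and the reviewer's labels are folded back into the training data. The function names, the 0.6 review threshold, and the data shapes are hypothetical and chosen only for illustration.

```python
def triage(predictions, review_threshold=0.6):
    """Split predictions into auto-accepted items and items needing human review.

    Each prediction is a tuple of (item_id, label, confidence).
    """
    accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= review_threshold:
            accepted.append((item_id, label))
        else:
            needs_review.append(item_id)
    return accepted, needs_review


def apply_feedback(training_set, human_labels):
    """Return a new training set extended with human-provided labels."""
    updated = dict(training_set)  # copy so the original is left unchanged
    updated.update(human_labels)
    return updated


if __name__ == "__main__":
    preds = [("img1", "ok", 0.92), ("img2", "defect", 0.41)]
    accepted, queue = triage(preds)
    # A reviewer labels the queued items; these labels are made up for the demo.
    training = apply_feedback({"img0": "ok"}, {"img2": "ok"})
    print(accepted, queue, training)
```

The retraining step itself is left out; in this sketch the loop only grows the labeled set that a later training run would consume.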
Why the application layer matters most
Models alone don’t deliver value. The real impact emerges at the application layer – the orchestration systems that connect vision models to real-world decisions.
As Gerard Corrigan, CTO at viso, explains:
“The application layer isn’t just important – it’s where the magic happens, because VGI without applications is just expensive computer vision. VGI with the right application architecture… that’s when we stop building software and start building digital senses.”
The application layer ensures that insights from visual models translate into action: alerting workers to risks, adjusting supply chains, or reallocating resources in real time. Just as the app ecosystem unlocked the smartphone revolution, application layers will unlock the VGI revolution.
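One way to picture the application layer is as a thin decision step that sits between model output and operational action. The Python sketch below maps detections to actions using simple rules; the `Detection` fields, the rule table, and the action names are all hypothetical and do not describe any particular product's architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # what the vision model reports seeing
    confidence: float  # model confidence in [0, 1]
    zone: str          # hypothetical site zone identifier


# Hypothetical rules: label -> (minimum confidence, action to trigger)
RULES = {
    "missing_hard_hat": (0.80, "alert_worker"),
    "conveyor_jam": (0.70, "pause_line"),
}


def decide(detection):
    """Map one detection to an action name, or None if no rule fires."""
    rule = RULES.get(detection.label)
    if rule is None:
        return None
    min_confidence, action = rule
    return action if detection.confidence >= min_confidence else None


def route(detections):
    """Return (zone, action) pairs for every detection that triggers a rule."""
    actions = []
    for detection in detections:
        action = decide(detection)
        if action is not None:
            actions.append((detection.zone, action))
    return actions


if __name__ == "__main__":
    frame = [
        Detection("missing_hard_hat", 0.91, "bay-3"),
        Detection("conveyor_jam", 0.55, "line-1"),  # below its threshold
        Detection("forklift", 0.99, "dock"),        # no rule defined
    ]
    print(route(frame))
```

Keeping the rules separate from the model means the same vision output can drive different actions for different sites or industries without touching the model itself.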
Implications for global industries
The leap to VGI will ripple across industries, for example:
- Manufacturing: real-time defect detection, predictive maintenance, zero downtime
- Construction: adaptive safety systems, progress tracking, anticipatory risk prevention
- Waste management: intelligent sorting, contamination detection, facility optimization
These aren’t futuristic scenarios: they’re the next phase of transformation. For companies, the question is not if but how quickly they can integrate VGI into their operations.

Beyond perception: building anticipatory systems
Perhaps the most profound shift VGI brings is moving from reactive to anticipatory systems. Today, AI often responds to events after they occur. With VGI, machines will predict risks, anticipate needs, and act proactively.
Imagine a logistics network that reroutes itself before a disruption, or a workplace safety system that prevents an accident rather than recording one. This is the true promise of VGI: not just machines that see better, but machines that understand and act with foresight.
Looking ahead
VGI represents more than a technical milestone. It is a civilizational threshold. For the first time, machines may share with us not only the ability to process language but the ability to perceive and understand the world visually.
In that moment, we will not just be building smarter software. We will be building digital senses that extend and surpass our own.

Why download the whitepaper?
This blog has introduced some key concepts and arguments from our whitepaper. But the full paper goes deeper:
- Definitions and benchmarks of AI, VI, and VGI
- Expert predictions on timelines to AGI and VGI
- Technical frameworks for proving VGI as the North Star of AGI
- Industry case studies showing real-world value at the application layer
👉 Download the full whitepaper – The Future of Visual Intelligence: AI Vision Through the Looking Glass – to explore how your organization can harness VGI and prepare for the most important technological leap of our time.