The future of visual intelligence

Visual General Intelligence (VGI) will be the quintessential proof point of Artificial General Intelligence (AGI).

Prologue: Why this paper?

“I thought you could do anything with a camera that the eye could do, or the imagination could do. I didn’t know that there were things you couldn’t do. So, anything I could think up in my dreams, I attempted to photograph.”

Orson Welles
Filmmaker and scriptwriter

The Crisinity: Living through remarkable and extraordinary times

In this ‘post post-truth’ era, we have moved beyond Hypernormalization¹ into an age beset with unending crises. These crises create emerging opportunities, underpinned by transformative technological leaps: the democratization of the internet, quantum theory, and the transcendence of Artificial Intelligence (AI). The Age of Crisinity is here.²

Given the geopolitical, socio-economic, climate, and technological disruptions we face, new opportunities for advancement are essential. The greatest shifts in established systems, transfers of assets, and moves toward decentralization have long been underway. There is a growing recognition that we are inexorably approaching what Ray Kurzweil called ‘The Singularity’: the hypothetical point at which AI surpasses human intelligence, triggering unprecedented technological growth and disruption. More crucially, Kurzweil describes it as the point where ‘humans transcend biology’.

¹ Hypernormalization (Definition): ‘an accepted yet false societal system that is so pervasive that, whilst everyone knows it is in fact fake, people continue to act as if it were an objectively true and rational reality’

² The Age of Crisinity (Definition): ‘the ability to find opportunities within the crises that we all face’

1. Through the looking glass

In summary: Focus of this section

This chapter introduces VGI as the next leap beyond traditional computer vision (CV) and visual intelligence (VI), radically redefining how machines perceive and understand the world. VGI is envisioned to surpass human visual cognition in depth, adaptability, and context awareness, marking a paradigm shift in AI.
It argues that VGI could serve as the most concrete and measurable proof point for AGI, offering tangible benchmarks through human-machine comparisons in visual comprehension, perception, and reasoning.

VGI represents a journey ‘through the looking glass’ into an augmented reality: one where the traditional hierarchy of perception is fundamentally transformed. Just as Alice discovered a world where logic operated by different rules, we are entering an era where machines will surpass human visual cognition.

This will be true not merely in terms of processing speed or data volume, but also in the depth and sophistication of visual understanding itself. These systems will perceive patterns that may be invisible to human eyes, comprehend spatial relationships in dimensions we cannot fathom, and extract meaning from visual information with unsurpassed clarity.

This transition promises to reveal that what we considered the pinnacle of visual intelligence – human perception and understanding – was merely the beginning of a far more expansive visual consciousness. Machines will not simply see better than we can: they will redefine what it means to see, transforming our understanding of reality itself through their superhuman visual comprehension. Before we begin our journey through the looking glass and set out to prove VGI, we want to define clearly the terminology we use.

2. Ready? The next industrial revolution

In summary: Focus of this section

Here, we explore how VGI will further transform operations across manufacturing, construction, and waste management through two key applications: Health and Safety and Lean/Overall Equipment Effectiveness (OEE).

“If we want machines to think, we need to teach them to see.”

Fei-Fei Li
Professor of Computer Science at Stanford University

(A) Health and Safety: The path to zero harm and zero downtime

1. Manufacturing transformed: The eyes that never blink

VGI will revolutionize manufacturing safety by creating adaptive systems that learn from every incident and near-miss across global facilities. Unlike traditional safety monitoring that relies on rigid rule-based detection, VGI systems can understand contextual safety risks, predicting potential hazards before they occur.

VGI systems adapt to new equipment, changing production layouts, and evolving safety protocols without complete retraining, providing continuous protection as environments evolve.

Additional VGI-enabled HSE use cases could include:

• Predictive ergonomic analysis that monitors worker posture and movement patterns to prevent repetitive strain injuries
• Dynamic risk assessment that evaluates changing environmental conditions like temperature, humidity, and air quality in real time
• Intelligent emergency response coordination that can automatically guide evacuation procedures and coordinate with emergency services based on visual analysis of incident severity and location

3. The past does not dictate the future

In summary: Focus of this section

This chapter presents VGI as a transformative leap from traditional CV toward fully adaptive, real-time, end-to-end VI systems. Unlike legacy approaches dependent on rigid rules or vast annotated datasets, VGI introduces flexibility, contextual awareness, and scalability for real-world deployment. Drawing on case studies like DeepSeek’s cost-engineered breakthroughs, and outlining the core architecture of VGI – including models, deployment, configuration, feedback, and analytics – it positions VGI as both a technological and a strategic inflection point for the future of AI vision.

(A) On the crest of a wave: The founder view

The past does not dictate the future

– Not all vision-related AI is the same. We have seen remarkable progress, but too many solutions tread the same path: a single-use, fixed-function application. This may deliver some outcomes, but it isn’t designed for flexibility, speed, and extensibility.

– Nor does it equip users to tackle today’s challenges while preparing for tomorrow’s needs. Early adopters may initially opt for fixed-function solutions, but these are not suited to mature adoption, which requires scalability, customizability, security, and extensibility to future-proof deployments and deliver greater ROI.

AI Vision: The ‘next big thing’ will soon be supplanted

Large language models (LLMs) dominate AGI conversations, but big things are taking shape in vision. Text is easier to handle because it is structured by semantic and grammatical rules. Vision data is more complex and heavier: files are bigger and workloads more cumbersome, so processing vision at the source requires edge computing. Overcoming these harder challenges is why vision is the next big thing.

Sight is crucial to human intelligence. Teaching machines to see unlocks a deeper and more intuitive understanding of the world. Through AI Vision, we seek to see the world differently: from the perspective of our clients and the challenges they will face tomorrow.

We are pioneers of game-changing technology that places organizational needs at the very center of our thinking. An intelligence engine that drives positive outcomes through data. A paradigm where cameras don’t just detect objects but rather truly understand what they see and what to do about it. An approach that adapts to a wide range of use cases with minimal model training and the ability to embed AI models into real-world applications to solve business problems.

While everyone talks about AGI, we have built VI. We have paved the way for VGI, AGI’s cousin, specializing in visual tasks.

We are equipping machines with the ability to see as clearly as we can, if not better, solving every vision-related challenge, and reshaping entire industries in the process.

4. Welcome to the future: The dawn of VGI

In summary: Focus of this section

This chapter looks ahead to the future of Visual Intelligence and its convergence with AI and broader technological advances. It charts how innovation across the 2020s–2050s – from multimodal grounding to embodied agents, from human-computer symbiosis to sentient AI – will reshape civilization. VGI sits at the center of this transformation, evolving from perception to prediction and ultimately to visual superintelligence. The result is an integrated future where vision, intelligence, and technology merge to augment human capability and redefine the very nature of work, safety, and progress.

To step boldly into the future of Visual Intelligence, we believe it is essential to outline the developments we anticipate, both in the technology sector at large and across AI, that will pave the way to VGI.

The future of Visual Intelligence: Inside the crystal ball

This timeline illustrates the convergence of three technological domains – VI, AI, and general technology – from now through the 2050s. It shows how:

• Broader technology advances from augmentation to transcendence
• AI progresses from generalization to symbiosis, and
• Visual capabilities evolve from multimodal processing to advanced VI

This culminates in an integrated future where these domains merge to transform and reshape human civilization.

5. The ‘North Star’: Proving VGI

In summary: Focus of this section

Finally, we introduce VGI as a system with general-purpose visual cognition that rivals and ultimately surpasses human visual capabilities. We outline the traits of a truly intelligent visual system – goal-directed perception and autonomous understanding, plus simulation and continual learning – and propose a new benchmark: the “Visual Turing Test”, which evaluates whether machine visual reasoning is indistinguishable from human visual reasoning.

Such a test could compare machine capabilities against medically and cognitively accepted standards for human vision. While machines already exceed biological limits in isolated areas (such as frame rate or spectral detection), the leap to VGI lies in achieving generalization, memory, abstraction, and contextual understanding: the hallmarks of human visual cognition.
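The whitepaper does not specify a protocol for the Visual Turing Test. One common way to make ‘indistinguishability’ measurable is to show judges answers to visual-reasoning tasks, ask whether a human or a machine produced each one, and check whether their accuracy beats the 50% chance level. The minimal sketch below assumes that protocol; the function names, the exact two-sided binomial test, and the 0.05 threshold are illustrative choices of ours, not part of the paper.

```python
from math import comb

# Hypothetical protocol (an assumption, not from the whitepaper): each trial shows a
# judge one visual-reasoning answer and asks whether a human or a machine produced it.


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes that are
    no more likely than the observed count k under chance-level guessing."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with k are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def visual_turing_test(judgements: list[tuple[str, str]], alpha: float = 0.05):
    """judgements: (true_source, guessed_source) pairs, each 'human' or 'machine'.

    Returns judge accuracy, the p-value against 50% chance, and whether the
    judges failed to beat chance at level alpha (our hypothetical 'pass')."""
    n = len(judgements)
    if n == 0:
        raise ValueError("need at least one judgement")
    correct = sum(1 for truth, guess in judgements if truth == guess)
    p_value = binomial_two_sided_p(correct, n)
    return correct / n, p_value, p_value >= alpha


if __name__ == "__main__":
    # Toy, made-up input for illustration only: 20 trials, judges correct on 11.
    trials = (
        [("machine", "machine")] * 6
        + [("human", "human")] * 5
        + [("machine", "human")] * 5
        + [("human", "machine")] * 4
    )
    accuracy, p, passed = visual_turing_test(trials)
    print(f"accuracy={accuracy:.2f}  p={p:.3f}  indistinguishable_at_alpha={passed}")
```

A non-significant result only means the judges could not reliably tell human from machine in this sample; it does not by itself prove indistinguishability, so a credible evaluation would need many trials, many judges, and tasks spanning generalization, memory, abstraction, and context.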

The ‘North Star’: Approaching VGI at the speed of light

We believe VGI will exhibit broad perceptual abilities and perform high-level visual understanding tasks such as interpretation, abstraction, and contextual recognition across diverse environments. It could perceive and reason about visual scenes in dynamic, unstructured settings and apply its visual knowledge to unfamiliar situations:

6. Conclusion: The age of sight, reimagined

We are not merely witnessing an evolution in AI – we are crossing a visual event horizon. VGI is not a byproduct of AGI – it may well be the defining proof of its arrival.

VGI grants machines a faculty that has shaped civilization more than any other: vision. But unlike human sight, constrained by biology and subject to fatigue, VGI will offer relentless clarity, infinite memory, and perception untethered from time and space.

Acknowledgments:

“The future belongs to those who believe in the beauty of their dreams”

Eleanor Roosevelt

This whitepaper is built upon the collective efforts of the team at viso, past and present. It is the culmination of many years of conceptual development, technological innovation and computer vision knowledge built up by key individuals.

The paper would not have been possible without research and insight provided in particular by the co-founders and co-CEOs of viso, Gaudenz Boesch and Nico Klingler. Several sections draw from thought leadership exclusively drafted by them.
