Computer Vision Projects: How To Get Started (Guide)


There are many reasons why Computer Vision is difficult, and most computer vision projects never make it to production. Software algorithms, cloud infrastructure, and Edge Computing hardware components need to be perfectly aligned to kick off a new Edge AI vision project. The high level of complexity for such AI vision systems often results in exhausted budgets, delayed deadlines, or ROI metrics that are not met.

At the same time, new applications of Computer Vision appear every day in virtually every industry. These use cases have the potential to disrupt whole industries by automating time-consuming manual processes or by introducing innovative products or services.

Setting Up Computer Vision Projects

Innovators often have many great ideas about how to incorporate these emerging technologies into their business to create new value. But where should you get started? At viso.ai, we have accompanied dozens of computer vision projects from scratch and developed our own straightforward approach to taking the first steps.

This article helps you structure your idea and explains how and where you should get started to productize your next computer vision project.

1.) Describe Your Computer Vision Project

First and foremost, it is essential to create your project description, to identify and inform your stakeholders. In addition, the project description brings everyone to the same level of understanding and supports the process of translating the business requirements into technical tasks later on.

The project description should cover the following dimensions:

  • The project name and purpose.
    Make sure to state a clear and narrow focus.
  • Measurable business goals.
    Define what success looks like and identify the value drivers.
  • Timeline and milestones.
    Define the milestones, and estimate the time needed per milestone.
  • Team and stakeholders.
    Don’t forget to take privacy and security stakeholders into account.
  • One or multiple locations of the endpoints.
    Define the expected scenery and environment.
  • Already existing hardware.
    State what cameras or servers are supposed to be (re-)used for testing.
  • Available infrastructure.
    Focus on the availability of power supply and internet connectivity.

For this stage, keep it concise as there is no need to be too detailed.
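
The dimensions above can be kept in a small, checkable record so that gaps are visible early. The following is a minimal sketch, not a prescribed template: the class name, field names, and example values are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class ProjectDescription:
    """One field per dimension of the project description (names are illustrative)."""
    name_and_purpose: str = ""
    business_goals: list[str] = field(default_factory=list)
    milestones: dict[str, int] = field(default_factory=dict)  # milestone -> estimated weeks
    team_and_stakeholders: list[str] = field(default_factory=list)
    endpoint_locations: list[str] = field(default_factory=list)
    existing_hardware: list[str] = field(default_factory=list)
    infrastructure: dict[str, bool] = field(default_factory=dict)  # e.g. {"power": True}

    def missing_dimensions(self) -> list[str]:
        """Return the names of all dimensions that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical early draft: only three dimensions are filled in so far.
draft = ProjectDescription(
    name_and_purpose="Entrance counter: count people entering one store",
    business_goals=["Report hourly visitor counts"],
    endpoint_locations=["Main entrance, indoor, artificial light"],
)
print(draft.missing_dimensions())
```

An empty field only signals that a dimension has not been discussed yet; a short answer such as "no existing hardware" is enough at this stage.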

2.) Name the Features

As with any software project, the software to be developed will have to fulfill certain requirements and provide a set of features. The biggest risk we have seen in the past is that a project gets overloaded before it has even started. The more features you add to your computer vision project, the more complex it gets.

  • Define one core feature and keep it as narrow as possible. This feature should be closely related to the project’s value drivers and determine the project’s technical nature (e.g., what method of AI computer vision will be applied). The core feature will not be fundamentally changed later and has to be validated in a basic prototype (proof of concept).
  • Define a set of additional key features to add more functionality and increase the product value. We usually try to identify the 2-3 most important features (never go over 5) and call them “must-have” software characteristics.

In most cases, this is enough to build a feasibility study or proof of concept and get started quickly without adding too much complexity. Most Visual AI-based projects start with a simple idea such as “counting people” or “identifying damaged produce”.

Nice-to-have features that can be added later are not relevant at this early stage of development and could prevent you from getting started quickly.
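
The scoping rule above (one core feature, ideally 2-3 must-haves, never more than 5) can be written down as a simple check. This is only a sketch; the function name and warning texts are our own assumptions.

```python
def check_feature_scope(core_feature: str, must_haves: list[str],
                        max_must_haves: int = 5) -> list[str]:
    """Return warnings if the feature list is broader than the rule of thumb allows."""
    warnings = []
    if not core_feature.strip():
        warnings.append("Define exactly one core feature.")
    if len(must_haves) > max_must_haves:
        warnings.append(
            f"{len(must_haves)} must-have features exceed the limit of {max_must_haves}."
        )
    elif len(must_haves) > 3:
        warnings.append("More than 3 must-have features: consider moving some to nice-to-have.")
    return warnings


# Hypothetical scope for a people-counting proof of concept.
print(check_feature_scope("counting people", ["count per hour", "entry/exit direction"]))
```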

3.) Prepare the Video Material

Ultimately, all computer vision projects are based on sample video material. We use video material as visual input for AI inferencing, and it can later be replaced with real-time video feeds. We then apply pre-trained models, such as object detection, object tracking, or human pose estimation, to the video material. While the initial features, timeline, and administrative components of the project are important to discuss, a project can never kick off without video material of the scenery of interest.

  • Video Scenario. As a starting point, the video material does not need to reflect the exact and final setting. However, it should show a realistic scenario that is representative of the product use case. For example, we often use a set of up to 10 sequences. In most cases, the sequences should be no shorter than 10 seconds and no longer than 1 minute.
  • Camera Type. The fastest and easiest way of getting started is setting up an IP camera or USB camera connected to any video recording device. For AI vision, there are no special AI cameras required because any digital video input can be processed. Sometimes, the webcam of a laptop or the mobile phone camera can do the job too. If you have existing cameras available, for example, digital CCTV cameras, these can be used too (with a Network Video Recorder, NVR).
  • Video Quality. Make sure the video samples reflect the actual scenario as closely as possible regarding lighting, colors (some algorithms require color, so IR night-vision videos won’t work for them), contrast (higher contrast between the objects and the background is better), and distance to objects (the larger the object size, the better). Use a lower image resolution (roughly 640 pixels wide, or 720p) for faster processing: you will achieve significantly higher FPS (frames per second) with the same computer, server, or AI hardware, often without a noticeable loss in accuracy. The camera used does not need to have the final image resolution, nor does it need to fulfill the definitive hardware specifications. In most projects, these factors will be evaluated and identified later on when cost factors come into play (computing hardware in particular is a cost driver).
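
As a rough illustration of the guidance above, the sample set can be checked against the suggested limits (up to 10 sequences, each between 10 seconds and 1 minute). This sketch assumes you already know each clip's frame count and frame rate, for example from your video tool; the `Clip` class, the function name, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    frame_count: int
    fps: float  # frames per second of the recording

    @property
    def duration_s(self) -> float:
        return self.frame_count / self.fps


def review_clips(clips, min_s=10.0, max_s=60.0, max_clips=10):
    """Return human-readable issues for a set of sample clips."""
    issues = []
    if len(clips) > max_clips:
        issues.append(f"{len(clips)} clips provided; up to {max_clips} are usually enough")
    for clip in clips:
        if clip.fps <= 0:
            issues.append(f"{clip.name}: invalid frame rate")
            continue
        if clip.duration_s < min_s:
            issues.append(f"{clip.name}: {clip.duration_s:.1f} s is shorter than {min_s:.0f} s")
        elif clip.duration_s > max_s:
            issues.append(f"{clip.name}: {clip.duration_s:.1f} s is longer than {max_s:.0f} s")
    return issues


samples = [
    Clip("entrance_day.mp4", frame_count=900, fps=30.0),    # 30 s
    Clip("entrance_night.mp4", frame_count=150, fps=30.0),  # 5 s
]
for issue in review_clips(samples):
    print(issue)
```

The limits are defaults, not hard rules; adjust them if your use case needs longer sequences.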


Computer vision projects require video material for testing with pre-trained AI models. The example shows applied privacy-preserving Face Blur.

The goal is to create a set of videos with a camera, angle, and scenery that you think will be practical to implement later and that show the objects of interest clearly. That way, first feasibility tests can be run smoothly, and necessary changes to the setting or the AI model can be identified easily. Often it is much easier to adjust the setting, for example, by optimizing the distance to the object of interest.

4.) Start Computer Vision Projects as Early as Possible

We have often found ourselves overthinking the “how to get started” question at the beginning of new computer vision projects.

  • Test and benchmark different settings
    In most cases, the feasibility of a new idea based on AI vision can be tested with minimal financial effort. While the proof of concept needs to follow a clear structure and methodology, it does not improve by going into every detail at the beginning. Instead, we use video samples and tests to adapt the technical specifications iteratively.
  • Optimize the setting
    Optimization is key when it comes to using AI inference in a real-world use case. The key metric we use is Cost/FPS (hardware costs relative to performance) combined with the minimum FPS required for the use case. You will find that some use cases don’t require high FPS because the insight quality does not always increase with more frames processed per second. Changes to the hardware setup or the processing logic can lead to dramatic cost savings. For example, using a camera with lower resolution requires significantly less computing power – while achieving the same overall product performance (accuracy).
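
The Cost/FPS metric combined with a minimum FPS can be sketched as a small selection step: discard options that are too slow for the use case, then pick the lowest cost per frame per second among the rest. The device names and numbers below are made up for illustration, and `HardwareOption` and `pick_hardware` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardwareOption:
    name: str
    cost: float          # hardware cost, in any one currency
    measured_fps: float  # FPS measured on your own sample videos

    @property
    def cost_per_fps(self) -> float:
        return self.cost / self.measured_fps


def pick_hardware(options, min_fps: float) -> Optional[HardwareOption]:
    """Return the option with the lowest Cost/FPS that still meets min_fps, or None."""
    feasible = [o for o in options if o.measured_fps >= min_fps]
    return min(feasible, key=lambda o: o.cost_per_fps, default=None)


# Illustrative numbers only, not benchmark results.
options = [
    HardwareOption("device_a", cost=100.0, measured_fps=5.0),    # 20.0 per FPS
    HardwareOption("device_b", cost=300.0, measured_fps=20.0),   # 15.0 per FPS
    HardwareOption("device_c", cost=1200.0, measured_fps=60.0),  # 20.0 per FPS
]
choice = pick_hardware(options, min_fps=10.0)
print(choice.name if choice else "no option meets the minimum FPS")
```

If the use case only needs a few frames per second, the minimum FPS drops and cheaper options become feasible, which is where most of the savings come from.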

The challenges and opportunities will become clear soon after you start working on real video footage. Once the first results are available, questions about computing performance, the need for real-time processing, or the optimal balance between algorithm performance and costs will follow automatically. At that point, you will be able to back your decisions on how to move forward with data from your proof of concept.
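
Questions about computing performance are easiest to answer with a measured FPS value from your own footage. The sketch below times an arbitrary per-frame function over a sequence of frames. `measure_fps` is a hypothetical helper, and the stand-in workload replaces what would be a real model call on decoded video frames.

```python
import time
from typing import Callable, Iterable


def measure_fps(frames: Iterable, process: Callable, clock=time.perf_counter) -> float:
    """Run process(frame) on every frame and return frames processed per second."""
    start = clock()
    count = 0
    for frame in frames:
        process(frame)
        count += 1
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in workload: replace the range with decoded frames and the lambda
# with a call to your pre-trained model.
fps = measure_fps(range(200), lambda frame: sum(range(1000)))
print(f"{fps:.1f} FPS")
```

Measuring on the same clips for every hardware option or resolution keeps the comparison fair.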

What’s Next?

Getting started with computer vision and visual deep learning can be complex. A structured and scalable approach can help you to kick off.
