Computer Vision Projects: How To Get Started (Guide)

Computer Vision Projects with Object detection

There are many reasons why Computer Vision is difficult, and most computer vision projects never make it to production. Software algorithms, cloud infrastructure, and Edge Computing hardware components need to be perfectly aligned to successfully kick off a new Edge AI vision project. The high complexity of such AI vision systems often results in exhausted budgets, missed deadlines, or unmet ROI targets.

At the same time, new applications of Computer Vision appear every day, no matter which industry we look at. These use cases have the potential to disrupt whole industries by automating time-consuming manual processes or by enabling innovative products and services.

Setting Up Computer Vision Projects

Innovators often have many great ideas about how to incorporate these emerging technologies into their business to create unforeseen value. But where should you get started? We have accompanied dozens of computer vision projects from scratch and developed our own simple and straightforward approach to taking the first steps.

This article helps you structure your idea and explains how and where to get started to productize your next computer vision project.

1.) Describe Your Computer Vision Project

First and foremost, create a project description to identify and inform your stakeholders. The project description brings everyone to the same level of understanding and later helps translate the business requirements into technical tasks.

The project description should cover the following dimensions:

  • The project name and purpose.
    Make sure to state a clear and narrow focus.
  • Measurable business goals.
    Define what success looks like and identify the value drivers.
  • Timeline and milestones.
    Define the milestones, and estimate the time needed per milestone.
  • Team and stakeholders.
    Don’t forget to take privacy and security stakeholders into account.
  • One or multiple locations of the endpoints.
    Define the expected scenery and environment.
  • Already existing hardware.
    State what cameras or servers are supposed to be (re-)used for testing.
  • Available infrastructure.
    Focus on the availability of power supply and internet connectivity.

For this stage, keep it concise as there is no need to be too detailed.
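To keep the description concise but complete, it can help to capture the dimensions above in a fixed template. The following Python sketch is one way to do that, not a tool from the approach described here: the `ProjectDescription` class, its field names, and the example project are hypothetical, and `missing_dimensions()` simply lists the dimensions that are still empty.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectDescription:
    """Hypothetical template with one text field per project dimension."""
    name_and_purpose: str = ""
    business_goals: str = ""
    timeline_and_milestones: str = ""
    team_and_stakeholders: str = ""      # include privacy and security stakeholders
    endpoint_locations: str = ""         # expected scenery and environment
    existing_hardware: str = ""          # cameras or servers to (re-)use for testing
    available_infrastructure: str = ""   # power supply and internet connectivity

    def missing_dimensions(self) -> list[str]:
        """Return the names of all dimensions that are still empty or blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example draft for a hypothetical people-counting project.
draft = ProjectDescription(
    name_and_purpose="Count people entering the store through the main entrance",
    business_goals="Replace manual footfall counts with automated hourly totals",
)
print(draft.missing_dimensions())
```

A plain text document works just as well; the point is that every dimension gets at least a one-line answer before the project starts.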

2.) Name the Features

As with any software project, the software to be developed will have to fulfill certain requirements and provide a set of features. The biggest risk we have seen in the past is a project getting overloaded before it has even started: the more features you add to your computer vision project, the more complex it gets.

  • Define one core feature and keep it as narrow as possible. This feature should be closely related to the project’s value drivers, and it determines the technical nature of the project (e.g. which computer vision method will be applied). The core feature should not change fundamentally later on and has to be validated in a basic prototype (proof of concept).
  • Define a set of additional key features to add more functionality and increase the product value. We usually try to identify the 2-3 most important features (never go over 5) and call them “must-have” characteristics of the software.

For a feasibility study or proof of concept, this is usually enough to get started quickly without adding too much complexity, particularly since most visual AI projects start with a simple idea such as “counting people” or “identifying damaged produce”.

All nice-to-have features that can be added later are not relevant at this early stage of development and could keep you from getting started quickly.
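The scoping rules above (one narrow core feature, a handful of must-haves, nice-to-haves deferred) can be written down as a simple check. This is a minimal sketch under stated assumptions: the `FeatureScope` class and the example features are hypothetical, and the limit of 5 must-haves comes directly from the rule of thumb in the text.

```python
from dataclasses import dataclass, field

MAX_MUST_HAVES = 5  # rule of thumb from the text: aim for 2-3, never more than 5


@dataclass
class FeatureScope:
    """Hypothetical scope record: one core feature, a few must-haves, deferred extras."""
    core_feature: str
    must_haves: list[str] = field(default_factory=list)
    nice_to_haves: list[str] = field(default_factory=list)  # parked until after the PoC

    def problems(self) -> list[str]:
        """Return scoping issues; an empty list means the scope looks lean enough."""
        issues = []
        if not self.core_feature.strip():
            issues.append("define exactly one core feature")
        if len(self.must_haves) > MAX_MUST_HAVES:
            issues.append(
                f"{len(self.must_haves)} must-have features; keep it to {MAX_MUST_HAVES} or fewer"
            )
        return issues

    def poc_backlog(self) -> list[str]:
        """Features to build for the proof of concept: core first, nice-to-haves excluded."""
        return [self.core_feature, *self.must_haves]


scope = FeatureScope(
    core_feature="count people crossing the entrance line",
    must_haves=["hourly totals", "face blur"],
    nice_to_haves=["dashboard"],
)
print(scope.problems())      # []
print(scope.poc_backlog())
```

Modeling the core feature as a single string rather than a list makes "exactly one core feature" a structural constraint instead of something to remember.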

3.) Prepare the Video Material

Ultimately, all computer vision projects are based on sample video material. We use video material as visual input for AI inference; later, it can be replaced with real-time video feeds. Concretely, we apply pre-trained models, for example for object detection or human pose estimation, to the video material. While the initial features, timeline, and administrative components of the project are important to discuss, the project can never kick off without video material of the scenery of interest.

  • Video Scenario. As a first starting point, the video material does not need to reflect the exact and final setting. However, it should show a realistic scenario that is representative of the product use case. We often use a set of up to 10 sequences. In most cases, each sequence should be no shorter than 10 seconds and no longer than 1 minute.
  • Camera Type. The fastest and easiest way to get started is to set up an IP camera or USB camera connected to any video recording device. Sometimes a laptop webcam or a mobile phone camera can do the job, too. If you have existing cameras available, for example digital CCTV cameras, these can be used as well (with a Network Video Recorder, NVR).
  • Video Quality. Make sure the video samples reflect the actual scenario as closely as possible with regard to lighting, colors (some algorithms require color, so IR night-vision footage won’t work with them), contrast (high contrast is better), and distance to objects (the larger the objects appear, the better). Use a lower image resolution (e.g. 640×480 or 1280×720) for better overall results: you will achieve significantly higher FPS (frames per second) on the same computer, server, or AI hardware. The camera used does not need to have the final image resolution, nor does it need to meet the definitive hardware specifications. In most projects, these factors are evaluated later on, when costs come into play (computing hardware in particular is a major cost driver).
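Before running the first tests, a quick review of the sample clips against the guidelines above can catch obvious problems. The sketch below assumes you already know each clip’s frame count, frame rate, and resolution; the `Clip` class, the `review_clips()` function, the clip names, and the choice to flag anything above 720p are hypothetical. For scale: a 1920×1080 frame has 2.25 times the pixels of a 1280×720 frame and 6.75 times those of a 640×480 frame, which gives a rough idea of how much decoding and processing work a lower resolution can save.

```python
from dataclasses import dataclass

MAX_CLIPS = 10           # "a set of up to 10 sequences"
MIN_SECONDS = 10.0       # not shorter than 10 seconds
MAX_SECONDS = 60.0       # not longer than 1 minute
MAX_PIXELS = 1280 * 720  # assumption: flag anything above 720p for first tests


@dataclass
class Clip:
    """Hypothetical metadata record for one sample video."""
    name: str
    frame_count: int
    fps: float
    width: int
    height: int

    @property
    def duration_s(self) -> float:
        return self.frame_count / self.fps


def review_clips(clips: list[Clip]) -> list[str]:
    """Return human-readable warnings for clips that deviate from the guidelines."""
    warnings = []
    if len(clips) > MAX_CLIPS:
        warnings.append(f"{len(clips)} clips; up to {MAX_CLIPS} is usually enough")
    for clip in clips:
        if not MIN_SECONDS <= clip.duration_s <= MAX_SECONDS:
            warnings.append(
                f"{clip.name}: {clip.duration_s:.1f} s is outside "
                f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
            )
        if clip.width * clip.height > MAX_PIXELS:
            warnings.append(
                f"{clip.name}: {clip.width}x{clip.height} is above 720p; consider downscaling"
            )
    return warnings


clips = [
    Clip("entrance_day", frame_count=900, fps=30.0, width=1280, height=720),
    Clip("entrance_night", frame_count=150, fps=25.0, width=1920, height=1080),
]
for warning in review_clips(clips):
    print(warning)
```

If OpenCV is available, the metadata for a recorded file can be read with `cv2.VideoCapture(path).get(...)` using `cv2.CAP_PROP_FRAME_COUNT`, `cv2.CAP_PROP_FPS`, `cv2.CAP_PROP_FRAME_WIDTH`, and `cv2.CAP_PROP_FRAME_HEIGHT`; note that some containers report only an approximate frame count.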

Computer vision projects require video material for testing with pre-trained AI models. The example shows applied privacy-preserving Face Blur.

The goal is to create a set of videos with the camera, angle, and scenery that you think might be practical to implement later and that show the objects of interest clearly. That way, first feasibility tests can run smoothly, and necessary changes to the setting or the AI model can be identified easily. Often it is much easier to adjust the setting, for example by optimizing the distance to the object of interest, than to change the AI model.

4.) Start Computer Vision Projects as Early as Possible

We have caught ourselves overthinking the “how do we get started” question at the beginning of new computer vision projects.

  • Test and benchmark different settings
    In most cases, the feasibility of a new AI vision idea can be tested with minimal financial effort. While the proof of concept needs to follow a clear structure and methodology, it does not get any better by going into every detail at the beginning. Instead, we use video samples and tests to adapt the technical specifications iteratively.
  • Optimize the setting
    Optimization is key when it comes to using AI inference in a real-world use case. The key metric we use is Cost/FPS (hardware cost relative to performance) in combination with the minimum FPS required for the use case. You will find that some use cases don’t require high FPS, because insight quality does not always increase with more frames processed per second. Changes to the hardware setup or the processing logic can lead to dramatic cost savings. For example, using a camera with a lower resolution requires significantly less computing power while often achieving the same overall product performance (accuracy).
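The two ideas above, benchmarking settings and combining Cost/FPS with the minimum FPS a use case needs, can be sketched in a few lines of Python. `measure_fps()` times any frame-processing function; `HardwareOption`, `cheapest_option()`, and all prices and FPS figures in the example are hypothetical placeholders for your own benchmark results. The sketch also assumes throughput scales linearly when camera streams are spread over several identical devices, which a real deployment should verify.

```python
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


def measure_fps(process_frame: Callable[[Any], Any], frames: Iterable[Any]) -> float:
    """Run process_frame on every frame and return the achieved frames per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("no frames were processed")
    return count / max(elapsed, 1e-9)  # guard against a zero timer difference


@dataclass
class HardwareOption:
    """Hypothetical benchmark result for one device type."""
    name: str
    unit_cost: float     # price of one device, any consistent currency
    measured_fps: float  # throughput of one device, e.g. from measure_fps()

    @property
    def cost_per_fps(self) -> float:
        return self.unit_cost / self.measured_fps

    def units_needed(self, streams: int, min_fps: float) -> int:
        # Assumption: throughput scales linearly and streams can be spread across devices.
        return math.ceil(streams * min_fps / self.measured_fps)

    def total_cost(self, streams: int, min_fps: float) -> float:
        return self.units_needed(streams, min_fps) * self.unit_cost


def cheapest_option(
    options: list[HardwareOption], streams: int, min_fps: float
) -> Optional[HardwareOption]:
    """Return the option that delivers streams x min_fps at the lowest total cost."""
    return min(options, key=lambda o: o.total_cost(streams, min_fps), default=None)


# Stand-in for model inference on decoded frames.
demo_fps = measure_fps(lambda frame: sum(frame), [[1, 2, 3]] * 100)
print(f"{demo_fps:.0f} FPS")

options = [
    HardwareOption("edge_device_a", unit_cost=200.0, measured_fps=12.0),
    HardwareOption("edge_device_b", unit_cost=600.0, measured_fps=45.0),
    HardwareOption("gpu_server", unit_cost=3000.0, measured_fps=240.0),
]
best = cheapest_option(options, streams=4, min_fps=5.0)
print(best.name, best.units_needed(4, 5.0), best.total_cost(4, 5.0))
```

In the example, `gpu_server` has the best Cost/FPS (12.5 versus about 13.3 and 16.7), yet for four cameras that each need 5 FPS, two units of `edge_device_a` meet the requirement at the lowest total cost. FPS beyond what the use case requires does not add value.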

The challenges and opportunities will soon become very clear once you start working with real video footage. Once the first results are available, questions about computing performance, the need for real-time processing, or the optimal balance between algorithm performance and costs will follow naturally. At that point, you will be able to base your decisions on how to move forward on data from your proof of concept.

What’s Next?

Getting started with computer vision and visual deep learning can be complex. A structured and scalable approach can help you kick off your project.
