The Deep Dive

FHD 4K 8K 100K - The Future of Computer Vision.

Written by Gwihwan Moon | Oct 27, 2023 7:29:12 AM

Typically, the monitor resolution of the computers we use is FHD (1,920 x 1,080 pixels).

However, with recent advances in image sensors, the still-image resolution of smartphone cameras is approaching 16K.

Just a few years ago, monitors that supported 4K video were rare; today, 4K monitors are common, and both smartphone cameras and digital cameras can capture video and photos at 4K resolution or higher.

Advances in image sensors are also contributing greatly to the usability of unmanned aerial vehicles. Most recently released multi-rotor drones record video at 4K (UHD), which doubles the width and height of FHD and quadruples the pixel count.

Electro-optical (EO) sensors, which are closely related to consumer image sensors, are widely used in military, scientific, and aerospace fields, and their performance is also improving dramatically as technology advances.

Now, let's look at object detection, which is considered the most basic task in computer vision. Researchers generally say that recent object detection models can handle images up to 4K resolution. In practice, however, when you feed 4K images into such a model, it has difficulty finding objects that appear small.
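
To see why small objects get lost, consider what happens when a 4K frame is squeezed down to a typical detector input size. The 640-pixel input and the 24-pixel object in the sketch below are assumptions chosen for illustration, not measurements from any particular model.

```python
# Rough illustration (assumed numbers, not any specific model): how many pixels
# a small object keeps once a 4K frame is resized to a typical detector input.

frame_w = 3840                    # width of a 4K UHD frame
model_input = 640                 # common detector input size (assumption)
object_px = 24                    # a small object spanning ~24 px in the 4K frame

scale = model_input / frame_w     # the frame is shrunk by this factor on its long side
print(f"object after resize: {object_px * scale:.1f} px")  # ~4 px: little left to detect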

However, if you increase the model's input size (the resolution of the input image), the memory footprint grows sharply: VRAM consumption rises until the model can no longer run on the GPU at all.
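
For a rough sense of scale: the activation memory of a convolutional detector grows with the number of input pixels, i.e. with the square of the input side. The 640-pixel, 2 GB baseline below is an assumption for illustration, not a benchmark of any real model.

```python
# Back-of-the-envelope sketch: activation memory scales roughly with the square
# of the input side. The 640 px / 2 GB baseline is an assumption, not a measurement.

baseline_side = 640        # typical detector input size (assumption)
baseline_vram_gb = 2.0     # assumed activation memory at that input size

for side in (640, 1280, 2560, 3840):
    scale = (side / baseline_side) ** 2
    print(f"input {side:>4}: ~{baseline_vram_gb * scale:.0f} GB of activations ({scale:.0f}x baseline)")
```

Under these assumptions, a 4K input already needs tens of gigabytes of activation memory, well beyond any single consumer GPU.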

While natural language processing models such as Transformers are routinely run on multiple GPUs in parallel, the machine learning models used in computer vision today are generally run on a single GPU.

In other words, the memory limits of current hardware and the way the technology is implemented make it difficult to blindly increase the size of the AI model.

In the end, unlike with natural language processing models, the usual workaround is to split the image into tiles and run the AI model on multiple GPUs. At that point, computer engineering problems such as inference speed and parallel processing arise, and solving them only surfaces more problems.
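
A minimal sketch of that tiling approach is shown below. The `run_detector` call, tile size, and overlap are placeholders, not DeepBlock's implementation; a real pipeline also has to merge detections across tile borders (for example with non-maximum suppression) and schedule tiles across GPUs, which is exactly where the engineering problems begin.

```python
import numpy as np

def iter_tiles(image: np.ndarray, tile: int = 1024, overlap: int = 128):
    """Yield (x, y, patch) over an H x W x C image using overlapping tiles."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield x, y, image[y:y + tile, x:x + tile]

def detect_tiled(image, run_detector):
    """Run a per-tile detector and shift its boxes back to full-image coordinates."""
    detections = []
    for x, y, patch in iter_tiles(image):
        for x1, y1, x2, y2 in run_detector(patch):  # hypothetical detector: returns boxes per tile
            detections.append((x1 + x, y1 + y, x2 + x, y2 + y))
    return detections
```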


It's not just resolution that causes problems.

When the width and height of an image are doubled, the pixel count, and therefore the file size, roughly quadruples.
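
A quick check of that claim, using raw (uncompressed) frame sizes at 3 bytes per pixel; compressed formats scale less predictably, so treat these as rough upper bounds.

```python
# Raw frame sizes at 3 bytes per pixel: doubling width and height quadruples the bytes.
resolutions = {"FHD": (1920, 1080), "4K UHD": (3840, 2160), "8K UHD": (7680, 4320)}

for name, (w, h) in resolutions.items():
    size_mb = w * h * 3 / 2**20
    print(f"{name:>7}: {w} x {h} -> ~{size_mb:.0f} MB per raw frame")
```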

In addition, reading a file from disk is tens to hundreds of times slower than reading data already loaded in RAM or cache, and splitting an image into multiple files for processing adds even more overhead.

However, for the data handled by Facebook, Tesla, and other global software companies, finding small objects and processing large, high-resolution images are not the most important tasks. As a result, practitioners in the geospatial domain and other fields who do need to solve these problems have trouble adopting mainstream technology.

DeepBlock.net has been dealing with this problem for nearly five years. In the geospatial field, we know well how hard it is to find the many kinds of objects that appear small in ultra-high-resolution images captured from high altitude, and we are working hard to solve this problem.

This problem will probably worsen in the future.

Recently, the resolution of optical satellite payloads launched by the Korean government, other governments, and private companies has been doubling roughly every two years. Each doubling quadruples the image size, and this trend will continue.

The market and the technology tell us that what we have developed is meaningful, but I have recently felt the limits of the size of the Korean geospatial AI (GeoAI) market, and we need to put the technology we have built so far to broader use.

This is why we are working to discover new use cases.

As we have recently been handling images and videos beyond geospatial imagery (EO, SAR), I have come to think that the technology we have developed can be useful in other fields as well.

Image sensor resolution continues to increase, and when images of such large resolution are simply fed into a machine learning model, the model cannot find all the objects or extract the features it should.

If the image resolutions used outside the geospatial field keep growing, the problems of the remote sensing industry will appear in other industries as well. Just as 4K became a standard in recent years, technology will continue to advance, and so will the cameras mounted on UAVs and the image sensors in mobile devices.

I believe that unless computer vision technology changes fundamentally in the near future, our high-capacity, high-resolution image processing technology will find use in a much wider range of applications.

And I think it won't take very long.

I wish good luck to all those working hard.