This is nothing new; it already happened with “Cloud” and “Big Data”. Everyone seemed to need a Cloud product or to offer solutions that supposedly leveraged a technology called Big Data. Today almost all manufacturers of video surveillance solutions offer systems with “Artificial Intelligence”. All three terms are ambiguous enough to give rise to confusion and perhaps some deception. This guide aims to clarify some recurring doubts among professionals in the private security sector about what techniques based on “Deep Learning” can do. Deep Learning is the technology manufacturers usually mean these days when they talk about Artificial Intelligence.
1.- They say that cameras based on “Deep Learning” work similarly to the human brain. Is that right?
Strictly speaking, the simple answer is no. The human brain is far superior to any system based on “Deep Learning” except in one crucial respect: machines do not get tired. Any adult who is paying enough attention is much more reliable than a machine for a limited period of time and a limited amount of video. The advantage of machines appears when you want to monitor several cameras for long periods; the human brain simply cannot pay attention to so much information for so long.
2.- So why are the brain and “Deep Learning” compared so often?
Deep Learning techniques are based on neural networks, which are mathematical techniques inspired by what (little) we know about how the human brain works. From a teaching point of view the comparison can be useful. For example, a neural network learns by “seeing” patterns (images), and people can also learn by seeing patterns. Beyond statements like these, comparisons become complicated and generate false expectations.
3.- Neural networks that learn? What do they learn?
Simplifying a little, the neural networks used in Deep Learning solutions are nothing more than pattern classifiers. A pattern can be defined as a particular arrangement of elements; in the case of video it is usually a set of pixels. A neural network based on “Deep Learning” is therefore nothing more than a classifier of pixel sets: given a set of pixels, it determines which class (object type) the set belongs to. It is just a shape recognizer.
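To make the idea of a “pixel set classifier” concrete, here is a minimal sketch. It is not a deep network: it is a single linear layer followed by an argmax, with hand-picked weights, and the 3x3 patches and class names are invented for illustration. A real Deep Learning model stacks many such layers and learns its weights from data, but the input/output contract is the same: pixels in, class out.

```python
# Toy "shape recognizer": pixels in, class label out.
# ASSUMPTION: weights and class names are hand-picked for illustration only;
# a real network has many layers and learns its weights during training.

CLASSES = ["vertical_bar", "horizontal_bar"]

# One weight row per class, one weight per pixel of a flattened 3x3 patch.
WEIGHTS = [
    [-1, 2, -1,
     -1, 2, -1,
     -1, 2, -1],   # responds to a bright middle column
    [-1, -1, -1,
      2,  2,  2,
     -1, -1, -1],  # responds to a bright middle row
]


def classify(pixels):
    """Return the class whose weight row best matches the 9 input pixels."""
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in WEIGHTS]
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best]


vertical_patch = [0, 1, 0,
                  0, 1, 0,
                  0, 1, 0]
horizontal_patch = [0, 0, 0,
                    1, 1, 1,
                    0, 0, 0]

print(classify(vertical_patch))    # vertical_bar
print(classify(horizontal_patch))  # horizontal_bar
```

Note that the classifier always answers with one of the shapes it knows; it has no notion of anything else in the scene.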
4.- How does a neural network based on Deep Learning stop learning?
Neural networks normally have two different operating modes: one usually called “training” and another often referred to as “inference”. There are several important differences between them. In training, the goal is for the network to learn how to recognize objects; in inference, the goal is for the network to actually recognize them. Another difference is the computational requirements: “training” a network is much more expensive than “running” it. The vast majority of products implement only inference mode, i.e. the network comes “already trained” from the factory and does not learn any further.
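The difference between the two modes can be sketched with a single artificial neuron (a perceptron), which stands in here for a full network. The dataset, function names and parameters below are assumptions made for the example. Training loops over labeled samples many times and modifies the weights; inference is one pass with the weights frozen.

```python
# Training vs. inference with a single neuron (perceptron).
# ASSUMPTION: a toy stand-in for a deep network; data and names are invented.


def infer(weights, bias, x):
    """Inference: one cheap pass, weights are only read, never changed."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train(samples, epochs=20, learning_rate=1.0):
    """Training: repeated passes over labeled data that adjust the weights."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = label - infer(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Labeled examples: output 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# "At the factory": the expensive step, done once.
weights, bias = train(samples)

# "In the product": only inference runs, with the learned weights frozen.
for x, label in samples:
    print(x, infer(weights, bias, x), "expected", label)
```

Even in this toy case, training calls `infer` once per sample per epoch, while using the trained model costs a single call per input.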
5.- Are products that can keep learning, in addition to inferring, better?
It is hard to answer this question with a yes or a no. While it is true that the ability to keep learning makes a product more powerful, it also brings problems, especially in the field of security. When a network comes trained from the factory, we trust that the learning has been properly supervised by the engineers of a company specialized in it. If we let a neural network learn on its own in an installation, we run the risk of it learning things that are not true. This is especially serious if we consider that a neural network that is learning all the time can change its mind. Again, that may sound good, but it has dangerous implications: given the same situation (same scene and same person), one day it is classified as an intrusion and another day it is not. Systems become completely unpredictable (non-deterministic) and therefore impossible to tune or correct. In the security sector, Deep Learning-based neural networks are generally better off with a learning process that is as controlled as possible.
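The “changing its mind” problem can be illustrated with a small sketch. It assumes a single-neuron model with made-up weights and a hypothetical two-value feature vector standing for “same scene, same person”. One unsupervised update with a wrong label is enough to flip the decision for an identical input, while the frozen model keeps answering the same way.

```python
# Same input, different answer after unsupervised on-site learning.
# ASSUMPTION: toy single-neuron model; weights and features are invented.


def predict(weights, bias, x):
    """Return 1 (intrusion) or 0 (no intrusion) for feature vector x."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def online_update(weights, bias, x, label, learning_rate=1.0):
    """One perceptron learning step; returns new weights, leaves inputs intact."""
    error = label - predict(weights, bias, x)
    new_weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias + learning_rate * error


factory_weights, factory_bias = [2.0, 1.0], -2.0
scene = (1, 1)  # hypothetical features: same scene, same person

day_1 = predict(factory_weights, factory_bias, scene)

# On site, nobody supervises the labels: the intrusion is wrongly
# fed back to the model as "nothing happened" (label 0).
site_weights, site_bias = online_update(
    factory_weights, factory_bias, scene, label=0)

day_2 = predict(site_weights, site_bias, scene)

print("day 1:", day_1)  # 1, classified as intrusion
print("day 2:", day_2)  # 0, same input is no longer an intrusion
```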
6.- So is a product better if it can learn or not?
You have to differentiate between a neural network based on Deep Learning and a video analytics product. The neural network is just one of many technologies that can be used to develop a product. The same technology can be used in many ways and in combination with many other technologies; the better they are used, the better the product. To draw a comparison with the world of vehicles, many manufacturers use internal combustion engines, but those engines do not perform the same across manufacturers. Even with the same engine, the performance of a vehicle depends not only on that element but on how it is combined with the rest of the components, or even on the body design. The most powerful and reliable video analytics systems do not rely on a single Artificial Intelligence component; they combine several, so that some can adapt without compromising the reliability of the rest. A product that has only “Deep Learning” technology is likely to be insecure.
7.- How can a Deep Learning-based product be less secure than one that is not?
In point 3 we indicated that a Deep Learning-based neural network is simply a shape classifier. Therefore, if the element to be detected does not show that shape, it is not detected. Most cameras with “AI” on the market do not raise an alarm if a person is hidden behind a box. Such systems are therefore very insecure and easy to sabotage. As indicated in point 6, the reliability of a video analytics product lies in how the technologies are put to use and in the engineering team’s expertise with them. Deep Learning is very powerful, but to be used in the security sector it must work in conjunction with other technologies that make it more robust in adverse situations.
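The box example can be sketched as follows. This is only an illustration of the general idea of combining cues, not a description of how any particular product works: the motion measure, its threshold and the function names are all assumptions. A rule that relies on the shape classifier alone stays silent when the person is occluded, whereas a rule that also considers a second, independent cue still reacts.

```python
# Shape-only alarm vs. shape combined with a second cue.
# ASSUMPTION: "moving_area_fraction" and its threshold are hypothetical;
# real products may combine technologies in very different ways.


def shape_only_alarm(detected_classes):
    """Alarm only if the shape classifier reports a person."""
    return "person" in detected_classes


def combined_alarm(detected_classes, moving_area_fraction,
                   motion_threshold=0.05):
    """Alarm on a recognized person OR on significant unexplained motion."""
    person_seen = "person" in detected_classes
    significant_motion = moving_area_fraction >= motion_threshold
    return person_seen or significant_motion


# A person walks across the protected zone hidden behind a box:
# the classifier only sees a box, but a large region is moving.
detected = {"box"}
moving_fraction = 0.20

print(shape_only_alarm(detected))                 # False: sabotage succeeds
print(combined_alarm(detected, moving_fraction))  # True: motion still triggers
```

A plain OR like this one would also raise more false alarms, which is why the way the cues are combined and tuned matters as much as the cues themselves.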
8.- What hardware does a solution that includes Deep Learning components need?
As we indicated earlier, Deep Learning-based neural networks are nothing more than mathematical algorithms and can be run on any computer. Depending on the type and architecture of the network, the hardware requirements may be higher or lower. Today we find Deep Learning-based systems in cloud solutions, on our mobile phones, inside cameras and on rack-mounted servers. Server or cloud-based solutions are generally more flexible and can draw on a greater number of technologies than camera-only solutions. Among server-based solutions, some are optimized to run on general-purpose processors (x86), while others either use adapted architectures or work with such a large amount of data that they require graphics processors (GPUs).
9.- Should I always use Deep Learning-based products?
As is almost always the case, it depends on the project. In non-critical environments, new products that include only a basic Deep Learning (DL) system can be a good solution. For professional environments where a high level of security is sought, however, relying on such devices alone should be avoided. In fact, if you have to choose between a system with only DL and another with no DL components at all, the second is the safer choice. If we can use a perimeter protection system that incorporates DL among other elements, we will have a wider range of options for finding the right balance between the detection level (avoiding missed detections) and the level of false alarms. Having better tools gives us more possibilities to implement the best solution in each situation.
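The balance between missed detections and false alarms can be sketched with a simple threshold sweep. The scores and labels below are made up for the example, and a single confidence threshold is a simplification of the tuning options a real product offers.

```python
# Trade-off between missed detections and false alarms.
# ASSUMPTION: invented (score, is_real_intruder) pairs; a single threshold
# is a simplification of real tuning parameters.

events = [
    (0.95, True),
    (0.80, True),
    (0.55, True),   # real intruder with a low score (e.g. partly hidden)
    (0.60, False),  # harmless event with a high score (e.g. an animal)
    (0.30, False),
    (0.10, False),
]


def evaluate(events, threshold):
    """Return (missed_detections, false_alarms) for a given alarm threshold."""
    missed = sum(1 for score, real in events if real and score < threshold)
    false_alarms = sum(1 for score, real in events
                       if not real and score >= threshold)
    return missed, false_alarms


for threshold in (0.2, 0.5, 0.7):
    missed, false_alarms = evaluate(events, threshold)
    print(f"threshold={threshold}: missed={missed}, "
          f"false alarms={false_alarms}")
```

With this data no threshold gives zero missed detections and zero false alarms at the same time: lowering it catches every intruder but triggers on harmless events, and raising it does the opposite.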
10.- What can we expect from systems that learn completely alone?
Typically all systems require a minimum of configuration. For example, we have to tell the system which areas are of interest to us and which are not. In addition, the elements to be recognized may not always be the same: in some circumstances a vehicle may be an intruder, but in others only people will be. A normal camera covering an area on its own cannot determine the size of objects, so it will need to be given reference information. Some environments are more complicated than others, and the installer’s experience in choosing the location of the cameras and the layout of the areas of interest can result in a very noticeable improvement in the results obtained with the same product. Both today and in the coming years, the knowledge of the people using video analytics products is as important as, or more important than, the technologies they use. Deep Learning remains insufficient to provide the best security for perimeter installations. Training technicians in the use of the tools remains key.
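The kind of minimum configuration described above can be sketched as a small data structure. The field names, the rectangular zone and the coordinates are assumptions made for the example; real products use their own formats and usually richer geometry.

```python
# Minimal site configuration: where to look and what counts as an intruder.
# ASSUMPTION: field names, rectangular zones and coordinates are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Rectangular area of interest in image coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class SiteConfig:
    zone: Zone
    intruder_classes: frozenset  # which object types are intruders here


def raises_alarm(config, object_class, x, y):
    """Alarm only for configured classes inside the area of interest."""
    return (object_class in config.intruder_classes
            and config.zone.contains(x, y))


# A site where vehicles are expected and only people are intruders.
car_park = SiteConfig(zone=Zone(100, 50, 500, 400),
                      intruder_classes=frozenset({"person"}))

print(raises_alarm(car_park, "person", 300, 200))   # True
print(raises_alarm(car_park, "vehicle", 300, 200))  # False
print(raises_alarm(car_park, "person", 20, 20))     # False, outside the zone
```

The same detector with a different `SiteConfig` behaves very differently, which is one reason the installer's choices weigh so heavily on the final result.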
Eduardo Cermeño
Professor of Computer Science and Artificial Intelligence (Universidad Autónoma Madrid)
Founding Partner of Vaelsys