Setting aside the quite ancient implementations, AR boils down to two things: image recognition and marker tracking.

With image recognition, things are more or less clear. If the application should recognize a table, it is enough to upload a library of table photos to the server, specify the general structure, color, and other parameters, and assign this data set an action that fires whenever a table is detected in the image.
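The "library plus matching threshold plus action" idea can be sketched in a few lines. This is a deliberately toy illustration, not a real recognition pipeline: images are reduced to coarse color histograms, and all names (`REFERENCE_LIBRARY`, `ACTIONS`, the color labels) are assumptions invented for the example.

```python
from collections import Counter

# Toy "reference library": each known object is represented by example
# images reduced to coarse color histograms (an illustrative stand-in
# for real feature descriptors).
REFERENCE_LIBRARY = {
    "table": [Counter({"brown": 70, "white": 20, "gray": 10})],
    "plant": [Counter({"green": 80, "brown": 15, "white": 5})],
}

# The action assigned to each data set, triggered on detection.
ACTIONS = {
    "table": lambda: "show_furniture_overlay",
    "plant": lambda: "show_care_instructions",
}

def similarity(h1, h2):
    """Histogram intersection: fraction of shared color mass."""
    shared = sum(min(h1[c], h2[c]) for c in set(h1) | set(h2))
    total = max(sum(h1.values()), sum(h2.values()))
    return shared / total

def recognize(image_histogram, threshold=0.6):
    """Return (label, action result) for the best match above threshold."""
    best_label, best_score = None, 0.0
    for label, examples in REFERENCE_LIBRARY.items():
        for example in examples:
            score = similarity(image_histogram, example)
            if score > best_score:
                best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, ACTIONS[best_label]()
    return None, None

frame = Counter({"brown": 65, "white": 25, "gray": 10})
print(recognize(frame))  # close enough to the table examples to fire its action
```

A production system would use learned feature embeddings instead of color histograms, but the control flow — match against a library, pick the best candidate, fire the assigned action — is the same.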

The second part is marker tracking. Markers can be either specially printed images or ordinary objects.

The application recognizes, say, a magazine cover — a simple shape with right angles and a distinctive image — and then tracks its position in space by measuring its offset against the background. In this case, the cover itself is the marker.

With special markers everything is even easier. Say we want to preview new rims on a car. It is enough to stick QR tags on the wheels, and the system will understand that images of the new rims should be inserted into the picture at exactly those spots. One more example: we put a mark on the floor, the application understands that this plane is the floor, and it can place arbitrary objects on it.
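The special-marker workflow is essentially a lookup: decode the tag, look up which virtual content belongs to it, and anchor that content at the marker's position. A minimal sketch, in which the marker IDs, asset names, and the `Detection` structure are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical registry mapping marker IDs (e.g. decoded QR payloads)
# to the virtual content that should replace them in the picture.
OVERLAYS = {
    "wheel-front-left": "rim_model_v2.obj",
    "wheel-front-right": "rim_model_v2.obj",
    "floor-anchor": "ground_plane",
}

@dataclass
class Detection:
    marker_id: str   # payload decoded from the QR tag
    x: float         # marker position in the camera frame
    y: float
    scale: float     # apparent size, used to scale the overlay

def place_overlays(detections):
    """For each recognized marker, anchor its overlay at the marker's
    position; markers not in the registry are simply ignored."""
    placed = []
    for d in detections:
        asset = OVERLAYS.get(d.marker_id)
        if asset is not None:
            placed.append((asset, d.x, d.y, d.scale))
    return placed

frame = [Detection("wheel-front-left", 120.0, 340.0, 1.4),
         Detection("sticker-unrelated", 50.0, 50.0, 1.0)]
print(place_overlays(frame))  # only the registered marker gets an overlay
```

Real marker systems (QR, ArUco and the like) also recover the marker's 3D orientation from its corner points, so the overlay can be rotated to match — omitted here to keep the lookup logic visible.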

But you cannot stick markers everywhere, and making a unique marker for every situation and then unifying the whole system is too difficult.

This is where SLAM comes in – Simultaneous Localization and Mapping, a method for building a map of an unknown space while simultaneously keeping track of the current location and the path traveled.

Sounds complicated. Simplified, SLAM is a way to recognize the camera's surroundings and location by decomposing the image into geometric shapes and lines. The system then assigns a point (or many, many points) to each distinct shape, fixing their locations in spatial coordinates across consecutive frames of the video stream. A building is thus decomposed into the planes of walls, windows, edges, and other distinguishable elements; a room into its planes (floor, ceiling, walls) and the objects inside. Because the algorithm memorizes the positions of these points in space, when you return to a room from another one you will see the points in the same places they were before.
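The "memorize points, then recognize them on return" idea can be sketched as a toy landmark map. Real SLAM jointly optimizes the camera pose and the map under uncertainty; this illustration assumes known point identities and a pure-translation camera, and every name in it is invented for the example.

```python
class LandmarkMap:
    """Toy sketch of the mapping half of SLAM: remember where feature
    points sit in world coordinates, then re-localize the camera when
    the same points are observed again."""

    def __init__(self):
        self.landmarks = {}  # point id -> (x, y) in world coordinates

    def add(self, point_id, world_xy):
        self.landmarks[point_id] = world_xy

    def localize(self, observations):
        """observations: point id -> (x, y) relative to the camera.
        Estimate the camera position as the average world-minus-relative
        offset over all re-observed landmarks."""
        dx = dy = n = 0
        for pid, (ox, oy) in observations.items():
            if pid in self.landmarks:
                wx, wy = self.landmarks[pid]
                dx += wx - ox
                dy += wy - oy
                n += 1
        if n == 0:
            return None  # nothing recognized: the map cannot help
        return (dx / n, dy / n)

m = LandmarkMap()
m.add("window-corner", (4.0, 2.0))   # points mapped on a first visit
m.add("door-edge", (0.0, 5.0))
# Returning to the room, the same points appear 1 m to the camera's right:
pose = m.localize({"window-corner": (3.0, 2.0), "door-edge": (-1.0, 5.0)})
print(pose)  # camera is at roughly (1.0, 0.0) in the mapped frame
```

Averaging the offsets is the crudest possible pose estimate, but it shows why remembered points are useful: once the system recognizes even a few of them, it knows where the camera is relative to the map it built earlier.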

This method received a particularly strong boost after smartphone manufacturers began building additional depth-sensing cameras into their devices.

Don't think of SLAM as an advanced version of conventional pattern recognition and marker tracking. Rather, it is a tool much better suited to orienting an augmented reality system in space: it gives the application an idea of where the user is. But it is far worse at identifying, say, a bear in an image.

For maximum efficiency, the two approaches are combined for each particular task. Which brings us to the present situation.