Indoor User Localization Using Visual Place Recognition




Visual place recognition (VPR) is one of the cornerstones of computer vision and robotics. The task of VPR algorithms is to identify locations from images. The technology can support autonomous robots and human workers alike by identifying their surroundings and facilitating the execution of desired actions.

Researchers at NeuroSYS use computer vision algorithms in Nsflow, the company’s AR platform for interactive work instructions and hands-on training, to identify the position of users undergoing on-site training. In this case, VPR significantly accelerates onboarding and learning, because trainees need less prior training and supervision.

Locating a person or finding a desired place with GPS is old news. But what can be done when satellite-based navigation is inoperable? Indoor positioning systems (IPS) come to the rescue.

When looking for a needle in a haystack, you can use various techniques: beacons, magnetic positioning, inertial measurement units (IMUs) whose accelerometers and gyroscopes measure movement from the last known point, Wi-Fi-based positioning, or simply visual markers.

All of these methods have flaws that outweigh their benefits: markers and beacons have to be installed, and IMU measurement error grows over time, so the position must be re-established periodically. The crucial problem, determining a user’s general whereabouts to within a few meters, turns out to be one that algorithms can solve.

Place recognition relies on a two-step procedure supported by two databases. First, the target place is photographed, and a feature detector marks keypoints, the characteristic elements of the area. Next, a feature matcher compares these keypoints with those of a reference image. If the matcher deems the keypoints similar enough, the picture qualifies as showing the same place.
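The matching step can be illustrated with a minimal sketch. It is not the SuperPoint/SuperGlue pipeline described in this article: it swaps in plain mutual nearest-neighbor matching on hand-written descriptor vectors, and the function names and the `max_distance` and `min_matches` thresholds are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Descriptor, candidates: Sequence[Descriptor]) -> Tuple[int, float]:
    """Return the index of, and distance to, the closest candidate."""
    distances = [euclidean(query, c) for c in candidates]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


def mutual_matches(
    query_desc: Sequence[Descriptor],
    ref_desc: Sequence[Descriptor],
    max_distance: float = 0.5,  # hypothetical threshold
) -> List[Tuple[int, int]]:
    """Pairs (i, j) where query keypoint i and reference keypoint j are
    each other's nearest neighbor and lie within max_distance."""
    if not query_desc or not ref_desc:
        return []
    matches = []
    for i, q in enumerate(query_desc):
        j, dist = nearest(q, ref_desc)
        back, _ = nearest(ref_desc[j], query_desc)
        if back == i and dist <= max_distance:
            matches.append((i, j))
    return matches


def same_place(
    query_desc: Sequence[Descriptor],
    ref_desc: Sequence[Descriptor],
    min_matches: int = 3,  # hypothetical threshold
    max_distance: float = 0.5,
) -> bool:
    """Decide whether two images show the same place by counting matches."""
    return len(mutual_matches(query_desc, ref_desc, max_distance)) >= min_matches


# Toy 2-D descriptors; real local descriptors have far more dimensions.
query = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
reference = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9], [9.0, 9.0]]
print(mutual_matches(query, reference))  # [(0, 0), (1, 1), (2, 2)]
print(same_place(query, reference))      # True
```

The mutual check discards one-sided matches, which is a cheap way to reduce false positives. A learned matcher would replace both the distance function and the fixed thresholds.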

The image database stores pictures of the target locations (in this case, workspaces) together with a set of their properties: unique identifiers plus local and global descriptors. The other set, the room database, maps individual keypoints to specific areas of the space in question.
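One possible in-memory layout for the two databases is sketched below. The article does not describe the actual schema, so every class, field, and method name here (`ImageRecord`, `ImageDatabase`, `RoomDatabase`, `assign`, `room_of`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ImageRecord:
    """One reference picture and its properties (hypothetical layout)."""
    image_id: str                          # unique identifier
    global_descriptor: List[float]         # one vector for the whole image
    local_descriptors: List[List[float]]   # one vector per keypoint


@dataclass
class ImageDatabase:
    records: Dict[str, ImageRecord] = field(default_factory=dict)

    def add(self, record: ImageRecord) -> None:
        # Identifiers must stay unique, so refuse silent overwrites.
        if record.image_id in self.records:
            raise ValueError(f"duplicate image id: {record.image_id}")
        self.records[record.image_id] = record


@dataclass
class RoomDatabase:
    # Maps (image_id, keypoint_index) to the area that keypoint belongs to.
    keypoint_rooms: Dict[Tuple[str, int], str] = field(default_factory=dict)

    def assign(self, image_id: str, keypoint_index: int, room: str) -> None:
        self.keypoint_rooms[(image_id, keypoint_index)] = room

    def room_of(self, image_id: str, keypoint_index: int) -> Optional[str]:
        return self.keypoint_rooms.get((image_id, keypoint_index))


images = ImageDatabase()
images.add(ImageRecord("img-001", [0.1, 0.2], [[0.0, 1.0], [1.0, 0.0]]))

rooms = RoomDatabase()
rooms.assign("img-001", 0, "assembly hall")
print(rooms.room_of("img-001", 0))  # assembly hall
print(rooms.room_of("img-001", 1))  # None
```

Keying the room database on individual keypoints rather than whole images lets a single picture that spans two areas contribute to both.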

The researchers applied this process to user localization using three neural networks from the visual place recognition field: SuperPoint, SuperGlue, and NetVLAD. The deep neural networks SuperPoint and SuperGlue work together, handling feature detection and feature matching respectively, and extract information from the databases.

The global descriptors enter the stage

The process also calls for global descriptors: vectors that characterize a place and identify areas unambiguously. To fulfill this role, the vectors should be agnostic to illumination and point of view. Whatever the perspective and lighting conditions, the global descriptors should leave no doubt when distinguishing places across pictures.

Additionally, global descriptors should not treat movable objects in the area of interest as features that distinguish places. Items like furniture and equipment are prone to change (redecoration, dismantling), so their presence can’t define an area.

Place recognition powered by computer vision therefore relies on permanent elements of the examined locations, such as doors, windows, stairs, and other distinctive, long-lasting items. In the research in question, the deep neural network NetVLAD performed the calculations, producing vectors that meet these requirements. During global descriptor matching, distances between the characteristic anchor points are calculated, and the images with the most similar vectors are passed on for processing.
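Selecting the images with the most similar global descriptors can be sketched as a simple ranking. This example assumes cosine similarity as the measure and uses made-up three-dimensional vectors. The article does not specify which metric was used, NetVLAD descriptors are much longer, and `top_k_similar` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def top_k_similar(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k reference images whose descriptors best match the query."""
    scored = [(image_id, cosine_similarity(query, d)) for image_id, d in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy descriptors keyed by a hypothetical image identifier.
database = {
    "door": [1.0, 0.0, 0.0],
    "window": [0.0, 1.0, 0.0],
    "stairs": [0.7, 0.7, 0.0],
}
candidates = top_k_similar([1.0, 0.05, 0.0], database, k=2)
print([image_id for image_id, _ in candidates])  # ['door', 'stairs']
```

Only the shortlisted candidates then need the more expensive keypoint-level comparison, which keeps the search manageable as the image database grows.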

When processing the two databases, the room database and the image database containing keypoints and global descriptors, the system deals with attributes of images. After the similarities and shortest distances have been estimated, the second neural network, SuperGlue, identifies the location images. In short, the VPR-based system localizes the user by the number of matching keypoints.
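Localization by the number of matching keypoints can be pictured as a vote. The sketch below assumes that a matcher has already returned, for each candidate image, the indices of its matched keypoints, and that a lookup table maps each keypoint to an area. The `localize` function, the `min_votes` parameter, and the data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def localize(
    matches_per_image: Dict[str, List[int]],
    keypoint_rooms: Dict[Tuple[str, int], str],
    min_votes: int = 1,  # hypothetical confidence threshold
) -> Optional[str]:
    """Return the area that collects the most matched keypoints,
    or None when no area reaches min_votes."""
    votes: Counter = Counter()
    for image_id, keypoint_indices in matches_per_image.items():
        for index in keypoint_indices:
            room = keypoint_rooms.get((image_id, index))
            if room is not None:  # ignore keypoints with no assigned area
                votes[room] += 1
    if not votes:
        return None
    room, count = votes.most_common(1)[0]
    return room if count >= min_votes else None


# Matched keypoint indices per candidate reference image (made-up data).
matches = {"img-001": [0, 1, 2], "img-002": [0]}
keypoint_rooms = {
    ("img-001", 0): "hall",
    ("img-001", 1): "hall",
    ("img-001", 2): "storage",
    ("img-002", 0): "hall",
}
print(localize(matches, keypoint_rooms))               # hall
print(localize(matches, keypoint_rooms, min_votes=4))  # None
```

Returning None below the threshold lets the caller keep the last known location instead of acting on a weak match. In a tie, `Counter.most_common` returns the area that was counted first, so a production system would likely need an explicit tie-breaking rule.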

The algorithms found application in the AI and AR platform, which helps users equipped with smart glasses complete their training. VPR localizes trainees in the workplace so that the appropriate tutorials and guides assigned to particular spots can be launched, which improves safety and reduces the need for direct supervision.

Project co-financed from European Union funds under the European Regional Development Funds as part of the Smart Growth Operational Programme. Project implemented as part of the National Centre for Research and Development: Fast Track.

Jowita Kessler is a Poland-based tech aficionado, working as a content marketing specialist at NeuroSYS. Compulsive reader and writer, dedicated to erasing the barrier between humanities and technology. Privately: daydreamer and nightwalker, a fan of cats and bats.