ABBYY mobile real-time recognition

Dimitry Chubanov and Derek Gerber presented at the ABBYY Technology Summit on ABBYY’s mobile real-time recognition (RTR), which allows for recognition directly on a mobile device, rather than just capturing content to pass on to a back-end recognition server. Mobile data capture comes in two basic flavors: first, the mobile user is just entering data, such as an account number or password; and second, the mobile user is entering both data and image, such as personal data and a copy of their ID.

ABBYY RTR isn’t based on taking a photo and then running recognition on that image; instead, it takes several frames from the camera preview stream and runs recognition algorithms on the stream without having to capture an image. This provides a better user experience, since the recognition results are immediate and users don’t have to type the data manually, and better privacy, since no image is saved to the phone or passed to any other device or server. They demonstrated this using a sample app on an iPhone; it’s interesting to see the results changing slightly as the phone moves around, since the recognition uses the previous several frames of video data and gradually gains confidence over a few seconds of video. We saw recognition of unstructured paragraphs of text, driver’s licenses, passports and bank cards. The SDK ships with a lot of predefined document types, or you can create your own by training for specific fields using location and regular expressions. They are also offering the ability to capture meter data, such as electricity meters, although some of this requirement is being met by smart meters and other IoT advances.
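To make the multi-frame idea concrete, here is a minimal sketch of how per-frame readings of a single field could be combined into one result with a confidence score. This is my own illustration, not ABBYY’s algorithm or SDK API: the `FrameVoter` class, its parameters and the nine-character pattern are all hypothetical, and a simple majority vote over a sliding window stands in for whatever the SDK actually does internally.

```python
import re
from collections import Counter, deque


class FrameVoter:
    """Hypothetical helper (not part of any ABBYY SDK): combines the
    readings of one field across the most recent camera preview frames."""

    def __init__(self, pattern, window=10, threshold=0.6):
        # The regular expression describes the expected field format.
        self.pattern = re.compile(pattern)
        # Only the last `window` frames are kept; older ones drop off.
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def add_frame(self, text):
        # A reading that does not match the field format is stored as None,
        # so it still counts against the confidence of the best candidate.
        self.readings.append(text if self.pattern.fullmatch(text) else None)

    def best(self):
        # Return the most frequent valid reading and the fraction of
        # frames in the window that agree with it.
        valid = [r for r in self.readings if r is not None]
        if not valid:
            return None, 0.0
        value, count = Counter(valid).most_common(1)[0]
        return value, count / len(self.readings)

    def is_stable(self):
        # The result is treated as final once enough frames agree.
        value, confidence = self.best()
        return value is not None and confidence >= self.threshold


# Assumed field format for illustration: nine uppercase letters or digits.
voter = FrameVoter(r"[A-Z0-9]{9}", window=5, threshold=0.6)
for reading in ["X1234S678", "X12345678", "X12345678", "X1234567", "X12345678"]:
    voter.add_frame(reading)
    print(voter.best(), voter.is_stable())
```

In this sketch the first frame contains a misread character that still fits the pattern, so the displayed value starts out wrong and is corrected as later frames outvote it, which is roughly the behavior visible in the demo. The fourth reading is too short to match the pattern and only lowers the confidence.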

They also have a mobile imaging SDK that can capture an image when it’s needed — for proof of ID, for example — with scene stabilization, document edge detection, deskewing and various types of image enhancement to capture the optimal photo for downstream storage and processing.

I can imagine, for example, a mobile airline app that needs to capture your passport information using mobile RTR to grab the data directly rather than having you type it in. I’ve also seen something very similar used to capture the unique number from an iTunes gift card directly into the App Store on an iPhone. QR code reading is now built right into the search bar on the mobile versions of Google Chrome, and Google Translate on mobile allows real-time capture of text using the same camera preview mode (plus simultaneous translation). In the same way, capturing text from a printed source instead of requiring a mobile user to type it in is likely to become ubiquitous in mobile apps.
