Machine learning in ABBYY FlexiCapture

Chip VonBurg, senior solutions architect at ABBYY, gave us a look at machine learning in FlexiCapture 12. This is my last session for ABBYY Technology Summit 2017; there’s a roadmap session after this to close the conference, but I have to catch a plane.

He started with a basic definition of machine learning: a method of data analysis that automates analytical model building, allowing computers to find insights in data and execute logic without being explicitly programmed for where to look or what to do. It’s based on pattern recognition and computational statistics, and it’s popping up in areas such as biology, search and recommendations (e.g., Netflix), and spam detection. Machine learning is an iterative process that uses sample data and one or more machine learning algorithms: the training data set is used by the algorithm to build an analytical model, which is then applied to attempt to analyze or classify new data. Feedback on the correctness of the model for the new data is fed back to refine the learning and therefore the model. In many cases, users don’t even know that they’re providing feedback to train machine learning: every time you click “Spam” on a message in Gmail (or “Not Spam” for something that was improperly classified), or thumbs up/down for a movie in Netflix, you’re providing feedback to their machine learning models.

He walked us through several different algorithms, and their specific applicability: Naive Bayes, Support Vector Machine (SVM), and deep learning; then a bit about machine learning scenarios inclunition rulesding supervised, unsupervised and reinforcement learning. In FlexiCapture, machine learning can be used to sort documents into categories (classification), and for training on field-level recognition. The reason that this is important for ABBYY customers (partners and end customers) is that it radically compresses the time to develop the rules required for any capture project, which typically consumes most of the development time. For example, instead of just training a capture application for the most common documents since that’s all you have time for, it can be trained for all document types, then the model will continue to self-improve as verification users correct errors made by the algorithm.

Although VonBurg was unsure if the machine learning capabilities are available yet in the SDK — he works in the FlexiCapture application team, which is based on the same technology stack but runs independently — the session on robotic information capture yesterday seems to indicate that it is in the SDK, or will be very soon.

Capture microservices for BPO with iCapt and ABBYY

Claudio Chaves Jr. of iCapt presented a session at ABBYY Technology Summit on how business process outsourcing (BPO) operations are improving efficiencies through service reusability. iCapt is a solutions provider for a group of Brazilian companies, including three BPOs in specific verticals, a physical document storage company, and a scanner distributor. He walked through a typical BPO capture flow — scan, recognize, classify, extract, validate, export — and how each stage can be implemented using standalone scan products, OCR SDKs, custom UIs and ECM platforms. Even though this capture process only outputs data to the customer’s business systems at the end, such a solution needs to interact with those systems throughout for data validation; in fact, the existing business systems may provide some overlapping capabilities with the capture process. iCapt decided to turn this traditional capture process around by decoupling each stage into independent, reusable microservices that can be invoked from the business systems or some other workflow capability, so that the business system is the driver for the end-to-end capture flow. The microservices can be invoked in any order, and only the ones that are required are invoked. As independent services, each of them can be scaled up and distributed independently without having to scale the entire capture process.

The recognize, classify and extract steps are typically unattended, and became immediate candidates to be implemented as microservices. This allows them to be reusable across processes, scaled independently, and deployed on-premise or in the cloud. For example, a capture process that is used for a single type of document doesn’t require the classification service, but only uses the recognize and extract services; another process that uses all three may reuse the same recognize and extract services when it encounters the same type of document as the first process handles, but also uses the classify service to determine the document type for heterogeneous batches of documents. iCapt is using ABBYY FineReader as a core component in their iCaptServices Cloud offering, embedded within their own web APIs that offer higher-level services on top of the FRE core functions; the entire package can be deployed as a container or serverless function to be called from other applications. They provide services for mobile client development to allow these business applications to have capture on mobile devices.

He gave an example of a project that they did for recovering old accounting records by scanning and recognizing paper books; this was a one-time conversion project, not an ongoing BPO operation, making it crucial that they be able to build the data capture application quickly without developing an excessive amount of custom code that would have been discarded after the 10-week project duration. They’re currently using the Windows version of ABBYY which increases their container/cloud costs somewhat, and are interested in trying out the Linux version that we heard about yesterday.

Pairing @UiPath and ABBYY for image capture within RPA

Andrew Rayner of UiPath presented at the ABBYY Technology Summit on robotic process automation powered by ABBYY’s FineReader Engine (FRE). He started with a basic definition of RPA — emulating human execution of repetitive processes with existing applications — and the expected benefits in high scalability and reduction in errors, costs and cycle time. RPA products work really well with text on the screen, copying and pasting data between applications, and many are using machine learning to train and improve their automated actions so that it’s more than the simpler old-school “screen scraping” that was dependent purely on field locations on the screen.

What RPA doesn’t do, however, is work with images; that’s where ABBYY FRE comes in. UiPath provides developers using their UiPath Studio the ability to OCR images as part of the RPA flow: an image is passed to FineReader for recognition, then an XML data file of the recognized data is returned in order to complete the next robotic steps. Note that “images” may be scanned documents, but can also be virtualized screens that don’t transfer data fields directly, just display the screen as an image, such as you might have with an application running in Citrix — this is a pretty important capability that is eluding standard RPA.

Rayner walked through an example of invoice processing (definitely the most common example used in all presentations here, in part because of ABBYY’s capabilities in invoice recognition): UiPath grabs the scanned documents and drops them in a folder for ABBYY; FRE does the recognition pass and creates the output XML files as well as managing the human verification step, including applying machine learning on the human interaction to continuously improve the recognition as we heard about yesterday; then finally, UiPath pushes the results into SAP for completing the payment process.

For solution developers working with RPA and needing to integrate data captured from images or virtualized screens, this is a pretty compelling advantage for UiPath.

ABBYY partnerships in ECM, BPM, RPA and ERP

It’s the first session of the last morning of the ABBYY Technology Summit 2017, and the crowd is a bit sparse — a lot of people must have had fun at the evening event last night — and I’m in a presentation by another ex-FileNet colleague of mine, Carl Hillier.

He discussed how capture isn’t just a discrete operation any more, where you just capture, index and store in a content management repository, but is now the front end to business processes that have the potential for digital transformation. To that end, since ABBYY has no plans to expand their side of the business, they have made strategic partnerships with a number of vendors that push into downstream processes: M-Files and Laserfiche for content management, Appian and Pega (still in the works) for BPM, and Acumatica for ERP. As with many technology partnerships, there can be overlap in capabilities but that usually sorts out in favor of the specialist vendor: for example, with Laserfiche, ABBYY is being used to replace Laserfiche’s simpler OCR capabilities for customers with more complex capture capabilities. Both BPM vendors have RPA capabilities — Appian through a partnership with Blue Prism, Pega through their OpenSpan acquisition — and there’s a session following by RPA vendor UiPath on using ABBYY for RPA that likely has broader implications for working with these other partners.

For the solution builders who use ABBYY’s FlexiCapture, the connectors to these other products gives them fast path to implementation, although they can also use the ABBYY SDK directly to create solutions that include competing products. We saw a bit about each of the ABBYY connectors to the five strategic partners, and how they take advantage of those platforms’ capabilities: with Appian, for example, a capture operator uses FlexiCapture to scan/import and verify documents, then the connector maps the structured data directly into Appian’s data objects (records), whereas for one of the content management platforms, they may transfer a smaller subset of document indexing data. The Acumatica integration is a bit different, in that FlexiCapture isn’t seen as a separate application for the capture front end, but it’s embedded within the Acumatica interface as an invoice capture service.

ABBYY’s plan is to create more of these connectors, making it easier for their VARs and solution partners (who are the primary attendees at this conference) to quickly build solutions with ABBYY and a variety of platforms.

ABBYY mobile real-time recognition

Dimitry Chubanov and Derek Gerber presented at the ABBYY Technology Summit on ABBYY’s mobile real-time recognition (RTR), which allows for recognition directly on a mobile device, rather than just capturing content to pass on to a back-end recognition server. Mobile data capture comes in two basic flavors: first, the mobile user is just entering data, such as an account number or password; and second, the mobile user is entering both data and image, such as personal data and a copy of their ID.

ABBYY RTR isn’t based on taking a photo and then running recognition on that image; instead, it uses several frames of image from the camera preview stream and runs recognition algorithms on the stream without having to capture an image. This provides a better user experience since the recognition results are immediate and they don’t have to type the data manually, and better privacy since no image is captured to the phone or passed to any other device or server. They demonstrated this using a sample app on an iPhone; it’s interesting to see the results changing slightly as the phone moves around, since the recognition is happening using the previous several frames of video data, and it gradually gains recognition confidence after a few seconds of video. We saw recognition of unstructured paragraphs of text, drivers licenses, passports and bank cards. The SDK ships with a lot of predefined document types, or you can create your own by training for specific fields using location and regular expressions. They are also offering the ability to capture meter data, such as electricity meters, although some of this requirement is being by smart meters and other IoT advances.

They also have a mobile imaging SDK that can capture an image when it’s needed — for proof of ID, for example — with scene stabilization, document edge detection, deskewing and various types of image enhancement to capture the optimal photo for downstream storage and processing.

I can imagine, for example, a mobile airline app that needs to capture your passport information using mobile RTR to grab the data directly rather than having you type it in. I’ve also seen something very similar used to capture the unique number from an iTunes gift card directly into the App Store on an iPhone. Just like QR code reading is now built right into the search bar on the mobile versions of Google Chrome, and Google Translate on mobile allows real-time capture of text using the same camera preview mode (plus simultaneous translation), being able to capture text from a printed source instead of requiring a mobile user to type it in is likely to become ubiquitous in mobile apps.

ABBYY Robotic Information Capture applies machine learning to capture

Back in the SDK track at ABBYY Technology Summit, I attended a session on “robotic information capture” with FlexiCapture Engine 12, with lead product manager Andrew Zyuzin and director of product marketing Semyon Sergunin showing some of the automation classification and data extraction capabilities powered by machine learning. Traditional enterprise capture uses manually-created rules for classification and data extraction to set up for automated capture: a time-consuming training process up front in order to maximize recognition rates. At the other end of the spectrum, robotic process automation uses machine learning to analyze user actions, and create classification and extraction algorithms that can be run by robots to replace human operators. In the Goldilocks middle, they position robotic information capture as a blending of these two ideas: the system is pre-trained and processes standard documents out of the box, then uses machine learning to enhance the recognition for non-standard documents by analyzing how human operators handle the exceptions. Although I’m not completely aligned with their use of the term robotic process automation since RPA is not completely synonymous with machine learning and also isn’t limited to capture applications, I understand why they’re positioning their ML-assisted capture as robotic information capture as a middle ground between traditional capture and ML-assisted RPA.

We saw a demo of this with invoice capture: a PDF invoice was processed through their standard invoice recognition, detecting vendor name and invoice number, but the wrong number was picked up for the total amount due to the location of the field. This was corrected by a user in the verification client, and the information of where to find the total was analyzed for retraining and fed back to the recognition model. The user doesn’t know that they’re actually training the system — there’s no explicit training mode — but it just happens automatically in the background for continuous improvement of the recognition rates, gradually reducing the amount of manual verification. After the training was fed back, we saw another invoice from the same vendor processed, with the invoice total field properly detected.

Although I think that most technology is pretty interesting, this is the first thing I’ve seen today that made me say “cool!”

Zyuzin also walked us through their advanced classification, which can classify documents without any development based on large data sets of typical document types such as invoices, cheques, and drivers licences; automatic classification is important as the front end to recognition so that the correct recognition techniques and templates can be applied. Their advanced classification uses both image and content classification, that is, determines what type of document it is based on how it looks as well as the available text content. He showed us a demo of processing a package of mortgage documents, where there is a large number of possible documents that can be submitted by a consumer as supporting documentation; most of the documents were properly classified, but a few were unrecognized and required a quick setup of a new document type to train the classifier. This was more of a manual training process, but once the new document class was created, it could be applied to other unrecognized documents in the package.

ABBYY Recognition Server 5.0 update

I’ve switched over to the FlexiCapture technical track at the ABBYY Technology Summit for a preview of the new version of Recognition Server to be released in the first half of 2018. Paula Saunders, director of sales engineering, walked us through a presentation of the features and a demo.

New features include:

  • Smart PDF quality detection and processing, including detecting if there is already an OCR layer on the document and using that instead of re-recognizing
  • Support of PDF/E standard for engineering drawings
  • Import of email messages in MSG format, including both the message text and attachments
  • Advanced document editing at indexing and verification stations, such as rotation and redaction
  • Support of user languages and pattern-matching to fine-tune non-standard text
  • Extracting index fields by using a template of fixed regions
  • Native 64-bit support

Recognition Server is for more of the production capture work, where you set up a capture workflow and define several properties that define the input, process, document separation, quality control, indexing and output stages of that flow. She walked us through the screens for creating a new workflow and setting the properties at each stage, then showed us what it would look like at an indexing station if you wanted to edit the original image: deskewing, despeckling, cropping and more. The indexing station module also allows you to create field templates, for fine-tuning the recognition and assigning for mapping form areas to index fields directly on live document data. The verification station module can be used for additional training using pattern matching, such as recognizing unusual fonts.

ABBYY SDK update and FineReader Engine deep dive

I attended two back-to-back sessions from the SDK track in the first round of breakouts at the 2017 ABBYY Technology Summit. All of the products covered in these sessions are developer tools for building OCR capabilities into other solutions, not end-user tools.

Semyon Sergunin, director of product marketing for ABBYY‘s SDK products, gave us a high-level update and a bit of the roadmap for all of the SDK products. For reference, FineReader Engine is an OCR toolkit, while FlexiCapture Engine is based on the same technology but is an SDK for document separation, classification and data extraction.

FineReader Engine 12:

  • New OCR support for Farsi and Burmese languages, and improved OCR for Japanese
  • Improved layout retention, so that the recognized/exported document in plain text or structured document formats (MS Office) looks more like the original
  • Improved automation of document classification and data extraction using machine learning
  • Additional export formats (ALTO, PDF/A 2-b and 3-b), and improvements to some existing ones (XML, TXT)

He also discussed some of their licensing changes, including cloud licenses for Azure public cloud and virtual cloud instances.

FlexiCapture Engine 12:

  • New classification and PDF export features supported via the API
  • Update to latest version of OCR technologies
  • Processing of natively-digital documents (email, text, MS-Word), not just images
  • Cloud licensing
  • Changes to classification logic depending on whether the text or image version of the content is available
  • Processing of PDFs with text layers
  • Linux support using a Wine wrapper

Receipts Capture SDK:

  • Available on Windows, Linux (via Wine) and cloud
  • Supports 120 major US vendor receipt styles
  • Added field-level confidence levels, not just character or word confidence
  • Added manual verification service

Mobile real-time recognition SDK:

  • Built-in support for bank cards, passports, several different states’ drivers licenses, and regular expressions
  • Combined SDK for video or still photo input on mobile

Cloud OCR SDK:

  • Same functionality as FineReader Engine, plus a few extras such as receipt recognition
  • Subscription and package pricing

There’s also a new FlexiCapture Cloud product in beta now, providing the additional functionality for document classification and data extraction.

The details here are primarily of interested to technical developers who are working with ABBYY products (or planning to), but the amount of new information shows a good rate of innovation. This was a fast high-level update, although more detail than we saw in the analyst briefing yesterday; there will be more information coming in later breakout sessions.

This was followed by a deep dive session on the use of FineReader Engine, with Larysa Lototska, technical marketing manager, and Tony Connell, pre-sales engineer. They covered the following topics:

  • Licensing, both runtime and developer
  • Improving recognition accuracy by using predefined profiles for specific types of documents or data extraction, e.g., engineering drawings or business cards; and by applying additional settings via code
  • Improving recognition speed by changing the engine loading method; using multiple CPU cores or concurrent recognition processes; using parallelism for multiple pages within documents; and batch scanning for batches of documents with the same number of pages (including single-page documents)

They gave live demos showing how to use some of the different profiles and settings in sample code in Visual Studio, applying methods for classifying and recognizing particularly difficult or degraded images.

They also discussed turning on the FineReader Engine log file to track down performance problems, since it tracks and timestamps every engine call plus any errors that are thrown, and walked through various sources of developer help on their site and bundled with the SDK.

There are a lot of interesting sessions at the conference: even with only three tracks, I’m having trouble deciding what to attend in some time slots.

The collision of capture, content and analytics

Martyn Christian of UNDRSTND Group, who I worked with back in FileNet in 2000-1, gave a keynote at ABBYY Technology Summit 2017 on the evolution and ultimate collision of capture, content and analytics. He started by highlighting some key acquisitions in the industry, including the entry of private capital, as well as a move to artificial intelligence in the capture space, as harbingers of the changes in the capture market. Since Gartner declared enterprise content management dead — long live content services platforms! — and introduced new players in the magic quadrant alongside the traditional ECM players, while shifting IBM from the leaders quadrant back to the challengers quadrant.

Intelligent capture is gaining visibility and importance, particularly as a driver for digital transformation. Interestingly, capture was traditionally about converting analog (paper) to digital (data); now, however, many forms of information are natively digital, and capture is not only about performing OCR on scanned paper documents but about extracting and analyzing actionable data from both analog and digital content. High-volume in-house production scanning operations are being augmented — or replaced — with customers doing their own capture, such as we now see with depositing a check using a mobile banking application. Information about customer actions and sentiment is being automatically gleaned from their social media actions. Advanced machine learning is being used to classify content, reducing the need for manual intervention further downstream, and enabling straight-through processing or the use of autonomous agents.

As a marketing guy, he had a lot of advice on how this can be positioned and sold into customers; UNDRSTND apparently ran a workshop yesterday for some of the channel partner companies on bringing this message to their customers who are seeking to move beyond simple capture solutions to digital transformation.

ABBYY corporate vision and strategy

We have a pretty full agenda for the next two days of the 2017 ABBYY Technology Summit, and we started off with an address from Ulf Persson, ABBYY’s relatively new worldwide CEO (although he is a long-time board member). As a company with its roots in Russia that has spread from country to country in a bit of a disjointed way in the past, Persson is pushing the idea of #OneABBYY: a global company rather than a collection of regional companies. A leader in OCR since 1989, ABBYY is the best-kept secret in OCR: end customers don’t know who they are, and even other vendors in the same space haven’t heard the name, unlike that of their competitors. They are actively trying to change how they grow: becoming more globally balanced in terms of development, marketing and organizational structure; becoming more agile and customer-centric; and continuing to be profitable and innovative. Their revenues are well-diversified, with 33% in Europe, 28% in North America, global accounts (bundling with hardware vendors) 19%, then smaller segments in Russia, Africa and Australia.

Their strategy includes:

  • Increasing market share in enterprise capture by pushing intelligent capture solutions, primarily as a cloud service.
  • Becoming the partner of choice for ISVs that need to build capture capabilities into their solutions. Unlike some other capture vendors, they are not looking to push into adjacent spaces, such as BPM, but plan to stay as an indepedent vendor in the intelligent capture and automation market that can partner with a wide variety of hardware, software and solution providers.
  • Becoming a leader in text analytics solutions, driven by the data that they capture from documents. He mentioned contracts in particular, where very complex text analytics are required to automate understanding of these types of documents.

They are making use of machine learning and artificial intelligence in their capture technology, and offering real-time recognition as a service or as embedded technology.

As a self-funded profitable company, they don’t need to go to the markets for funding, and state outright that they are not for sale.

Disclaimer: ABBYY has been my customer in the past — I gave the keynote here last year as well as another presentation in Toronto, and wrote a white paper — but I am not being compensated for my time here this week or for writing these posts. ABBYY did pay for my flights and hotel, which is the usual deal that I have with vendors to attend their conferences and blog my thoughts about what I see.