No, I haven’t learned German in my spare time; an interview with me about intelligent capture was translated for a German-language magazine.
Something strange about receiving an email about an upcoming webinar, featuring two people who I know well…
…then scrolling down to see that ABBYY is featuring the paper that I wrote for them as follow-on bonus material!
Nathaniel Palmer and Carl Hillier are both intelligent speakers with long histories in the industry; tune in to hear them talk about the role that content capture and content analytics play in digital transformation.
A little over a year ago, I wrote a paper on intelligent capture for digital transformation, sponsored by ABBYY, and gave a keynote at their conference on the same topic. The original English version is on their site here, and if you read German (or want to pass it along to German-speaking colleagues), you can find the German version here. As usual, this paper is not about ABBYY’s products, but about how intelligent capture is the on-ramp for any type of automated processes and hence required for digital transformation. From the abstract:
Data capture from paper or electronic documents is an essential step for most business processes, and often is the initiator for customer-facing business processes. Capture has traditionally required human effort – data entry workers transcribing information from paper documents, or copying and pasting text from electronic documents – to expose information for downstream processing. These manual capture methods are inefficient and error-prone, but more importantly, they hinder customer engagement and self-service by placing an unnecessary barrier between customers and the processes that serve them.
Intelligent capture – including recognition, document classification, data extraction and text analytics – replaces manual capture with fully-automated conversion of documents to business-ready data. This streamlines the essential link between customers and your business, enhancing the customer journey and enabling digital transformation of customer-facing processes.
Or, in German:
Die Erfassung von Daten aus papierbasierten oder elektronischen Dokumenten steht als
zentraler Schritt am Anfang zahlreicher kundenorientierter Geschäftsprozesse. Dies ist üblicherweise
mit großem manuellen Aufwand verbunden – Mitarbeiter übertragen und kopieren
per Hand Daten und Texte, um sie so nachgelagerten Systemen und Prozessen zur Verfügung
zu stellen. Diese manuelle Vorgehensweise ist jedoch nicht nur ineffizient und fehleranfällig,
sie bremst auch den Kundendialog aus und verhindert Self-Service-Szenarien durch unnötige
Barrieren zwischen Kunden und Dienstleistern. Intelligent-Capture-Lösungen – mit Texterkennung,
Dokumentenklassifizierung, Datenextraktion und Textanalyse – ersetzen die manuelle
Datenerfassung. Dokumente werden vollautomatisch in geschäftlich nutzbare Daten umgewandelt.
So können Unternehmen die Beziehung zu ihren Kunden stärken, das Benutzererlebnis
steigern und die digitale Transformation kundenorientierter Prozesse vorantreiben.
Recently, I was interviewed by KVD, a major European professional association for customer service professionals. Although most of their publication is in German, the interview was in English, and you can find it on their site here.
It’s been three years since I looked at ITESOFT | W4’s BPMN+ product, which was prior to W4’s acquisition by ITESOFT. At that time, I had just seen W4 for the first time at bpmNEXT 2014, and had this to say about it:
For the last demo of this session, Jean-Loup Comeliau of W4 on their BPMN+ product, which provides model-driven development using BPMN 2, UML 2, CMIS and other standards to generate web-based process applications without generating code: the engine interprets and executes the models directly. The BPMN modeling is pretty standard compared to other process modeling tools, but they also allow UML modeling of the data objects within the process model; I see this in more complete stack tools such as TIBCO’s, but this is less common from the smaller BPM vendors. Resources can be assigned to user tasks using various rules, and user interface forms are generated based on the activities and data models, and can be modified if required. The entire application is deployed as a web application. The data-centricity is key, since if the models change, the interface and application will automatically update to match. There is definitely a strong message here on the role of standards, and how we need more than just BPMN if we’re going to have fully model-driven application development.
A couple of weeks ago, I spoke with Laurent Hénault and François Bonnet (the latter of whom I met when he demoed at bpmNEXT in 2015 and 2016) about what’s happened in their product since then. From their company beginnings over 30 years ago in document capture and workflow, they have expanded their platform capabilities and relabelled it as digital process automation since it goes beyond BPM technology, a trend I’m seeing with many other BPM vendors. It’s not clear how many of their 650+ customers are using the capabilities of the new platform versus just their traditional imaging and workflow functions, but they seem to be expanding on the original capabilities rather than replacing them, which will make transitioning customers easier.
The new platform, Secure Capture and Process Automation (SCPA), provides capabilities for capture, business automation (process, content and decisions), analytics and collaborative modeling, and adds some nice extras in the area of document recognition, fraud detection and computer-aided process design. Using the three technology pillars of omni-channel capture, process automation, and document fraud detection, they offer several solutions including eContract for paperless customer purchase contracts, including automatic fraud detection on documents uploaded by the customer; and the cloud-based Streamline for Invoices for automated invoice processing.
Their eContract solution provides online forms with e-signature, document capture, creation of an eIDAS-compliant contract and other services required to complete a complex purchase contract bundled into a single digital case. The example shown was an online used car purchase with the car loan offered as part of the contract process: by bundling all components of the contract and the loan into a single online transaction, they were able to double the purchase close rate. Their document fraud detection comes into play here, using graphometric handwriting analysis and content verification to detect if a document uploaded by a potential customer has been falsified or modified. Many different types of documents can be analyzed for potential fraud based on content: government ID, tax forms, pay slips, bank information, and public utility invoices may contain information in multiple formats (e.g., plain text plus encoded barcode); other documents such as medical records often contain publicly-available information such as the practitioner’s registration ID. They have a paper available for more information on combatting incoming document fraud.
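To make the content verification idea concrete, here is a minimal sketch of cross-checking a document that carries the same information in two formats, such as printed text plus an encoded barcode. This is my own illustration, not ITESOFT's method: the function name and field names are hypothetical, and it assumes that both representations have already been decoded into simple dictionaries.

```python
def find_mismatches(printed, encoded):
    """Return the names of shared fields whose values disagree.

    `printed` holds values read from the visible text and `encoded`
    holds values decoded from the barcode (both are assumed inputs).
    """
    def normalize(value):
        # Ignore spacing and letter case so only real differences count.
        return "".join(value.split()).upper()

    shared = printed.keys() & encoded.keys()
    return sorted(
        name for name in shared
        if normalize(printed[name]) != normalize(encoded[name])
    )


# Hypothetical pay slip where the printed amount was altered.
printed_fields = {"name": "Jane Doe", "amount": "1 500.00", "account": "FR76 1234"}
barcode_fields = {"name": "JANE DOE", "amount": "1200.00", "account": "FR761234"}
suspicious = find_mismatches(printed_fields, barcode_fields)
```

Any field in `suspicious` would be a reason to route the document for closer inspection; a real product would obviously weigh far more signals than this.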
Their invoice processing solution also relies heavily on understanding certain types of documents: 650,000 different supplier invoice types are recognized, and they maintain a shared supplier database in their cloud capture environment to allow these formats to be added and modified for use by all of their invoice processing customers. There’s also a learning environment to capture new invoice types as they occur. Keep in mind that the heavy lifting in invoice processing is all around interpreting the vendor invoice: once you have that sorted out, the rest of the process of interacting with the A/P system is straightforward, and the payment of most invoices that relate to a purchase order can be fully automated. Streamline for Invoices won the Accounts Payable/Invoicing product of the year at the 2017 Document Manager Awards.
After a discussion of their solutions and some case studies, we dug into a more technical demo. A few highlights:
- The Web Modeler provides a fully BPMN-compliant collaborative process modeling environment, with synchronous model changes and a (persistent) discussion thread between users. This is a standalone business analyst tool, and the model must be exported as a BPMN file for import to the engine for execution, so there’s no round-tripping. A cool feature is the ability to scroll back through the history of changes to the model by dragging a timeline slider: each changed snapshot is shown with the specific author.
- Once the business analyst’s process model has been imported into the BPMN+ Composer tool, the full application can be designed: data model, full process model, low code forms-based user experience, and custom code (if required). This allows a more complex BPMN model to be integrated into a low code application – something that isn’t allowed by many of the low code platforms that provide only simple linear flows – as well as developer code for “beyond the norm” integration such as external portals.
- Supervisor dashboards provide human task monitoring, including task assignment rules and skills matrix that can be changed in real time, and performance statistics.
The applications developed with their tools generally fall into the case management category, although they are document/data based rather than CMMN. Like many BPM vendors, they are finding that there is not the same level of customer demand for CMMN as there was for BPMN, and data-driven case management paradigms are often more understandable to business people.
They’ve OEM’d some of the components (the capture OCR, which is from ABBYY, and the web modeler from another French company) but put them together into a seamless offering. The platform is built on a standard Java stack; some of the components can be scaled independently and containerized (using Microsoft Azure), allowing customers to choose which data should exist on which private and public cloud infrastructure.
They also showed some of the features that they demoed at the 2017 bpmNEXT (which I unfortunately missed): process guidance and correction that goes beyond just BPMN validation to attempt to add data elements, missing tasks, missing pathways and more; a Gantt-type timeline model of a process (which I’ve seen in BPLogix for years, but is sadly absent in many products) to show expected completion times and bottlenecks; and the same visualization directly in a live instance that auto-updates as tasks are completed within the instance. I’m not sure if these features are fully available in the commercial product, but they show some interesting views on providing automated assistance to process modeling.
Chip VonBurg, senior solutions architect at ABBYY, gave us a look at machine learning in FlexiCapture 12. This is my last session for ABBYY Technology Summit 2017; there’s a roadmap session after this to close the conference, but I have to catch a plane.
He started with a basic definition of machine learning: a method of data analysis that automates analytical model building, allowing computers to find insights in data and execute logic without being explicitly programmed for where to look or what to do. It’s based on pattern recognition and computational statistics, and it’s popping up in areas such as biology, search and recommendations (e.g., Netflix), and spam detection. Machine learning is an iterative process that uses sample data and one or more machine learning algorithms: the training data set is used by the algorithm to build an analytical model, which is then applied to attempt to analyze or classify new data. Feedback on the correctness of the model for the new data is fed back to refine the learning and therefore the model. In many cases, users don’t even know that they’re providing feedback to train machine learning: every time you click “Spam” on a message in Gmail (or “Not Spam” for something that was improperly classified), or thumbs up/down for a movie in Netflix, you’re providing feedback to their machine learning models.
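The train, classify and feedback loop described above can be sketched in a few lines. This is a toy multinomial Naive Bayes classifier that I wrote for illustration, assuming whitespace-separated words and add-one smoothing; the class name and the sample messages are hypothetical, and it has nothing to do with how ABBYY, Gmail or Netflix implement their models.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocabulary = set()

    def learn(self, text, label):
        """Add one labelled example; used for training and for feedback."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, using add-one smoothing."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = TinyNaiveBayes()
model.learn("win a free prize now", "spam")          # training data
model.learn("meeting agenda for monday", "not spam")

before = model.classify("free agenda")   # the model's first guess
model.learn("free agenda", "spam")       # the user clicks "Spam": feedback
after = model.classify("free agenda")    # the refined model's answer
```

The point of the sketch is that the same `learn` call serves both the initial training set and the later user corrections, which is why users can train a model without ever entering an explicit training mode.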
He walked us through several different algorithms, and their specific applicability: Naive Bayes, Support Vector Machine (SVM), and deep learning; then a bit about machine learning scenarios including supervised, unsupervised and reinforcement learning. In FlexiCapture, machine learning can be used to sort documents into categories (classification), and for training on field-level recognition. The reason that this is important for ABBYY customers (partners and end customers) is that it radically compresses the time to develop the rules required for any capture project, which typically consumes most of the development time. For example, instead of just training a capture application for the most common documents since that’s all you have time for, it can be trained for all document types, then the model will continue to self-improve as verification users correct errors made by the algorithm.
Although VonBurg was unsure if the machine learning capabilities are available yet in the SDK — he works in the FlexiCapture application team, which is based on the same technology stack but runs independently — the session on robotic information capture yesterday seems to indicate that it is in the SDK, or will be very soon.
Claudio Chaves Jr. of iCapt presented a session at ABBYY Technology Summit on how business process outsourcing (BPO) operations are improving efficiencies through service reusability. iCapt is a solutions provider for a group of Brazilian companies, including three BPOs in specific verticals, a physical document storage company, and a scanner distributor. He walked through a typical BPO capture flow — scan, recognize, classify, extract, validate, export — and how each stage can be implemented using standalone scan products, OCR SDKs, custom UIs and ECM platforms. Even though this capture process only outputs data to the customer’s business systems at the end, such a solution needs to interact with those systems throughout for data validation; in fact, the existing business systems may provide some overlapping capabilities with the capture process. iCapt decided to turn this traditional capture process around by decoupling each stage into independent, reusable microservices that can be invoked from the business systems or some other workflow capability, so that the business system is the driver for the end-to-end capture flow. The microservices can be invoked in any order, and only the ones that are required are invoked. As independent services, each of them can be scaled up and distributed independently without having to scale the entire capture process.
The recognize, classify and extract steps are typically unattended, and became immediate candidates to be implemented as microservices. This allows them to be reusable across processes, scaled independently, and deployed on-premise or in the cloud. For example, a capture process that is used for a single type of document doesn’t require the classification service, but only uses the recognize and extract services; another process that uses all three may reuse the same recognize and extract services when it encounters the same type of document as the first process handles, but also uses the classify service to determine the document type for heterogeneous batches of documents. iCapt is using ABBYY FineReader as a core component in their iCaptServices Cloud offering, embedded within their own web APIs that offer higher-level services on top of the FRE core functions; the entire package can be deployed as a container or serverless function to be called from other applications. They provide services for mobile client development to allow these business applications to have capture on mobile devices.
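As a rough sketch of what decoupling the stages buys you, here is the idea with plain functions standing in for the microservices. Everything here is hypothetical and simplified for illustration: the service bodies are trivial stand-ins rather than calls to FineReader or to iCapt's APIs, and a real deployment would invoke each one over the network.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Service = Callable[[Document], Document]


def recognize(doc: Document) -> Document:
    """Stand-in for OCR: here the 'image' is already a string."""
    return {**doc, "text": str(doc["image"]).strip()}


def classify(doc: Document) -> Document:
    """Stand-in classifier: a keyword check on the recognized text."""
    text = str(doc["text"]).lower()
    return {**doc, "type": "invoice" if "invoice" in text else "unknown"}


def extract(doc: Document) -> Document:
    """Stand-in extractor: pulls 'key: value' pairs from the text."""
    fields = {}
    for line in str(doc["text"]).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {**doc, "fields": fields}


SERVICES: Dict[str, Service] = {
    "recognize": recognize,
    "classify": classify,
    "extract": extract,
}


def run_capture(doc: Document, steps: List[str]) -> Document:
    """The calling business system picks which services run, and in what order."""
    for step in steps:
        doc = SERVICES[step](doc)
    return doc


scan = {"image": "Invoice\nTotal: 42.00"}
# A process that only ever sees one document type skips classification.
single_type = run_capture(scan, ["recognize", "extract"])
# A process with heterogeneous batches reuses the same services and adds classify.
mixed_batch = run_capture(scan, ["recognize", "classify", "extract"])
```

Both calls reuse the same `recognize` and `extract` services; only the second one pays for classification.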
He gave an example of a project that they did for recovering old accounting records by scanning and recognizing paper books; this was a one-time conversion project, not an ongoing BPO operation, making it crucial that they be able to build the data capture application quickly without developing an excessive amount of custom code that would have been discarded after the 10-week project duration. They’re currently using the Windows version of ABBYY, which increases their container/cloud costs somewhat, and are interested in trying out the Linux version that we heard about yesterday.
Andrew Rayner of UiPath presented at the ABBYY Technology Summit on robotic process automation powered by ABBYY’s FineReader Engine (FRE). He started with a basic definition of RPA — emulating human execution of repetitive processes with existing applications — and the expected benefits in high scalability and reduction in errors, costs and cycle time. RPA products work really well with text on the screen, copying and pasting data between applications, and many are using machine learning to train and improve their automated actions so that it’s more than the simpler old-school “screen scraping” that was dependent purely on field locations on the screen.
What RPA doesn’t do, however, is work with images; that’s where ABBYY FRE comes in. UiPath provides developers using their UiPath Studio the ability to OCR images as part of the RPA flow: an image is passed to FineReader for recognition, then an XML data file of the recognized data is returned in order to complete the next robotic steps. Note that “images” may be scanned documents, but can also be virtualized screens that don’t transfer data fields directly, just display the screen as an image, such as you might have with an application running in Citrix — this is a pretty important capability that is eluding standard RPA.
Rayner walked through an example of invoice processing (definitely the most common example used in all presentations here, in part because of ABBYY’s capabilities in invoice recognition): UiPath grabs the scanned documents and drops them in a folder for ABBYY; FRE does the recognition pass and creates the output XML files as well as managing the human verification step, including applying machine learning on the human interaction to continuously improve the recognition as we heard about yesterday; then finally, UiPath pushes the results into SAP for completing the payment process.
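The handoff in the middle of that flow, where the robot picks up the recognition output before pushing it onward, might look something like the sketch below. The XML layout, field names and confidence attribute are assumptions that I made up for illustration; they are not the actual FineReader output format, and the threshold is arbitrary.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition output; the real XML layout will differ.
SAMPLE_XML = """
<invoice>
  <field name="vendor" confidence="0.98">Acme Ltd</field>
  <field name="number" confidence="0.95">INV-1001</field>
  <field name="total" confidence="0.61">1,250.00</field>
</invoice>
"""


def read_fields(xml_text):
    """Parse each <field> element into its value and confidence."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for node in root.iter("field"):
        fields[node.get("name")] = {
            "value": (node.text or "").strip(),
            "confidence": float(node.get("confidence", "0")),
        }
    return fields


def split_for_verification(fields, threshold=0.9):
    """Separate fields the robot can use from those a person should check."""
    trusted = {
        name: f["value"] for name, f in fields.items()
        if f["confidence"] >= threshold
    }
    review = sorted(
        name for name, f in fields.items() if f["confidence"] < threshold
    )
    return trusted, review


trusted, review = split_for_verification(read_fields(SAMPLE_XML))
```

In this sample, the vendor and invoice number would flow straight through to the next robotic step, while the low-confidence total would be held for human verification.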
For solution developers working with RPA and needing to integrate data captured from images or virtualized screens, this is a pretty compelling advantage for UiPath.
It’s the first session of the last morning of the ABBYY Technology Summit 2017, and the crowd is a bit sparse — a lot of people must have had fun at the evening event last night — and I’m in a presentation by another ex-FileNet colleague of mine, Carl Hillier.
He discussed how capture isn’t just a discrete operation any more, where you just capture, index and store in a content management repository, but is now the front end to business processes that have the potential for digital transformation. To that end, since ABBYY has no plans to expand their side of the business, they have made strategic partnerships with a number of vendors that push into downstream processes: M-Files and Laserfiche for content management, Appian and Pega (still in the works) for BPM, and Acumatica for ERP. As with many technology partnerships, there can be overlap in capabilities but that usually sorts out in favor of the specialist vendor: for example, with Laserfiche, ABBYY is being used to replace Laserfiche’s simpler OCR capabilities for customers with more complex capture capabilities. Both BPM vendors have RPA capabilities — Appian through a partnership with Blue Prism, Pega through their OpenSpan acquisition — and there’s a session following by RPA vendor UiPath on using ABBYY for RPA that likely has broader implications for working with these other partners.
For the solution builders who use ABBYY’s FlexiCapture, the connectors to these other products give them a fast path to implementation, although they can also use the ABBYY SDK directly to create solutions that include competing products. We saw a bit about each of the ABBYY connectors to the five strategic partners, and how they take advantage of those platforms’ capabilities: with Appian, for example, a capture operator uses FlexiCapture to scan/import and verify documents, then the connector maps the structured data directly into Appian’s data objects (records), whereas for one of the content management platforms, they may transfer a smaller subset of document indexing data. The Acumatica integration is a bit different, in that FlexiCapture isn’t seen as a separate application for the capture front end, but it’s embedded within the Acumatica interface as an invoice capture service.
ABBYY’s plan is to create more of these connectors, making it easier for their VARs and solution partners (who are the primary attendees at this conference) to quickly build solutions with ABBYY and a variety of platforms.
Dimitry Chubanov and Derek Gerber presented at the ABBYY Technology Summit on ABBYY’s mobile real-time recognition (RTR), which allows for recognition directly on a mobile device, rather than just capturing content to pass on to a back-end recognition server. Mobile data capture comes in two basic flavors: first, the mobile user is just entering data, such as an account number or password; and second, the mobile user is entering both data and image, such as personal data and a copy of their ID.
ABBYY RTR isn’t based on taking a photo and then running recognition on that image; instead, it uses several frames of image from the camera preview stream and runs recognition algorithms on the stream without having to capture an image. This provides a better user experience since the recognition results are immediate and they don’t have to type the data manually, and better privacy since no image is captured to the phone or passed to any other device or server. They demonstrated this using a sample app on an iPhone; it’s interesting to see the results changing slightly as the phone moves around, since the recognition is happening using the previous several frames of video data, and it gradually gains recognition confidence after a few seconds of video. We saw recognition of unstructured paragraphs of text, drivers licenses, passports and bank cards. The SDK ships with a lot of predefined document types, or you can create your own by training for specific fields using location and regular expressions. They are also offering the ability to capture meter data, such as electricity meters, although some of this requirement is being replaced by smart meters and other IoT advances.
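I don't know how ABBYY implements this internally, but the behaviour we saw, with results settling down over the last several frames, is consistent with something like a sliding-window vote. The sketch below is my own guess at the general idea, not the RTR SDK: the class name, window size and vote threshold are all hypothetical, and the per-frame text is assumed to come from some recognizer that isn't shown. It also uses a regular expression to define the field, in the spirit of the custom document types.

```python
import re
from collections import Counter, deque


class FrameVoter:
    """Accept a field value once enough recent frames agree on it."""

    def __init__(self, pattern, window=5, min_votes=3):
        self.pattern = re.compile(pattern)
        self.recent = deque(maxlen=window)  # results from the last N frames
        self.min_votes = min_votes

    def add_frame(self, raw_text):
        """Record one frame's text; return a value only when it is stable."""
        match = self.pattern.search(raw_text)
        self.recent.append(match.group(0) if match else None)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough video yet
        votes = Counter(v for v in self.recent if v is not None)
        if not votes:
            return None
        value, count = votes.most_common(1)[0]
        return value if count >= self.min_votes else None


# Hypothetical per-frame results for a bank card, two of them misread.
frames = [
    "4111 1111 1111 1111",
    "4111 1111 1111 1l11",  # a digit misread as a letter, so no match
    "4111 1111 1111 1111",
    "4111 1111 1111 7111",  # a wrong digit, so a competing candidate
    "4111 1111 1111 1111",
]
voter = FrameVoter(r"\b\d{4} \d{4} \d{4} \d{4}\b")
results = [voter.add_frame(text) for text in frames]
```

Only the final frame yields a value here, because that is the first point at which the window is full and three of the five frames agree. No image is stored at any point, only the short list of recent text results.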
They also have a mobile imaging SDK that can capture an image when it’s needed — for proof of ID, for example — with scene stabilization, document edge detection, deskewing and various types of image enhancement to capture the optimal photo for downstream storage and processing.
I can imagine, for example, a mobile airline app that needs to capture your passport information using mobile RTR to grab the data directly rather than having you type it in. I’ve also seen something very similar used to capture the unique number from an iTunes gift card directly into the App Store on an iPhone. Just like QR code reading is now built right into the search bar on the mobile versions of Google Chrome, and Google Translate on mobile allows real-time capture of text using the same camera preview mode (plus simultaneous translation), being able to capture text from a printed source instead of requiring a mobile user to type it in is likely to become ubiquitous in mobile apps.
Back in the SDK track at ABBYY Technology Summit, I attended a session on “robotic information capture” with FlexiCapture Engine 12, with lead product manager Andrew Zyuzin and director of product marketing Semyon Sergunin showing some of the automated classification and data extraction capabilities powered by machine learning. Traditional enterprise capture uses manually-created rules for classification and data extraction to set up for automated capture: a time-consuming training process up front in order to maximize recognition rates. At the other end of the spectrum, robotic process automation uses machine learning to analyze user actions, and create classification and extraction algorithms that can be run by robots to replace human operators. In the Goldilocks middle, they position robotic information capture as a blending of these two ideas: the system is pre-trained and processes standard documents out of the box, then uses machine learning to enhance the recognition for non-standard documents by analyzing how human operators handle the exceptions. Although I’m not completely aligned with their use of the term robotic process automation since RPA is not completely synonymous with machine learning and also isn’t limited to capture applications, I understand why they’re positioning their ML-assisted capture as robotic information capture as a middle ground between traditional capture and ML-assisted RPA.
We saw a demo of this with invoice capture: a PDF invoice was processed through their standard invoice recognition, detecting vendor name and invoice number, but the wrong number was picked up for the total amount due to the location of the field. This was corrected by a user in the verification client, and the information of where to find the total was analyzed for retraining and fed back to the recognition model. The user doesn’t know that they’re actually training the system — there’s no explicit training mode — but it just happens automatically in the background for continuous improvement of the recognition rates, gradually reducing the amount of manual verification. After the training was fed back, we saw another invoice from the same vendor processed, with the invoice total field properly detected.
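Here's a deliberately simplified sketch of that correct-once, learn-for-next-time loop. It is not how FlexiCapture does it: I'm assuming invoices arrive as plain "label: value" lines and that the thing to be learned per vendor is which label carries the total, whereas the real product learns from field locations on the page. The class and method names are hypothetical.

```python
class TotalFinder:
    """Finds an invoice total, learning per-vendor hints from corrections."""

    DEFAULT_LABEL = "total"

    def __init__(self):
        self.learned_labels = {}  # vendor -> label that holds the total

    @staticmethod
    def _pairs(text):
        """Yield (label, value) for each 'label: value' line."""
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if sep:
                yield label.strip().lower(), value.strip()

    def extract(self, vendor, text):
        """Return the value next to the label expected for this vendor."""
        wanted = self.learned_labels.get(vendor, self.DEFAULT_LABEL)
        for label, value in self._pairs(text):
            if label == wanted:
                return value
        return None

    def record_correction(self, vendor, text, corrected_value):
        """Called when a verification user fixes the total; no training mode."""
        for label, value in self._pairs(text):
            if value == corrected_value:
                self.learned_labels[vendor] = label
                return


finder = TotalFinder()
first = "Invoice: 1001\nTotal: 90.00\nAmount due: 99.00"
second = "Invoice: 1002\nTotal: 40.00\nAmount due: 44.00"

wrong = finder.extract("Acme", first)            # picks up the wrong number
finder.record_correction("Acme", first, "99.00")  # the user fixes it
right = finder.extract("Acme", second)           # next invoice, same vendor
```

The correction on the first invoice changes what gets extracted from the second, while other vendors keep the default behaviour.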
Although I think that most technology is pretty interesting, this is the first thing I’ve seen today that made me say “cool!”
Zyuzin also walked us through their advanced classification, which can classify documents without any development based on large data sets of typical document types such as invoices, cheques, and drivers licences; automatic classification is important as the front end to recognition so that the correct recognition techniques and templates can be applied. Their advanced classification uses both image and content classification, that is, determines what type of document it is based on how it looks as well as the available text content. He showed us a demo of processing a package of mortgage documents, where there is a large number of possible documents that can be submitted by a consumer as supporting documentation; most of the documents were properly classified, but a few were unrecognized and required a quick setup of a new document type to train the classifier. This was more of a manual training process, but once the new document class was created, it could be applied to other unrecognized documents in the package.