In light of our latest BlinkID v5 release, we would like to explain just how big a deal this is for us as a machine learning company. Moreover, we will show you the difference it will make for you, our trusted clients and users.
ID Data Extraction: What's the Big Deal?
Automated data extraction from identity documents is a challenging task. To help explain why, let’s first think of how we do it as humans.
For humans, data extraction essentially means reading. With some practice, our sight allows us to read exceptionally well even in hard conditions (for example, when the text is incomplete or faded, or the lighting is poor).
Engineers have been developing OCR systems to match the human level of reading performance for a long time and have succeeded in doing so for some specific types of text. A good example is the DeepOCR system we initially built into BlinkID, which was able to read all ID text accurately and with high confidence. However, is this enough to automate the task of ID data extraction?
Unfortunately, no. In reality, the biggest problem is not reading the text, but understanding which ID information is contained in which text line. If you were asked to read out all the info on your ID, you could easily find and read your Name, Surname, Address, Date of expiry, or Document number, simply because you already know the context. It’s hard to appreciate the complexity of this task when you use your own ID as a reference, so let’s try a different example.
Can you find the name, surname, address, and document number on this ID image?
It’s hard because you don’t have any previous knowledge about the document. None of the words are familiar, making them impossible to classify with certainty. The two numbers have no additional information about their meaning either, so there’s no way to determine which one represents the document number. IDs are meant to be human-readable, but even people can struggle if they don’t understand the language or aren’t familiar with a particular ID type.
Building Our Document Expertise
The first step towards automating the task of ID data extraction is building human knowledge to understand all the different ID types. There are roughly 5000 different types of identity documents in the world, so this isn’t a trivial task. At Microblink, we have a dedicated team of document experts working in a custom-built system designed to make the process scalable and able to cover the entire volume of the ID space.
Once we set up a solid system for understanding various ID types, the main question is how to extract the ID data intelligently. Let’s start with the old-school approach: templates.
How We Used to Do It
Since all IDs are printed according to a predefined layout, the ID data extraction process seems pretty straightforward. Developers simply translate the knowledge from document experts into so-called templates - bundles of coordinates that indicate the locations of the relevant ID data. Implementation is easy: developers only need to load the appropriate template for an ID type and read out the text lines from the predefined locations.
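To make the idea concrete, here is a minimal sketch of template-based extraction in Python. It is an illustration only, not BlinkID's actual code: the `Rect`, `TextLine`, `TEMPLATE`, and `extract_fields` names, the normalized coordinates, and the sample OCR output are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; coordinates are normalized to the card size."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


@dataclass(frozen=True)
class TextLine:
    """One line of text returned by an OCR step, with its bounding box."""
    text: str
    box: Rect

    def center(self) -> tuple:
        return (self.box.x + self.box.w / 2, self.box.y + self.box.h / 2)


# Hypothetical template for a single ID type: field name -> where it is printed.
TEMPLATE = {
    "surname": Rect(0.35, 0.20, 0.60, 0.08),
    "name": Rect(0.35, 0.30, 0.60, 0.08),
    "document_number": Rect(0.35, 0.80, 0.40, 0.08),
}


def extract_fields(lines, template):
    """Assign each OCR line to the template field whose rectangle contains its center."""
    result = {}
    for field, region in template.items():
        hits = [ln for ln in lines if region.contains(*ln.center())]
        hits.sort(key=lambda ln: (ln.box.y, ln.box.x))  # reading order
        result[field] = " ".join(ln.text for ln in hits)
    return result


# Made-up OCR output for one card of this ID type.
ocr_lines = [
    TextLine("DOE", Rect(0.36, 0.21, 0.20, 0.05)),
    TextLine("JANE", Rect(0.36, 0.31, 0.22, 0.05)),
    TextLine("X1234567", Rect(0.36, 0.81, 0.30, 0.05)),
]
fields = extract_fields(ocr_lines, TEMPLATE)
```

In this sketch, supporting another simple ID type would only mean adding another dictionary of rectangles, which is what makes the approach so attractive at first.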
Because it’s such a practical approach and it does the job well, this is the de facto standard in data extraction from relatively simple ID types. We have been setting this standard combined with great UX and security for years.
So, what’s wrong with this approach?
We built the first BlinkID using templates back in 2015, and in the years since we’ve found that templating has serious limitations. Let’s take a look at a few examples.
The first example presents a very common problem when using templates. The actual text of a certain type of ID information (e.g. Address) varies in length and number of rows depending on the ID owner’s information. A template for this ID type must be defined in such a way that it always returns all the Address information for every ID owner. In the templating approach, this is done by finding the rectangle that encompasses all the possible Address text variations. This is where it gets tricky. Very often, these rectangles cannot be defined without including text from other ID information fields. In those cases, engineers have to write complex code to find the actual ID information (in this example, the Address).
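The problem can be sketched in a few lines of Python. The rectangles below are invented for illustration and do not come from a real ID type; `Rect`, `address_samples`, and `neighbouring_field` are hypothetical names. The sketch builds the smallest rectangle that covers every observed Address variation and then checks whether it runs into a neighbouring field.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by its edges, normalized to the card size."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle that covers both rectangles."""
        return Rect(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )

    def overlaps(self, other: "Rect") -> bool:
        return (
            self.left < other.right and other.left < self.right
            and self.top < other.bottom and other.top < self.bottom
        )


# Hypothetical Address boxes observed on three cards of the same ID type.
address_samples = [
    Rect(0.35, 0.50, 0.70, 0.56),  # one short line
    Rect(0.35, 0.50, 0.95, 0.56),  # one long line
    Rect(0.35, 0.50, 0.90, 0.68),  # three lines
]

# Hypothetical region where a neighbouring field is printed on cards
# whose Address takes up only one line.
neighbouring_field = Rect(0.35, 0.63, 0.60, 0.69)

# The template rectangle must cover every Address variation.
address_region = reduce(Rect.union, address_samples)

# If this is True, the Address rectangle will also pick up the other field's text.
collides = address_region.overlaps(neighbouring_field)
```

Once `collides` is true, a template alone can no longer tell which text belongs to the Address, and the extra disambiguation logic has to be written by hand for that ID type.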
The second example presents a more complex problem common in some of the hardest ID types. Depending on the ID owner’s information, the template rectangles of two different ID information types can completely overlap. In this case, one ID owner has two first names and one last name, while the other ID owner has one first name and two last names. These two ID information fields are printed one after the other, making their separation with a single template impossible.
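A simplified way to see why no single template works here is to reduce the layout to a sequence of words and the template to a fixed split position between the first-name field and the last-name field. The names, the `split_at` helper, and this reduction are assumptions made purely for illustration.

```python
def split_at(tokens, k):
    """Treat the first k words as the first name(s) and the rest as the last name(s)."""
    return " ".join(tokens[:k]), " ".join(tokens[k:])


# Hypothetical cards: (words as printed, correct (first names, last names)).
cards = [
    (["ANA", "MARIA", "SILVA"], ("ANA MARIA", "SILVA")),    # two first names, one last name
    (["ANA", "SILVA", "SANTOS"], ("ANA", "SILVA SANTOS")),  # one first name, two last names
]

# Try every possible fixed boundary and record whether it is right for all cards.
works_for_all = {
    k: all(split_at(tokens, k) == truth for tokens, truth in cards)
    for k in range(1, 3)
}
```

Every fixed boundary is wrong for at least one of the two owners, so the correct split has to come from understanding the content rather than from its position.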
Setting New Standards with AI
Our goal is to move forward and develop fully automated data extraction for all IDs in the world. We realized that relying on templates would drive our development process towards a dead end. Sooner or later, we would come across an ID type that couldn’t be templated without our engineering team being forced to write complex code to extract its data. This would be a serious issue, as scaling up support for IDs would depend entirely on the engineers’ effort. For comparison, in our BlinkID v5 release we have added support for a large number of ID types that would have taken our engineering team several years of development with the old templating approach. To make this possible, it was crucial to build a system that would scale up with new knowledge from our document experts. Since we knew that knowledge could be structured as data, we decided to build a machine learning system.
When switching to an ML-based system, there are two major concerns that need to be addressed:
- how much data is required, and
- how fast the system can run.
Both are valid concerns and have been our main challenges in building this brand new system.
Hunger for Data
ML models replace explicit code logic with functions learned from data. The data has to represent the entire problem with all its particulars, which means lots of data is needed. The complexity of ID data extraction can range from very simple ID types (see the first image in this post) to extremely complex ID types (challenging even for human document experts). Our goal was to build a system that would cover this whole range of complexity while being extremely data efficient. After years of dedicated work by our researchers, engineers and document experts, we can proudly say that we have finally succeeded in doing so. Our system now requires very little data when tackling even the most challenging ID types.
Need for Speed
Speed is imperative in all our products. Developing real-time mobile vision technology means all our solutions run in less than a second on a standard smartphone.
We’ve already set the bar high with the Photomath app, which uses our DeepOCR system to perform recognition of handwritten math equations in an instant. To our knowledge, this was the world's first entirely learned OCR model running in real time on mobile devices back in 2016. We succeeded by having our teams work on two main challenges:
- designing an optimal ML model suited for the task (it’s a tiny model), and
- developing our very own ML inference system that surpasses the speed of any existing software on the market (we later called this system the Runner).
However, ID data extraction is a different problem. On the one hand, it’s far more complex and goes beyond OCR. On the other hand, the speed requirements remain the same: real-time performance on a standard smartphone. So, building the new BlinkID v5 challenged us both in the design of our ML models and the further development of the Runner. Again, with a joint effort by all our teams, we managed to fulfill the set conditions and provide the world with what we believe is the first entirely learned real-time ID data extractor.
What Does This Mean for You?
As our first entirely ML-powered product, BlinkID v5 enables new levels of scalability and efficiency in our development process. More importantly, adding support for new ID types is now significantly faster, and we’re able to respond to clients’ needs much more quickly than before. We’re going full steam ahead towards our goal of tackling even the hardest data extraction problems and ultimately supporting all ID types in the world. What’s more, we have also introduced a new, redesigned UX, which will make the scanning process easier and faster for end users than ever before.