Table of contents:
- Why the change?
- What has changed?
- Platform-specific changes: iOS, Android, cross-platform
- How does all that affect me?
In the last two years, we have shifted from traditional text recognition to a deep-learning approach. Our research team designed a custom-made machine learning system for OCR and is continuously working on new state-of-the-art neural network models, while our development team makes sure that DeepOCR runs fast on mobile devices and requires only minimal memory. This enables high accuracy and speed for even the most complex use cases. DeepOCR technology is already implemented in our award-winning product BlinkReceipt and it also powers the recognition of handwritten problems in the Photomath app.
Microblink SDKs are used in a wide variety of use cases, from scanning of identity documents using BlinkID, payslips or invoices using PhotoPay, various predefined data using BlinkInput, to barcode- and QR code-scanning with PDF417. Now it's time to prepare all SDKs for the implementation of DeepOCR, but it's not as straightforward as one might think. Such a variety of use cases cannot be solved with a single DeepOCR model, and so support for using multiple models within an SDK is needed.
This is why we decided to change the licensing subsystem, make a backward-incompatible change to the API, and introduce a new license key format. The new API and the new license keys are necessary to carry all the information required to run DeepOCR and to support other new features that we plan to add in 2018.
The release of the new API provides some additional key benefits for developers:
- the integration of SDKs is easier and more flexible;
- the SDKs are now faster and smaller;
- the interaction of objects within the API involves much less overhead to the native library that does all the processing.
We understand that this type of major change requires additional development effort on integration so we will be available to help you at every stage of the development. Please don't hesitate to contact us for support.
Because the new license format is not backward compatible with the current one, and because we use semantic versioning for our SDKs, we need to raise the major version number of all our SDKs.
The new versions for the Microblink SDKs will be:
- PDF417.mobi SDK
  - for Android: version 7.0.0
  - for iOS: version 7.0.0
- BlinkInput SDK
  - for Android: version 4.0.0
  - for iOS: version 4.0.0
- BlinkID SDK
  - for Android: version 4.0.0
  - for iOS: version 4.0.0
- PhotoPay SDK
  - for Android: version 7.0.0
  - for iOS: version 7.0.0
As you may notice, we decided to increase the iOS versions by more than one version number. This is to reduce any risk of confusion and ensure that the same version number is used for both Android and iOS SDKs, as well as for the wrappers (PhoneGap, Xamarin, React Native).
Please note that the existing license keys will not work with the new SDK versions, but they will continue to work with the existing SDK versions. Conversely, the new license keys cannot be used with the old SDK versions.
In this section, we will describe all the changes in our SDKs, namely:
- the change in the license key formats, specifically the licensing subsystem and licensing API;
- the change in handling the recognizers and parsers;
- implementation of a new concept: the processor.
Because of the ever-increasing number of features that clients require from us, we decided that we needed a new license key format that would support these present and future demands. Technically, adding support for that required increasing the size of the binary layout of the license buffer, which meant that our license keys could no longer be formed from 8 groups of 8 alphanumeric characters.
Therefore, the new license keys are distributed in three different formats, and the client decides which one to use:
- as a file;
- as a base64-encoded string;
- as a raw buffer.
We recommend using the license key as a file, as it's the easiest way to manage multiple license keys (trial and production). Instead of having different license setup codes for your test and production app, you can now have the same code while using different license files within assets of your app.
We simplified the API for setting up the license key. Instead of having several different ways of setting up the same license key, especially in Android, there is now a unified way to set up the license key in both the Android and iOS SDKs.
For example, in Android, a class called [MicroblinkSDK] allows you to set the license key in three ways:
- as a path to the file within your assets folder;
- as a base64 string;
- as a raw buffer.
The choice is yours. A similar class exists in iOS and can be used similarly.
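As a rough illustration of why the three options are interchangeable, here is a minimal Java sketch. The class LicenseSketch and its method names are hypothetical stand-ins, not the actual [MicroblinkSDK] signatures; the point is only that a file, a base64 string, and a raw buffer all end up as the same bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical stand-in for an SDK entry point; NOT the real MicroblinkSDK class.
final class LicenseSketch {
    private static byte[] license; // the raw license buffer, however it was supplied

    private LicenseSketch() {}

    // Option 1: read the license from a file (for example, one bundled with the app).
    static void setLicenseFile(Path file) throws IOException {
        setLicenseBuffer(Files.readAllBytes(file));
    }

    // Option 2: decode a base64 string into the raw buffer.
    static void setLicenseKey(String base64Key) {
        setLicenseBuffer(Base64.getDecoder().decode(base64Key));
    }

    // Option 3: accept the raw buffer directly.
    static void setLicenseBuffer(byte[] buffer) {
        if (buffer == null || buffer.length == 0) {
            throw new IllegalArgumentException("empty license");
        }
        license = buffer.clone();
    }

    // Returns a copy of the currently configured license, or null if none is set.
    static byte[] currentLicense() {
        return license == null ? null : license.clone();
    }
}
```

With a design like this, switching between a trial and a production license only means shipping a different file, while the setup code stays the same.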
One of the greatest problems with the licensing system in the old API arose when a developer set a license key that didn't allow usage of a specific recognizer and then activated that recognizer. The developer was informed about the licensing error at the point when the native library was starting up and after the camera had already been initialized. This information was delivered via asynchronous callback, which was difficult to handle and confusing for most developers. Sometimes developers would ignore the callback and then wonder why the scanning wasn't working.
With the new API, this is no longer possible. We expect a developer to set the license key as early as possible during the startup of an app. Whenever a specific recognizer, detector, processor, or parser that is not allowed by the license key is instantiated, an exception is thrown in Android, and an NSError is returned in iOS. Thus, it will be much more difficult for a developer to go into production with an invalid license key.
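The fail-fast behavior can be pictured with a small Java sketch. LicenseGate and BarcodeRecognizerSketch are hypothetical names invented for this illustration, and the exception type is an assumption; the sketch only shows the idea of checking the license in the constructor instead of reporting the problem later through an asynchronous callback.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a license that enables a fixed set of components.
final class LicenseGate {
    private static Set<String> allowed = Collections.emptySet();

    private LicenseGate() {}

    static void setAllowedComponents(Set<String> components) {
        allowed = new HashSet<>(components);
    }

    // Called from component constructors; throws immediately if not licensed.
    static void require(String component) {
        if (!allowed.contains(component)) {
            throw new IllegalStateException("License does not allow " + component);
        }
    }
}

// Hypothetical recognizer: the license check runs in the constructor,
// long before any camera or native library start-up.
final class BarcodeRecognizerSketch {
    BarcodeRecognizerSketch() {
        LicenseGate.require("BarcodeRecognizer");
    }
}
```

Because the error surfaces at the line that creates the object, it is hard to miss during development.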
Also, now we ensure that the trial license keys always inform a user when a trial version is being used so that the app testers can easily notice if the production version is using a trial license key.
All our existing clients are already familiar with the concept of a Recognizer. Some of them are also already familiar with the concepts of a Parser and Detector, which are available only within BlinkInput, BlinkID, and PhotoPay SDKs. However, in the new API, we're introducing a new concept: the Processor. To explain the processor concept, let's first review the concepts behind the recognizer, parser, and detector.
The recognizer has always been the main unit of recognition within Microblink SDKs. Basically, a recognizer is the most abstract object that serves a specific use case. For example, [BarcodeRecognizer] is an object that knows how to scan barcodes on images received from a camera, while [MRTDRecognizer] is an object that knows how to find a machine-readable zone of a travel document on a camera frame, performs OCR on that zone, and extracts relevant document information from it.
As you can see, a recognizer is quite a complex object with many responsibilities:
- It manages the detection of objects like barcodes, ID cards, payslips, and machine-readable zones.
- It performs image correction and the dewarping of detected objects.
- It performs optical character or barcode recognition.
- It intelligently questions the recognized data in order to produce the final result.
Recognizers are not new and have existed in all Microblink SDKs from the very first version of every SDK, but initially, they were internal objects. Developers could only interact with them by creating [RecognizerSettings] objects that configured the expected behavior of a specific recognizer. When recognition was finished, developers then needed to typecast the given [BaseRecognitionResult] to the specific [RecognitionResult] for the specific recognizer. This, however, proved rather confusing, as it was not always clear that the specific [RecognitionResult] could only be produced by the specific recognizer configured with the specific [RecognizerSettings].
Now, this process has been simplified. A developer simply needs to instantiate a specific recognizer object, configure it, and give it to the [RecognizerRunner] object, which will use it to perform the desired recognition. After the recognition is done, that same recognizer will internally contain its recognition result, which a developer can then obtain by calling the appropriate getter method.
This makes the recognizers long-lived stateful objects that live within an app and change their internal state while performing recognition. This is probably the biggest change developers will face when integrating the new version of Microblink SDK, but once used to it, it will be obvious that the recognition is much simpler to handle than it was before.
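The instantiate, configure, run, read-result flow can be sketched in Java as follows. All names here (RecognizerSketch, DigitsRecognizerSketch, RunnerSketch) are hypothetical, and a plain String stands in for a camera frame; this is a simplified model of the pattern, not the SDK's actual classes.

```java
import java.util.List;

// Hypothetical, simplified model of the new flow; not the SDK's real classes.
interface RecognizerSketch {
    void process(String frame); // mutates the recognizer's internal result
}

// Hypothetical recognizer that "recognizes" the digits in a frame.
final class DigitsRecognizerSketch implements RecognizerSketch {
    private int minLength = 1;   // configuration set by the developer
    private String result = "";  // the result lives inside the recognizer

    void setMinLength(int minLength) { this.minLength = minLength; }

    @Override
    public void process(String frame) {
        String digits = frame.replaceAll("[^0-9]", "");
        if (digits.length() >= minLength) {
            result = digits; // state changes only on a successful recognition
        }
    }

    String getResult() { return result; }
}

// Hypothetical runner: feeds each frame to every recognizer it was given.
final class RunnerSketch {
    private final List<RecognizerSketch> recognizers;

    RunnerSketch(List<RecognizerSketch> recognizers) { this.recognizers = recognizers; }

    void recognize(String frame) {
        for (RecognizerSketch r : recognizers) {
            r.process(frame);
        }
    }
}
```

Note that there is no separate settings object and no result to typecast: the developer keeps a reference to the recognizer and asks it for its own result.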
There is one special type of recognizer that is very flexible and configurable - it's called the Templating Recognizer. It is used as a part of the Templating API, which allows manually defining its behavior. To perform the detection of objects, a detector is required. Then, locations are used within that detection to identify any parts of the detected object that need perspective correction, and the settings for performing OCR on the corrected images are defined. Finally, the parsers that extract structured information from the OCR result are defined.
With the new API, we upgraded the flexibility of the Templating Recognizer and added a new processor concept that can be used within the Templating API. This is explained in more detail below.
A detector is an object that knows how to find a certain object in a camera image. BlinkID developers are likely familiar with [DocumentDetector], which can find cards and checks in images, and [MRTDDetector], which can find documents containing a machine-readable zone in images. Those two detectors will remain in the BlinkID and PhotoPay SDKs, while other detectors will be removed from the SDKs.
Previously, developers interacted with detectors in much the same way as with recognizers: they created a specific [DetectorSettings] object and associated it with a special recognizer called [DetectorRecognizer] by using the [DetectorRecognizerSettings] object. Then, during the operation of [DetectorRecognizer], after it had internally performed the detection and before continuing to the next step, it returned the concrete [DetectionResult] via [MetadataListener] (or the [didOutputMetadata] callback in iOS). This asymmetry was even more confusing than the case with recognizers, especially because the same callback could receive detection results from internal detectors within recognizer objects and no one actually knew where these results were coming from.
In the new API, a developer will simply create a specific detector and associate it with [DetectorRecognizer] directly. After [DetectorRecognizer] internally performs detection using the specified detector, its detection results will remain saved within the specific detector and will be available to the developer via the provided getter method - in the same way as the recognizer's result is available via the specific recognizer's getter method.
Using detectors will now be the same as using recognizers, which we believe will make things a lot easier for developers.
Parsers are objects that can extract structured data from the raw OCR result. BlinkInput, BlinkID, and PhotoPay developers will already be familiar with the concept, especially when using the field-by-field scanning feature. With the field-by-field scanning feature, each parser tries to extract specific information from the OCR result obtained by performing OCR over a small area of a camera frame in the user interface.
In previous versions of SDKs, the parsers always produced their results as strings. This proved confusing for some use cases, like date parsing, where the date parser would return the string as returned from the OCR engine and, although it internally knew which part of the date was the day, which part was the month, and which part was the year, it had no way to communicate that back to the developer.
Moreover, in order to obtain the specific parser result, the developer had to know the exact name of the parser and the exact name of the parser group where a parser was placed. To make things even more confusing, when using [BlinkInputRecognizer] for performing field-by-field scanning, it was possible to use multiple parser groups over a single image, while when using [DetectorRecognizer] or [MRTDRecognizer] (i.e., Templating Recognizers), the name of the parser group was actually the name of the location within the detected location of the document and there was always a single parser group for each decoding location.
Has that confused you? I bet it has! To address this issue, we really thought hard and long about how to make this concept easier to use, but without losing all the flexibility it provided. We love symmetry, so we thought that it would be a good idea to organize parsers in the same way as recognizers and detectors are organized. So, we did it.
The parser is now a stateful object, just like the recognizer or detector. Developers will create a specific parser and then associate it with [ParserGroupProcessor] (more on that later), which will be associated with either [BlinkInputRecognizer] (for the field-by-field scan) or with the Templating Recognizer. Then, after the parser extracts data from the OCR result, it will save the extraction result internally, and it will be available to the developer via the provided getter method, just like the recognizer provides its result via its own getter method.
This means that developers will no longer need to worry about assigning arbitrary strings to parser names and to then use those strings to later obtain parsed results from some obscure [BlinkInputRecognitionResult]; now, the parser's result will be available within the parser object.
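To make the date example concrete, here is a minimal Java sketch of a parser that keeps a structured result inside itself. DateParserSketch is a hypothetical class, the dd.MM.yyyy pattern is an assumption chosen for illustration, and the sketch does not validate value ranges.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical date parser that stores a structured result internally.
final class DateParserSketch {
    // Assumed input format for this sketch: dd.MM.yyyy
    private static final Pattern DATE = Pattern.compile("(\\d{2})\\.(\\d{2})\\.(\\d{4})");

    private int day;
    private int month;
    private int year;
    private boolean found;

    // Extracts the first matching date from the raw OCR text, if any.
    void parse(String ocrText) {
        Matcher m = DATE.matcher(ocrText);
        found = m.find();
        if (found) {
            day = Integer.parseInt(m.group(1));
            month = Integer.parseInt(m.group(2));
            year = Integer.parseInt(m.group(3));
        }
    }

    boolean hasResult() { return found; }
    int getDay() { return day; }
    int getMonth() { return month; }
    int getYear() { return year; }
}
```

The developer no longer looks up a string by parser name and group name; the parser object itself answers which part of the text was the day, the month, and the year.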
Some might ask: "What about parser groups? Where did they disappear to?"
In the above story about parsers, you probably noticed that in the old API, parsers were grouped into parser groups, where every parser within the same group would perform extraction of the same OCR result calculated for the entire parser group. You also probably noticed the discrepancy between the field-by-field scan and the Templating API, where you could use multiple parser groups on the same image in the field-by-field scan, but only a single parser group on the dewarped image within Templating Recognizer.
We asked ourselves: "How can we avoid that discrepancy and also provide more flexibility within the Templating API?" and, for example, "How can we ensure that recognition performed with the Templating API is not reported as complete if the part of the document that should contain a person's face does not contain one?" We knew we needed something like a parser, but one that does not work with the OCR result. Instead, it should work with the image, just like a recognizer, yet be usable within the Templating Recognizer. That led us to the Processor.
A processor is an object that can perform recognition on an image. Unlike the recognizer, the processor cannot be used alone - it must be used within the Templating API. The above-mentioned [ParserGroupProcessor] is a special processor that takes the role of the old API's parser group: it performs OCR on a given image using the same rules as the parser group used in the old API, and then runs every parser bundled within it to extract data from the OCR result. If a developer needs a dewarped image, [ImageReturnProcessor] can be used to simply save the image that was provided to it. In future releases, we plan to add lots of new processors for various use cases.
And the architecture of the processor object is the same as the architecture of the recognizer, parser, and detector. A developer will create the processor and associate it with Templating Recognizer. After the recognition is finished, the developer will obtain the result from the processor.
If you were familiar with our Templating API you might now ask: "Where are the classifiers? How do we define decoding locations?"
Well, decoding locations are now defined within [ProcessorGroup], which contains:
- one or more processors;
- a location of interest within a document;
- instruction on how to perform image correction and dewarping.
Templating Recognizer uses the chosen instruction to perform image correction and dewarping of the desired location and then runs processors within the given processor group on the corrected image.
What about classifiers?
We changed those too. In the old API, a developer had to define a single document classifier that needed to provide a classification of the document based on the parser results obtained in the pre-classification stage of Templating Recognizer's processing to continue processing with the document-specific parsers. Yes, we know that was a complex sentence, but it describes the very complex process that developers had to follow in order to use Templating API to recognize the custom document correctly.
Now, to provide a better abstraction, we created Class, which is an object containing two collections of processor groups and a classifier. The two collections of processor groups within Class are:
- the classification processor group collection;
- the non-classification processor group collection.
This process goes as follows:
- First, all processor groups within the classification collection perform processing.
- Then, the classifier decides whether the object being recognized belongs to the current class and, if it decides so, the processor groups within the non-classification collection perform processing.
Finally, Templating Recognizer simply contains one or more Class objects.
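The two-stage flow described above can be sketched in Java. The names ProcessorSketch, KeywordProcessorSketch, and ClassSketch are hypothetical; a String stands in for the document image, and each processor group is reduced to a plain processor (location and dewarping instructions are left out) to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical processor: does some work on a stand-in "image" (a String here).
interface ProcessorSketch {
    void process(String image);
}

// Hypothetical processor that records whether a keyword appears in the image.
final class KeywordProcessorSketch implements ProcessorSketch {
    private final String keyword;
    private boolean found;

    KeywordProcessorSketch(String keyword) { this.keyword = keyword; }

    @Override
    public void process(String image) { found = image.contains(keyword); }

    boolean found() { return found; }
}

// Hypothetical model of a Class: two collections of processors plus a classifier.
final class ClassSketch {
    private final List<ProcessorSketch> classification = new ArrayList<>();
    private final List<ProcessorSketch> nonClassification = new ArrayList<>();
    private final BooleanSupplier classifier;

    ClassSketch(BooleanSupplier classifier) { this.classifier = classifier; }

    void addClassificationProcessor(ProcessorSketch p) { classification.add(p); }
    void addNonClassificationProcessor(ProcessorSketch p) { nonClassification.add(p); }

    // Returns true if the document was accepted as belonging to this class.
    boolean run(String image) {
        for (ProcessorSketch p : classification) {
            p.process(image);              // stage 1: gather data for the classifier
        }
        if (!classifier.getAsBoolean()) {
            return false;                  // not this class: skip the remaining work
        }
        for (ProcessorSketch p : nonClassification) {
            p.process(image);              // stage 2: class-specific processing
        }
        return true;
    }
}
```

In this model the classifier typically inspects the results stored in the classification processors, which is why it can be expressed as a simple yes/no function with no arguments.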
OK, you have lost me back at the recognizer. Do I need to use this Templating API?
In the most common cases, the Templating API is not used. The Templating API is a very flexible API that can be used to perform the recognition of almost any document, and with the new release, it has become even more flexible than it was in the old API. However, flexibility comes with increased complexity and we are aware of that.
If we simplify it too much, then developers will not be able to add support for scanning custom documents, such as loyalty cards, or will be very constrained about what they can do. The Templating API would then not be flexible enough for many practical use cases, and that would make our SDKs useless for those who want to add support for documents by themselves. Adding lots of flexibility makes Templating API very complex, but also very powerful.
Hence, we decided to make Templating API flexible and powerful, at the cost of it being more complex. The Templating API has always been and will always be a tool for the more advanced developers - typically those specialized in Microblink's technology.
The changes described above apply to all platforms. However, there are some additional changes to mention that are specific to Android and iOS SDKs.
A big problem with the old API was that the same concepts had different names in the Android and iOS SDKs. This was a problem when a developer became familiar with the Android documentation but then needed to port their code to iOS. Code porting was not so straightforward, as some recognizers and UI elements had different names and even some basic API objects were named completely differently (for example, [PPCameraCoordinator] in iOS was the same as [RecognizerView] in Android - but who knew that without asking our support engineers?).
The new API, however, has unified naming across platforms. The only differences in names now are those due to a specific platform's naming conventions; for instance, the DirectAPI singleton will now be called [RecognizerRunner] in Android and [MBRecognizerRunner] in iOS. Similarly, in iOS, there is now [MBRecognizerRunnerViewController], and in Android, there's [RecognizerRunnerView] and [RecognizerRunnerFragment]. In the same way, other components will have similar, if not the same, names, as you will see from the new and updated documentation accompanying each SDK release.
In the new API, besides the scanned text, the results (in the recognizer and processor) can also contain images. This is especially important for BlinkID SDK. Now, it will be much easier to obtain images of the documents as well as faces and images of signatures from the documents. Those images will no longer be sent to an image callback. Instead, images will now be part of the specific recognizer's result, just like the extracted OCR data is.
Note for Android
In order to support this, we needed to change the way recognizer objects are passed between activities via Intent. The problem is that Android has very strict limits on the size of data transferred via Intent, so it is not possible to transfer images that way. You can find details about this in the documentation and troubleshooting part of the new README. Also, make sure to check the updated sample integration apps to see any changes.
Specifically for iOS, there are several notable changes to mention.
- Since the recognizer object is now a stateful object that gets mutated while it performs the recognition, we needed to change the way results are delivered via the delegate. Previously, that happened in [didOutputResults], a method that was always called on the main thread. Now, it happens in [didFinishScanning], a method that is always called on the background processing thread. The reason is that while this method is running, recognition cannot continue, since the processing thread is busy executing the callback. This gives you the opportunity to pause scanning while still on the processing thread, which prevents the recognizer's result from being changed by new camera frames that arrive while the block is being dispatched from the processing thread to the main thread.
- There is no longer a [didOutputMetadata] delegate method. Instead, there is a separate delegate for every metadata item that can be obtained during processing. In this way, it is now clearer which methods need to be implemented if specific metadata needs to be obtained.
- The segment scan overlay is renamed to [MBFieldByFieldOverlayViewController] and now will be part of the SDK. This means that integrating the field-by-field scanning feature into your app will be much easier, as you will not be required to copy lots of code from our sample app into your app to get that behavior. Using field-by-field overlay will now be as simple as using any other scanning overlay.
For more details about the iOS changes, you should always check the updated sample integration apps and documentation.
Specifically for Android, there are two significant changes.
First, just like in iOS, we removed [MetadataListener] and introduced separate callbacks for every metadata item that can be obtained during processing. This makes it much easier to manage events reported by the recognition process.
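The idea of one callback per metadata item, instead of a single catch-all listener, can be sketched in Java like this. MetadataCallbacksSketch and its two event kinds are hypothetical and chosen only for illustration; the actual callback names are listed in the SDK documentation.

```java
import java.util.function.Consumer;

// Hypothetical holder with one optional callback per kind of metadata.
final class MetadataCallbacksSketch {
    private Consumer<String> detectionCallback; // e.g. where an object was found
    private Consumer<String> debugTextCallback; // e.g. diagnostic messages

    void setDetectionCallback(Consumer<String> callback) { detectionCallback = callback; }
    void setDebugTextCallback(Consumer<String> callback) { debugTextCallback = callback; }

    // Called by the (stand-in) recognition process; events without a
    // registered callback are simply dropped.
    void reportDetection(String location) {
        if (detectionCallback != null) {
            detectionCallback.accept(location);
        }
    }

    void reportDebugText(String text) {
        if (debugTextCallback != null) {
            debugTextCallback.accept(text);
        }
    }
}
```

A developer registers only the callbacks they care about, so there is no need to inspect the type of an incoming metadata object to find out what it is.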
Second, we introduced the [RecognizerRunnerFragment] for more flexible integration of the built-in UI.
One of the questions developers often asked us was how they could embed a built-in UI into their application's UI. Unfortunately, with the old API that was not possible. Developers could either use our built-in activities or use [RecognizerView] to create their own scanning UI from scratch. Creating a custom UI from scratch was too much effort for some, and yet they needed our scanning UI within their layout. This usually resulted in developers using our built-in activities instead of presenting a scanning interface as they originally intended, or in a poorly integrated [RecognizerView] that caused weird bugs and crashes in the final app.
Therefore, in the new API, the [RecognizerRunnerFragment] is introduced. We created a fragment that controls the [RecognizerRunnerView] and can be skinned with different built-in overlays. Furthermore, every built-in activity is now actually implemented in a way that it presents the [RecognizerRunnerFragment] in full screen and adds a specific overlay to it. This is very similar to what occurs in iOS integration. Now developers are given a way to simply present our built-in scanning UI somewhere within their application layout, without forcing them to navigate away to a new activity.
When using [RecognizerRunnerFragment] or [RecognizerRunnerView], notification that scanning was completed will be obtained via [ScanResultListener], just like before when using [RecognizerView] in the old API. However, there are some differences in behavior. Most notably, just like in iOS, [ScanResultListener] method [onScanningDone] is no longer invoked in the UI thread. Instead, it is invoked in the background processing thread to give the opportunity to pause scanning and to prevent changing of the recognizer's result object while the runnable block is being dispatched from the processing thread to the UI thread.
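The reason for delivering the callback on the processing thread can be illustrated with a small Java sketch. The classes below are hypothetical and model only the threading pattern: pause scanning first, copy the result while nothing can change it, and only then hand the copy over to the UI thread. An Executor stands in for the UI thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical mutable recognizer result that later frames may overwrite.
final class MutableResultSketch {
    private String text = "";

    synchronized void set(String text) { this.text = text; }
    synchronized String get() { return text; }
}

// Hypothetical scan callback; not the SDK's ScanResultListener.
final class ScanCallbackSketch {
    private final MutableResultSketch result;
    private final Executor uiThread;
    private final AtomicBoolean paused = new AtomicBoolean(false);

    // Completed on the "UI thread" with the snapshot taken at scan completion.
    final CompletableFuture<String> delivered = new CompletableFuture<>();

    ScanCallbackSketch(MutableResultSketch result, Executor uiThread) {
        this.result = result;
        this.uiThread = uiThread;
    }

    // Stand-in for the processing thread handling a new camera frame.
    void onFrame(String recognizedText) {
        if (paused.get()) {
            return; // paused: leave the result untouched
        }
        result.set(recognizedText);
    }

    // Invoked on the processing thread when scanning is done.
    void onScanningDone() {
        paused.set(true);                                     // 1. stop further mutation
        String snapshot = result.get();                       // 2. copy the stable result
        uiThread.execute(() -> delivered.complete(snapshot)); // 3. hand off to the UI
    }
}
```

If the pause happened only after the hand-off reached the UI thread, a frame processed in between could overwrite the result the user is about to see.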
For more details about the changes in Android, you should check the updated sample integration apps and documentation.
Although we initially planned to change the API only for the native Android and iOS SDKs, while keeping the Cordova/PhoneGap, Xamarin, and React Native wrappers untouched, we quickly saw that doing so would keep the wrappers out of date with the native SDKs.
Therefore, we sought a way to automate the creation of these wrappers, and we found it. This process required some changes in wrapper APIs.
We already released BlinkID v4.0.0 for PhoneGap, React Native, and Xamarin with the new API.
The great news is that we are now able to support these wrappers almost as quickly as the native SDKs: same-day support for PhoneGap and React Native, and next-day support for Xamarin. In other words, as soon as we release new versions of our SDKs for iOS and Android, we will automatically add support for them in the PhoneGap, React Native, and Xamarin wrappers.
To be clear, the update to the new SDK will not work straight out of the box. You will need to adapt your application to the new API. This means that you will need to get new license keys for all your applications and change the integration code. Depending on the complexity of your app, this may take from a couple of minutes to a couple of weeks, so make sure you are prepared to do the work.
But fear not! If you used the most basic level of integration, there's only a small set of changes that you will need to apply to your app. However, if you created a custom scanning UI using Microblink SDK or if you used the Templating API to add support for scanning some custom document types, then you will need to be prepared to make some more substantial changes to your codebase, which may take up to a couple of weeks. But rest assured, we want the upgrade process to be as easy as possible for you, so don't hesitate to ask our engineering teams if you need help.
To help you plan the changes in your applications, we are announcing the SDK release schedule below.
PDF417 Android SDK was released in January along with detailed documentation, followed by the PDF417 iOS SDK release in March. These SDKs hopefully served to give you a glimpse of the new Recognizer architecture and a chance to test the new license key formats.
We released BlinkInput SDK with the new API for both Android and iOS in May. This release gives you the opportunity to play with the new Templating API, the new Field-by-Field scan, and the new Parser and Processor architectures. It also contains a preview version of our next-generation DeepOCR engine. Trying DeepOCR in your experiments is optional, and we would welcome your feedback on how we can improve it.
We released the new BlinkID SDK in June, and we are planning to release PhotoPay SDK sometime during July. However, the release date will depend on the number of issues we receive from developers who have tried the new API in BlinkInput or BlinkID.
After we make sure the new API works flawlessly, we will continue porting the BlinkID and PhotoPay SDKs to the new API. We strongly encourage you to try the new API and send us your feedback - it is greatly appreciated.
Ultimately, we truly hope that you will enjoy using our products with the new API as much as we enjoyed creating it for you.
For feedback and help with integration, contact us at help.microblink.com.