Microblink is dedicated to developing highly optimized computer-vision solutions for mobile devices, with an emphasis on optical character recognition (OCR). The main advantages of our technology, consistently praised by our clients, are extremely fast recognition, high accuracy, and the reliability of our algorithms. Over the past several years, we’ve completely shifted our development efforts toward machine learning and built DeepOCR: optical character recognition powered by an entirely learned, custom-made neural network. How did we do it?
Our OCR was quite advanced and provided reliable results even before we started working on deep learning. It was highly accurate on a variety of use-specific documents, and it was highly optimized for mobile devices, offering great speed with no server-side processing. While it yielded great results, a couple of problems emerged over time.
Firstly, it struggled with varying scanning environments and backgrounds, for example on ID documents, payment slips, or license plates. Secondly, our development process was too complex for more specific use cases and did not scale well. A lot of engineering hours went into handcrafting mathematical algorithms for an ever-growing number of variables in document and text recognition.
We weren’t able to tackle these issues with the existing approach, so we started researching machine learning and its potential for the development of our proprietary OCR.
A change of direction
The idea itself and the first demonstration of the new paradigm came out of one of our engineers’ Master’s thesis, but pulling off this change successfully required understanding and buy-in at every level of the company. We made a big decision: to put aside development of our existing mobile vision technology and invest almost all of our time and resources into something that was quite risky at the time. Luckily, it not only solved the issues we had back then but also created many new opportunities for more complex and specific use cases.
The idea is just a start
We’ve been at this for a couple of years, so take it from us: nothing comes easy. For any company to build technology based on machine learning, it takes significant changes in perspective, company management, and people development. We can boil our success story down to a few factors: huge amounts of data, process design, cooperation between teams, and extensive research.
There’s never too much data
In the traditional approach to algorithm development, the engineer makes all the decisions about the solution being implemented. In machine learning, these decisions are made automatically: the model learns them from training data. The success of such an algorithm therefore depends heavily on the quantity and quality of the data; the better the data covers the problem domain, the more successful the algorithm.
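To make the contrast concrete, here is a toy Python sketch. It is an illustration only, not Microblink’s method, and the data and the names `handcrafted_rule`, `count_errors`, and `learn_threshold` are hypothetical. The same binary decision (is a pixel ink or background, judged by its intensity?) is made once with a threshold an engineer picks by hand and once with a threshold chosen automatically from labeled samples.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled samples: (pixel intensity in [0, 1], label),
# where label 1 means "ink" and 0 means "background".
DATA: List[Tuple[float, int]] = [
    (0.2, 1), (0.4, 1), (0.55, 1),   # ink, some of it under bright lighting
    (0.65, 0), (0.8, 0), (0.9, 0),   # background
]


def handcrafted_rule(intensity: float) -> int:
    """Traditional approach: an engineer picks the threshold by hand."""
    return 1 if intensity < 0.5 else 0


def count_errors(rule: Callable[[float], int],
                 samples: List[Tuple[float, int]]) -> int:
    """Number of samples the rule misclassifies."""
    return sum(1 for x, y in samples if rule(x) != y)


def learn_threshold(samples: List[Tuple[float, int]]) -> float:
    """Learned approach: try each observed intensity as a threshold and keep
    the one with the fewest errors on the labeled data."""
    best_t, best_err = 0.0, len(samples) + 1
    for t in sorted({x for x, _ in samples}):
        err = count_errors(lambda x, t=t: 1 if x < t else 0, samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


if __name__ == "__main__":
    t = learn_threshold(DATA)
    print("hand-picked threshold errors:", count_errors(handcrafted_rule, DATA))
    print("learned threshold:", t)
    print("learned threshold errors:",
          count_errors(lambda x: 1 if x < t else 0, DATA))
```

On this toy data the hand-picked threshold misclassifies the 0.55 ink sample, while the learned threshold (0.65) fits all six samples. Real OCR models learn millions of parameters rather than a single threshold, but the principle carries over: the decision comes from the data, so its quality depends on how well the data covers the problem.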
It’s important to note how much “a lot” of data really is. In machine learning, ten thousand samples is not a big number; we’re talking millions. Collecting, storing, and managing such vast amounts of data is an enormous challenge, which is why Microblink has a dedicated team for that particular task.
Designing the learning system
Thanks to publicly available machine learning tools, developing prototype solutions is fairly simple. This can give the impression that machine learning is a plug-and-play approach to problem-solving (that is, that it doesn’t take much knowledge or effort to arrive at a good solution), but real-life use cases are far more demanding, especially given the limits on model complexity.
When developing such solutions, great attention needs to be paid to the learning process, the behavior of learned models, and the properties of the data. We’ve invested a lot of effort and resources into developing a learning system that takes all of these factors into account. Its main advantages are the ability to collect and manage huge amounts of data, fast training on our continuously growing pool of GPU servers, and detailed analysis of learned models with respect to precision and performance on mobile devices. Our ultimate goal? To make machine learning a standard tool across the whole company, so that it can be used outside our core development as well.
One of the problems with machine learning is that models which give good results are often too heavy to run on mobile devices, so most current industry solutions rely on some kind of server-side processing to deliver results as quickly as possible. Of course, such an approach has its drawbacks, such as requiring an Internet connection and being unable to run in real time. We started developing our own optimized software for running neural networks directly on mobile devices quite early on. Today a whole team works on it, and our models run twice as fast with it as they do on commonly used open-source implementations for running neural networks.
Cooperation and communication are key
Our successful introduction of machine learning into our business is the result of continuous academic research, strenuous optimization efforts, and, most of all, excellent communication between all employees, from top management to student interns. Our research team was not the only one challenged in this process: every new feature coming out of research had to be backed by a lot of work to implement it in our products and optimize it for mobile devices, which is why good communication is crucial!
Many of Microblink’s projects have been organized in cooperation with academic institutions, and we’re always looking for more partners in the field. Machine learning is quite different in practice than in theory, so the greatest value we offer students is industry experience. Our engineers regularly hold internal education sessions, as well as workshops that help students understand and work on real-life industry challenges.
We also need to stay on top of the latest advances in machine learning worldwide. We regularly attend some of the biggest conferences and events, such as NIPS and ICDAR, which give us a feel for the new ideas out there and what others are doing. So far, our impression is that we are keeping up well with the biggest names in the industry.
Patience you must have…
Finally, it all comes down to time. The learning process is incredibly demanding, and it never really ends. In some of our first trials, it took weeks to get the first good results (and a glimmer of hope that we were indeed heading in the right direction). It took us several years and a lot of trial and error to build the team we have today and the proprietary neural network we named DeepOCR.
We’ve talked a lot about the hard work involved; now here’s the proof. Applying deep learning to OCR has enabled a lot of exciting new features: automatic extraction of multiple features, scalability to complex images and documents, and accuracy of over 90 percent in some very challenging cases!
Our first big success in deep learning was the recognition of handwritten math expressions in Photomath. To our knowledge, we were the first in the world to develop an entirely learned OCR model that works in real time on a mobile device.
Our latest achievement pushes the boundaries even further by enabling full-screen receipt OCR with the same approach: entirely learned from data, with real-time performance on mobile devices. Easily integrated into any mobile app, BlinkReceipt is the best-in-class SDK for instantly capturing every detail from consumer retail receipts, down to the SKU level.
In the USA, BlinkReceipt already processes more than 100,000 receipt images daily. It gives businesses enormous value by significantly reducing the cost of collecting purchase data, which no longer has to rely on human transcription, and by improving the user experience with instant results.
There are even more exciting breakthroughs on the way: we are currently implementing DeepOCR in BlinkID, with other products to follow, which will let us support new kinds of documents faster and better. Stay tuned for updates!
If you’re feeling inspired to improve your business with advanced, user-friendly OCR solutions, feel free to explore our products and use cases, and contact us below if you have any questions! We’d love to discuss the opportunities DeepOCR could open up for you.