Our research team has come a long way in developing a proprietary, custom-built machine learning system for mobile OCR. That is in large part due to the expertise of our researchers, who continuously keep up with trends in text recognition and explore new ways to optimize our neural network architectures. One such opportunity was this year’s International Conference on Document Analysis and Recognition (ICDAR) in Kyoto, Japan. Our lead research engineer and our CTO recently attended the conference and gained great insight into the latest advances in text, document, and graphics recognition and analysis. They were especially interested in areas where deep learning plays an important role. Here is a short review of their experience.
The conference was split into two parts: 4 days of workshops and 3 days of lectures and poster sessions. Workshops are considered optional when attending the conference, while the main program is shaped by lectures. However, after the last day of ICDAR, it was quite clear to us that the workshops were extremely valuable. They offered many more face-to-face interactions with researchers, discussions after each lecture, and hands-on topics that are especially interesting for our industry-specific solutions.
Problem complexity and data-driven research
One particular area where deep learning seems to have made significant progress is scene text recognition. The most advanced neural network architectures at the conference were applied to problems in this area. There are two main reasons for this, in our opinion. First, we believe that creativity and innovation in problem solving are driven by the complexity of the problem, and scene text recognition is a very complex problem. In our own experience, the complexity of problems such as performing OCR on handwritten math expressions on device in real time was something that really pushed us forward in our research. The second reason is closely related to the emergence of deep learning as a standard tool in computer vision. To utilize the power of these methods, large amounts of annotated data are needed. There were problems at ICDAR harder than scene text recognition, but there simply wasn’t enough data to build effective solutions or draw conclusions.
Lack of optimization
To our surprise, the one thing the conference was missing was a focus on optimization. Optimization makes research methods applicable to real-world products. To us it seemed like there was a general misconception that deep learning solutions aren’t fast enough for on-device processing. Our proposal to the organizers was to dedicate at least one workshop to this important area, and to take optimization into account when comparing models in competitions.
We would like to point out that optimization is an important part of developing solutions and that deep neural networks can run on a device in real time. Our first ML-based OCR model went into production in September 2016. The model was trained entirely on data to perform OCR for handwritten math expressions in the Photomath app. Afterwards, optimizing the runtime of our OCR models allowed us to offer real-time ID scanning in BlinkID and receipt scanning in BlinkReceipt. We’re planning to continue developing mobile OCR for many other use cases in the future.
All in all, it was a great event and we’re looking forward to ICDAR 2019. In the meantime, our research team is packing their bags again and heading for another conference - see you at NIPS 2017!