Outside the Westworld realm, AI comes in many forms, has quite a long history, and is more present in your everyday life than you might imagine.
We connect with friends on social media, get an Uber to work, discover new shows on Netflix, and get restaurant recommendations in Maps. All of these are digital extensions of things we have always done; only now, in large part thanks to AI, interacting with these services is faster, easier, and smarter.
Machine learning, computer vision, natural language processing, neural networks, and recommendation engines: research engineers have been developing all of these for several decades now.
So why does everyone seem to be jumping on the AI bandwagon only now?
In 2012, AI research made a major breakthrough with deep neural networks.
They are far more powerful than earlier learning methods and deliver the best results on numerous real-world problems. Their rise was made possible by advances in computing power, which made it feasible to train models with huge numbers of parameters through a complex, data-heavy learning process.
Deep neural networks enable powerful solutions for object detection, image recognition, and segmentation. For companies, they offer an opportunity to build superior user experiences, increase efficiency, and delight their customers.
How are AI products built?
When it comes to building products and services based on AI, companies commonly take one of two approaches. They either use plug-and-play APIs and pre-built machine learning models, or they develop their own specialized solutions.
Off-the-shelf APIs are the first choice for many businesses because they’re affordable and easy to use. The leading cloud providers make them readily available to attract more developers to their platforms. That’s why APIs such as Amazon Rekognition, Google Vision, and Microsoft Cognitive Services keep becoming more advanced and powerful.
However, these tools are not flawless. They often perform poorly on specialized AI problems, and businesses that rely on them miss the opportunity to deliver top quality and a superior user experience.
Many APIs can recognize text in images, but they are rarely tuned to extract specific data types and formats, that is, to make sense of the recognized text. For example, they can’t pick out the first name and surname on someone’s ID or recognize a particular kind of math problem. At Microblink, we recognized that gap and went a step further, building specialized computer vision software that performs significantly better on these tasks.
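To illustrate the gap, consider what happens after OCR has already done its job. A generic text-recognition API might return the raw machine-readable zone (MRZ) line of a passport, but turning that string into a surname and given names still requires format-specific logic. The sketch below parses the name field from line 1 of an ICAO 9303 TD3 (passport) MRZ, using the ICAO specimen name. It is a minimal illustration under simplifying assumptions: the OCR output is clean, and check digits and transliteration are ignored. The function name `parse_td3_name` is hypothetical, and this is not Microblink’s implementation.

```python
from typing import NamedTuple


class MrzName(NamedTuple):
    document_code: str
    issuing_state: str
    surname: str
    given_names: list[str]


def parse_td3_name(line1: str) -> MrzName:
    """Parse line 1 of a TD3 (passport) MRZ into its name components.

    Hypothetical helper for illustration: assumes a clean 44-character line,
    with no OCR error correction or check-digit validation.
    """
    if len(line1) != 44:
        raise ValueError(f"TD3 line 1 must have 44 characters, got {len(line1)}")
    document_code = line1[0:2].rstrip("<")   # e.g. "P" for passport
    issuing_state = line1[2:5].rstrip("<")   # three-letter state code
    name_field = line1[5:44].rstrip("<")     # drop trailing '<' filler
    # '<<' separates the primary identifier (surname) from the secondary
    # identifiers (given names); a single '<' separates words within each part.
    primary, _, secondary = name_field.partition("<<")
    surname = primary.replace("<", " ")
    given_names = [name for name in secondary.split("<") if name]
    return MrzName(document_code, issuing_state, surname, given_names)


# ICAO specimen name, padded with '<' filler to the full 44 characters.
specimen = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
print(parse_td3_name(specimen))
```

Even this toy example depends on knowing the document layout. That layout knowledge is exactly what a general-purpose text-recognition API does not provide.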
The other approach companies take is building proprietary solutions for a specialized application in a particular product or service. This approach is also becoming easier to start with, mainly because of the growing number of publicly available datasets for training machine learning models. Some of them even cover specific ML problems (e.g., the MS COCO dataset for object detection, segmentation, and captioning).
Besides easily accessible datasets, there are also more open-source tools that simplify model development. PyTorch and TensorFlow 2.0, the most popular open-source machine learning libraries for research and production, are significantly more user-friendly than they were, say, two years ago.
Sorry to spoil the party again, but there’s a catch in this approach as well. Theory is one thing, but on real-world problems it can be hard to get the desired results. There are two common issues to watch out for when building your own AI solution:
- The discrepancy between the data used to train neural network models and the actual data the models are applied to. In particular, models trained on publicly available datasets are rarely suitable for highly specialized products.
- Practical applicability in terms of compute, memory, and time consumption: the final solution must meet efficiency requirements that often don’t exist during the research phase of AI development.
(In simpler terms, end users have little understanding of, and even less patience for, the challenges behind ML-powered services. The service has to work every time, and it has to work fast.)
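As a rough illustration of why efficiency requirements matter, a model’s weights alone need about (number of parameters) × (bytes per parameter) of memory. The sketch below does this back-of-the-envelope arithmetic for a hypothetical 25-million-parameter network, roughly the size of ResNet-50, at a few common numeric precisions. It ignores activations, runtime overhead, and any compression, so treat the results as lower bounds rather than measurements.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Approximate memory in MB (1 MB = 10**6 bytes) for model weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


num_params = 25_000_000  # hypothetical model size, roughly ResNet-50 scale
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mb(num_params, dtype):.0f} MB")
```

This prints 100 MB for float32, 50 MB for float16, and 25 MB for int8. Storing weights as int8 instead of float32 shrinks them fourfold, which can decide whether a model fits comfortably on a phone. Whether a given model keeps its accuracy at lower precision has to be verified case by case.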
Taking the best of both worlds
In the past six years, we’ve become well aware of the benefits and challenges of both approaches. We made AI the core of our technology and started developing deep neural networks several years ago to tackle the following problems:
- Text recognition and extraction on images of ID documents, credit cards, retail receipts, and even math problems
- Detection and classification of different document types
- Image analysis and various other tasks.
We’re dedicated to tackling these computer vision challenges to help companies deliver superior digital experiences, such as:
- Effortless bill payments (by scanning payment slips; after all, who even goes to the bank anymore?)
- Easy expense tracking (by scanning retail receipts instead of entering each purchased item manually)
- Remote user onboarding (by scanning identity documents or credit cards)
- Mastering math with ease (by using an app to scan, solve, and learn math concepts)
We use an integrated approach that combines the best of the two approaches described above, and our success confirms that we’re on the right track. Our technology has a global reach, saving time and making everyday tasks easier for over 100 million end users around the world. Photomath, today a separate company from Microblink, is the #1 app for learning math, has over 120 million downloads, and was recently integrated into Snapchat. BlinkReceipt, our product for scanning retail receipts, has been used to scan over 180 million receipts so far. BlinkCard, our latest product, is the fastest and most accurate scanning software for various types of credit and bank cards.
If this has sparked your curiosity, great! We’ve sprinkled some more wisdom in another post: Our 5-Step Guide to AI Development.