When you’re focused on research around your own technology, it’s easy to miss what academia is doing and what could be done differently. Keeping up with the research community is an essential part of our work. Reading relevant papers and regularly attending machine learning conferences has inspired new ideas, given us a sense of what’s coming next, and confirmed that we’re on the right track.
One of the first conferences we attended was NIPS 2016 in Barcelona. We were overwhelmed with new practical ideas, some of which we implemented during the conference itself to get better results on our problems. In 2017, NIPS in Long Beach returned to its roots with an emphasis on theory, while interpretability and safety became important new topics. We were astonished by the improvements in GANs and Meta-Learning, which were hot topics alongside Bayesian Deep Learning.
Another important conference for us was ICDAR 2017 in Kyoto, which gave us insight into the latest advances in text, document, and graphics recognition and analysis.
This year, the research team decided that one of the most relevant conferences to attend would be ECCV, and they proved to be right.
ECCV 2018 took place in Munich from 8th to 14th September. For the very first time, our whole research team of eleven attended the conference together. The conference covered a variety of topics, so every one of our team members had the opportunity to delve into valuable papers tied to their specific area of expertise. Here’s a summary of our findings and impressions from the conference.
Dataset is King
The first two days and the last day were reserved for workshops and tutorials, and once again, those proved to be the most useful part of the conference. We really appreciated the What is Optical Flow for? workshop. It showed us how the deep learning approach can significantly increase speed and accuracy over the standard approach. One fascinating fact we learned was that powerful optical flow algorithms can be trained on generated data only. In our experience, it’s difficult to match the real data distribution with a generator, but for optical flow, it seems that very simple image manipulations, such as those used in the Flying Chairs dataset, can produce incredible results on unseen real test data.
Another very useful workshop was the Joint COCO and Mapillary Recognition Challenge Workshop, where Andreas Geiger talked about satisfying the thirst for data. His points were in sync with our thinking and experience. We learned about improvements in the development process of the COCO dataset, and how the process has been sped up with the support of Facebook and Microsoft. The existence of large, well-maintained datasets like COCO plays a large part in the popularity of a specific problem. Our experience has shown that the quality and size of the annotated dataset almost always outweigh the importance of an ideal architecture, and that’s why we invest a lot in polishing our data streams and the annotation process. In our opinion, the research community would flourish if academia and industry cooperated to create new datasets with more control over the collection and labeling of the data.
The most frequently addressed problem was pose estimation: approximately every fourth paper was on this topic. It seems that every conference ends up with one issue that attracts a bit more of researchers’ attention than the rest. We saw the same at NIPS 2016, where object detection was the big thing, and again at NIPS 2017, where it was all about GANs. The hype usually starts after a significant advancement, as it did with pose estimation over the past two years.
We’re glad to see an increasing interest in optimization, an area that has been a big part of our core research in the last couple of years. For example, quantization of the model can be taken to the next level once researchers dig much deeper into specific hardware limitations to find the optimal approach. Also, model pruning and smart computation reduction in architecture design show promising results. It was exciting to see that all this work was done or supported by large companies like Facebook, Google, Samsung, and Apple.
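For readers less familiar with quantization, here is a minimal sketch of the basic idea, assuming a simple 8-bit affine scheme with one scale and zero point per tensor. The function names `quantize` and `dequantize` are hypothetical and chosen for illustration; this is not the method of any particular paper or hardware vendor mentioned above.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a shared scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all values are zero; any scale works
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round each value to the nearest integer code and clip to the valid range.
    codes = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from integer codes."""
    return [(c - zero_point) * scale for c in codes]


# Illustrative weights (made up for this example).
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)
```

The restored values differ from the originals by at most about one quantization step. Hardware-aware approaches go further by choosing bit widths and layouts that a given chip executes efficiently.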
On a few occasions, we were surprised to see that some papers implemented the same ideas that we’d implemented a few months or even a year before. And while we’re happy to know we’re sometimes ahead of the game, we’re well aware that there’s still a lot to be done. For example, in software acceleration, the large variety of hardware specifications makes it impossible to develop a single solution that would work across all devices. Big leaps in inference speed will come from hardware accelerators, and for those we have to wait for mobile manufacturers to add them. Some manufacturers, such as Apple, have already done so. Improvements in this field are crucial for the mass adoption of deep learning solutions on mobile devices, so we’re carefully following any news or updates.
Live sessions are always an excellent intro to papers, all of which we found interesting, but we’ll point out one that grabbed our attention several weeks before the event. The session was about the Group Normalization paper by Yuxin Wu and Kaiming He from Facebook AI Research (FAIR). They introduced a new version of normalization that overcomes some limitations of the standard batch normalization scheme. Group normalization enables smaller batch sizes while keeping similar results. That frees up GPU memory for a bigger model and bigger image sizes, which is of great importance for object detection and segmentation problems, especially if images contain small objects. The discussion continued at the poster sessions, where we took the opportunity to talk with the winners of the COCO Detection Challenge. While their opponents were mainly focused on creating enormous architectures to squeeze out that extra half percent, the winners instead made a thorough analysis of their process. It showed how various changes in architecture interacted and explained why their new FishNet feature extractor outperformed standard feature extractors.
We mentioned that optimization is important. One great representative of this field is ShuffleNet V2. This paper provides practical guidelines for efficient network design, which go beyond considering only FLOPs, because speed also depends on other factors, such as memory access cost and platform characteristics. Another impressive paper on the same topic is AMC: AutoML for Model Compression and Acceleration on Mobile Devices, which uses reinforcement learning to compress the model. The authors removed hand-crafted heuristics and rule-based policies from the model compression process. We’ve also found numerous times that a network is better at finding optimal parameters for this kind of optimization problem, and we’re pleased that we will soon be able to delegate this problem to the network.
An interesting new idea in object detection is proposed in the CornerNet paper. The authors removed the commonly used anchor boxes and introduced top-left and bottom-right corners as box descriptors. Their motivation was that anchor boxes don’t match the shape of ground truth boxes perfectly and that they add a lot of extra hyperparameters. Our experience has shown that with the help of simple dataset statistics and overlap analysis, anchor boxes can be precisely adjusted, although we agree that they add more complexity to the overall training process.
One of the most impressive fields in machine learning is unsupervised learning, and researchers keep a close eye on every new improvement. For us, the focus is on any area that reduces the need for labeled data. That’s why we would like to point out the Learning to Segment via Cut-and-Paste paper, which presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, the paper proposes to learn object masks by playing a game of cut-and-paste in an adversarial learning setup. Even though the results are still far from those of models trained on labeled data, the method can be used for pretraining or pre-annotation.
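To illustrate why the group normalization discussed above does not depend on batch size, here is a minimal pure-Python sketch that normalizes a single sample. The function name `group_norm` and the list-of-channels layout are assumptions made for this example, and the learnable per-channel scale and shift are left out for brevity.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample given as a list of channels (flat lists of floats).

    Statistics are computed per group of channels within this sample only,
    so the result does not depend on what else is in the batch.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        channels = sample[g * per_group:(g + 1) * per_group]
        values = [v for ch in channels for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for ch in channels:
            out.append([(v - mean) * inv_std for v in ch])
    return out


# Four channels with two spatial positions each, split into two groups
# (values made up for illustration).
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the mean and variance come from one sample, the same code works with a batch of one, which is exactly the situation where batch statistics become unreliable.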
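A small sketch shows why FLOPs alone can mislead. It assumes the simple cost model for a 1x1 convolution used in the ShuffleNet V2 analysis, where FLOPs are h·w·c_in·c_out and memory access cost (MAC) is h·w·(c_in + c_out) + c_in·c_out. The helper name `pointwise_conv_cost` and the layer sizes are made up for illustration.

```python
def pointwise_conv_cost(h, w, c_in, c_out):
    """Return (FLOPs, memory access cost) for a 1x1 convolution."""
    flops = h * w * c_in * c_out
    # Read the input feature map, write the output feature map, read the weights.
    mac = h * w * (c_in + c_out) + c_in * c_out
    return flops, mac


# Three layers with identical FLOPs but different channel balance.
configs = [(128, 128), (64, 256), (32, 512)]
results = {cfg: pointwise_conv_cost(56, 56, *cfg) for cfg in configs}

for (c_in, c_out), (flops, mac) in results.items():
    print(f"c_in={c_in:4d} c_out={c_out:4d} FLOPs={flops} MAC={mac}")
```

All three layers have the same FLOPs, yet the balanced one moves the least data. That is the intuition behind preferring equal input and output channel widths.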
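As a rough illustration of the overlap analysis mentioned above, the sketch below measures how well a set of anchor shapes covers the ground truth box shapes in a dataset. It assumes boxes are compared by width and height only, as if they shared the same center. The names `shape_iou` and `anchor_coverage` and the sample numbers are hypothetical.

```python
def shape_iou(box, anchor):
    """IoU of two (width, height) boxes that share the same center."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def anchor_coverage(gt_shapes, anchors, threshold=0.5):
    """Return (mean best IoU, fraction of boxes with best IoU >= threshold)."""
    best = [max(shape_iou(gt, a) for a in anchors) for gt in gt_shapes]
    mean_best = sum(best) / len(best)
    matched = sum(b >= threshold for b in best) / len(best)
    return mean_best, matched


# Made-up ground truth shapes and two candidate anchor sets.
gt_shapes = [(10, 10), (20, 40), (12, 9)]
square_only = [(10, 10)]
with_tall_anchor = [(10, 10), (20, 40)]

print(anchor_coverage(gt_shapes, square_only))
print(anchor_coverage(gt_shapes, with_tall_anchor))
```

Running this kind of check over the whole annotation set shows which box shapes the current anchors miss, and candidate anchors can then be adjusted until the coverage is acceptable.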
These are just a few papers that were most relevant and interesting to us, but there were many excellent papers at the event. We encourage you to visit the official page of ECCV 2018 and discover more exciting works.
To wrap up
All in all, we found ECCV 2018 to be a well-organized conference with papers highly relevant both to the research community in general and to industry. The importance of the event can be seen in the number of submitted papers, the number of big companies and academic institutions that participated, and the rapidly growing number of attendees.
However, there is one drawback of this conference (and other similar conferences for that matter): the lack of result validation and code sharing. Some papers presented superior results for specific problems, but since they cannot be easily evaluated or reproduced, their validity is questionable.
Nevertheless, we had a great time in Munich and came back with lots of inspiration and new ideas. We’re looking forward to the next ECCV in 2020 while keeping an eye on ICCV in the meantime.