The results of the Object Detection track of the Robust Vision Challenge (RVC) at the European Conference on Computer Vision (ECCV) 2020 have been announced. Wisers AI finished as runner-up in the competition with its advanced artificial intelligence technology. The Robust Vision Challenge is a workshop held under ECCV 2020, one of the most important computer vision conferences. Renowned companies including Apple, Google and Intel have sponsored and participated in the conference.
The RVC object detection competition combines three very different benchmarks. The Open Images competition organized by Google joined RVC in 2020 as one of them. The other two are the COCO (Common Objects in Context) dataset, which is widely used in object detection, and the MVD (Mapillary Vistas Dataset), which is dedicated to high-definition street-level images. Competitors are required to present a unified solution in the unified label space officially provided by RVC and to submit a single model to all three benchmarks for comprehensive evaluation. Handling all three benchmarks in a unified way is the greatest challenge of the competition. Label confusion can arise because some labels from different datasets overlap semantically, which can severely disrupt the training process.
Moreover, each dataset suffers from data imbalance, and combining the datasets makes the imbalance worse. Last but not least, each benchmark has its own data characteristics: image sizes, bounding box aspect ratios, and even image content vary across datasets. This poses another serious challenge for the design and implementation of a unified solution.
To cope with these challenges, the Wisers AI team drew on its years of experience in model expansion and generalization to devise a series of solutions, and achieved a satisfying evaluation result. Among them, the label merge technique merges highly similar labels across the different datasets and recovers the original labels through reverse mapping during the post-processing phase. This method can largely eliminate the impact of label confusion during training.
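The label merge and reverse mapping idea can be illustrated with a minimal sketch. This is not the team's implementation: the merge table, the dataset keys, the label strings, and the function names below are all hypothetical and only show the general mechanism of training on merged labels and mapping predictions back to each benchmark's own label.

```python
from collections import defaultdict

# Hypothetical merge table: merged label -> (dataset, original label) members.
# Dataset keys and label strings are illustrative assumptions only.
MERGE_GROUPS = {
    "person": [("coco", "person"), ("oid", "Person"), ("mvd", "human--person")],
    "car": [("coco", "car"), ("oid", "Car"), ("mvd", "object--vehicle--car")],
}


def build_maps(groups):
    """Build the forward (training) and reverse (post-processing) lookups."""
    forward = {}
    reverse = defaultdict(dict)
    for merged, members in groups.items():
        for dataset, label in members:
            forward[(dataset, label)] = merged
            reverse[dataset][merged] = label
    return forward, dict(reverse)


def to_merged(dataset, label, forward):
    """Map a dataset-specific label to its merged training label.

    Labels that are not in the merge table pass through unchanged.
    """
    return forward.get((dataset, label), label)


def to_original(dataset, merged, reverse):
    """Map a predicted merged label back to the label a benchmark expects.

    Labels that are not in the merge table pass through unchanged; a real
    system would also need to drop labels the target benchmark lacks.
    """
    return reverse.get(dataset, {}).get(merged, merged)


forward, reverse = build_maps(MERGE_GROUPS)
```

With these lookups, annotations from every dataset are converted with `to_merged` before training, and each prediction is converted with `to_original` once per target benchmark before submission.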
To address the data imbalance issue, the team adapted term frequency–inverse document frequency (TF-IDF), a technique from Natural Language Processing (NLP), to image recognition. First, offline data sampling based on TF-IDF screens the training samples according to their importance in training. On top of this, the team proposed a soft-balance sampling strategy to achieve class-aware sampling of the training data. Finally, a hybrid training scheduler combines the different sampling methods to make the best use of each sample class. Together, these approaches can mitigate the impact of data imbalance to a great extent.
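One way to picture how TF-IDF might carry over to detection data is to treat each image as a document and each annotated class as a term, so that images containing rare classes score higher. The sketch below follows that reading. It is an assumption, not the team's published formulation: the scoring function, the `alpha` interpolation used for soft balancing, and all names are hypothetical.

```python
import math
from collections import Counter


def tfidf_image_scores(annotations):
    """Score each image by the summed TF-IDF of its annotated classes.

    annotations: one list of class labels per image (one entry per box).
    """
    n_images = len(annotations)
    doc_freq = Counter()
    for labels in annotations:
        doc_freq.update(set(labels))  # number of images containing each class
    idf = {c: math.log(n_images / df) for c, df in doc_freq.items()}

    scores = []
    for labels in annotations:
        if not labels:
            scores.append(0.0)
            continue
        term_freq = Counter(labels)
        total = len(labels)
        scores.append(sum((n / total) * idf[c] for c, n in term_freq.items()))
    return scores


def soft_balance_class_probs(class_counts, alpha=0.5):
    """Class sampling probabilities between two extremes (assumed scheme).

    alpha=0 samples classes in proportion to their instance counts;
    alpha=1 samples every class uniformly.
    """
    weights = {c: n ** (1.0 - alpha) for c, n in class_counts.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}
```

In this sketch, images could be ranked or filtered offline by `tfidf_image_scores`, while `soft_balance_class_probs` would drive class-aware sampling at training time with a strength that a scheduler could vary over the course of training.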
Data heterogeneity is another issue. Among the three datasets, the MVD includes many high-resolution street view photos taken with users’ mobile phones or cameras, while the other two datasets mainly use pictures collected from the Web. As a result, resolution, image size, bounding box aspect ratio and content differ greatly among samples from different datasets. To tackle this issue, the team adopted a deeper network architecture, combined with a comprehensive set of augmentation techniques such as random cropping, multi-scale augmentation, and test-time augmentation. Finally, expert models were used for a few unique yet poorly performing classes to further mitigate the impact of data heterogeneity.
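As a rough illustration of two of the augmentations named above, the sketch below shows the geometry of multi-scale resizing and random cropping applied to bounding boxes. It operates on image dimensions and box coordinates only, and the function names, the candidate short-side sizes, and the `(x1, y1, x2, y2)` box format are assumptions made for the example rather than details from the team's pipeline.

```python
import random


def random_scale(width, height, short_sides=(600, 800, 1000), rng=random):
    """Pick a random target short side and return the resized dimensions."""
    target = rng.choice(short_sides)
    factor = target / min(width, height)
    return round(width * factor), round(height * factor), factor


def scale_boxes(boxes, factor):
    """Scale (x1, y1, x2, y2) boxes by the same factor as the image."""
    return [tuple(v * factor for v in box) for box in boxes]


def random_crop(width, height, boxes, crop_w, crop_h, rng=random):
    """Choose a random crop window and clip the boxes to it.

    Boxes that fall entirely outside the window are dropped.
    """
    crop_w, crop_h = min(crop_w, width), min(crop_h, height)
    x0 = rng.randint(0, width - crop_w)
    y0 = rng.randint(0, height - crop_h)
    kept = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1 - x0, 0), max(y1 - y0, 0)
        nx2, ny2 = min(x2 - x0, crop_w), min(y2 - y0, crop_h)
        if nx2 > nx1 and ny2 > ny1:
            kept.append((nx1, ny1, nx2, ny2))
    return (x0, y0, crop_w, crop_h), kept
```

Resizing a large street-level photo to a randomly chosen short side and then cropping a fixed window is one plausible way to bring high-resolution and Web-sized images into a comparable range during training.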
Founded in 2014, Wisers AI has spent years building a solid foundation for the development and expansion of intelligent data services and AI solutions, drawing on its strong data capability and cutting-edge AI-based natural language processing technology. Wisers AI will continue to develop new algorithms, tools, and technical solutions to solve more fundamental technical challenges in text understanding, computer vision, multi-modality data mining and multidimensional correlation analysis. It will also continue to explore the combined applications of AI and big data, expand the applications of AI technology and improve AI functions to help enterprises in more industries and fields realize digital transformation and embrace the AI era.