Many Heads but One Brain: Fusion Brain - a Competition and a Single Multimodal Multitask Architecture

Abstract

Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called FusionBrain, the first competition which is targeted to make a universal architecture which could process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The FusionBrain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text recognition, Zero-shot Object Detection, and Visual Question Answering. We have created datasets for each task to test the participants’ submissions on it. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a multimodal and multitask architecture – a baseline solution, in the centre of which is a frozen foundation model and which has been trained in Fusion mode along with Single-task mode. The proposed Fusion approach proves to be competitive and more energy-efficient compared to the task-specific one.

Publication
Computer Optics
Aleksandr Petiushko Александр Петюшко
Aleksandr Petiushko Александр Петюшко
Director, Head of ML Research / Adjunct Professor / PhD

Principal R&D Researcher (15+ years of experience), R&D Technical Leader (10+ years of experience), and R&D Manager (7+ years of experience). Running and managing industrial research and academic collaboration (35+ publications, 30+ patents). Inspired by theoretical computer science and how it changes the world.