SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Jiawei Zhang,
Xuan Yang,
Taiqi Wang,
Yu Yao,
Aleksandr Petiushko,
Bo Li
February 2025
Abstract
Traditional autonomous driving systems often struggle to integrate high-level reasoning with low-level control, resulting in suboptimal and sometimes unsafe driving behaviors. The emergence of multimodal large language models (MLLMs), which can process both visual and textual data, presents an opportunity to unify perception and reasoning tasks within a single framework. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge. Specifically, we first introduce the Position-Dependent Cross-Entropy (PDCE) loss function, designed to improve the accuracy of low-level control signal predictions when numerical values are represented as text. Second, to ensure safe autonomous driving by explicitly integrating precise safety knowledge into the MLLM, we develop a reasoning component for SafeAuto. This component translates driving safety regulations into first-order logic rules (e.g., "red light → stop") and incorporates these rules into a probabilistic graphical model, such as a Markov Logic Network (MLN). The MLN is trained to verify the predicted next actions using environmental attributes identified by attribute recognition models (e.g., detecting a red light) to form the predicates. Additionally, we construct a Multimodal Retrieval-Augmented Generation (Multimodal RAG) model that leverages video data, control signals, and environmental attributes to learn more effectively from past similar driving experiences. By integrating PDCE, MLN, and Multimodal RAG, SafeAuto significantly outperforms existing baselines across multiple datasets. This advancement paves the way for more accurate, reliable, and safer autonomous driving systems that effectively learn from experience, adhere to traffic regulations, and execute precise control actions.
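To make the rule-based verification step concrete, the following is a minimal sketch of MLN-style checking of a predicted action. It is not SafeAuto's implementation: the rule set, the predicate names, the action list, the hand-set weights (the abstract states that the MLN is trained), and the helper names `action_posterior` and `verify` are all assumptions chosen for illustration. Each rule is an implication of the form "predicate → action", and the probability of an action is taken to be proportional to the exponential of the summed weights of the rules it satisfies.

```python
import math

# Hypothetical weighted rules: (weight, antecedent predicate, required action).
# Each reads "if <predicate> holds, then the action should be <action>".
# Weights are hand-set here; in an actual MLN they would be learned.
RULES = [
    (4.0, "red_light", "stop"),
    (2.0, "stop_sign", "stop"),
    (1.0, "clear_road", "accelerate"),
]
# Hypothetical discrete action space.
ACTIONS = ["stop", "accelerate", "turn_left", "turn_right"]


def action_posterior(evidence, rules=RULES, actions=ACTIONS):
    """Return P(action | evidence), proportional to exp(sum of satisfied rule weights)."""
    scores = {}
    for action in actions:
        total = 0.0
        for weight, predicate, required in rules:
            # An implication is violated only when its antecedent is true
            # and its consequent is false; otherwise it counts as satisfied.
            satisfied = (not evidence.get(predicate, False)) or action == required
            if satisfied:
                total += weight
        scores[action] = total
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    return {a: math.exp(s) / z for a, s in scores.items()}


def verify(predicted_action, evidence):
    """Keep the predicted action unless the rules strictly prefer another one."""
    posterior = action_posterior(evidence)
    best = max(posterior, key=posterior.get)
    if posterior[best] > posterior[predicted_action]:
        return best
    return predicted_action


if __name__ == "__main__":
    # The evidence would come from attribute recognition models in practice.
    print(verify("accelerate", {"red_light": True}))
    print(verify("turn_left", {}))
```

In this sketch the predicted action is overridden only when the rules strictly favor a different action, so a prediction is left untouched when no rule's antecedent is observed. Whether the actual system overrides, re-prompts, or merely flags a violating prediction is not specified in the abstract.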