Carlos Roberto Cueto Zumaya


Topic
3D Scene Graphs for Enhanced Situational Awareness in Autonomous Systems
Autonomous systems require a unified and scalable spatial representation to perceive, understand, and interact with complex, dynamic environments. Current 3D Scene Graph (3DSG) frameworks have shown promise in integrating geometry, semantics, and relationships, but they remain limited in scalability, data management, and long-term adaptability. These limitations hinder their application in real-world, large-scale indoor and outdoor scenarios where real-time updates and persistent situational awareness are critical.
This research proposes a scalable 3DSG framework designed to support real-time perception and understanding across large and dynamic environments. The framework will fuse multi-modal sensory inputs, such as LiDAR, RGB-D cameras, and IMUs, into a unified hierarchical representation that captures spatial and semantic relationships. By pairing this graph representation with mature storage backends, including graph and vector databases, the system will support efficient querying, dynamic updates, and long-term persistence. Implemented within the ROS 2 ecosystem, the framework will provide a structured foundation that downstream modules, such as large language models (LLMs) or task planners, can use for higher-level reasoning, interaction, and decision-making in autonomous systems.
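To make the hierarchical representation described above concrete, the following is a minimal in-memory sketch, not the proposed implementation. The class names (`Node`, `SceneGraph`), the layer names ("building", "room", "object"), and the `upsert` and `descendants` methods are hypothetical and chosen only for illustration. A real system would delegate storage to a graph or vector database and receive its updates from ROS 2 perception nodes.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """One scene-graph entity (hypothetical schema, for illustration only)."""
    node_id: str
    layer: str                                  # e.g. "building", "room", "object"
    label: str                                  # semantic class
    position: tuple[float, float, float]        # position in the map frame
    parent: str | None = None                   # id of the enclosing node, if any
    children: set[str] = field(default_factory=set)
    last_seen: float = field(default_factory=time.time)


class SceneGraph:
    """In-memory stand-in for a persistent graph backend."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def upsert(self, node_id, layer, label, position, parent=None) -> Node:
        """Insert a node, or refresh an existing one (dynamic update).

        The parent, if given, must already be in the graph.
        """
        old = self.nodes.get(node_id)
        # Detach from the previous parent if the node moved in the hierarchy.
        if old is not None and old.parent is not None and old.parent != parent:
            self.nodes[old.parent].children.discard(node_id)
        children = old.children if old is not None else set()
        node = Node(node_id, layer, label, position, parent, children)
        self.nodes[node_id] = node
        if parent is not None:
            self.nodes[parent].children.add(node_id)
        return node

    def descendants(self, node_id, label=None) -> list[Node]:
        """Return every node below node_id, optionally filtered by label."""
        found, stack = [], list(self.nodes[node_id].children)
        while stack:
            node = self.nodes[stack.pop()]
            stack.extend(node.children)
            if label is None or node.label == label:
                found.append(node)
        return sorted(found, key=lambda n: n.node_id)


graph = SceneGraph()
graph.upsert("b1", "building", "office", (0.0, 0.0, 0.0))
graph.upsert("r1", "room", "kitchen", (2.0, 1.0, 0.0), parent="b1")
graph.upsert("o1", "object", "chair", (2.5, 1.2, 0.0), parent="r1")
graph.upsert("o2", "object", "table", (2.0, 1.0, 0.0), parent="r1")

# A later observation reports that the chair has moved.
graph.upsert("o1", "object", "chair", (3.0, 1.2, 0.0), parent="r1")

# Hierarchical query: all chairs anywhere inside building b1.
chairs = [n.node_id for n in graph.descendants("b1", label="chair")]
```

The same `upsert` call handles both first observations and later refinements, which is one simple way to keep the graph current while retaining a `last_seen` timestamp for long-term persistence.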
The proposed framework will enable autonomous systems to maintain reliable, up-to-date situational awareness and make informed decisions in dynamic environments. It will improve long-term robustness, adaptability, and operational continuity by efficiently managing large-scale spatial and semantic data. This will enhance downstream capabilities such as navigation, exploration, and environment monitoring, ultimately bridging the gap between high-fidelity perception and practical autonomy.
Existing frameworks such as Hydra, Kimera, and S-Graphs have laid solid foundations for 3D scene-graph mapping, yet they remain focused on specific or small-scale environments and depend on tightly coupled architectures that hinder scalability and long-term deployment. Approaches such as FunGraph and CURB extend 3DSG concepts to outdoor or task-specific domains, but they still lack generalized mechanisms for efficient querying, persistent storage, and multi-modal integration.
Traditional mapping systems, such as OctoMap, Voxblox, and other TSDF-based reconstructions, offer efficient geometric modeling but lack semantic hierarchy and relational structure, which limits higher-level contextual understanding.
The proposed framework bridges this gap by combining the semantic richness and structure of 3DSGs with the scalability and persistence of modern graph and vector databases. This combination enables efficient management of large, dynamic environments and supports downstream reasoning modules, such as large language models (LLMs) or task planners, in real-time decision-making.

