Osama Tasneem


Topic
Vision–Language–Action Models for Robotic Manipulation in Industrial Environments
Currently working on:
Actively researching the use of local LLMs and VLMs in industrial use cases.
Industries are increasingly adopting collaborative robots (cobots), which now account for over 10% of industrial robot installations worldwide. However, these robots often lack the ability to understand natural language or perceive complex environments, limiting their adaptability, safety, and effectiveness in dynamic human-centered settings. There is a need for more intelligent, adaptive, and human-centric robotic systems that can interact naturally and safely with human operators.
This doctoral research project aims to equip robots with the ability to perceive their environment through vision, comprehend natural language commands, and execute appropriate actions autonomously by developing a Vision-Language-Action (VLA) model. We propose to build upon multimodal AI architectures such as OpenVLA and RT-2. The model will be fine-tuned and scaled for local deployment, and adapted for industrial use while ensuring robustness, safety, data privacy, and efficiency in human-robot collaboration.
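As a rough sketch of the intended local-deployment workflow, the example below loads an open VLA checkpoint on a local GPU and queries it with a camera image and a natural language instruction to obtain a robot action. It assumes the publicly released openvla/openvla-7b checkpoint and its documented Hugging Face interface; the camera frame path, instruction, and un-normalization key are illustrative placeholders rather than the final system.

# Minimal local-inference sketch (assumption: openvla/openvla-7b and its
# documented Hugging Face interface; file name and instruction are placeholders).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "openvla/openvla-7b"  # assumed open checkpoint

# Load processor and model for fully local (on-premises) inference on one GPU.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")

# A frame from the cobot's workspace camera (placeholder file name).
image = Image.open("workspace_camera_frame.png")
instruction = "pick up the torque wrench and place it in the red bin"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

# The model predicts an end-effector action (position delta, rotation delta,
# gripper), un-normalized with dataset statistics selected via unnorm_key.
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)

In the envisioned system, the predicted end-effector action would be mapped to the cobot's own control interface (for example via ROS), and the generic checkpoint would be replaced by one fine-tuned on data from the target industrial cell.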
Successful implementation of Vision–Language–Action Models for Robotic Manipulation in Industrial Environments would be a significant advance for the field of robotics. Integrating large language models would allow robots to understand human commands more intuitively, enhancing human-robot interaction. Adding vision capabilities to the model would give robots situational awareness, enabling them to make intelligent decisions and improving safety in industrial settings. Local deployment of such models reduces reliance on cloud-based LLM services, increasing data privacy and reducing overall cost. Furthermore, it expands the potential use cases, allowing robots to understand complex instructions and perform tasks with greater flexibility and adaptability. This could lead to higher levels of automation in industries such as manufacturing, mining, logistics, and healthcare. In manufacturing, for instance, robots could be deployed for assembly, quality inspection, safety checks, and material handling with minimal reprogramming effort, thanks to their ability to comprehend and respond to natural language instructions.
Current industrial robots rely on pre-programmed routines and limited sensory input, restricting their ability to handle dynamic, unstructured environments.
While recent multimodal AI systems such as RT-2 and OpenVLA show potential, they are not yet optimized for industrial deployment and depend on cloud infrastructure. This research differentiates itself by customizing and validating these architectures specifically for collaborative robotics, addressing privacy, safety, and usability challenges.

