Authors:
Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu
Paper:
https://arxiv.org/abs/2408.08188
Introduction
Long-horizon planning in multi-robot systems is fraught with challenges such as uncertainty accumulation, computational complexity, delayed rewards, and incomplete information. This paper proposes a novel approach to exploit task hierarchy from human instructions to facilitate multi-robot planning. By leveraging Large Language Models (LLMs), the authors introduce a two-step method to translate multi-sentence instructions into a structured language, Hierarchical Linear Temporal Logic (LTL), which serves as a formal representation for planning.
Related Work
Language-Conditioned Robotic Planning
There are two primary methods for generating actions from instructions:
1. Deep-Learning Techniques: These translate instructions directly into low-level actions, such as joint states. Examples include Open X-Embodiment and Octo.
2. Intermediate Representation: Instructions are first translated into an intermediate representation, and off-the-shelf solvers then generate actions from it. This approach reduces the need for extensive training data.
Natural Language to Temporal Logic
Temporal logics are well suited to specifying goals with temporal constraints and to providing performance guarantees. Early work on translating natural language into temporal logic used grammar-based methods, while recent efforts leverage LLMs. However, these approaches focus on the translation itself and do not tackle the challenges of language grounding in robotics.
LLMs for Multi-Robot Systems
Recent work adapts LLMs to multi-robot systems. Examples include SMART-LLM, which synthesizes code for task decomposition and allocation, and RoCo, which uses a dialogue-based approach for task coordination. However, these works primarily focus on finding feasible solutions rather than optimizing cost and time.
Preliminary
Linear Temporal Logic (LTL)
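LTL formulas are built from atomic propositions using Boolean operators such as conjunction (∧) and negation (¬) and temporal operators such as next (○) and until (U). The eventually operator (♢) used in the example below is derived from until as ♢ϕ = true U ϕ. The paper focuses on syntactically co-safe LTL (sc-LTL), which is suitable for reasoning about robot tasks with finite durations.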
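To make the operators concrete, here is a minimal sketch of an evaluator for these formulas over a finite trace (a list of sets of propositions that hold at each step). It is not taken from the paper: the class names, the `holds` function, and the choice of finite-trace semantics with a strong next operator are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence, Union


@dataclass(frozen=True)
class Top:
    """The constant 'true'."""


@dataclass(frozen=True)
class AP:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    arg: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"


Formula = Union[Top, AP, Not, And, Next, Until]
Trace = Sequence[AbstractSet[str]]


def Eventually(arg: Formula) -> Formula:
    # Derived operator: eventually(arg) is defined as (true U arg).
    return Until(Top(), arg)


def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """Check whether formula f holds at position i of a finite trace."""
    if isinstance(f, Top):
        return True
    if isinstance(f, AP):
        return i < len(trace) and f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.arg, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Next):
        # Strong next (an assumption): a successor step must exist.
        return i + 1 < len(trace) and holds(f.arg, trace, i + 1)
    if isinstance(f, Until):
        # Some step k >= i satisfies the right side, and every step
        # from i up to (but excluding) k satisfies the left side.
        return any(
            holds(f.right, trace, k)
            and all(holds(f.left, trace, j) for j in range(i, k))
            for k in range(i, len(trace))
        )
    raise TypeError(f"unsupported formula: {f!r}")


# "Eventually plates, and after that eventually cups."
spec = Eventually(And(AP("plates"), Eventually(AP("cups"))))
in_order = [set(), {"plates"}, {"cups"}]
reversed_order = [{"cups"}, {"plates"}]
print(holds(spec, in_order))        # True
print(holds(spec, reversed_order))  # False
```

The finite-trace view fits the co-safe fragment in the sense that a satisfying run can be confirmed from a finite prefix; a planner would normally work with an automaton rather than a recursive check like this one.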
Hierarchical LTL
Hierarchical LTL includes multiple levels, where each specification at a higher level is constructed from specifications at a lower level. This hierarchical structure aligns well with human instructions.
Example: Dishwasher Loading Problem
The hierarchical LTL for loading a dishwasher is:
- Level 1: ϕ_1^1 = ♢(ϕ_2^1 ∧ ♢ϕ_2^2)
- Level 2: ϕ_2^1 = ♢π_plates^l ∧ ♢π_mugs^l ∧ ♢π_utensils^l
- Level 2: ϕ_2^2 = ♢(π_saucers^u ∧ ♢π_cups^u)
Natural Language to Hierarchical LTL
The authors propose a two-stage method for translating natural language into hierarchical LTL using an intermediary structure known as the Hierarchical Task Tree (HTT).
Conversion from Human Instructions to Hierarchical Task Tree
- HTT without Temporal Relations: LLMs decompose the overarching task into a structured hierarchy.
- Add Temporal Relations: LLMs determine the temporal relations between sibling tasks within the HTT.
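The two steps above can be pictured with a small tree data structure. The sketch below is only one possible encoding and is not the paper's implementation: the `TaskNode` class, its `precedes` field (pairs of sibling indices), and the helper functions are hypothetical names chosen for illustration. The tree mirrors the dishwasher example from the preliminaries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaskNode:
    """One task in a hypothetical Hierarchical Task Tree (HTT)."""

    description: str
    children: List["TaskNode"] = field(default_factory=list)
    # Temporal relations among siblings: (i, j) means children[i]
    # must be completed before children[j]. Unlisted pairs are unordered.
    precedes: List[Tuple[int, int]] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def depth(node: TaskNode) -> int:
    """Number of levels in the tree rooted at node (a leaf has depth 1)."""
    return 1 + max((depth(child) for child in node.children), default=0)


def leaves(node: TaskNode) -> List[str]:
    """Descriptions of the leaf tasks, left to right."""
    if node.is_leaf():
        return [node.description]
    return [d for child in node.children for d in leaves(child)]


# Step 1: task decomposition without temporal relations.
lower = TaskNode("load lower rack",
                 [TaskNode("plates"), TaskNode("mugs"), TaskNode("utensils")])
upper = TaskNode("load upper rack",
                 [TaskNode("saucers"), TaskNode("cups")])
root = TaskNode("load the dishwasher", [lower, upper])

# Step 2: add temporal relations between sibling tasks.
root.precedes.append((0, 1))   # lower rack before upper rack
upper.precedes.append((0, 1))  # saucers before cups
# The lower-rack items stay unordered, so lower.precedes remains empty.

print(depth(root))   # 3
print(leaves(root))  # ['plates', 'mugs', 'utensils', 'saucers', 'cups']
```

Keeping the relations on the parent node means each node carries everything needed to write its own specification, which is convenient for the per-node generation step that follows.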
Generation of Task-wise Flat LTL Specifications
Once the HTT representation is obtained, a single flat LTL specification is generated for each node using a breadth-first search algorithm.
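As a rough illustration of the traversal, the sketch below walks a toy tree in breadth-first order and emits one flat formula per non-leaf task. Several things here are assumptions rather than the paper's method: the dictionary encoding, the two-valued `order` flag, the `flat_ltl` and `specs_by_bfs` names, the rule-based formula templates that stand in for the actual generation step, and the choice to treat leaf tasks as atomic propositions without their own formula.

```python
from collections import deque
from typing import Dict, List

# Hypothetical HTT encoding: each non-leaf task maps to its children and
# a flag saying whether the children are ordered. Leaves have no entry.
HTT: Dict[str, dict] = {
    "load_dishwasher": {"children": ["lower_rack", "upper_rack"],
                        "order": "sequential"},
    "lower_rack": {"children": ["plates", "mugs", "utensils"],
                   "order": "parallel"},
    "upper_rack": {"children": ["saucers", "cups"],
                   "order": "sequential"},
}


def flat_ltl(children: List[str], order: str) -> str:
    """Build a flat formula over the children of a single task."""
    if order == "parallel":
        # Every child must eventually be completed, in any order.
        return " ∧ ".join(f"♢{child}" for child in children)
    # Sequential: nest "eventually" from the last child outward,
    # e.g. [a, b, c] becomes ♢(a ∧ ♢(b ∧ ♢c)).
    formula = f"♢{children[-1]}"
    for child in reversed(children[:-1]):
        formula = f"♢({child} ∧ {formula})"
    return formula


def specs_by_bfs(tree: Dict[str, dict], root: str) -> Dict[str, str]:
    """Visit tasks level by level and return one formula per non-leaf task."""
    specs: Dict[str, str] = {}
    queue = deque([root])
    while queue:
        task = queue.popleft()
        node = tree.get(task)
        if node is None:
            continue  # leaf task: treated as an atomic proposition
        specs[task] = flat_ltl(node["children"], node["order"])
        queue.extend(node["children"])
    return specs


for task, formula in specs_by_bfs(HTT, "load_dishwasher").items():
    print(f"{task}: {formula}")
```

On this toy tree the three formulas have the same shape as the dishwasher specifications in the preliminaries, with higher-level tasks visited before lower-level ones.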
Experimental Results
The performance of the proposed method is evaluated in both simulated and real-world environments.
Mobile Manipulation Tasks in AI2-THOR
Tasks from the ALFRED dataset are used to create derivative tasks, which are categorized based on the number of base tasks. The results show that the proposed method achieves higher success rates and lower costs compared to SMART-LLM.
Real-World Rearrangement Experiments Involving Human Participants
A tabletop experiment with a robotic arm placing fruits and vegetables onto colored plates is conducted. The results demonstrate the adaptability of the method to various verbal styles and its effectiveness compared to existing methods.
Multi-Robot Handover Tasks
The execution of pick-and-place tasks involving multiple objects by four fixed robot arms is examined. The results indicate that the proposed method effectively manages multi-stage handover tasks.
Conclusions and Limitations
The proposed method transforms unstructured language into a structured formal representation with a hierarchical structure. The simulation and real-world experiment outcomes demonstrate that the framework offers an intuitive and user-friendly approach for deploying robots in daily situations.
Limitations
- Open Loop Operation: The framework operates as an open loop without feedback. Integrating a syntax checker and a semantic checker is essential for transitioning to a closed-loop system.
- Static HTT Representation: Once created, the HTT representation remains unchanged. To handle instructions composed of more base tasks, the HTT would need to be restructured so that the number of child tasks under any single parent task is limited.