Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy

Authors:

Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu

Introduction

Long-horizon planning in multi-robot systems faces challenges such as uncertainty accumulation, computational complexity, delayed rewards, and incomplete information. This paper proposes exploiting the task hierarchy implicit in human instructions to facilitate multi-robot planning. Using Large Language Models (LLMs), the authors introduce a two-step method that translates multi-sentence instructions into a structured language, hierarchical Linear Temporal Logic (hierarchical LTL), which serves as a formal representation for planning.

Related Work

Language-Conditioned Robotic Planning

There are two primary methods for generating actions from instructions:
1. Deep-Learning Techniques: These translate instructions into low-level actions, such as joint states. Examples include Open-X Embodiment and Octo.
2. Intermediate Representation: Instructions are first translated into an intermediate representation, then off-the-shelf solvers generate actions. This approach reduces the need for extensive data.

Natural Language to Temporal Logic

Temporal logics are well suited to goals that involve temporal constraints and to providing performance assurances. Early work on translating natural language into temporal logic used grammar-based methods, while recent efforts leverage LLMs. However, these approaches focus on the translation process itself and do not tackle the challenges of language grounding in robotics.

LLMs to Multi-Robots

Recent work adapts LLMs to multi-robot systems. Examples include SMART-LLM, which synthesizes code for task decomposition and allocation, and RoCo, which uses a dialogue-based approach for task coordination. However, these works primarily focus on finding feasible solutions rather than optimizing cost and time.

Preliminary

Linear Temporal Logic (LTL)

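LTL formulas are built from atomic propositions and operators such as conjunction (∧), negation (¬), next (○), and until (U). Derived operators such as eventually (♢), where ♢ϕ abbreviates true U ϕ, are used in the examples that follow. The paper focuses on syntactically co-safe LTL (sc-LTL), which is suitable for reasoning about robot tasks with finite durations.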
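The operators listed above can be made concrete with a small evaluator over finite traces. The following is a minimal sketch, not the paper's implementation: the class names, the `holds` function, and the choice of a "strong" next (false at the last step of the trace) are all assumptions made for illustration. Eventually is derived as true U ϕ.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class TrueF:
    """The constant 'true' (hypothetical helper used to derive 'eventually')."""


@dataclass(frozen=True)
class AP:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    arg: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"


Formula = Union[TrueF, AP, Not, And, Next, Until]


def eventually(f: Formula) -> Formula:
    """Derived operator: eventually f is (true U f)."""
    return Until(TrueF(), f)


def holds(f: Formula, trace: List[Set[str]], i: int = 0) -> bool:
    """Check f at position i of a non-empty finite trace.

    Each trace element is the set of atomic propositions true at that step.
    Assumption: 'next' is false at the final step (strong next).
    """
    if isinstance(f, TrueF):
        return True
    if isinstance(f, AP):
        return f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.arg, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Next):
        return i + 1 < len(trace) and holds(f.arg, trace, i + 1)
    if isinstance(f, Until):
        # right must hold at some j >= i, with left holding at every step before j
        for j in range(i, len(trace)):
            if holds(f.right, trace, j):
                return True
            if not holds(f.left, trace, j):
                return False
        return False
    raise TypeError(f"unknown formula type: {type(f).__name__}")


# "Eventually plates, and after that eventually cups"
spec = eventually(And(AP("plates"), eventually(AP("cups"))))
in_order = holds(spec, [set(), {"plates"}, {"cups"}])
out_of_order = holds(spec, [{"cups"}, {"plates"}])
```

In this sketch `in_order` is true and `out_of_order` is false, because the nested eventually requires `cups` to occur at or after the step where `plates` holds.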

Hierarchical LTL

Hierarchical LTL includes multiple levels, where each specification at a higher level is constructed from specifications at a lower level. This hierarchical structure aligns well with human instructions.

Example: Dishwasher Loading Problem

The hierarchical LTL for loading a dishwasher is:
– Level 1: ϕ_1^1 = ♢(ϕ_2^1 ∧ ♢ϕ_2^2)
– Level 2: ϕ_2^1 = ♢π^l_plates ∧ ♢π^l_mugs ∧ ♢π^l_utensils
– Level 2: ϕ_2^2 = ♢(π^u_saucers ∧ ♢π^u_cups)

Here ϕ_k^i denotes the i-th specification at level k.
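The level structure of this example can be checked mechanically. The sketch below stores each specification as a string keyed by (level, index) and verifies that every specification referenced inside a formula exists and sits at a deeper level. The ASCII encoding is an assumption made for illustration: `F` stands for ♢, `&` for ∧, `phi_K_I` for the I-th specification at level K, and names such as `l_plates` for the atomic propositions. None of these names come from the paper.

```python
import re
from typing import Dict, List, Tuple

SpecKey = Tuple[int, int]  # (level, index)

# Dishwasher example in an assumed ASCII encoding (F = eventually, & = and).
SPECS: Dict[SpecKey, str] = {
    (1, 1): "F(phi_2_1 & F phi_2_2)",
    (2, 1): "F l_plates & F l_mugs & F l_utensils",
    (2, 2): "F(u_saucers & F u_cups)",
}

# Matches references to other specifications, e.g. "phi_2_1".
REF = re.compile(r"phi_(\d+)_(\d+)")


def referenced_specs(formula: str) -> List[SpecKey]:
    """Return the (level, index) keys of all specifications named in a formula."""
    return [(int(level), int(index)) for level, index in REF.findall(formula)]


def validate(specs: Dict[SpecKey, str]) -> List[str]:
    """Collect structural problems; an empty list means the hierarchy is consistent."""
    problems = []
    for (level, index), formula in specs.items():
        for child in referenced_specs(formula):
            if child not in specs:
                problems.append(f"phi_{level}_{index} references undefined phi_{child[0]}_{child[1]}")
            elif child[0] <= level:
                problems.append(f"phi_{level}_{index} references phi_{child[0]}_{child[1]} at a non-deeper level")
    return problems


problems = validate(SPECS)
# Specifications that mention only atomic propositions form the bottom of the hierarchy.
leaf_specs = [key for key, formula in SPECS.items() if not referenced_specs(formula)]
```

For the dishwasher example, `problems` is empty and `leaf_specs` contains the two level-2 specifications, which matches the structure in which the level-1 specification is composed only of level-2 specifications.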

Natural Language to Hierarchical LTL

The authors propose a two-stage method for translating natural language into hierarchical LTL using an intermediary structure known as the Hierarchical Task Tree (HTT).

Conversion from Human Instructions to Hierarchical Task Tree

1. HTT without Temporal Relations: LLMs decompose the overarching task into a structured hierarchy.
2. Add Temporal Relations: LLMs determine the temporal relations between sibling tasks within the HTT.

Generation of Task-wise Flat LTL Specifications

Once the HTT representation is obtained, a single flat LTL specification is generated for each node using a breadth-first search algorithm.
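The traversal can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `TaskNode` class, the `before` field for sibling ordering, and the rule-based template in `flat_ltl` are hypothetical stand-ins for whatever translation step the paper uses, and leaf nodes are treated as atomic propositions that need no specification of their own. The ASCII encoding uses `F` for ♢ and `&` for ∧.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskNode:
    """One node of a hierarchical task tree (hypothetical structure)."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)
    # (a, b) means sibling task a must be completed before sibling task b.
    before: List[Tuple[str, str]] = field(default_factory=list)


def flat_ltl(node: TaskNode) -> str:
    """Build one flat formula from a node's children and their ordering.

    Template (an assumption): ordered pairs become F(a & F b);
    children without any ordering constraint become F c.
    """
    ordered = {name for pair in node.before for name in pair}
    parts = [f"F({a} & F {b})" for a, b in node.before]
    parts += [f"F {c.name}" for c in node.children if c.name not in ordered]
    return " & ".join(parts)


def specs_by_bfs(root: TaskNode) -> Dict[str, str]:
    """Visit the tree breadth-first and emit one flat formula per non-leaf node."""
    specs: Dict[str, str] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.children:  # leaves are atomic propositions in this sketch
            specs[node.name] = flat_ltl(node)
            queue.extend(node.children)
    return specs


# Dishwasher example rebuilt as a tree.
phi_2_1 = TaskNode("phi_2_1", [TaskNode("l_plates"), TaskNode("l_mugs"), TaskNode("l_utensils")])
phi_2_2 = TaskNode("phi_2_2", [TaskNode("u_saucers"), TaskNode("u_cups")],
                   before=[("u_saucers", "u_cups")])
root = TaskNode("phi_1_1", [phi_2_1, phi_2_2], before=[("phi_2_1", "phi_2_2")])

specs = specs_by_bfs(root)
```

Because Python dictionaries preserve insertion order, the keys of `specs` appear in breadth-first order, top level first. With this template the result mirrors the dishwasher formulas given earlier in ASCII form.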

Experimental Results

The performance of the proposed method is evaluated in both simulated and real-world environments.

Mobile Manipulation Tasks in AI2-THOR

Tasks from the ALFRED dataset are used to create derivative tasks, which are categorized based on the number of base tasks. The results show that the proposed method achieves higher success rates and lower costs compared to SMART-LLM.

Real-World Rearrangement Experiments Involving Human Participants

A tabletop experiment with a robotic arm placing fruits and vegetables onto colored plates is conducted. The results demonstrate the adaptability of the method to various verbal styles and its effectiveness compared to existing methods.

Multi-Robot Handover Tasks

The execution of pick-and-place tasks involving multiple objects by four fixed robot arms is examined. The results indicate that the proposed method effectively manages multi-stage handover tasks.

Conclusions and Limitations

The proposed method transforms unstructured language into a structured formal representation with a hierarchical structure. The simulation and real-world experiment outcomes demonstrate that the framework offers an intuitive and user-friendly approach for deploying robots in daily situations.

Limitations

Open Loop Operation: The framework operates as an open loop without feedback. Integrating a syntax checker and a semantic checker is essential for transitioning to a closed-loop system.
Static HTT Representation: Once created, the HTT representation remains unchanged. To handle tasks with more base tasks, it is necessary to restructure the HTT to restrict the number of child tasks a single parent task has.