AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

Authors:

Guhong Chen、Liyang Fan、Zihan Gong、Nan Xie、Zixuan Li、Ziqiang Liu、Chengming Li、Qiang Qu、Shiwen Ni、Min Yang

Paper:

Introduction

Artificial intelligence (AI) technologies, particularly large language models (LLMs), are rapidly transforming the traditional legal industry. From automated text generation to interactive legal consulting, AI applications in the legal domain are becoming increasingly widespread. However, significant challenges remain in handling complex legal queries and simulating real court environments. Existing legal AI systems often struggle to comprehensively simulate the legal reasoning process and multi-party interactions.

To address these limitations, the paper presents AgentCourt, an innovative LLM-based system designed for the simulation of civil courts. AgentCourt involves multiple roles, including judges, attorneys, plaintiffs, and defendants, providing a more authentic and comprehensive legal scenario simulation. This system excels not only in handling standard legal queries but also in analyzing complex real-world cases.

Related Work

Large Language Models in the Legal Domain

AI applications in the legal domain have made significant strides, particularly with the advent of LLMs. These models have shown potential in various legal tasks, including case prediction, legal research, and document analysis. Despite these advancements, current AI legal systems still face limitations in handling complex legal queries and simulating real court environments.

Large Language Models for Real World Simulation

LLM-based multi-agent systems represent a new direction in AI research, leveraging collaborative agents to address complex problems. These systems excel in utilizing cognitive synergy and knowledge sharing, thereby enhancing overall decision quality and interaction capabilities. Notable applications include Agent Hospital, which simulates a hospital environment with LLM-driven autonomous agents representing patients, nurses, and doctors.

Court Simulacrum Visualization Settings

Inspired by previous studies, the court sandbox simulation environment is designed using Pygame to clearly demonstrate the entire process within the court. Two distinct scenarios are managed: the law firm, where plaintiffs and defendants interact with their lawyers, and the court, where the trial proceedings take place.

Agent Settings

To accurately recreate real litigation scenarios, six distinct roles are designed: plaintiff, defendant, plaintiff’s lawyer, defendant’s lawyer, judge, and court clerk. Each role is powered by ERNIE-Speed-128K and can be easily extended.

Plaintiff and Defendant Agents

The simulation begins before a case has occurred, necessitating two agents to play the roles of a potential plaintiff and defendant. These agents autonomously seek legal assistance from a law firm, obtaining a complaint or an answer during their interactions with the lawyer.

Lawyer Agents

Two lawyer agents are designed to communicate with their respective clients to gather relevant information about the case. They engage in court debates in accordance with the prescribed procedures, championing the interests of their respective clients.

Judge Agent

In the court, the judge is responsible for overseeing the entire process, listening to the arguments from both lawyers, and asking questions when appropriate. The judge summarizes and evaluates each round of the lawyers’ arguments before delivering the final judgment.

Court Clerk Agent

To create a more realistic court environment and to facilitate the evolution of the agent, a court clerk agent is designed to announce the commencement of the trial and document the entire process of the trial.

Data Settings and Processing

The data settings and processing methodology encompass regularized filtering, BERT embedding, and privacy masking. Leveraging the China Judgement Website, a dataset of 10,000 civil judgements was compiled. Preprocessing focused on enhancing dataset quality, resulting in a curated dataset of 1,000 training and 50 test samples.

Interactions Settings

Case Generation

Two resident agents randomly encounter cases and seek legal assistance from law firms, transforming into Plaintiff and Defendant Agents. The law firm randomly assigns lawyer agents to communicate with them and submit the complaint and defense documents to the court.

Court Proceedings

The full process of the simulation is illustrated in Figure 4. The DEBATE ROUNDS parameter is a hyperparameter set to accelerate the iteration process. This can be adjusted based on specific requirements or to more closely simulate real-world proceedings.

Adversarial Evolutionary Approach

An adversarial evolutionary strategy is introduced to advance lawyer proficiency within court simulations driven by LLMs. This strategy dramatically cuts annotation expenses while augmenting the model’s ability to generalize across diverse and complicated court scenarios. The strategy’s effectiveness hinges upon three fundamental modules: the experience database, case database, and legal code database.

Database Design

Construction of Experience Repository

Learning from past litigation experiences is essential for the development of an AI agent in legal contexts. The repository of experiences focuses on broad strategies and overarching directions, equipping the legal agent with a comprehensive perspective in case handling.

Development of Case Repository

A case library emphasizes the retention of individual cases themselves. Referencing prior cases and their judgments is imperative, enabling legal agents to swiftly amass a wealth of cases for future citation.

Establishment of Legal Code Repository

Possessing comprehensive and professional legal knowledge is of utmost importance. Legal agents review the fundamental aspects of the case and past exchanges during court proceedings, reflecting on which legal provisions could have improved their responses.

Experiment

Experimental Setup

The simulated court is based on 1,000 real-world civil cases. The experiment includes two parts: automatic evaluation and manual evaluation.

Automatic Evaluation Tasks

The LawBench assessment metrics were applied to conduct an automatic evaluation of the model. Tasks include Article Recitation, Dispute Focus Identification, Issue Topic Identification, and Consultation.

Manual Evaluation Tasks

Human experts evaluated the performance of the AI agents in simulated court debates based on three key dimensions: Cognitive Agility, Professional Knowledge, and Logical Rigor.

Evaluation Methods

Automatic Evaluation

The experimental baseline was adopted from ERNIE-Speed-128K, and the model consists of the two legal agents that have undergone evolution.

Manual Evaluation

A comprehensive human evaluation experiment with a double-blind controlled design was conducted. Five legal experts independently reviewed all records using a binary choice scoring system for each dimension.

Experimental Results

Automatic Evaluation Results

The two evolved agents show noticeable improvements across all four tasks, attributed to the court simulation process and evolution strategy.

Manual Evaluation Results

The evolved version of CourseAgent demonstrated significant improvement across all three evaluation dimensions, particularly in Professional Knowledge and Logical Rigor.

Case Analysis

Automatic Evaluation: Article Recitation Task

The agents were prompted to provide the content of article 43 of the Labor Law. The comparison demonstrates the evolved agent’s improved ability to accurately recall and cite legal content.

Manual Evaluation: Expert Analysis

Legal experts evaluated the performance across three dimensions: cognitive agility, professional knowledge, and logical rigor. The evolved agent demonstrated superior agility, deeper legal knowledge, and more structured and rigorous arguments.

Conclusion

AgentCourt is a novel simulation system for courtroom scenarios based on LLMs and agent technology. It facilitates multi-party interactions and intricate legal reasoning through a parameter-free approach. Evaluations show significant enhancements in knowledge acquisition, response adaptability, and argumentation quality as lawyer agents evolve via adversarial interactions. By open-sourcing the privacy-anonymized dataset and code, the aim is to drive the Legal AI field and potentially revolutionize legal practice through continuous improvement and optimization.

What's Hot

AAAI.2024 – Humans and AI

How Diffusion Models Learn to Factorize and Compose

Temporal Fairness in Decision Making Problems

AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

AAAI.2024 – Humans and AI

How Diffusion Models Learn to Factorize and Compose

Temporal Fairness in Decision Making Problems

NeCo: Improving DINOv2’s spatial representations in 19 GPU hours with Patch Neighbor Consistency

AAAI.2024 – Humans and AI

How Diffusion Models Learn to Factorize and Compose

Temporal Fairness in Decision Making Problems

NeCo: Improving DINOv2’s spatial representations in 19 GPU hours with Patch Neighbor Consistency

Our Picks

AAAI.2024 – Humans and AI

How Diffusion Models Learn to Factorize and Compose

Temporal Fairness in Decision Making Problems

Subscribe to Updates

What's Hot

AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

Authors:

Paper:

Introduction

Related Work

Large Language Models in the Legal Domain

Large Language Models for Real World Simulation

Court Simulacrum Visualization Settings

Agent Settings

Plaintiff and Defendant Agents

Lawyer Agents

Judge Agent

Court Clerk Agent

Data Settings and Processing

Interactions Settings

Case Generation

Court Proceedings

Adversarial Evolutionary Approach

Database Design

Construction of Experience Repository

Development of Case Repository

Establishment of Legal Code Repository

Experiment

Experimental Setup

Automatic Evaluation Tasks

Manual Evaluation Tasks

Evaluation Methods

Automatic Evaluation

Manual Evaluation

Experimental Results

Automatic Evaluation Results

Manual Evaluation Results

Case Analysis

Automatic Evaluation: Article Recitation Task

Manual Evaluation: Expert Analysis

Conclusion

Related Posts