Authors:
Paper: https://arxiv.org/abs/2408.08682
Introduction
Point cloud data is essential for applications such as autonomous driving and virtual reality. The challenge lies in compressing this data efficiently while preserving its intricate 3D structure. Traditional methods, both voxel-based and tree-based, are limited in context modeling by constraints on data volume and model size. This paper introduces a novel approach that leverages large language models (LLMs) for point cloud geometry compression (PCGC).
Framework Overview
Encoding Pipeline
The encoding process begins with clustering the input 3D point cloud. Each cluster then undergoes the following steps (a simplified sketch follows the list):
- Normalization: Coordinates are normalized by subtracting an offset.
- K-Tree Structuring: The data is organized into a K-tree structure.
- Flattening and Chunking: The hierarchical tree structure is flattened and divided into segments.
- Token Conversion: Point cloud tokens are translated into text tokens using a codebook.
- Probability Prediction: A trained Low-Rank Adaptation (LoRA) module attached to a frozen LLM predicts the probability distribution of the next token.
- Arithmetic Encoding: The probability distributions are fed into an arithmetic encoder, generating the encoded bitstream.
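To make these steps concrete, below is a minimal Python sketch of the encoding-side data preparation. It assumes an octree-style serialization (K = 8), a fixed chunk size, and a toy one-to-one codebook; the paper's actual K-tree construction, chunking scheme, and codebook are not reproduced here, and the LLM and arithmetic-coding stages are only indicated in comments.

```python
# Minimal sketch of the encoding-side data preparation, assuming an
# octree-style serialization (K = 8) and a toy codebook; the paper's
# actual K-tree construction, chunk size, and codebook differ.
import numpy as np

def normalize(points: np.ndarray):
    """Subtract a per-cluster offset so coordinates start at the origin."""
    offset = points.min(axis=0)
    return points - offset, offset

def octree_occupancy_codes(points: np.ndarray, depth: int):
    """Serialize a voxelized cluster into per-node occupancy bytes (breadth-first)."""
    codes = []
    nodes = [(np.zeros(3, dtype=np.int64), points.astype(np.int64))]
    for level in range(depth):
        shift = depth - level - 1
        next_nodes = []
        for origin, pts in nodes:
            child_idx = (pts - origin) >> shift              # 0/1 per axis
            child_ids = child_idx[:, 0] * 4 + child_idx[:, 1] * 2 + child_idx[:, 2]
            code = 0
            for c in np.unique(child_ids):
                code |= 1 << int(c)
                child_origin = origin + child_idx[child_ids == c][0] * (1 << shift)
                next_nodes.append((child_origin, pts[child_ids == c]))
            codes.append(code)                               # one byte per node
        nodes = next_nodes
    return codes

def to_text_tokens(codes, chunk_size=512):
    """Flatten, chunk, and map each occupancy byte to a (hypothetical) text token."""
    codebook = {i: f"<pc_{i}>" for i in range(256)}          # assumed 1-to-1 codebook
    chunks = [codes[i:i + chunk_size] for i in range(0, len(codes), chunk_size)]
    return [[codebook[c] for c in chunk] for chunk in chunks]

# Each chunk of text tokens would then be fed to the frozen LLM (with LoRA)
# to predict next-token probabilities, which drive an arithmetic encoder.
pts = np.random.randint(0, 64, size=(2000, 3))
normalized, offset = normalize(pts)
token_chunks = to_text_tokens(octree_occupancy_codes(normalized, depth=6))
```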
Decoding Pipeline
The decoding process mirrors the encoding steps in reverse (a companion sketch follows the list):
- Binary File Processing: The binary file is split and its bits are converted to decimal values.
- Arithmetic Decoding: The main bitstream is decoded using the probability distributions predicted by the frozen LLM with its LoRA module.
- Token Mapping: Text tokens are translated back into point cloud patch tokens.
- Reconstruction: Patches are aligned and merged based on offsets and indices, and the K-tree structure is used to reconstruct each clustered point cloud.
- Post-Reconstruction: The cluster offsets are added back to rebuild the original point cloud.
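Continuing the encoding sketch above (and reusing its `token_chunks`, `offset`, and toy codebook assumptions), the decoding side can be outlined as follows; the arithmetic decoder that actually produces the text tokens from the bitstream is omitted.

```python
# Minimal sketch of the decoding side, assuming the same toy codebook and
# octree-style (K = 8) serialization as the encoding sketch above; the
# arithmetic decoder that produces the text tokens is omitted.
import numpy as np

def from_text_tokens(token_chunks):
    """Map text tokens back to occupancy bytes using the inverse codebook."""
    inverse = {f"<pc_{i}>": i for i in range(256)}           # assumed 1-to-1 codebook
    return [inverse[t] for chunk in token_chunks for t in chunk]

def rebuild_points(codes, depth: int):
    """Walk the breadth-first occupancy codes and recover voxel coordinates."""
    codes = iter(codes)
    nodes = [np.zeros(3, dtype=np.int64)]                    # root origin
    for level in range(depth):
        shift = depth - level - 1
        next_nodes = []
        for origin in nodes:
            code = next(codes)
            for c in range(8):
                if code & (1 << c):
                    child = np.array([(c >> 2) & 1, (c >> 1) & 1, c & 1])
                    next_nodes.append(origin + child * (1 << shift))
        nodes = next_nodes
    return np.stack(nodes)

# Reconstruction: rebuild the cluster, then re-apply the stored offset
# (token_chunks and offset come from the encoding sketch above).
points = rebuild_points(from_text_tokens(token_chunks), depth=6) + offset
```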
Experiments
Training and Testing Setting
The experiments involved training two sets of LLM-PCGC parameters on different datasets to ensure a fair comparison. The datasets used include Microsoft Voxelized Upper Bodies (MVUB), 8i Voxelized Full Bodies (MPEG 8i), ShapeNet, and ModelNet40. Testing followed the common test condition (CTC) recommendations, evaluating on the MPEG 8i and Owlii datasets.
Base Model and LoRA Setting
The base model was LLaMA2-7B, an open-source LLM. Only its LoRA and embedding modules were fine-tuned, with the trainable parameters amounting to just 6.7% of the original model's parameters.
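As a rough illustration of this kind of setup, the sketch below configures LoRA adapters plus a trainable embedding layer on LLaMA2-7B using the Hugging Face PEFT library. The rank, alpha, and target modules shown are assumptions for illustration, not the paper's reported configuration.

```python
# A minimal LoRA setup sketch with Hugging Face PEFT; the rank, alpha,
# target modules, and saved modules are assumptions, not the paper's values.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,                                   # assumed LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens"],       # also fine-tune the embedding layer
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()          # reports the trainable-parameter fraction
```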
Experiment Results
The LLM-PCGC method demonstrated significant performance improvements:
- Bit Rate Reduction: Achieved a -40.213% bit rate reduction compared to the reference software of the MPEG G-PCC standard, and a -2.267% reduction compared to the state-of-the-art learning-based method.
- Compression Efficiency: Recorded an average of 0.52 bits per point (bpp) on the MPEG 8i dataset, outperforming other autoregressive methods (the bpp metric is illustrated below).
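For reference, bits per point is simply the compressed bitstream size in bits divided by the number of points in the cloud; the tiny example below uses made-up numbers purely to show the computation.

```python
# Bits per point (bpp): compressed size in bits divided by point count.
# The figures below are made up purely to illustrate the computation.
compressed_bytes = 50_000          # hypothetical bitstream size for one frame
num_points = 765_000               # hypothetical number of points in the frame
bpp = compressed_bytes * 8 / num_points
print(f"{bpp:.3f} bpp")            # -> 0.523 bpp for these made-up numbers
```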
Conclusion
The LLM-PCGC method is the first to employ LLMs as compressors for point cloud data within the “Generator is compressor” framework. By utilizing adaptation techniques like clustering, K-tree, token mapping invariance, and LoRA, the method achieves efficient cross-modality representation alignment and semantic consistency. Experimental results show superior compression performance over existing methods, highlighting the potential of LLMs in data compression. Future research can focus on optimizing memory consumption and inference time.