Retail-GPT: leveraging Retrieval Augmented Generation (RAG) for building E-commerce Chat Assistants

Authors:

Bruno Amaral Teixeira de Freitas, Roberto de Alencar Lotufo

Introduction

The advent of Large Language Models (LLMs), particularly with the release of OpenAI’s GPT series, has revolutionized human-machine textual interactions. These models have enabled the creation of chat assistants, or chatbots, that can engage in more natural conversations and better understand user needs. When combined with Retrieval-Augmented Generation (RAG) techniques, these models can interact with external software systems, enhancing their capabilities with data retrieved from external sources.

In the realm of e-commerce, the potential for such systems is immense. With global retail e-commerce sales projected to surpass 6.3 trillion US dollars by 2024, enhancing the customer experience in online shopping holds significant commercial value. This study introduces Retail-GPT, an open-source RAG-based chatbot designed to guide users through product recommendations and assist with cart operations, aiming to enhance user engagement and serve as a virtual sales agent.

Related Work

Several examples in the literature highlight the application of RAG techniques in various domains:

FACTS: NVIDIA’s framework for building assistants that leverage enterprise data to enhance employee productivity.
Healthcare Agents: Abbasian et al. developed health agents focused on assisting users with healthcare-related tasks.

These examples demonstrate the versatility of RAG-based systems in different contexts. However, the application of such systems in the e-commerce domain, specifically for enhancing customer experience in online shopping, remains underexplored.

Research Methodology

The development of Retail-GPT focused on creating a cross-platform chatbot adaptable to various e-commerce domains. The primary objectives were:

Natural Conversations: The chatbot should engage in natural conversations, mimicking human interaction.
Accurate Interpretation: It should accurately interpret user demands, check product availability, and perform tasks like adding or removing products from the cart.
Controlled Responses: To mitigate the unpredictability and hallucination tendencies of LLMs, the system employs a combination of pre-written responses and a transformer-based message classifier.

The chatbot’s interaction flow begins with an initial form to gather essential data such as delivery addresses and registration information. It then engages in conversation, providing product recommendations and performing cart operations based on user inputs. The final step involves controlled questions regarding the checkout process.
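The three-step flow described above can be read as a small state machine. The following is a minimal sketch under that reading, not the authors' implementation; the stage names, the form field names, and the `wants_checkout` / `checkout_confirmed` flags are all hypothetical.

```python
from enum import Enum, auto


class Stage(Enum):
    FORM = auto()          # collect delivery address and registration data
    CONVERSATION = auto()  # product recommendations and cart operations
    CHECKOUT = auto()      # controlled questions about the checkout
    DONE = auto()


# Hypothetical field names; the source only says the form gathers
# delivery addresses and registration information.
REQUIRED_FIELDS = ("name", "delivery_address")


def next_stage(stage: Stage, form: dict, wants_checkout: bool = False,
               checkout_confirmed: bool = False) -> Stage:
    """Return the stage that follows `stage` given the session data."""
    if stage is Stage.FORM:
        # Stay in the form until every required field has a value.
        complete = all(form.get(field) for field in REQUIRED_FIELDS)
        return Stage.CONVERSATION if complete else Stage.FORM
    if stage is Stage.CONVERSATION:
        # Free conversation continues until the user asks to check out.
        return Stage.CHECKOUT if wants_checkout else Stage.CONVERSATION
    if stage is Stage.CHECKOUT:
        return Stage.DONE if checkout_confirmed else Stage.CHECKOUT
    return Stage.DONE
```

Keeping the form and checkout steps outside the free conversation is what makes them "controlled": only the middle stage needs open-ended language understanding.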

Experimental Design

The system architecture combines RAG and an NLP pipeline built around a message classifier. The process flow is as follows:

Message Tokenization: User messages are tokenized and converted into vector embeddings by the featurizer.
DIET Classifier: The embeddings are fed into a DIET (Dual Intent and Entity Transformer) classifier, which predicts the intent of the message and extracts entities from it; these predictions determine the response.
LLM Subsystem: For complex or contextually relevant responses, the classifier delegates message processing to the LLM-based subsystem. This subsystem includes input guardrails to block inappropriate content and a prompt that defines function calls for product search, cart editing, and purchase completion.
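The routing between pre-written responses and the LLM subsystem can be sketched as follows. This is an illustrative simplification, not Retail-GPT's code: the confidence threshold, the intent names, the canned texts, and the keyword-based guardrail are all assumptions, and the `llm` argument stands in for whatever function calls the model.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off, not taken from the paper

# Hypothetical intents with pre-written answers.
CANNED_RESPONSES = {
    "greet": "Hello! How can I help you with your shopping today?",
    "goodbye": "Thank you for your visit. See you soon!",
}

# Toy guardrail: a real system would use a far more robust check.
BLOCKED_TERMS = ("ignore previous instructions",)
REFUSAL = "Sorry, I can't help with that request."


@dataclass
class Classification:
    intent: str        # intent predicted by the message classifier
    confidence: float  # classifier confidence in [0, 1]


def passes_input_guardrail(message: str) -> bool:
    """Reject messages that contain any blocked term (case-insensitive)."""
    lowered = message.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def route(message: str, classification: Classification,
          llm: Callable[[str], str]) -> str:
    """Answer with a canned response when possible, else delegate to the LLM."""
    canned = CANNED_RESPONSES.get(classification.intent)
    if canned is not None and classification.confidence >= CONFIDENCE_THRESHOLD:
        return canned
    # Everything else goes to the LLM subsystem, behind the input guardrail.
    if not passes_input_guardrail(message):
        return REFUSAL
    return llm(message)
```

With this split, predictable turns never reach the LLM, which limits both cost and the surface exposed to hallucination.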

The system was implemented using Python 3, with the NLP pipeline built through Rasa and SpaCy’s en_core_web_lg pipeline. The chosen LLM was GPT-4o, accessed via OpenAI’s API. For demonstration purposes, the system was developed as an assistant for a fictional convenience store, featuring around 50 products.
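One common way to expose functions such as product search and cart editing to a model behind OpenAI's API is the JSON-schema "tools" format of the Chat Completions endpoint. The summary does not say whether Retail-GPT uses this mechanism or describes its functions directly in the prompt text, so the sketch below is only an illustration: the tool names, parameters, and toy catalog are hypothetical, purchase completion is omitted for brevity, and no API call is made.

```python
import json

# Hypothetical tool definitions in the Chat Completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search the store catalog for matching products.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "edit_cart",
            "description": "Add (positive) or remove (negative) units of a product.",
            "parameters": {
                "type": "object",
                "properties": {
                    "product": {"type": "string"},
                    "quantity_delta": {"type": "integer"},
                },
                "required": ["product", "quantity_delta"],
            },
        },
    },
]

CATALOG = ("cola", "chips", "chocolate bar")  # toy stand-in for the store


def search_products(query: str) -> list[str]:
    """Return catalog items whose name contains the query."""
    return [p for p in CATALOG if query.lower() in p]


def edit_cart(cart: dict, product: str, quantity_delta: int) -> dict:
    """Return a new cart with the quantity of `product` adjusted."""
    if product not in CATALOG:
        # Guards against the model inventing a product name.
        raise ValueError(f"unknown product: {product}")
    new_quantity = cart.get(product, 0) + quantity_delta
    updated = dict(cart)
    if new_quantity > 0:
        updated[product] = new_quantity
    else:
        updated.pop(product, None)
    return updated


def dispatch(cart: dict, name: str, arguments_json: str):
    """Run the requested tool; the model supplies arguments as a JSON string."""
    args = json.loads(arguments_json)
    if name == "search_products":
        return search_products(**args)
    if name == "edit_cart":
        return edit_cart(cart, **args)
    raise ValueError(f"unknown tool: {name}")
```

Validating the tool name and the product against the catalog before acting is a cheap defence against malformed or hallucinated function calls.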

Results and Analysis

The source code for Retail-GPT is available on GitHub. Qualitative testing showed that Retail-GPT can hold conversations that meet the objectives described in the methodology. Figures 3 and 4 of the paper show real conversations in which the system guides a sale through cart operations and product recommendations.

Despite these promising results, the system made errors when interpreting user intentions and when generating function calls. Two experiments were conducted to evaluate its performance:

Tool Selection Accuracy: The system’s ability to select the appropriate tool (e.g., cart editing, product search) was tested on a handmade dataset. The results, reported in Table 1 of the paper, indicate that the system generally invoked the correct tool but was vulnerable to hallucinations and to unnecessary product searches.
Security and Consistency: The system’s robustness against prompt injections, corner-case messages, and off-topic messages was tested. As reported in Table 2 of the paper, the chatbot handled off-topic messages and requests for unavailable information well. However, it was susceptible to prompt injections, highlighting the need for stricter guardrails.
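A tool selection evaluation of the kind described above can be scored with a few lines of code. The sketch below assumes each test message is labeled with the tool it should trigger (or `None` when no tool is expected); the function name and the sample labels in the usage note are made up for illustration and are not results from the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def tool_selection_report(expected: Sequence[Optional[str]],
                          predicted: Sequence[Optional[str]]):
    """Return (accuracy, errors) for a labeled tool selection dataset.

    `errors` counts each (expected, predicted) pair that disagrees, which
    makes patterns such as unnecessary product searches easy to spot.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("the dataset is empty")
    errors = Counter(
        (exp, pred) for exp, pred in zip(expected, predicted) if exp != pred
    )
    correct = len(expected) - sum(errors.values())
    return correct / len(expected), errors
```

For example, with the invented labels `expected = ["search_products", "edit_cart", None, "edit_cart"]` and `predicted = ["search_products", "edit_cart", "search_products", "edit_cart"]`, the function returns an accuracy of 0.75 and a single error, `(None, "search_products")`, which corresponds to a product search triggered when no tool was needed.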

Overall Conclusion

Retail-GPT represents a significant step towards leveraging RAG for building e-commerce chat assistants. The system can manage conversations, guide users through purchases, make product recommendations, and perform cart operations. However, challenges such as hallucinations and security issues remain. Future work should focus on enhancing the system’s robustness and security to make it suitable for production environments.