Supervisor:

Dr. Tsui-Wei Weng

Posted:

Dec 14, 2023

Course:

DSC 210 FA’23 Numerical Linear Algebra


Team:



1. Introduction

Question-answering systems use Information Retrieval (IR) and Natural Language Processing (NLP) techniques to answer a user’s queries. While Open-Domain Question Answering (ODQA) systems draw on broad world knowledge, Closed-Domain Question Answering (CDQA) [1] systems answer questions that pertain to a specific domain or topic and are mostly limited to a set of available documents. These systems are often used in healthcare, finance, legal, and customer-support settings, and, importantly, organizations can customize them to suit their specific needs. By limiting their scope to a fixed set of documents, CDQA systems can provide highly accurate answers, especially in niche areas. Developing CDQA systems is challenging, however, because there are a multitude of domains, each with its own vocabulary, syntax, and semantics.

In our work, we focus specifically on an IR-based Closed-Domain Question Answering system, in which the system answers the user’s query after retrieving relevant documents from the given document set. The most relevant retrieved document is passed as context, along with the user’s query, to a Large Language Model (LLM) that generates the response returned to the user. This process is called Retrieval Augmented Generation (RAG) [2].
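The retrieve-then-generate flow described above can be sketched as follows. This is a deliberately minimal illustration: the retriever is a simple word-overlap scorer (not the embedding-based retrieval of a real system), the final LLM call is omitted (the sketch stops at prompt construction), and all function names and documents are illustrative assumptions rather than part of any real library.

```python
def retrieve_most_relevant(query: str, documents: list[str]) -> str:
    """Score each document by word overlap with the query; return the best one."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query: str, context: str) -> str:
    """Pass the retrieved document as context alongside the user's query."""
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy document set standing in for the domain's document collection.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping takes 5 to 7 business days for domestic orders.",
]

query = "How long does shipping take?"
context = retrieve_most_relevant(query, documents)
prompt = build_prompt(query, context)
# `prompt` would now be sent to a generative LLM to produce the answer.
```

The key design point is that the LLM never answers from its parametric memory alone: every query is grounded in a retrieved document, which is what mitigates hallucination in the closed-domain setting.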


1.1 History / Background

The last few years have seen a surge in LLMs such as GPT-3, ChatGPT, and LLaMA achieving human-level performance on tasks such as conversational AI, content generation, and language translation. However, these LLMs are prone to hallucinations: instances where the model generates responses that are factually incorrect or not grounded in the provided domain-specific information. This is a significant problem in closed-domain QA, whose goal is to provide accurate, domain-specific responses based on precise and factual information. If a language model hallucinates information that is not present in the given domain or context, it can produce misleading and unreliable answers, compromising the reliability of these systems and potentially impacting decision-making processes and user trust.

Retrieval Augmented Generation is a technique that combines both retrieval-based and generation-based approaches to improve the performance of large language models and mitigate issues like hallucinations. Retrieval-based methods involve retrieving relevant information from a knowledge base or a set of documents based on the input query. This ensures that the model has access to domain-specific knowledge, reducing the likelihood of hallucinations that might arise from extrapolating information from unrelated or incorrect sources.


1.2 Applications

Closed-Domain Question Answering with Retrieval Augmented Generation finds applications across domains where accurate and contextually grounded responses are essential: technical support chatbots, legal research assistance, healthcare information retrieval, and financial data analysis, among others. For example, if a doctor wants specific details of a particular patient’s case, a CDQA system built using RAG is more appropriate than an ODQA system. In such settings, the combination of retrieval and generation techniques enables more accurate, contextually grounded, and informative responses to user queries.

The state-of-the-art (SOTA) approach to this problem uses LLM-based embeddings for effective information retrieval, stores them in a vector database for fast querying, and uses a generative LLM to produce the answer to the user’s query.
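Under the hood, retrieval in this pipeline reduces to nearest-neighbor search over embedding vectors. The sketch below shows that linear-algebra core, with a toy bag-of-words "embedding" standing in for a real LLM embedding model; the vocabulary, `embed` function, and documents are all illustrative assumptions, and a production system would store the vectors in a vector database rather than a NumPy array.

```python
import numpy as np

# Toy stand-in for an LLM embedding model (illustrative assumption):
# real systems use learned dense embeddings, not bag-of-words counts.
VOCAB = ["shipping", "refund", "days", "returns", "orders", "policy"]

def embed(text: str) -> np.ndarray:
    """Map text to a unit-normalized vector, so dot product = cosine similarity."""
    words = text.lower().split()
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping takes 5 to 7 business days for domestic orders.",
]
# The "vector database": document embeddings precomputed and stored as rows.
doc_matrix = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are closest to the query's."""
    sims = doc_matrix @ embed(query)   # cosine similarities via one matrix-vector product
    top = np.argsort(-sims)[:k]        # indices of the k highest-similarity documents
    return [documents[i] for i in top]
```

Because the document vectors are unit-normalized, one matrix-vector product computes all cosine similarities at once; vector databases accelerate exactly this search with approximate nearest-neighbor indexes when the collection is large.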



2. Methodology