Comparing the Performance of LLMs in RAG-based Question-Answering - A Case Study in Computer Science Literature
Conference paper. Authors: Ranul Dayarathne, Uvini Ranaweera, Upeksha Ganegoda
Abstract: Retrieval Augmented Generation (RAG) is emerging as a powerful technique for enhancing the capabilities of generative AI models by reducing hallucination. The growing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing how different LLMs perform at question-answering (QA) across diverse domains. This study compares the performance of four open-source LLMs (Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct, and Orca-mini-v3-7b) and OpenAI's widely used GPT-3.5 on QA tasks over computer science literature with RAG support. Evaluation metrics employed in the study include accuracy and precision for binary questions, and ranking by a human expert, ranking by Google's AI model Gemini, and cosine similarity for long-answer questions. Paired with RAG, GPT-3.5 answers both binary and long-answer questions effectively, reaffirming its status as an advanced LLM. Among the open-source LLMs, Mistral AI's Mistral-7b-instruct paired with RAG surpasses the rest on both binary and long-answer questions. However, Orca-mini-v3-7b reports the shortest average latency in generating responses among the open-source models, whereas Meta's LLaMa2-7b-chat reports the highest. This research underscores that open-source LLMs can go hand in hand with proprietary models like GPT-3.5 when given better infrastructure.
Keywords: Retrieval Augmented Generation · Large Language Models · Question Answering
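To make the long-answer evaluation metric concrete, cosine similarity scores how closely the embedding of a generated answer matches the embedding of a reference answer. The sketch below is purely illustrative: the vectors are placeholders, and the paper's actual embedding model and evaluation pipeline are not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1, 1]; values near 1 indicate the generated
    answer's embedding points in nearly the same direction as the
    reference answer's embedding.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for answer embeddings.
generated = [0.8, 0.1, 0.3]
reference = [0.7, 0.2, 0.4]
score = cosine_similarity(generated, reference)
```

In practice both texts would first be encoded with the same embedding model, so that a higher score corresponds to greater semantic overlap between the generated and reference answers.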
Presented: 5th International Conference on Artificial Intelligence in Education Technologies (AIET 2024), hosted by the University of Barcelona, Spain
DOI: https://doi.org/10.1007/978-981-97-9255-9_26
Cite the paper:
@InProceedings{10.1007/978-981-97-9255-9_26,
author = "Dayarathne, Ranul and Ranaweera, Uvini and Ganegoda, Upeksha",
editor = "Schlippe, Tim and Cheng, Eric C. K. and Wang, Tianchong",
title = "Comparing the Performance of LLMs in RAG-Based Question-Answering: A Case Study in Computer Science Literature",
booktitle = "Artificial Intelligence in Education Technologies: New Development and Innovative Practices",
year = "2025",
publisher = "Springer Nature Singapore",
address = "Singapore",
pages = "387--403",
isbn = "978-981-97-9255-9"
}