Comparing the Performance of LLMs in RAG-based Question-Answering - A Case Study in Computer Science Literature
Conference paper. Authors: Ranul Dayarathne, Uvini Ranaweera, Upeksha Ganegoda
Abstract: Retrieval Augmented Generation (RAG) is emerging as a powerful technique for enhancing the capabilities of generative AI models by reducing hallucination. The growing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing how different LLMs perform at question-answering (QA) across diverse domains. This study compares the performance of four open-source LLMs (Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct, and Orca-mini-v3-7b) and OpenAI's widely used GPT-3.5 on QA tasks over computer science literature with RAG support. Evaluation metrics employed in the study include accuracy and precision for binary questions, and ranking by a human expert, ranking by Google's AI model Gemini, and cosine similarity for long-answer questions. Paired with RAG, GPT-3.5 answers both binary and long-answer questions effectively, reaffirming its status as an advanced LLM. Among the open-source LLMs, Mistral AI's Mistral-7b-instruct paired with RAG surpasses the rest on both binary and long-answer questions. However, Orca-mini-v3-7b reports the shortest average latency in generating responses among the open-source models, whereas Meta's LLaMa2-7b-chat reports the highest. This research underscores that open-source LLMs can go hand in hand with proprietary models like GPT-3.5 when given better infrastructure.
Keywords: Retrieval Augmented Generation · Large Language Models · Question Answering
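To make the long-answer evaluation metric concrete, cosine similarity scores how closely the embedding of a generated answer matches the embedding of a reference answer. The sketch below is purely illustrative: the vectors are placeholders, and the paper's actual embedding model and evaluation pipeline are not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1, 1]; values near 1 indicate the generated
    answer's embedding points in nearly the same direction as the
    reference answer's embedding.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for answer embeddings.
generated = [0.8, 0.1, 0.3]
reference = [0.7, 0.2, 0.4]
score = cosine_similarity(generated, reference)
```

In practice both texts would first be encoded with the same embedding model, so that a higher score corresponds to greater semantic overlap between the generated and reference answers.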
Presented: 5th International Conference on Artificial Intelligence in Education Technologies (AIET 2024), hosted by the University of Barcelona, Spain
DOI: https://doi.org/10.1007/978-981-97-9255-9_26
Cite the paper:
@InProceedings{10.1007/978-981-97-9255-9_26,
author = "Dayarathne, Ranul and Ranaweera, Uvini and Ganegoda, Upeksha",
editor = "Schlippe, Tim and Cheng, Eric C. K. and Wang, Tianchong",
title = "Comparing the Performance of LLMs in RAG-Based Question-Answering: A Case Study in Computer Science Literature",
booktitle = "Artificial Intelligence in Education Technologies: New Development and Innovative Practices",
year = "2025",
publisher = "Springer Nature Singapore",
address = "Singapore",
pages = "387--403",
isbn = "978-981-97-9255-9"
}