diff --git a/README.md b/README.md
index 433ab9abae8432d8d61724791a49c7352aface90..61762420c2a841bc3b7e3ea59c595ce69bffaef4 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,15 @@
+# Purposes
+This repository enables you to compare LLM performance on RAG systems.
+
+Imagine a professor-student situation:
+
+1. The script fetches RAG files and chunks them.
+1. The professor (LLM1) creates a set of questions per chunk.
+1. The student (LLM2) answers these questions.
+1. The professor then evaluates the answers using continuous, comparable metrics such as hallucination and correctness.
+
+The result is detailed information on the quality of each answer and an overview of the metrics of the LLM as a whole (e.g. NDCG@2 & precision@2).
+
 # Prerequisits
 - python3 installed
 - [openAI API Key](https://auth.openai.com/)
@@ -23,3 +35,8 @@ acccess webservice via browser, i.e.
 # Sources
 - [YT: RAG Time! Evaluate RAG with LLM Evals and Benchmarking](https://www.youtube.com/watch?v=LrMguHcbpO8)
 - [Phoenix Docs](https://docs.arize.com/phoenix)
+
+
+# Planned
+- [ ] ability to choose student LLM
+- [ ] option to use local LLM via ollama
\ No newline at end of file
diff --git a/evaluateRAG.py b/evaluateRAG.py
index b01e5b2f394159740ca89808f378d1efa763eb64..a096e584e4367bcb4978b71783124ad9ad8334d0 100644
--- a/evaluateRAG.py
+++ b/evaluateRAG.py
@@ -9,8 +9,6 @@ from operator import length_hint
 from dotenv import load_dotenv
 # getpass enables secure password input
 from getpass import getpass
-# creating temporary files
-import tempfile
 # download files containing the domain specific information
 from urllib.request import urlparse, urlretrieve
 # pandas handles table data
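
For orientation, a minimal sketch of the professor-student loop the new Purposes section describes is shown below. This is not the code in evaluateRAG.py: the `ask_llm` helper, the prompts, the chunk size, and the 0-1 scoring scale are illustrative assumptions, and a real run would back `ask_llm` with an actual LLM client (e.g. the OpenAI API or Phoenix evals).

```python
# Illustrative sketch only -- not the actual evaluateRAG.py implementation.
# `ask_llm(role, prompt)` is a hypothetical helper standing in for a real LLM client call.
from statistics import mean
from typing import Callable

def chunk_text(text: str, size: int = 1000) -> list[str]:
    """Split a RAG source document into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def evaluate_rag(document: str, ask_llm: Callable[[str, str], str]) -> dict:
    per_answer = []
    for chunk in chunk_text(document):
        # 1. Professor (LLM1) writes a question answerable from the chunk.
        question = ask_llm("professor", f"Write one question answerable from:\n{chunk}")
        # 2. Student (LLM2) answers using the chunk as retrieved context.
        answer = ask_llm("student", f"Context:\n{chunk}\n\nQuestion: {question}")
        # 3. Professor grades the answer on continuous 0-1 scales.
        correctness = float(ask_llm(
            "professor",
            f"Context:\n{chunk}\nQuestion: {question}\n"
            f"Rate the correctness of this answer from 0 to 1:\n{answer}"))
        hallucination = float(ask_llm(
            "professor",
            f"Rate from 0 to 1 how much of this answer is NOT supported by the context.\n"
            f"Context:\n{chunk}\nAnswer:\n{answer}"))
        per_answer.append({"question": question, "answer": answer,
                           "correctness": correctness, "hallucination": hallucination})
    # 4. Aggregate per-answer scores into overall metrics for the student LLM.
    return {
        "answers": per_answer,
        "mean_correctness": mean(a["correctness"] for a in per_answer),
        "mean_hallucination": mean(a["hallucination"] for a in per_answer),
    }
```

In practice `ask_llm` would wrap whichever chat-completion client the repository configures; swapping the student model (or pointing it at a local Ollama endpoint, as listed under Planned) only changes what that callable does.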