# AI RAG Evaluation – QA Test Suite
This project demonstrates how an AI QA Specialist can evaluate a RAG (Retrieval-Augmented Generation) system using Ragas.

The repository covers:

- LLM / RAG quality evaluation
- retrieval error analysis (missing / wrong / irrelevant context)
- automated RAG metrics (precision, recall, faithfulness)
- basic prompt testing for a banking / fintech chatbot
## Project structure

```
ai-rag-eval-qa/
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   └── rag_eval_dataset.jsonl
├── notebooks/
│   └── ragas_evaluation.py
└── prompts/
    └── prompt_tests.md
```
## Installation

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## API Key Configuration (.env)

This project uses OpenAI for semantic evaluation.

Create a `.env` file in the project root:

```
OPENAI_API_KEY=your_api_key_here
```

The `.env` file is listed in `.gitignore`, so your API key stays on your machine only.
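The evaluation script then needs the key in its environment. Projects like this typically use the python-dotenv package for that; purely as an illustration, a stdlib-only loader with the same effect (the helper name `load_dotenv_minimal` is hypothetical, not part of this repo) could look like:

```python
import os
from pathlib import Path


def load_dotenv_minimal(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    A simplified stand-in for python-dotenv: skips blank lines and
    comments, and does not override variables that are already set.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```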
## Running evaluation

```bash
python3 notebooks/ragas_evaluation.py
```

The script loads the dataset and evaluates:

- context_precision
- context_recall
- faithfulness
(You can also run in offline mode by disabling LLM usage.)
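As intuition for the two retrieval metrics: context precision asks how much of what was retrieved is actually relevant, while context recall asks how much of what is relevant was actually retrieved. Ragas estimates both with an LLM judge, so the set-based sketch below is a simplified illustration only, not the library's implementation:

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved context chunks that are relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk in retrieved if chunk in relevant)
    return hits / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that retrieval found."""
    if not relevant:
        return 0.0
    found = sum(1 for chunk in relevant if chunk in retrieved)
    return found / len(relevant)
```

High precision with low recall suggests the retriever is cautious but misses context; low precision with high recall suggests it drags in irrelevant chunks.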
## Note on OpenAI Quota

The full evaluation requires an active OpenAI quota. If the account has no credits or quota, you will see:

```
openai.RateLimitError: insufficient_quota
```

This is expected behavior and not an error in the project.
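If you prefer the script to fall back gracefully rather than crash on this error, one option is to wrap the evaluation call in a guard. The helper below is a hypothetical sketch; it matches on the error message instead of importing openai's exception classes, so it stays dependency-free:

```python
def run_with_quota_guard(run_eval):
    """Call run_eval(); on an insufficient-quota error, return None
    instead of crashing so the caller can fall back to offline mode.
    Any other exception is re-raised unchanged."""
    try:
        return run_eval()
    except Exception as exc:
        if "insufficient_quota" in str(exc):
            print("OpenAI quota exhausted; rerun offline with llm=None")
            return None
        raise
```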
To run offline (no API calls):

```python
results = evaluate(dataset, metrics=metrics, llm=None)
```

Note: `faithfulness` does not work without an LLM.
## Dataset

`data/rag_eval_dataset.jsonl` contains 10 fintech/banking examples for RAG evaluation.
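JSON Lines means one JSON object per line. Assuming Ragas-style fields such as `question`, `contexts`, `answer`, and `ground_truth` (an assumption about the file, not confirmed here), the dataset can be inspected with the stdlib alone:

```python
import json
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records
```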
## Prompt tests

Located in `prompts/prompt_tests.md`.

Includes:

- JSON output validation
- jailbreak attempts
- safety tests
- consistency checks
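The first of these checks is straightforward to automate. A minimal sketch of a JSON output validator (the required fields below are a hypothetical schema for a banking chatbot reply, not the one defined in `prompt_tests.md`):

```python
import json

# Hypothetical schema: the fields a chatbot reply must contain.
REQUIRED_FIELDS = {"answer", "sources"}


def validate_json_output(raw: str) -> tuple[bool, str]:
    """Check that a model reply is valid JSON with the required fields."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc}"
    if not isinstance(payload, dict):
        return False, "top-level value is not an object"
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    return True, "ok"
```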
## Purpose

This repository serves as a compact example of:

- RAG evaluation
- LLM QA
- retrieval diagnostics
- prompt testing
Designed for AI QA / LLM QA roles.

## About

AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project.