MultiModal_Eval_Project_PoC-

This is a proof of concept (PoC) for my MultiModal Eval framework project idea doc.

The project description is as follows:

1. The project shows how to evaluate a system end to end.
2. The focus here is on evaluating one system out of the many possible ones, a RAG system (the same approach can be applied to other systems built on multimodal concepts).
3. The project is done in a Google Colab notebook environment.

The project is divided into two main parts:

Part 1: Benchmarking the model

- First we benchmark the model to check whether it is a good fit for the desired system; here the focus is on a RAG system.
- We benchmark the model with the LightEval framework on the F1 metric, using custom tasks and a RAG-focused dataset from Hugging Face (here, "rag-datasets/rag-mini-bioasq").
- Everything runs locally, except calling the model through an API.
- The LightEval pipeline is built to handle custom tasks with dual subsets (here, "question-answer-passages" and "text-corpus") and their corresponding splits.
- From this we decide whether the model is suitable for the desired system, which saves money and time at the early stages.
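The F1 score used for QA-style benchmarks like this is typically token-overlap F1. As a minimal sketch of that scoring logic (the `token_f1` name and the example strings are my own illustration, not code from the PoC):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count tokens appearing in both strings (multiset intersection).
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Scoring a hypothetical model answer against a BioASQ-style gold answer:
score = token_f1("insulin regulates blood glucose", "insulin regulates glucose levels")
# 3 of 4 tokens overlap on each side, so precision = recall = F1 = 0.75
```

In the PoC the same kind of score is produced by LightEval's pipeline over the custom tasks rather than computed by hand.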

Part 2: The RAG system pipeline

- This part contains the whole RAG system pipeline.
- First, the "indexing" logic of the RAG system is written.
- Second, the "retrieving" part of the RAG system.
- We then run the RAG system once to check whether it works or is broken.
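The indexing and retrieving steps above can be sketched with a toy in-memory index. A real RAG pipeline would use embeddings and a vector store, so treat `MiniRAGIndex` and its bag-of-words cosine similarity as an illustrative stand-in, not the PoC's implementation:

```python
import math
from collections import Counter

class MiniRAGIndex:
    """Toy in-memory index: bag-of-words vectors ranked by cosine similarity."""

    def __init__(self):
        self.docs = []      # raw passage text
        self.vectors = []   # one token Counter per passage

    def index(self, passages):
        # "Indexing" step: store each passage with its term-count vector.
        for text in passages:
            self.docs.append(text)
            self.vectors.append(Counter(text.lower().split()))

    def retrieve(self, query, k=2):
        # "Retrieving" step: rank passages by cosine similarity to the query.
        q = Counter(query.lower().split())

        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(range(len(self.docs)),
                        key=lambda i: cosine(q, self.vectors[i]),
                        reverse=True)
        return [self.docs[i] for i in ranked[:k]]

idx = MiniRAGIndex()
idx.index(["insulin regulates blood glucose",
           "the mitochondria is the powerhouse of the cell"])
top = idx.retrieve("what regulates glucose", k=1)
```

Running the retriever once on a known query, as in the smoke test above, is the quickest way to confirm the pipeline is working rather than broken.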

- After this, we use the RAGAS metric library to build our evaluation pipeline.
- We provide sample queries and expected responses related to our document, and evaluate the RAG system against them.
- For traces and observability we use LangSmith.
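A real RAGAS run needs an LLM behind its metrics (e.g. faithfulness, answer relevancy), so as a self-contained illustration of the evaluation loop, sample queries plus expected responses, scored per answer, here is a lexical stand-in. The `evaluate_rag` function and its token-recall proxy are my simplification, not the RAGAS API:

```python
def evaluate_rag(samples, rag_answer_fn):
    """Score each RAG answer by token recall against the expected response.

    samples: list of (query, expected_response) pairs.
    rag_answer_fn: callable that returns the RAG system's answer for a query.
    """
    results = []
    for query, expected in samples:
        answer = rag_answer_fn(query)
        exp_tokens = set(expected.lower().split())
        ans_tokens = set(answer.lower().split())
        recall = len(exp_tokens & ans_tokens) / len(exp_tokens) if exp_tokens else 0.0
        results.append({"query": query, "answer": answer, "recall": recall})
    return results

# Hypothetical sample set and a stub in place of the real RAG system:
samples = [("what regulates glucose", "insulin regulates glucose")]
report = evaluate_rag(samples, lambda q: "insulin regulates blood glucose")
```

In the PoC this loop is replaced by RAGAS metrics, with LangSmith capturing the traces for each run.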

Lastly, the pipeline is exposed through FastAPI.

What other or different approaches could be taken?

1. The evaluation pipeline could include LLM-as-judge or human-in-the-loop review.
2. There are several reasons for this addition: we would not just evaluate the system, we could also improve its scores.
3. The pipeline would be: RAG output --> LLM-as-judge --> human-in-the-loop (for correctness and future answer improvement) --> store in a vector database (for future use). This might raise scores only indirectly, but it ensures correctness and trustworthiness.
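The proposed extension (RAG output, then LLM-as-judge, then human-in-the-loop, then storage) could be scaffolded roughly as below. All three callables are hypothetical placeholders; in practice the judge would be an LLM call, the review step a human workflow, and the store a vector database write:

```python
def judge_pipeline(rag_output, llm_judge, human_review, store):
    """Sketch of the proposed extension pipeline.

    llm_judge: returns a verdict dict, e.g. {"label": "pass"|"fail", ...}.
    human_review: returns a corrected answer when the judge flags a failure.
    store: persists the final answer (e.g. into a vector database).
    """
    verdict = llm_judge(rag_output)
    if verdict["label"] != "pass":
        # Human-in-the-loop: correct the answer for future betterment.
        rag_output = human_review(rag_output, verdict)
    store(rag_output)
    return rag_output

# Wiring with stubs: a judge that always fails, a reviewer that fixes,
# and a plain list standing in for the vector database.
stored = []
final = judge_pipeline(
    "draft answer",
    llm_judge=lambda out: {"label": "fail"},
    human_review=lambda out, verdict: "corrected answer",
    store=stored.append,
)
```

Storing the corrected answers means later queries can be answered against human-verified material, which is how this loop could improve scores indirectly.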
