This repository was archived by the owner on Aug 20, 2025. It is now read-only.
This is a custom TFX component project idea.
I hope to get some feedback from (@rcrowe-google , @hanneshapke , @sayakpaul , @casassg)
Temporary Name of the component: LLMEvaluator
Behaviour
: LLMEvaluator evaluates a trained model's performance via a designated LLM service (e.g. PaLM, Gemini, ChatGPT, ...) by comparing the model's outputs against the labels provided by ExampleGen.
: LLMEvaluator takes a parameter instruction, which lets you specify the prompt sent to the LLM. Since each LLM service may interpret the same prompt differently, the instruction should be tailored to the chosen service and to the task at hand.
Why
: It is now common practice to leverage an LLM service to evaluate a model (especially when fine-tuning an open-source LLM such as LLaMA).
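To make the idea concrete, here is a minimal, framework-independent sketch of the core evaluation loop the component might run. Everything here is an assumption, not an implementation: `build_prompt`, `llm_evaluate`, the CORRECT/INCORRECT verdict format, and the `call_llm` callable (a stand-in for the designated service such as PaLM, Gemini, or ChatGPT) are all hypothetical names for illustration.

```python
# Hypothetical sketch of LLMEvaluator's core logic; names and verdict
# format are assumptions, not part of any existing TFX API.

def build_prompt(instruction, model_output, label):
    """Combine the user-supplied instruction with one (output, label) pair."""
    return (
        f"{instruction}\n"
        f"Model output: {model_output}\n"
        f"Reference label: {label}\n"
        "Answer CORRECT or INCORRECT."
    )

def llm_evaluate(pairs, instruction, call_llm):
    """Score (model_output, label) pairs with a judge LLM.

    `call_llm` stands in for the designated LLM service: it takes a
    prompt string and returns the service's reply. Returns the fraction
    of pairs the judge marked CORRECT.
    """
    if not pairs:
        return 0.0
    correct = 0
    for model_output, label in pairs:
        reply = call_llm(build_prompt(instruction, model_output, label))
        if reply.strip().upper().startswith("CORRECT"):
            correct += 1
    return correct / len(pairs)
```

In the actual component, the pairs would come from the trained model's predictions and the ExampleGen labels, and `call_llm` would wrap the chosen service's client; swapping the `instruction` string per service/task is what the proposed parameter enables.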