A streamlined, end-to-end workflow to fine-tune Google's FunctionGemma models (specialized for tool calling/function calling), convert them to GGUF format, and deploy them directly to Ollama for local inference.
- Custom Fine-Tuning: Uses Hugging Face `trl` and its `SFTTrainer` to fine-tune FunctionGemma on your own tool definitions and query samples.
- Automated Conversion: Seamlessly converts the fine-tuned model to GGUF using `llama.cpp`.
- Ollama Deployment: Automatically creates a custom Ollama model with a date-stamped tag (e.g., `functiongemma-custom:2025-12-29`).
- Windows Optimized: Includes batch scripts for one-click setup and execution, with fixes for common Windows SSL and Path issues.
- GPU Accelerated: Pre-configured for NVIDIA GPUs (BF16/FP16 support enabled).
Before you begin, ensure you have the following installed:
- Python 3.10+: Download Python
- Git: Download Git
- Ollama: Download Ollama
- NVIDIA GPU (Recommended): With latest drivers and CUDA support.
1. Clone this repository:

   ```shell
   git clone https://github.com/manojkumarredbus/functiongemmaOllama.git
   cd functiongemmaOllama
   ```

2. Run the setup script: double-click `setup.bat` or run:

   ```shell
   .\setup.bat
   ```

   This creates a virtual environment and installs all necessary Python dependencies.

3. Clone llama.cpp (required for GGUF conversion). The pipeline expects `llama.cpp` to be in the root directory:

   ```shell
   git clone https://github.com/ggerganov/llama.cpp.git
   ```

   Note: You may need to install the llama.cpp dependencies via `venv\Scripts\pip install -r llama.cpp\requirements.txt` if they aren't covered by the main setup, though the pipeline mainly uses the conversion script.
The magic happens in `run_pipeline.bat`. This script performs the entire workflow in sequence:

```shell
.\run_pipeline.bat
```

What this script does:

1. Trains the model using `train.py` (default: 3 epochs).
2. Tests the fine-tuned model locally with `test_model.py`.
3. Converts the model to GGUF format using `llama.cpp`. New: now includes an interactive prompt to select your quantization level (`q8_0`, `f16`, `bf16`, `tq1_0`, etc.).
4. Imports the model into Ollama as `functiongemma-custom:YYYY-MM-DD`.
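The date-stamped tag in the last step can be sketched as follows. This is a hypothetical Python helper for illustration; the actual `run_pipeline.bat` builds the tag in batch-script syntax and may differ in detail:

```python
from datetime import date

# Illustrative sketch of building a date-stamped Ollama tag such as
# functiongemma-custom:2025-12-29; the real batch logic may differ.
def make_model_tag(base: str = "functiongemma-custom") -> str:
    return f"{base}:{date.today():%Y-%m-%d}"

print(make_model_tag())
# The pipeline would then run: ollama create <tag> -f Modelfile
```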
If you prefer to run steps manually:

1. Activate the environment:

   ```shell
   venv\Scripts\activate
   ```

2. Train:

   ```shell
   python train.py
   ```

3. Test (Python):

   ```shell
   python test_model.py
   ```

4. Convert to GGUF:

   ```shell
   python llama.cpp\convert_hf_to_gguf.py functiongemma-270m-it-simple-tool-calling --outfile model.gguf --outtype bf16
   ```

5. Import to Ollama:

   ```shell
   echo FROM ./model.gguf > Modelfile
   ollama create my-function-model -f Modelfile
   ```
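The one-line Modelfile above is the minimum Ollama needs. If you want more control, it can be extended with optional directives; the setting below is a suggestion, not what the pipeline generates:

```
FROM ./model.gguf

# Optional: a low temperature tends to make tool-call output more deterministic
PARAMETER temperature 0.1
```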
Once deployed, you can run the model directly:

```shell
ollama run functiongemma-custom:2025-12-29 "What is the reimbursement limit for travel meals?"
```

Note: FunctionGemma expects specific XML-like tags for tool definitions. Running `ollama run` interactively might not yield perfect tool calls without the proper system prompt context, but the model is trained to recognize the structure.
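Since interactive `ollama run` lacks that system-prompt context, calling Ollama's HTTP API programmatically lets you supply it. A minimal sketch; the tool-definition text in the system prompt is illustrative, so mirror whatever tag format `train.py` actually uses:

```python
import json

# Sketch of calling the deployed model through Ollama's HTTP API
# (POST http://localhost:11434/api/generate).
def build_request(model: str, system: str, user: str) -> dict:
    return {
        "model": model,
        "system": system,   # tool definitions / instructions go here
        "prompt": user,
        "stream": False,    # return one JSON object instead of a token stream
    }

payload = build_request(
    "functiongemma-custom:2025-12-29",
    "You can call these tools: get_reimbursement_limit(category: str)",
    "What is the reimbursement limit for travel meals?",
)
body = json.dumps(payload)

# To actually send it (requires `ollama serve` to be running):
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate",
#                              data=body.encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```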
To train on your own data:

1. Open `train.py`.
2. Locate the `simple_tool_calling` list.
3. Add your own examples following the format:

   ```python
   {
       "user_content": "Your user query here",
       "tool_name": "function_to_call",
       "tool_arguments": "{\"arg_name\": \"value\"}"
   }
   ```

4. Update the `TOOLS` definition list if you are adding new function schemas.
5. Re-run `run_pipeline.bat`.
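As a concrete illustration of that entry format (the `book_cab` tool here is made up for the example), note that `tool_arguments` is a JSON-encoded string, so it is worth checking that new entries actually parse before training:

```python
import json

# Illustrative new entry for the simple_tool_calling list in train.py.
# Field names follow the documented format; the tool itself is hypothetical.
example = {
    "user_content": "Book a cab to the airport at 6am",
    "tool_name": "book_cab",
    "tool_arguments": "{\"destination\": \"airport\", \"time\": \"06:00\"}",
}

# tool_arguments must be valid JSON inside a string -- sanity-check it parses:
args = json.loads(example["tool_arguments"])
print(args)
```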
- SSL Verification: The `train.py` script includes patches to bypass SSL verification errors common in some corporate Windows environments. Do not use this configuration in a production, security-sensitive environment without reviewing `train.py`.
- Hardware: The default config is optimized for an RTX 4090 (BF16 enabled). If you are on an older GPU, you may need to edit `train.py` to set `bf16=False` and `fp16=True`.
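Which flags to use can be derived from the GPU's CUDA compute capability: bf16 requires Ampere (capability 8.x) or newer, while older cards fall back to fp16. A small standalone sketch; in practice you would feed in `torch.cuda.get_device_capability()`:

```python
# Sketch: choose bf16 vs fp16 from a CUDA compute capability (major, minor).
# Ampere (8.x) and newer support bf16; older GPUs should use fp16 instead.
def precision_flags(compute_capability: tuple[int, int]) -> dict:
    bf16 = compute_capability[0] >= 8
    return {"bf16": bf16, "fp16": not bf16}

# With PyTorch installed: flags = precision_flags(torch.cuda.get_device_capability())
print(precision_flags((8, 9)))   # RTX 4090 (Ada, capability 8.9)
print(precision_flags((7, 5)))   # RTX 2080 (Turing, capability 7.5)
```

Mirror the resulting flags in `train.py` (`bf16=...`, `fp16=...`).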