Inspired by "Learning to Reason in 13 parameters", this project uses TinyLoRA + GRPO (32 parameters) to fine-tune Qwen2.5-Coder-3B-Instruct (or other models) for competitive programming.
python cpp rl cpp17 deepcoder peft good-first-issue good-first-pr good-first-contribution qwen2-5 grpo qwen-coder tinylora learning-to-reason-in-13-parameters code-contests deepmind-code-contests
Updated Mar 11, 2026 - Python