Hi there! Thanks for your great work. I have a few questions regarding your custom-trained code quality scorer model.
The paper mentions that you adopted a Llama-series pretrained model as the backbone. However, in Appendix A2.1 about the evaluation prompt, it states:
"It remains consistent throughout the entire pipeline, from collecting ground-truth data to training the quality scorer and applying it across all GitHub data during inference."
I would like to confirm:
Does this mean you appended the fixed prompt to code samples during the training phase of the quality model?
Additionally, since the base Llama model is only pretrained and has not undergone instruction tuning, is it necessary to feed a task-specific prompt together with code inputs for quality model training?
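To make the question concrete, here is a minimal sketch of the two input formats I have in mind. Everything in it is my own assumption rather than something from the paper: the function names are hypothetical, `EVAL_PROMPT` is only a placeholder for the actual prompt in Appendix A2.1, and placing the prompt before the code is just one possible reading of "appended".

```python
# Placeholder only; the real evaluation prompt is the one given in Appendix A2.1.
EVAL_PROMPT = "<evaluation prompt from Appendix A2.1>"


def build_input_with_prompt(code: str, prompt: str = EVAL_PROMPT) -> str:
    """Option A (assumed): the fixed prompt is concatenated with each code sample."""
    return f"{prompt}\n\n{code}"


def build_input_code_only(code: str) -> str:
    """Option B (assumed): the scorer is trained on the raw code sample alone."""
    return code


sample = "def add(a, b):\n    return a + b\n"
with_prompt = build_input_with_prompt(sample)
code_only = build_input_code_only(sample)
```

Was the quality scorer trained on inputs like `with_prompt`, like `code_only`, or on something else entirely?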
Thanks a lot for your clarification!