Data 542 project using a HuggingFace dataset.
How do adoption patterns of different AI coding agents vary across repositories with different characteristics?
We will analyze how frequently, and in which repositories, AI agents submit code by joining the full pull-request table with the full repository table to identify which agents have contributed to each repository. For each agent and repository, we will compute metrics such as the number of agentic PRs and contribution frequency; across the dataset, we will also count the repositories that have not adopted any agent. Repository characteristics (e.g., primary language, stars, forks, and age) will be extracted from the repository tables. We will use descriptive statistics and χ² tests to examine whether adoption patterns differ across languages and popularity bins. Finally, we will fit logistic or multinomial regression models to identify which repository-level characteristics are most predictive of adopting a specific AI agent.
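The join, the per-agent counts, and the χ² comparison described above could be sketched as follows. This is a minimal illustration on made-up rows, not the project's actual pipeline: the field names (repo_id, agent, language, stars), the agent labels, and the helper chi_square are all assumptions for illustration and may not match the dataset's schema.

```python
from collections import Counter

# Hypothetical toy rows; field names and values are assumptions, not the real schema.
pull_requests = [
    {"id": 1, "repo_id": 10, "agent": "AgentA"},
    {"id": 2, "repo_id": 10, "agent": "AgentA"},
    {"id": 3, "repo_id": 11, "agent": "AgentB"},
    {"id": 4, "repo_id": 12, "agent": "AgentB"},
    {"id": 5, "repo_id": 12, "agent": "AgentA"},
    {"id": 6, "repo_id": 11, "agent": "AgentB"},
]
repositories = {
    10: {"language": "Python", "stars": 120},
    11: {"language": "Go", "stars": 15},
    12: {"language": "Python", "stars": 900},
    13: {"language": "Go", "stars": 3},  # no agentic PRs in this toy data
}

# Inner join: attach repository attributes to every PR whose repository is known.
joined = [
    {**pr, **repositories[pr["repo_id"]]}
    for pr in pull_requests
    if pr["repo_id"] in repositories
]

# Number of agentic PRs per (agent, repository) pair.
prs_per_agent_repo = Counter((row["agent"], row["repo_id"]) for row in joined)

# Repositories with no agentic PR at all.
repos_without_agent = set(repositories) - {row["repo_id"] for row in joined}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for {(row, col): count}."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    row_tot = {r: sum(table.get((r, c), 0) for c in cols) for r in rows}
    col_tot = {c: sum(table.get((r, c), 0) for r in rows) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (table.get((r, c), 0) - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Contingency table of agent by primary language, then the test statistic.
agent_by_language = Counter((row["agent"], row["language"]) for row in joined)
stat, dof = chi_square(agent_by_language)
print(prs_per_agent_repo, repos_without_agent, stat, dof)
```

The toy counts are far too small for a meaningful test; they only show the mechanics. On the real data, a library routine such as scipy.stats.chi2_contingency would also return the p-value and expected counts.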
We aim to examine how different AI coding agents vary in their pull-request resolution dynamics and how these dynamics relate to measurable user and repository characteristics.
To study resolution behavior, we will analyze the time it takes for PRs to be closed using the existing created_at and closed_at timestamps. We will compare resolution times across agents and investigate how they correlate with repository popularity indicators, as well as user influence metrics. The analysis will combine descriptive comparisons with more advanced modeling to understand how agent identity, repository features, and user characteristics jointly influence the speed at which PRs are resolved.
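As a rough sketch of the resolution-time computation, the snippet below derives hours-to-close from created_at and closed_at and summarizes them per agent. The ISO 8601 timestamp format, the agent field, and the sample rows are assumptions for illustration; PRs that are still open are simply skipped here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical sample rows; only created_at and closed_at are named in the proposal.
prs = [
    {"agent": "AgentA", "created_at": "2025-01-01T00:00:00Z", "closed_at": "2025-01-01T06:00:00Z"},
    {"agent": "AgentA", "created_at": "2025-01-02T00:00:00Z", "closed_at": "2025-01-03T00:00:00Z"},
    {"agent": "AgentB", "created_at": "2025-01-01T12:00:00Z", "closed_at": "2025-01-01T14:00:00Z"},
    {"agent": "AgentB", "created_at": "2025-01-05T00:00:00Z", "closed_at": None},  # still open
]


def parse_ts(value):
    """Parse an ISO 8601 timestamp; the 'Z' suffix is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resolution_hours(pr):
    """Hours between creation and closing, or None if the PR has not been closed."""
    if not pr["closed_at"]:
        return None
    delta = parse_ts(pr["closed_at"]) - parse_ts(pr["created_at"])
    return delta.total_seconds() / 3600


# Collect resolution times per agent, ignoring open PRs.
hours_by_agent = defaultdict(list)
for pr in prs:
    hours = resolution_hours(pr)
    if hours is not None:
        hours_by_agent[pr["agent"]].append(hours)

# The median is less sensitive than the mean to a few very slow PRs.
median_hours = {agent: median(values) for agent, values in hours_by_agent.items()}
print(median_hours)
```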
We aim to identify the factors that predict whether an AI-generated pull request is merged and to understand how agent identity, user influence, and repository characteristics jointly shape merge outcomes.
The merge outcome will be determined directly from the dataset based on whether the merged_at field is present. We will investigate how the likelihood of merging differs across agents and how it is associated with repository characteristics and user-level attributes.
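A minimal sketch of deriving the merge outcome and comparing it across agents is given below. It assumes a missing or empty merged_at means the PR was not merged; the agent field and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; merged_at is None when the PR was never merged.
prs = [
    {"agent": "AgentA", "merged_at": "2025-01-01T06:00:00Z"},
    {"agent": "AgentA", "merged_at": None},
    {"agent": "AgentA", "merged_at": "2025-01-04T09:30:00Z"},
    {"agent": "AgentB", "merged_at": "2025-01-02T10:00:00Z"},
    {"agent": "AgentB", "merged_at": None},
]


def is_merged(pr):
    """Binary outcome: True when merged_at is present and non-empty."""
    return bool(pr.get("merged_at"))


# Tally merged and total PRs per agent.
totals = defaultdict(int)
merged = defaultdict(int)
for pr in prs:
    totals[pr["agent"]] += 1
    merged[pr["agent"]] += int(is_merged(pr))

# Share of each agent's PRs that were merged.
merge_rate = {agent: merged[agent] / totals[agent] for agent in totals}
print(merge_rate)
```

The same binary flag could serve as the response variable when relating merge likelihood to repository and user attributes.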