How often a file changes is a useful heuristic for finding bug hotspots: the files in your project that are modified most often are good candidates for closer inspection.
TODO: find source; I'm not making this up. There's also subtlety around whether it's a bug hotspot or an API weakness.
Sort all files in the history by their modification count:
git log --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn
Count the number of modifications to a given file, tracing through any renames:
git log --follow --name-only --pretty=format: -- path/to/file | grep -v '^$' | sort | uniq -c | sort -rn
You can track all files across their renames like so:
git log --all -M --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn
but this includes files that no longer exist in the repository, which are less useful for identifying hotspots. To count modifications only for files currently in the repository, following each one through its renames, use:
git ls-files | xargs -P 8 -I {} sh -c 'echo "$(git log --follow --oneline -- "$1" | wc -l) $1"' _ {} | sort -rn
Change the value passed to xargs -P <jobs> to adjust the parallelism.
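If you run this often, the last command can be wrapped in a small shell function. The following is a minimal sketch, not a definitive implementation: the function name churn, its two positional parameters, and their defaults (8 jobs, top 20 rows) are assumptions made for illustration. It also uses NUL-separated file names (git ls-files -z with xargs -0) so that paths containing spaces are handled; -0 and -P are available in GNU and BSD xargs but are not part of POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print "<commit count> <path>" for each tracked file,
# following renames, with the most frequently changed files first.
# Usage: churn [jobs] [top_n]
churn() {
    njobs="${1:-8}"   # number of parallel git processes (assumed default)
    top="${2:-20}"    # number of rows to print (assumed default)
    git ls-files -z |
        xargs -0 -P "$njobs" -I {} sh -c \
            'printf "%s %s\n" "$(git log --follow --oneline -- "$1" | wc -l | tr -d " ")" "$1"' _ {} |
        sort -rn |
        head -n "$top"
}
```

Run from inside a repository, churn 4 10 would use four parallel jobs and print the ten most frequently changed files. Passing the path to sh as "$1" rather than splicing it into the command string keeps file names with quotes or other shell metacharacters from being interpreted as code.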