Fix: Refactor is_comment_line to use Pygments for improved accuracy #68
Conversation
The `count_non_scoreable_lines` function had a bug where typo pairs (a `-` line followed by a `+` line) were being counted incorrectly. When a typo was detected, the function would:

1. Mark both lines as non-scoreable (+2)
2. Continue to skip the `-` line
3. BUT the next iteration would process the `+` line again as scoreable

This allowed miners to gain score credit for typo corrections that should be filtered out. Fixed by adding a `skip_next` flag to ensure the `+` line of a detected typo pair is not processed in the next iteration.

Impact: Prevents gaming via intentional typo corrections that were previously being scored.

Contribution by Gittensor, learn more at https://gittensor.io/
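The fix described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual subnet code: `is_typo_correction` is a hypothetical helper (here approximated with a similarity ratio), and the real function handles more cases.

```python
import difflib

def is_typo_correction(removed: str, added: str) -> bool:
    # Hypothetical helper: treat near-identical -/+ lines as a typo pair.
    ratio = difflib.SequenceMatcher(None, removed[1:], added[1:]).ratio()
    return ratio > 0.8

def count_non_scoreable_lines(lines: list[str]) -> int:
    non_scoreable = 0
    skip_next = False
    for i, line in enumerate(lines):
        if skip_next:
            # The '+' half of a typo pair was already counted; do not
            # process it again as a scoreable line.
            skip_next = False
            continue
        if (
            line.startswith("-")
            and i + 1 < len(lines)
            and lines[i + 1].startswith("+")
            and is_typo_correction(line, lines[i + 1])
        ):
            non_scoreable += 2  # mark both halves as non-scoreable
            skip_next = True    # the fix: skip the '+' line next iteration
    return non_scoreable
```

Without `skip_next`, the loop would fall through to the `+` line on the next iteration and treat it as ordinary scoreable content, which is exactly the double-counting the fix removes.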
The `calculate_issue_multiplier` function filtered invalid issues (self-created, wrong timing, etc.) out of `pr.issues` to produce `valid_issues`, but then incorrectly used the unfiltered `pr.issues` list for scoring. This allowed miners to game the system by creating invalid issues that would still count toward their multiplier.

Changed lines 190 and 195 to use `valid_issues` instead of `pr.issues`, ensuring only validated issues contribute to the score multiplier.

Contribution by Gittensor, learn more at https://gittensor.io/
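A minimal sketch of the shape of this fix (the function signature, field names, and constants here are assumptions for illustration, not the repository's actual code):

```python
def calculate_issue_multiplier(issues, pr_author, base=1.0, per_issue=0.1):
    # Filter out issues that must not count toward the multiplier
    # (only the self-created check is modeled in this sketch).
    valid_issues = [i for i in issues if i["author"] != pr_author]
    # The bug: the unfiltered `issues` list was used below instead of
    # `valid_issues`, so invalid issues still inflated the multiplier.
    return base + per_issue * len(valid_issues)
```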
* fix: prevent gaming issue multiplier
* fix comment
* small fix
* update subnet repositories
* add SDKv10 branch to bittensor repo
* add sn118 repo
* Fix confusing recycle allocation logic in dynamic emissions

  The original code used a backwards ternary expression that would add 1.0 to recycled emissions when `total_recycled <= 0`, which doesn't make logical sense for an emissions recycling system.

  Original buggy code: `max(total_recycled, 1 if total_recycled <= 0 else 0)`

  This evaluates to:
  - If `total_recycled > 0`: `max(total_recycled, 0)` = `total_recycled` ✅
  - If `total_recycled <= 0`: `max(total_recycled, 1)` = 1 ❌ (adds 1.0 when nothing should be recycled)

  Fixed to simply ensure a non-negative recycled amount: `max(total_recycled, 0.0)`

  Impact: Prevents incorrect allocation of 1.0 emissions to RECYCLE_UID when no emissions should be recycled.

  Contribution by Gittensor, learn more at https://gittensor.io/

* Fix edge case: handle total_original=0 with dynamic recycle bound

  - Add dynamic_recycle_bound: 1 if no earned scores, 0 otherwise
  - Update recycle_percentage to 100% when total_original=0
  - Addresses owner feedback on PR entrius#54

  Contribution by Gittensor, learn more at https://gittensor.io/
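The two expressions compared above can be reproduced directly (wrapped in illustrative helper functions; the surrounding emissions code is not shown here):

```python
def buggy_recycle(total_recycled: float) -> float:
    # Original expression: yields 1.0 when nothing should be recycled.
    return max(total_recycled, 1 if total_recycled <= 0 else 0)

def fixed_recycle(total_recycled: float) -> float:
    # Fixed expression: simply clamp to a non-negative amount.
    return max(total_recycled, 0.0)
```

For any positive input the two agree; the divergence only appears at `total_recycled <= 0`, where the buggy version invents 1.0 of recycled emissions.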
* Removed # in some languages
* Removed unused languages and comment
* Update constants.py
* Update constants.py
* Implemented PR success ratio multipliers to apply to score. (% merged / total attempted PRs)
* Updated test file contributions to only get 10% of earned score. Down from 25%
* Updated storage functionality with new multipliers / fields
* Storage functionality touchups for MinerEvaluation datamodels
* Dec 4 1PM EST time cushion before closed PRs impact success ratio score
* Added edge case safeguard for when no closed PRs exist
* Ensured merged count only begins after 12/4 1pm EST
* date fix update on typo - better logging
* added protection threshold before ratio takes place
* kimbo like 10 - we do 10
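The success-ratio multiplier with its edge-case safeguard and protection threshold might look like the following sketch (the function name, formula, and default threshold of 10 are assumptions inferred from the commit messages above, not the exact subnet values):

```python
def success_ratio_multiplier(merged: int, closed: int,
                             protection_threshold: int = 10) -> float:
    total = merged + closed
    if total == 0:
        return 1.0  # edge case safeguard: no closed or merged PRs yet
    if total < protection_threshold:
        return 1.0  # protection threshold before the ratio applies
    return merged / total  # % merged out of total attempted PRs
```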
* Require issues to be closed within 2 days of PR merge date + 2 issues scored per PR max
* 1D, a more appropriate issue resolution time
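A rough sketch of the resulting validity check (the data shapes and the helper name `scoreable_issues` are illustrative; the constants follow the commit messages above, with the window tightened to 1 day):

```python
from datetime import datetime, timedelta

MAX_SCORED_ISSUES_PER_PR = 2      # "2 issues scored per PR max"
CLOSE_WINDOW = timedelta(days=1)  # "1D" resolution window

def scoreable_issues(issues, pr_merged_at):
    # Keep only issues closed within the window around the PR merge
    # date, capped at the per-PR maximum.
    in_window = [
        i for i in issues
        if i["closed_at"] is not None
        and abs(i["closed_at"] - pr_merged_at) <= CLOSE_WINDOW
    ]
    return in_window[:MAX_SCORED_ISSUES_PER_PR]
```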
@langverse2023 Close this PR, otherwise will report. You're stealing my effort.
```python
# Test cases
if __name__ == "__main__":
    # Test 1: Comments only
    """
    /*
     * Hello world
     */
    """
    test_string = """
    * Hello world
    """
    print(f"Rust comments only: {is_comment_line(test_string, '.rs')}")

    # Test 2: Code
    test_string = """
    #include <stdio.h>
    """
    print(f"C program code: {is_comment_line(test_string, '.c')}")

    # Test 3: multi-line comment
    """
    '''
    This is a comment
    '''
    """
    test_string = '''
    This is a comment
    '''
    print(f"Multi-line comment: {is_comment_line(test_string, '.py')}")
```

I don't think the Pygments library works better than the current solution, since we check line by line.
```python
from pygments import lex
from pygments.lexers import get_lexer_for_filename, TextLexer
from pygments.token import Comment, String
from pygments.util import ClassNotFound
```
You need to include pygments as a dependency
Thanks for the feedback! You are absolutely right that checking line-by-line is problematic for multi-line comments (e.g., Pygments might interpret an isolated `* Hello world` as a multiplication or pointer instead of a comment).

@LandynDev could u check my pr

I like the idea of this PR, it will take a couple days to fully vet it and test, expect a full review then

resolve conflicts, I'm looking into this PR now finally
| # 2. Determine the appropriate lexer | ||
| try: | ||
| filename = f"dummy{file_extension}" if file_extension else "dummy.txt" |
double check the dummy filename construction, I don't believe `file_extension` includes the leading `.`
also, why not just use the actual filename? seems strange to just make up a dummy filename
Agreed. I updated the function signature to pass the full filename directly. It's much cleaner now.
@anderdc could u check my pr

@LandynDev, @anderdc would you review my PR?
This PR replaces the previous regex-based, line-by-line comment detection with a more robust solution using the `Pygments` library.

The Problem:
The previous approach analyzed each line in isolation. This caused issues with multi-line comments (e.g., C-style `/* ... */`), where middle lines starting with `*` were often invalidly interpreted as multiplication operators or code syntax.

The Solution:
I have introduced a new helper function `get_comment_line_indices` that preserves the context of the code block.

Implementation Details:
- Strip the diff markers (`+`, `-`) from the patch to reconstruct the continuous source text.
- Run the reconstructed text through the `Pygments` lexer. This allows the lexer to correctly identify multi-line comment blocks (like `""" docstrings """` or `/* block comments */`).
- Collect the comment tokens from `Pygments` and map them back to the original line numbers.

Key Changes:
- Removed: `is_comment_line` (regex-based).
- Added: `get_comment_line_indices` (Pygments-based).
- Updated `count_non_scoreable_lines` to pre-calculate comment indices before iterating.

Fixes #64
Improves #56

Contribution by Gittensor, learn more at https://gittensor.io/
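The approach described in this PR can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual implementation: the real `get_comment_line_indices` may differ in signature and in how it maps tokens to patch lines.

```python
from pygments.lexers import get_lexer_for_filename, TextLexer
from pygments.token import Comment
from pygments.util import ClassNotFound

def get_comment_line_indices(source: str, filename: str) -> set[int]:
    """Return the 0-based line numbers of `source` that are comments."""
    try:
        lexer = get_lexer_for_filename(filename)
    except ClassNotFound:
        lexer = TextLexer()  # fall back to plain text for unknown types
    comment_lines = set()
    line_no = 0
    for token_type, value in lexer.get_tokens(source):
        if token_type in Comment:
            # A comment token can span several lines (e.g. /* ... */);
            # record every line the token has content on.
            for offset, chunk in enumerate(value.split("\n")):
                if chunk.strip():
                    comment_lines.add(line_no + offset)
        line_no += value.count("\n")
    return comment_lines
```

Because the lexer sees the whole reconstructed source at once, the middle lines of a block comment are tokenized as `Comment.Multiline` rather than being misread as `*` operators, which is the failure mode of the line-by-line regex approach.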