Skip to content

Conversation

@langverse2023
Copy link

@langverse2023 langverse2023 commented Dec 4, 2025

This PR replaces the previous regex-based, line-by-line comment detection with a more robust solution using the Pygments library.

The Problem:
The previous approach analyzed each line in isolation. This caused issues with multi-line comments (e.g., C-style /* ... */), where middle lines starting with * were often invalidly interpreted as multiplication operators or code syntax.

The Solution:
I have introduced a new helper function get_comment_line_indices that preserves the context of the code block.

Implementation Details:

  1. Reconstruct Source: The function strips diff markers (+, -) from the patch to reconstruct the continuous source text.
  2. Context-Aware Lexing: Instead of checking line by line, we pass the full text to the Pygments lexer. This allows the lexer to correctly identify multi-line comment blocks (like """ docstrings """ or /* block comments */).
  3. Token Mapping: We iterate through the tokens generated by Pygments and map them back to the original line numbers.
  4. Filtering: A line is marked as "non-scoreable" only if it contains comment tokens and strictly no code tokens.

Key Changes:

  • Removed is_comment_line (regex-based).
  • Added get_comment_line_indices (Pygments-based).
  • Updated count_non_scoreable_lines to pre-calculate comment indices before iterating.

Fixes #64
Improves #56


Contribution by Gittensor, learn more at https://gittensor.io/

hsparks-codes and others added 16 commits December 1, 2025 22:09
The count_non_scoreable_lines function had a bug where typo pairs (- followed by +) were being counted incorrectly. When a typo was detected, the function would:
1. Mark both lines as non-scoreable (+2)
2. Continue to skip the - line
3. BUT the next iteration would process the + line again as scoreable

This allowed miners to gain score credit for typo corrections that should be filtered out.

Fixed by adding a skip_next flag to ensure the + line of a detected typo pair is not processed in the next iteration.

Impact: Prevents gaming via intentional typo corrections that were previously being scored.

Contribution by Gittensor, learn more at https://gittensor.io/
The calculate_issue_multiplier function was filtering out invalid issues (self-created, wrong timing, etc.) into valid_issues, but then incorrectly using the unfiltered pr.issues list for scoring. This allowed miners to game the system by creating invalid issues that would still count toward their multiplier.

Changed lines 190 and 195 to use valid_issues instead of pr.issues, ensuring only validated issues contribute to the score multiplier.

Contribution by Gittensor, learn more at https://gittensor.io/
* fix: prevent gamming issue multiplier

* fix comment

* small fix
* update subnet repsitories

* add SDKv10 branch to bittensor repo

* add sn118 repo
* Fix confusing recycle allocation logic in dynamic emissions

The original code used a backwards ternary expression that would add 1.0 to recycled emissions when total_recycled <= 0, which doesn't make logical sense for an emissions recycling system.

Original buggy code:
max(total_recycled, 1 if total_recycled <= 0 else 0)

This evaluates to:
- If total_recycled > 0: max(total_recycled, 0) = total_recycled ✅
- If total_recycled <= 0: max(total_recycled, 1) = 1 ❌ (adds 1.0 when nothing should be recycled)

Fixed to simply ensure non-negative recycled amount:
max(total_recycled, 0.0)

Impact: Prevents incorrect allocation of 1.0 emissions to RECYCLE_UID when no emissions should be recycled.

Contribution by Gittensor, learn more at https://gittensor.io/

* Fix edge case: handle total_original=0 with dynamic recycle bound

- Add dynamic_recycle_bound: 1 if no earned scores, 0 otherwise
- Update recycle_percentage to 100% when total_original=0
- Addresses owner feedback on PR entrius#54

Contribution by Gittensor, learn more at https://gittensor.io/
* Removed # in some languages

* Removed unused languages and comment

* Update constants.py

* Update constants.py
* Implemented PR success ratio multipliers to apply to score. (% merged / total attempted PRs)

* Updated test file contributions to only get 10% of earned score. Down from 25%

* Updated storage functionality with new multipliers / fields

* Storage functionality touchups for MinerEvaluation datamodels

* Dec 4 1PM EST Time cushion before closed PRs impact success ratio score

* Added edge case safeguard on if no closed prs exist

* Ensured merged count only begins after 12/4 1pm est.

* date fix update on typo - better logging

* added protection threshold before ratio takes place

* kimbo like 10 - we do 10
* Require issues to be closed within 2 days of PR merged data + 2 issues scored per pr max

* 1D, more appropiate issue resolvement time
@hsparks-codes
Copy link
Contributor

hsparks-codes commented Dec 4, 2025

@langverse2023 Close this PR, otherwise will report. You're stealing my effort.

@SmartDever02
Copy link
Contributor

# Test cases
if __name__ == "__main__":
    # Test 1: Comments only
    """
    /*
    * Hello world
    */

    """
    test_string = """
* Hello world
"""
    print(f"Rust comments only: {is_comment_line(test_string, '.rs')}")  
    
    # Test 2: Code
    test_string = """
#include <stdio.h>
"""
    print(f"C program code: {is_comment_line(test_string, '.c')}")  
    
    # Test 3: multi-line comment
    """
    '''
    This is a comment
    '''
    """
    test_string = '''
This is a comment
'''
    print(f"Multi-line comment: {is_comment_line(test_string, '.py')}")

I don't think Pygments library works better than current solution as we check line by line

Comment on lines +4 to +7
from pygments import lex
from pygments.lexers import get_lexer_for_filename, TextLexer
from pygments.token import Comment, String
from pygments.util import ClassNotFound

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to include pygments as a dependency

@langverse2023
Copy link
Author

@langverse2023 Close this PR, otherwise will report. You're stealing my effort.
I believe there is a misunderstanding. My implementation is distinct because it integrates the Pygments library to handle comment detection, whereas your solution did not use this library. This is a different technical approach, not a copy of your work.

@langverse2023
Copy link
Author

# Test cases
if __name__ == "__main__":
    # Test 1: Comments only
    """
    /*
    * Hello world
    */

    """
    test_string = """
* Hello world
"""
    print(f"Rust comments only: {is_comment_line(test_string, '.rs')}")  
    
    # Test 2: Code
    test_string = """
#include <stdio.h>
"""
    print(f"C program code: {is_comment_line(test_string, '.c')}")  
    
    # Test 3: multi-line comment
    """
    '''
    This is a comment
    '''
    """
    test_string = '''
This is a comment
'''
    print(f"Multi-line comment: {is_comment_line(test_string, '.py')}")

I don't think Pygments library works better than current solution as we check line by line

Thanks for the feedback! You are absolutely right that checking line-by-line is problematic for multi-line comments (e.g., Pygments might interpret an isolated * Hello world as a multiplication or pointer instead of a comment).
To address this, I have refactored the logic to analyze the entire patch context at once.

@langverse2023
Copy link
Author

@LandynDev could u check my pr

@anderdc
Copy link
Collaborator

anderdc commented Dec 5, 2025

I like the idea of this PR, it will take a couple days to fully vet it and test, expect a full review then

@anderdc
Copy link
Collaborator

anderdc commented Dec 14, 2025

resolve conflicts, I'm looking into this PR now finally


# 2. Determine the appropriate lexer
try:
filename = f"dummy{file_extension}" if file_extension else "dummy.txt"
Copy link
Collaborator

@anderdc anderdc Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double check the dummy filename construction, I don't believe file_extension has the ..

also why not just use the actual filename? seems strange to just make up a dummy filename

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I updated the function signature to pass the full filename directly. It's much cleaner now.

@langverse2023
Copy link
Author

@anderdc could u check my pr

@langverse2023
Copy link
Author

@LandynDev , @anderdc would you review my PR ?

@LandynDev LandynDev closed this Dec 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants