@wjrforcyber wjrforcyber commented Oct 26, 2025

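Use the GCC/Clang built-in function `__builtin_clz` to find the position of the leading 1 bit by subtracting the number of leading zeros from the bit width, instead of shifting in a loop.
This should give a small efficiency boost: the loop takes approximately $O(\log_2 n)$ iterations, while the built-in typically compiles to a single instruction, i.e. $O(1)$.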
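As a minimal sketch of the idea, the index of the leading 1 bit can be obtained from the leading-zero count. The helper name `highest_set_bit` is hypothetical and not part of ABC; the loop fallback for other compilers is an assumption added for illustration.

```c
#include <limits.h>

/* Index (0-based) of the most significant set bit of n.
   Hypothetical helper for illustration; n must be nonzero. */
static inline int highest_set_bit( unsigned n )
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clz(0) is undefined, hence the nonzero precondition. */
    return (int)(sizeof(unsigned) * CHAR_BIT) - 1 - __builtin_clz( n );
#else
    /* Portable fallback: shift until the value is exhausted. */
    int r = -1;
    while ( n ) { n >>= 1; r++; }
    return r;
#endif
}
```

Using `sizeof(unsigned) * CHAR_BIT` instead of a hard-coded 31 avoids assuming a 32-bit `unsigned`.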

Tested with a short code snippet:

#include <assert.h>
#include <stdio.h>
#include <time.h>

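// Original: for n >= 2, counts the bits of n - 1, i.e. ceil(log2(n)).
static inline int      Abc_Base2Log( unsigned n )             { int r; if ( n < 2 ) return (int)n; for ( r = 0, n--; n; n >>= 1, r++ ) {}; return r; }
// Proposed: 31 - clz(n) is the index of the leading 1, i.e. floor(log2(n)); assumes a 32-bit unsigned.
static inline int      Abc_Base2LogUpdated( unsigned n )      { if ( n < 2 ) return (int)n; return 31 - __builtin_clz(n); }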

// volatile sink so the compiler cannot optimize the calls away
volatile unsigned int result;

int main() {
    const int ITERATIONS = 100000000; // 100 million
    unsigned int test_values[] = {1, 2, 3, 15, 16, 255, 256, 65535, 65536, 2147483647};
    int num_tests = sizeof(test_values) / sizeof(test_values[0]);
    for (int t = 0; t < num_tests; t++) {
        unsigned int uTest = test_values[t];
        
        // Test original version
        clock_t startOri = clock();
        for (int i = 0; i < ITERATIONS; i++) {
            result = Abc_Base2Log(uTest + (i & 1)); // Vary input slightly
        }
        clock_t endOri = clock();
        double timeOri = ((double)(endOri - startOri)) / CLOCKS_PER_SEC;
        
        // Test updated version  
        clock_t startUpdated = clock();
        for (int i = 0; i < ITERATIONS; i++) {
            result = Abc_Base2LogUpdated(uTest + (i & 1)); // Vary input slightly
        }
        clock_t endUpdated = clock();
        double timeUpdated = ((double)(endUpdated - startUpdated)) / CLOCKS_PER_SEC;
        printf("n = %u:\n", uTest);
        printf("  Original:  %.6f sec (%d iterations)\n", timeOri, ITERATIONS);
        printf("  Updated:   %.6f sec (%d iterations)\n", timeUpdated, ITERATIONS);
        printf("  Speedup:   %.2fx\n\n", timeOri / timeUpdated);
    }
    
    return 0;
}
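The benchmark times the two functions but does not compare their return values. A minimal sketch of such a check follows, contrasting two candidate clz expressions against the shifting loop. All names here (`Base2LogLoop`, `Base2LogFloor`, `Base2LogCeil`, `count_mismatches`) are hypothetical, and a 32-bit `unsigned` with GCC/Clang is assumed.

```c
/* Copy of the shifting loop: for n >= 2 this is the bit length of n - 1. */
static inline int Base2LogLoop( unsigned n )
{
    int r;
    if ( n < 2 ) return (int)n;
    for ( r = 0, n--; n; n >>= 1, r++ ) {}
    return r;
}

/* Candidate A: index of the leading 1 of n, i.e. floor(log2(n)). */
static inline int Base2LogFloor( unsigned n )
{
    if ( n < 2 ) return (int)n;
    return 31 - __builtin_clz( n );
}

/* Candidate B: bit length of n - 1, i.e. ceil(log2(n)).
   n - 1 >= 1 here, so __builtin_clz never sees 0. */
static inline int Base2LogCeil( unsigned n )
{
    if ( n < 2 ) return (int)n;
    return 32 - __builtin_clz( n - 1 );
}

/* Count inputs in [0, limit) where f disagrees with the loop. */
static unsigned count_mismatches( int (*f)( unsigned ), unsigned limit )
{
    unsigned n, bad = 0;
    for ( n = 0; n < limit; n++ )
        if ( f( n ) != Base2LogLoop( n ) )
            bad++;
    return bad;
}
```

With these definitions, `count_mismatches( Base2LogCeil, 1u << 16 )` returns 0, while the floor form differs from the loop for every n >= 3 that is not a power of two (for example, the loop gives 2 for n = 3 and the floor form gives 1). If the goal is a drop-in replacement, the ceil form may be the safer choice.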

The results show a speedup for every tested value, growing from about 1.5x for small n to about 14.5x for the largest n:

n = 1:
  Original:  0.168807 sec (100000000 iterations)
  Updated:   0.102936 sec (100000000 iterations)
  Speedup:   1.64x

n = 2:
  Original:  0.195803 sec (100000000 iterations)
  Updated:   0.133787 sec (100000000 iterations)
  Speedup:   1.46x

n = 3:
  Original:  0.210005 sec (100000000 iterations)
  Updated:   0.112542 sec (100000000 iterations)
  Speedup:   1.87x

n = 15:
  Original:  0.301259 sec (100000000 iterations)
  Updated:   0.106297 sec (100000000 iterations)
  Speedup:   2.83x

n = 16:
  Original:  0.357181 sec (100000000 iterations)
  Updated:   0.139394 sec (100000000 iterations)
  Speedup:   2.56x

n = 255:
  Original:  0.496130 sec (100000000 iterations)
  Updated:   0.107810 sec (100000000 iterations)
  Speedup:   4.60x

n = 256:
  Original:  0.585638 sec (100000000 iterations)
  Updated:   0.134941 sec (100000000 iterations)
  Speedup:   4.34x

n = 65535:
  Original:  0.904508 sec (100000000 iterations)
  Updated:   0.131047 sec (100000000 iterations)
  Speedup:   6.90x

n = 65536:
  Original:  0.953645 sec (100000000 iterations)
  Updated:   0.107659 sec (100000000 iterations)
  Speedup:   8.86x

n = 2147483647:
  Original:  1.620423 sec (100000000 iterations)
  Updated:   0.111851 sec (100000000 iterations)
  Speedup:   14.49x
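The growth in speedup is consistent with the original loop running once per bit of n - 1, while the built-in version does a fixed amount of work. A small sketch that counts those iterations (the helper name `loop_iterations` is hypothetical):

```c
/* Number of iterations the shifting loop in Abc_Base2Log performs for n.
   Hypothetical helper for illustration only. */
static int loop_iterations( unsigned n )
{
    int r = 0;
    if ( n < 2 ) return 0;        /* early return, loop not entered */
    for ( n--; n; n >>= 1 ) r++;  /* one iteration per bit of n - 1 */
    return r;
}
```

For the test values above this gives 1 iteration for n = 2, 4 for n = 16, 16 for n = 65536, and 31 for n = 2147483647.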

Signed-off-by: JingrenWang <wjrforcyber@163.com>