📝 Walkthrough

Adds header-detection/skip support, a FileSnippet scan configuration with resolution rules, CLI options to control snippet behavior, propagation of FileSnippet as a base64 JSON header on scan requests, Winnowing header-skipping integration, numerous tests, and defensive constructors/private utilities across the codebase.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as ScanCommandLine
    participant Settings as ScanossSettings
    participant FileSnippet as FileSnippet
    participant Scanner as Scanner
    participant Winnowing as Winnowing
    participant HeaderFilter as HeaderFilter
    participant API as ScanApi
    CLI->>Settings: load settings from scanoss.json
    CLI->>FileSnippet: build CLI FileSnippet
    CLI->>FileSnippet: resolve(FileSnippet_file, FileSnippet_cli)
    CLI->>Scanner: initialize with resolved settings
    Scanner->>Winnowing: configure skipHeaders / skipHeadersLimit
    Winnowing->>HeaderFilter: filter(file, contents)
    HeaderFilter-->>Winnowing: skipLines (start_line)
    alt skipHeaders enabled
        Winnowing->>Winnowing: emit start_line and skip lines for hashing
    end
    Scanner->>API: prepare scan request
    API->>API: serialize FileSnippet → base64
    API->>API: attach `scanoss-settings` header
    API-->>External: send HTTP request
```
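The final ScanApi steps above (serialize the FileSnippet to JSON, base64-encode it, attach it as the `scanoss-settings` header) can be sketched as follows. This is a minimal illustration of the encoding, not the library's actual code, and the payload field names are assumptions:

```python
import base64
import json

def settings_header(file_snippet: dict) -> str:
    """Compact JSON, then base64: yields an ASCII-safe, newline-free
    value suitable for transport as an HTTP header."""
    payload = json.dumps(file_snippet, separators=(",", ":"))
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")

# Hypothetical field names, for illustration only.
header_value = settings_header({"skip_headers": True, "skip_headers_limit": 30})
# The receiving side reverses both steps to recover the settings.
decoded = json.loads(base64.b64decode(header_value))
```

Base64 keeps the value header-safe regardless of what characters the JSON contains.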
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 5
Note: Due to the large number of review comments, Critical-severity comments were prioritized as inline comments.
🟠 Major comments (17)
testing/data/header-files-test/cache.lua (1)

30-31: ⚠️ Potential issue | 🟠 Major

Enforce integer `max_size` to preserve capacity guarantees.

At line 30, fractional values (e.g. `1.5`) pass validation and allow `size` to grow beyond the intended capacity.

Proposed fix

```diff
 function _M.new(max_size, default_ttl)
-    if type(max_size) ~= "number" or max_size < 1 then
+    if type(max_size) ~= "number" or max_size < 1 or max_size ~= math_floor(max_size) then
         error("max_size must be a positive number")
     end
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/cache.lua` around lines 30-31: the validation for max_size in cache.lua currently allows fractional numbers; update the check so max_size is an integer >= 1 by requiring max_size == math.floor(max_size) (or an equivalent integer test) in the if condition that error()s, and update the error text to read "max_size must be a positive integer"; reference the max_size variable and the same check that initializes/validates size so capacity guarantees hold.

testing/data/header-files-test/cache.lua (1)
52-63: ⚠️ Potential issue | 🟠 Major

Validate TTL type/range before arithmetic.

At line 62, a non-numeric `ttl` will raise a runtime error on `os_time() + ttl`. The constructor accepts `default_ttl` without type validation (line 39), and the `set` method accepts `ttl` without type validation (line 52), allowing either parameter to reach the arithmetic operation.

Add validation in the constructor to ensure `default_ttl` is a non-negative number if provided, and in the `set` method after line 52 to validate `ttl` before use.

Proposed fix

```diff
 function _M.new(max_size, default_ttl)
     if type(max_size) ~= "number" or max_size < 1 then
         error("max_size must be a positive number")
     end
+    if default_ttl ~= nil and (type(default_ttl) ~= "number" or default_ttl < 0) then
+        error("default_ttl must be a non-negative number")
+    end
     local self = {
```

```diff
 function _M.set(self, key, value, ttl)
     if key == nil then
         return nil, "key is nil"
     end
     ttl = ttl or self.default_ttl
+    if type(ttl) ~= "number" or ttl < 0 then
+        return nil, "ttl must be a non-negative number"
+    end
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/cache.lua` around lines 52-63: validate that default_ttl provided to the constructor is a number >= 0 and raise an error (or set a safe default) if not; in the set method (after "ttl = ttl or self.default_ttl") validate that ttl is a number >= 0 before doing os_time() + ttl and raise an error if validation fails so non-numeric or negative values cannot reach the expiry arithmetic; reference the constructor's default_ttl, the set method's ttl, and the os_time() + ttl assignment when adding these checks.

testing/data/header-files-test/parser.pl (1)
75-91: ⚠️ Potential issue | 🟠 Major

The `escape` config is unused, and unterminated quotes are silently accepted.

Line 23 defines `escape`, but lines 77-87 never use it. Escaped quotes/delimiters can be misparsed, and unmatched quotes should fail in strict mode.

💡 Proposed fix

```diff
 sub _split_line {
     my ($self, $line) = @_;
     my @fields;
     my $field = '';
     my $in_quotes = 0;
+    my $escaped = 0;
     for my $char (split //, $line) {
-        if ($char eq $self->{quote_char} && !$in_quotes) {
-            $in_quotes = 1;
-        } elsif ($char eq $self->{quote_char} && $in_quotes) {
-            $in_quotes = 0;
+        if (!$escaped && $char eq $self->{escape}) {
+            $escaped = 1;
+            next;
+        } elsif ($escaped) {
+            $field .= $char;
+            $escaped = 0;
+            next;
+        } elsif ($char eq $self->{quote_char}) {
+            $in_quotes = !$in_quotes;
         } elsif ($char eq $self->{delimiter} && !$in_quotes) {
             push @fields, $field;
             $field = '';
         } else {
             $field .= $char;
         }
     }
     push @fields, $field;
+    croak sprintf("Unterminated quoted field at line %d", $self->{line_num})
+        if $self->{strict} && $in_quotes;
     return \@fields;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/parser.pl` around lines 75-91: the parser loop ignores the configured escape character ($self->{escape}) and silently accepts unterminated quotes; update the loop that iterates over characters (the block that uses $in_quotes, $self->{quote_char}, $self->{delimiter}, $field and pushes to @fields) to honor $self->{escape} by treating an escape character as consuming the next character literally (so an escaped quote or delimiter is appended to $field without toggling $in_quotes or splitting fields), and after the loop check whether $in_quotes is still true; if strict mode (e.g. $self->{strict}) is enabled then raise an error instead of returning fields; ensure all references use $self->{escape} and adjust field accumulation accordingly.

testing/data/header-files-test/parser.pl (1)
39-43: ⚠️ Potential issue | 🟠 Major

Derive headers from the first data line, not the first physical line.

Line 39 currently treats the first line as headers unconditionally. This breaks on empty files and files starting with blank/comment lines, and can trigger warnings.

💡 Proposed fix

```diff
-    my $header_line = <$fh>;
-    chomp $header_line;
-    $self->{headers} = $self->_split_line($header_line);
-    $self->{line_num} = 1;
+    $self->{line_num} = 0;
+    my $header_line;
+    while (defined(my $candidate = <$fh>)) {
+        $self->{line_num}++;
+        chomp $candidate;
+        next if $candidate =~ /^\s*$/;
+        next if $candidate =~ /^\s*#/;
+        $header_line = $candidate;
+        last;
+    }
+    croak "Missing header row in '$filename'" unless defined $header_line;
+    $self->{headers} = $self->_split_line($header_line);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/parser.pl` around lines 39-43: the code currently reads the first physical line into $header_line and sets $self->{headers} via $self->_split_line, which fails for files that start with blank/comment lines or are empty; change the logic in parser.pl to loop reading from <$fh> until you find the first non-empty, non-comment data line, then call $self->_split_line on that line and set $self->{headers} and $self->{line_num} (the line number should reflect the data line's index, not always 1); also handle EOF gracefully (no warnings) by checking for undef before chomp/split and returning an appropriate error or empty headers if the file has no data.

testing/data/header-files-test/TokenVerifier.java (1)
256-261: ⚠️ Potential issue | 🟠 Major

`withDefaultChecks()` misses the documented subject check.

Line 250 says `sub` must be defined, but `withDefaultChecks()` does not register `SUBJECT_EXISTS_CHECK`, so subject-less tokens can pass default verification.

Suggested fix

```diff
 public TokenVerifier<T> withDefaultChecks() {
     return withChecks(
         RealmUrlCheck.NULL_INSTANCE,
+        SUBJECT_EXISTS_CHECK,
         TokenTypeCheck.INSTANCE_DEFAULT_TOKEN_TYPE,
         IS_ACTIVE
     );
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/TokenVerifier.java` around lines 256-261: the withDefaultChecks() method in TokenVerifier<T> is missing the documented SUBJECT_EXISTS_CHECK, allowing tokens without a 'sub' to pass; update TokenVerifier.withDefaultChecks() to include SUBJECT_EXISTS_CHECK in the call to withChecks (alongside RealmUrlCheck.NULL_INSTANCE, TokenTypeCheck.INSTANCE_DEFAULT_TOKEN_TYPE, and IS_ACTIVE) so the subject existence check is enforced during default verification.

testing/data/header-files-test/TokenVerifier.java (1)
349-352: ⚠️ Potential issue | 🟠 Major

`tokenType(List<String>)` should fail fast on null/empty input.

If `tokenTypes` is null, verification later fails with a runtime NPE (line 132) instead of a clear configuration error at setup time.

Suggested fix

```diff
 public TokenVerifier<T> tokenType(List<String> tokenTypes) {
+    if (tokenTypes == null || tokenTypes.isEmpty()) {
+        throw new IllegalArgumentException("tokenTypes must not be null or empty");
+    }
-    this.expectedTokenType = tokenTypes;
+    this.expectedTokenType = new LinkedList<>(tokenTypes);
     return replaceCheck(TokenTypeCheck.class, this.checkTokenType, new TokenTypeCheck(expectedTokenType));
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/TokenVerifier.java` around lines 349-352: tokenType(List<String>) should validate its input and fail fast: in TokenVerifier.tokenType, check that tokenTypes is non-null and not empty (and optionally that it contains no null/blank entries); if the check fails, throw an IllegalArgumentException with a clear message (e.g. "tokenTypes must not be null or empty") instead of allowing a later NPE. Keep the existing behavior of setting this.expectedTokenType and calling replaceCheck(TokenTypeCheck.class, this.checkTokenType, new TokenTypeCheck(expectedTokenType)) only after the validation passes.

testing/data/header-files-test/TokenVerifier.java (1)
377-380: ⚠️ Potential issue | 🟠 Major

`audience(null|empty)` currently installs an always-failing check.

Line 374 says the input can be null, but lines 378-380 add `new AudienceCheck(null)`, which always throws at line 150. This makes `audience()` with no values a guaranteed verification failure.

Suggested fix

```diff
 public TokenVerifier<T> audience(String... expectedAudiences) {
     if (expectedAudiences == null || expectedAudiences.length == 0) {
-        return this.replaceCheck(AudienceCheck.class, true, new AudienceCheck(null));
+        return this.replaceCheck(AudienceCheck.class, false);
     }
     AudienceCheck[] audienceChecks = new AudienceCheck[expectedAudiences.length];
     for (int i = 0; i < expectedAudiences.length; ++i) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/TokenVerifier.java` around lines 377-380: the audience(...) branch currently constructs new AudienceCheck(null), which triggers an always-throwing behavior; change the branch so that when expectedAudiences is null or empty you install a non-failing/empty-audience check instead of passing null — e.g., call replaceCheck(AudienceCheck.class, true, new AudienceCheck(new String[0])) or otherwise remove/disable the AudienceCheck via replaceCheck so the verifier does not always fail; update the TokenVerifier.audience method to pass an empty String[] (or disable the check) when expectedAudiences is null/empty.

testing/data/header-files-test/core.clj (1)
22-24: ⚠️ Potential issue | 🟠 Major

Fail fast on config load errors instead of silently continuing.

Returning `{}` from `load-config` on any exception means startup proceeds with implicit defaults even when configuration is broken. That can hide deployment issues and make behavior hard to diagnose.

Suggested fix

```diff
 (defn load-config
   "Load configuration from an EDN file. Returns a map of configuration values."
   [path]
   (try
     (with-open [reader (io/reader path)]
       (edn/read (java.io.PushbackReader. reader)))
     (catch Exception e
       (log/error e "Failed to load config from" path)
-      {})))
+      (throw (ex-info "Configuration load failed" {:path path} e)))))
```

Also applies to: 63-66
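The fail-fast pattern this fix applies is language-agnostic; here is a minimal Python sketch of the same idea (log for diagnostics, then re-raise a wrapped error instead of returning an empty config). The function and path are illustrative, not part of the project:

```python
import json
import logging

def load_config(path: str) -> dict:
    """Load a JSON config file; on failure, log and re-raise instead of
    silently returning {} and continuing with implicit defaults."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except (OSError, ValueError) as exc:
        logging.error("Failed to load config from %s: %s", path, exc)
        raise RuntimeError(f"Configuration load failed: {path}") from exc
```

A broken or missing config now stops startup with a clear, chained error rather than hiding behind defaults.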
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/core.clj` around lines 22-24: the catch in load-config is swallowing all exceptions and returning {}, which hides broken configs; change the catch in the (catch Exception e ...) block inside load-config (and the analogous catch at lines 63-66) to log the error then rethrow the exception (e.g. call (throw e) or remove the catch to let it propagate) so the process fails fast on config load errors instead of silently continuing with {}.

testing/data/header-files-test/core.clj (1)
29-31: ⚠️ Potential issue | 🟠 Major

Guard against nil or empty request strings before parsing.

The function doesn't validate `request-str` before passing it to `str/split-lines`. If `request-str` is `nil`, a `NullPointerException` is thrown at line 29. Additionally, an empty or whitespace-only request line would result in an invalid method/path/version tuple. Use `(or request-str "")` to handle nil input, then validate the request line before splitting.

Suggested fix

```diff
 (defn parse-request
   "Parse an HTTP request string into a map."
   [request-str]
-  (let [lines (str/split-lines request-str)
-        [method path version] (str/split (first lines) #"\s+")
+  (let [lines (str/split-lines (or request-str ""))
+        request-line (first lines)]
+    (when (str/blank? request-line)
+      (throw (ex-info "Invalid request line" {:request request-str})))
+    (let [[method path version] (str/split request-line #"\s+")
          headers (->> (rest lines)
                       (take-while (complement str/blank?))
                       (map #(str/split % #":\s*" 2))
                       (filter #(= 2 (count %)))
                       (into {} (map (fn [[k v]] [(str/lower-case k) v]))))]
-    {:method method
-     :path path
-     :version version
-     :headers headers}))
+      {:method method
+       :path path
+       :version version
+       :headers headers})))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/core.clj` around lines 29-31: guard against nil/empty request strings before parsing by replacing request-str with (or request-str "") before calling str/split-lines, and validate that the first request line (the result of (first lines)) contains non-whitespace content before doing (str/split ...) and destructuring into [method path version]; if the line is missing or doesn't split into three parts, handle it gracefully (return an error, default values, or skip parsing headers) inside the same let where lines, [method path version], and headers are computed.

testing/data/header-files-test/FileModel.ts (1)
61-73: ⚠️ Potential issue | 🟠 Major

Fix transaction control in `insertFiles`; the current flow can hide failures and violates data integrity.

Lines 60-74 contain critical transaction-handling issues:

- Async executor antipattern (line 61): the function returns a Promise with an `async` executor, but the operations inside `serialize()` don't actually await, causing the function to return before writes complete.
- Missing error handling on INSERT statements (line 65): individual `run()` calls without callbacks silently drop errors. If any INSERT fails, execution continues without rolling back the transaction.
- No rollback on failure: partial writes can persist if an error occurs after BEGIN but before COMMIT.

This pattern contradicts the proper async/await approach used elsewhere in the class (see the `ignored()`, `setDirty()`, `deleteDirty()` methods), which use `promisify()` for error handling.

Proposed fix

```diff
-  public async insertFiles(data: Array<any>) {
-    return new Promise<void>(async (resolve, reject) => {
-      this.connection.serialize(async () => {
-        this.connection.run('begin transaction');
-        for (let i = 0; i < data.length; i += 1) {
-          this.connection.run('INSERT INTO FILES(path,type) VALUES(?,?) ON CONFLICT(path) DO UPDATE SET type = excluded.type; ', data[i].path, data[i].type);
-        }
-
-        this.connection.run('commit', (err: any) => {
-          if (!err) resolve();
-          reject(err);
-        });
-      });
-    });
-  }
+  public async insertFiles(data: Array<{ path: string; type: string }>): Promise<void> {
+    const run = promisify(this.connection.run.bind(this.connection));
+    await run('BEGIN TRANSACTION');
+    try {
+      for (const item of data) {
+        await run(
+          'INSERT INTO files(path,type) VALUES(?,?) ON CONFLICT(path) DO UPDATE SET type = excluded.type;',
+          item.path,
+          item.type,
+        );
+      }
+      await run('COMMIT');
+    } catch (error) {
+      await run('ROLLBACK');
+      throw error;
+    }
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/FileModel.ts` around lines 61-73: the insertFiles method currently uses an async Promise executor and connection.serialize with fire-and-forget run() calls, which can drop errors and never roll back; refactor insertFiles to avoid an async Promise executor, use promisified versions of this.connection.run (or wrap run in callbacks/promises) to await the BEGIN, each INSERT, and the COMMIT, and on any INSERT/run error call ROLLBACK before rejecting; specifically update the code that uses this.connection.serialize, the 'begin transaction' run, the loop over data where run('INSERT INTO FILES(...)') is invoked, and the final commit callback so all runs are awaited, errors are caught, rollback is issued via this.connection.run('rollback') and then the Promise is rejected (mirroring the promisify pattern used by ignored(), setDirty(), deleteDirty()).

src/main/java/com/scanoss/settings/FileSnippet.java (1)
67-70: ⚠️ Potential issue | 🟠 Major

Preserve unset semantics for `skipHeaders` in merge resolution.

`skipHeaders` defaults to `false`, so `applyNonDefault` treats it as set even when `skip_headers` is omitted in JSON. That can overwrite lower-priority values unexpectedly.

Proposed fix

```diff
-    @Builder.Default
     @SerializedName("skip_headers")
-    private Boolean skipHeaders = false;
+    private Boolean skipHeaders;
+
+    public boolean isSkipHeadersSet() {
+        return skipHeaders != null;
+    }
```

```diff
-        if (source.getSkipHeaders() != null) {
+        if (source.isSkipHeadersSet()) {
             target.setSkipHeaders(source.getSkipHeaders());
         }
```

Also applies to: 170-172
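The unset-vs-false distinction this fix introduces can be shown with a tiny merge helper. This is a Python sketch with hypothetical key names, not the FileSnippet API: `None` plays the role of "unset" and must never clobber a lower-priority value.

```python
def apply_non_default(target: dict, source: dict, keys) -> dict:
    """Copy a setting only when the source explicitly set it (non-None);
    None means 'unset' and leaves the target's value in place."""
    for key in keys:
        if source.get(key) is not None:
            target[key] = source[key]
    return target

base = {"skip_headers": True, "skip_headers_limit": 30}
# 'skip_headers' is unset in the overlay, so the True above survives,
# while the explicitly set limit is overridden.
merged = apply_non_default(dict(base),
                           {"skip_headers": None, "skip_headers_limit": 50},
                           ["skip_headers", "skip_headers_limit"])
```

With a primitive `false` default there is no `None` to test for, which is exactly why the review asks for a nullable field.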
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/settings/FileSnippet.java` around lines 67-70: the merge logic incorrectly treats the default false as an explicitly set value; change the skipHeaders field to be nullable so "unset" can be distinguished from false, and ensure applyNonDefault treats null as unset. Specifically, replace the `@Builder.Default` Boolean skipHeaders = false with a nullable Boolean skipHeaders = null (remove the default), update any builder calls or constructors that assumed a non-null default, and adjust applyNonDefault to only override when the source value is non-null; apply the same change to the other similar field instance referenced (the second skipHeaders occurrence).

testing/data/header-files-test/results.py (1)
185-191: ⚠️ Potential issue | 🟠 Major

Handle a `None` return from `load_file` to prevent a downstream `AttributeError`.

If `load_file` catches an exception at lines 182-183, it returns `None` implicitly. Then `_transform_data(None)` will fail when calling `data.items()` at line 204.

Suggested fix

```diff
     def _load_and_transform(self, file: str) -> List[Dict[str, Any]]:
         """
         Load the file and transform the data into a list of dictionaries with the filename and the file data
         """
         raw_data = self.load_file(file)
+        if raw_data is None:
+            return []
         return self._transform_data(raw_data)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/results.py` around lines 185-191: the _load_and_transform method calls load_file(file), which may return None on error; guard against that before calling _transform_data to avoid an AttributeError in _transform_data when it expects a mapping. Update _load_and_transform to check raw_data for None (or falsy) after load_file(file) and either return an empty list or raise a clear exception; reference the methods by name (_load_and_transform, load_file, _transform_data) and ensure any return path matches the method's List[Dict[str, Any]] signature and/or logs the original error for debugging.

src/main/java/com/scanoss/cli/ScanCommandLine.java (1)
124-128: ⚠️ Potential issue | 🟠 Major

`--skip-headers` has no true "unset" state, which can override CLI intent during merge.

Using primitive `boolean` always supplies a value. During resolve, defaulted file-snippet booleans can clear an explicitly intended CLI value.

Proposed fix (tri-state like other booleans)

```diff
-    @picocli.CommandLine.Option(names = {"--skip-headers"}, description = "Skip license headers, comments and imports at the beginning of files (applies locally)")
-    private boolean skipHeaders = false;
+    @picocli.CommandLine.Option(
+            names = {"--skip-headers"},
+            description = "Skip license headers, comments and imports at the beginning of files (true|false|unset, applies locally)",
+            arity = "1"
+    )
+    private String skipHeaders = "unset";
```

```diff
-            .skipHeaders(skipHeaders)
+            .skipHeaders(parseTriStateBoolean(skipHeaders))
```

Also applies to: 166-177
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/cli/ScanCommandLine.java` around lines 124-128: change the primitive fields to nullable boxed types so the CLI can represent an "unset" state: replace the boolean skipHeaders with a Boolean skipHeaders (no initializer) and replace int skipHeadersLimit with Integer skipHeadersLimit (no initializer, or set to null) and do the same for the other tri-state options referenced around lines 166-177. Update any code that reads these fields (places that previously relied on default false/0) to explicitly handle null as "unset" before merging/overriding values, using Boolean.TRUE/Boolean.FALSE checks or null checks as appropriate.

src/main/java/com/scanoss/HeaderFilter.java (1)
271-280: ⚠️ Potential issue | 🟠 Major

Python triple-single-quoted docstrings are not recognized.

Only `"""` is handled. Files using `'''` docstring headers will be misclassified as implementation from line 1.

Proposed docstring handling expansion

```diff
-        if (commentPatterns.docStringStart != null && line.contains("\"\"\"")) {
-            int count = countOccurrences(line, "\"\"\"");
-            if (count == COMPLETE_DOCSTRING_QUOTE_COUNT) {
+        if (commentPatterns.docStringStart != null) {
+            int doubleQuoteCount = countOccurrences(line, "\"\"\"");
+            int singleQuoteCount = countOccurrences(line, "'''");
+            if (doubleQuoteCount == COMPLETE_DOCSTRING_QUOTE_COUNT
+                    || singleQuoteCount == COMPLETE_DOCSTRING_QUOTE_COUNT) {
                 return new CommentResult(true, false);
             }
-            if (count == 1) {
+            if (doubleQuoteCount == 1 || singleQuoteCount == 1) {
                 return new CommentResult(true, true);
             }
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/HeaderFilter.java` around lines 271-280: the docstring-start handling in HeaderFilter only checks for triple-double quotes ("\"\"\"") so triple-single-quoted Python docstrings (''') are ignored; update the docstring detection in the method using commentPatterns.docStringStart to consider both triple-double and triple-single quote tokens (e.g., choose a docQuote variable that is either "\"\"\"" or "'''", or run the same logic twice), then call countOccurrences(line, docQuote) and compare against COMPLETE_DOCSTRING_QUOTE_COUNT and 1 to return the appropriate CommentResult(true, false) or CommentResult(true, true); ensure you reference the existing symbols commentPatterns.docStringStart, countOccurrences, COMPLETE_DOCSTRING_QUOTE_COUNT, and CommentResult so behavior remains consistent.

src/main/java/com/scanoss/HeaderFilter.java (1)
173-176: ⚠️ Potential issue | 🟠 Major

PHP opening tags are treated as implementation code, forcing offset 0 for normal PHP files.

For PHP, the initial `<?php`/`<?` line should be skipped as preamble; otherwise header/import detection never executes.

Proposed fix

```diff
-        if ((i == 0 && stripped.startsWith("#!")) || stripped.isEmpty()) {
+        if ((i == 0 && stripped.startsWith("#!"))
+                || (i == 0 && "php".equals(language) && (stripped.startsWith("<?php") || stripped.equals("<?")))
+                || stripped.isEmpty()) {
             continue;
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/HeaderFilter.java` around lines 173-176: the current header-filter loop in HeaderFilter skips only shebangs and blank lines, and should also skip PHP opening tags on the first line so PHP preambles like "<?php" or "<?" don't block header/import detection; update the condition that checks (i == 0 && stripped.startsWith("#!")) to additionally check for (i == 0 && (stripped.startsWith("<?php") || stripped.startsWith("<?"))) so the first-line PHP opening tag is treated as preamble and continued over, leaving the rest of the header detection logic unchanged.

src/main/java/com/scanoss/HeaderFilter.java (1)
523-533: ⚠️ Potential issue | 🟠 Major

Elixir and Clojure are detected but mapped to the wrong comment style.

`detectLanguage()` returns `elixir`/`clojure`, but the missing style mapping falls back to `c_style`, so `#`/`;` headers are not skipped.

Proposed mapping update

```diff
-        for (String lang : new String[]{"python", "ruby", "perl", "r"}) {
+        for (String lang : new String[]{"python", "ruby", "perl", "r", "elixir"}) {
             LANGUAGE_COMMENT_STYLE.put(lang, "python_style");
         }
+        for (String lang : new String[]{"clojure"}) {
+            LANGUAGE_COMMENT_STYLE.put(lang, "clojure_style");
+        }
```

```diff
+        COMMENT_PATTERNS.put("clojure_style", new CommentPatterns(
+                Pattern.compile("^\\s*;+.*$"),
+                null, null, null, null, null
+        ));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/HeaderFilter.java` around lines 523-533: the LANGUAGE_COMMENT_STYLE map is missing entries for languages detected as "elixir" and "clojure", so initLanguageCommentStyles() currently falls back to c_style and fails to skip '#' and ';' headers; add explicit mappings inside initLanguageCommentStyles(): map "elixir" to the existing "python_style" (handles '#' comments) and map "clojure" to an appropriate semicolon/Lisp comment style (e.g., "lisp_style" or a new style that treats ';' as a comment) and ensure the code that uses LANGUAGE_COMMENT_STYLE (referenced by detectLanguage() results and the header-skipping logic) recognizes the new "lisp_style" or an existing equivalent.

src/main/java/com/scanoss/cli/ScanCommandLine.java (1)
242-247: ⚠️ Potential issue | 🟠 Major

Reject invalid tri-state values instead of silently coercing to `false`.

`Boolean.parseBoolean()` returns `false` for any non-`"true"` string, causing typos in CLI arguments (e.g., `--ranking enable`) to silently disable the feature rather than error. Implement explicit validation that accepts only `"true"`, `"false"`, and `"unset"`, rejecting all other inputs with an `IllegalArgumentException`.

Proposed parser hardening

```diff
 private static Boolean parseTriStateBoolean(String value) {
-    if (value == null || value.equalsIgnoreCase("unset")) {
+    if (value == null || value.trim().isEmpty() || value.equalsIgnoreCase("unset")) {
         return null;
     }
-    return Boolean.parseBoolean(value);
+    if (value.equalsIgnoreCase("true")) {
+        return Boolean.TRUE;
+    }
+    if (value.equalsIgnoreCase("false")) {
+        return Boolean.FALSE;
+    }
+    throw new IllegalArgumentException(
+            "Invalid value '" + value + "'. Expected: true, false, or unset."
+    );
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/cli/ScanCommandLine.java` around lines 242-247: the current parseTriStateBoolean method silently coerces any non-"true" string to false via Boolean.parseBoolean; change parseTriStateBoolean to explicitly accept only "true", "false", or "unset" (case-insensitive): return Boolean.TRUE for "true", Boolean.FALSE for "false", and null for "unset"; for any other input throw an IllegalArgumentException with a clear message. Reference the parseTriStateBoolean method and remove the use of Boolean.parseBoolean in favor of explicit string checks and validation.
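The hardened parser's contract — `true`/`false`/`unset` map to a tri-state value and everything else is an error — can be sketched language-neutrally (Python standing in for the Java here):

```python
def parse_tristate(value):
    """Map 'true'/'false'/'unset' (case-insensitive) to True/False/None;
    reject anything else instead of silently coercing it to False."""
    if value is None or not value.strip() or value.strip().lower() == "unset":
        return None
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"invalid value {value!r}; expected true, false, or unset")
```

A typo such as `enable` now surfaces immediately as an error instead of quietly disabling the feature.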
🟡 Minor comments (7)
testing/data/header-files-test/logger.rb (1)

44-51: ⚠️ Potential issue | 🟡 Minor

Thread safety: `@tags` accessed without synchronization.

The severity methods read `@tags` without holding the mutex, while `tagged` modifies it under mutex protection. This creates a potential data race if multiple threads use the logger concurrently.

Since this file serves as test data for header detection rather than production Ruby code, this may be acceptable for its intended purpose.

Fix if needed

```diff
 SEVERITIES.each do |severity|
   define_method(severity) do |message = nil, &block|
     message = block.call if block
     return if message.nil?
-    formatted_tags = @tags.map { |t| "[#{t}]" }.join(" ")
-    @logger.send(severity, "#{formatted_tags} #{message}".strip)
+    @mutex.synchronize do
+      formatted_tags = @tags.map { |t| "[#{t}]" }.join(" ")
+      @logger.send(severity, "#{formatted_tags} #{message}".strip)
+    end
   end
 end
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/logger.rb` around lines 44-51: the severity methods read `@tags` without any synchronization, causing a race with the tagged method that mutates `@tags` under a mutex; update the define_method block (the SEVERITIES loop / generated severity methods) to access `@tags` under the same mutex used by tagged (e.g., wrap the `@tags` read / formatted_tags creation in mutex.synchronize, or fetch a dup of `@tags` while holding the mutex) so reads and writes are consistent and thread-safe.

testing/data/header-files-test/config.rs (1)
69-73: ⚠️ Potential issue | 🟡 Minor

Avoid silently ignoring an invalid `SERVER_PORT`.

If parsing fails (line 70), the code quietly keeps the default port. That can hide deployment misconfigurations. Prefer returning an error on invalid input.

Suggested tightening

```diff
-    pub fn from_env() -> Self {
+    pub fn from_env() -> io::Result<Self> {
         let mut config = Self::default();
         if let Ok(host) = std::env::var("SERVER_HOST") {
             config.server.host = host;
         }
         if let Ok(port) = std::env::var("SERVER_PORT") {
-            if let Ok(port) = port.parse() {
-                config.server.port = port;
-            }
+            config.server.port = port.parse().map_err(|e| {
+                io::Error::new(io::ErrorKind::InvalidInput, format!("Invalid SERVER_PORT: {e}"))
+            })?;
         }
         if let Ok(url) = std::env::var("DATABASE_URL") {
             config.database.url = url;
         }
-        config
+        Ok(config)
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/config.rs` around lines 69-73: the server port parsing silently ignores invalid SERVER_PORT values — update the logic around reading std::env::var("SERVER_PORT") so that if port.parse() fails you return or propagate an explicit error instead of leaving config.server.port at its default; locate the block that sets config.server.port and change it to handle the Err from port.parse() (e.g., map the parse error to a descriptive Error/Result and return it from the surrounding function, or use ? to propagate), ensuring the returned error mentions SERVER_PORT and the invalid value.

CHANGELOG.md (1)
16-16: ⚠️ Potential issue | 🟡 Minor

Fix typo in text.

There's a missing space in "`skip_header`limit". It should be either "`skip_header` limit" or "`skip_headers_limit`" to match the parameter name used in line 12.

✏️ Proposed fix
```diff
-- Included `start_line` into wfp file when `skip_header`limit is enabled
+- Included `start_line` into wfp file when `skip_headers_limit` is enabled
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 16, Fix the typo in the CHANGELOG entry: replace "`skip_header`limit" with the correct parameter name and spacing to match line 12 — use "`skip_headers_limit`" (or at minimum "`skip_header` limit") so the sentence reads correctly and matches the parameter name used elsewhere (reference: start_line and wfp mentioned in the entry).

CHANGELOG.md-10-10 (1)
10-10: ⚠️ Potential issue | 🟡 Minor

Update the version release date for 0.13.0.
The version is dated 2026-02-04, but the PR was created in March 2026. If this version is still unreleased, remove the specific date or update it to the actual release date.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 10, Update the 0.13.0 changelog heading: either remove the specific date suffix or replace "2026-02-04" with the actual release date for version "0.13.0" in the changelog heading "## [0.13.0] - 2026-02-04"; ensure the heading format matches other entries (e.g., "## [0.13.0]" if unreleased or "## [0.13.0] - YYYY-MM-DD" if released).

testing/data/header-files-test/cache.lua-123-123 (1)
123-123: ⚠️ Potential issue | 🟡 Minor

`hit_rate` calculation is biased and under-reports accuracy.

At Line 123, adding `+1` skews all non-zero hit rates. Use a zero-denominator guard instead.

Proposed fix
```diff
 function _M.stats(self)
+    local total = self.hits + self.misses
     return {
@@
-        hit_rate = self.hits / (self.hits + self.misses + 1),
+        hit_rate = total > 0 and (self.hits / total) or 0,
     }
 end
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/cache.lua` at line 123, The hit_rate expression currently biases results by adding a hard +1 to the denominator; change the calculation in the code that sets hit_rate (using self.hits and self.misses) to remove the +1 and instead protect against division by zero by checking if (self.hits + self.misses) == 0 — return 0 (or an appropriate neutral value) in that case, otherwise compute hit_rate = self.hits / (self.hits + self.misses) so the metric is accurate and only guarded when the denominator is zero.

testing/data/header-files-test/ViewController.m-38-39 (1)
38-39: ⚠️ Potential issue | 🟡 Minor

Fix cell style to display subtitles.

Lines 38–39 register default-style cells, which don't have `detailTextLabel`. Line 84 attempts to set `cell.detailTextLabel.text`, which will be `nil` and won't render. Create cells with `UITableViewCellStyleSubtitle` to display subtitles:

Proposed fix
```diff
 - (void)setupTableView {
     self.tableView = [[UITableView alloc] initWithFrame:self.view.bounds
                                                   style:UITableViewStylePlain];
     self.tableView.dataSource = self;
     self.tableView.delegate = self;
     self.tableView.autoresizingMask = UIViewAutoresizingFlexibleWidth |
                                       UIViewAutoresizingFlexibleHeight;
-    [self.tableView registerClass:[UITableViewCell class]
-           forCellReuseIdentifier:@"Cell"];
     [self.view addSubview:self.tableView];
 }

 - (UITableViewCell *)tableView:(UITableView *)tableView
         cellForRowAtIndexPath:(NSIndexPath *)indexPath {
-    UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:@"Cell"
-                                                            forIndexPath:indexPath];
+    UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:@"Cell"];
+    if (!cell) {
+        cell = [[UITableViewCell alloc] initWithStyle:UITableViewCellStyleSubtitle
+                                      reuseIdentifier:@"Cell"];
+    }
     NSDictionary *item = self.dataSource[indexPath.row];
     cell.textLabel.text = item[@"title"];
     cell.detailTextLabel.text = item[@"subtitle"];
     cell.accessoryType = UITableViewCellAccessoryDisclosureIndicator;
     return cell;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/data/header-files-test/ViewController.m` around lines 38 - 39, The registered cell currently uses the default style so cell.detailTextLabel is nil; update the code to create cells with UITableViewCellStyleSubtitle instead of relying on registerClass: for the default style: remove or stop using the current [self.tableView registerClass:... forCellReuseIdentifier:@"Cell"] call and in -tableView:cellForRowAtIndexPath: when dequeuing a nil cell, allocate it with [[UITableViewCell alloc] initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:@"Cell"] so detailTextLabel is available and cell.detailTextLabel.text will render.

src/main/java/com/scanoss/cli/WfpCommandLine.java-74-75 (1)
74-75: ⚠️ Potential issue | 🟡 Minor

Validate `--skip-headers-limit` as non-negative before building settings.

Currently negative values are accepted and propagated into runtime settings.

Proposed fix
```diff
     if (!f.exists()) {
         throw new RuntimeException(String.format("Error: File or folder does not exist: %s\n", fileFolder));
     }
+    if (skipHeadersLimit < 0) {
+        throw new CommandLine.ParameterException(
+                spec.commandLine(),
+                "--skip-headers-limit must be >= 0"
+        );
+    }
     ...
     FileSnippet fileSnippet = FileSnippet.builder()
             .skipHeaders(skipHeaders)
             .skipHeadersLimit(skipHeadersLimit)
             .build();
```

Also applies to: 101-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/scanoss/cli/WfpCommandLine.java` around lines 74 - 75, Validate the --skip-headers-limit value is non-negative before it is used to build runtime settings: in class WfpCommandLine, check the field skipHeadersLimit (and the other related options around the same area) prior to constructing settings and reject negative values (e.g., throw a ParameterException or print an error and exit) so negative integers are not propagated into runtime settings; update the validation logic where settings are assembled (the method that reads these picocli options and builds the settings object) to enforce skipHeadersLimit >= 0 and return a clear error when the check fails.
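The guard suggested for `WfpCommandLine` can also be illustrated without the picocli wiring. The sketch below is standalone and hypothetical — the class and method names are not from the SDK, and the real fix should throw picocli's `CommandLine.ParameterException` so usage help is shown — but it demonstrates the check that keeps negative values out of the runtime settings:

```java
// Hypothetical standalone sketch of the suggested non-negative guard.
// In the actual CLI this would live in WfpCommandLine and throw
// picocli's CommandLine.ParameterException instead.
public class SkipHeadersLimitCheck {
    static int validate(int skipHeadersLimit) {
        if (skipHeadersLimit < 0) {
            throw new IllegalArgumentException("--skip-headers-limit must be >= 0");
        }
        return skipHeadersLimit;
    }

    public static void main(String[] args) {
        // Valid value passes through unchanged.
        System.out.println(validate(5));
        try {
            // Negative value is rejected before any settings are built.
            validate(-1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Performing the check before `FileSnippet.builder()` is invoked means an invalid option fails fast at parse time rather than surfacing later during the scan.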
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29c7a75f-373d-492d-a340-3195342e17cb
📒 Files selected for processing (58)
`CHANGELOG.md`
`config/checkstyle.xml`
`pom.xml`
`src/main/java/com/scanoss/HeaderFilter.java`
`src/main/java/com/scanoss/Scanner.java`
`src/main/java/com/scanoss/ScanossConstants.java`
`src/main/java/com/scanoss/Winnowing.java`
`src/main/java/com/scanoss/cli/ScanCommandLine.java`
`src/main/java/com/scanoss/cli/WfpCommandLine.java`
`src/main/java/com/scanoss/filters/BaseFilter.java`
`src/main/java/com/scanoss/filters/FileFilter.java`
`src/main/java/com/scanoss/filters/FolderFilter.java`
`src/main/java/com/scanoss/filters/GitIgnoreFilter.java`
`src/main/java/com/scanoss/filters/factories/FileFilterFactory.java`
`src/main/java/com/scanoss/filters/factories/FolderFilterFactory.java`
`src/main/java/com/scanoss/rest/ScanApi.java`
`src/main/java/com/scanoss/settings/Bom.java`
`src/main/java/com/scanoss/settings/FileSnippet.java`
`src/main/java/com/scanoss/settings/RemoveRule.java`
`src/main/java/com/scanoss/settings/ReplaceRule.java`
`src/main/java/com/scanoss/settings/Rule.java`
`src/main/java/com/scanoss/settings/RuleComparator.java`
`src/main/java/com/scanoss/settings/ScanossSettings.java`
`src/main/java/com/scanoss/utils/Hpsm.java`
`src/main/java/com/scanoss/utils/JsonUtils.java`
`src/main/java/com/scanoss/utils/LineRangeUtils.java`
`src/main/java/com/scanoss/utils/PackageDetails.java`
`src/main/java/com/scanoss/utils/ProxyUtils.java`
`src/main/java/com/scanoss/utils/Purl2Url.java`
`src/main/java/com/scanoss/utils/WinnowingUtils.java`
`src/test/java/com/scanoss/TestHeaderFilter.java`
`src/test/java/com/scanoss/TestSettings.java`
`src/test/java/com/scanoss/TestWinnowing.java`
`src/test/resources/scanoss.json`
`testing/data/header-files-test/DataFrame.scala`
`testing/data/header-files-test/FileModel.ts`
`testing/data/header-files-test/HttpClient.kt`
`testing/data/header-files-test/Package.swift`
`testing/data/header-files-test/Parser.hs`
`testing/data/header-files-test/Router.php`
`testing/data/header-files-test/ServiceProvider.cs`
`testing/data/header-files-test/StringUtils.hpp`
`testing/data/header-files-test/TokenVerifier.java`
`testing/data/header-files-test/ViewController.m`
`testing/data/header-files-test/analysis.r`
`testing/data/header-files-test/cache.lua`
`testing/data/header-files-test/config.rs`
`testing/data/header-files-test/core.clj`
`testing/data/header-files-test/crc32c.c`
`testing/data/header-files-test/deploy.sh`
`testing/data/header-files-test/handler.go`
`testing/data/header-files-test/logger.rb`
`testing/data/header-files-test/multiline_imports.py`
`testing/data/header-files-test/parser.pl`
`testing/data/header-files-test/results.py`
`testing/data/header-files-test/server.ex`
`testing/data/header-files-test/server.js`
`testing/data/header-files-test/widget.dart`
🧹 Nitpick comments (1)
pom.xml (1)
225-229: Use the native `<doclint>` parameter instead of the raw Javadoc option.

Replace `<additionalJOptions><additionalJOption>-Xdoclint:all</additionalJOption></additionalJOptions>` with the clearer, native `<doclint>all</doclint>` configuration. This is the recommended approach for maven-javadoc-plugin 3.5.0 and improves clarity and portability.

Proposed refactor
```diff
-            <configuration>
-                <additionalJOptions>
-                    <additionalJOption>-Xdoclint:all</additionalJOption>
-                </additionalJOptions>
-            </configuration>
+            <configuration>
+                <doclint>all</doclint>
+            </configuration>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pom.xml` around lines 225 - 229, The POM currently uses the low-level additionalJOptions/additionalJOption entries to pass -Xdoclint:all; replace that with the native doclint element supported by maven-javadoc-plugin 3.5.0 by removing the <additionalJOptions>/<additionalJOption>/-Xdoclint:all entries and adding <doclint>all</doclint> inside the plugin's <configuration>; target the maven-javadoc-plugin configuration block and update the configuration element accordingly (remove additionalJOptions/additionalJOption and add doclint).
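For reference, the resulting plugin block with the native parameter would look roughly like this. This is a sketch based only on the plugin version named in the nitpick above; the surrounding coordinates should be verified against the project's actual `pom.xml`:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>3.5.0</version>
    <configuration>
        <!-- Native parameter; replaces -Xdoclint:all passed via additionalJOptions -->
        <doclint>all</doclint>
    </configuration>
</plugin>
```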
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pom.xml`:
- Around line 225-229: The POM currently uses the low-level
additionalJOptions/additionalJOption entries to pass -Xdoclint:all; replace that
with the native doclint element supported by maven-javadoc-plugin 3.5.0 by
removing the <additionalJOptions>/<additionalJOption>/-Xdoclint:all entries and
adding <doclint>all</doclint> inside the plugin's <configuration>; target the
maven-javadoc-plugin configuration block and update the configuration element
accordingly (remove additionalJOptions/additionalJOption and add doclint).
Summary by CodeRabbit
New Features
Documentation
Tests
Chores