-
Notifications
You must be signed in to change notification settings - Fork 123
Add CLI executable with model registry support #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements issue #119 with a full-featured command-line interface: - Default action is classification (no subcommand needed) - Supports Bayes, LSI, KNN, and Logistic Regression models - Commands: train, info, fit (LR only), search (LSI), related (LSI) - Flags: -f (file), -m (model type), -p (probabilities), -k (KNN neighbors) - Reads from stdin when no text argument provided - Sensible defaults: ./classifier.json, bayes model type Also: - Add Classifier::VERSION as single source of truth - LR now requires explicit fit() before classification - Add add_category() method to LR for dynamic category creation Closes #119
Instead of showing "Error: model not found" when running `classifier` with no arguments and no model file, show a friendly getting started guide with examples for training, classifying, and LSI semantic search.
When a model exists but classifier is run without any text to classify, show the model info and usage hints instead of silently exiting.
The info command now outputs pretty-printed JSON with detailed stats: - file: model path - type: classifier type - categories: list of categories - category_stats: per-category word counts (Bayes) - documents: document count (LSI, KNN) - items: list of items (LSI) - fitted: fit status (LR)
- LSI examples now use stdin training like Bayes examples - Use separate lsi.json file to avoid conflicts - All examples can be copy-pasted and run immediately
- Add defaults for --learning-rate (0.1), --regularization (0.01), --max-iterations (100) - Remove -f lsi.json from LSI examples for simpler copy-paste
Greptile SummaryImplements a full-featured CLI for training and classifying text across all classifier types (Bayes, LSI, KNN, Logistic Regression). The implementation successfully provides sensible defaults, stdin support, JSON output for Key Changes:
Style Issues Found:
Implementation Quality: Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Exe as exe/classifier
participant CLI as CLI.new
participant Parser as OptionParser
participant Command as execute_command
participant Classifier as Bayes/LSI/KNN/LR
participant Storage as Storage::File
User->>Exe: Run classifier [args]
Exe->>CLI: new(ARGV)
CLI->>CLI: initialize(args)
Note over CLI: Set default options from ENV
Exe->>CLI: run()
CLI->>Parser: parse_options
Parser->>CLI: Parse flags (-f, -m, -p, etc)
CLI->>Command: execute_command
alt train command
Command->>Storage: load_or_create_classifier
Storage-->>Command: Classifier instance
Command->>Classifier: train(category, text)
Command->>Storage: save_classifier
Storage->>Storage: write JSON
else classify command (default)
alt No model exists
Command->>User: show_getting_started
else Model exists
Command->>Storage: load_classifier
Storage-->>Command: Classifier instance
alt LR not fitted
Command-->>User: Error: Model not fitted
else Ready to classify
Command->>Classifier: classify(text)
Classifier-->>Command: category
Command->>User: Output result
end
end
else fit command (LR only)
Command->>Storage: load_classifier
Storage-->>Command: LR instance
Command->>Classifier: fit()
Note over Classifier: Run SGD optimization
Command->>Storage: save_classifier
else info command
Command->>Storage: load_classifier
Storage-->>Command: Classifier instance
Command->>Classifier: Extract metadata
Command->>User: Output JSON
else search/related (LSI only)
Command->>Storage: load_classifier
Storage-->>Command: LSI instance
Command->>Classifier: search(query) / find_related(item)
Classifier-->>Command: results with scores
Command->>User: Output results
end
CLI-->>Exe: {output, error, exit_code}
Exe->>User: Display output/error
Exe->>User: exit(code)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
13 files reviewed, 11 comments
| unless File.exist?(@options[:model]) | ||
| if text.empty? && ($stdin.tty? || @stdin.nil?) | ||
| show_getting_started | ||
| return | ||
| end | ||
| @error << "Error: model not found at #{@options[:model]}" | ||
| @exit_code = 1 | ||
| return | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Nested conditionals violate style guide. Use guard clauses with early returns.
| unless File.exist?(@options[:model]) | |
| if text.empty? && ($stdin.tty? || @stdin.nil?) | |
| show_getting_started | |
| return | |
| end | |
| @error << "Error: model not found at #{@options[:model]}" | |
| @exit_code = 1 | |
| return | |
| end | |
| # No model and no input - show getting started guide | |
| if text.empty? && ($stdin.tty? || @stdin.nil?) | |
| show_getting_started | |
| return | |
| end | |
| unless File.exist?(@options[:model]) | |
| @error << "Error: model not found at #{@options[:model]}" | |
| @exit_code = 1 | |
| return | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 336:344
Comment:
**style:** Nested conditionals violate style guide. Use guard clauses with early returns.
```suggestion
# No model and no input - show getting started guide
if text.empty? && ($stdin.tty? || @stdin.nil?)
show_getting_started
return
end
unless File.exist?(@options[:model])
@error << "Error: model not found at #{@options[:model]}"
@exit_code = 1
return
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| if text.empty? | ||
| lines = read_stdin_lines | ||
| if lines.empty? | ||
| show_model_usage(classifier) | ||
| return | ||
| end | ||
| lines.each { |line| classify_and_output(classifier, line) } | ||
| else | ||
| classify_and_output(classifier, text) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Nested conditionals make logic harder to follow. Refactor using guard clauses.
| if text.empty? | |
| lines = read_stdin_lines | |
| if lines.empty? | |
| show_model_usage(classifier) | |
| return | |
| end | |
| lines.each { |line| classify_and_output(classifier, line) } | |
| else | |
| classify_and_output(classifier, text) | |
| end | |
| if text.empty? | |
| lines = read_stdin_lines | |
| return show_model_usage(classifier) if lines.empty? | |
| lines.each { |line| classify_and_output(classifier, line) } | |
| else | |
| classify_and_output(classifier, text) | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 348:357
Comment:
**style:** Nested conditionals make logic harder to follow. Refactor using guard clauses.
```suggestion
if text.empty?
lines = read_stdin_lines
return show_model_usage(classifier) if lines.empty?
lines.each { |line| classify_and_output(classifier, line) }
else
classify_and_output(classifier, text)
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| if classifier.is_a?(LSI) && @args.any? | ||
| train_lsi_from_files(classifier, category, @args) | ||
| else | ||
| text = read_training_input | ||
| if text.empty? | ||
| @error << 'Error: no training data provided' | ||
| @exit_code = 2 | ||
| return | ||
| end | ||
| train_classifier(classifier, category, text) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Nested conditional inside else branch. Use guard clauses for clearer flow.
| if classifier.is_a?(LSI) && @args.any? | |
| train_lsi_from_files(classifier, category, @args) | |
| else | |
| text = read_training_input | |
| if text.empty? | |
| @error << 'Error: no training data provided' | |
| @exit_code = 2 | |
| return | |
| end | |
| train_classifier(classifier, category, text) | |
| end | |
| # For LSI, train with file paths as item keys | |
| if classifier.is_a?(LSI) && @args.any? | |
| train_lsi_from_files(classifier, category, @args) | |
| save_classifier(classifier) | |
| return | |
| end | |
| text = read_training_input | |
| if text.empty? | |
| @error << 'Error: no training data provided' | |
| @exit_code = 2 | |
| return | |
| end | |
| train_classifier(classifier, category, text) |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 171:181
Comment:
**style:** Nested conditional inside else branch. Use guard clauses for clearer flow.
```suggestion
# For LSI, train with file paths as item keys
if classifier.is_a?(LSI) && @args.any?
train_lsi_from_files(classifier, category, @args)
save_classifier(classifier)
return
end
text = read_training_input
if text.empty?
@error << 'Error: no training data provided'
@exit_code = 2
return
end
train_classifier(classifier, category, text)
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # Categories - different classifiers store them differently | ||
| if classifier.respond_to?(:categories) | ||
| info[:categories] = classifier.categories.map(&:to_s) | ||
| end | ||
|
|
||
| info[:training_count] = classifier.training_count if classifier.respond_to?(:training_count) | ||
| info[:vocab_size] = classifier.vocab_size if classifier.respond_to?(:vocab_size) | ||
| info[:fitted] = classifier.fitted? if classifier.respond_to?(:fitted?) | ||
|
|
||
| # Category-specific stats for Bayes | ||
| if classifier.is_a?(Bayes) | ||
| categories_data = classifier.instance_variable_get(:@categories) | ||
| info[:category_stats] = classifier.categories.to_h do |cat| | ||
| cat_data = categories_data[cat.to_sym] || {} | ||
| [cat.to_s, { unique_words: cat_data.size, total_words: cat_data.values.sum }] | ||
| end | ||
| end | ||
|
|
||
| # LSI-specific info | ||
| if classifier.is_a?(LSI) | ||
| info[:documents] = classifier.items.size | ||
| info[:items] = classifier.items | ||
| # Get unique categories from items | ||
| categories = classifier.items.map { |item| classifier.categories_for(item) }.flatten.uniq | ||
| info[:categories] = categories.map(&:to_s) unless categories.empty? | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove inline comments that restate code. # Categories - different classifiers store them differently and # Get unique categories from items just describe what the code already shows. The conditional checks make the intent obvious.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 204:229
Comment:
**style:** Remove inline comments that restate code. `# Categories - different classifiers store them differently` and `# Get unique categories from items` just describe what the code already shows. The conditional checks make the intent obvious.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # Category-specific stats for Bayes | ||
| if classifier.is_a?(Bayes) | ||
| categories_data = classifier.instance_variable_get(:@categories) | ||
| info[:category_stats] = classifier.categories.to_h do |cat| | ||
| cat_data = categories_data[cat.to_sym] || {} | ||
| [cat.to_s, { unique_words: cat_data.size, total_words: cat_data.values.sum }] | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # Category-specific stats for Bayes — the code clearly shows it's extracting Bayes-specific category stats.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 213:220
Comment:
**style:** Remove comment `# Category-specific stats for Bayes` — the code clearly shows it's extracting Bayes-specific category stats.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # KNN-specific info | ||
| if classifier.is_a?(KNN) | ||
| data = classifier.instance_variable_get(:@data) || [] | ||
| info[:documents] = data.size | ||
| categories = data.map { |d| d[:category] }.uniq | ||
| info[:categories] = categories.map(&:to_s) unless categories.empty? | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # KNN-specific info — redundant given the type check.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 231:237
Comment:
**style:** Remove comment `# KNN-specific info` — redundant given the type check.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # For LSI, train with file paths as item keys | ||
| if classifier.is_a?(LSI) && @args.any? | ||
| train_lsi_from_files(classifier, category, @args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # For LSI, train with file paths as item keys — restates what the code shows. The classifier.is_a?(LSI) check and method name train_lsi_from_files make this obvious.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 170:172
Comment:
**style:** Remove comment `# For LSI, train with file paths as item keys` — restates what the code shows. The `classifier.is_a?(LSI)` check and method name `train_lsi_from_files` make this obvious.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # No model and no input - show getting started guide | ||
| unless File.exist?(@options[:model]) | ||
| if text.empty? && ($stdin.tty? || @stdin.nil?) | ||
| show_getting_started | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # No model and no input - show getting started guide — obvious from the condition and method call.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 335:339
Comment:
**style:** Remove comment `# No model and no input - show getting started guide` — obvious from the condition and method call.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # LR requires explicit fit before classification | ||
| if classifier.is_a?(LogisticRegression) && !classifier.fitted? | ||
| raise StandardError, "Model not fitted. Run 'classifier fit' after training." | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # LR requires explicit fit before classification — the error message already explains this clearly.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 387:390
Comment:
**style:** Remove comment `# LR requires explicit fit before classification` — the error message already explains this clearly.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| content = File.read(file) | ||
| # Use file path as item key so search/related returns file paths | ||
| classifier.add_item(file, category.to_sym) { content } | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Remove comment # Use file path as item key so search/related returns file paths — obvious from the code using file as the item identifier.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 488:491
Comment:
**style:** Remove comment `# Use file path as item key so search/related returns file paths` — obvious from the code using `file` as the item identifier.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.- Refactor build_model_info into smaller methods to reduce complexity - Fix Minitest assertion style (refute/assert vs assert_equal) - Add JSON.pretty_generate to vendor RBS
- Fix file permissions on generated RBS files (644 instead of 600) - Refactor restore_state to reduce ABC complexity - Regenerate RBS files for logistic_regression
- Refactor nested conditionals to use guard clauses in command_train - Refactor nested conditionals to use guard clauses in command_classify - Remove redundant comments that restate what code already shows
Enable downloading and using pre-trained models from GitHub repositories similar to Homebrew taps. This makes it easy to share and reuse classifier models without training from scratch. New commands: - `models` - list available models in a registry - `pull` - download models to local cache - `push` - show instructions for contributing models New options: - `-r/--remote MODEL` - classify using a remote model directly - `-o/--output FILE` - specify download location for pull Features: - Default registry: cardmagic/classifier-models - Custom registries: @user/repo:model syntax - Local caching in ~/.classifier/models/ - Automatic fallback from main to master branch - Environment variables for configuration Closes #119
Steep was failing because the CLI uses stdlib classes (FileUtils, URI, Net::HTTP, JSON::ParserError) that weren't declared in the Steepfile. Also fixes nil safety issue with spec[1..] which returns String? in Ruby's type system - now uses fallback to empty string.
The 'models' command now supports --local to list models that have been downloaded to the local cache (~/.classifier/models/) instead of fetching the registry index from GitHub. Shows model name, type, and human-readable file size. Custom registry models display with their full @user/repo:name format.
Use idiomatic Ruby patterns to satisfy linter:
- Replace each+append with map for building arrays
- Use annotated format tokens (%<name>s) over positional ones
- Use %r{} for regex containing slashes
Summary
Implements #119 - a full-featured command-line interface for the classifier gem with model registry support.
Core CLI Features
classifier 'text to classify'train,info,fit(LR only),search(LSI),related(LSI)./classifier.jsonmodel file,bayesmodel typeclassifier infooutputs JSON for jq filteringModel Registry (like Homebrew taps)
Pre-trained models can be downloaded from public GitHub repositories:
github.com/cardmagic/classifier-models@user/repo:modelsyntax)New Commands:
classifier models- List models in default registryclassifier models @user/repo- List models in custom registryclassifier pull <model>- Download model from default registryclassifier pull @user/repo:model- Download model from custom registryclassifier pull @user/repo- Download ALL models from registryclassifier push- Shows instructions for contributing a modelNew Options:
-r, --remote MODEL- Use remote model directly (downloads and caches automatically)-o, --output FILE- Output path for pull commandCache System:
~/.classifier/models/~/.classifier/models/@user/repo/Developer Experience
classifierwith no model shows a getting started guide with copy-paste examplesclassifierwith a model but no input shows model info and usage hintsExample Usage
Environment Variables
CLASSIFIER_MODEL- Default model pathCLASSIFIER_TYPE- Default classifier typeCLASSIFIER_REGISTRY- Default registry (default:cardmagic/classifier-models)CLASSIFIER_CACHE- Cache directory (default:~/.classifier)Test plan
-rflag testedCloses #119