feat: add full text search and limit flags to article list command#25
feat: add full text search and limit flags to article list command#25map9959 wants to merge 3 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 45 minutes and 25 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughAdds Description and Content fields to articles, introduces an exported ArticleFilter and MaxListLimit, implements FTS5-backed search and an escape helper, refactors ListArticles/GetArticles/MarkAllArticlesRead to use ArticleFilter, updates CLI and scanner wiring, and adjusts tests for the new APIs. ChangesArticleFilter + FTS5 search and propagation
Sequence Diagram(s)sequenceDiagram
participant CLI as articles/read-all command
participant Controller
participant Storage as Database
participant FTS as articles_fts
CLI->>Controller: GetArticles(ctx, db, ArticleFilter{Search: "term"})
Controller->>Storage: ListArticles(ctx, filter)
alt filter.Search is set
Storage->>FTS: SELECT ... FROM articles_fts WHERE MATCH <escaped query>
FTS-->>Storage: ranked IDs
Storage->>Storage: Join articles, apply unread/blog/category/date, pagination
else no search
Storage->>Storage: SELECT ... FROM articles WHERE filters
end
Storage-->>Controller: articles
Controller-->>CLI: articles (+ blogNames map)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/controller/controller.go (1)
148-163:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMark-all can partially apply updates on failure.
Line 156–160 updates articles one-by-one and returns immediately on the first error, so some articles may be marked read while later ones are not. This makes
read-allnon-atomic and hard to recover from.Suggested direction
func MarkAllArticlesRead(ctx context.Context, db *storage.Database, filter storage.ArticleFilter) ([]model.Article, error) { - filter.UnreadOnly = true - - articles, err := db.ListArticles(ctx, filter) - if err != nil { - return nil, err - } - - for _, article := range articles { - _, err := db.MarkArticleRead(ctx, article.ID) - if err != nil { - return nil, err - } - } - - return articles, nil + filter.UnreadOnly = true + // Prefer a single storage operation that runs in one DB transaction: + // 1) resolve matching article IDs by filter + // 2) update them in bulk + // 3) return affected articles (or IDs/count) + return db.MarkAllArticlesReadByFilter(ctx, filter) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/controller.go` around lines 148 - 163, MarkAllArticlesRead currently loops over articles and returns on the first error, causing partial application; change it to perform the updates atomically by either using a batch/transaction API on storage.Database (e.g., start a transaction, call db.MarkArticleRead for each article inside it, and commit/rollback as a unit) or, if no transaction support exists, track the IDs of successfully-updated articles and on any failure run a compensating loop to revert them (e.g., call a MarkArticleUnread or equivalent) so the operation is all-or-nothing; update the implementation around MarkAllArticlesRead, db.ListArticles and db.MarkArticleRead accordingly.internal/scanner/scanner_test.go (1)
133-135:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftBlocking: concurrent scan path is still failing with
SQLITE_BUSYin CI.
TestScanAllBlogsConcurrentis failing withscan Test-a: database is locked (5)and a follow-up rollback state error. Please fix the concurrent write/transaction handling in the scan path before merge (e.g., lock strategy / busy-timeout / rollback lifecycle).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scanner/scanner_test.go` around lines 133 - 135, TestScanAllBlogsConcurrent is failing due to SQLITE_BUSY during concurrent writes; update the ScanAllBlogs path to handle SQLite concurrency robustly by (1) applying a busy-timeout or driver-level busy handler when opening or receiving *sql.DB so SQLite will wait instead of returning SQLITE_BUSY, (2) use an appropriate transaction mode in ScanAllBlogs (e.g., BEGIN IMMEDIATE or explicit locking) to avoid lock escalation during concurrent readers/writers, (3) add retry-with-backoff logic around statements that can return SQLITE_BUSY, and (4) ensure proper rollback lifecycle by always deferring tx.Rollback() before any tx.Commit() in ScanAllBlogs and any helper methods so partially started transactions are cleaned up on errors.
🧹 Nitpick comments (2)
internal/cli/commands.go (1)
247-259: ⚡ Quick winExtract shared blog-name resolution into one helper.
The same lookup/error-handling block appears twice. Centralizing it will reduce drift and keep command behavior consistent.
Refactor sketch
+func resolveBlogID(ctx context.Context, db *storage.Database, blogName string) (*int64, error) { + if blogName == "" { + return nil, nil + } + blog, err := db.GetBlogByName(ctx, blogName) + if err != nil { + return nil, err + } + if blog == nil { + return nil, fmt.Errorf("blog '%s' not found", blogName) + } + return &blog.ID, nil +}Then use it in both
newArticlesCommandandnewReadAllCommand.Also applies to: 342-353
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/commands.go` around lines 247 - 259, Extract the repeated blog-name resolution into a helper (e.g., resolveBlogByFlag or getBlogFromFlag) that accepts context and the blog name, calls db.GetBlogByName(ctx, blogName), returns (*models.Blog, error) or a typed error when not found, and logs/marks errors exactly as the duplicated blocks do; then replace the duplicated code in newArticlesCommand and newReadAllCommand with a call to this helper and set filter.BlogID = &blog.ID when a blog is returned. Ensure the helper reuses db.GetBlogByName and the same printError/markError behavior so command behavior remains identical.internal/storage/database_test.go (1)
681-686: ⚡ Quick winAdd at least one body-text-only search assertion.
Current search coverage verifies title matching but not body-text matching. Add an article where only
BodyTextcontains the token and assertSearchreturns it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/database_test.go` around lines 681 - 686, Add a test case that ensures Search matches on BodyText only: create an article via db.AddArticle with BlogID set to blog.ID and BodyText containing a unique token not present in Title/URL (use model.Article's BodyText field), require no error, then call db.Search(ctx, token, ...) and assert the returned results include that article (by ID or URL/Title) so Search is validated against body-text-only matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/cli/commands.go`:
- Around line 238-245: Validate the CLI --limit value before constructing the
storage.ArticleFilter: read the int via viper.GetInt("limit"), check that it's >
0 and return a user-facing error (or print and exit) when <= 0, then pass the
validated value into the Limit field of the filter instead of using viper.GetInt
inline; apply the same validation for the other occurrence that also uses
viper.GetInt("limit") so both places using storage.ArticleFilter.Limit reject
non-positive values.
In `@internal/storage/database_test.go`:
- Around line 716-726: The test isn't actually exercising unread filtering
because it passes UnreadOnly: false; update the subtest so it first locates an
article that matches the Search term "golang" (e.g., iterate results from
db.ListArticles(ctx, ArticleFilter{Search: "golang"}) and pick one), call
db.MarkArticleRead(ctx, matching.ID) to mark it read, then call
db.ListArticles(ctx, ArticleFilter{Search: "golang", UnreadOnly: true}) and
assert the previously-read article is excluded (e.g., require.Len(t, articles,
0) or assert the IDs don't include matching.ID). Use the existing functions
ListArticles, ArticleFilter, and MarkArticleRead to implement this.
In `@internal/storage/database.go`:
- Around line 464-468: The code is implicitly forcing a default 100-row cap for
searches by treating limit<=0 as 100; change the logic so only an explicit
positive limit is capped: replace the current check with something like "if
limit > 100 { limit = 100 }" and only call query.Limit(uint64(limit)) when limit
> 0 (so limit<=0 means no limit); update the block around filter.Limit and
query.Limit(uint64(limit)) in database.go (the variables/functions: filter.Limit
and query.Limit) to implement this consistent behavior across search and
non-search ListArticles paths.
- Line 445: In searchArticles, stop pre-escaping apostrophes by removing
escapeFTS5Query from the FTS match parameter and pass filter.Search directly
into the query (i.e., change the Where call to use filter.Search rather than
escapeFTS5Query(filter.Search)); also remove the now-unused escapeFTS5Query
symbol and the unused strings import so there are no leftover references. Ensure
any FTS5-specific quoting is done with FTS5 double-quoted phrases where needed
rather than SQL-style single-quote escaping.
---
Outside diff comments:
In `@internal/controller/controller.go`:
- Around line 148-163: MarkAllArticlesRead currently loops over articles and
returns on the first error, causing partial application; change it to perform
the updates atomically by either using a batch/transaction API on
storage.Database (e.g., start a transaction, call db.MarkArticleRead for each
article inside it, and commit/rollback as a unit) or, if no transaction support
exists, track the IDs of successfully-updated articles and on any failure run a
compensating loop to revert them (e.g., call a MarkArticleUnread or equivalent)
so the operation is all-or-nothing; update the implementation around
MarkAllArticlesRead, db.ListArticles and db.MarkArticleRead accordingly.
In `@internal/scanner/scanner_test.go`:
- Around line 133-135: TestScanAllBlogsConcurrent is failing due to SQLITE_BUSY
during concurrent writes; update the ScanAllBlogs path to handle SQLite
concurrency robustly by (1) applying a busy-timeout or driver-level busy handler
when opening or receiving *sql.DB so SQLite will wait instead of returning
SQLITE_BUSY, (2) use an appropriate transaction mode in ScanAllBlogs (e.g.,
BEGIN IMMEDIATE or explicit locking) to avoid lock escalation during concurrent
readers/writers, (3) add retry-with-backoff logic around statements that can
return SQLITE_BUSY, and (4) ensure proper rollback lifecycle by always deferring
tx.Rollback() before any tx.Commit() in ScanAllBlogs and any helper methods so
partially started transactions are cleaned up on errors.
---
Nitpick comments:
In `@internal/cli/commands.go`:
- Around line 247-259: Extract the repeated blog-name resolution into a helper
(e.g., resolveBlogByFlag or getBlogFromFlag) that accepts context and the blog
name, calls db.GetBlogByName(ctx, blogName), returns (*models.Blog, error) or a
typed error when not found, and logs/marks errors exactly as the duplicated
blocks do; then replace the duplicated code in newArticlesCommand and
newReadAllCommand with a call to this helper and set filter.BlogID = &blog.ID
when a blog is returned. Ensure the helper reuses db.GetBlogByName and the same
printError/markError behavior so command behavior remains identical.
In `@internal/storage/database_test.go`:
- Around line 681-686: Add a test case that ensures Search matches on BodyText
only: create an article via db.AddArticle with BlogID set to blog.ID and
BodyText containing a unique token not present in Title/URL (use model.Article's
BodyText field), require no error, then call db.Search(ctx, token, ...) and
assert the returned results include that article (by ID or URL/Title) so Search
is validated against body-text-only matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1c7f1276-de80-4d58-99a4-67f229d39a60
📒 Files selected for processing (9)
internal/cli/commands.gointernal/controller/controller.gointernal/controller/controller_test.gointernal/model/model.gointernal/scanner/scanner_test.gointernal/storage/database.gointernal/storage/database_test.gointernal/storage/migrations/000004_add_search.down.sqlinternal/storage/migrations/000004_add_search.up.sql
- Limit sql.DB to a single connection so concurrent workers do not fight over the SQLite file - Share the same *storage.Database across all ScanAllBlogs workers instead of opening per-worker connections - Validate CLI --limit value (reject negative) before building filter - Extract blog-name resolution into resolveBlogID helper, removing duplicated GetBlogByName blocks in articles and read-all commands - Fix searchArticles limit cap: limit <= 0 now means unlimited instead of forcing 100; both ListArticles paths cap at 100 - Rewrite escapeFTS5Query to double-quote words containing single quotes / double quotes for valid FTS5 syntax, remove unused import - Make MarkAllArticlesRead atomic via single batch UPDATE and new Database.MarkArticlesRead method - Fix unread+search test to actually exercise UnreadOnly: true with proper pre-conditions - Add test verifying FTS5 matches on body text content - Add test verifying apostrophe queries work without SQL-style escaping
6d44239 to
6f1e2d1
Compare
|
addressed all coderabbit changes |
JulienTant
left a comment
There was a problem hiding this comment.
Thanks for your contribution! Highly appreciated
I left a few feedbacks, I'm happy to take over with the fixes if you're busy!
| if filter.Limit > 0 { | ||
| l := uint64(filter.Limit) | ||
| if l > 100 { | ||
| l = 100 |
There was a problem hiding this comment.
I would prefer if we return an error, rather than forcing a default: if an agent ask for 500 items and we return 100, it might think that he got the whole list and we only have 500 entries.
Also if we can move the check at the cli/entrypoint level that'd be lovely!
| if filter.Limit > 0 { | ||
| l := uint64(filter.Limit) | ||
| if l > 100 { | ||
| l = 100 |
There was a problem hiding this comment.
re: error rather than forced value
| DiscoveredDate *time.Time | ||
| IsRead bool | ||
| Categories []string | ||
| BodyText string |
There was a problem hiding this comment.
Unless I'm missing something somewhere, we are never assigning a value to it?!
| DELETE FROM articles_fts WHERE rowid = old.id; | ||
| END; | ||
|
|
||
| CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN |
There was a problem hiding this comment.
The current articles_ad/articles_au triggers use plain DELETE FROM articles_fts / UPDATE articles_fts SET, which is the pattern for regular (non-external) content tables. This path is exercised in production: RemoveBlog deletes a blog's articles, firing articles_ad, so after removing a blog, searches can return phantom rowids or hit "database disk image is malformed."
We can use something documented in https://www.sqlite.org/fts5.html (4.4.3)
We can replace the delete/update triggers with the 'delete'-command form.
ALTER TABLE articles ADD COLUMN body_text TEXT;
CREATE VIRTUAL TABLE articles_fts USING fts5(title, body_text, content='articles', content_rowid='rowid');
INSERT INTO articles_fts(rowid, title, body_text)
SELECT id, title, COALESCE(body_text, '') FROM articles;
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
INSERT INTO articles_fts(rowid, title, body_text)
VALUES (new.id, new.title, COALESCE(new.body_text, ''));
END;
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
INSERT INTO articles_fts(articles_fts, rowid, title, body_text)
VALUES ('delete', old.id, old.title, COALESCE(old.body_text, ''));
END;
CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
INSERT INTO articles_fts(articles_fts, rowid, title, body_text)
VALUES ('delete', old.id, old.title, COALESCE(old.body_text, ''));
INSERT INTO articles_fts(rowid, title, body_text)
VALUES (new.id, new.title, COALESCE(new.body_text, ''));
END;|
Thank you for the review @JulienTant! I made a follow-up commit, I know it rewrites existing migrations but this branch hasn't been merged yet so the only casualty should be me hahaha |
6c67e16 to
cd23942
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/database.go (1)
448-453:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
searchArticlesdepends onarticles_fts, which is never created.This JOIN is the consumer of the table that the
000004_add_search.up.sqlmigration fails to create; once that migration is fixed (and indexesdescription/content), this query and the FTS tests pass. Flagging here only as the downstream effect — the fix belongs in the migration.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/database.go` around lines 448 - 453, The searchArticles function relies on an FTS5 virtual table named articles_fts (used in the JOIN and OrderBy "f.rank") but that table is not being created by the migrations; update the migration 000004_add_search.up.sql to create the articles_fts FTS5 virtual table (including description and content in its column list), ensure it is populated/kept in sync with articles (via content= or explicit triggers as appropriate) and create any needed indexes so description and content are searchable; after updating the migration confirm the articles_fts name and column names match what searchArticles expects and that escapeFTS5Query can be used against the created FTS table.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/migrations/000004_add_search.down.sql`:
- Around line 1-2: The rollback SQL in 000004_add_search.down.sql is dropping
content/description but the up migration never creates the full-text-search
artifacts and code (articles_fts virtual table and its triggers) required by
internal/storage/database.go; update the migrations so the up migration
(000004_add_search.up.sql) creates a virtual table named articles_fts (FTS5 or
FTS4 as used elsewhere) and the insert/update/delete triggers that keep
articles_fts in sync with articles, and update the down migration to first DROP
TRIGGER for those triggers and DROP TABLE articles_fts before dropping the
articles.content and articles.description columns; reference the migration
scripts (000004_add_search.up.sql / 000004_add_search.down.sql), the virtual
table name articles_fts, and the triggers used to sync articles -> articles_fts
when making these changes.
In `@internal/storage/migrations/000004_add_search.up.sql`:
- Around line 1-2: Migration only adds articles.description/content but never
creates the FTS5 virtual table or sync triggers, so add creation of the virtual
table articles_fts (USING fts5 with columns title, description, content,
content='articles', content_rowid='id'), backfill existing rows into
articles_fts (INSERT SELECT id, title, COALESCE(description,''),
COALESCE(content,'') FROM articles), and create AFTER INSERT/DELETE/UPDATE
triggers (e.g., articles_ai, articles_ad, articles_au) that insert/delete/update
rows in articles_fts to keep it in sync with the articles table.
---
Outside diff comments:
In `@internal/storage/database.go`:
- Around line 448-453: The searchArticles function relies on an FTS5 virtual
table named articles_fts (used in the JOIN and OrderBy "f.rank") but that table
is not being created by the migrations; update the migration
000004_add_search.up.sql to create the articles_fts FTS5 virtual table
(including description and content in its column list), ensure it is
populated/kept in sync with articles (via content= or explicit triggers as
appropriate) and create any needed indexes so description and content are
searchable; after updating the migration confirm the articles_fts name and
column names match what searchArticles expects and that escapeFTS5Query can be
used against the created FTS table.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 670c8b57-5a03-4d81-b4d9-a249a44c6703
📒 Files selected for processing (9)
internal/cli/commands.gointernal/controller/controller.gointernal/model/model.gointernal/rss/rss.gointernal/scanner/scanner.gointernal/storage/database.gointernal/storage/database_test.gointernal/storage/migrations/000004_add_search.down.sqlinternal/storage/migrations/000004_add_search.up.sql
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/controller/controller.go
- internal/cli/commands.go
cd23942 to
368923d
Compare
- add Description and Content fields to match gofeed's data model - make article limit throw an error, move it to CLI layer - rewrite migration to implement FTS deletion properly
368923d to
dd8851d
Compare
Summary
Adds full text search using FTS5, native to SQLite, and a limit flag to the article list command. Also consolidates filters in the code for further extensibility. Now that blogwatcher-cli has full text search and date filters, it basically serves as a memory system for remote feeds; very useful for summarization of past collected data.
Also includes a fix for SQLITE_BUSY in the concurrent checks test.
Test plan
golangci-lint runpassesgotestsum -- ./...passesAI Disclaimer
This was generated using Hermes Agent so it could update its own copy locally, then regenerated using OpenCode so I could make this PR without giving my agent an SSH key to my Github account.
Summary by CodeRabbit
New Features
Bug Fixes / Improvements