CAMEL-21540: Add PGVector component for PostgreSQL vector database#22207
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
🧪 CI tested the following changed modules:
Build reactor — dependencies compiled but only changed modules were tested (28 modules)
apupier left a comment:
What is the reason for a separate Postgres component specifically for vectors? Why not include these capabilities directly in the existing Postgres component?
Claude Code on behalf of Guillaume Nodet
Good question! There are several reasons for keeping pgvector as a separate component under camel-ai/ rather than adding it to the existing PostgreSQL components:
- Consistency with other vector DB components. All vector database components live under camel-ai/: Milvus, Qdrant, Pinecone, Weaviate, Neo4j. pgvector serves the same purpose (vector similarity search for AI/RAG pipelines); it just happens to use PostgreSQL as its storage engine. Users looking for vector database components would naturally look in the AI group.
- Different use case and audience. The existing PG components (camel-pgevent for LISTEN/NOTIFY, camel-pg-replication-slot for WAL replication) are PostgreSQL-specific integration utilities. pgvector targets a completely different use case: embedding storage, similarity search, and RAG pipelines. These serve different personas.
- Additional dependencies. pgvector brings in com.pgvector:pgvector (for vector type support) and LangChain4j data type transformers, AI-oriented dependencies that would be out of place in a general PostgreSQL component.
- Separate lifecycle. Keeping it as its own component allows it to evolve independently without affecting the existing PostgreSQL components.
This is the same approach taken by other frameworks — for example, LangChain and LlamaIndex treat pgvector as a vector store alongside Pinecone/Qdrant/etc., not as a PostgreSQL utility.
When a new dependency is added to parent/pom.xml, the diff contains
structural XML elements like <groupId>, <artifactId>, <version> which
were incorrectly extracted as "changed properties" by detectChangedProperties.
This caused the script to search for modules using ${artifactId} or
${groupId} as property references, which either matched nothing useful
or caused spurious failures.
Fix: filter out known structural POM element names (groupId, artifactId,
version, scope, type, etc.) so only actual property names like
"pgvector-version" or "openai-java-version" are detected.
Fixes the CI script bug seen in PR #22207 where adding a new component
to parent/pom.xml caused the dependency detection to fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
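The fix described in this commit message can be sketched in Java to make the filtering rule concrete. This is only an illustration: the real logic lives in the CI script's detectChangedProperties, and the class name ChangedPropertyDetector, the regex, and the exact deny-list of structural element names below are assumptions, not the script's actual code.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the detectChangedProperties fix: collect element names
// from added/removed single-line elements in a parent/pom.xml diff, but skip
// structural POM elements so only real property names remain.
final class ChangedPropertyDetector {

    // Assumed deny-list, based on "groupId, artifactId, version, scope, type, etc."
    private static final Set<String> STRUCTURAL_ELEMENTS = Set.of(
            "groupId", "artifactId", "version", "scope", "type",
            "classifier", "optional", "systemPath", "packaging");

    // A '+' or '-' diff line holding one element, e.g. "+  <pgvector-version>x</pgvector-version>".
    // The back-reference \1 requires the closing tag to match the opening one.
    private static final Pattern CHANGED_ELEMENT =
            Pattern.compile("^[+-]\\s*<([A-Za-z0-9._-]+)>[^<]*</\\1>");

    private ChangedPropertyDetector() {
    }

    static Set<String> detectChangedProperties(String diff) {
        Set<String> properties = new LinkedHashSet<>();
        for (String line : diff.split("\n")) {
            Matcher m = CHANGED_ELEMENT.matcher(line);
            // Keep the element name only if it is not a structural POM element.
            if (m.find() && !STRUCTURAL_ELEMENTS.contains(m.group(1))) {
                properties.add(m.group(1));
            }
        }
        return properties;
    }
}
```

With this filter, a diff that adds a dependency block together with a pgvector-version property reports only pgvector-version, rather than also reporting groupId, artifactId and version as if they were properties.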
The CI failures on the "mvn test parent pom dependencies changed" step are caused by a bug in the … All actual tests pass on JDK 17 and 21. The "Fail if there are uncommitted changes" check also passes on all JDKs.
Claude Code on behalf of Guillaume Nodet
Hi @gnodet, could you check how this component integrates with the camel-openai component? The OpenAI documentation (https://camel.apache.org/components/4.18.x/openai-component.html#_postgresql_pgvector_recommended) has an example that uses OpenAI + SQL, and I was wondering whether the pgvector component could improve on it.
Good point! I've updated the … The raw SQL approach via …
Claude Code on behalf of Guillaume Nodet
CI status: all actual tests pass (JDK 21 fully green, JDK 17 and 25 cancelled by fail-fast). The only failing step is "mvn test parent pom dependencies changed", caused by the CI script bug that will be fixed by #22247.
Claude Code on behalf of Guillaume Nodet
- New camel-pgvector component under components/camel-ai/
- Supports CREATE_TABLE, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH actions
- Uses the PostgreSQL pgvector extension via JDBC with the com.pgvector library
- Supports cosine, euclidean, and inner product distance types
- LangChain4j data type transformers: pgvector:embeddings and pgvector:rag
- Integration tests with the Testcontainers pgvector image
- LangChain4j embeddings integration test with the AllMiniLmL6V2 model
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
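In pgvector, the three supported distance types correspond to dedicated SQL operators: <=> (cosine distance), <-> (Euclidean/L2 distance) and <#> (negative inner product). The sketch below mirrors those definitions in plain Java so the ordering semantics are easy to check; the DistanceType enum and its method names are illustrative and are not camel-pgvector's actual classes.

```java
// Illustrative model of pgvector's distance operators (not camel-pgvector code).
enum DistanceType {
    COSINE("<=>"),
    EUCLIDEAN("<->"),
    INNER_PRODUCT("<#>");

    private final String operator;

    DistanceType(String operator) {
        this.operator = operator;
    }

    /** SQL operator pgvector uses for this distance type. */
    String operator() {
        return operator;
    }

    /**
     * Computes the value pgvector returns for the operator. For every type a
     * smaller result means "closer", which is why the inner product is negated.
     */
    double distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0, sumSq = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
            double d = (double) a[i] - b[i];
            sumSq += d * d;
        }
        switch (this) {
            case COSINE:
                // Cosine distance = 1 - cosine similarity.
                return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
            case EUCLIDEAN:
                return Math.sqrt(sumSq);
            case INNER_PRODUCT:
                return -dot;
            default:
                throw new IllegalStateException("unknown distance type " + this);
        }
    }
}
```

Because all three operators return "lower is more similar", a similarity search can always sort ascending on the operator expression; pgvector negates the inner product for exactly this reason, since PostgreSQL index scans on operators only support ascending order.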
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BOM, catalog, DSL, and documentation generated files
- Remove Spring Boot starter reference (no starter yet)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…in and add pgvector entries
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove pgvector from alphabetical nav listing (grouped under AI only)
- Fix trailing spaces in javadoc blank comment lines
- Update EventEndpointBuilderFactory to CamelEventEndpointBuilderFactory
- Update rest-openapi description
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CREATE_INDEX action to create HNSW indexes for faster approximate nearest neighbor search, using the configured distance type
- Add CamelPgVectorFilter header to apply SQL WHERE clause filtering on similarity search results (e.g., filter by metadata or text content)
- Add integration tests for both features
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
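The two features in this commit can be illustrated by the SQL they would plausibly generate. pgvector's HNSW index takes an operator class matching the distance type (vector_cosine_ops, vector_l2_ops, vector_ip_ops), and a filtered search adds a WHERE clause before ordering by the distance operator. The builder below is a hypothetical sketch: the class PgVectorSql, the index name, and the column names id, content, metadata and embedding are assumptions, not the component's actual schema or SQL.

```java
// Hypothetical SQL builders illustrating CREATE_INDEX (HNSW) and a filtered
// similarity search; table/column names are assumptions.
final class PgVectorSql {

    private PgVectorSql() {
    }

    /** pgvector operator class an HNSW index needs for each distance type. */
    static String opsClass(String distanceType) {
        switch (distanceType) {
            case "COSINE":
                return "vector_cosine_ops";
            case "EUCLIDEAN":
                return "vector_l2_ops";
            case "INNER_PRODUCT":
                return "vector_ip_ops";
            default:
                throw new IllegalArgumentException("unsupported distance type: " + distanceType);
        }
    }

    /** pgvector distance operator for each distance type. */
    static String operator(String distanceType) {
        switch (distanceType) {
            case "COSINE":
                return "<=>";
            case "EUCLIDEAN":
                return "<->";
            case "INNER_PRODUCT":
                return "<#>";
            default:
                throw new IllegalArgumentException("unsupported distance type: " + distanceType);
        }
    }

    static String createHnswIndex(String table, String distanceType) {
        return "CREATE INDEX IF NOT EXISTS " + table + "_embedding_idx ON " + table
                + " USING hnsw (embedding " + opsClass(distanceType) + ")";
    }

    /** The filter is spliced in verbatim, so it must come from trusted route code. */
    static String similaritySearch(String table, String distanceType, String filter) {
        String op = operator(distanceType);
        StringBuilder sql = new StringBuilder()
                .append("SELECT id, content, metadata, embedding ").append(op)
                .append(" ?::vector AS distance FROM ").append(table);
        if (filter != null && !filter.isBlank()) {
            sql.append(" WHERE ").append(filter);
        }
        // Ordering by the operator expression itself is what lets PostgreSQL use the HNSW index.
        sql.append(" ORDER BY embedding ").append(op).append(" ?::vector LIMIT ?");
        return sql.toString();
    }
}
```

The index operator class must match the operator used in ORDER BY; an index built with vector_l2_ops will not accelerate a cosine (<=>) search.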
…index action
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ** glob with {*,*/*} for dsl source pattern to prevent
scandir of target/ directories created during parallel builds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dsl.adoc lives at dsl/src/main/docs/ (depth 0), which is not
matched by {*,*/*}. Add explicit pattern for it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
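The depth problem described in these two commits can be reproduced with Java's PathMatcher, whose glob dialect also supports {a,b} groups where * stays within one path segment. This is an analogy only: the docs build uses a JavaScript glob library whose semantics may differ in details, and the pattern dsl/{*,*/*}/src/main/docs/*.adoc and the class DocsGlobDemo are hypothetical. Java globs do not allow nested groups, so the extra depth-0 pattern is expressed as a second matcher.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Illustration only: hypothetical patterns showing why {*,*/*} misses depth 0.
final class DocsGlobDemo {

    // One or two directory levels below dsl/ (the {*,*/*} pattern).
    static final PathMatcher NESTED =
            FileSystems.getDefault().getPathMatcher("glob:dsl/{*,*/*}/src/main/docs/*.adoc");

    // Explicit extra pattern for docs that live directly under dsl/ (depth 0).
    static final PathMatcher TOP_LEVEL =
            FileSystems.getDefault().getPathMatcher("glob:dsl/src/main/docs/*.adoc");

    private DocsGlobDemo() {
    }

    /** True if the path matches the nested pattern, or the top-level one when enabled. */
    static boolean isDoc(String path, boolean withTopLevelPattern) {
        Path p = Path.of(path);
        return NESTED.matches(p) || (withTopLevelPattern && TOP_LEVEL.matches(p));
    }
}
```

A path such as dsl/src/main/docs/dsl.adoc fails the nested pattern because {*,*/*} always requires at least one directory segment between dsl/ and src/, which is why a separate explicit pattern is needed.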
Replace the raw SQL example with the pgvector component for PostgreSQL vector database integration. Add a similarity search example showing the full index + search workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pgvector examples to langchain4j-embeddings docs (alongside Qdrant)
- Add OpenAI and LangChain4j integration examples to pgvector docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make PgVectorHeaders final with private constructor
- Add null check for embedding header in data type transformer
- Add upsertUpdate test to verify ON CONFLICT DO UPDATE behavior
- Add upsertAutoId test to verify auto-generated UUID
- Fix test ordering and assertions after adding new tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
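The behaviors these tests cover (update on conflicting id, UUID generation when no id is given, rejecting a missing embedding) can be sketched as follows. This is a hypothetical illustration: UpsertSketch, its method names, and the id/content/metadata/embedding columns are assumptions. The PR states that the component uses the com.pgvector library for the vector type, so the hand-written text literal here only shows pgvector's [x,y,...] input format.

```java
import java.util.UUID;

// Hypothetical sketch of an upsert like the one exercised by upsertUpdate /
// upsertAutoId; table and column names are assumptions.
final class UpsertSketch {

    private UpsertSketch() {
    }

    /** INSERT that overwrites the existing row when the id already exists. */
    static String upsertSql(String table) {
        return "INSERT INTO " + table + " (id, content, metadata, embedding) VALUES (?, ?, ?, ?::vector)"
                + " ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content,"
                + " metadata = EXCLUDED.metadata, embedding = EXCLUDED.embedding";
    }

    /** Uses the caller-supplied id, or generates a random UUID when none is given. */
    static String resolveId(String requestedId) {
        return (requestedId == null || requestedId.isBlank()) ? UUID.randomUUID().toString() : requestedId;
    }

    /** Formats an embedding in pgvector's text form, e.g. [0.5,1.0]; rejects null like the transformer fix. */
    static String toVectorLiteral(float[] embedding) {
        if (embedding == null) {
            throw new IllegalArgumentException("embedding must not be null");
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }
}
```

EXCLUDED refers to the row that was proposed for insertion, so a second upsert with the same id replaces the text, metadata and embedding instead of failing on the primary key.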
- Add log4j2.properties for test logging (consistent with other vector DB components)
- Remove redundant camel-pgvector from openai "Alternative" list since it's already recommended above
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The {*,*/*} pattern causes the glob library to scan into target/
directories at the second level, which can race with concurrent test
execution deleting target/surefire. Use !(target) extglob to exclude
target directories from the scan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
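The idea behind the !(target) exclusion is to prune target/ before the scanner lists it, rather than filtering results afterwards. A Java analogue, assuming a hypothetical DocScanner that collects .adoc files, uses Files.walkFileTree and returns SKIP_SUBTREE for any directory named target; this is not the docs build's JavaScript code.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Java analogue of the !(target) exclusion: target/ directories are pruned
// before descending, so files being deleted there by concurrent builds are never listed.
final class DocScanner {

    private DocScanner() {
    }

    static List<Path> findAdocFiles(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Skip the whole subtree: the directory is never opened or listed.
                return "target".equals(String.valueOf(dir.getFileName()))
                        ? FileVisitResult.SKIP_SUBTREE
                        : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.toString().endsWith(".adoc")) {
                    result.add(root.relativize(file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

Filtering the output of Files.walk would still descend into target/ and could hit a directory that a concurrent test run is deleting; returning SKIP_SUBTREE avoids touching it at all.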
Summary
New camel-pgvector component for vector similarity search using the PostgreSQL pgvector extension. This provides a lightweight, SQL-native vector database option: no separate infrastructure is needed beyond PostgreSQL.
Example Routes
Store embeddings with OpenAI:
Similarity search:
LangChain4j RAG pipeline:
Features
- CREATE_TABLE, CREATE_INDEX (HNSW), DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH
- pgvector:embeddings and pgvector:rag data type transformers for LangChain4j integration
Also included
- … target/ directories during parallel builds)
Test plan
- PgVectorComponentIT (11 tests): all actions, upsert update, auto-ID, filtered search
- LangChain4jEmbeddingsComponentPgVectorTargetIT (4 tests): end-to-end embeddings pipeline