CAMEL-21540: Add PGVector component for PostgreSQL vector database by gnodet · Pull Request #22207 · apache/camel

gnodet · 2026-03-23T20:53:13Z

Summary

New camel-pgvector component for vector similarity search using the PostgreSQL pgvector extension. This provides a lightweight, SQL-native vector database option — no separate infrastructure needed beyond PostgreSQL.

Example Routes

Store embeddings with OpenAI:

- route:
    from:
      uri: direct:index
    steps:
      - setVariable:
          name: text
          simple: "${body}"
      - to:
          uri: openai:embeddings
          parameters:
            embeddingModel: nomic-embed-text
      - setHeader:
          name: CamelPgVectorAction
          constant: UPSERT
      - setHeader:
          name: CamelPgVectorTextContent
          simple: "${variable.text}"
      - to: pgvector:documents

Similarity search:

- route:
    from:
      uri: direct:search
    steps:
      - to:
          uri: openai:embeddings
          parameters:
            embeddingModel: nomic-embed-text
      - setHeader:
          name: CamelPgVectorAction
          constant: SIMILARITY_SEARCH
      - setHeader:
          name: CamelPgVectorQueryTopK
          constant: 5
      - to: pgvector:documents
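
For readers new to vector search, here is a rough in-memory sketch of what a top-K cosine similarity search computes conceptually. It is not the component's implementation (the component delegates the search to PostgreSQL and the pgvector extension); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: brute-force top-K search over in-memory vectors.
final class TopKSketch {

    // Cosine distance = 1 - cosine similarity; a smaller value means "more similar".
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors closest to the query, nearest first.
    static List<String> topK(Map<String, float[]> store, float[] query, int k) {
        List<Map.Entry<String, float[]>> entries = new ArrayList<>(store.entrySet());
        entries.sort(Comparator.comparingDouble(e -> cosineDistance(e.getValue(), query)));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }
}
```

In the route above, the value 5 in CamelPgVectorQueryTopK plays the role of `k`; a database-side search avoids scanning every row this way when an index is available.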

LangChain4j RAG pipeline:

from("direct:search")
    .to("langchain4j-embeddings:embed")
    .transformDataType(new DataType("pgvector:embeddings"))
    .setHeader(PgVectorHeaders.ACTION, constant(PgVectorAction.SIMILARITY_SEARCH))
    .to("pgvector:myCollection")
    .transformDataType(new DataType("pgvector:rag"));

Features

Actions: CREATE_TABLE, CREATE_INDEX (HNSW), DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH
Distance types: cosine (default), euclidean, inner product
Filtering: SQL WHERE clause on text_content and metadata columns
Auto-ID: UUID auto-generated when no record ID is provided
Upsert: ON CONFLICT DO UPDATE for idempotent writes
Data type transformers: pgvector:embeddings and pgvector:rag for LangChain4j integration
Cross-documentation: OpenAI and LangChain4j embeddings docs updated with pgvector examples
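
The Auto-ID and Upsert features can be illustrated with a minimal sketch. It assumes a table with columns id, text_content, metadata, and embedding; only text_content and metadata are named in this PR, so the other column names, the class name, and the exact SQL text are assumptions rather than the component's actual code.

```java
import java.util.UUID;

// Hypothetical sketch: the "id" and "embedding" column names and the SQL text
// are assumptions; only text_content and metadata are named in this PR.
final class UpsertSketch {

    // Builds a parameterized upsert; a second write with the same id updates the row.
    static String upsertSql(String table) {
        // The table name is concatenated, so accept plain identifiers only.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid table name: " + table);
        }
        return "INSERT INTO " + table + " (id, text_content, metadata, embedding) "
                + "VALUES (?, ?, ?, ?) "
                + "ON CONFLICT (id) DO UPDATE SET "
                + "text_content = EXCLUDED.text_content, "
                + "metadata = EXCLUDED.metadata, "
                + "embedding = EXCLUDED.embedding";
    }

    // Uses the caller's id when present, otherwise generates a random UUID.
    static String resolveId(String providedId) {
        return (providedId == null || providedId.isBlank())
                ? UUID.randomUUID().toString()
                : providedId;
    }
}
```

Because the statement is keyed on the id, replaying the same message with the same id leaves one row rather than two, which is what makes the write idempotent.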

Also included

Docs gulp race condition fix (ENOENT on target/ directories during parallel builds)

Test plan

PgVectorComponentIT (11 tests) — all actions, upsert update, auto-ID, filtered search
LangChain4jEmbeddingsComponentPgVectorTargetIT (4 tests) — end-to-end embeddings pipeline
Code formatted and imports sorted
All generated files committed (catalog, endpoint DSL, component DSL)

github-actions · 2026-03-23T20:53:46Z

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

First-time contributors require MANUAL approval for the GitHub Actions to run
You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

github-actions · 2026-03-24T00:02:16Z

🧪 CI tested the following changed modules:

bom/camel-bom
catalog/camel-allcomponents
catalog/camel-catalog
components/camel-ai
components/camel-ai/camel-langchain4j-embeddings
components/camel-ai/camel-openai
components/camel-ai/camel-pgvector
core/camel-main
docs
dsl/camel-componentdsl
dsl/camel-endpointdsl
dsl/camel-kamelet-main
parent
tooling/maven/camel-package-maven-plugin

ℹ️ Dependent modules were not tested because the total number of affected modules exceeded the threshold (50). Use the test-dependents label to force testing all dependents.

Build reactor — dependencies compiled but only changed modules were tested (28 modules)

Camel :: AI :: LangChain4j :: Embedding
Camel :: AI :: LangChain4j :: Embedding [jar]
Camel :: AI :: OpenAI
Camel :: AI :: OpenAI [jar]
Camel :: AI :: PGVector
Camel :: AI :: PGVector [jar]
Camel :: AI :: Parent
Camel :: AI :: Parent [pom]
Camel :: All Components Sync point
Camel :: All Components Sync point [pom]
Camel :: BOM
Camel :: BOM [pom]
Camel :: Catalog :: Camel Catalog
Camel :: Catalog :: Camel Catalog [jar]
Camel :: Component DSL
Camel :: Component DSL [jar]
Camel :: Docs
Camel :: Docs [pom]
Camel :: Endpoint DSL
Camel :: Endpoint DSL [jar]
Camel :: Kamelet Main
Camel :: Kamelet Main [jar]
Camel :: Main
Camel :: Main [jar]
Camel :: Maven Plugins :: Camel Maven Package
Camel :: Maven Plugins :: Camel Maven Package [maven-plugin]
Camel :: Parent
Camel :: Parent [pom]

apupier

What is the reason for a separate Postgres component specific to vectors? Why not include these capabilities directly in the existing Postgres component?

components/camel-ai/camel-pgvector/pom.xml

gnodet

Claude Code on behalf of Guillaume Nodet

Good question! There are several reasons for keeping pgvector as a separate component under camel-ai/ rather than adding it to the existing PostgreSQL components:

Consistency with other vector DB components. All vector database components live under camel-ai/: Milvus, Qdrant, Pinecone, Weaviate, Neo4j. pgvector serves the same purpose — vector similarity search for AI/RAG pipelines — it just happens to use PostgreSQL as its storage engine. Users looking for vector database components would naturally look in the AI group.
Different use case and audience. The existing PG components (camel-pgevent for LISTEN/NOTIFY, camel-pg-replication-slot for WAL replication) are PostgreSQL-specific integration utilities. pgvector targets a completely different use case: embeddings storage, similarity search, and RAG pipelines. These serve different personas.
Additional dependencies. pgvector brings in com.pgvector:pgvector (for vector type support) and LangChain4j data type transformers — AI-oriented dependencies that would be out of place in a general PostgreSQL component.
Separate lifecycle. Keeping it as its own component allows it to evolve independently without affecting the existing PostgreSQL components.

This is the same approach taken by other frameworks — for example, LangChain and LlamaIndex treat pgvector as a vector store alongside Pinecone/Qdrant/etc., not as a PostgreSQL utility.

When a new dependency is added to parent/pom.xml, the diff contains structural XML elements like <groupId>, <artifactId>, <version> which were incorrectly extracted as "changed properties" by detectChangedProperties. This caused the script to search for modules using ${artifactId} or ${groupId} as property references, which either matched nothing useful or caused spurious failures.

Fix: filter out known structural POM element names (groupId, artifactId, version, scope, type, etc.) so only actual property names like "pgvector-version" or "openai-java-version" are detected.

Fixes the CI script bug seen in PR #22207 where adding a new component to parent/pom.xml caused the dependency detection to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
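
The filtering idea described in this commit message can be sketched as follows. The real fix lives in a shell script; this Java port, its regular expression, and the entries in the structural list beyond the five named above are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the idea; not the actual CI script.
final class ChangedPropertiesSketch {

    // groupId, artifactId, version, scope, type come from the commit message;
    // classifier and optional are assumed additions.
    private static final Set<String> STRUCTURAL = Set.of(
            "groupId", "artifactId", "version", "scope", "type", "classifier", "optional");

    // Matches an added or removed diff line holding a single <name>value</name> element.
    private static final Pattern ELEMENT =
            Pattern.compile("[+-]\\s*<([A-Za-z0-9_.-]+)>[^<]*</\\1>\\s*");

    // Returns element names from the diff that look like real properties.
    static Set<String> detectChangedProperties(List<String> diffLines) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : diffLines) {
            Matcher m = ELEMENT.matcher(line);
            if (m.matches() && !STRUCTURAL.contains(m.group(1))) {
                result.add(m.group(1));
            }
        }
        return result;
    }
}
```

With this filter, a diff that adds a `<pgvector-version>` property together with a new `<dependency>` block yields only `pgvector-version`, so the later lookup never searches for `${groupId}` or `${artifactId}`.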

gnodet · 2026-03-25T13:36:53Z

The CI failures on the "mvn test parent pom dependencies changed" step are caused by a bug in the detect-test.sh script: it incorrectly extracts structural XML elements (<groupId>, <artifactId>, <version>) as "changed properties" when a new dependency is added to parent/pom.xml. This is fixed in #22247.

All actual tests pass on JDK 17 and 21. The "Fail if there are uncommitted changes" check also passes on all JDKs.

Claude Code on behalf of Guillaume Nodet

Croway · 2026-03-25T14:56:22Z

Hi @gnodet, could you check how this component integrates with the camel-openai component? In the OpenAI documentation https://camel.apache.org/components/4.18.x/openai-component.html#_postgresql_pgvector_recommended there is an example that uses openai + sql. I was wondering whether the pgvector component offers potential improvements there.

gnodet · 2026-03-25T15:12:32Z

Good point! I've updated the camel-openai docs in this PR to show how the pgvector component integrates with openai embeddings. The pgvector component simplifies the flow — no need for raw SQL, manual ::vector casting, or intermediate variables for the embedding. It also adds similarity search, HNSW indexing, and metadata filtering out of the box.

The raw SQL approach via camel-sql is still documented as a fallback for custom table schemas or complex queries (joins, CTEs, etc.).
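
For context on the manual ::vector casting mentioned above: in the raw SQL approach the embedding has to be passed as text in pgvector's bracketed, comma-separated form and then cast. A minimal sketch of such a conversion follows; the helper name is hypothetical and is not part of camel-sql or camel-pgvector.

```java
// Hypothetical helper: formats an embedding as pgvector's text form, e.g. [0.1,2.0].
final class VectorLiteralSketch {

    static String toVectorLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            // NaN and infinite values cannot be stored in a vector column.
            if (!Float.isFinite(embedding[i])) {
                throw new IllegalArgumentException("non-finite value at index " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }
}
```

The resulting string would be bound as a parameter and cast in SQL (for example `?::vector`), which is the step the pgvector component is described as removing.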

Claude Code on behalf of Guillaume Nodet

gnodet · 2026-03-25T23:30:11Z

CI status: all actual tests pass (JDK 21 fully green, JDK 17 and 25 cancelled by fail-fast). The only failing step is "mvn test parent pom dependencies changed" — the CI script bug that will be fixed by #22247.

Claude Code on behalf of Guillaume Nodet

- New camel-pgvector component under components/camel-ai/
- Supports CREATE_TABLE, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH actions
- Uses PostgreSQL pgvector extension via JDBC with com.pgvector library
- Supports cosine, euclidean, and inner product distance types
- LangChain4j data type transformers: pgvector:embeddings and pgvector:rag
- Integration tests with testcontainers pgvector image
- LangChain4j embeddings integration test with AllMiniLmL6V2 model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


- Add BOM, catalog, DSL, and documentation generated files
- Remove Spring Boot starter reference (no starter yet)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


…in and add pgvector entries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove pgvector from alphabetical nav listing (grouped under AI only)
- Fix trailing spaces in javadoc blank comment lines
- Update EventEndpointBuilderFactory to CamelEventEndpointBuilderFactory
- Update rest-openapi description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


- Add CREATE_INDEX action to create HNSW indexes for faster approximate nearest neighbor search, using the configured distance type
- Add CamelPgVectorFilter header to apply SQL WHERE clause filtering on similarity search results (e.g., filter by metadata or text content)
- Add integration tests for both features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
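
As a rough illustration of how a distance type could map to an HNSW index and a filtered search, the sketch below uses the operators and operator classes documented by the pgvector extension. The enum, the embedding and id column names, the index naming, and the SQL text are assumptions, not the component's code.

```java
// Hypothetical sketch; operators and operator classes are those of the pgvector extension.
enum DistanceSketch {
    COSINE("<=>", "vector_cosine_ops"),
    EUCLIDEAN("<->", "vector_l2_ops"),
    INNER_PRODUCT("<#>", "vector_ip_ops"); // <#> yields the negative inner product

    final String operator;
    final String opClass;

    DistanceSketch(String operator, String opClass) {
        this.operator = operator;
        this.opClass = opClass;
    }

    // DDL for an HNSW index matching this distance type (index name is made up).
    String createIndexSql(String table) {
        return "CREATE INDEX IF NOT EXISTS " + table + "_embedding_hnsw_idx ON " + table
                + " USING hnsw (embedding " + opClass + ")";
    }

    // Nearest-first search; the optional filter is appended verbatim as a WHERE clause.
    String searchSql(String table, String filter) {
        String where = (filter == null || filter.isBlank()) ? "" : " WHERE " + filter;
        return "SELECT id, text_content, metadata FROM " + table + where
                + " ORDER BY embedding " + operator + " ?::vector LIMIT ?";
    }
}
```

In this sketch the filter is concatenated as raw SQL, so it should only come from trusted route configuration and never from untrusted message content.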

…index action Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace ** glob with {*,*/*} for dsl source pattern to prevent scandir of target/ directories created during parallel builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The dsl.adoc lives at dsl/src/main/docs/ (depth 0), which is not matched by {*,*/*}. Add explicit pattern for it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
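
The depth behaviour described here can be checked with a small sketch using Java's built-in glob matcher. The patterns and paths below are hypothetical stand-ins for the gulpfile entries, and Java's glob syntax is close to, but not the same as, the one used by the Node glob library (for example, Java has no `!(target)` extglob).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Hypothetical check of which directory depths a brace pattern covers.
final class GlobDepthSketch {

    // Returns true when the relative path matches the glob under Java's glob rules.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Path.of(path));
    }

    public static void main(String[] args) {
        String braces = "dsl/{*,*/*}/src/main/docs/*.adoc";
        // One directory level below dsl/: covered by the first alternative.
        System.out.println(matches(braces, "dsl/camel-endpointdsl/src/main/docs/a.adoc"));
        // Two levels below dsl/: covered by the second alternative.
        System.out.println(matches(braces, "dsl/group/module/src/main/docs/b.adoc"));
        // Depth 0 (directly under dsl/): not covered, hence the explicit extra pattern.
        System.out.println(matches(braces, "dsl/src/main/docs/dsl.adoc"));
        System.out.println(matches("dsl/src/main/docs/*.adoc", "dsl/src/main/docs/dsl.adoc"));
    }
}
```

The brace pattern bounds how deep the scan goes, which is the point of replacing `**`; the cost is that depth 0 needs its own explicit entry.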

Replace the raw SQL example with the pgvector component for PostgreSQL vector database integration. Add a similarity search example showing the full index + search workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


- Add pgvector examples to langchain4j-embeddings docs (alongside Qdrant)
- Add OpenAI and LangChain4j integration examples to pgvector docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Make PgVectorHeaders final with private constructor
- Add null check for embedding header in data type transformer
- Add upsertUpdate test to verify ON CONFLICT DO UPDATE behavior
- Add upsertAutoId test to verify auto-generated UUID
- Fix test ordering and assertions after adding new tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add log4j2.properties for test logging (consistent with other vector DB components)
- Remove redundant camel-pgvector from openai "Alternative" list since it's already recommended above

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The {*,*/*} pattern causes the glob library to scan into target/ directories at the second level, which can race with concurrent test execution deleting target/surefire. Use !(target) extglob to exclude target directories from the scan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot added the components, catalog, docs, tooling, tooling-maven, core-build-and-dependencies, components-ai, core, and dsl labels Mar 23, 2026

gnodet marked this pull request as draft March 23, 2026 21:21

gnodet marked this pull request as ready for review March 24, 2026 05:53

apupier reviewed Mar 24, 2026


components/camel-ai/camel-pgvector/pom.xml (resolved)

gnodet commented Mar 24, 2026


oscerd approved these changes Mar 24, 2026


gnodet force-pushed the hungry-quark branch from dc85b19 to d595392 Compare March 24, 2026 22:23

gnodet mentioned this pull request Mar 25, 2026

chore(ci): rationalize CI into single workflow with unified comment #22247

Open

7 tasks

gnodet marked this pull request as draft March 25, 2026 14:53

gnodet and others added 5 commits March 26, 2026 23:35

CAMEL-21540: Add generated files for PGVector component

2c810f2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Add generated files and fix docs for PGVector component

2193ff7

- Add BOM, catalog, DSL, and documentation generated files
- Remove Spring Boot starter reference (no starter yet)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Fix docs symlinks and nav ordering for pgvector component

7a601c4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Fix generated files - restore missing components from ma…

d8e0eaa

…in and add pgvector entries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gnodet and others added 10 commits March 26, 2026 23:35

CAMEL-21540: Fix springEvent to use EventEndpointBuilderFactory

0f8ee2a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Update generated files for new filter header and create …

3d78b1d

…index action Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Fix docs gulp race condition with dsl target directories

3d5e6b4

Replace ** glob with {*,*/*} for dsl source pattern to prevent scandir of target/ directories created during parallel builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Include dsl/src/main/docs in gulpfile source pattern

b8ee61a

The dsl.adoc lives at dsl/src/main/docs/ (depth 0), which is not matched by {*,*/*}. Add explicit pattern for it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Mention raw SQL alternative for custom schemas

5d8f0d7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CAMEL-21540: Add cross-documentation for pgvector component

4b028ff

- Add pgvector examples to langchain4j-embeddings docs (alongside Qdrant)
- Add OpenAI and LangChain4j integration examples to pgvector docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gnodet force-pushed the hungry-quark branch from a62996a to 379ac92 Compare March 26, 2026 22:41

gnodet marked this pull request as ready for review March 26, 2026 22:58

gnodet marked this pull request as draft March 26, 2026 22:58

gnodet marked this pull request as ready for review March 26, 2026 23:50

gnodet mentioned this pull request Mar 27, 2026

Fix docs gulp race condition with dsl target directories #22292

Open

4 tasks

gnodet wants to merge 17 commits into main from hungry-quark


4 participants
