1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **GFQL policy / Cypher compiler hooks (#1454)**: Added experimental exact-key `precompile` and `postcompile` policy hooks for local Cypher string-query compilation. `postcompile` reports success or failure using the existing policy `success`, `error`, and `error_type` fields plus a stable `CompileSummary` with scalar compiler metadata.

### Changed
- **GFQL public schema declarations (#1337)**: Added stable `graphistry.schema` exports for `NodeType`, `EdgeType`, `GraphSchema`, and `EdgeTopology`, plus top-level `graphistry` re-exports. `NodeType` and `EdgeType` now accept Arrow-first `pyarrow.Schema` declarations, preserve dtype/nullability through GFQL `RowSchema`, and export back to Arrow with label/type columns via `to_arrow()`. `graphistry.bind(..., schema=schema)` / `g.bind(schema=schema)` now attach public schema declarations to plotters, and Cypher preflight validation consumes the adapted internal `GraphSchemaCatalog` for declared labels, properties, relationship types, and source/destination topology checks. `GraphSchema(strict=False)` now makes schema-bound `g.gfql_validate(...)` permissive by default while explicit call-level `strict=True` still forces strict validation.
- **GFQL / Cypher pattern predicate existence semantics (#1449)**: Direct-Cypher `WHERE (pattern)` predicates now lower through correlated semi-apply markers instead of rewriting single positive predicates into appended `MATCH` clauses, preventing existence checks from multiplying result rows. Added pandas/cuDF coverage for the residual `expr-pattern1-10`, `expr-pattern1-13`, and `expr-pattern1-18` undirected pattern-predicate wrong-row cases.
- **GFQL / Cypher reentry failfast scaffolding cleanup (#1421)**: Removed the obsolete `graphistry.compute.gfql.cypher.reentry.runtime` compatibility re-export shim after compile-time reentry ownership moved to `reentry.compiletime`, moved tests off the old private `gfql_unified._compiled_query_reentry_state` access path, and lifted the stale closed-#1256 aggregate failfast so chained reentry secondary-property carries now flow through downstream aggregating `WITH` stages with positive row assertions.
- **GFQL / Cypher pre-strict binder compatibility guard deletion (#1420)**: Retired the legacy loose `FrontendBinder.bind(strict_name_resolution=False)` graph traversal path and unresolved-name fallbacks now that #1357 made strict binder semantics canonical. Cypher compile prepass and graph-constructor binding now pass `strict_name_resolution=True` explicitly, and binder tests now pin that the legacy false flag no longer admits unresolved `collect(...)`, single-alias list literal, or missing-schema inputs while preserving strict source-order traversal through `WITH → UNWIND → MATCH`.
1 change: 1 addition & 0 deletions docs/source/api/gfql/index.rst
@@ -13,3 +13,4 @@ GFQL API Reference
hop
node
predicates
schema
7 changes: 7 additions & 0 deletions docs/source/api/gfql/schema.rst
@@ -0,0 +1,7 @@
GFQL Schema
===========

.. automodule:: graphistry.schema
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/gfql/index.rst
@@ -63,6 +63,7 @@ See also:
builtin_calls
policy
strict_mode
schema
wire_protocol_examples

.. toctree::
109 changes: 109 additions & 0 deletions docs/source/gfql/schema.rst
@@ -0,0 +1,109 @@
Declarative Graph Schemas
=========================

GFQL accepts public schema declarations through the stable
``graphistry.schema`` import path. Use this when application code owns a graph
contract and wants Cypher preflight checks to fail before query execution.

.. code-block:: python

    import graphistry
    import pyarrow as pa
    from graphistry.schema import EdgeType, GraphSchema, NodeType

    Person = NodeType(
        "Person",
        pa.schema([
            pa.field("id", pa.int64(), nullable=False),
            pa.field("name", pa.large_string()),
        ]),
    )
    Company = NodeType(
        "Company",
        pa.schema([
            pa.field("id", pa.int64(), nullable=False),
            pa.field("name", pa.large_string()),
        ]),
    )
    WorksAt = EdgeType(
        "WORKS_AT",
        source=Person,
        destination=Company,
        properties=pa.schema([pa.field("since", pa.int32(), nullable=False)]),
    )

    schema = GraphSchema(
        node_types=[Person, Company],
        edge_types=[WorksAt],
        node_id_column="id",
        edge_source_column="src",
        edge_destination_column="dst",
    )

    g = graphistry.bind(
        source="src",
        destination="dst",
        node="id",
        schema=schema,
    )

    g.gfql_validate("MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p.name")

Schema Objects
--------------

``NodeType(name, properties, labels=None)``
    Declares a node contract. ``labels`` defaults to ``(name,)`` and maps to the
    existing GFQL label-column convention ``label__<Label>``. ``properties``
    accepts a ``pyarrow.Schema``, a GFQL ``RowSchema``, or a mapping shorthand
    such as ``{"id": pa.int64(), "name": pa.large_string()}`` or
    ``{"id": int, "name": str}``. Arrow schemas are the preferred declaration
    path because they preserve dtype and nullability.

``EdgeType(name, source, destination, properties=None)``
    Declares an edge contract and topology. ``source`` and ``destination`` accept
    ``NodeType`` objects, label strings, or label iterables. Edge properties use
    the same Arrow-aligned schema inputs as node properties.

``GraphSchema(node_types, edge_types, strict=True, ...)``
    Groups node/edge contracts and adapts them to the internal
    ``GraphSchemaCatalog`` used by binder/preflight validation. ``strict=False``
    makes schema-bound ``g.gfql_validate(...)`` permissive by default; callers can
    still override per call with ``g.gfql_validate(..., strict=True)``.

``NodeType.to_arrow()`` and ``EdgeType.to_arrow()``
    Export declarations as ``pyarrow.Schema`` objects through GFQL's row-schema
    bridge. Label/type columns are included by default so exports line up with
    the table columns used by binder/preflight validation.
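
As a rough illustration of two conventions described above, the sketch below derives ``label__<Label>`` column names from a label tuple and resolves the effective strictness for one validation call. Both helpers (``label_columns`` and ``effective_strict``) are hypothetical names written for this example and are not part of ``graphistry.schema``. Treating an omitted call-level ``strict`` as ``None`` that defers to the schema default, and letting any explicit value win, are assumptions about how the override is wired.

```python
from typing import Iterable, List, Optional


def label_columns(labels: Iterable[str]) -> List[str]:
    """Map labels to column names under the ``label__<Label>`` convention."""
    return [f"label__{label}" for label in labels]


def effective_strict(schema_strict: bool, call_strict: Optional[bool] = None) -> bool:
    """Pick the strictness for one validation call.

    Assumption: ``None`` means the caller did not pass ``strict``, so the
    schema-level default applies; an explicit value always wins.
    """
    return schema_strict if call_strict is None else call_strict


# A node type named "Person" with default labels would use one label column.
print(label_columns(("Person",)))
# A permissive schema stays permissive unless the call asks for strict checks.
print(effective_strict(schema_strict=False))
print(effective_strict(schema_strict=False, call_strict=True))
```

Using ``None`` as the "not provided" sentinel keeps an explicit ``strict=False`` distinguishable from an omitted argument, which a plain boolean default could not do.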

What Preflight Checks
---------------------

When a schema is bound to a graph, Cypher preflight checks validate:

* node labels against declared node types,
* node and edge property names against declared properties,
* relationship types against declared edge types, and
* relationship source/destination labels against declared topology when the
  query provides enough label information.

Invalid queries raise ``GFQLValidationError`` with structured context.
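
To make the four checks concrete, here is a toy, standard-library-only model of them. It is not Graphistry's binder or preflight implementation: the dictionary layout, the ``ToyValidationError`` class, and the ``check_pattern`` helper are all hypothetical, and a real query would be parsed from Cypher rather than passed in as pre-extracted labels and property names.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical in-memory schema mirroring the Person/Company example.
NODE_PROPS: Dict[str, Set[str]] = {
    "Person": {"id", "name"},
    "Company": {"id", "name"},
}
# Edge type -> (source label, destination label, property names).
EDGE_TYPES: Dict[str, Tuple[str, str, Set[str]]] = {
    "WORKS_AT": ("Person", "Company", {"since"}),
}


class ToyValidationError(ValueError):
    """Stand-in error that carries structured context about the failed check."""

    def __init__(self, code: str, **context: str) -> None:
        super().__init__(f"{code}: {context}")
        self.code = code
        self.context = context


def check_pattern(
    src_label: Optional[str],
    edge_type: str,
    dst_label: Optional[str],
    node_props: Sequence[Tuple[str, str]] = (),
    edge_props: Sequence[str] = (),
) -> None:
    """Validate one (src)-[edge]->(dst) pattern against the toy schema."""
    # 1. Node labels must be declared node types.
    for label in (src_label, dst_label):
        if label is not None and label not in NODE_PROPS:
            raise ToyValidationError("unknown-label", label=label)
    # 2. The relationship type must be a declared edge type.
    if edge_type not in EDGE_TYPES:
        raise ToyValidationError("unknown-relationship", edge_type=edge_type)
    declared_src, declared_dst, declared_edge_props = EDGE_TYPES[edge_type]
    # 3. Property names must be declared on their node or edge type.
    for label, prop in node_props:
        if prop not in NODE_PROPS.get(label, set()):
            raise ToyValidationError("unknown-property", owner=label, prop=prop)
    for prop in edge_props:
        if prop not in declared_edge_props:
            raise ToyValidationError("unknown-property", owner=edge_type, prop=prop)
    # 4. Topology is checked only for endpoints the query actually labels.
    if src_label is not None and src_label != declared_src:
        raise ToyValidationError("topology-mismatch", end="source", label=src_label)
    if dst_label is not None and dst_label != declared_dst:
        raise ToyValidationError("topology-mismatch", end="destination", label=dst_label)


# A well-formed pattern passes silently.
check_pattern("Person", "WORKS_AT", "Company", node_props=[("Person", "name")])

# A reversed pattern fails the topology check and exposes its context.
try:
    check_pattern("Company", "WORKS_AT", "Person")
except ToyValidationError as err:
    print(err.code, err.context)
```

Skipping the topology check for unlabeled endpoints mirrors the "when the query provides enough label information" condition in the list above.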

Compatibility Notes
-------------------

The public import path is stable:

.. code-block:: python

    from graphistry.schema import NodeType, EdgeType, GraphSchema

Top-level imports are also available:

.. code-block:: python

    from graphistry import NodeType, EdgeType, GraphSchema

This change exposes declaration, Arrow row-schema export, and binder/preflight
integration. Schema inference from existing plottables remains a separate
follow-on surface.
1 change: 1 addition & 0 deletions graphistry/Plottable.py
@@ -355,6 +355,7 @@ def bind(
        url: Optional[str] = None,
        nodes_file_id: Optional[str] = None,
        edges_file_id: Optional[str] = None,
        schema: Optional[Any] = None,
    ) -> 'Plottable':
        ...

6 changes: 6 additions & 0 deletions graphistry/PlotterBase.py
@@ -227,6 +227,7 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
        self._point_y : Optional[str] = None
        self._point_longitude : Optional[str] = None
        self._point_latitude : Optional[str] = None
        self._gfql_schema : Any = None
        # Settings
        self._height : int = 500
        self._render : RenderModesConcrete = resolve_render_mode(self, True)
@@ -1529,6 +1530,7 @@ def bind(self,
        url: Optional[str] = None,
        nodes_file_id: Optional[str] = None,
        edges_file_id: Optional[str] = None,
        schema: Optional[Any] = None,
    ) -> Plottable:
        """Relate data attributes to graph structure and visual representation. To facilitate reuse and replayable notebooks, the binding call is chainable. Invocation does not affect the old binding: it instead returns a new Plotter instance with the new bindings added to the existing ones. Both the old and new bindings can then be used for different graphs.

@@ -1598,6 +1600,9 @@ def bind(self,
        :param edges_file_id: Remote edges file id
        :type edges_file_id: Optional[str]

        :param schema: Optional public GFQL schema declaration from ``graphistry.schema``.
        :type schema: Optional[Any]

        :returns: Plotter
        :rtype: Plotter

@@ -1677,6 +1682,7 @@ def bind(self,
        res._url = url or self._url
        res._nodes_file_id = nodes_file_id or self._nodes_file_id
        res._edges_file_id = edges_file_id or self._edges_file_id
        res._gfql_schema = schema if schema is not None else self._gfql_schema

        # Invalidate dataset_id if we're changing encodings, not setting IDs
        encoding_params_changed = any([
7 changes: 7 additions & 0 deletions graphistry/__init__.py
@@ -125,6 +125,13 @@

from graphistry.Engine import Engine

from graphistry.schema import (
    EdgeTopology,
    EdgeType,
    GraphSchema,
    NodeType,
)

from graphistry.privacy import (
    Mode, Privacy
)