Release 1.1.0 April 19, 2026

VisLab released this 19 Apr 17:25 (commit 5194ed6)

Release 1.1.0 focuses on HED search performance and expanded search capabilities. The query engine (ExpressionAnd, ExpressionOr, ExpressionNegation) has been refactored to use set-based deduplication, yielding substantial speedups for complex queries. Two new internal modules — string_search and schema_lookup — enable schema-free HED search for use cases where loading a full schema is not practical. This release also adds Pandas 3.0 compatibility, a filename filter for the extract bids-sidecar CLI command, a get_task_names() method on BidsFileGroup, and deduplication of skip_cols in TabularSummary. A new benchmarks/ directory and docs/search_details.md page document the performance characteristics and design trade-offs of the three HED search implementations.

New features

Search engine performance improvements

ExpressionAnd, ExpressionOr, and ExpressionNegation in query_expressions.py now use set-based deduplication instead of O(n²) list scanning, giving a significant speedup for queries that produce many intermediate results. Supporting this, SearchResult gains __eq__ and __hash__ methods so instances can be stored in sets and dicts. QueryHandler._expr_has_wildcard() replaces the fragile "?" in str(interior) string check with a proper recursive AST walk, eliminating false positives on queries whose string representations happen to contain a literal ?.
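
The difference between list scanning and set-based deduplication can be sketched as follows. This is a minimal illustration, not the code in query_expressions.py: the Match class and both helper functions are hypothetical stand-ins, and the frozen dataclass is used only because it supplies the __eq__ and __hash__ methods that set membership requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a search result (not the hedtools SearchResult).

    frozen=True makes the dataclass generate both __eq__ and __hash__.
    """
    group_id: int
    tags: tuple


def dedupe_by_scan(results):
    """Quadratic approach: every candidate is compared against the kept list."""
    unique = []
    for item in results:
        if item not in unique:  # linear scan using __eq__ only
            unique.append(item)
    return unique


def dedupe_by_set(results):
    """Set-based approach: membership is a hash lookup, so one pass suffices."""
    seen = set()
    unique = []
    for item in results:
        if item not in seen:  # uses __hash__, then __eq__ on collision
            seen.add(item)
            unique.append(item)
    return unique


results = [Match(1, ("Red",)), Match(2, ("Blue",)), Match(1, ("Red",))]

# Both strategies keep the same items in the same order.
assert dedupe_by_scan(results) == dedupe_by_set(results)
```

Both functions return the same list; only the cost of each membership test differs, which is why the gain grows with the number of intermediate results.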

StringQueryHandler and schema_lookup (internal / experimental)

Two new internal modules support schema-free HED search:

  • hed/models/string_search.py: StringQueryHandler subclasses QueryHandler and accepts a raw HED string instead of a parsed HedString, enabling query evaluation without loading a schema. StringNode duck-types HedGroup/HedTag so that existing Expression subclasses evaluate against it without modification.
  • hed/models/schema_lookup.py: generate_schema_lookup(schema) builds a compact {short_tag: tag_terms} dict from a loaded schema that can be passed to StringQueryHandler.search() to enable ancestor-aware matching on short-form strings. save_schema_lookup() / load_schema_lookup() persist the table as JSON for offline use.

These modules are not part of the public API and may change in future releases.
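
The idea behind a {short_tag: tag_terms} table can be sketched with plain dictionaries. Everything below is an assumption made for illustration: build_lookup and matches_term are hypothetical helpers rather than functions from schema_lookup.py, the long-form tags are a toy list standing in for a loaded schema, and lowercasing the keys and terms is a choice made here, not a documented property of the real table.

```python
import json
import tempfile
from pathlib import Path

# Toy long-form tags standing in for a loaded schema (illustrative only).
LONG_FORMS = ["Event", "Event/Sensory-event", "Event/Agent-action"]


def build_lookup(long_forms):
    """Map each short tag (lowercased) to the lowercased terms on its path."""
    lookup = {}
    for long_form in long_forms:
        terms = [term.lower() for term in long_form.split("/")]
        lookup[terms[-1]] = terms  # the last term is the short tag
    return lookup


def matches_term(short_tag, term, lookup):
    """True if `term` is the tag itself or one of its ancestors.

    Tags missing from the lookup fall back to matching only themselves.
    """
    terms = lookup.get(short_tag.lower(), [short_tag.lower()])
    return term.lower() in terms


lookup = build_lookup(LONG_FORMS)

# Round-trip through JSON, as one might for offline use.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lookup.json"
    path.write_text(json.dumps(lookup), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

assert restored == lookup
assert matches_term("Sensory-event", "Event", restored)
```

With such a table, a query for an ancestor term can match a short-form tag without the full schema in memory, since the ancestry is already recorded in the term list.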

HedGroup find-method documentation clarified

Docstrings for find_tags, find_wildcard_tags, find_exact_tags, and find_tags_with_term now document the exact comparison property each method uses (short_base_tag, short_tag, HedTag.__eq__, and tag_terms respectively) and explain the rationale for that choice.

Search benchmarks

A new benchmarks/ directory provides reproducible performance benchmarking tools: search_benchmark.py measures throughput across query types and string sizes, data_generator.py synthesizes realistic HED strings, and report.py generates Markdown and PNG reports. Pre-computed results are stored under benchmarks/results/ and benchmark figures under docs/_static/images/.

Search documentation

A new docs/search_details.md page covers all three HED search implementations (basic_search, QueryHandler, and StringQueryHandler): design trade-offs, query language reference, and measured performance characteristics with benchmark figures.

Pandas 3.0 compatibility

All pandas 3.0 breaking changes have been addressed, and the pandas version constraint in pyproject.toml has been updated from <3.0.0 to <4.0.0:

  • Copy-on-Write (CoW): Chained df[col][mask] = ... assignments in df_util.py replaced with df.loc[mask, col] = ... to prevent silent no-ops and the new ChainedAssignmentError.
  • drop() API: Removed redundant axis=1 argument when columns= is already specified in data_util.py (the two arguments conflict in pandas 3.0).
  • NaN handling in schema loading: df2schema.py, df_util.py, and hed_id_util.py now check isinstance(value, str) before calling string methods such as .strip() and .startswith(), preventing AttributeError when empty cells are float NaN rather than "".
  • StringDtype in _merge_dataframes: Fillna logic updated in schema_io/df_util.py to use pd.api.types.is_numeric_dtype() instead of dtype == "object", correctly handling pandas 3.0 StringDtype columns.
  • Float64 column FutureWarning: assign_hed_ids_section in hed_id_util.py now casts all-NaN hedId columns from float64 to object before assigning string values, eliminating a pandas deprecation warning.
  • Added tests/test_pandas3_compat.py with 27 targeted tests covering all of the above fixes.
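
The NaN-handling fix in the list above follows a simple guard pattern, sketched here without pandas so that it runs anywhere. The clean_cell helper is hypothetical and is not taken from df2schema.py, df_util.py, or hed_id_util.py; it only shows why checking isinstance(value, str) first avoids the AttributeError.

```python
def clean_cell(value):
    """Return a stripped string, treating non-string cells (such as NaN) as empty."""
    if not isinstance(value, str):
        return ""
    return value.strip()


empty_cell = float("nan")  # an empty cell may arrive as float NaN rather than ""

# Calling a string method directly on the float fails.
unguarded_error = None
try:
    empty_cell.strip()
except AttributeError as err:
    unguarded_error = type(err).__name__

print(unguarded_error)                  # AttributeError
print(repr(clean_cell(empty_cell)))     # ''
print(clean_cell("  Sensory-event  "))  # Sensory-event
```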

Filename filter for extract bids-sidecar

hedpy extract bids-sidecar and the underlying hed_extract_bids_sidecar script now accept a --filter / -fl option. Only files whose name contains the filter string are included in the sidecar extraction. Example:

hedpy extract bids-sidecar /path/to/dataset --filter sub-01
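
The selection rule can be pictured as a substring test on each file name. The sketch below is an assumption about the behaviour described above, not the CLI's implementation: filter_by_name is a hypothetical helper, and matching against the base name only (rather than the full path) is a choice made here for illustration.

```python
from pathlib import PurePosixPath


def filter_by_name(paths, name_filter=None):
    """Keep paths whose file name contains name_filter; keep all if no filter."""
    if not name_filter:
        return list(paths)
    return [p for p in paths if name_filter in PurePosixPath(p).name]


paths = [
    "sub-01/eeg/sub-01_task-rest_events.json",
    "sub-02/eeg/sub-02_task-rest_events.json",
    "task-rest_events.json",
]

selected = filter_by_name(paths, "sub-01")
print(selected)  # ['sub-01/eeg/sub-01_task-rest_events.json']
```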

BidsFileGroup.get_task_names()

BidsFileGroup now exposes a get_task_names() method that returns a sorted list of unique task names (the xxxx portion of task-xxxx BIDS entities) found across all sidecar and data files in the group.
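
Extracting the task-xxxx entity and returning sorted unique names can be sketched as below. This is not the BidsFileGroup implementation: the task_names function and the regular expression are assumptions for illustration, relying only on the BIDS convention that entities are underscore-separated key-value pairs with alphanumeric labels.

```python
import re

# Matches the label of a task entity at the start of a name or after "_".
TASK_PATTERN = re.compile(r"(?:^|_)task-([A-Za-z0-9]+)")


def task_names(filenames):
    """Return the sorted unique task labels found in the given file names."""
    found = set()
    for name in filenames:
        match = TASK_PATTERN.search(name)
        if match:
            found.add(match.group(1))
    return sorted(found)


files = [
    "sub-01_task-rest_events.tsv",
    "sub-01_task-nback_events.tsv",
    "task-rest_events.json",  # sidecar sharing the same task
    "participants.tsv",       # no task entity, so it is ignored
]

print(task_names(files))  # ['nback', 'rest']
```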

TabularSummary deduplicates skip_cols

TabularSummary.__init__ now deduplicates the skip_cols list using dict.fromkeys, preserving order. Passing the same column name more than once no longer produces duplicate entries in skip_cols or in the "Skip columns" field of the summary metadata output. Functional behaviour (which columns are skipped) is unchanged.
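
The dict.fromkeys idiom works because dictionaries preserve insertion order and keep only the first occurrence of each key. A minimal sketch with made-up column names:

```python
skip_cols = ["onset", "duration", "onset", "sample", "duration"]

# Keys keep their first-seen order, and repeated keys collapse into one.
deduped = list(dict.fromkeys(skip_cols))

print(deduped)  # ['onset', 'duration', 'sample']
```

A plain set would also remove the duplicates but would not guarantee the original order, which matters when the list is echoed back in summary metadata.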

Documentation

  • Removed {index} placeholder annotations from README.md and examples/README.md.

CI/CD

  • Bumped actions/configure-pages from 5 to 6.
  • Bumped astral-sh/setup-uv from v7 to v8.0.0.
  • Updated anthropics/claude-code-action to v1.0.97.
  • Pinned all GitHub Actions steps to full SHA hashes for supply-chain security.
  • Updated spec_tests/hed-examples, spec_tests/hed-schemas, and spec_tests/hed-tests submodules.

Full Changelog: 1.0.0...1.1.0