Release 1.1.0 focuses on HED search performance and expanded search capabilities. The query engine (ExpressionAnd, ExpressionOr, ExpressionNegation) has been refactored to use set-based deduplication, yielding substantial speedups for complex queries. Two new internal modules — string_search and schema_lookup — enable schema-free HED search for use cases where loading a full schema is not practical. This release also adds Pandas 3.0 compatibility, a filename filter for the extract bids-sidecar CLI command, a get_task_names() method on BidsFileGroup, and deduplication of skip_cols in TabularSummary. A new benchmarks/ directory and docs/search_details.md page document the performance characteristics and design trade-offs of the three HED search implementations.
New features
Search engine performance improvements
ExpressionAnd, ExpressionOr, and ExpressionNegation in query_expressions.py now use set-based deduplication instead of O(n²) list scanning, giving a significant speedup for queries that produce many intermediate results. Supporting this, SearchResult gains __eq__ and __hash__ methods so instances can be stored in sets and dicts. QueryHandler._expr_has_wildcard() replaces the fragile "?" in str(interior) string check with a proper recursive AST walk, eliminating false positives on queries whose string representations happen to contain a literal ?.
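As a rough illustration of why hashable results help, the sketch below compares list-scan deduplication with set-based deduplication. The Result class is a hypothetical stand-in, not the library's SearchResult, and its notion of equality (same group, same tags in any order) is an assumption made only for this example.

```python
class Result:
    """Hypothetical stand-in for a search result (not the library's class)."""

    def __init__(self, group, tags):
        self.group = group
        self.tags = list(tags)

    def _key(self):
        # Assumed identity: the group plus the matched tags, order-insensitive.
        return (self.group, frozenset(self.tags))

    def __eq__(self, other):
        if not isinstance(other, Result):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__ so equal results land in the same set bucket.
        return hash(self._key())


def dedupe_by_scan(results):
    """Quadratic: every candidate is compared against all kept items."""
    kept = []
    for r in results:
        if r not in kept:  # linear scan using __eq__
            kept.append(r)
    return kept


def dedupe_by_set(results):
    """Linear on average: set membership uses __hash__, then __eq__."""
    seen = set()
    kept = []
    for r in results:
        if r not in seen:
            seen.add(r)
            kept.append(r)
    return kept


results = [Result("g1", ["A", "B"]), Result("g1", ["B", "A"]), Result("g2", ["A"])]
assert dedupe_by_scan(results) == dedupe_by_set(results)
print(len(dedupe_by_set(results)))  # 2
```

Both functions return the same items in the same order; only the cost of the membership check differs as the number of intermediate results grows.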
StringQueryHandler and schema_lookup (internal / experimental)
Two new internal modules support schema-free HED search:
- hed/models/string_search.py: StringQueryHandler subclasses QueryHandler and accepts a raw HED string instead of a parsed HedString, enabling query evaluation without loading a schema. StringNode duck-types HedGroup/HedTag so that existing Expression subclasses evaluate against it without modification.
- hed/models/schema_lookup.py: generate_schema_lookup(schema) builds a compact {short_tag: tag_terms} dict from a loaded schema that can be passed to StringQueryHandler.search() to enable ancestor-aware matching on short-form strings. save_schema_lookup() / load_schema_lookup() persist the table as JSON for offline use.
These modules are not part of the public API and may change in future releases.
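To make the idea of a lookup table concrete, here is a minimal sketch of ancestor-aware matching against a {short_tag: tag_terms} dict with a JSON round trip. It does not call the hed package: the table entries, the lowercase term format, and the helper names matches_term, save_lookup, and load_lookup are all assumptions for illustration and may not match the internal modules.

```python
import json
import os
import tempfile

# Hypothetical table: short tag -> terms from the root down to the tag itself.
# The entries and their lowercase form are made up for this example.
lookup = {
    "event": ["event"],
    "sensory-event": ["event", "sensory-event"],
    "red": ["property", "color", "red"],
}


def matches_term(short_tag, term, table):
    """Return True if `term` is the tag itself or one of its ancestors."""
    key = short_tag.lower()
    # Without a table entry, fall back to matching the tag against itself only.
    terms = table.get(key, [key])
    return term.lower() in terms


def save_lookup(table, path):
    """Persist the table as JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f, indent=2, sort_keys=True)


def load_lookup(path):
    """Load a table written by save_lookup (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lookup.json")
    save_lookup(lookup, path)
    restored = load_lookup(path)

assert restored == lookup
assert matches_term("Sensory-event", "Event", restored)  # ancestor match
assert not matches_term("Sensory-event", "Event", {})    # no table, no ancestors
```

The last two assertions show the trade-off: with the table, a short-form tag can match a query on its ancestor; without it, only the literal tag matches.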
HedGroup find-method documentation clarified
Docstrings for find_tags, find_wildcard_tags, find_exact_tags, and find_tags_with_term now document the exact comparison property each method uses (short_base_tag, short_tag, HedTag.__eq__, and tag_terms respectively) and explain the rationale for that choice.
Search benchmarks
A new benchmarks/ directory provides reproducible performance benchmarking tools: search_benchmark.py measures throughput across query types and string sizes, data_generator.py synthesizes realistic HED strings, and report.py generates Markdown and PNG reports. Pre-computed results are stored under benchmarks/results/ and benchmark figures under docs/_static/images/.
Search documentation
A new docs/search_details.md page covers all three HED search implementations (basic_search, QueryHandler, and StringQueryHandler): design trade-offs, query language reference, and measured performance characteristics with benchmark figures.
Pandas 3.0 compatibility
All pandas 3.0 breaking changes have been addressed, and the pandas version constraint in pyproject.toml has been updated from <3.0.0 to <4.0.0:
- Copy-on-Write (CoW): Chained df[col][mask] = ... assignments in df_util.py replaced with df.loc[mask, col] = ... to prevent silent no-ops and the new ChainedAssignmentError.
- drop() API: Removed redundant axis=1 argument when columns= is already specified in data_util.py (the two arguments conflict in pandas 3.0).
- NaN handling in schema loading: df2schema.py, df_util.py, and hed_id_util.py now check isinstance(value, str) before calling string methods such as .strip() and .startswith(), preventing AttributeError when empty cells are float NaN rather than "".
- StringDtype in _merge_dataframes: Fillna logic updated in schema_io/df_util.py to use pd.api.types.is_numeric_dtype() instead of dtype == "object", correctly handling pandas 3.0 StringDtype columns.
- Float64 column FutureWarning: assign_hed_ids_section in hed_id_util.py now casts all-NaN hedId columns from float64 to object before assigning string values, eliminating a pandas deprecation warning.
- Added tests/test_pandas3_compat.py with 27 targeted tests covering all of the above fixes.
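The NaN-handling fix follows a common guard pattern, sketched here in plain Python without pandas so it runs anywhere. The helper names clean_cell and has_prefix and the sample cell values are hypothetical; the actual code in the schema loaders may be structured differently.

```python
def clean_cell(value):
    """Return a stripped string, or "" for non-string cells such as float NaN."""
    if not isinstance(value, str):
        return ""
    return value.strip()


def has_prefix(value, prefix):
    """Safe startswith: non-string cells never match."""
    return isinstance(value, str) and value.startswith(prefix)


# An empty spreadsheet cell may arrive as float NaN rather than "".
cells = ["  some value ", float("nan"), "", None]

cleaned = [clean_cell(c) for c in cells]
flags = [has_prefix(c, "  some") for c in cells]

# Calling a string method on NaN directly is what the guard prevents.
try:
    float("nan").strip()
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True

print(cleaned)           # ['some value', '', '', '']
print(flags)             # [True, False, False, False]
print(unguarded_failed)  # True
```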
Filename filter for extract bids-sidecar
hedpy extract bids-sidecar and the underlying hed_extract_bids_sidecar script now accept a --filter / -fl option. Only files whose name contains the filter string are included in the sidecar extraction. Example:
hedpy extract bids-sidecar /path/to/dataset --filter sub-01
BidsFileGroup.get_task_names()
BidsFileGroup now exposes a get_task_names() method that returns a sorted list of unique task names (the xxxx portion of task-xxxx BIDS entities) found across all sidecar and data files in the group.
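For readers unfamiliar with the task entity, the following sketch shows one way to collect sorted, unique task names from BIDS-style file names. It is a standalone illustration using a regular expression, not the BidsFileGroup implementation; the function task_names, the pattern, and the file names are assumptions.

```python
import re

# Assumed pattern: a "task-<label>" entity at the start of the name or after "_".
# BIDS labels are alphanumeric, so the label stops at the next "_" or ".".
TASK_PATTERN = re.compile(r"(?:^|_)task-([a-zA-Z0-9]+)")


def task_names(filenames):
    """Return a sorted list of unique task labels found in the file names."""
    names = set()
    for name in filenames:
        match = TASK_PATTERN.search(name)
        if match:
            names.add(match.group(1))
    return sorted(names)


files = [
    "sub-01_task-rest_events.tsv",
    "sub-01_task-nback_events.tsv",
    "sub-02_task-rest_events.json",
    "task-nback_events.json",   # top-level sidecar
    "participants.tsv",         # no task entity
]

print(task_names(files))  # ['nback', 'rest']
```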
TabularSummary deduplicates skip_cols
TabularSummary.__init__ now deduplicates the skip_cols list using dict.fromkeys, preserving order. Passing the same column name more than once no longer produces duplicate entries in skip_cols or in the "Skip columns" field of the summary metadata output. Functional behaviour (which columns are skipped) is unchanged.
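The dict.fromkeys idiom mentioned above relies on dicts preserving insertion order (guaranteed since Python 3.7). A minimal standalone sketch, with made-up column names:

```python
# Column names are illustrative only.
skip_cols = ["onset", "duration", "onset", "sample", "duration"]

# Keys are unique and keep first-seen order, so converting back to a list
# removes duplicates without reordering.
unique_skip_cols = list(dict.fromkeys(skip_cols))

print(unique_skip_cols)  # ['onset', 'duration', 'sample']
```

Using set(skip_cols) would also remove duplicates but does not guarantee the original order, which matters when the list is echoed back in summary output.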
Documentation
- Removed {index} placeholder annotations from README.md and examples/README.md.
CI/CD
- Bumped actions/configure-pages from 5 to 6.
- Bumped astral-sh/setup-uv from v7 to v8.0.0.
- Updated anthropics/claude-code-action to v1.0.97.
- Pinned all GitHub Actions steps to full SHA hashes for supply-chain security.
- Updated spec_tests/hed-examples, spec_tests/hed-schemas, and spec_tests/hed-tests submodules.
Full Changelog: 1.0.0...1.1.0