702: Add regex support for target sorting in target_is_sorted_by operator by alexfurmenkov · Pull Request #1705 · cdisc-org/cdisc-rules-engine

alexfurmenkov · 2026-04-23T09:15:02Z

No description provided.

github-actions

Updated schema has not been merged with markdown descriptions. Please run the "Merge Schema with Markdown Descriptions" workflow to update the merged schema files.


RamilCDISC · 2026-04-28T18:58:19Z

+                target_for_sorting = f"{target}_extracted"
+                # Sort by within columns only, preserve original order within groups
+                sorted_df = working_df.sort_values(
+                    by=within_columns,


In the regex branch we would need to sort by within_columns and the extracted target, rather than preserving the original order within groups.
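A minimal sketch of the suggested change, for illustration only. The sample data, the regex, and the use of `str.extract` / `pd.to_numeric` are assumptions; only `within_columns`, `working_df`, `target_for_sorting` and the `sort_values` call come from the diff above.

```python
import pandas as pd

# Illustrative data only; not taken from the PR's test datasets.
df = pd.DataFrame(
    {
        "USUBJID": ["S1", "S1", "S1"],
        "MIDS": ["MIDS10", "MIDS2", "MIDS1"],
    }
)
within_columns = ["USUBJID"]
target = "MIDS"
target_for_sorting = f"{target}_extracted"

working_df = df.copy()
# expand=False returns a Series holding the first capturing group.
extracted = working_df[target].str.extract(r".*?(\d+)$", expand=False)
# Assumption: values that cannot be converted become NaN.
working_df[target_for_sorting] = pd.to_numeric(extracted, errors="coerce")

# Sort by the within columns AND the extracted target, so the order inside
# each group follows the numeric sequence rather than the input order.
sorted_df = working_df.sort_values(
    by=[*within_columns, target_for_sorting],
    kind="mergesort",  # stable sort, keeps ties in their original order
)
sorted_targets = sorted_df[target].tolist()
```

With only `within_columns` in `by`, the three rows would stay in their input order (MIDS10, MIDS2, MIDS1), which is the behaviour the comment objects to.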

RamilCDISC · 2026-04-28T19:00:26Z

+          "markdownDescription": "\nTrue if the values in name are ordered according to the values specified by value\nin ascending/descending order, grouped by the values in within. Each value entry\nrequires a variable name, a sort_order of asc or desc, and an optional\nnull_position of first or last (defaults to last) which controls where null/empty\ncomparator values are placed in the expected ordering. Within accepts either a\nsingle column or an ordered list of columns. Columns can be either number or Char\nDates in ISO8601 YYYY-MM-DD format. Date value(s) with different precisions that\noverlap (e.g. 2005-10, 2005-10-3 and 2005-10-08) are all flagged as not sorted as\ntheir order cannot be inferred.\n\nOptionally supports a `regex` parameter that extracts a portion of the target\nvalue for sorting. The regex must contain at least one capturing group. The first\ncaptured group is extracted and converted to numeric if possible, allowing proper\nsorting of sequence numbers (e.g., \"MIDS1\", \"MIDS2\", ..., \"MIDS10\" with regex\n`.*?(\\\\d+)$`). This is particularly useful for variables that end with sequence\nnumbers that may or may not be zero-padded.\n\n```yaml\nCheck:\n  all:\n    - name: --SEQ\n      within:\n        - USUBJID\n        - MIDSTYPE\n      operator: target_is_sorted_by\n      value:\n        - name: --STDTC\n          sort_order: asc\n          null_position: last\n```\n\nExample with regex for extracting sequence numbers:\n\n```yaml\nCheck:\n  all:\n    - name: MIDS\n      operator: target_is_sorted_by\n      regex: \".*?(\\\\d+)$\" # Extract trailing digits, convert to numeric\n      value:\n        - name: SMSTDTC\n          sort_order: asc\n      within:\n        - USUBJID\n        - MIDSTYPE\n```\n"
        }
      },
      "required": ["operator", "value", "within"],


I think we should add the new regex property here.

It’s not very visible here, but I actually added information about the regex. It’s easier to view it in the editor.
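To make the documented behaviour concrete, here is a small standard-library sketch of "extract the first captured group and convert it to numeric if possible". The helper `extract_sort_key` is hypothetical and is not the engine's implementation; the pattern and sample values are the ones from the description above.

```python
import re


def extract_sort_key(value, pattern):
    """Hypothetical helper: return the first captured group as a number when possible."""
    match = re.search(pattern, value)
    if match is None:
        return None  # no trailing digits, e.g. "DIAG"
    captured = match.group(1)  # raises IndexError if the pattern has no group
    try:
        return float(captured)
    except ValueError:
        return captured


pattern = r".*?(\d+)$"
values = ["MIDS1", "MIDS2", "MIDS10"]

# Plain string sorting puts "MIDS10" before "MIDS2".
plain_sorted = sorted(values)
# Sorting on the extracted number restores the intended sequence.
numeric_sorted = sorted(values, key=lambda v: extract_sort_key(v, pattern))
```

The same key also handles zero-padded values such as "MIDS02", because the captured digits are compared as numbers rather than as text.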


RamilCDISC

I executed rule CG0546 in the dev editor. I used the dataset from the CG0545 folder in SharePoint, as there was no dataset for CG0546. I changed it by adding a suffix to the MIDSTYPE values in the SM dataset. The updated dataset is attached.

I get the following error:

{
  "SM": [
    {
      "executionStatus": "execution error",
      "dataset": "SM",
      "domain": "SM",
      "variables": [],
      "message": "rule evaluation error - operation failed",
      "errors": [
        {
          "dataset": "SM",
          "error": "Error occurred during operation execution",
          "message": "Failed to execute rule operation. Operation: record_count, Target: None, Domain: SM, Error: single positional indexer is out-of-bounds"
        }
      ]
    }
  ],
  "TM": [
    {
      "executionStatus": "skipped",
      "dataset": "TM",
      "domain": "TM",
      "variables": [],
      "message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM",
      "errors": [
        {
          "dataset": "TM",
          "error": "Outside scope",
          "message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM"
        }
      ]
    }
  ]
}

unit-test-coreid-CG0545-negative 1.xlsx

Please let me know if I updated the dataset incorrectly.

gerrycampion

The unit tests look good to me, but I'm still waiting on @alexfurmenkov to address @RamilCDISC's test results.

alexfurmenkov · 2026-05-13T15:54:50Z

Hi @RamilCDISC ,
The initial test dataset unit-test-coreid-CG0545-negative.xlsx contained errors that prevented rule CG0546 from executing (status SKIPPED): the MIDSTYPE field in the SM dataset had values "DIAGNOSIS1" and "DIAGNOSIS2" instead of "DIAGNOSIS" (as in TM), which blocked dataset merging, and the MIDS field contained "DIAG" without trailing digits, failing the regex validation. After fixing these basic issues (changed MIDSTYPE to "DIAGNOSIS" and MIDS to "DIAG1"/"DIAG2"), two correct datasets were created: a positive test with proper order (DIAG1→2005-10, DIAG2→2007-10) and a negative test with violated order (DIAG2→2005-10, DIAG1→2007-10).

When testing the target_is_sorted_by operator in isolation, it worked correctly: for positive data it returned [True, True] (data is valid), for negative data it returned [False, False] (data is invalid). However, during full validation through the engine, a complete logical inversion was discovered: the positive dataset with correct data returned status "ISSUE REPORTED" with 2 errors (should be SUCCESS with 0 errors), while the negative dataset with incorrect order returned status "SUCCESS" with 0 errors (should be ISSUE REPORTED with 2 errors). This means the rule works completely backwards: valid data is marked as invalid (false positives), and invalid data is passed through (false negatives).

The problem is that the rule uses the operator's result directly for error generation: when target_is_sorted_by returns True (which semantically means "data is properly sorted"), the engine generates an error, and when it returns False (data is improperly sorted) - it does not generate an error.

The quickest fix is to use the target_is_NOT_sorted_by operator instead of target_is_sorted_by in the rule, which inverts the logic: the operator will return True when order is violated, and the rule will correctly generate errors only for incorrect data.

I attach correct datasets
unit-test-coreid-CG0545-negative-FIXED.xlsx
unit-test-coreid-CG0545-positive.xlsx
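The inversion described above can be sketched in a few lines. Both functions are hypothetical stand-ins, not engine code: `is_sorted_flags` mimics the per-record result reported for the operator, and `count_issues` encodes the assumption stated in the comment that a record is reported when its check evaluates to True.

```python
def is_sorted_flags(dates):
    """Hypothetical stand-in for the operator: one flag per record, True when sorted."""
    in_order = dates == sorted(dates)
    return [in_order] * len(dates)


def count_issues(check_results):
    # Assumption from the comment above: a record is reported when the check is True.
    return sum(1 for flagged in check_results if flagged)


# Comparator dates listed in target order (DIAG1, DIAG2), as in the two datasets.
positive = ["2005-10", "2007-10"]  # DIAG1 -> 2005-10, DIAG2 -> 2007-10
negative = ["2007-10", "2005-10"]  # DIAG1 -> 2007-10, DIAG2 -> 2005-10

sorted_positive = is_sorted_flags(positive)
sorted_negative = is_sorted_flags(negative)

# Negating the operator result is what switching to the "not sorted" operator does.
not_sorted_positive = [not flag for flag in sorted_positive]
not_sorted_negative = [not flag for flag in sorted_negative]
```

Under that assumption, the un-negated check reports both records of the valid dataset and none of the invalid one, while the negated check reports only the records of the invalid dataset.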

RamilCDISC

The PR adds regex support for the target_is_sorted_by operator. The validation was done by:

Reviewing the PR for any unwanted code or comments.
Reviewing the PR logic in accordance with AC.
Ensuring all unit and regression testing pass.
Ensuring relevant testing is updated.
Ensuring documentation is updated.
Ensuring the regex support is properly implemented.
Verifying the regex implementation in the operator will handle edge cases.
Verifying the regex implementation is similar to other operators.
Running manual validation using dev editor using negative dataset.
Running manual validation using dev editor using positive dataset.

Add regex support for target sorting in target_is_sorted_by operator

4ddf1a4

alexfurmenkov temporarily deployed to DEV April 23, 2026 09:15 — with GitHub Actions Inactive

github-actions Bot requested changes Apr 23, 2026


alexfurmenkov changed the title ~~Add regex support for target sorting in target_is_sorted_by operator~~ 702: Add regex support for target sorting in target_is_sorted_by operator Apr 23, 2026

alexfurmenkov linked an issue Apr 23, 2026 that may be closed by this pull request

Rule blocked: CORERULES-246 #702

Open

Remove the deprecated function

74bf864

alexfurmenkov temporarily deployed to DEV April 23, 2026 09:20 — with GitHub Actions Inactive

github-actions Bot requested changes Apr 23, 2026


Enhance target_is_sorted_by operator documentation with regex parameter details

3b65d2c

alexfurmenkov temporarily deployed to DEV April 24, 2026 08:26 — with GitHub Actions Inactive

github-actions Bot requested changes Apr 24, 2026


github-actions and others added 2 commits April 24, 2026 09:51

Update merged schema files with markdown descriptions

b19cb47

Merge branch 'main' into 702-target-is-sorted-by-regex

eb47752

alexfurmenkov temporarily deployed to DEV April 24, 2026 10:13 — with GitHub Actions Inactive

alexfurmenkov added 2 commits April 24, 2026 12:28

Fix formatting of regex comment in target_is_sorted_by operator documentation

579fd86

Merge branch '702-target-is-sorted-by-regex' of https://github.com/cdisc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex

106a725

alexfurmenkov temporarily deployed to DEV April 24, 2026 10:28 — with GitHub Actions Inactive

github-actions Bot requested changes Apr 24, 2026


Update merged schema files with markdown descriptions

68e9141

alexfurmenkov requested review from RamilCDISC, SFJohnson24 and gerrycampion April 24, 2026 10:47

Merge branch 'main' into 702-target-is-sorted-by-regex

02cbb87

RamilCDISC temporarily deployed to DEV April 28, 2026 18:47 — with GitHub Actions Inactive

RamilCDISC requested changes Apr 28, 2026


alexfurmenkov added 2 commits May 5, 2026 13:11

Enhance target sorting in target_is_sorted_by operator to include extracted target values

6715e04

Merge branch '702-target-is-sorted-by-regex' of https://github.com/cdisc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex

66c8c58

alexfurmenkov temporarily deployed to DEV May 5, 2026 11:14 — with GitHub Actions Inactive

alexfurmenkov requested a review from RamilCDISC May 5, 2026 14:43

Merge branch 'main' into 702-target-is-sorted-by-regex

0041445

RamilCDISC temporarily deployed to DEV May 5, 2026 21:34 — with GitHub Actions Inactive

Merge branch 'main' into 702-target-is-sorted-by-regex

8941d97

SFJohnson24 temporarily deployed to DEV May 6, 2026 20:10 — with GitHub Actions Inactive

Merge branch 'main' into 702-target-is-sorted-by-regex

f3792af

SFJohnson24 temporarily deployed to DEV May 7, 2026 17:12 — with GitHub Actions Inactive

RamilCDISC temporarily deployed to DEV May 7, 2026 20:10 — with GitHub Actions Inactive

RamilCDISC requested changes May 7, 2026


gerrycampion reviewed May 11, 2026


alexfurmenkov requested a review from RamilCDISC May 14, 2026 11:33

RamilCDISC deployed to DEV May 14, 2026 20:07 — with GitHub Actions Active

RamilCDISC approved these changes May 14, 2026

alexfurmenkov wants to merge 14 commits into main from 702-target-is-sorted-by-regex

4 participants

Conversation

alexfurmenkov commented Apr 23, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

RamilCDISC Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

RamilCDISC Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

alexfurmenkov May 5, 2026

Choose a reason for hiding this comment

Uh oh!

RamilCDISC left a comment

Choose a reason for hiding this comment

Uh oh!

gerrycampion left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexfurmenkov commented May 13, 2026

Uh oh!

RamilCDISC left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gerrycampion left a comment •

edited

Loading