702: Add regex support for target sorting in target_is_sorted_by operator#1705
alexfurmenkov wants to merge 14 commits into
…isc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex
| target_for_sorting = f"{target}_extracted" | ||
| # Sort by within columns only, preserve original order within groups | ||
| sorted_df = working_df.sort_values( | ||
| by=within_columns, |
For the regex branch, we need to sort by within_columns and the extracted target here, rather than preserving the original order within groups.
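A minimal sketch of what this suggestion could look like, assuming the extracted key is stored in a `{target}_extracted` column as in the diff above. The sample data and the extraction step are illustrative assumptions, not the PR's actual implementation:

```python
import pandas as pd

# Hypothetical sample data; column names follow the discussion above.
working_df = pd.DataFrame(
    {
        "USUBJID": ["S1", "S1", "S1", "S2", "S2"],
        "MIDS": ["MIDS10", "MIDS2", "MIDS1", "MIDS2", "MIDS1"],
    }
)
within_columns = ["USUBJID"]
target = "MIDS"
target_for_sorting = f"{target}_extracted"

# Extract the first capturing group and convert it to a number
# (assumed extraction step, shown only to make the sketch runnable).
working_df[target_for_sorting] = pd.to_numeric(
    working_df[target].str.extract(r".*?(\d+)$", expand=False)
)

# Sort by the grouping columns AND the extracted target, instead of
# sorting by the grouping columns alone.
sorted_df = working_df.sort_values(by=[*within_columns, target_for_sorting])
sorted_targets = sorted_df[target].tolist()
```

Sorting on both keys means the expected order inside each group comes from the extracted number, so `MIDS10` lands after `MIDS2` regardless of the row order in the input.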
| "markdownDescription": "\nTrue if the values in name are ordered according to the values specified by value\nin ascending/descending order, grouped by the values in within. Each value entry\nrequires a variable name, a sort_order of asc or desc, and an optional\nnull_position of first or last (defaults to last) which controls where null/empty\ncomparator values are placed in the expected ordering. Within accepts either a\nsingle column or an ordered list of columns. Columns can be either number or Char\nDates in ISO8601 YYYY-MM-DD format. Date value(s) with different precisions that\noverlap (e.g. 2005-10, 2005-10-3 and 2005-10-08) are all flagged as not sorted as\ntheir order cannot be inferred.\n\nOptionally supports a `regex` parameter that extracts a portion of the target\nvalue for sorting. The regex must contain at least one capturing group. The first\ncaptured group is extracted and converted to numeric if possible, allowing proper\nsorting of sequence numbers (e.g., \"MIDS1\", \"MIDS2\", ..., \"MIDS10\" with regex\n`.*?(\\\\d+)$`). This is particularly useful for variables that end with sequence\nnumbers that may or may not be zero-padded.\n\n```yaml\nCheck:\n all:\n - name: --SEQ\n within:\n - USUBJID\n - MIDSTYPE\n operator: target_is_sorted_by\n value:\n - name: --STDTC\n sort_order: asc\n null_position: last\n```\n\nExample with regex for extracting sequence numbers:\n\n```yaml\nCheck:\n all:\n - name: MIDS\n operator: target_is_sorted_by\n regex: \".*?(\\\\d+)$\" # Extract trailing digits, convert to numeric\n value:\n - name: SMSTDTC\n sort_order: asc\n within:\n - USUBJID\n - MIDSTYPE\n```\n" | ||
| } | ||
| }, | ||
| "required": ["operator", "value", "within"], |
I think we should add the new regex property here.
It’s not very visible here, but I actually added information about the regex. It’s easier to view it in the editor.
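For readers following the description in the schema, here is a small standalone sketch of the documented behavior: the first capturing group is extracted and converted to a number where possible. The helper name `extract_sort_key` is hypothetical and uses only the standard library; it is not the engine's code.

```python
import re

# The pattern from the documentation example: capture trailing digits.
PATTERN = re.compile(r".*?(\d+)$")


def extract_sort_key(value):
    """Return the first captured group as an int, or None if there is no match."""
    match = PATTERN.match(value)
    if match is None:
        return None
    return int(match.group(1))


values = ["MIDS10", "MIDS2", "MIDS1", "MIDS02"]

# Plain string sorting compares character by character.
lexicographic = sorted(values)

# Sorting on the extracted number treats "2" and "02" as the same key.
numeric = sorted(values, key=extract_sort_key)
```

Here `lexicographic` places `MIDS10` before `MIDS2`, while `numeric` places it last, which is the case the `regex` parameter is meant to handle for sequence numbers that may or may not be zero-padded.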
…racted target values
…isc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex
RamilCDISC
left a comment
I executed rule CG0546 in the dev editor. I used the dataset from the CG0545 folder in SharePoint, as there was no dataset for CG0546. I changed it by adding the suffix to the MIDSTYPE column records in the SM dataset. The updated dataset is attached.
I get the following error:
{
"SM": [
{
"executionStatus": "execution error",
"dataset": "SM",
"domain": "SM",
"variables": [],
"message": "rule evaluation error - operation failed",
"errors": [
{
"dataset": "SM",
"error": "Error occurred during operation execution",
"message": "Failed to execute rule operation. Operation: record_count, Target: None, Domain: SM, Error: single positional indexer is out-of-bounds"
}
]
}
],
"TM": [
{
"executionStatus": "skipped",
"dataset": "TM",
"domain": "TM",
"variables": [],
"message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM",
"errors": [
{
"dataset": "TM",
"error": "Outside scope",
"message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM"
}
]
}
]
}
unit-test-coreid-CG0545-negative 1.xlsx
Please let me know if I updated the dataset incorrectly.
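For context on the message text: pandas raises `IndexError` with exactly this wording when `.iloc` is given an integer position that does not exist, for example on an empty Series. The sketch below only reproduces the message in isolation; whether this is what happens inside the `record_count` operation is an assumption, not something confirmed in this thread.

```python
import pandas as pd

# An empty column: there is no row at position 0.
empty_df = pd.DataFrame({"MIDSTYPE": []})

try:
    empty_df["MIDSTYPE"].iloc[0]
except IndexError as exc:
    # Capture the text pandas reports for the failed positional lookup.
    error_message = str(exc)
```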
The unit tests look good to me, but I am still waiting on @alexfurmenkov to address @RamilCDISC's test results.
Hi @RamilCDISC, when testing the target_is_sorted_by operator in isolation, it worked correctly: for positive data it returned […]. The problem is that the rule uses the operator's result directly for error generation: when […]. The quickest fix is to use the […]. I attach correct datasets.
RamilCDISC
left a comment
The PR adds regex support for the target_is_sorted_by operator. The validation was done by:
- Reviewing the PR for any unwanted code or comments.
- Reviewing the PR logic in accordance with AC.
- Ensuring all unit and regression testing pass.
- Ensuring relevant testing is updated.
- Ensuring documentation is updated.
- Ensuring the regex support is properly implemented.
- Verifying the regex implementation in the operator will handle edge cases.
- Verifying the regex implementation is similar to other operators.
- Running manual validation in the dev editor with the negative dataset.
- Running manual validation in the dev editor with the positive dataset.