
Constants not working in python dataset view #2574

@matthewoneill7-nhs

Description


Describe the bug
When using constants in a dataset view, the query fails.

This is because the code in `view` uses the key `constants`, whereas the SQL on FHIR specification and the Java side use `constant`.

Changing the `view` method in `pathling/datasource.py` to take `constant` fixes the issue:

To Reproduce

pc = PathlingContext.create(spark)
data_source = pc.read.datasets({"AuditEvent": encoded})
# where encoded is a dataframe of AuditEvents

results = data_source.view(
    resource="AuditEvent",
    constants=[{"name": "requestor", "valueBoolean": "true"}],
    select=[
        {"column": [{"path": "id", "name": "event_id"}]},
        {
            "forEach": "agent.where(requestor = %requestor)",
            "column": [
                {"path": "who.display", "name": "agent_name"},
            ],
        },
    ],
)

This produces the following traceback (truncated for relevance):
File "site-packages/pathling/datasource.py", line 102, in view
return self._wrap_df(jquery.execute())
~~~~~~~~~~~~~~^^
File "site-packages/py4j/java_gateway.py", line 1362, in __call__
return_value = get_return_value(
answer, self.gateway_client, self.target_id, self.name)
File "site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
raise converted from None
pyspark.errors.exceptions.captured.IllegalArgumentException: Unknown variable: requestor

Expected behavior
The value of the constant should be injected into the query.

Fix
Change the argument name in `DataSource.view` from `constants` to `constant`, and change the corresponding key in the query args to `constant`:

def view(
    self,
    resource: Optional[str] = None,
    select: Optional[Sequence[Dict]] = None,
    constant: Optional[Sequence[Dict]] = None,
    where: Optional[Sequence[Dict]] = None,
    json: Optional[str] = None,
) -> DataFrame:
    """
    Executes a SQL on FHIR view definition and returns the result as a Spark DataFrame.

    :param resource: The FHIR resource that the view is based upon, e.g. 'Patient' or
            'Observation'.
    :param select: A list of columns and nested selects to include in the view.
    :param constant: A list of constants that can be used in FHIRPath expressions.
    :param where: A list of FHIRPath expressions that can be used to filter the view.
    :param json: A JSON string representing the view definition, as an alternative to providing
            the parameters as Python objects.
    :return: A Spark DataFrame containing the results of the view.
    """
    if json:
        query_json = json
        parsed = loads(json)
        resource = parsed.get("resource")
    else:
        args = locals()
        query = {key: args[key] for key in ["resource", "select", "constant", "where"] if args[key] is not None}
        query_json = dumps(query)
    jquery = self._jds.view(resource)
    jquery.json(query_json)
    return self._wrap_df(jquery.execute())
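For illustration, the key-filtering logic in the corrected method can be exercised in isolation, without Spark or Pathling. Here `build_view_json` is a hypothetical stand-in for the argument-to-JSON step, not part of the Pathling API; it confirms that the serialized view definition carries the spec's `constant` key:

```python
from json import dumps
from typing import Dict, Optional, Sequence


def build_view_json(
    resource: Optional[str] = None,
    select: Optional[Sequence[Dict]] = None,
    constant: Optional[Sequence[Dict]] = None,
    where: Optional[Sequence[Dict]] = None,
) -> str:
    # Mirror of the filtering step in the corrected view method:
    # only arguments that were actually supplied end up in the definition.
    args = {"resource": resource, "select": select, "constant": constant, "where": where}
    query = {key: value for key, value in args.items() if value is not None}
    return dumps(query)


view_json = build_view_json(
    resource="AuditEvent",
    constant=[{"name": "requestor", "valueBoolean": "true"}],
    select=[{"column": [{"path": "id", "name": "event_id"}]}],
)
```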

Workaround
To work around the issue, you can pass the query in as JSON:

results = data_source.view(json=json.dumps({
    "resource": "AuditEvent",
    "constant": [{"name": "requestor", "valueBoolean": "true"}],
    "select": [
        {"column": [{"path": "id", "name": "event_id"}]},
        {
            "forEach": "agent.where(requestor = %requestor)",
            "column": [
                {"path": "who.display", "name": "agent_name"},
            ],
        },
    ],
}))
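The workaround bypasses the Python keyword handling entirely, so the `constant` key reaches the Java side verbatim. As a quick sanity check (plain Python, no Spark needed), serializing the same definition shows both the spec-compliant key and the `%requestor` reference that depends on it:

```python
import json

# The view definition from the workaround above, serialized exactly as the
# `json` argument would send it to the Java side.
view_definition = {
    "resource": "AuditEvent",
    "constant": [{"name": "requestor", "valueBoolean": "true"}],
    "select": [
        {"column": [{"path": "id", "name": "event_id"}]},
        {
            "forEach": "agent.where(requestor = %requestor)",
            "column": [{"path": "who.display", "name": "agent_name"}],
        },
    ],
}
query_json = json.dumps(view_definition)
```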
