Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion packages/uipath/pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[project]
name = "uipath"
version = "2.10.70"
version = "2.10.72"
description = "Python SDK and CLI for UiPath Platform, enabling programmatic interaction with automation services, process management, and deployment tools."
readme = { file = "README.md", content-type = "text/markdown" }
requires-python = ">=3.11"
Expand Down
139 changes: 139 additions & 0 deletions packages/uipath/samples/classifier_demo/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
# Classifier evaluator end-to-end demo

A minimal intent-classification agent that exercises the new
`ClassifierEvaluator` end-to-end. Use this as the test fixture for both
SDK-only validation (Path A below) and Studio Web full-stack validation
(Path B).

## What's here

```
classifier_demo/
├── main.py # 3-class keyword classifier
├── uipath.json
├── pyproject.toml
├── bindings.json
└── evaluations/
├── eval-sets/
│ └── main.json # 9 datapoints, 3 per class, some intentionally wrong
└── evaluators/
├── intent_match.json # per-datapoint ExactMatch on agent_output.intent
└── intent_classifier.json # the new uipath-classifier (pure metadata)
```

The eval set is wired so that for every datapoint both evaluators run:
- `intent_match` produces a 1.0/0.0 score with `{"expected": "...", "actual": "..."}` justification.
- `intent_classifier` produces a sentinel 0.0 score with `{"classes": [...], "source_evaluator": "intent_match"}` justification.

Downstream (the C# layer in Studio Web) reads both to compute precision /
recall / F-score across the dataset.

> Heads-up — every datapoint must have an entry for the classifier in
> `evaluationCriterias` (even an empty `{}`). The runtime currently skips
> evaluators that aren't keyed in `evaluationCriterias` for a datapoint, so
> omitting them silently drops the classifier results.

## Path A — SDK only (real run, ~30 seconds)

```bash
cd packages/uipath
uv sync --all-extras

cd samples/classifier_demo
uv run --project ../.. uipath eval main main.json --no-report --output-file /tmp/out.json
```

Expected: a results table with two columns (`intent_classifier`, `intent_match`).
`intent_match` averages to 0.7 (6/9 correct). `intent_classifier` shows 0.0 per
row by design — its real work is to ship the classes list to the backend.

To see the metadata payload that lands in the backend's
`CodedEvaluatorScore.Justification`:

```bash
python3 -c "
import json
with open('/tmp/out.json') as f: d = json.load(f)
for r in d['evaluationSetResults'][0]['evaluationRunResults']:
print(r['evaluatorName'], r['result'].get('details'))
"
```

You should see something like:

```
intent_classifier {'expected': '', 'actual': '', 'classes': ['book', 'cancel', 'reschedule'], 'source_evaluator': 'intent_match'}
intent_match {'expected': 'book', 'actual': 'book'}
```

## Path B — Full Studio Web stack (real UI, click Run, see panel)

Currently blocked on environment that I (the assistant who built this) didn't
have available locally. The pieces:

### Prereqs (per `Agents/LOCAL_DEVELOPMENT.md`)
- Docker installed and running
- `make` available
- Azure CLI authenticated session (`az login`)
- Azure DevOps PAT exported as `AZURE_DEVOPS_PAT`
- GitHub NPM registry token exported as `GH_NPM_REGISTRY_TOKEN`
- Azure access token exported as `AZURE_ACCESS_TOKEN` (for the python worker build)
- `cloud-provider-kind` binary (used for the local KinD cluster)

### Steps

1. **Point python-eval-worker at the local SDK branch.** The published
`uipath` package on PyPI doesn't yet have `ClassifierEvaluator`. Edit
`Agents/python-eval-worker/pyproject.toml`:

```toml
[tool.uv.sources]
uipath = { path = "../../uipath-python/packages/uipath", editable = true }
```

Then `cd python-eval-worker && uv lock && uv sync`.

2. **Bring up the local KinD cluster** (from `Agents/`):
```bash
make create-kind-cluster
kubectl get nodes
sudo ./bin/cloud-provider-kind & # in a separate shell or background
make up
make deploy
```

3. **Build the backend with the classifier changes:**
```bash
git checkout feat/eval-classifier-backend # in Agents repo
# Re-trigger the helm/skaffold deploy for the backend
make deploy
```

4. **Build the frontend with the UI changes:**
```bash
git checkout feat/eval-dataset-evaluators-ui # in Agents repo
# Same deploy command rebuilds frontend image
```

5. **Open Studio Web** (URL surfaced by the deploy output), create an agent
project, upload the eval-set + evaluator JSONs from this directory (or
author them in the UI — the picker now shows a "Classifier" entry under
the AGGREGATION section), and click Run.

6. **Verify** the Aggregations panel renders between the run header and the
datapoint table, with the confusion matrix matching what Path A's Python
shim computes (macro F1 ≈ 0.667 on this fixture).

### Open questions for the team owning local dev

- Does the existing PAT / token set get refreshed automatically by the dev tooling, or do contributors need to rotate them periodically?
- Is there a simpler "local-only" path that bypasses the KinD cluster (e.g. docker-compose) for changes that don't touch K8s manifests?
- What's the standard pattern for pointing the python worker at a non-PyPI uipath build? The `[tool.uv.sources]` override above is the standard uv path — confirm there's no Helm/skaffold complication.

## Companion PRs

| Repo | Branch | PR | What |
|---|---|---|---|
| uipath-python | `feat/eval-classifier-evaluator` | [#1674](https://github.com/UiPath/uipath-python/pull/1674) | SDK `ClassifierEvaluator` |
| Agents | `feat/eval-classifier-backend` | [#5313](https://github.com/UiPath/Agents/pull/5313) | C# math + activity + envelope storage |
| Agents | `feat/eval-dataset-evaluators-ui` | [#5306](https://github.com/UiPath/Agents/pull/5306) | Frontend picker + Aggregations panel |
4 changes: 4 additions & 0 deletions packages/uipath/samples/classifier_demo/bindings.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"version": "2.0",
"resources": []
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
{
"version": "1.0",
"id": "classifier-demo-eval-set",
"name": "Classifier demo eval set",
"evaluatorRefs": [
"intent_match",
"intent_classifier"
],
"evaluations": [
{
"id": "book-1",
"name": "book \u2014 straightforward",
"inputs": {
"utterance": "I want to book a table for two"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
},
"intent_classifier": {}
}
},
{
"id": "book-2",
"name": "book \u2014 schedule keyword",
"inputs": {
"utterance": "Please schedule an appointment"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
},
"intent_classifier": {}
}
},
{
"id": "book-3",
"name": "book \u2014 agent misclassifies (utterance triggers cancel keyword)",
"inputs": {
"utterance": "I had to cancel my last attempt but I want to reserve a slot now"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
},
"intent_classifier": {}
}
},
{
"id": "cancel-1",
"name": "cancel \u2014 straightforward",
"inputs": {
"utterance": "Please cancel my reservation"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
},
"intent_classifier": {}
}
},
{
"id": "cancel-2",
"name": "cancel \u2014 void synonym",
"inputs": {
"utterance": "I want to void the order"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
},
"intent_classifier": {}
}
},
{
"id": "cancel-3",
"name": "cancel \u2014 agent misclassifies (utterance has 'move' which triggers reschedule)",
"inputs": {
"utterance": "I need to move past this and cancel everything"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
},
"intent_classifier": {}
}
},
{
"id": "reschedule-1",
"name": "reschedule \u2014 straightforward",
"inputs": {
"utterance": "I want to reschedule the meeting"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
},
"intent_classifier": {}
}
},
{
"id": "reschedule-2",
"name": "reschedule \u2014 move synonym",
"inputs": {
"utterance": "Can we move the slot to tomorrow"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
},
"intent_classifier": {}
}
},
{
"id": "reschedule-3",
"name": "reschedule \u2014 agent misclassifies (falls through to default 'book')",
"inputs": {
"utterance": "Different timing please"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
},
"intent_classifier": {}
}
}
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{
"version": "1.0",
"id": "intent_classifier",
"description": "Classification aggregator. Pure metadata — carries the classes list + source evaluator name to downstream consumers (the C# backend computes precision/recall/F-score over the dataset). Per-datapoint result is a no-op carrying the metadata.",
"evaluatorTypeId": "uipath-classifier",
"evaluatorConfig": {
"name": "intent_classifier",
"classes": ["book", "cancel", "reschedule"],
"sourceEvaluator": "intent_match"
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"version": "1.0",
"id": "intent_match",
"description": "Per-datapoint ExactMatch on the agent's `intent` output. Produces expected/actual justification that the ClassifierEvaluator pipeline reads.",
"evaluatorTypeId": "uipath-exact-match",
"evaluatorConfig": {
"name": "intent_match",
"targetOutputKey": "intent",
"caseSensitive": false,
"negated": false,
"defaultEvaluationCriteria": {
"expectedOutput": "book"
}
}
}
42 changes: 42 additions & 0 deletions packages/uipath/samples/classifier_demo/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
"""Tiny intent-classification agent for the ClassifierEvaluator demo.

Given an utterance, returns the intent label. Three intents:
- book (anything containing "book" / "reserve" / "schedule")
- cancel (anything containing "cancel" / "void")
- reschedule (anything containing "reschedule" / "move")

A few datapoints are deliberately misclassified so the run-level
classification metrics (precision/recall/F-score) come out non-trivially.
"""

from dataclasses import dataclass


@dataclass
class IntentInput:
utterance: str


@dataclass
class IntentOutput:
intent: str


BOOK_KEYWORDS = {"book", "reserve", "schedule"}
CANCEL_KEYWORDS = {"cancel", "void"}
RESCHEDULE_KEYWORDS = {"reschedule", "move"}


async def main(input: IntentInput) -> IntentOutput:
"""Classify the utterance into book / cancel / reschedule."""
text = input.utterance.lower()
tokens = set(text.split())

if tokens & RESCHEDULE_KEYWORDS:
return IntentOutput(intent="reschedule")
if tokens & CANCEL_KEYWORDS:
return IntentOutput(intent="cancel")
if tokens & BOOK_KEYWORDS:
return IntentOutput(intent="book")
# Fallback to "book" — deliberately wrong-ish so the matrix is interesting.
return IntentOutput(intent="book")
Loading
Loading