feat: add harder multi-app example (terminal + gedit) #80
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| #!/bin/bash | ||
| exec xfce4-terminal |
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,54 @@ | ||||||||||||||
| { | ||||||||||||||
| "schema_version": "1.0", | ||||||||||||||
| "id": "multi-app-terminal-gedit", | ||||||||||||||
| "instruction": "Open a terminal (xfce4-terminal). In the terminal, run: curl http://localhost:8000/data.csv -o /home/tester/data.csv. Then open the file /home/tester/data.csv in gedit. Use Find and Replace (Ctrl+H) to replace all occurrences of 'ERROR' with 'FIXED'. Save the file. Then switch back to the terminal and run: grep -c FIXED /home/tester/data.csv", | ||||||||||||||
| "completion_condition": "The file data.csv has been downloaded, all ERROR entries replaced with FIXED, saved, and the grep count command has been executed in the terminal.", | ||||||||||||||
| "app": { | ||||||||||||||
| "type": "folder", | ||||||||||||||
| "dir": "./examples/multi-app-terminal-gedit-app", | ||||||||||||||
| "entrypoint": "run.sh" | ||||||||||||||
| }, | ||||||||||||||
| "config": [ | ||||||||||||||
| { | ||||||||||||||
| "type": "execute", | ||||||||||||||
| "command": "mkdir -p /home/tester/server && cat > /home/tester/server/data.csv << 'CSVEOF'\nid,timestamp,status,message\n1,2024-01-15 08:30:00,OK,System started\n2,2024-01-15 08:31:12,ERROR,Connection timeout on port 5432\n3,2024-01-15 08:32:45,OK,Heartbeat received\n4,2024-01-15 08:33:01,ERROR,Disk usage above threshold\n5,2024-01-15 08:34:22,OK,Backup completed\n6,2024-01-15 08:35:10,ERROR,Authentication failed for user admin\n7,2024-01-15 08:36:00,OK,Cache cleared\n8,2024-01-15 08:37:15,ERROR,Memory allocation failure in worker 3\n9,2024-01-15 08:38:44,OK,Service restored\n10,2024-01-15 08:39:59,ERROR,SSL certificate expiring in 7 days\nCSVEOF" | ||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| "type": "execute", | ||||||||||||||
| "command": "cd /home/tester/server && nohup python3 -m http.server 8000 > /dev/null 2>&1 &" | ||||||||||||||
| } | ||||||||||||||
| ], | ||||||||||||||
| "evaluator": { | ||||||||||||||
| "mode": "hybrid", | ||||||||||||||
| "metrics": [ | ||||||||||||||
| { | ||||||||||||||
| "type": "file_exists", | ||||||||||||||
| "path": "/home/tester/data.csv" | ||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| "type": "command_output", | ||||||||||||||
| "command": "grep -c FIXED /home/tester/data.csv", | ||||||||||||||
| "expected": "5", | ||||||||||||||
| "match_mode": "contains" | ||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| "type": "command_output", | ||||||||||||||
| "command": "grep -c ERROR /home/tester/data.csv", | ||||||||||||||
|
Comment on lines
+33
to
+36
Suggested change
The same applies to the |
||||||||||||||
| "expected": "0", | ||||||||||||||
| "match_mode": "contains" | ||||||||||||||
|
Comment on lines
+36
to
+38
🟡 Evaluator metric for The metric at line 36-38 checks that
Suggested change
|
||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| "type": "command_output", | ||||||||||||||
| "command": "head -1 /home/tester/data.csv", | ||||||||||||||
| "expected": "id,timestamp,status,message", | ||||||||||||||
| "match_mode": "contains" | ||||||||||||||
| } | ||||||||||||||
| ] | ||||||||||||||
| }, | ||||||||||||||
| "timeout": 240, | ||||||||||||||
| "max_steps": 25, | ||||||||||||||
| "metadata": { | ||||||||||||||
| "description": "Multi-app workflow: open terminal, curl a CSV from a local server, open in gedit, find-and-replace ERROR->FIXED, save, then verify in terminal. Tests app switching, terminal interaction, and dialog navigation.", | ||||||||||||||
| "tags": ["multi-app", "terminal", "gedit", "find-replace", "curl", "hard"] | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
The JSON file is missing a trailing newline. Most POSIX tools and style guides expect text files to end with a newline character. This is also the only example file in the repository without one. Adding a newline after the closing |
||||||||||||||
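As a sanity check on the `command_output` metrics in the JSON above, the sketch below replays the find-and-replace on the CSV written by the first `execute` step and counts matching lines the way `grep -c` does for a plain string. The helper names `grep_count` and `contains_match` are hypothetical, and reading `"match_mode": "contains"` as a plain substring test is an assumption about the evaluator, not something the diff confirms.

```python
# CSV content copied from the first "execute" config step in the diff.
CSV_TEXT = """id,timestamp,status,message
1,2024-01-15 08:30:00,OK,System started
2,2024-01-15 08:31:12,ERROR,Connection timeout on port 5432
3,2024-01-15 08:32:45,OK,Heartbeat received
4,2024-01-15 08:33:01,ERROR,Disk usage above threshold
5,2024-01-15 08:34:22,OK,Backup completed
6,2024-01-15 08:35:10,ERROR,Authentication failed for user admin
7,2024-01-15 08:36:00,OK,Cache cleared
8,2024-01-15 08:37:15,ERROR,Memory allocation failure in worker 3
9,2024-01-15 08:38:44,OK,Service restored
10,2024-01-15 08:39:59,ERROR,SSL certificate expiring in 7 days
"""


def grep_count(pattern: str, text: str) -> int:
    """Count lines containing `pattern`, like `grep -c` with a literal string."""
    return sum(1 for line in text.splitlines() if pattern in line)


def contains_match(expected: str, output: str) -> bool:
    """ASSUMPTION: the evaluator's "contains" mode is a substring test."""
    return expected in output


# Simulate the gedit step: replace every ERROR with FIXED.
edited = CSV_TEXT.replace("ERROR", "FIXED")

print(grep_count("FIXED", edited))   # 5, matching "expected": "5"
print(grep_count("ERROR", edited))   # 0, matching "expected": "0"
print(edited.splitlines()[0])        # header row is unchanged

# Under the substring assumption, "0" would also match an output of "10".
print(contains_match("0", "10"))     # True
```

If the evaluator does treat `contains` as a substring test, an exact comparison on the trimmed output would be stricter for count-style metrics. Separately, `grep -c` exits with status 1 when the count is zero, which matters only if the evaluator inspects exit codes.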
The HTTP server is started with `nohup ... &` (a backgrounded process), and there is no delay before the agent loop begins. If the agent issues `curl http://localhost:8000/data.csv` before Python's `http.server` finishes binding to port 8000, the curl will fail with "Connection refused" and the task cannot proceed.

The schema supports a `sleep` config step; adding one after the server launch ensures the server is up before the agent starts:

`{ "type": "execute", "command": "cd /home/tester/server && nohup python3 -m http.server 8000 > /dev/null 2>&1 &" }, { "type": "sleep", "seconds": 1.0 }`

Alternatively, a readiness-polling execute step like `until curl -sf http://localhost:8000/ > /dev/null; do sleep 0.2; done` would be more robust.
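The readiness-polling idea can also be sketched in Python, for a harness that prefers to wait in code rather than in a shell loop. This is a minimal sketch, not part of the PR: the function name `wait_for_port` and its `timeout` and `interval` parameters are hypothetical, and it only checks that the TCP port accepts connections, not that `data.csv` is actually being served.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0,
                  interval: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if
    `timeout` seconds pass without a successful connect.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server has bound and is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A harness could call `wait_for_port("localhost", 8000)` after launching `python3 -m http.server 8000` and start the agent loop only when it returns `True`. Unlike a fixed one-second sleep, this returns as soon as the port is open and reports failure if the server never comes up.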