2 changes: 2 additions & 0 deletions examples/multi-app-terminal-gedit-app/run.sh
@@ -0,0 +1,2 @@
#!/bin/bash
exec xfce4-terminal
54 changes: 54 additions & 0 deletions examples/multi-app-terminal-gedit.json
@@ -0,0 +1,54 @@
{
  "schema_version": "1.0",
  "id": "multi-app-terminal-gedit",
  "instruction": "Open a terminal (xfce4-terminal). In the terminal, run: curl http://localhost:8000/data.csv -o /home/tester/data.csv. Then open the file /home/tester/data.csv in gedit. Use Find and Replace (Ctrl+H) to replace all occurrences of 'ERROR' with 'FIXED'. Save the file. Then switch back to the terminal and run: grep -c FIXED /home/tester/data.csv",
  "completion_condition": "The file data.csv has been downloaded, all ERROR entries replaced with FIXED, saved, and the grep count command has been executed in the terminal.",
  "app": {
    "type": "folder",
    "dir": "./examples/multi-app-terminal-gedit-app",
    "entrypoint": "run.sh"
  },
  "config": [
    {
      "type": "execute",
      "command": "mkdir -p /home/tester/server && cat > /home/tester/server/data.csv << 'CSVEOF'\nid,timestamp,status,message\n1,2024-01-15 08:30:00,OK,System started\n2,2024-01-15 08:31:12,ERROR,Connection timeout on port 5432\n3,2024-01-15 08:32:45,OK,Heartbeat received\n4,2024-01-15 08:33:01,ERROR,Disk usage above threshold\n5,2024-01-15 08:34:22,OK,Backup completed\n6,2024-01-15 08:35:10,ERROR,Authentication failed for user admin\n7,2024-01-15 08:36:00,OK,Cache cleared\n8,2024-01-15 08:37:15,ERROR,Memory allocation failure in worker 3\n9,2024-01-15 08:38:44,OK,Service restored\n10,2024-01-15 08:39:59,ERROR,SSL certificate expiring in 7 days\nCSVEOF"
    },
    {
      "type": "execute",
      "command": "cd /home/tester/server && nohup python3 -m http.server 8000 > /dev/null 2>&1 &"
    }
Comment on lines +17 to +19

P1 HTTP server race condition — agent may curl before server is ready

The HTTP server is started with nohup ... & (a backgrounded process), and there is no delay before the agent loop begins. If the agent issues curl http://localhost:8000/data.csv before Python's http.server finishes binding to port 8000, the curl will fail with "Connection refused" and the task cannot proceed.

The schema supports a sleep config step — adding one after the server launch ensures the server is up before the agent starts:

{
  "type": "execute",
  "command": "cd /home/tester/server && nohup python3 -m http.server 8000 > /dev/null 2>&1 &"
},
{
  "type": "sleep",
  "seconds": 1.0
}

Alternatively, a readiness-polling execute step like until curl -sf http://localhost:8000/ > /dev/null; do sleep 0.2; done would be more robust.
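
One caveat with the bare until loop is that it spins forever if the server never comes up. A minimal sketch of a bounded variant follows; the function name wait_for_http and its default of 50 tries at 0.2 s are illustrative assumptions, not part of the PR or the schema:

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): poll a URL until it answers,
# giving up after a fixed number of attempts instead of looping forever.
# Usage: wait_for_http URL [MAX_TRIES] [DELAY_SECONDS]
wait_for_http() {
  local url="$1" tries="${2:-50}" delay="${3:-0.2}"
  local i
  for ((i = 0; i < tries; i++)); do
    # -s: silent, -f: treat HTTP errors (>= 400) as failures
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example call, e.g. as the body of an extra execute step:
#   wait_for_http http://localhost:8000/ || echo "server did not start" >&2
```

With the defaults this waits roughly ten seconds at most, so a broken setup fails visibly in the config phase rather than hanging the run.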

  ],
  "evaluator": {
    "mode": "hybrid",
    "metrics": [
      {
        "type": "file_exists",
        "path": "/home/tester/data.csv"
      },
      {
        "type": "command_output",
        "command": "grep -c FIXED /home/tester/data.csv",
        "expected": "5",
        "match_mode": "contains"
      },
      {
        "type": "command_output",
        "command": "grep -c ERROR /home/tester/data.csv",
Comment on lines +33 to +36

P2 Use equals match mode for exact count assertions

match_mode: "contains" is used to verify that grep -c ERROR outputs "0". Because "contains" checks for substring membership, any output that includes the digit "0" (e.g. "10", "20") would also pass. That cannot happen with this dataset, which has 10 data rows of which only 5 contain ERROR, but it sets an imprecise contract. For numeric count checks, "equals" expresses the intent clearly and avoids future false positives if the dataset ever grows:

Suggested change
},
{
"type": "command_output",
"command": "grep -c ERROR /home/tester/data.csv",
"expected": "0",
"match_mode": "equals"

The same applies to the grep -c FIXED metric above ("5" is a substring of "15", "25", etc.).
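
To make the difference concrete, here is a small sketch that mimics the two match modes in bash. The helper names matches_contains and matches_equals are hypothetical, and the sketch assumes "equals" compares the output after trailing whitespace is trimmed; the evaluator's actual behavior should be checked in its source:

```shell
#!/bin/bash
# Hypothetical stand-ins for the two match modes (names are illustrative).
matches_contains() {
  # Substring test: succeeds if $2 appears anywhere inside $1.
  [[ "$1" == *"$2"* ]]
}

matches_equals() {
  # Exact comparison; assumes the output was already trimmed.
  [[ "$1" == "$2" ]]
}

# Compare both modes against an expected value of "0".
for count in 0 5 10 20; do
  matches_contains "$count" "0" && c=pass || c=fail
  matches_equals "$count" "0" && e=pass || e=fail
  echo "count=$count contains=$c equals=$e"
done
```

Counts of 10 and 20 pass the substring check but fail the exact comparison, which is the false positive described above.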

        "expected": "0",
        "match_mode": "contains"
Comment on lines +36 to +38

🟡 Evaluator metric for grep -c ERROR uses contains match on expected "0", which would false-positive on counts like 10, 20, 30, etc.

The metric at lines 36-38 checks that grep -c ERROR output contains "0" using match_mode: "contains". In src/evaluator/command.rs:30, this translates to stdout.contains(expected), which is a substring check. Any count containing the digit '0' (e.g., "10", "20", "30", "100") would falsely pass this check. While this happens to work for this specific CSV (it has only five ERROR rows, so the possible counts 0–5 cannot cause a false positive), the evaluator metric does not correctly express its intent of verifying an exact count of zero. The correct match_mode should be "equals".

Suggested change
- "command": "grep -c ERROR /home/tester/data.csv",
- "expected": "0",
- "match_mode": "contains"
+ "command": "grep -c ERROR /home/tester/data.csv",
+ "expected": "0",
+ "match_mode": "equals"

      },
      {
        "type": "command_output",
        "command": "head -1 /home/tester/data.csv",
        "expected": "id,timestamp,status,message",
        "match_mode": "contains"
      }
    ]
  },
  "timeout": 240,
  "max_steps": 25,
  "metadata": {
    "description": "Multi-app workflow: open terminal, curl a CSV from a local server, open in gedit, find-and-replace ERROR->FIXED, save, then verify in terminal. Tests app switching, terminal interaction, and dialog navigation.",
    "tags": ["multi-app", "terminal", "gedit", "find-replace", "curl", "hard"]
  }
}

P2 Missing trailing newline at end of file

The JSON file is missing a trailing newline. Most POSIX tools and style guides expect text files to end with a newline character. This is also the only example file in the repository without one. Adding a newline after the closing } brings it in line with the existing examples (gedit-save.json, libreoffice-calc.json, etc.).
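
A quick way to check for this locally is sketched below. The helper name ends_with_newline is hypothetical; it relies only on tail -c and wc -l, which count the final byte as a line only when that byte is a newline:

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): succeeds only if the
# file is non-empty and its last byte is a newline character.
ends_with_newline() {
  [ -s "$1" ] && [ "$(tail -c 1 "$1" | wc -l)" -eq 1 ]
}

# Example: append the missing newline only when it is absent.
#   f=examples/multi-app-terminal-gedit.json
#   ends_with_newline "$f" || echo >> "$f"
```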