29 commits
bd228c9
update P804 to runon hitapp server
babaknaderi Oct 7, 2025
9f612e4
fixes on p.804, updated personalized p835 and ACR
babaknaderi Oct 7, 2025
2cb01ba
update for result_parser and utility script to accept/reject hits
babaknaderi Oct 23, 2025
820d332
add rdp check to p.804
babaknaderi Dec 3, 2025
86352aa
basic support and documentation. Also updates on creating local previ…
babaknaderi Mar 25, 2026
cfbf336
applied changed for prolific support into other templates
babaknaderi Mar 25, 2026
ba37929
update documentation
babaknaderi Mar 25, 2026
4785d23
added script to create gold questions, updated trapping clip script, …
babaknaderi Mar 26, 2026
965239c
utility to upload files
babaknaderi Mar 27, 2026
4ff4ef4
Update study creation instruction and add utility scripts
babaknaderi Mar 27, 2026
e639356
Add direct upload mode to copy_to_pub_storage.py
babaknaderi Mar 27, 2026
fb1c11b
Add AI agent usage section to README.md
babaknaderi Mar 27, 2026
efb2bde
Update create.instruction.md: gold/trapping source clip guidance and …
babaknaderi May 19, 2026
1acf987
Improve agent discoverability and instruction file structure
babaknaderi May 19, 2026
e1c6c1f
Add CLAUDE.md for agent discovery in Claude Code
babaknaderi May 19, 2026
ce790ab
Add cross-platform adaptation directive to create-study agent
babaknaderi May 19, 2026
13097ec
Replace hardcoded repo paths with REPO_ROOT placeholder
babaknaderi May 19, 2026
3917211
Remove redundant quick-reference and known-issues sections
babaknaderi May 19, 2026
9036122
Address agent workflow feedback: dir rename, CSV fix, instruction upd…
babaknaderi May 20, 2026
be71557
Update agent runbook and add SAS token support to download_clips.py
babaknaderi May 21, 2026
e7f2d7e
Update agent SAS token handling and P.804 template
babaknaderi May 21, 2026
e53ff6b
Add CCR trapping clips and update TTS script to use Azure AD auth
babaknaderi May 22, 2026
8a2b8bf
Add CCR trapping support and update agent workflow
babaknaderi May 22, 2026
1c40598
Clarify DCR/CCR trapping as legacy gold/control question
babaknaderi May 22, 2026
6692b1b
Add analyze-results agent for result parsing workflow
babaknaderi May 22, 2026
e1b87bc
Clarify comments on trapping vs gold questions for CCR/DCR
babaknaderi May 22, 2026
10a859e
Fix spelling, grammar, and punctuation in HTML templates
babaknaderi May 22, 2026
28ac1f7
Update requirements.txt: remove stdlib, add missing deps, unpin versions
babaknaderi May 22, 2026
0d64953
Merge origin/master into babaknaderi/prolific, resolve conflicts
Copilot May 22, 2026
191 changes: 191 additions & 0 deletions .github/agents/analyze-results.agent.md
@@ -0,0 +1,191 @@
---
name: analyze-results
description: Analyzes crowdsourced subjective test results — runs result_parser.py for data cleaning, quality checks, and per-clip/per-worker MOS aggregation.
---

# Analyze subjective test results

Use this runbook when asked to analyze, parse, or evaluate results from a completed
subjective speech quality test (ACR, DCR, CCR, P.835, P.804, echo impairment, or
personalized P.835).

**Trigger phrases**: "analyze results", "parse results", "evaluate the study",
"process the answers", "run result parser".

## Platform and shell adaptation

Code examples use **PowerShell on Windows** (`\` paths). On other operating systems
or shells, replace the PowerShell cmdlets with their equivalents, use `python3` where
`python` is unavailable, and convert `\` path separators to `/`.
Replace `REPO_ROOT` with the actual absolute path of this repository.

## Mandatory pre-check

Before running anything:

1. Read `AGENTS.md` and `.github\copilot-instructions.md`.
2. Confirm this is an analysis task, not creation. For study creation, use
the `create-study` agent instead.

## Environment prerequisites

Verify once at the start:

1. **Python deps**: `pip install -r requirements.txt --quiet` in `src\`.

## Inputs the agent must collect

Do not guess these values if they are missing:

1. **Test method**: one of `acr`, `dcr`, `ccr`, `p835`, `p804`,
`echo_impairment_test`, `pp835`.
2. **Result parser config file** (`*_result_parser.cfg`): generated by
`master_script.py` during study creation. Located in the project output
directory.
3. **Answers CSV** (`Batch_XXX.csv`): exported from the crowdsourcing platform
(AMT) or HIT App server. Contains worker responses.
4. **Prolific demographic CSV** (optional): `prolific_demographic_export_*.csv`
— only needed if the study was run on Prolific via HIT App server.

## Execution workflow

### 1. Collect input files

**[ASK]** Ask the user for:
- The path to the project directory (where the `*_result_parser.cfg` is).
- The test method used.
- Whether they used Prolific or AMT.

Then instruct: "Please download the answers file (`Batch_XXX.csv`) and, if using
Prolific, the demographic export (`prolific_demographic_export_*.csv`) and place
them in the same directory as the config file."

**[ASK]** Once the user confirms the files are ready, ask for the exact
filenames.

### 2. Validate input files

Before running the parser, verify:

1. The `*_result_parser.cfg` file exists and is readable.
2. The answers CSV (`Batch_XXX.csv`) exists, is non-empty, and is valid CSV.
3. If Prolific was used, the demographic CSV exists and is valid.

```powershell
# Check that each file exists (Test-Path returns True or False; wildcards are allowed)
Test-Path "PROJECT_DIR\*_result_parser.cfg"
Test-Path "PROJECT_DIR\Batch_XXX.csv"
# If Prolific:
Test-Path "PROJECT_DIR\prolific_demographic_export_*.csv"
```
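
`Test-Path` only confirms existence, so the non-empty and valid-CSV checks listed above still need to be done separately. The following is a minimal Python sketch of all three checks, not part of the repository: the function name `validate_inputs` and its messages are hypothetical, and "valid CSV" is assumed here to mean "parses in strict mode with a header row and at least one data row".

```python
import csv
from pathlib import Path


def validate_inputs(project_dir, answers_name, prolific_name=None):
    """Return a list of problems; an empty list means every check passed."""
    project = Path(project_dir)
    problems = []

    # The parser config is located by its file-name pattern.
    if not sorted(project.glob("*_result_parser.cfg")):
        problems.append("no *_result_parser.cfg found")

    # The Prolific export is only checked when a name is supplied.
    names = [answers_name] + ([prolific_name] if prolific_name else [])
    for name in names:
        path = project / name
        if not path.is_file():
            problems.append(f"{name}: file not found")
            continue
        if path.stat().st_size == 0:
            problems.append(f"{name}: file is empty")
            continue
        try:
            with path.open(newline="", encoding="utf-8-sig") as handle:
                reader = csv.reader(handle, strict=True)
                header = next(reader, None)
                has_data = next(reader, None) is not None
        except (csv.Error, UnicodeDecodeError) as exc:
            problems.append(f"{name}: not valid CSV ({exc})")
            continue
        if not header or not has_data:
            problems.append(f"{name}: needs a header row and at least one data row")
    return problems
```

An empty return value means the parser can be started; otherwise report the listed problems to the user before continuing.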

### 3. Run the result parser

Set the working directory to the project directory before running.

**Without Prolific:**

```powershell
Set-Location PROJECT_DIR
python REPO_ROOT\src\result_parser.py `
--cfg RESULT_PARSER_CFG `
--method METHOD `
--answers Batch_XXX.csv
```

**With Prolific demographic data:**

```powershell
Set-Location PROJECT_DIR
python REPO_ROOT\src\result_parser.py `
--cfg RESULT_PARSER_CFG `
--method METHOD `
--answers Batch_XXX.csv `
--prolific_answers prolific_demographic_export_XXX.csv
```

**Notes:**
- Use **full absolute paths** for both the script and the `--cfg` value so that
neither depends on the current working directory.
- The working directory should be the project directory so output files are
written there.
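
On a non-PowerShell platform, the same invocation can be assembled in Python. This is a sketch under stated assumptions: `build_parser_command` is a hypothetical helper, and it uses only the flags shown in the commands above (`--cfg`, `--method`, `--answers`, `--prolific_answers`) and the method names listed under "Inputs the agent must collect".

```python
import sys
from pathlib import Path

# Method names as listed in this runbook.
METHODS = ("acr", "dcr", "ccr", "p835", "p804", "echo_impairment_test", "pp835")


def build_parser_command(repo_root, cfg, method, answers, prolific_answers=None):
    """Build the result_parser.py argument list with absolute script and cfg paths."""
    if method not in METHODS:
        raise ValueError(f"unknown test method: {method!r}")
    script = Path(repo_root).resolve() / "src" / "result_parser.py"
    command = [
        sys.executable, str(script),
        "--cfg", str(Path(cfg).resolve()),
        "--method", method,
        "--answers", answers,
    ]
    # The Prolific demographic export is optional.
    if prolific_answers:
        command += ["--prolific_answers", prolific_answers]
    return command
```

The list can then be run with `subprocess.run(command, cwd=project_dir, check=True)`, where `project_dir` stands for the project directory, so that output files are written there.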

### 4. Analyze the output and summarize

After the parser completes, provide a summary covering:

#### 4a. Rejection rate

Extract from the parser output:
- `"Number of submissions: YYYY"`
- `"overall XXXX answers are rejected"`

Calculate: `rejection_percentage = XXXX / YYYY * 100`

**⚠️ If rejection rate > 35%**: flag as alarming. Ask the user to investigate
the rejection reasons in the data cleaning report.
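
The calculation above can be sketched in Python as follows. The regular expressions assume the two messages appear in the captured parser output exactly as quoted above; `rejection_summary` and the returned keys are hypothetical names.

```python
import re

REJECTION_ALERT_PERCENT = 35.0  # threshold from this runbook


def rejection_summary(parser_output):
    """Extract submission and rejection counts and compute the rejection rate."""
    total = re.search(r"Number of submissions:\s*(\d+)", parser_output)
    rejected = re.search(r"overall\s+(\d+)\s+answers are rejected", parser_output)
    if not total or not rejected:
        raise ValueError("expected messages not found in parser output")
    n_total = int(total.group(1))
    n_rejected = int(rejected.group(1))
    if n_total == 0:
        raise ValueError("parser reported zero submissions")
    # Same quantity as XXXX / YYYY * 100.
    percentage = 100 * n_rejected / n_total
    return {
        "submissions": n_total,
        "rejected": n_rejected,
        "rejection_percentage": percentage,
        "alarming": percentage > REJECTION_ALERT_PERCENT,
    }
```

For example, 80 rejected answers out of 200 submissions give a rejection rate of 40%, which is above the 35% threshold and should be flagged.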

#### 4b. Gold question performance

Read `detailed_gold_question_performance.csv` from the working directory.

- Look for columns matching `wrong*` — these indicate how many times each gold
clip received a wrong answer.
- Look for columns matching `url*` — these identify the gold clip URLs.
- **Any row where the sum of `wrong*` columns > 0** means that gold clip received
at least one wrong answer.
- Calculate the rejection rate per gold clip:
`gold_rejection_rate = wrong_count / total_times_shown * 100`

**⚠️ If any gold clip is rejected > 20% of the time**: flag as alarming. Ask the
user to check that clip and verify the expected answer is correct. It may
indicate a bad gold clip rather than bad workers.
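
A minimal sketch of this check, assuming each row of `detailed_gold_question_performance.csv` describes one gold clip. The runbook does not name the column that holds `total_times_shown`, so the caller must supply it through the hypothetical `shown_column` parameter; `gold_clip_report` is likewise a hypothetical name.

```python
import csv
import io

GOLD_ALERT_PERCENT = 20.0  # threshold from this runbook


def gold_clip_report(csv_text, shown_column):
    """List gold clips with at least one wrong answer and their rejection rate."""
    report = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sum every column whose name starts with "wrong".
        wrong = sum(
            int(float(value or 0))
            for key, value in row.items()
            if key and key.startswith("wrong")
        )
        if wrong == 0:
            continue
        # Collect every non-empty column whose name starts with "url".
        urls = [
            value for key, value in row.items()
            if key and key.startswith("url") and value
        ]
        shown = int(float(row[shown_column]))
        rate = 100 * wrong / shown if shown else 0.0
        report.append({
            "urls": urls,
            "wrong": wrong,
            "gold_rejection_rate": rate,
            "alarming": rate > GOLD_ALERT_PERCENT,
        })
    return report
```

Pass the file contents as text, for example `Path("detailed_gold_question_performance.csv").read_text()`, together with the name of the column that counts how often each clip was shown.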

#### 4c. Summary to present

Provide the user with a structured summary:

```
📊 Result Parser Summary
─────────────────────────
Method: [method]
Total submissions: [N]
Rejected: [X] ([%]%)
Accepted & used: [Y] ([%]%)
─────────────────────────
⚠️ Alerts: [any alarming findings]
```
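
The template above can be filled in programmatically. This is an illustrative sketch only: `format_summary` is a hypothetical helper, and the "accepted & used" count is assumed to be supplied by the caller (for example, counted from the `accept_and_use` column of the data cleaning report).

```python
def format_summary(method, total, rejected, used, alerts=()):
    """Render the summary block from already-computed counts."""

    def share(count):
        # Percentage of all submissions, or "n/a" when there are none.
        return f"{100 * count / total:.1f}%" if total else "n/a"

    rule = "─" * 25
    lines = [
        "📊 Result Parser Summary",
        rule,
        f"Method: {method}",
        f"Total submissions: {total}",
        f"Rejected: {rejected} ({share(rejected)})",
        f"Accepted & used: {used} ({share(used)})",
        rule,
        "⚠️ Alerts: " + ("; ".join(alerts) if alerts else "none"),
    ]
    return "\n".join(lines)
```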

### 5. Point user to output files

After analysis, direct the user to the key output files:

| File pattern | Purpose |
|-------------|---------|
| `Batch_XXX_votes_per_clip_[SCALE].csv` | Per-clip MOS ratings for each scale |
| `Batch_XXX_votes_per_clip_all-scales.csv` | Aggregated per-clip ratings across all scales (multi-scale methods like P.804, P.835) |
| `Batch_XXX_votes_per_worker_[SCALE].csv` | Per-worker rating statistics |
| `Batch_XXX_all_votes_per_clip.csv` | All individual votes per clip (identified by `all_votes` in the file name) |
| `Batch_XXX_data_cleaning_report.csv` | Detailed per-submission data cleaning report |
| `detailed_gold_question_performance.csv` | Per-gold-clip acceptance/rejection statistics |
| `Batch_XXX_quantity_bonus_report.csv` | Quantity bonus calculations |

**Scale suffixes by method:**

| Method | Scales |
|--------|--------|
| `acr` | `_mos` |
| `dcr` | `_dmos` |
| `ccr` | `_cmos` |
| `p835` | `_sig`, `_bak`, `_ovrl` + `all-scales` |
| `p804` | `_noi`, `_col`, `_dis`, `_loud`, `_reverb`, `_sig`, `_ovrl` + `all-scales` |
| `echo_impairment_test` | `_echo` |
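
The two tables can be combined to list the per-clip files to expect for a given method. The sketch below assumes the scale replaces `[SCALE]` in the file pattern without doubling the underscore (for example `Batch_XXX_votes_per_clip_mos.csv`); verify this against the actual parser output. `expected_per_clip_files` is a hypothetical helper, and `pp835` is omitted because the table above does not list its scales.

```python
# Scales per method, copied from the table above (leading underscore dropped).
SCALES_BY_METHOD = {
    "acr": ["mos"],
    "dcr": ["dmos"],
    "ccr": ["cmos"],
    "p835": ["sig", "bak", "ovrl", "all-scales"],
    "p804": ["noi", "col", "dis", "loud", "reverb", "sig", "ovrl", "all-scales"],
    "echo_impairment_test": ["echo"],
}


def expected_per_clip_files(batch_stem, method):
    """Return the per-clip output file names expected for one method."""
    if method not in SCALES_BY_METHOD:
        raise KeyError(f"no scale list known for method {method!r}")
    return [
        f"{batch_stem}_votes_per_clip_{scale}.csv"
        for scale in SCALES_BY_METHOD[method]
    ]
```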

### 6. Handle follow-up questions

If the user asks why specific submissions were rejected or not used:
- Direct them to `Batch_XXX_data_cleaning_report.csv`.
- Key columns: `accept` (1 = accepted), `accept_and_use` (1 = used for
aggregation), `failures` (reasons for rejection/exclusion).
- Common rejection reasons: `gold` (failed gold question), `variance` (low
rating variance), `comparisons` (failed pair comparisons), `performance`
(overall rater performance below threshold).
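
To answer such questions quickly, the report can be tallied by rejection reason. This sketch relies on the three column names given above; how the `failures` column encodes multiple reasons is not specified here, so it simply searches the text for the four reason keywords listed above. `summarize_cleaning_report` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

KNOWN_REASONS = ("gold", "variance", "comparisons", "performance")


def _is_set(value):
    """Treat "1" or "1.0" as set; anything else (including blanks) as unset."""
    try:
        return float(value) == 1
    except (TypeError, ValueError):
        return False


def summarize_cleaning_report(csv_text):
    """Count rejected and unused submissions and tally the known failure reasons."""
    reasons = Counter()
    rejected = accepted_not_used = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        accepted = _is_set(row.get("accept"))
        used = _is_set(row.get("accept_and_use"))
        if accepted and used:
            continue
        if accepted:
            accepted_not_used += 1
        else:
            rejected += 1
        failures = (row.get("failures") or "").lower()
        for reason in KNOWN_REASONS:
            if reason in failures:
                reasons[reason] += 1
    return {
        "rejected": rejected,
        "accepted_not_used": accepted_not_used,
        "reasons": dict(reasons),
    }
```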