diff --git a/.github/agents/analyze-results.agent.md b/.github/agents/analyze-results.agent.md
new file mode 100644
index 0000000..c993fa0
--- /dev/null
+++ b/.github/agents/analyze-results.agent.md
@@ -0,0 +1,191 @@
+---
+name: analyze-results
+description: Analyzes crowdsourced subjective test results — runs result_parser.py for data cleaning, quality checks, and per-clip/per-worker MOS aggregation.
+---
+
+# Analyze subjective test results
+
+Use this runbook when asked to analyze, parse, or evaluate results from a completed
+subjective speech quality test (ACR, DCR, CCR, P.835, P.804, echo impairment, or
+personalized P.835).
+
+**Trigger phrases**: "analyze results", "parse results", "evaluate the study",
+"process the answers", "run result parser".
+
+## Platform and shell adaptation
+
+Code examples use **PowerShell on Windows** (`\` paths). Adapt for other OS/shells:
+replace PowerShell cmdlets with equivalents, use `python3` if needed, convert paths.
+Replace `REPO_ROOT` with the actual absolute path of this repository.
+
+## Mandatory pre-check
+
+Before running anything:
+
+1. Read `AGENTS.md` and `.github\copilot-instructions.md`.
+2. Confirm this is an analysis task, not creation. For study creation, use
+   the `create-study` agent instead.
+
+## Environment prerequisites
+
+Verify once at the start:
+
+1. **Python deps**: `pip install -r requirements.txt --quiet` in `src\`.
+
+## Inputs the agent must collect
+
+Do not guess these values if they are missing:
+
+1. **Test method**: one of `acr`, `dcr`, `ccr`, `p835`, `p804`,
+   `echo_impairment_test`, `pp835`.
+2. **Result parser config file** (`*_result_parser.cfg`): generated by
+   `master_script.py` during study creation. Located in the project output
+   directory.
+3. **Answers CSV** (`Batch_XXX.csv`): exported from the crowdsourcing platform
+   (AMT) or HIT App server. Contains worker responses.
+4. **Prolific demographic CSV** (optional): `prolific_demographic_export_*.csv`
+   — only needed if the study was run on Prolific via HIT App server.
+
+## Execution workflow
+
+### 1. Collect input files
+
+**[ASK]** Ask the user for:
+- The path to the project directory (where the `*_result_parser.cfg` is).
+- The test method used.
+- Whether they used Prolific or AMT.
+
+Then instruct: "Please download the answers file (`Batch_XXX.csv`) and, if using
+Prolific, the demographic export (`prolific_demographic_export_*.csv`) and place
+them in the same directory as the config file."
+
+**[ASK]** Once the user confirms the files are ready, ask for the exact
+filenames.
+
+### 2. Validate input files
+
+Before running the parser, verify:
+
+1. The `*_result_parser.cfg` file exists and is readable.
+2. The answers CSV (`Batch_XXX.csv`) exists, is non-empty, and is valid CSV.
+3. If Prolific was used, the demographic CSV exists and is valid.
+
+```powershell
+# Validate files exist
+Test-Path "PROJECT_DIR\*_result_parser.cfg"
+Test-Path "PROJECT_DIR\Batch_XXX.csv"
+# If Prolific:
+Test-Path "PROJECT_DIR\prolific_demographic_export_*.csv"
+```
+
+### 3. Run the result parser
+
+Set the working directory to the project directory before running.
+
+**Without Prolific:**
+
+```powershell
+Set-Location PROJECT_DIR
+python REPO_ROOT\src\result_parser.py `
+  --cfg RESULT_PARSER_CFG `
+  --method METHOD `
+  --answers Batch_XXX.csv
+```
+
+**With Prolific demographic data:**
+
+```powershell
+Set-Location PROJECT_DIR
+python REPO_ROOT\src\result_parser.py `
+  --cfg RESULT_PARSER_CFG `
+  --method METHOD `
+  --answers Batch_XXX.csv `
+  --prolific_answers prolific_demographic_export_XXX.csv
+```
+
+**Notes:**
+- Use **full absolute paths** for `--cfg` and script path to avoid resolution
+  issues.
+- The working directory should be the project directory so output files are
+  written there.
+
+### 4. Analyze the output and summarize
+
+After the parser completes, provide a summary covering:
+
+#### 4a. Rejection rate
+
+Extract from the parser output:
+- `"Number of submissions: YYYY"`
+- `"overall XXXX answers are rejected"`
+
+Calculate: `rejection_percentage = XXXX / YYYY * 100`
+
+**⚠️ If rejection rate > 35%**: flag as alarming. Ask the user to investigate
+the rejection reasons in the data cleaning report.
+
+#### 4b. Gold question performance
+
+Read `detailed_gold_question_performance.csv` from the working directory.
+
+- Look for columns matching `wrong*` — these indicate how many times each gold
+  clip received a wrong answer.
+- Look for columns matching `url*` — these identify the gold clip URLs.
+- **Any row where the sum of `wrong*` columns > 0** means that gold clip received
+  at least one wrong answer.
+- Calculate the rejection rate per gold clip:
+  `gold_rejection_rate = wrong_count / total_times_shown * 100`
+
+**⚠️ If any gold clip is rejected > 20% of the time**: flag as alarming. Ask the
+user to check that clip and verify the expected answer is correct. It may
+indicate a bad gold clip rather than bad workers.
+
+#### 4c. Summary to present
+
+Provide the user with a structured summary:
+
+```
+📊 Result Parser Summary
+─────────────────────────
+Method: [method]
+Total submissions: [N]
+Rejected: [X] ([%]%)
+Accepted & used: [Y] ([%]%)
+─────────────────────────
+⚠️ Alerts: [any alarming findings]
+```
+
+### 5. Point user to output files
+
+After analysis, direct the user to the key output files:
+
+| File pattern | Purpose |
+|-------------|---------|
+| `Batch_XXX_votes_per_clip_[SCALE].csv` | Per-clip MOS ratings for each scale |
+| `Batch_XXX_votes_per_clip_all-scales.csv` | Aggregated per-clip ratings across all scales (multi-scale methods like P.804, P.835) |
+| `Batch_XXX_votes_per_worker_[SCALE].csv` | Per-worker rating statistics |
+| `Batch_XXX_all_votes_per_clip.csv` | All individual votes per clip (key: `all_votes` in name) |
+| `Batch_XXX_data_cleaning_report.csv` | Detailed per-submission data cleaning report |
+| `detailed_gold_question_performance.csv` | Per-gold-clip acceptance/rejection statistics |
+| `Batch_XXX_quantity_bonus_report.csv` | Quantity bonus calculations |
+
+**Scale suffixes by method:**
+
+| Method | Scales |
+|--------|--------|
+| `acr` | `_mos` |
+| `dcr` | `_dmos` |
+| `ccr` | `_cmos` |
+| `p835` | `_sig`, `_bak`, `_ovrl` + `all-scales` |
+| `p804` | `_noi`, `_col`, `_dis`, `_loud`, `_reverb`, `_sig`, `_ovrl` + `all-scales` |
+| `echo_impairment_test` | `_echo` |
+
+### 6. Handle follow-up questions
+
+If the user asks why specific submissions were rejected or not used:
+- Direct them to `Batch_XXX_data_cleaning_report.csv`.
+- Key columns: `accept` (1 = accepted), `accept_and_use` (1 = used for
+  aggregation), `failures` (reasons for rejection/exclusion).
+- Common rejection reasons: `gold` (failed gold question), `variance` (low
+  rating variance), `comparisons` (failed pair comparisons), `performance`
+  (overall rater performance below threshold).
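The two alert checks in section 4 of the analyze-results runbook (overall rejection rate above 35%, any gold clip answered wrongly more than 20% of the time) can be sketched in Python as below. This is only an illustration under stated assumptions: the runbook names the `wrong*` and `url*` column patterns but not the column that holds how often a clip was shown, so `shown_col` is a hypothetical parameter the caller must supply, and the function names are made up for this example.

```python
# Thresholds taken from the runbook text (sections 4a and 4b).
REJECTION_ALERT_PCT = 35.0
GOLD_ALERT_PCT = 20.0


def rejection_percentage(rejected, submissions):
    """rejection_percentage = XXXX / YYYY * 100, guarding against zero submissions."""
    return rejected / submissions * 100 if submissions else 0.0


def gold_rejection_rates(rows, shown_col):
    """Map each gold clip URL to its wrong-answer rate in percent.

    `rows` are dicts such as those produced by csv.DictReader.
    `shown_col` is a HYPOTHETICAL column name holding how many times the
    clip was shown; check the real CSV header before relying on it.
    """
    rates = {}
    for row in rows:
        # First column whose name starts with "url" identifies the clip.
        url = next(v for k, v in row.items() if k.startswith("url"))
        # Sum every "wrong*" column; empty cells count as zero.
        wrong = sum(int(float(v or 0)) for k, v in row.items() if k.startswith("wrong"))
        shown = int(float(row[shown_col] or 0))
        if shown > 0:
            rates[url] = wrong / shown * 100
    return rates


def alerts(rejected, submissions, gold_rates):
    """Collect human-readable alert strings for the summary in section 4c."""
    found = []
    pct = rejection_percentage(rejected, submissions)
    if pct > REJECTION_ALERT_PCT:
        found.append(f"Rejection rate {pct:.1f}% exceeds {REJECTION_ALERT_PCT:.0f}%")
    for url, rate in gold_rates.items():
        if rate > GOLD_ALERT_PCT:
            found.append(f"Gold clip {url} wrong {rate:.1f}% of the time")
    return found
```

The rows would typically come from `csv.DictReader` over `detailed_gold_question_performance.csv`; the submission and rejection counts still have to be read from the parser's console output.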
diff --git a/.github/agents/create-study.agent.md b/.github/agents/create-study.agent.md
new file mode 100644
index 0000000..e1810af
--- /dev/null
+++ b/.github/agents/create-study.agent.md
@@ -0,0 +1,722 @@
+---
+name: create-study
+description: Creates subjective speech quality tests using the P.808 toolkit — handles study setup, gold/trapping clip generation, storage upload, and project building for crowdsourcing platforms.
+---
+
+# Create subjective test instructions
+
+Use this runbook when asked to create a new subjective speech quality test with the P.808 toolkit.
+
+**Trigger phrases**: "create a study", "run a [method] test", "set up a [method] study", "prepare a
+[method] test for these files".
+
+## Platform and shell adaptation
+
+Code examples use **PowerShell on Windows** (`\` paths). Adapt for other OS/shells:
+replace PowerShell cmdlets with equivalents, use `python3` if needed, convert paths.
+Replace `REPO_ROOT` with the actual absolute path of this repository.
+
+## Best-practice variables
+
+These are best-practice defaults. Confirm or override them with the requester before first use.
+After confirmation, save a `.cfg` file next to the input files so future runs can reuse it.
+When asked to re-run a test or "go yolo", look for an existing config file first.
+
+```text
+BEST_PRACTICE_PLATFORM = Prolific
+BEST_PRACTICE_VALID_VOTE_BUFFER = 20%
+BEST_PRACTICE_CLIPS_PER_SESSION = 10
+BEST_PRACTICE_GOLD_PER_SESSION = 1 (use 2 for P.804 — see method-specific notes)
+BEST_PRACTICE_TRAPPING_PER_SESSION = 1
+BEST_PRACTICE_TRAINING_CLIPS = 5
+BEST_PRACTICE_GOLD_SOURCE_COUNT = max(3, ceil(0.05 * number_of_rating_clips))
+BEST_PRACTICE_TRAPPING_SOURCE_COUNT = max(3, ceil(0.05 * number_of_rating_clips))
+BEST_PRACTICE_MAX_GOLD_SOURCE_CLIPS = 15
+BEST_PRACTICE_MAX_TRAPPING_SOURCE_CLIPS = 15
+BEST_PRACTICE_ALLOWED_MAX_HITS = min(int(number_of_rating_clips / 10), 50)
+BEST_PRACTICE_BASE_PAYMENT = 0.50
+BEST_PRACTICE_QUANTITY_BONUS = 0.10
+BEST_PRACTICE_QUALITY_BONUS = 0.15
+BEST_PRACTICE_BW_MIN = FB
+```
+
+## Scope
+
+This instruction covers preparing inputs, generating gold/trapping clips, uploading to
+storage, running `master_script.py`, and handing off for publishing. Setting up the HIT
+in a HITAPP server and publishing is done by the requester.
+
+## Mandatory pre-check
+
+Before editing or running anything in this repository:
+
+1. Read `AGENTS.md` and `.github\copilot-instructions.md`.
+2. Confirm this is a creation task, not analysis. For analysis, use
+   `.github\evaluate.instruction.md` instead.
+
+## Environment prerequisites
+
+Verify once at the start:
+
+1. **`az` CLI** logged in: `az account show`. If expired, prompt `az login`.
+2. **Python deps**: `pip install -r requirements.txt --quiet` in `src\`.
+3. **PowerShell**: use `Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass`.
+   Prefer running Python scripts directly over `.ps1` wrappers.
+
+Once prerequisites pass, proceed without pausing except at **[ASK]** decision points.
+
+## Reusing information from prior studies
+
+When creating a new study that reuses settings from a previous study or session
+knowledge (e.g. same storage account, contact email, platform, source clips,
+gold/trapping clips, training clips), **do not silently reuse them**. Instead:
+
+1. Collect all reused or assumed values into a single summary list.
+2. **[ASK]** Present the list and ask: "I plan to reuse the following from the
+   previous study. Are all of these correct?"
+   - If the user confirms all → proceed.
+   - If the user says they want to modify → ask about each item one by one,
+     then proceed with the updated values.
+
+## Supported test methods
+
+| Method | `--method` flag | Gold clip generation | Trapping config | Template |
+|--------|-----------------|---------------------|-----------------|----------|
+| ACR | `acr` | `--method acr` | `trapping.cfg` or `trapping_p835.cfg` | `ACR_template.html` |
+| DCR | `dcr` | N/A (manual) | N/A — see note below | `DCR_template.html` |
+| CCR | `ccr` | N/A (manual) | N/A — see note below | `CCR_template.html` |
+| P.835 | `p835` | `--method acr` (**not** `p835`) | `trapping.cfg` or `trapping_p835.cfg` | `P835_template.html` |
+| P.804 | `p804` | `--method p804` | `trapping_p804.cfg` | via `pp835_p804` path |
+| Echo impairment | `echo_impairment_test` | `--method acr` | `trapping.cfg` | `echo_impairment_test_template.html` |
+| Personalized P.835 | `pp835` | special (per-dimension) | `trapping.cfg` | `P835_personalized_template3.html` |
+
+**Critical**: For plain `p835`, use `--method acr` when generating gold clips with
+`create_gold_clips.py`. The `p835` method in the gold generator produces per-dimension columns
+(`gold_sig_ans`, `gold_bak_ans`, `gold_ovrl_ans`) but `master_script.py` expects `gold_clips_ans`
+for plain P.835.
+
+**DCR/CCR trapping note (legacy)**: DCR and CCR do **not** use traditional trapping
+questions (i.e. clips with overlaid spoken scores generated by `create_trapping_stimuli.py`).
+Instead, the `TP` field in the publish batch contains a reference clip that functions as
+a **gold/control question** — the worker compares a clip to itself, so the expected
+answer is "about the same" (0). The `trapping_clips.csv` for DCR/CCR should contain
+reference clip URLs only (no `trapping_ans` column). Do **not** run
+`create_trapping_stimuli.py` for these methods. This is a legacy naming issue in the
+code and may be updated in the future.
+
+## Inputs the agent must confirm
+
+Do not guess these values if they are missing:
+
+1. **Test method**: one of `acr`, `dcr`, `ccr`, `p835`, `p804`, `echo_impairment_test`, `pp835`.
+2. **Crowd platform**: Prolific (recommended), AMT, or another panel.
+3. **Project name**: for generated output folder and files.
+4. **Input resources**:
+   - `rating_clips.csv` — **required**.
+   - `training_clips.csv` — **required for most methods** (can be auto-generated from
+     rating clips, but manual selection is recommended — see section 4).
+     **Not needed for P.804 or pp835 when `training_gold_clips.csv` is provided or
+     generated** — the two are mutually exclusive (see section 4).
+   - `training_gold_clips.csv` — **P.804 and pp835 only**. Contains training
+     clips with per-dimension answers, variance, and feedback messages. **When available
+     (provided or auto-generated), this takes priority over `training_clips.csv`** —
+     do not ask for or generate plain training clips. If not provided the agent can
+     generate one from gold clips (see section 4b).
+   - `gold_clips.csv` — optional (can be generated from source clips).
+   - `trapping_clips.csv` — optional (can be generated from source clips).
+5. **Source clips for gold/trapping generation**:
+   - **Gold clips**: if `gold_clips.csv` is not provided, the agent **must not** blindly
+     download random rating clips. Gold clips require **high-quality, clean reference audio**.
+     Ask the user one of:
+     - Do they have a local directory of clean reference WAV files from the same dataset?
+     - Can they identify clean clips by a URL pattern (e.g. `*/clean/*`, `*/reference/*`)?
+     - Would they like the agent to download a small subset of rating clips for the user to
+       **listen to and manually remove any clips with distortion** before gold generation?
+     **Important**: never use the sample clips bundled in this repository (`src\test_inputs\`).
+     Source clips must come from the same dataset as the rating clips.
+   - **Trapping clips**: if `trapping_clips.csv` is not provided, the quality of source
+     clips does **not** matter for trapping questions. The agent can download a random sample
+     of rating clips and use them directly — no manual review needed.
+6. **Storage**: the Azure storage account name and container for uploading generated clips.
+   The container **must be publicly accessible** — crowd workers need unauthenticated access.
+   See "Storage and public accessibility" below for the full check procedure.
+7. **Contact email**: the email address to show in the HIT app for worker inquiries.
+   Do **not** use a hardcoded default — always ask.
+8. **Max assignments per worker** (for Prolific) or worker requirements and payment (for AMT).
+9. **Target valid votes per clip**: suggest publishing `target + BEST_PRACTICE_VALID_VOTE_BUFFER`.
+
+## Storage and public accessibility
+
+All clip URLs must be publicly accessible for crowd workers.
+
+**Check accessibility**: pick one URL from the clip list and test with HTTP HEAD:
+
+```powershell
+$testUrl = ""
+try {
+    $r = Invoke-WebRequest -Uri $testUrl -Method Head -UseBasicParsing -ErrorAction Stop
+    Write-Host "PUBLIC — HTTP $($r.StatusCode)"
+} catch { Write-Host "NOT PUBLIC — $($_.Exception.Message)" }
+```
+
+- **If public (HTTP 200)**: upload gold/trapping clips to the same account/container.
+- **If not public (HTTP 403/409)**: ask the user which **public** container to copy all
+  clips to. Use a **random opaque subdirectory** (e.g. `PROJECT_NAME/stim_x7k2m9`).
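For shells without `Invoke-WebRequest`, the same HTTP HEAD accessibility check can be sketched with the Python standard library. This is a minimal sketch, not part of the toolkit: `head_status` and `classify_status` are illustrative names, and the mapping of 200 to public and 403/409 to not public simply mirrors the two bullets above; any other status is reported as unknown rather than guessed.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Interpret an HTTP status the way the runbook's two bullets do."""
    if status == 200:
        return "public"
    if status in (403, 409):
        return "not public"
    return "unknown"


def head_status(url, timeout=10.0):
    """Send an HTTP HEAD request and return the status code.

    HTTP error responses (4xx/5xx) are returned as codes instead of raised;
    network-level failures (DNS, refused connection) still raise URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

A call such as `classify_status(head_status(one_clip_url))` on a single URL from the clip list is enough for this check; validating every URL is a separate step in the workflow.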
+
+#### Preserving directory structure when copying
+
+Rating clips often share filenames across subdirectories (each representing a different
+condition). **Never copy to a flat directory** — preserve the parent directory name:
+
+```text
+Source: .../condition_A/clip1.wav → Dest: ...//condition_A/clip1.wav
+Source: .../condition_B/clip1.wav → Dest: ...//condition_B/clip1.wav
+```
+
+#### Copy method: `azcopy` with SAS tokens (preferred)
+
+For private-to-public transfers, use `azcopy` with **user-delegation SAS tokens on
+both source and destination URLs**. **Never use `azcopy login`** — its token cache is
+separate from `az login`, expires after 90 days of inactivity (`AADSTS700082`), and
+cannot be refreshed via `az login`. Always generate SAS tokens with `az` instead.
+
+**Generate SAS tokens for both containers and copy:**
+
+```powershell
+$expiry = (Get-Date).AddHours(2).ToUniversalTime().ToString("yyyy-MM-ddTHH:mmZ")
+
+$srcSas = az storage container generate-sas `
+  --account-name SOURCE_ACCOUNT --name SOURCE_CONTAINER `
+  --permissions rl --expiry $expiry `
+  --auth-mode login --as-user -o tsv
+
+$destSas = az storage container generate-sas `
+  --account-name DEST_ACCOUNT --name DEST_CONTAINER `
+  --permissions rwl --expiry $expiry `
+  --auth-mode login --as-user -o tsv
+
+azcopy copy `
+  "https://SOURCE_ACCOUNT.blob.core.windows.net/SOURCE_CONTAINER/path?$srcSas" `
+  "https://DEST_ACCOUNT.blob.core.windows.net/DEST_CONTAINER/dest_path?$destSas" `
+  --recursive
+```
+
+This uses your `az login` session to generate the tokens — no `azcopy login` needed.
+Requires the **Storage Blob Delegator** role (included in **Storage Blob Data
+Contributor**) on both storage accounts.
+
+**Important**: `azcopy` with `--recursive` preserves the source directory tree. The
+copied paths will include intermediate directories from the source prefix. After
+copying, update the rating clips CSV to reflect the **actual** destination paths
+(verify with `az storage blob list`).
+
+**Common failures:**
+
+| Symptom | Fix |
+|---------|-----|
+| `AADSTS700082: refresh token expired` | You used `azcopy login` — switch to the SAS token approach above |
+| `AuthorizationPermissionMismatch` | Assign **Storage Blob Data Contributor** role on both accounts |
+| `azcopy: command not found` | Install from https://aka.ms/azcopy or use the `az` fallback below |
+
+**Fallback**: if `azcopy` is not installed, use `az storage blob copy start` with
+SAS-authenticated source URIs:
+
+```powershell
+$clips | ForEach-Object -Parallel {
+    $url = $_.rating_clips
+    $sasToken = $using:srcSas
+    if ($url -match 'https?:/+([^.]+)\.blob\.core\.windows\.net/([^/]+)/(.+)/([^/]+)$') {
+        $parentDir = ($Matches[3] -split '/')[-1]; $fileName = $Matches[4]
+        az storage blob copy start `
+          --account-name DEST_ACCOUNT --destination-container DEST_CONTAINER `
+          --destination-blob "DEST_PREFIX/$parentDir/$fileName" `
+          --source-uri "$url`?$sasToken" `
+          --auth-mode login 2>&1 | Out-Null
+    }
+} -ThrottleLimit 20
+```
+
+## CSV column names by method
+
+These are the **actual column names** expected by the code in `src\create_input.py` and
+`src\master_script.py`.
+
+### Single-stimulus methods (ACR, P.835, echo_impairment_test)
+
+| CSV file | Columns |
+|----------|---------|
+| `rating_clips.csv` | `rating_clips` |
+| `training_clips.csv` | `training_clips` |
+| `gold_clips.csv` | `gold_clips`, `gold_clips_ans` |
+| `trapping_clips.csv` | `trapping_clips`, `trapping_ans` |
+
+### P.804
+
+P.804 gold clips use **per-dimension answer columns** and a **`ver` column** to assign
+clips to gold slots. The `master_script.py` internally renames columns via
+`update_gold_clips_for_p804()`.
+
+| CSV file | Columns |
+|----------|---------|
+| `rating_clips.csv` | `rating_clips` |
+| `training_clips.csv` | `training_clips` (not needed if `training_gold_clips.csv` is used) |
+| `training_gold_clips.csv` | `training_clips`, `noise_ans`, `noise_var`, `noise_msg`, `disc_ans`, `disc_var`, `disc_msg`, `col_ans`, `col_var`, `col_msg`, `loud_ans`, `loud_var`, `loud_msg`, `reverb_ans`, `reverb_var`, `reverb_msg`, `sig_ans`, `sig_var`, `sig_msg`, `ovrl_ans`, `ovrl_var`, `ovrl_msg` |
+| `gold_clips.csv` | `gold_url`, `col_ans`, `disc_ans`, `loud_ans`, `noise_ans`, `reverb_ans`, `sig_ans`, `ovrl_ans`, `ver` |
+| `trapping_clips.csv` | `trapping_clips`, `trapping_ans` |
+
+**Column mapping note**: `create_gold_clips.py --method p804` outputs a column named
+`gold_clips`. You **must** rename it to `gold_url` before passing it to `master_script.py`.
+The answer columns (`col_ans`, `disc_ans`, etc.) are output without the `gold_` prefix
+and should be kept as-is — the master script adds the prefix internally.
+
+The `ver` column is **required** and must contain an integer (1 or 2) indicating which
+gold slot the clip belongs to. See section 5 for how to generate two sets.
+
+### Double-stimulus methods (DCR, CCR)
+
+| CSV file | Columns |
+|----------|---------|
+| `rating_clips.csv` | `rating_clips`, `references` |
+| `training_clips.csv` | `training_clips`, `training_references` |
+| `trapping_clips.csv` | `trapping_clips` (uses references as trapping) |
+
+### Personalized P.835 (`pp835`)
+
+| CSV file | Columns |
+|----------|---------|
+| `gold_clips.csv` | `gold_url`, `gold_sig_ans`, `gold_bak_ans`, `gold_ovrl_ans` |
+| `training_gold_clips.csv` | `training_clips`, `sig_ans`, `sig_var`, `sig_msg`, `bak_ans`, `bak_var`, `bak_msg`, `ovrl_ans`, `ovrl_var`, `ovrl_msg` |
+
+See `src\test_inputs\` for example CSV files.
+
+## Execution workflow
+
+### 1. Prepare the environment
+
+Environment setup is covered in "Environment prerequisites" above. Verify `az` login
+and install dependencies before entering the workflow.
+
+### 2. Validate the input clip list
+
+Before proceeding, validate **every URL** to catch typos, broken links, or inaccessible
+files early.
+
+1. Parse the CSV and fix common URL formatting issues (e.g. `https:/host` → `https://host`).
+2. Validate URLs using one of the approaches below.
+3. If **any URLs are invalid**, **stop** and report the full list to the user.
+4. Check for **duplicate URLs** — report them and ask the user to clarify before continuing.
+
+#### Validation approaches
+
+**For public storage**: use `check_urls_in_files_exist` from `master_script.py` for fast
+multicore validation:
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("REPO_ROOT", "src"))
+from master_script import check_urls_in_files_exist
+check_urls_in_files_exist("CLIP_LIST_PATH", ["COLUMN_NAME"])
+```
+
+**For private storage**: `check_urls_in_files_exist` uses plain HTTP HEAD and will fail
+with HTTP 409. Generate a SAS token using `az`, append it to every URL temporarily, then
+validate:
+
+```powershell
+$expiry = (Get-Date).AddHours(2).ToUniversalTime().ToString("yyyy-MM-ddTHH:mmZ")
+$sasToken = az storage container generate-sas `
+  --account-name SOURCE_ACCOUNT --name SOURCE_CONTAINER `
+  --permissions rl --expiry $expiry `
+  --auth-mode login --as-user -o tsv
+```
+
+Then temporarily append `?$sasToken` to every URL in the CSV, call
+`check_urls_in_files_exist`, and strip the token afterwards. Store the SAS token for
+reuse in later steps (downloading clips, azcopy transfers).
+
+**[ASK]** If clips are on private storage: "Your clips are on private storage. I can
+generate a SAS token using your `az login` session to validate and download clips.
+Should I go ahead?"
+
+### 3. Check for existing project config
+
+Look for a `.cfg` file next to the `rating_clips.csv` in the requester's data directory.
+If one exists, offer to reuse it. If this is a re-run or "go yolo" request, use it directly.
+
+### 4. Prepare training clips
+
+Training clips anchor participants' perception and should represent the quality
+distribution within the dataset — from worst to best.
+
+**P.804 and pp835 — training gold clips take priority:**
+
+For P.804 and pp835, `training_gold_clips.csv` and `training_clips.csv` are **mutually
+exclusive** — the master script accepts one or the other, not both. Because training gold
+clips provide richer per-dimension feedback, they are **always preferred**:
+
+1. If the user provides `training_gold_clips.csv` → use it, **skip** plain training clips.
+2. If neither is provided → generate `training_gold_clips.csv` from gold clips (see
+   section 4b), **skip** plain training clips.
+3. Only generate or ask for `training_clips.csv` when the method is **not** P.804/pp835,
+   or when the user explicitly opts out of training gold clips.
+
+**For all other methods (ACR, DCR, CCR, P.835, echo impairment):**
+
+**[ASK]** Ask the user: "Can you provide a `training_clips.csv` file with manually
+selected clips that represent the quality distribution in your dataset? For multi-scale
+tests (P.804, P.835), training clips should also show variations across all dimensions.
+If not, I can randomly select some samples, but manual selection is recommended."
+
+If the user provides a file, use it directly. Otherwise, auto-generate:
+
+```powershell
+Set-Location REPO_ROOT\src
+python utils\select_training_clips.py `
+  --input RATING_CLIPS_PATH\rating_clips.csv `
+  --output RATING_CLIPS_PATH\training_clips.csv `
+  --count 5
+```
+
+Note: `select_training_clips.py` selects clips purely by list position without knowledge
+of actual quality. Manual selection is always preferred.
+
+#### 4b. Prepare training gold clips (P.804 and pp835 only)
+
+For P.804 and personalized P.835, you can provide `training_gold_clips.csv` which adds
+per-dimension answers, accepted variance, and feedback messages to training clips. This
+enables the HIT app to show participants feedback if their training answers deviate too
+far from the expected score.
+
+**[ASK]** Ask the user: "For P.804/pp835, do you have a `training_gold_clips.csv` with
+per-dimension answers and feedback messages? If not, I can generate one from the gold
+clips by selecting those with the highest deviation across dimensions (up to 5 clips)."
+
+**CSV format for P.804 `training_gold_clips.csv`:**
+
+| Column | Description |
+|--------|-------------|
+| `training_clips` | URL of the training clip |
+| `noise_ans` | Expected noise score (1–5) |
+| `noise_var` | Accepted deviation (e.g. 1); use 0 to skip feedback for this dimension |
+| `noise_msg` | Feedback message shown if the answer deviates |
+| `disc_ans`, `disc_var`, `disc_msg` | Same for discontinuity |
+| `col_ans`, `col_var`, `col_msg` | Same for coloration |
+| `loud_ans`, `loud_var`, `loud_msg` | Same for loudness |
+| `reverb_ans`, `reverb_var`, `reverb_msg` | Same for reverberation |
+| `sig_ans`, `sig_var`, `sig_msg` | Same for signal distortion |
+| `ovrl_ans`, `ovrl_var`, `ovrl_msg` | Same for overall quality |
+
+For pp835, use columns: `sig_ans/var/msg`, `bak_ans/var/msg`, `ovrl_ans/var/msg`.
+
+See `src\test_inputs\training_gold_clips_p804.csv` for an example.
+
+**Rules**: `_var = 1` accepts ±1 deviation; `_var = 0` skips feedback. `_msg` is a
+short feedback message. Empty `_ans` cells accept any answer.
+
+**Auto-generating from gold clips:**
+
+If the user does not provide training gold clips, generate them from the gold clips:
+1. Select up to 5 gold clips with the most distinctive quality characteristics
+   (prefer clips with extreme or opposite dimension values).
+2. Assign `_var = 1` for all dimensions that have an answer.
+3. Write brief feedback messages for each dimension describing the expected quality.
+4. Upload these clips to public storage (they may already be uploaded as gold clips).
+
+### 5. Generate gold clips (if not provided)
+
+Gold clips require **high-quality, clean reference WAV files** — not arbitrary rating clips.
+
+**[ASK] Source clips**: ask the user how to obtain clean source audio (see "Inputs the
+agent must confirm", item 5). Options:
+- The user provides a directory of clean WAV files from the same dataset.
+- The user identifies clean clips by a URL pattern (e.g. `*/clean/*`).
+- Download a subset and let the user review them to remove any with distortion.
+
+If downloading clips from Azure private storage, either:
+- Use `download_clips.py` with `--sas_token` to authenticate, or
+- Use `az storage blob download` with `--auth-mode login` for each clip individually.
+
+**How many source clips?** Use `BEST_PRACTICE_GOLD_SOURCE_COUNT` capped at
+`BEST_PRACTICE_MAX_GOLD_SOURCE_CLIPS`.
+
+Generate gold clips (filenames are **anonymized** by default — do **not** use
+`--no_anonymize`):
+
+```powershell
+python create_gold_clips.py `
+  --input_dir RATING_CLIPS_PATH\gold_source `
+  --output_dir RATING_CLIPS_PATH\gold_output `
+  --method GOLD_METHOD
+```
+
+**Method mapping for `create_gold_clips.py`:**
+
+| Study method | Use `--method` | Output columns |
+|--------------|---------------|----------------|
+| `acr` | `acr` | `gold_clips`, `gold_clips_ans` |
+| `p835` | `acr` | `gold_clips`, `gold_clips_ans` |
+| `echo_impairment_test` | `acr` | `gold_clips`, `gold_clips_ans` |
+| `p804` | `p804` | `gold_clips`, `col_ans`, `disc_ans`, `loud_ans`, `noise_ans`, `reverb_ans`, `sig_ans`, `ovrl_ans` |
+| `pp835` | `p835` | `gold_clips`, `gold_sig_ans`, `gold_bak_ans`, `gold_ovrl_ans` |
+
+**Note**: Each source clip produces multiple gold clips (clean, noisy, distorted, etc.).
+With 3 source clips you get approximately 12 gold clips for ACR, more for P.804
+(~11 variants per source clip).
+
+#### P.804-specific: assigning `ver` column from a single gold set
+
+For P.804, always use `number_of_gold_clips_per_session = 2`. You do **not** need two
+independent sets of source clips. Instead, generate one set and assign `ver` based on the
+`ovrl_ans` value:
+
+- Clips with `ovrl_ans = 5` (clean/high-quality) → `ver = 1`
+- Clips with `ovrl_ans = 1` (degraded) → `ver = 2`
+
+1. Run `create_gold_clips.py --method p804` once with all source clips.
+2. Rename `gold_clips` → `gold_url` in the output CSV.
+3. Add a `ver` column: `ver=1` when `ovrl_ans=5` (clean), `ver=2` when `ovrl_ans=1` (degraded).
+4. Export as `gold_clips.csv`.
+
+After generation, upload to public storage:
+
+```powershell
+python utils\copy_to_pub_storage.py upload `
+  --input RATING_CLIPS_PATH\gold_output\gold_clips_report.csv `
+  --columns gold_clips --local-dir RATING_CLIPS_PATH\gold_output `
+  --account-name STORAGE_ACCOUNT_NAME `
+  --target-container TARGET_CONTAINER `
+  --dest-path PROJECT_NAME/RANDOM_SUBDIR
+```
+
+Use a **random subdirectory name** (not `gold` or `trapping`). This uploads via
+`az login` credentials and produces `gold_clips_report_public.csv` with public URLs.
+If `az` CLI is unavailable, fall back to `upload-local` mode.
+
+For P.804, apply column renaming (`gold_clips` → `gold_url`) and add `ver` **after**
+URLs have been updated to public paths.
+
+### 6. Generate trapping clips (if not provided)
+
+Trapping clips can be generated from any rating clips — they do not need to be
+high-quality references (unlike gold clips). Download a sample of rating clips:
+
+```powershell
+python utils\download_clips.py `
+  --input RATING_CLIPS_PATH\rating_clips.csv `
+  --column rating_clips `
+  --output_dir RATING_CLIPS_PATH\trapping_source `
+  --sample BEST_PRACTICE_TRAPPING_SOURCE_COUNT `
+  --strategy random --seed 99 `
+  --sas_token "SAS_TOKEN_VALUE"
+```
+
+Omit `--sas_token` for public storage. If no SAS token is available for private storage,
+fall back to `az storage blob download --auth-mode login` for each clip.
+
+Use a different seed or strategy than gold to avoid overlap with gold source clips.
+
+Clear the toolkit's trapping source directory and copy source clips there:
+
+```powershell
+$trapSrc = "REPO_ROOT\src\trapping_clips_assets\source"
+$trapOut = "REPO_ROOT\src\trapping_clips_assets\output"
+Get-ChildItem $trapSrc -File | Remove-Item -Force
+if (Test-Path $trapOut) { Get-ChildItem $trapOut -File | Remove-Item -Force }
+Copy-Item "RATING_CLIPS_PATH\trapping_source\*.wav" $trapSrc -Force
+```
+
+Select the correct trapping config:
+
+| Study method | Config file |
+|-------------|-------------|
+| `acr` | `configurations\trapping.cfg` or `configurations\trapping_p835.cfg` |
+| `p835` | `configurations\trapping.cfg` or `configurations\trapping_p835.cfg` |
+| `echo_impairment_test` | `configurations\trapping.cfg` |
+| `p804` | `configurations\trapping_p804.cfg` |
+
+DCR and CCR do not use generated trapping clips — see the legacy note in
+"Supported test methods". For these methods, skip this section entirely and use
+reference clips as the `trapping_clips.csv` (column: `trapping_clips` only).
+
+Run the trapping clip generator:
+
+```powershell
+Set-Location REPO_ROOT\src
+python create_trapping_stimuli.py `
+  --cfg configurations\TRAPPING_CONFIG
+```
+
+Output goes to `trapping_clips_assets\output\`. The report is at
+`trapping_clips_assets\output\output_report.csv` with columns `trapping_ans`, `trapping_clips`.
+ +Prepare for upload — use a **random subdirectory name** (not `trapping`): + +```powershell +python utils\copy_to_pub_storage.py upload ` + --input "trapping_clips_assets\output\output_report.csv" ` + --columns trapping_clips ` + --local-dir "trapping_clips_assets\output" ` + --account-name STORAGE_ACCOUNT_NAME ` + --target-container TARGET_CONTAINER ` + --dest-path PROJECT_NAME/RANDOM_SUBDIR +``` + +Copy the public CSV as `trapping_clips.csv` next to the rating clips. + +### 6b. Review generated clips + +**[ASK]** After generating and uploading gold, trapping, and training clips, ask: +"Gold, trapping, and training clips are generated and uploaded. Would you like to +review them before I run the master script, or should I continue?" + +### 7. Create the project config + +Create a `.cfg` file next to the input CSVs with the project name. + +Template (values **unquoted**): + +```ini +[create_input] +number_of_clips_per_session:10 +number_of_trapping_per_session:1 +number_of_gold_clips_per_session:GOLD_PER_SESSION +clip_packing_strategy: random + +[hit_app_html] +allowed_max_hit_in_project:COMPUTED_VALUE +bw_min: FB +bw_max: FB +hit_base_payment:0.5 +quantity_hits_more_than: COMPUTED_VALUE +quantity_bonus: 0.1 +quality_top_percentage: 20 +quality_bonus: 0.15 +contact_email:USER_PROVIDED_EMAIL +``` + +**Key rules:** +- `number_of_gold_clips_per_session` = **2 for P.804**, 1 for others. +- `bw_min` defaults to `FB`. Valid: `NB-WB`, `SWB`, `FB`. +- `contact_email` = user-provided. Never hardcode. +- `allowed_max_hit_in_project` = `BEST_PRACTICE_ALLOWED_MAX_HITS`. +- `quantity_hits_more_than` ≈ `floor(total_sessions / 2)`, at least 2. + +### 8. Run the master script + +Always include `--check_urls` and `--create_local_test` flags. URL checking validates +that all clip URLs are accessible and catches broken links before publishing. The local +test generates a preview HTML file for visual inspection. 
+ +`--check_urls` may be skipped **only** if this is a re-run and the URLs were already +validated in a previous run (e.g. when re-running due to a config change). + +```powershell +Set-Location RATING_CLIPS_PATH +python REPO_ROOT\src\master_script.py ` + --project PROJECT_NAME ` + --method METHOD ` + --cfg PROJECT_CONFIG.cfg ` + --clips rating_clips.csv ` + --training_clips training_clips.csv ` + --gold_clips gold_clips.csv ` + --trapping_clips trapping_clips.csv ` + --check_urls ` + --create_local_test +``` + +For **P.804** and **pp835**, also pass `--training_gold_clips` if a training gold clips +CSV was provided or generated in step 4b: + +```powershell +python REPO_ROOT\src\master_script.py ` + --project PROJECT_NAME ` + --method p804 ` + --cfg PROJECT_CONFIG.cfg ` + --clips rating_clips.csv ` + --training_gold_clips training_gold_clips.csv ` + --gold_clips gold_clips.csv ` + --trapping_clips trapping_clips.csv ` + --check_urls ` + --create_local_test +``` + +Note: when `--training_gold_clips` is used, the `--training_clips` flag is **not** +needed — training clips are embedded in the training gold CSV. + +**Notes:** + +- Use **full absolute paths** for all arguments to avoid path resolution issues. +- The working directory should be the folder containing the input CSVs so that the + project output directory is created there. +- Supported `--method` values: `acr`, `dcr`, `ccr`, `p835`, `echo_impairment_test`, + `pp835`, `p804`. +- If `quantity_hits_more_than` triggers a warning, update the config file with the + suggested value and re-run. + +### 9. 
Verify the generated project artifacts + +The output project directory (`PROJECT_NAME\`) should contain: + +| File | Purpose | +|------|---------| +| `PROJECT_NAME_METHOD.html` | HIT app (HTML) for the crowd platform | +| `PROJECT_NAME_publish_batch.csv` | Session data with clip URLs for publishing | +| `PROJECT_NAME_METHOD_result_parser.cfg` | Config for `result_parser.py` when analyzing results | +| `url_mapping.csv` | Mapping of original (private/local) URLs to final public URLs | + +Verify: + +1. All three files exist. +2. The publish batch CSV has the expected number of rows (sessions). +3. The HTML file is non-empty. + +### 9b. Generate URL mapping CSV + +If clips were copied from private to public storage, generate `url_mapping.csv` in the +project output directory mapping every original URL to its public URL. + +Columns: `original_url`, `public_url`, `clip_type` (one of: `rating`, `gold`, +`trapping`, `training`, `training_gold`). + +Include all clip types that were uploaded or copied. For clips already public (e.g. +training gold clips), set `original_url = public_url`. + +Save as `PROJECT_NAME\url_mapping.csv`. + +### 10. Clean up temporary files + +**Always remove:** +- `tmp_gold.csv` (debug artifact from `master_script.py`). +- Downloaded source clips directories (`gold_source\`, `trapping_source\`). +- Toolkit trapping directories (`src\trapping_clips_assets\source\*.wav`, + `src\trapping_clips_assets\output\*`). + +**[ASK]** Ask whether to also remove local `gold_output\` (clips are already uploaded). + +### 11. Handoff + +**Upload status**: If the `upload` mode was used, gold and trapping clips are already +uploaded and publicly accessible. If `upload-local` was used as a fallback (no `az` CLI), +remind the requester to run the azcopy commands before publishing the study. + +**Handoff checklist:** + +1. The project directory with all three artifacts. +2. The config file used (saved next to input CSVs for future re-runs). +3. 
The azcopy commands for uploading generated clips (if applicable). +4. The method and scale used. +5. Any warnings or deviations from the documented flow. +6. Instructions for the requester to publish on their chosen platform: + - **Prolific**: follow the team's Prolific workflow or `docs\running_test_prolific.md`. + - **AMT**: follow `docs\running_test_mturk.md`. + +## Utility scripts reference + +| Script | Purpose | +|--------|---------| +| `src\utils\download_clips.py` | Download clips from URLs in a CSV to local directory | +| `src\utils\select_training_clips.py` | Select N evenly-spaced training clips from rating clips | +| `src\utils\copy_to_pub_storage.py` | Upload clips to Azure Blob Storage (direct via `az login`) or prepare azcopy commands | +| `src\utils\preview_html.py` | Generate local preview HTML from master script output | +| `src\create_gold_clips.py` | Generate gold standard clips from clean source WAVs | +| `src\create_trapping_stimuli.py` | Generate trapping stimuli by overlaying messages on source clips | diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..9d929da --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,92 @@ +# Copilot Instructions + +## Code Style and Formatting + +### Function Documentation + +#### Python + +Every function **must** include a docstring **inside** the function body, immediately after the +`def` line. Use the following format: + +```python +def calculate_mos(ratings, num_subjects): + """ + Calculate the Mean Opinion Score from a list of ratings. + + :param ratings: List of numeric ratings. + :param num_subjects: Number of subjects who provided ratings. + :return: The computed MOS value as a float. + """ + ... +``` + +#### JavaScript + +Every function **must** have a JSDoc comment **above** the function declaration: + +```javascript +/** + * Calculate the Mean Opinion Score from a list of ratings. 
+ * + * @param {number[]} ratings - List of numeric ratings. + * @param {number} numSubjects - Number of subjects who provided ratings. + * @returns {number} The computed MOS value. + */ +function calculateMos(ratings, numSubjects) { + ... +} +``` + +### Python Spacing and Syntax + +Follow [PEP 8](https://peps.python.org/pep-0008/) conventions: + +- Use **4 spaces** per indentation level. Do **not** use tabs. +- Surround top-level function and class definitions with **two blank lines**. +- Surround method definitions inside a class with **one blank line**. +- Use **spaces around operators** (`=`, `+=`, `==`, `!=`, `<`, `>`, `in`, `not in`, etc.). +- **No spaces** immediately inside parentheses, brackets, or braces: + - ✅ `func(a, b)` — ❌ `func( a, b )` + - ✅ `data[0]` — ❌ `data[ 0 ]` + - ✅ `{'key': value}` — ❌ `{ 'key' : value }` +- Place **one space after commas** in argument lists, collections, and imports. +- **No trailing whitespace** on any line. +- Keep lines to a maximum of **120 characters**. +- Use **snake_case** for functions and variables, **PascalCase** for classes, and **UPPER_SNAKE_CASE** for constants. +- Imports should be grouped in the following order, separated by a blank line: + 1. Standard library imports + 2. Third-party imports + 3. Local / project imports + +### Line Endings + +All files in this repository **must** use **CRLF** (`\r\n`) line endings, not LF (`\n`). +Configure your editor and Git accordingly: + +``` +git config core.autocrlf true +``` + +Or use a `.gitattributes` file: + +``` +* text=auto eol=crlf +``` + +### Indentation + +- **Python and JavaScript source files**: use **spaces** (4 spaces per level). +- **Markdown files**: use **tabs** (1 tab per level). + +## Custom Agents + +This repository provides custom agents in `.github/agents/`. Use `/agent` in Copilot CLI +to browse and select them, or reference them by name in a prompt. 
+ +| Agent | Purpose | Example prompts | +|-------|---------|-----------------| +| `create-study` | Create subjective speech quality tests (ACR, DCR, CCR, P.835, P.804) | "create a study", "run a P.804 test", "set up a P.835 study" | +| `analyze-results` | Analyze crowdsourced test results — data cleaning, MOS aggregation | "analyze results", "parse results", "evaluate the study" | + +See [`AGENTS.md`](../AGENTS.md) for full details and trigger phrases. diff --git a/.github/instructions/language.instructions.md b/.github/instructions/language.instructions.md new file mode 100644 index 0000000..e7b0ba6 --- /dev/null +++ b/.github/instructions/language.instructions.md @@ -0,0 +1,69 @@ +# Language and grammar correction rules + +When editing this repository, improve language quality in a safe, minimal, and non-breaking way. + +## Goal +Correct grammar, spelling, punctuation, clarity, and consistency in: +- Markdown documentation +- README files +- comments +- docstrings +- code examples written as prose +- HTML visible text +- HTML accessibility text such as `alt`, `title`, `aria-label`, and `placeholder` +- user-facing messages in code, only when the meaning is clearly preserved + +## Do not change +Do not modify any of the following unless explicitly asked: +- program logic or behavior +- variable names +- function names +- class names +- file names +- import paths +- URLs +- API names +- selectors +- IDs +- keys in JSON, YAML, or objects +- database fields +- CLI flags +- commands +- test expectations +- code formatting unrelated to the language fix + +## Editing rules +- Prefer the smallest safe diff. +- Preserve the original meaning. +- Preserve the existing tone unless it is clearly confusing or unprofessional. +- Keep technical terminology unchanged. +- Keep product names, library names, and framework names unchanged. +- Do not rewrite text just for style preference. +- Do not make speculative edits. 
+- Do not “improve” wording inside code identifiers or structured data. +- Do not touch generated files. + +## Code-specific rules +For Python and JavaScript: +- Fix grammar in comments, docstrings, help text, and clearly user-facing strings. +- Do not alter executable code unless required for a grammar fix in a user-facing string and the change is behavior-safe. +- Do not rename symbols to improve wording. + +For HTML: +- Fix visible text and accessibility-related text. +- Do not change class names, IDs, data attributes, script content, or linked resource paths. +- Do not change markup structure unless needed to correct broken visible text. + +## Documentation rules +For Markdown and docs: +- Correct grammar, spelling, punctuation, headings, and sentence clarity. +- Keep meaning, structure, and technical accuracy intact. +- Preserve code blocks exactly unless explicitly asked to edit them. + +## Output behavior +When asked to perform these fixes: +- first identify the text issues +- then apply the corrections +- keep the diff minimal +- summarize what was changed +- call out anything ambiguous instead of guessing \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..e609cbd --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,26 @@ +# Agents Guide + +## For AI Agents (GitHub Copilot, Claude, etc.) + +Before making any changes to this repository, **you must read and follow** the instructions in +[`.github/copilot-instructions.md`](.github/copilot-instructions.md). + +That file contains the canonical coding standards, formatting rules, and documentation requirements +for this project. All code contributions — whether from humans or AI agents — must conform to those +guidelines. + +## Custom agents + +This repository defines custom agents in `.github/agents/`. Use `/agent` in Copilot CLI +to browse and select them, or reference them by name in a prompt. 
+ +| Agent | File | Trigger phrases | +|-------|------|-----------------| +| `create-study` | [`.github/agents/create-study.agent.md`](.github/agents/create-study.agent.md) | "create a study", "run a [method] test", "set up a study", "prepare a test" | +| `analyze-results` | [`.github/agents/analyze-results.agent.md`](.github/agents/analyze-results.agent.md) | "analyze results", "parse results", "evaluate the study", "process the answers" | + +## Task-specific instructions (future) + +| Task type | Instruction file | Trigger phrases | +|-----------|-----------------|-----------------| +| *(reserved)* | — | — | diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..370b58c --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,16 @@ +# Claude Code Instructions + +Before making any changes to this repository, read and follow the coding standards in +[`.github/copilot-instructions.md`](.github/copilot-instructions.md). + +## Custom Agents + +This repository defines reusable agent runbooks in `.github/agents/`. +When a user asks to create a study, run a test, or set up a subjective quality experiment, +follow the instructions in the relevant agent file. + +| Agent | File | Trigger phrases | +|-------|------|-----------------| +| `create-study` | [`.github/agents/create-study.agent.md`](.github/agents/create-study.agent.md) | "create a study", "run a [method] test", "set up a study", "prepare a test" | + +When triggered, read the full agent file and execute its workflow step by step. diff --git a/README.md b/README.md index 08c15cc..ed3b6d8 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # P.808 Toolkit The P.808 Toolkit is a software package that enables users to run subjective speech quality assessment test -in Amazon Mechanical Turk (AMT) crowdsourcing platform, according to the ITU-T Recommendation P.808. 
+in crowdsourcing platforms like Amazon Mechanical Turk (AMT), Prolific, or conduct remote testing with a dedicated panel of workers, according to the ITU-T Recommendation P.808. It includes following test methods: * Absolute Category Rating (ACR) -- Annex A, P.808 * Degradation Category Ratings (DCR) -- Annex B, P.808 @@ -12,9 +12,9 @@ It also extends P.808 in the following ways: * Includes implementation of the ITU-T Rec. P.831 for the crowdsourcing approach is also provided based on the recommendations given in the ITU-T Rec. P.808. -* **NEW** - Multi-dimensional Speech Quality Assessment - Following the ITU-T Rec. P.804 and extending it with reverberation, signal and overall quality. +* Multi-dimensional Speech Quality Assessment - Following the ITU-T Rec. P.804 and extending it with reverberation, signal and overall quality. -* **NEW** - Extending P.835 test to evaluate personalized noise suppression +* Extending P.835 test to evaluate personalized noise suppression Relevant ITU-T Recommendations are : @@ -85,12 +85,31 @@ If you use this tool in your research please cite it with the following referenc ## Getting Started * [Preparation](docs/preparation.md) -* [Running the Test on Amazon Mechanical Turk](docs/running_test_mturk.md) +* Running the Test on crowdsourcing platform + * [Using Amazon Mechanical Turk](docs/running_test_mturk.md) + * [Using Prolific](docs/running_test_prolific.md) * [Analyzing Data](docs/results.md) +## Using an AI Agent + +You can use an AI coding agent (e.g. GitHub Copilot, Claude) to create and run studies +automatically. The agent will generate gold clips, trapping clips, training clips, upload +them to Azure storage, and build the complete project — all from a single prompt. + +**Setup**: This experience is tailored to Azure Storage. If you use another cloud provider to serve your clips, adapt the code accordingly. Otherwise, make sure you have `az login` configured with write access to your Azure Blob +Storage account. 
+ +**Usage**: Open the repository in your IDE with an AI agent and ask it to create a study: + +> _"Run a P.835 test for the files in C:\path\to\my\rating_clips"_ + +The agent uses the `create-study` custom agent defined in [`.github/agents/create-study.agent.md`](.github/agents/create-study.agent.md). +Select it via `/agent` in Copilot CLI, or just describe your task — the model will auto-infer the +right agent. See [`AGENTS.md`](AGENTS.md) for the full list of available agents. + +**Supported methods**: `acr`, `dcr`, `ccr`, `p835`, `p804`, `echo_impairment_test`, `pp835`. + -## News -++ An update with support for [multi-dimensional quality assessment](https://arxiv.org/pdf/2309.07385.pdf) is published. ## Troubleshooting For bug reports and issues with this code, please see the diff --git a/docs/conf-trapping.md b/docs/conf-trapping.md index a8fec76..fd0c182 100644 --- a/docs/conf-trapping.md +++ b/docs/conf-trapping.md @@ -9,14 +9,14 @@ ## `[trappings]` `input_directory = trapping clips`: path pointing to the `trapping clips` directory. It is relative to the current working directory. The directory should contain following subdirectories: - * `source`: it should contains fair distributions of clips from your dataset under study. First couple of seconds from + * `source`: it should contain a fair distribution of clips from your dataset under study. The first couple of seconds from each clip in this directory will be used to generate the trapping clips. * `messages`: message clips found in this directory will be appended to first couple of seconds from each clip in the `source` directory to create the trapping clips. * `output`: generated trapping clips will be stored here. `message_file_prefix:ACR_`: specify prefix of audio clips available in `source` directory which should be used. - Use `ACR_` for the P.808 tests and `p835_score_` for the P.835 tests. 
The speech level of clips started by `adj_*` are + Use `adj_p835_score` for the P.808 ACR, DCR and P.835 tests, `p804_score_` for the P.804 tests and `adj_ccr_score_` for P.808 CCR test. The speech level of clips started by `adj_*` are adjusted to -26dBov. **One** of the following options should be used: diff --git a/docs/conf_master.md b/docs/conf_master.md index 6ef4194..404119d 100644 --- a/docs/conf_master.md +++ b/docs/conf_master.md @@ -1,82 +1,82 @@ -[Home](../README.md) > [Preparation](preparation.md) > [Preparation for Absolute Category Rating (ACR)](prep_acr.md) - -# Configure for `master_script.py` - -This describes the configuration for the `master_script.py`. A sample configuration file can be found in [`configurations\master.cfg`](.\src\configurations\master.cfg). - -## `[create_input]` - -* `number_of_clips_per_session:10`: Number of clips from "rating_clips" to be included in the "Rating section" of each HIT/listening session. -* `number_of_trapping_per_session:1`: Number of trapping questions to be included in the "Rating section". -* `number_of_gold_clips_per_session:1`:Number of gold clips to be included in the "Rating section". -* (optional) `condition_pattern:`: Specifies a regex to extract the condition name from the clip URL. example: -Assuming the URL is `htttp://test.com/D501_C03_M2_S02.wav` is the clip URL, and "03" is the condition name. -The pattern will be `.*_c(?P<condition_num>\d{1,2})_.*.wav`, you should also use condition_keys with `condition_num`. -* (optional) `condition_keys:` comma separated list of keys appearing in the `condition_pattern`: -* (optional) `clip_packing_strategy:random`: Either `random` or `balanced_block`. It specifies How to select clips -which will be assessed in a same HIT. For the `balanced_block` design, `condition_pattern`, `condition_pattern`, and - `condition_pattern` should be specified. `number_of_clips_per_session` should be a multiple of the unique values of the - key specified in the `block_keys`. 
-* (optional) `block_keys:`: The key(s) to be used for creating the blocks should be specified here.Up to two keys. -A comma separated list. For multiple keys, all values of the first key should appear in one block. - - -## `[hit_app_html]` -* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store the current state of a worker in this project. - Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. - It is a project specific value. -* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. -The cookies expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the -following message next time they want to perform a HIT from this group: - ````text - There is no assignments that match to your profile now. Please try it again in two-weeks time. - We thank you for your participation. - ```` -* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. -* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. -* `quantity_hits_more_than: 30`: Defines the necessary hits required for quantity bonus. +[Home](../README.md) > [Preparation](preparation.md) > [Preparation for Absolute Category Rating (ACR)](prep_acr.md) + +# Configure for `master_script.py` + +This describes the configuration for the `master_script.py`. A sample configuration file can be found in [`configurations\master.cfg`](.\src\configurations\master.cfg). + +## `[create_input]` + +* `number_of_clips_per_session:10`: Number of clips from "rating_clips" to be included in the "Rating section" of each HIT/listening session. +* `number_of_trapping_per_session:1`: Number of trapping questions to be included in the "Rating section". 
+* `number_of_gold_clips_per_session:1`: Number of gold clips to be included in the "Rating section". +* (optional) `condition_pattern:`: Specifies a regex to extract the condition name from the clip URL. Example: +Assuming `http://test.com/D501_C03_M2_S02.wav` is the clip URL, and "03" is the condition name. +The pattern will be `.*_c(?P<condition_num>\d{1,2})_.*.wav`, you should also use condition_keys with `condition_num`. +* (optional) `condition_keys:` comma separated list of keys appearing in the `condition_pattern`: +* (optional) `clip_packing_strategy:random`: Either `random` or `balanced_block`. It specifies how to select clips +which will be assessed in a same HIT. For the `balanced_block` design, `condition_pattern`, `condition_pattern`, and + `condition_pattern` should be specified. `number_of_clips_per_session` should be a multiple of the unique values of the + key specified in the `block_keys`. +* (optional) `block_keys:`: The key(s) to be used for creating the blocks should be specified here. Up to two keys. +A comma separated list. For multiple keys, all values of the first key should appear in one block. + + +## `[hit_app_html]` +* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store the current state of a worker in this project. + Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. + It is a project specific value. +* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. +The cookie expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the +following message next time they want to perform a HIT from this group: + ````text + There is no assignments that match to your profile now. Please try it again in two-weeks time. + We thank you for your participation. 
+ ```` +* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. +* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. +* `quantity_hits_more_than: 30`: Defines the necessary hits required for quantity bonus. * `quantity_bonus: 0.1`: The amount of the quantity bonus to be paid for each accepted assignment. -* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be -eligible for quantity bonus). -* `quality_bonus: 0.15`: the amount of the quality bonus per accepted assignment. -* `bw_min: FB `: minimum bandwidth that participants playback should support, can be "NB-WB", "SWB", "FB" -* `bw_max: FB `: maximum bandwidth that participants playback should support, can be "NB-WB", "SWB", "FB" - - -## `[acr_html]` or `[p835_html]` _deprecated_ -* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store the current state of a worker in this project. - Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. - It is a project specific value. -* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. -The cookies expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the -following message next time they want to perform a HIT from this group: - ````text - There is no assignments that match to your profile now. Please try it again in two-weeks time. - We thank you for your participation. - ```` -* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. -* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. -* `quantity_hits_more_than: 30`: Defines the necessary hits required for quantity bonus. 
+* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be +eligible for quantity bonus). +* `quality_bonus: 0.15`: The amount of the quality bonus per accepted assignment. +* `bw_min: FB `: minimum bandwidth that participants playback should support, can be "NB-WB", "SWB", "FB" +* `bw_max: FB `: maximum bandwidth that participants playback should support, can be "NB-WB", "SWB", "FB" + + +## `[acr_html]` or `[p835_html]` _deprecated_ +* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store the current state of a worker in this project. + Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. + It is a project specific value. +* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. +The cookie expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the +following message next time they want to perform a HIT from this group: + ````text + There is no assignments that match to your profile now. Please try it again in two-weeks time. + We thank you for your participation. + ```` +* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. +* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. +* `quantity_hits_more_than: 30`: Defines the necessary hits required for quantity bonus. * `quantity_bonus: 0.1`: The amount of the quantity bonus to be paid for each accepted assignment. -* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be -eligible for quantity bonus). -* `quality_bonus: 0.15`: the amount of the quality bonus per accepted assignment. 
- -## `[dcr_ccr_html]` _deprecated_ -* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store current state of a worker in this project. - Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. - It is a project specific value. -* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. -The cookies expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the -following message next time they want to perform a HIT from this group: - ````text - There is no assignments that match to your profile now. Please try it again in two-weeks time. - We thank you for your participation. - ```` -* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. -* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. -* `quantity_hits_more_than: 30`: Defines when quantity bonus requirement. -* `quantity_bonus: 0.1`: the amount of the quantity bonus to be paid for each accepted assignment. -* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be -eligible for quantity bonus). -* `quality_bonus: 0.15`: the amount of the quality bonus per accepted assignment. +* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be +eligible for quantity bonus). +* `quality_bonus: 0.15`: The amount of the quality bonus per accepted assignment. + +## `[dcr_ccr_html]` _deprecated_ +* `cookie_name:itu_p808_sup23_exp3`: A cookie with this name will be used to store current state of a worker in this project. + Key attributes like number of assignments answered by the worker, if the training or setup sections are needed. + It is a project specific value. 
+* `qual_cookie_name:ACR_LISTENER_19_12_2019`: A cookie with this name will show if the user passed the Qualification section. +The cookie expires after 1 month. If a worker could not successfully pass the Qualification section, they will see the +following message next time they want to perform a HIT from this group: + ````text + There is no assignments that match to your profile now. Please try it again in two-weeks time. + We thank you for your participation. + ```` +* `allowed_max_hit_in_project:60`: Number of assignments that one worker can perform from this project. +* `hit_base_payment:0.5`: Base payment for an accepted assignment from this HIT. This value will be used as information. +* `quantity_hits_more_than: 30`: Defines the quantity bonus requirement. +* `quantity_bonus: 0.1`: The amount of the quantity bonus to be paid for each accepted assignment. +* `quality_top_percentage: 20`: Defines when quality bonus should be applied (in addition, participant should be +eligible for quantity bonus). +* `quality_bonus: 0.15`: The amount of the quality bonus per accepted assignment. diff --git a/docs/gold_clips.md b/docs/gold_clips.md new file mode 100644 index 0000000..9b55984 --- /dev/null +++ b/docs/gold_clips.md @@ -0,0 +1,117 @@ +[Home](../README.md) > [Preparation](preparation.md) > Gold Standard Clips + +# Gold Standard Clips + +Gold standard clips (gold clips) are hidden quality control items inserted into each listening session. +Their expected answers are known in advance and are used to verify that participants are paying attention +and rating consistently. A wrong answer to a gold clip may result in rejection of the submission. + +## Overview + +Gold clips should have quality so obvious that all attentive participants rate them correctly +(+/- 1 deviation is accepted). It is recommended to include clips at both extremes: excellent +quality (answer 5) and clearly degraded quality (answer 1). 
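The tolerance rule in the Overview above (an answer within +/- 1 of the expected value is accepted) can be sketched as a one-line check. This is a minimal illustration under that stated rule only; `gold_answer_is_accepted` is a hypothetical name, and how `result_parser.py` actually evaluates gold clips is not shown in this document.

```python
def gold_answer_is_accepted(given, expected, tolerance=1):
    """
    Check whether a rating for a gold clip is close enough to the expected answer.

    :param given: Rating submitted by the participant (1 to 5).
    :param expected: Known answer for the gold clip (1 to 5).
    :param tolerance: Maximum accepted absolute deviation (assumed to be 1).
    :return: True if the rating is within the tolerance, False otherwise.
    """
    return abs(given - expected) <= tolerance


# A clean clip (expected 5) rated 4 passes; rated 3 does not.
print(gold_answer_is_accepted(4, 5))
print(gold_answer_is_accepted(3, 5))
```

Placing gold clips at the extremes (5 and 1) keeps the accepted ranges, 4 to 5 and 1 to 2, far apart, so an inattentive rating is unlikely to pass by chance.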
+ +## Generating Gold Clips + +The `create_gold_clips.py` script automatically generates gold clips from a set of clean source +audio files. For each source file, it creates multiple degraded versions targeting different +quality dimensions. + +### Usage + +```bash +cd src +python create_gold_clips.py ^ + --input_dir path/to/clean_clips ^ + --output_dir path/to/gold_output ^ + --method acr +``` + +### Arguments + +| Argument | Required | Default | Description | +|----------|----------|---------|-------------| +| `--input_dir` | Yes | — | Directory containing clean source WAV files. | +| `--output_dir` | Yes | — | Directory where gold clips and the report CSV will be saved. | +| `--method` | Yes | — | Test method: `acr`, `p835`, or `p804`. | +| `--snr_db` | No | -5.0 | Signal-to-noise ratio in dB for background noise. Lower = more noise. | +| `--clip_threshold` | No | 0.005 | Hard clipping threshold as a fraction of peak amplitude. Lower = more distortion. | +| `--no_anonymize` | No | False | Use descriptive filenames instead of random anonymous names. | + +By default, output filenames are randomized so that quality cannot be inferred from the filename. +Use `--no_anonymize` for debugging and listening verification. + +### Output + +The script produces: +- Degraded WAV files in the output directory. +- `gold_clips_report.csv` with filenames and expected answers in the format required by `master_script.py`. + +## Degradation Types by Method + +### ACR + +| Type | Description | Expected answer | +|------|-------------|-----------------| +| Clean | Unmodified source clip | 5 | +| Background noise | Pink noise at -5 dB SNR | 1 | +| Signal distortion | Hard clipping | 1 | +| Both | Clipping + noise | 1 | + +**Output CSV columns:** `gold_clips`, `gold_clips_ans` + +### P.835 + +For P.835, each gold clip targets specific perceptual dimensions (signal, background, overall). 
+ +| Type | Description | sig | bak | ovrl | +|------|-------------|-----|-----|------| +| Clean | Unmodified source clip | 5 | 5 | 5 | +| Background noise | Pink noise at -5 dB SNR | 5 | 1 | 1 | +| Signal distortion | Hard clipping | 1 | 4 | 1 | +| Both | Clipping + noise | 1 | 1 | 1 | + +**Note:** Signal distortion introduces harmonic content that can be perceived as mild background +degradation, so `bak` is set to 4 rather than 5 for that type. + +**Output CSV columns:** `gold_clips`, `gold_sig_ans`, `gold_bak_ans`, `gold_ovrl_ans` + +### P.804 + +For P.804, gold clips target multiple quality dimensions. Only dimensions with an expected answer +of 1 are listed in the CSV; empty cells mean the dimension is not targeted (implicitly 5). + +| Type | Description | col | disc | loud | noise | sig | ovrl | +|------|-------------|-----|------|------|-------|-----|------| +| Clean | Unmodified source clip | 5 | 5 | 5 | 5 | 5 | 5 | +| Background noise | Pink noise | | | | 1 | | 1 | +| Signal distortion | Hard clipping | | | | | 1 | 1 | +| Discontinuity | Random segment dropouts (choppy) | | 1 | | | 1 | 1 | +| Discontinuity + noise | Choppy + noise | | | | 1 | 1 | 1 | +| Coloration | Resonant/muffled/telephone filter | 1 | | | | 1 | 1 | +| Coloration + noise | Coloration + noise | | | | 1 | 1 | 1 | +| Distortion + noise | Clipping + noise | | | | 1 | 1 | 1 | +| Loudness | Too loud (+25 dB) or too quiet (-25 dB) | | | 1 | | | 1 | +| Loudness + distortion | Loudness + clipping | | | 1 | | 1 | 1 | +| Loudness + noise | Loudness + noise | | | 1 | 1 | | 1 | + +**Note:** When distortion is combined with noise, only `sig` and `noise` are flagged because the +specific type of underlying distortion is not clearly distinguishable to raters. + +**Output CSV columns:** `gold_clips`, `col_ans`, `disc_ans`, `loud_ans`, `noise_ans`, `reverb_ans`, `sig_ans`, `ovrl_ans` + +## Source Clip Recommendations + +- Use clean speech clips with no background noise or distortion. 
+- Include a variety of speakers (male and female). +- Clips should be representative of the duration used in the test (typically 2–5 seconds). +- A minimum of 4–6 source clips is recommended to generate a diverse gold clip set. + +## Using Gold Clips with master_script.py + +After generating gold clips: + +1. Upload the generated WAV files to a cloud server. +2. Update the `gold_clips` column in `gold_clips_report.csv` with the hosted URLs. +3. Pass the updated CSV as `--gold_clips` to `master_script.py`. diff --git a/docs/prep_acr.md b/docs/prep_acr.md index 8490f8b..7cb5b30 100644 --- a/docs/prep_acr.md +++ b/docs/prep_acr.md @@ -11,16 +11,16 @@ For all the resource files (steps 1-4) an example is provided in `src/test_input column named `rating_clips` (see [rating_clips.csv](../src/test_inputs/rating_clips.csv) as an example). **Note about file names**: - * Later in the analyzes, clip's file name will be used as a unique key and appears in the results. - * In case you have 'conditions' which are represented with more than one clip, you may consider to use the condition's - name in the clip's file name e.g. xxx_c01_xxxx.wav. When you provide the corresponding pattern, the analyzes script + * Later in the analysis, the clip's file name will be used as a unique key and appears in the results. + * In case you have 'conditions' which are represented with more than one clip, you may consider using the condition's + name in the clip's file name e.g. xxx_c01_xxxx.wav. When you provide the corresponding pattern, the analysis script will create aggregated results over conditions as well. The name pattern can also be used for creating clip sets using `balanced_block` design. 1. Upload your **training clips** in a cloud server and create `training_clips.csv` file which contains all URLs in a column named `training_clips` (see [training_clips.csv](../src/test_inputs/training_clips.csv) as an example). 
- **Hint**: Training clips are used for anchoring participants perception, and should represent the entire dataset. + **Hint**: Training clips are used for anchoring participants' perception, and should represent the entire dataset. They should approximately cover the range from worst to best quality to be expected in the test. It may contain about 5 clips. @@ -31,20 +31,21 @@ column named `gold_clips` and expected answer to each clip in a column named `go **Hint**: Gold standard clips are used as a hidden quality control item in each session. It is expected that their answers are so obvious for all participants that they all give the `gold_clips_ans` rating (+/- 1 deviation is accepted). It is recommended to use clips with excellent (answer 5) or very bad (answer 1) quality. + You can use `create_gold_clips.py` to generate gold clips automatically. See [Gold Standard Clips](gold_clips.md) for details. 1. Create trapping stimuli set for your dataset. 1. Configure the `create_trapping_stimuli.py` in your config file. See [configuration of create_trapping_stimuli script ](conf-trapping.md) for more information. - 2. Delete all files from `trapping clips\source` directory + 2. Delete all files from `trapping_clips_assets\source` directory ``` bash - cd "src\trapping clips\source" + cd "src\trapping_clips_assets\source" del *.* ``` - 3. Add some clips from your dataset to `trapping clips\source` directory. Select clips in a way that - 1. Covers fair distributions of speakers (best couple of clips per each speaker) - 1. Covers entire range of quality (some good, fair and bad ones) + 3. Add some clips from your dataset to `trapping_clips_assets\source` directory. Select clips in a way that + 1. Covers a fair distribution of speakers (best couple of clips per each speaker) + 1. Covers the entire range of quality (some good, fair, and bad ones) 4. 
Run `create_trapping_stimuli.py` ``` bash @@ -52,8 +53,8 @@ column named `gold_clips` and expected answer to each clip in a column named `go python create_trapping_stimuli.py ^ --cfg your_config_file.cfg ``` - 5. Trapping clips are stored in `trapping clips\output` directory. List of clips and their correct answer can - be found in `trapping clips\source\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) + 5. Trapping clips are stored in `trapping_clips_assets\output` directory. List of clips and their correct answer can + be found in `trapping_clips_assets\output\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) with the URLs pointing to those files to create the `trapping_clips.csv` file (see below). 1. Upload your **trapping clips** in a cloud server and create `trapping_clips.csv` file which contains all URLs in @@ -77,12 +78,16 @@ a column named `trapping_clips` and expected answer to each clip in a column nam --gold_clips gold_clips.csv ^ --trapping_clips trapping_clips.csv ``` + Optionally: + - Add `--check_urls` to validate that all links in the CSV files are accessible before creating the project. + - Add `--create_local_test` to generate a local preview HTML file for testing. See [preview_html](preview_html.md) for details. + Note: file paths are expected to be relative to the current working directory. - 1. Double check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working + 1. Double-check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working directory which contains: * `YOUR_PROJECT_NAME_acr.html`: Customized HIT app to be used in Amazon Mechanical Turk (AMT). * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT. 
* `YOUR_PROJECT_NAME_acr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script -Now, you are ready for [Running the Test on Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file +Now, you are ready for running the test on [Prolific](running_test_prolific.md) or [Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file diff --git a/docs/prep_dcr_ccr.md b/docs/prep_dcr_ccr.md index 99c5142..29ca2a1 100644 --- a/docs/prep_dcr_ccr.md +++ b/docs/prep_dcr_ccr.md @@ -15,9 +15,9 @@ contains all URLs to speech clips in a column named `rating_clips` and URLs to t `references`. (see [rating_clips_ccr.csv](../src/test_inputs/rating_clips_ccr.csv) as an example). **Note about file names**: - * Later in the analyzes, clip's file name will be used as a unique key and appears in the results. - * In case you have 'conditions' which are represented with more than one clip, you may consider to use the condition's - name in the clip's file name e.g. xxx_c01_xxxx.wav. When you provide the corresponding pattern, the analyzes script + * Later in the analysis, the clip's file name will be used as a unique key and appears in the results. + * In case you have 'conditions' which are represented with more than one clip, you may consider using the condition's + name in the clip's file name e.g. xxx_c01_xxxx.wav. When you provide the corresponding pattern, the analysis script will create aggregated results over conditions as well. @@ -25,7 +25,7 @@ contains all URLs to speech clips in a column named `rating_clips` and URLs to t column named `training_clips` and URLs to corresponding reference clips in column `training_references` (see [training_clips_ccr.csv](../src/test_inputs/training_clips_ccr.csv) as an example). - **Hint**: Training clips are used for anchoring participants perception, and should represent the entire dataset. 
+ **Hint**: Training clips are used for anchoring participants' perception, and should represent the entire dataset. They should approximately cover the range from worst to best quality to be expected in the test. It may contain about 5 clips. @@ -44,12 +44,16 @@ column named `training_clips` and URLs to corresponding reference clips in colum --clips rating_clips.csv ^ --training_clips training_clips.csv ``` + Optionally: + - Add `--check_urls` to validate that all links in the CSV files are accessible before creating the project. + - Add `--create_local_test` to generate a local preview HTML file for testing. See [preview_html](preview_html.md) for details. + Note: file paths are expected to be relative to the current working directory. - 1. Double check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working + 1. Double-check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working directory which contains: * `YOUR_PROJECT_NAME_ccr.html`: Customized HIT app to be used in Amazon Mechanical Turk (AMT). * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT. * `YOUR_PROJECT_NAME_ccr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script -Now, you are ready for [Running the Test on Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file +Now, you are ready for running the test on [Prolific](running_test_prolific.md) or [Amazon Mechanical Turk](running_test_mturk.md). 
\ No newline at end of file diff --git a/docs/prep_p804.md b/docs/prep_p804.md index cf91c99..ce7f2f9 100644 --- a/docs/prep_p804.md +++ b/docs/prep_p804.md @@ -1,4 +1,4 @@ -[Home](../README.md) > [Preparation](preparation.md) > Preparation for the P.804 (Multi-dimensional)] +[Home](../README.md) > [Preparation](preparation.md) > Preparation for P.804 (Multi-dimensional) # Preparation of P.804 test The following steps should be performed to prepare the P.804 test setup. @@ -10,9 +10,9 @@ The following steps should be performed to prepare the P.804 test setup. column named `rating_clips` (see [rating_clips.csv](../src/test_inputs/rating_clips.csv) as an example). **Note about file names**: - * Later in the analyzes, clip's file name will be used as a unique key and appears in the results. - * In case you have 'conditions' which are represented with more than one clip, you may consider to use the condition's - name in the clip's file name or in the URL e.g. xxx_c01_xxxx.wav. Latter you can use regex pattern to extract the + * Later in the analysis, the clip's file name will be used as a unique key and appears in the results. + * In case you have 'conditions' which are represented with more than one clip, you may consider using the condition's + name in the clip's file name or in the URL e.g. xxx_c01_xxxx.wav. Later, you can use a regex pattern to extract the condition identifier from the URLs. **Note on Reference Conditions** @@ -22,7 +22,7 @@ column named `rating_clips` (see [rating_clips.csv](../src/test_inputs/rating_cl 1. Upload your **training clips** in a cloud server and create `training_gold_clips.csv` file which contains all URLs in a column named `training_clips` (see [training_gold_clips.csv](../src/test_inputs/training_gold_clips_p804.csv) as an example). - **Hint**: Training clips are used for anchoring participants perception, and should represent the entire dataset. 
+ **Hint**: Training clips are used for anchoring participants' perception, and should represent the entire dataset. They should approximately cover the range from worst to best quality to be expected in the test. In P.804, it is possible to add the correct answer, variance, and a message to be shown if the given answer is out of expected range per dimension. @@ -35,21 +35,22 @@ any given answer for that dimension will be considered to be correct. **Hint**: Gold standard clips are used as a hidden quality control item in each session. It is expected that their answers are so obvious for all participants that they all give the `*_ans` rating (+/- 1 deviation is accepted) for all dimensions. It is recommended to use clips with excellent (answer 5) or very bad - (answer 1) quality. Also clips with extreme and opposite value for multiple dimensions work best (e.g. Coloration 5 and Discontinuity 1). + (answer 1) quality. Also clips with extreme and opposite values for multiple dimensions work best (e.g. Coloration 5 and Discontinuity 1). + You can use `create_gold_clips.py` to generate gold clips automatically. See [Gold Standard Clips](gold_clips.md) for details. 1. Create trapping stimuli set for your dataset. - 1. Configure the `create_trapping_stimuli.py` in your config file. See [configuration of create_trapping_stimuli script ](conf-trapping.md) + 1. Configure the `create_trapping_stimuli.py` in your config file. See [configuration of create_trapping_stimuli script](conf-trapping.md) for more information. An example is provided in `configurations\trapping_p804.cfg`. - 2. Delete all files from `trapping clips\source` directory + 2. Delete all files from `trapping_clips_assets\source` directory ``` bash - cd "src\trapping clips\source" + cd "src\trapping_clips_assets\source" del *.* ``` - 3. Add some clips from your dataset to `trapping clips\source` directory. Select clips in a way that - 1. 
Covers fair distributions of speakers (best couple of clips per each speaker) - 1. Covers entire range of quality (some good, fair and bad ones) + 3. Add some clips from your dataset to `trapping_clips_assets\source` directory. Select clips in a way that + 1. Covers a fair distribution of speakers (best couple of clips per each speaker) + 1. Covers the entire range of quality (some good, fair, and bad ones) 4. Run `create_trapping_stimuli.py` ``` bash @@ -57,8 +58,8 @@ any given answer for that dimension will be considered to be correct. python create_trapping_stimuli.py ^ --cfg your_config_file.cfg ``` - 5. Trapping clips are stored in `trapping clips\output` directory. List of clips and their correct answer can - be found in `trapping clips\source\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) + 5. Trapping clips are stored in `trapping_clips_assets\output` directory. List of clips and their correct answer can + be found in `trapping_clips_assets\output\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) with the URLs pointing to those files to create the `trapping_clips.csv` file (see below). 1. Upload your **trapping clips** in a cloud server and create `trapping_clips.csv` file which contains all URLs in @@ -82,12 +83,16 @@ a column named `trapping_clips` and expected answer to each clip in a column nam --gold_clips gold_clips.csv ^ --trapping_clips trapping_clips.csv ``` + Optionally: + - Add `--check_urls` to validate that all links in the CSV files are accessible before creating the project. + - Add `--create_local_test` to generate a local preview HTML file for testing. See [preview_html](preview_html.md) for details. + Note: file paths are expected to be relative to the current working directory. - 1. Double check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working + 1. Double-check the outcome of the script. 
A folder should be created with YOUR_PROJECT_NAME in current working directory which contains: * `YOUR_PROJECT_NAME_p804.html`: Customized HIT app to be used in Amazon Mechanical Turk (AMT). * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT. * `YOUR_PROJECT_NAME_acr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script -Now, you are ready for [Running the Test on Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file +Now, you are ready for running the test on [Prolific](running_test_prolific.md) or [Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file diff --git a/docs/prep_p835.md b/docs/prep_p835.md index c93f8b8..267f9f6 100644 --- a/docs/prep_p835.md +++ b/docs/prep_p835.md @@ -1,93 +1,98 @@ -[Home](../README.md) > [Preparation](preparation.md) > Preparation for the P.835] -# Preparation of P.835 test - -The following steps should be performed to prepare the P.835 test setup. - -**Note**: make sure to first perform steps listed in the [general preparation process](preparation.md). - - -1. Upload your **speech clips** in a cloud server and create `rating_clips.csv` file which contains all URLs in a -column named `rating_clips` (see [rating_clips.csv](../src/test_inputs/rating_clips.csv) as an example). - - **Note about file names**: - * Later in the analyzes, clip's file name will be used as a unique key and appears in the results. - * In case you have 'conditions' which are represented with more than one clip, you may consider to use the condition's - name in the clip's file name or in the URL e.g. xxx_c01_xxxx.wav. Latter you can use regex pattern to extract the - condition identifier from the URLs. 
- - **Note on Reference Conditions** - * It is strongly recommended to include Reference Conditions in your study to cover the entire range of MOS on all +[Home](../README.md) > [Preparation](preparation.md) > Preparation for P.835 +# Preparation of P.835 test + +The following steps should be performed to prepare the P.835 test setup. + +**Note**: make sure to first perform steps listed in the [general preparation process](preparation.md). + + +1. Upload your **speech clips** in a cloud server and create `rating_clips.csv` file which contains all URLs in a +column named `rating_clips` (see [rating_clips.csv](../src/test_inputs/rating_clips.csv) as an example). + + **Note about file names**: + * Later in the analysis, the clip's file name will be used as a unique key and appears in the results. + * In case you have 'conditions' which are represented with more than one clip, you may consider using the condition's + name in the clip's file name or in the URL e.g. xxx_c01_xxxx.wav. Later, you can use a regex pattern to extract the + condition identifier from the URLs. + + **Note on Reference Conditions** + * It is strongly recommended to include Reference Conditions in your study to cover the entire range of MOS on all three scales. Results of our studies showed that Reference Conditions based on the ITU-T Rec. P.835 do not cover - the entire range of scales, rather the framework propose in ETSI 103 281 Annex D can cover the entire range. We - recommend to use [3gpp_p501_FB](../p835_reference_conditions/3gpp_p501_FB) which is created base on the ETSI/3GPP framework. - -1. Upload your **training clips** in a cloud server and create `training_clips.csv` file which contains all URLs in a -column named `training_clips` (see [training_clips.csv](../src/test_inputs/training_clips.csv) as an example). - - **Hint**: Training clips are used for anchoring participants perception, and should represent the entire dataset. 
- They should approximately cover the range from worst to best quality to be expected in the test. It may contain - about 5 clips. - -1. Upload your **gold standard clips** in a cloud server and create `gold_clips.csv` file which contains all URLs in a -column named `gold_clips` and expected answer to each clip in a column named `gold_clips_ans` -(see [gold_clips.csv](../src/test_inputs/gold_clips.csv) as an example). - - **Hint**: Gold standard clips are used as a hidden quality control item in each session. It is expected that their - answers are so obvious for all participants that they all give the `gold_clips_ans` rating (+/- 1 deviation is - accepted) to the "overall quality" question. It is recommended to use clips with excellent (answer 5) or very bad - (answer 1) quality. - -1. Create trapping stimuli set for your dataset. - - 1. Configure the `create_trapping_stimuli.py` in your config file. See [configuration of create_trapping_stimuli script ](conf-trapping.md) - for more information. An example is provided in `configurations\trapping_p835.cfg`. - - 2. Delete all files from `trapping clips\source` directory - ``` bash - cd "src\trapping clips\source" - del *.* - ``` - 3. Add some clips from your dataset to `trapping clips\source` directory. Select clips in a way that - 1. Covers fair distributions of speakers (best couple of clips per each speaker) - 1. Covers entire range of quality (some good, fair and bad ones) - - 4. Run `create_trapping_stimuli.py` - ``` bash - cd src - python create_trapping_stimuli.py ^ - --cfg your_config_file.cfg - ``` - 5. Trapping clips are stored in `trapping clips\output` directory. List of clips and their correct answer can - be found in `trapping clips\source\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) - with the URLs pointing to those files to create the `trapping_clips.csv` file (see below). - -1. 
Upload your **trapping clips** in a cloud server and create `trapping_clips.csv` file which contains all URLs in -a column named `trapping_clips` and expected answer to each clip in a column named `trapping_ans` -(see [trapping_clips.csv](../src/test_inputs/trapping_clips.csv) as an example). - -1. Create your custom project by running the master script: - - 1. Configure the project in your config file. See [master script configuration](conf_master.md) for more information. - - 1. Run `master_script.py` with all above-mentioned resources as input - - ``` bash - cd src - python master_script.py ^ - --project YOUR_PROJECT_NAME ^ - --method p835 ^ - --cfg your_configuration_file.cfg ^ - --clips rating_clips.csv ^ - --training_clips training_clips.csv ^ - --gold_clips gold_clips.csv ^ - --trapping_clips trapping_clips.csv - ``` - Note: file paths are expected to be relative to the current working directory. - - 1. Double check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working - directory which contains: - * `YOUR_PROJECT_NAME_p835.html`: Customized HIT app to be used in Amazon Mechanical Turk (AMT). - * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT. - * `YOUR_PROJECT_NAME_acr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script - -Now, you are ready for [Running the Test on Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file + the entire range of scales, rather the framework proposed in ETSI 103 281 Annex D can cover the entire range. We + recommend to use [3gpp_p501_FB](../p835_reference_conditions/3gpp_p501_FB) which is created based on the ETSI/3GPP framework. + +1. Upload your **training clips** in a cloud server and create `training_clips.csv` file which contains all URLs in a +column named `training_clips` (see [training_clips.csv](../src/test_inputs/training_clips.csv) as an example). 
+ + **Hint**: Training clips are used for anchoring participants' perception, and should represent the entire dataset. + They should approximately cover the range from worst to best quality to be expected in the test. It may contain + about 5 clips. + +1. Upload your **gold standard clips** in a cloud server and create `gold_clips.csv` file which contains all URLs in a +column named `gold_clips` and expected answer to each clip in a column named `gold_clips_ans` +(see [gold_clips.csv](../src/test_inputs/gold_clips.csv) as an example). + + **Hint**: Gold standard clips are used as a hidden quality control item in each session. It is expected that their + answers are so obvious for all participants that they all give the `gold_clips_ans` rating (+/- 1 deviation is + accepted) to the "overall quality" question. It is recommended to use clips with excellent (answer 5) or very bad + (answer 1) quality. + You can use `create_gold_clips.py` to generate gold clips automatically. See [Gold Standard Clips](gold_clips.md) for details. + +1. Create trapping stimuli set for your dataset. + + 1. Configure the `create_trapping_stimuli.py` in your config file. See [configuration of create_trapping_stimuli script](conf-trapping.md) + for more information. An example is provided in `configurations\trapping_p835.cfg`. + + 2. Delete all files from `trapping_clips_assets\source` directory + ``` bash + cd "src\trapping_clips_assets\source" + del *.* + ``` + 3. Add some clips from your dataset to `trapping_clips_assets\source` directory. Select clips in a way that + 1. Covers a fair distribution of speakers (best couple of clips per each speaker) + 1. Covers the entire range of quality (some good, fair, and bad ones) + + 4. Run `create_trapping_stimuli.py` + ``` bash + cd src + python create_trapping_stimuli.py ^ + --cfg your_config_file.cfg + ``` + 5. Trapping clips are stored in `trapping_clips_assets\output` directory. 
List of clips and their correct answer can + be found in `trapping_clips_assets\output\output_report.csv`. You can replace file names (appears in column named `trapping_clips`) + with the URLs pointing to those files to create the `trapping_clips.csv` file (see below). + +1. Upload your **trapping clips** in a cloud server and create `trapping_clips.csv` file which contains all URLs in +a column named `trapping_clips` and expected answer to each clip in a column named `trapping_ans` +(see [trapping_clips.csv](../src/test_inputs/trapping_clips.csv) as an example). + +1. Create your custom project by running the master script: + + 1. Configure the project in your config file. See [master script configuration](conf_master.md) for more information. + + 1. Run `master_script.py` with all above-mentioned resources as input + + ``` bash + cd src + python master_script.py ^ + --project YOUR_PROJECT_NAME ^ + --method p835 ^ + --cfg your_configuration_file.cfg ^ + --clips rating_clips.csv ^ + --training_clips training_clips.csv ^ + --gold_clips gold_clips.csv ^ + --trapping_clips trapping_clips.csv + ``` + Optionally: + - Add `--check_urls` to validate that all links in the CSV files are accessible before creating the project. + - Add `--create_local_test` to generate a local preview HTML file for testing. See [preview_html](preview_html.md) for details. + + Note: file paths are expected to be relative to the current working directory. + + 1. Double-check the outcome of the script. A folder should be created with YOUR_PROJECT_NAME in current working + directory which contains: + * `YOUR_PROJECT_NAME_p835.html`: Customized HIT app to be used in Amazon Mechanical Turk (AMT). + * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT. 
+ * `YOUR_PROJECT_NAME_acr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script + +Now, you are ready for running the test on [Prolific](running_test_prolific.md) or [Amazon Mechanical Turk](running_test_mturk.md). \ No newline at end of file diff --git a/docs/preparation.md b/docs/preparation.md index 6805812..604c513 100644 --- a/docs/preparation.md +++ b/docs/preparation.md @@ -19,7 +19,7 @@ The following steps should be performed to prepare the test setup. pip install -r requirements.txt ``` -1. (optional) Upload the general resources (found in `src\P809Template\assets`) in a cloud server and change the +1. (optional) Upload the general resources (found in `src\P808Template\assets`) in a cloud server and change the URLs associated to them as described in [General Resources](general_res.md) 1. Follow the rest of preparation process based on the test methodology you want to apply: @@ -31,3 +31,8 @@ URLs associated to them as described in [General Resources](general_res.md) - Preparation for the P.831 - Preparation for Personalized P.835 - [Preparation for the P.804](prep_p804.md) + +## Utility Scripts + +- [Gold Standard Clips](gold_clips.md) — Generate gold clips for quality control. +- [Upload Clips to Storage](upload_clips.md) — Upload local clips or copy from private to public Azure storage. diff --git a/docs/preview_html.md b/docs/preview_html.md new file mode 100644 index 0000000..3e1bd9e --- /dev/null +++ b/docs/preview_html.md @@ -0,0 +1,39 @@ +# Preview HTML + +`preview_html.py` generates a local preview of the HIT by substituting one row from the +`_publish_batch.csv` into the generated `.html` template. + +Non-public asset URLs (`.js`, `.css`) are replaced with publicly accessible CDN equivalents +so the preview works without downloading external resources. 
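The row substitution described above can be sketched as follows. This is a rough illustration, assuming the generated template uses AMT-style `${column}` placeholders; the helper `render_preview` and the column name `Q0` are hypothetical, and the actual `preview_html.py` may work differently.

```python
import csv
import io
from string import Template


def render_preview(template_html: str, batch_csv: str, row_index: int = 0) -> str:
    """Fill ${column} placeholders in the template with one row of the batch CSV.

    Hypothetical helper: the name, signature, and placeholder syntax are
    assumptions made for this sketch.
    """
    rows = list(csv.DictReader(io.StringIO(batch_csv)))
    # safe_substitute leaves unknown or malformed placeholders untouched,
    # so stray "$" characters in inline JavaScript do not raise an error.
    return Template(template_html).safe_substitute(rows[row_index])


template = '<audio src="${Q0}"></audio>'
batch = "Q0\nhttps://example.com/clip_a.wav\n"
print(render_preview(template, batch))
```

`safe_substitute` is used instead of `substitute` because a HIT page typically contains script code with `$` characters that are not placeholders.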
+ +## Usage + +```bash +cd src +python utils/preview_html.py --dir YOUR_PROJECT_NAME --samples 1 +``` + +| Argument | Required | Default | Description | +|-----------|----------|---------|-------------| +| `--dir` | Yes | — | Directory containing the `.html`, `_publish_batch.csv`, and `.cfg` files produced by `master_script.py`. | +| `--samples` | No | 1 | Number of CSV rows to generate preview files for. | + +The output is saved in the same directory as `_row-1.html`. + +## Automatic generation + +Pass `--create_local_test` to `master_script.py` to automatically generate one preview file +after the project is created: + +```bash +python master_script.py ^ + --project YOUR_PROJECT_NAME ^ + --method acr ^ + --cfg your_configuration_file.cfg ^ + --clips rating_clips.csv ^ + --training_clips training_clips.csv ^ + --gold_clips gold_clips.csv ^ + --trapping_clips trapping_clips.csv ^ + --check_urls ^ + --create_local_test +``` diff --git a/docs/results.md b/docs/results.md index 0bf0127..9290a82 100644 --- a/docs/results.md +++ b/docs/results.md @@ -36,10 +36,11 @@ created in the first step ([preparation](preparation.md)). ``` * `--cfg` use the configuration file generated for your project in the [preparation](preparation.md) step here (i.e.`YOUR_PROJECT_NAME_ccr_result_parser.cfg`). * `--method` could be either `acr`, `dcr`, `ccr`, `p835`, `pp835` or `p804`. + * if using Prolific, provide the csv file you downloaded from that platform as `--prolific_answers`. The answer you downloaded from HITAppServer should be provided as `--answers` * `--quantity_bonus` could be `all`, or `submitted`. It specify which assignments should be considered when calculating the amount of quantity bonus (everything i.e. `all` or just the assignments with status submitted i.e. `submitted`). - Beside the console outputs, following files will be generated in the same directory as the `--answers` file is located in. 
+ Besides the console outputs, following files will be generated in the same directory as the `--answers` file is located in. All file names will start with the `--answers` file name. * `[downloaded_batch_result]_data_cleaning_report`: Data cleansing report. Each line refers to one line in answer file. * `[downloaded_batch_result]_accept_reject_gui.csv`: A report to be used for approving and rejecting assignments. One line @@ -58,7 +59,7 @@ created in the first step ([preparation](preparation.md)). the "processed" clip Compared to the Quality of the "reference/unprocessed" Clip is .. (Much Worse:-3 to Much Better:+3)." On the loading time of Rating Section in the HIT APP order of processed and reference clips are randomized, but the sign of vote is always corrected to answer the above-mentioned question. Order of presentation is also saved in the downloaded - csv file from AMT per question in column `Answer.Qx_order` where x is the question's number. `rp` refers to refernce-processed. + csv file from AMT per question in column `Answer.Qx_order` where x is the question's number. `rp` refers to reference-processed. Note for **P835** method: * for each of the `Signal`, `Background` and `Overall` quality scales, aggregated ratings will be stored in a separate csv file @@ -67,11 +68,22 @@ created in the first step ([preparation](preparation.md)). * `[downloaded_batch_result]_votes_per_cond_[postfix].csv`: Aggregated result per condition. * In addition a summary in the condition level will be provided for all three scales in `[downloaded_batch_result]_votes_per_cond_all`. + +## Approve/Reject submissions - Prolific + - Get your API token from the Prolific website and add it to your [Prolific config file](../src/configurations/prolific.cfg). 
+ + - Run the command below: + + ``` bash + cd src + python prolific_utils.py ^ + --cfg your_prolific_configuration_file.cfg ^ + --review [path to your project's root directory] + ``` - -## Approve/Reject submissions +## Approve/Reject submissions - AMT -Depending to how you create the HITs (using the AMT website or script) you should use the same method for approving/rejecting +Depending on how you create the HITs (using the AMT website or script) you should use the same method for approving/rejecting submission. ### Approve/Reject submissions - using website. @@ -92,7 +104,7 @@ submission. ``` -## Assign bonuses +## Assign bonuses - AMT only 1. Run the following script with both `[downloaded_batch_result]_quantity_bonus_report.csv` and `[downloaded_batch_result]_quality_bonus_report.csv`: @@ -103,7 +115,7 @@ submission. --cfg mturk.cfg ^ --send_bonus [downloaded_batch_result]_*_bonus_report.csv ``` - ## Extending HITs + ## Extending HITs - AMT only In case you want to reach the intended number of votes per clips, you may use the following procedure: diff --git a/docs/running_test_prolific.md b/docs/running_test_prolific.md new file mode 100644 index 0000000..2181e00 --- /dev/null +++ b/docs/running_test_prolific.md @@ -0,0 +1,53 @@ +[Home](../README.md) > Running the Test on Prolific + +# Running the Test on Prolific + +The following steps explain how to conduct the Speech Quality Assessment test on the Prolific crowdsourcing platform according to the ITU-T Rec. P.808 [1]. +It is required to perform the [preparation](preparation.md) step first. +As a result you should have a directory named YOUR_PROJECT_NAME which contains: + + * `YOUR_PROJECT_NAME_METHOD.html`: Customized HIT app template to be used in AMT/Prolific. + * `YOUR_PROJECT_NAME_publish_batch.csv`: List of dynamic content to be used during publishing batch in AMT.
+ * `YOUR_PROJECT_NAME_ccr_result_parser.cfg`: Customized configuration file to be used by `result_parser.py` script + +You will use the first two files in this part. + +In parallel you need to use a system to host your survey. One option is [HITAppServer](https://github.com/microsoft/P.910/tree/main/hitapp_server). The following procedure is customized to the HIT APP server and assumes a running server is available. + + +## Step 1: Create the test on HITApp Server +1. Visit your HITApp server landing page. +2. Start by "New project" +3. Provide the required information: + - Project name: Friendly internal name + - Number of assignments: how many votes per file do you aim for (here it is only for statistics) + - Crowdsourcing platform: select "Prolific" + - Choose HTML file: Upload `YOUR_PROJECT_NAME_METHOD.html` + - Choose CSV file with variables: Upload `YOUR_PROJECT_NAME_publish_batch.csv` + - Click on Submit +4. When finished download: + - download "HIT input file" + - download "HIT description" + +## Step2: Create the test on Prolific + - Login to your Prolific account + - Create a Project to organize your studies. + - Go to your project and create a New Study + - Use "Taskflow", fill the required information including study name, study description (the above "HIT Description" with some edits can be used here) + - Fill the rest of the information, and upload the above "HIT input file" into "upload the URL file". + - Set "How many total taskers do you need?" to be the number of votes you need × number of rows in "HIT input file" (one row in this file will be one task, and you need N votes per task) + - From *recording Prolific IDs*, select "URL parameters". Keep the values as it is. + - In "screening" select any prerequisite your participants should have. + - From *Submissions* select *Multiple*, and use max(50, N of tasks) +## Step3: publish + - Fill the rest of the information including a fair payment and follow the procedure on the Prolific website. 
+ + +When the Batch is finished, download the answers from Prolific and HITAppServer and then you are ready to continue with [analyzing the results](results.md). + - "Download demographic data" from the study page in Prolific. + - Download the answers from your study page in the HITApp server. + + +## References +[1]. [ITU-T Recommendation P. 808](https://www.itu.int/rec/T-REC-P.808/en): _Subjective evaluation of speech quality with a crowdsourcing approach_, International Telecommunication Union, Geneva, 2018. + \ No newline at end of file diff --git a/docs/upload_clips.md b/docs/upload_clips.md new file mode 100644 index 0000000..7fb4ca8 --- /dev/null +++ b/docs/upload_clips.md @@ -0,0 +1,114 @@ +[Home](../README.md) > [Preparation](preparation.md) > Upload Clips to Storage + +# Upload Clips to Storage + +The `copy_to_pub_storage.py` utility prepares audio clips for crowdsourcing studies by uploading +or copying them to a publicly accessible Azure Blob Storage container. It updates the CSV report +with public URLs ready for use with `master_script.py`. + +## Prerequisites + +- [azcopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10) must be installed and available in your PATH. +- A destination Azure Blob Storage account with a container created. +- SAS tokens with appropriate permissions (see below). + +## Use Case 1: Upload Local Clips + +After generating gold clips or trapping clips locally (e.g. with `create_gold_clips.py` or +`create_trapping_stimuli.py`), upload them to Azure Blob Storage. + +### Step 1: Prepare the upload + +```bash +cd src +python utils/copy_to_pub_storage.py upload-local ^ + --input path/to/gold_clips_report.csv ^ + --columns gold_clips ^ + --local-dir path/to/gold_clips_directory ^ + --dest-storage-url https://ACCOUNT.blob.core.windows.net ^ + --target-container my-study-clips +``` + +This produces: +- `gold_clips_report_to_upload.txt` — file list for azcopy. 
+- `gold_clips_report_public.csv` — updated CSV with public URLs. + +### Step 2: Run the azcopy command + +The script prints an azcopy command. Replace `[SAS_TOKEN_WITH_WRITE_CREATE]` with a SAS token +that has **write** and **create** permissions on the target container, then run it. + +### Step 3: Use the updated CSV + +Pass `gold_clips_report_public.csv` as `--gold_clips` to `master_script.py`. + +## Use Case 2: Copy from Private to Public Storage + +When clips are stored in a private Azure storage account and need to be moved to a public +container for the study. + +### Step 1: Prepare the copy + +```bash +cd src +python utils/copy_to_pub_storage.py copy-remote ^ + --input rating_clips.csv ^ + --columns rating_clips ^ + --src-url https://private.blob.core.windows.net/container ^ + --dest-storage-url https://public.blob.core.windows.net ^ + --target-container my-study-clips +``` + +This produces: +- `rating_clips_to_copy.txt` — file list for azcopy. +- `rating_clips_public.csv` — updated CSV with public URLs. + +### Step 2: Run the azcopy command + +Replace both SAS token placeholders: +- `[SAS_TOKEN_WITH_READ]` — read permission on the source container. +- `[SAS_TOKEN_WITH_WRITE_CREATE]` — write/create permission on the target container. + +## Arguments + +### upload-local + +| Argument | Required | Description | +|----------|----------|-------------| +| `--input` | Yes | Path to the input CSV file. | +| `--columns` | Yes | Column names containing clip filenames (space-separated). | +| `--local-dir` | Yes | Local directory containing the clip files. | +| `--dest-storage-url` | Yes | Base URL of the destination storage account. | +| `--target-container` | Yes | Name of the target blob container. | +| `--cdn-base-url` | No | Base URL for public access. Defaults to `--dest-storage-url`. | + +### copy-remote + +| Argument | Required | Description | +|----------|----------|-------------| +| `--input` | Yes | Path to the input CSV file. 
| +| `--columns` | Yes | Column names containing clip URLs (space-separated). | +| `--src-url` | Yes | URL of the source blob storage container. | +| `--dest-storage-url` | Yes | Base URL of the destination storage account. | +| `--target-container` | Yes | Name of the target blob container. | +| `--cdn-base-url` | No | Base URL for public access. Defaults to `--dest-storage-url`. | + +## Multiple Columns + +Both modes support multiple columns. For example, to upload gold and trapping clips in one pass: + +```bash +python utils/copy_to_pub_storage.py upload-local ^ + --input combined_report.csv ^ + --columns gold_clips trapping_clips ^ + --local-dir path/to/clips ^ + --dest-storage-url https://ACCOUNT.blob.core.windows.net ^ + --target-container my-study-clips +``` + +## SAS Token Tips + +- Generate SAS tokens from the Azure Portal: Storage Account > Shared access signature. +- For the **source** (read): select "Read" and "List" permissions, Container scope. +- For the **target** (write): select "Write" and "Create" permissions, Container scope. +- Set an appropriate expiry date (e.g. 24 hours for a one-time copy). diff --git a/src/P808Template/ACR_template.html b/src/P808Template/ACR_template.html index e72b42c..c3a7734 100644 --- a/src/P808Template/ACR_template.html +++ b/src/P808Template/ACR_template.html @@ -7,9 +7,9 @@ --> - - - + + + @@ -49,7 +49,7 @@ fieldset { padding: 10px; background:#fbfbfb; border-radius:5px; margin-bottom:5px; } - - - @@ -1108,19 +1208,19 @@

  • You must perform the task in a quiet environment
  • Do not change the volume after modifying it in the Setup section.
  • - -

    Payment:

    -

    The result of this experiment is very important for us and other scientist working in this area. We have methods that analyse the consistency of your answers. We will use these methods to rank the submitted assignments according to quality.

    -

    For this experiment, we will pay a base reward of ${{cfg.hit_base_payment}}/HIT for every accepted HIT. We have made available a set of 0 different HITs. You will receive a bonus of:

    - -
      -
    • ${{cfg.quantity_bonus}}/HIT (for a total of ${{cfg.sum_quantity}}/HIT) if you submit more than {{cfg.quantity_hits_more_than}} HITs that got accepted or
    • -
    • ${{cfg.quality_bonus}}/HIT (for a total of ${{cfg.sum_quality }}/HIT) if you submit more than {{cfg.quantity_hits_more_than}} HITs and be in the top {{cfg.quality_top_percentage}}% quality group.
    • -
    - -

    Bonuses will be assigned with in 7 days.

    - Please perform up to 0 HITs from this group. If you do more than that, the rest will be rejected. - +
    +

    Payment:

    +

    The result of this experiment is very important for us and other scientists working in this area. We have methods that analyse the consistency of your answers. We will use these methods to rank the submitted assignments according to quality.

    +

    For this experiment, we will pay a base reward of ${{cfg.hit_base_payment}}/HIT for every accepted HIT. We have made available a set of 0 different HITs. You will receive a bonus of:

    + +
      +
    • ${{cfg.quantity_bonus}}/HIT (for a total of ${{cfg.sum_quantity}}/HIT) if you submit more than {{cfg.quantity_hits_more_than}} HITs that got accepted or
    • +
    • ${{cfg.quality_bonus}}/HIT (for a total of ${{cfg.sum_quality }}/HIT) if you submit more than {{cfg.quantity_hits_more_than}} HITs and are in the top {{cfg.quality_top_percentage}}% quality group.
    • +
    + +

    Bonuses will be assigned within 7 days.

    + Please perform up to 0 HITs from this group. If you do more than that, the rest will be rejected. +

    Attention:

    This HIT includes one or more control clips (gold clips). Control clips are ones that we know the answer for and that should be very easy to rate (they are clearly very good or very poor). They may target one or more scales. We include control clips in the HIT to ensure raters are paying attention and their environment hasn't changed. A wrong answer to a control clip will result in rejection of the HIT.

    @@ -1170,7 +1270,7 @@

    -
    +
    @@ -1178,8 +1278,8 @@

    - - + + @@ -1196,13 +1296,13 @@

    - +
    @@ -1214,9 +1314,9 @@

    @@ -1260,7 +1360,7 @@

    -
    +
    @@ -1414,7 +1514,7 @@

     
    -
    Result:  
    +
    Result:  

    @@ -1539,7 +1639,7 @@

    After that, you might see a popup message from your browser asking for permission to use your microphone. Please click on Allow (on some smartphones there will be no message). We do not process or record any audio with your microphone.
    - +
    @@ -1572,8 +1672,8 @@

    Click next to answer all 0 questions.
    @@ -1587,6 +1687,8 @@

    + + @@ -1597,6 +1699,11 @@

    + + + + +
    @@ -1614,8 +1721,8 @@

    Ratings

    Click next to answer all 0 questions, then submit your response.
    diff --git a/src/P808Template/CCR_template.html b/src/P808Template/CCR_template.html index 5bf1d10..b4eb989 100644 --- a/src/P808Template/CCR_template.html +++ b/src/P808Template/CCR_template.html @@ -7,15 +7,14 @@ --> - - - + + + - - - - - + + + @@ -49,7 +49,7 @@ fieldset { padding: 10px; background:#fbfbfb; border-radius:5px; margin-bottom:5px; } - - - - + + + @@ -243,7 +241,7 @@ - - - - @@ -2614,7 +2645,7 @@
    Warning

    - +

    @@ -2626,7 +2657,7 @@
    Warning
    -

    HIT Type: 804

    New: we publish different HIT types which may look similar but have different instructions and aims. This number is an identifier for you to recognize them.

    +

    HIT Type: 804

    Introduction @@ -2650,7 +2681,7 @@

    scale -

    Welcome to this Data Labeling task! You are about to participate in our speech quality assessment task! It is a data labeling task in which you are responsible for providing reliable labels that pass our qualifity control system. This HIT has two + +

    Welcome to this Data Labeling task! You are about to participate in our speech quality assessment task! It is a data labeling task in which you are responsible for providing reliable labels that pass our quality control system. This HIT has two + two (just every now and then) sections:

    • Qualification (just once): Check if you are eligible to perform these HITs
    • @@ -2670,32 +2701,33 @@

    • You must perform the task in a quiet environment
    • Do not change the volume after modifying it in the Setup section.
    +
    +

    Payment:

    +

    The result of this experiment is very important for us and other scientists working in this + area. We have methods that analyse the consistency of your answers. We will use these methods to + rank the submitted assignments according to quality.

    +

    For this experiment, we will pay a base reward of ${{cfg.hit_base_payment}}/HIT for every + accepted HIT. We have made available a set of 0 different HITs. You + will receive a bonus of:

    -

    Payment:

    -

    The result of this experiment is very important for us and other scientist working in this - area. We have methods that analyse the consistency of your answers. We will use these methods to - rank the submitted assignments according to quality.

    -

    For this experiment, we will pay a base reward of ${{cfg.hit_base_payment}}/HIT for every - accepted HIT. We have made available a set of 0 different HITs. You - will receive a bonus of:

    - -
      -
    • ${{cfg.quantity_bonus}}/HIT (for a total of ${{cfg.sum_quantity}}/HIT) if you submit - more than {{cfg.quantity_hits_more_than}} HITs or
    • -
    • ${{cfg.quality_bonus}}/HIT (for a total of ${{cfg.sum_quality }}/HIT) if you submit - more than {{cfg.quantity_hits_more_than}} HITs and be in the top - {{cfg.quality_top_percentage}}% quality group.
    • -
    - -

    Bonuses will be assigned with in 7 days.

    - Please perform up to 0 HITs from this group. If you do more than - that, the rest will be rejected. +
      +
    • ${{cfg.quantity_bonus}}/HIT (for a total of ${{cfg.sum_quantity}}/HIT) if you submit + more than {{cfg.quantity_hits_more_than}} HITs or
    • +
    • ${{cfg.quality_bonus}}/HIT (for a total of ${{cfg.sum_quality }}/HIT) if you submit + more than {{cfg.quantity_hits_more_than}} HITs and be in the top + {{cfg.quality_top_percentage}}% quality group.
    • +
    +

    Bonuses will be assigned within 7 days.

    + Please perform up to 0 HITs from this group. If you do more than + that, the rest will be rejected. +

    Attention:

    This HIT includes one or more control clips (gold clips). Control clips are ones that we know the answer for and that should be very easy to rate (they are clearly very good or very poor). They may target one or more scales. We include control clips in the HIT to ensure raters are paying attention and their environment hasn't changed. A wrong answer to a control clip will result in rejection of the HIT.

    +
    @@ -2740,8 +2772,8 @@

    -
    +

    in-ear headphones Over-the-ear headphones louadspeaker inbuild speakers loudspeaker built-in speakers
    @@ -2752,13 +2784,13 @@

    alt="in-ear headphones" width="100px">

    + alt="Closed-back headphones" width="100px"> + alt="loudspeaker" width="100px"> + alt="built-in speakers" width="100px"> + alt="loudspeaker" width="100px"> + alt="built-in speakers" width="100px"> - - + + @@ -1406,7 +1554,7 @@

    - +
    +
    Result:  
    @@ -1749,7 +1897,7 @@

    After that, you might see a popup message from your browser asking for permission to use your microphone. Please click on Allow (on some smartphones there will be no message). We do not process or record any audio with your microphone.
    - +
    @@ -1779,7 +1927,7 @@

    Note: The order of questions may change from one HIT to the other.

    -

    Please provide your rating for the following 0 trials. For each rating in a trial, you should listen to the audio sample again and give your opinion on the corresponding scale. Note that the scale will be activated when the speech sample played until the end. In case you hear an interruption message, please follow the instruction given in the message.

    +

    Please provide your rating for the following 0 trials. For each rating in a trial, you should listen to the audio sample again and give your opinion on the corresponding scale. Note that the scale will be activated once the speech sample has played to the end. In case you hear an interruption message, please follow the instructions given in the message.

    @@ -1789,8 +1937,8 @@

    Click Next Trial to answer all 0 trials.
    @@ -1804,19 +1952,24 @@

    - - - + + + + + + + + -
    +

    Ratings

    @@ -1831,7 +1984,7 @@

    Ratings

    Note: The order of questions may change from one HIT to the other.

    -

    Please provide your rating for the following 0 trials. For each rating in a trial, please listen to the audio sample again and give your opinion on the corresponding scale. Note that the scale will be activated when the speech sample played until the end. In case you hear an interruption message, please follow the instruction given in the message.

    +

    Please provide your rating for the following 0 trials. For each rating in a trial, please listen to the audio sample again and give your opinion on the corresponding scale. Note that the scale will be activated once the speech sample has played to the end. In case you hear an interruption message, please follow the instructions given in the message.

    @@ -1842,8 +1995,8 @@

    Ratings

    Click Next trial to answer all 0 trials, then submit your response.
    @@ -1854,7 +2007,7 @@

    Ratings

    Thanks for your participation. Please perform more HITs from this group when they are available for you.

    -

    Note: The submit button works only if you answer to all questions. Make sure to answer to all 3 questions in each trial. Click here to see which questions in the "Rating" section are not answered?

    +

    Note: The submit button works only if you answer all questions. Make sure to answer all 3 questions in each trial. Click here to see which questions in the "Rating" section are not answered.

    Closed back headphones louadspeaker inbuild speakers
    @@ -3002,8 +3034,7 @@

     
    -
    Result:  
    +
    Result:  
    @@ -3156,7 +3187,7 @@

    Please click on Allow (on some smartphones there will be no message). We do not process or record any audio with your microphone.
    - +
    @@ -3352,7 +3383,7 @@

    Attention:

    Scores showing the best quality are: Not noisy, Optimal loudness, Continuous, Uncolored, No reverb, Not distorted, and Excellent quality.

    Scores showing the worst quality are: Noisy, Sub-optimal loudness, Discontinuous, Colored, High reverb, Very distorted, and Bad quality.

    - +

    @@ -3413,10 +3444,10 @@

    @@ -3500,7 +3537,7 @@

    Ratings

    Thanks for your participation. Please perform more HITs from this group when they are available for you.

    Note: The submit button works only if you answer all questions. Make sure to answer all 7 questions in each - trial. Click here to see which + trial. Click here to see which questions in the "Rating" section are not answered?

    diff --git a/src/P808Template/P831_ACR_template.html b/src/P808Template/P831_ACR_template.html index 2556c27..083ab97 100644 --- a/src/P808Template/P831_ACR_template.html +++ b/src/P808Template/P831_ACR_template.html @@ -5,17 +5,15 @@ *--------------------------------------------------------------------------------------------*/ @author: Babak Naderi --> - - - - + + + - - - - - + + + @@ -49,7 +49,7 @@ fieldset { padding: 10px; background:#fbfbfb; border-radius:5px; margin-bottom:5px; } - - - - + + + - - - - - @@ -1765,7 +1792,7 @@
    Warning
    louadspeaker inbuild speakers
    @@ -2219,8 +2247,7 @@

     
    -
    Result:  
    +
    Result:  
    @@ -2373,7 +2400,7 @@

    Please click on Allow (on some smartphones there will be no message). We do not process or record any audio with your microphone.
    - +
    @@ -2466,10 +2493,10 @@

    Important examples:

    in-ear headphones Over-the-ear headphones louadspeaker inbuild speakers loudspeaker built-in speakers
    @@ -287,8 +364,8 @@

    - - + + @@ -305,13 +382,13 @@

    - +
    @@ -323,9 +400,9 @@

    @@ -368,7 +445,7 @@

    -
    +
    @@ -408,11 +485,18 @@

    -
    Thank you for your participation. The qualifications will be assigned to selected group of participants in up to next 3 days.
    +
    Thank you for your participation. The qualifications will be assigned to a selected group of participants within the next 3 days.
    + + + + + + + \ No newline at end of file diff --git a/src/P808Template/README.md b/src/P808Template/README.md index b5fdea7..9077b3a 100644 --- a/src/P808Template/README.md +++ b/src/P808Template/README.md @@ -1,8 +1,11 @@ # HIT App Templates -Templates for ACR, DCR, and CCR methods to be used in Amazon Mechanical Turk platform. +Templates for ACR, DCR, and CCR methods to be used in Amazon Mechanical Turk platform. The ACR implementation is based on the ITU-T P.808 Recommendation and implementation of DCR and CCR are based on ITU-T P.800 Recommendation adapted to the crowdsourcing approach. +Note: JQuery and Bootstrap are required for templates to work. We provide a public link for them, however if the link stop working, you can change them with other CDNs. + + ## Qualification The qualification can be used as a separate HIT or in the integrated mode i.e. a section in the main HIT. The qualification contains 10 questions including a hearing test. @@ -36,7 +39,7 @@ You can specify how often this section show up. It is recommended to have it in This section uses WebRTC to check if the user has a headset. #### Training (every X hours) -Here a small set of anchoring stimuli presented to worker in a similar GUI as the "rating section". +Here a small set of anchoring stimuli presented to worker in a similar GUI as the "rating section". Training section is generated dynamically based on list of URLs in the `config['trainingUrls']`. These will be added by master_script. Order of stimuli can be randomized, and retraining can be forced after X hours (see the Usage section). @@ -61,7 +64,7 @@ The implementation of Comparison Category Rating (CCR) listening test as specifi The template has same structure as ACR_Template. ## P835_template -The implementation of the ITU-T Rec. P.835 recommendation. An extension of it for only one time listening are given on +The implementation of the ITU-T Rec. P.835 recommendation. 
An extension for single-listening is given in P835_template_one_audio.html. ## P831_ACR_template diff --git a/src/P808Template/bw_check.html b/src/P808Template/bw_check.html index aceecb6..5611399 100644 --- a/src/P808Template/bw_check.html +++ b/src/P808Template/bw_check.html @@ -7,9 +7,9 @@ --> - - - + + + @@ -108,7 +108,7 @@ } - - - - + + + @@ -52,7 +52,7 @@ .done {background-color: #2ecc71;} - - - - + + + @@ -52,7 +52,7 @@ .done {background-color: #2ecc71;} - - -

    in-ear headphones Over-the-ear headphones louadspeaker inbuild speakers loudspeaker built-in speakers