Merged
51 commits
711d142
feat: add naive prompt optimizer with iterative optimization framework
DanielDango Nov 28, 2025
7887f8d
fix: correct typos and improve documentation in various files as comm…
DanielDango Nov 28, 2025
4a7744a
add feedback optimizer implementation
DanielDango Nov 28, 2025
75dddd1
wip: add more logging
DanielDango Dec 1, 2025
565a80e
feat: enhance cache parameters to include classifier type in Reasonin…
DanielDango Dec 3, 2025
883ede1
feat: Enhance logging for cache operations and misclassification checks
DanielDango Dec 11, 2025
cb89983
Rework caching. Now it shall be ensured that Cacheparameters are used…
dfuchss Dec 11, 2025
fb1de00
Update ArchitectureTest.java
dfuchss Dec 11, 2025
3e7facf
feat: Remove caching from GlobalMetric and PointwiseMetric for prompt…
DanielDango Dec 23, 2025
e31c6c1
Merge remote-tracking branch 'upstream/feature/simplify-caching' into…
DanielDango Dec 23, 2025
c5fff3f
chore: update formatting
DanielDango Dec 23, 2025
abd5a4f
chore: update cache parameter usage
DanielDango Dec 23, 2025
317d3cc
feat: add additional logging
DanielDango Dec 23, 2025
63df5fb
fix: revert reduced target store deduplication
DanielDango Dec 23, 2025
a8d1d47
fix: re-enable other e2e optimizer tests
DanielDango Dec 23, 2025
b09dea2
fix: enable correct element store deduplication and fix typo in itera…
DanielDango Dec 30, 2025
93f87e5
chore: bump license to 2026
DanielDango Jan 22, 2026
4929a71
fix: make prompt key field statically available to resolve null poin…
DanielDango Jan 7, 2026
a14496b
Merge remote-tracking branch 'upstream/main' into feature/add-prompt-…
DanielDango Jan 22, 2026
83b5b7b
revert: Evaluator module only required for future gradient optimizer
DanielDango Jan 22, 2026
6cbd159
refactor: address review of copilot
DanielDango Jan 22, 2026
f6f1895
Merge remote-tracking branch 'upstream/main' into feature/add-prompt-…
DanielDango Jan 22, 2026
bc44e6b
docs: add docstrings missed in #48
DanielDango Jan 22, 2026
03d1d0d
fix: revert evaluator param from javadoc
DanielDango Jan 22, 2026
f1da525
docs: add missing javadoc
DanielDango Jan 22, 2026
d0240e9
feat: introduce ModuleConfiguration.with() to overwrite the classific…
DanielDango Jan 22, 2026
1dc90b1
revert: apparently I accidentally reverted changes to CacheKey changes?
DanielDango Jan 22, 2026
e7bf61b
revert: Classifier copyOf visibility returned to protected
DanielDango Jan 22, 2026
5c3acbd
chore: undo updated license for not actually modified files
DanielDango Jan 30, 2026
2b82f4d
Merge remote-tracking branch 'upstream/main' into feature/add-prompt-…
DanielDango Feb 1, 2026
b0f9e28
chore: implement various suggestions for improvement
DanielDango Feb 4, 2026
7b0be53
refactor: rename Configuration to EvaluationConfiguration
DanielDango Feb 4, 2026
7cff7f7
docs: add documentation for prompt optimizers
DanielDango Feb 4, 2026
c6267e6
Update docs/prompt-optimization.md
DanielDango Feb 4, 2026
71e0f7c
Update docs/prompt-optimization.md
DanielDango Feb 4, 2026
e25257b
chore: implement various feedbacks raised during meeting
DanielDango Feb 5, 2026
e4fbc17
refactor: discontinue usage of mock classifier in feedback optimizer …
DanielDango Feb 6, 2026
f3703c9
Merge remote-tracking branch 'upstream/main' into feature/add-prompt-…
DanielDango Feb 16, 2026
d2cedeb
unify logger name
dfuchss Feb 16, 2026
c73670e
Fix configuration hierarchy
dfuchss Feb 16, 2026
d47995c
refactor: fix package hierarchy
DanielDango Feb 18, 2026
09dee58
fix: apply feedback from @copilot
DanielDango Feb 18, 2026
582173b
refactor: Merge factory methods into the related interfaces
DanielDango Feb 18, 2026
95d905f
refactor: Dont reimplement the f1-wheel
DanielDango Feb 18, 2026
6a3a0d3
apply spotless
DanielDango Feb 18, 2026
d5f758b
revert: remove evaluator from config which is to be added with protegi
DanielDango Feb 18, 2026
23bac91
fix: ensure correct argument retrieval for ModuleConfiguration.with()
DanielDango Feb 18, 2026
822116d
Modify configuration in constructor
dfuchss Feb 18, 2026
0356039
Assert -> if/except
dfuchss Feb 18, 2026
632a6c9
feat: use newly added fBeta metric instead of custom implementation
DanielDango Feb 18, 2026
80af533
docs: clarify Evaluation constructor behavior regarding prompt overwr…
DanielDango Feb 19, 2026
30 changes: 30 additions & 0 deletions docs/cli.md
@@ -33,3 +33,33 @@ Runs the pipeline in transitive mode and evaluates it. This is useful for multi-
java -jar ./ratlr.jar transitive -c ./configs/d2m.json ./configs/m2c.json -e ./configs/eval.json
```

## Prompt Optimization

Optimizes prompts used in trace link classification to improve performance.
This command runs the prompt optimization pipeline and optionally evaluates the optimized prompts against evaluation configurations.

The optimization process:
1. Runs baseline evaluation (if evaluation configs are provided)
2. Executes the prompt optimizer with the specified optimization configuration
3. Re-runs evaluation with the optimized prompt to measure improvement

As only the optimized prompt is transferred from the optimization results to the evaluation, other configuration parameters (e.g., model, dataset) do not have to match between optimization and evaluation configurations.

### Examples

```bash
# Run optimization with a single config
java -jar ./ratlr.jar optimize -c ./example-configs/optimizer-config.json

# Run optimization and evaluate the results
java -jar ./ratlr.jar optimize -c ./example-configs/optimizer-config.json -e ./example-configs/simple-config.json

# Run optimization with directories
java -jar ./ratlr.jar optimize -c ./configs/optimization -e ./configs/evaluation
```

### Options

- `-c, --configs`: **(Required)** One or more optimization configuration file paths. If a path points to a directory, all files within that directory will be processed.
- `-e, --eval`: **(Optional)** One or more evaluation configuration file paths. Each evaluation configuration will be used with each optimization config to measure performance before and after optimization.

107 changes: 107 additions & 0 deletions docs/prompt-optimization.md
@@ -0,0 +1,107 @@
# Prompt Optimization

## Overview

Prompt optimization in LiSSA-RATLR enables the automatic, systematic refinement of prompts used for traceability link recovery.
By leveraging various optimization strategies and evaluation metrics, the effectiveness of prompts can be increased, improving classification accuracy and overall performance.
It also makes it possible to quantify the importance of well-designed prompts in the context of traceability link recovery.

## Core Components

### Prompt Metrics (`promptmetric` package)

A [`Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/Metric.java) is a numeric measure used to evaluate the quality of prompts during the optimization process.
Metrics guide the optimization by providing feedback on how well a prompt performs in generating accurate traceability links.
Currently, they are divided into two types.
Global metrics evaluate a prompt's performance across the entire test dataset.
Pointwise metrics score the performance of prompts on individual data points and reduce the results into a single numeric performance value.
If a pointwise metric is used, different scoring and reduction strategies can be configured and combined as desired.

Custom metrics can be added either by extending the [`GlobalMetric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/GlobalMetric.java) abstract class or by implementing new scoring and reduction strategies for pointwise metrics.
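The scoring/reduction split described above can be illustrated with a small standalone sketch. The names `Classification`, `BINARY_SCORER`, and `meanReduce` are illustrative stand-ins, not the project's actual interfaces:

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Conceptual sketch of a pointwise metric: score each data point, then reduce to one value. */
public class PointwiseSketch {
    /** A data point pairing the classifier's verdict with the gold-standard label. */
    record Classification(boolean predicted, boolean actual) {}

    /** Binary scorer: 1.0 for a correct classification, 0.0 otherwise. */
    static final ToDoubleFunction<Classification> BINARY_SCORER =
            c -> c.predicted() == c.actual() ? 1.0 : 0.0;

    /** Mean reduction: averages the per-point scores into a single metric value. */
    static double meanReduce(List<Classification> points, ToDoubleFunction<Classification> scorer) {
        return points.stream().mapToDouble(scorer).average().orElse(0.0);
    }

    public static void main(String[] args) {
        var points = List.of(
                new Classification(true, true),
                new Classification(false, true),
                new Classification(false, false),
                new Classification(true, true));
        // 3 of 4 points classified correctly -> mean score 0.75
        System.out.println(meanReduce(points, BINARY_SCORER));
    }
}
```

Swapping either piece independently (e.g. a weighted scorer with the same mean reduction) is exactly the combination flexibility the configuration exposes.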

#### Available Metrics

- **[`Global Metrics`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/GlobalMetric.java)**:
- **F_Beta-Score** (`fBeta` or `f1`)
- **[`Pointwise Metrics`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/PointwiseMetric.java)** (`pointwise`):
- Scoring Strategies:
- Binary Scorer (Correct Classification / Incorrect Classification)
- Reduction Strategies:
- Mean
- **[`Mock Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/MockMetric.java)** (`mock`): Returns dummy values for testing purposes
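For reference, the F_Beta score listed above combines precision and recall with a configurable weight. A minimal self-contained sketch of the computation (the project itself obtains this from an external metrics dependency, so this is illustrative only):

```java
/** Standalone sketch of the F-beta score; not the library's actual implementation. */
public class FBetaSketch {
    /** Computes F-beta from true positives, false positives, and false negatives. */
    static double fBeta(int tp, int fp, int fn, double beta) {
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double b2 = beta * beta;
        double denom = b2 * precision + recall;
        return denom == 0.0 ? 0.0 : (1 + b2) * precision * recall / denom;
    }

    public static void main(String[] args) {
        // With beta = 1 this reduces to F1, the harmonic mean of precision and recall.
        System.out.println(fBeta(8, 2, 2, 1.0)); // precision = recall = 0.8, so F1 is 0.8
    }
}
```

Beta values above 1 weight recall more heavily; values below 1 favor precision.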

### Optimizers (`promptoptimizer` package)

The [`Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/PromptOptimizer.java) module handles prompt optimization requests.
Different optimization strategies are implemented to improve prompts by various means.
Optimization approaches usually follow an iterative process: prompts are refined over multiple iterations based on the feedback provided by the selected prompt metric.
Optimizers are highly configurable through the optimization configuration file.

Prompt optimizers also make use of the usual stages of the evaluation pipeline.
They rely on LiSSA's caching mechanism to provide consistent and reproducible results across different runs.

Custom optimizers can be added by implementing the [`Prompt Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/PromptOptimizer.java) interface.

#### Available Optimizers

- **[`Naive Iterative Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeOptimizer.java)** (`iterative` or `simple`):
The most basic optimizer that makes changes to the prompt in each iteration.
It simply queries the large language model to improve the current prompt using an optimization prompt.
The new prompt is naively carried over to the next iteration without any further checks.
- `simple`: Defaults to one (1) iteration
- `iterative`: Defaults to five (5) iterations
- **[`Feedback-Based Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeFeedbackOptimizer.java)** (`feedback`):
The iterative feedback optimizer improves prompts by leveraging feedback from the large language model.
In each iteration, it queries the model with additional feedback text on the current prompt.
Trace links that were incorrectly classified in previous iterations are highlighted in the feedback text to guide the model towards better performance.
The optimizer then naively carries the optimized prompt over to the next iteration.
- **[`Mock Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/MockOptimizer.java)** (`mock`): Returns dummy optimized prompts for testing purposes
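The iterative pattern shared by these optimizers can be sketched as follows. Here `llmImprove` is a stand-in for the LLM call (and, in the feedback variant, would receive the misclassification feedback as part of its input); none of these names come from the actual codebase:

```java
import java.util.function.UnaryOperator;

/** Conceptual sketch of naive iterative prompt optimization: keep the last rewritten prompt. */
public class IterativeSketch {
    /**
     * Repeatedly asks an "LLM" (here: any string rewriter) to improve the prompt and
     * naively carries each result over to the next iteration, without further checks.
     */
    static String optimize(String prompt, UnaryOperator<String> llmImprove, int iterations) {
        String current = prompt;
        for (int i = 0; i < iterations; i++) {
            current = llmImprove.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in rewriter that appends a marker per iteration instead of calling a model.
        UnaryOperator<String> rewriter = p -> p + " [refined]";
        System.out.println(optimize("Are they related?", rewriter, 2));
    }
}
```

With `iterations = 1` this corresponds to the `simple` default; with `iterations = 5`, to the `iterative` default.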

## Configuration

### Optimization Configuration Structure

The modules of the evaluation configuration file also need to be configured in the optimization configuration file.
This excerpt shows the additional configuration options specific to prompt optimization.

```json

{
[...]
"metric" : {
"name" : "mock",
"args" : {}
},
"prompt_optimizer": {
"name" : "simple_openai",
"args" : {
"prompt": "Question: Here are two parts of software development artifacts.\n\n {source_type}: '''{source_content}'''\n\n {target_type}: '''{target_content}'''\n Are they related?\n\n Answer with 'yes' or 'no'.",
"model": "gpt-4o-mini-2024-07-18"
}
}
}

```

To see the detailed configurable fields for any of the modules, refer to a prompt optimization result file.
After executing a minimal configuration, the resulting file contains the full configuration with all default values filled in.

## Usage

Refer to the [CLI Documentation](cli.md#prompt-optimization) for instructions on how to run prompt optimization using the command line interface.

### Optimization Process

The optimization process generally follows these steps:

1. **Baseline Evaluation (Optional)**: If evaluation configurations are provided, the baseline performance of the original prompt is measured.
2. **Prompt Optimization**: The prompt optimizer is executed using the specified optimization configuration. The prompt is refined iteratively based on the selected metric.
3. **Post-Optimization Evaluation (Optional)**: If evaluation configurations are provided, the optimized prompt is evaluated to measure differences over the baseline.

## Output and Results

### Result Files

The prompt optimization results are stored as `results-prompt-optimization-<config_filename>.md`, just like regular evaluation results.
They include the full configuration used for optimization as well as the optimized prompt.
73 changes: 73 additions & 0 deletions example-configs/optimizer-config.json
@@ -0,0 +1,73 @@

{
"cache_dir": "./cache/WARC",

"gold_standard_configuration": {
"path": "./datasets/req2req/WARC/answer.csv",
"hasHeader": "true"
},

"source_artifact_provider" : {
"name" : "text",
"args" : {
"artifact_type" : "requirement",
"path" : "./datasets/req2req/WARC/high"
}
},
"target_artifact_provider" : {
"name" : "text",
"args" : {
"artifact_type" : "requirement",
"path" : "./datasets/req2req/WARC/low"
}
},
"source_preprocessor" : {
"name" : "artifact",
"args" : {}
},
"target_preprocessor" : {
"name" : "artifact",
"args" : {}
},
"embedding_creator" : {
"name" : "openai",
"args" : {
"model": "text-embedding-3-large"
}
},
"source_store" : {
"name" : "custom",
"args" : {}
},
"target_store" : {
"name" : "cosine_similarity",
"args" : {
"max_results" : "4"
}
},
"metric" : {
"name" : "mock",
"args" : {}
},
"prompt_optimizer": {
"name" : "simple_openai",
"args" : {
"prompt": "Question: Here are two parts of software development artifacts.\n\n {source_type}: '''{source_content}'''\n\n {target_type}: '''{target_content}'''\n Are they related?\n\n Answer with 'yes' or 'no'.",
"model": "gpt-4o-mini-2024-07-18"
}
},
"classifier" : {
"name" : "simple_openai",
"args" : {
"model": "gpt-4o-mini-2024-07-18"
}
},
"result_aggregator" : {
"name" : "any_connection",
"args" : {}
},
"tracelinkid_postprocessor" : {
"name" : "identity",
"args" : {}
}
}
2 changes: 1 addition & 1 deletion pom.xml
@@ -21,7 +21,7 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<picocli.version>4.7.7</picocli.version>
<record-builder.version>52</record-builder.version>
<metrics.version>0.2.0</metrics.version>
<metrics.version>0.2.1</metrics.version>
</properties>

<dependencyManagement>
6 changes: 4 additions & 2 deletions src/main/java/edu/kit/kastel/sdq/lissa/cli/MainCLI.java
@@ -1,9 +1,10 @@
/* Licensed under MIT 2025. */
/* Licensed under MIT 2025-2026. */
package edu.kit.kastel.sdq.lissa.cli;

import java.nio.file.Path;

import edu.kit.kastel.sdq.lissa.cli.command.EvaluateCommand;
import edu.kit.kastel.sdq.lissa.cli.command.OptimizeCommand;
import edu.kit.kastel.sdq.lissa.cli.command.TransitiveTraceCommand;

import picocli.CommandLine;
@@ -15,12 +16,13 @@
* <ul>
* <li>{@link EvaluateCommand} - Evaluates trace link analysis configurations</li>
* <li>{@link TransitiveTraceCommand} - Performs transitive trace link analysis</li>
* <li>{@link OptimizeCommand} - Optimize a single prompt for better trace link analysis classification results</li>
* </ul>
*
* The CLI supports various command-line options and provides help information
* through the standard help options (--help, -h).
*/
@CommandLine.Command(subcommands = {EvaluateCommand.class, TransitiveTraceCommand.class})
@CommandLine.Command(subcommands = {EvaluateCommand.class, TransitiveTraceCommand.class, OptimizeCommand.class})
public final class MainCLI {

/**
@@ -0,0 +1,126 @@
/* Licensed under MIT 2025-2026. */
package edu.kit.kastel.sdq.lissa.cli.command;

import static edu.kit.kastel.sdq.lissa.cli.command.EvaluateCommand.loadConfigs;

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import edu.kit.kastel.sdq.lissa.ratlr.Evaluation;
import edu.kit.kastel.sdq.lissa.ratlr.Optimization;

import picocli.CommandLine;

/**
* Command implementation for optimizing prompts used in trace link analysis configurations.
* This command processes one or more optimization configuration files to run the prompt
* optimization pipeline, and optionally evaluates the optimized prompts using specified
* evaluation configuration files.
*/
@CommandLine.Command(
name = "optimize",
mixinStandardHelpOptions = true,
description = "Optimizes a prompt for usage in the pipeline")
public class OptimizeCommand implements Runnable {

private static final Logger logger = LoggerFactory.getLogger(OptimizeCommand.class);

/**
* Array of optimization configuration file paths to be processed.
* If a path points to a directory, all files within that directory will be processed.
* This option is required to run the optimization command.
*/
@CommandLine.Option(
names = {"-c", "--configs"},
arity = "1..*",
description =
"Specifies one or more config paths to be invoked by the pipeline iteratively. If the path points "
+ "to a directory, all files inside are chosen to get invoked.")
private Path[] optimizationConfigs;

/**
* Array of evaluation configuration file paths to be processed.
* If a path points to a directory, all files within that directory will be processed.
* This option is optional; if not provided, no evaluation will be performed after optimization.
*/
@CommandLine.Option(
names = {"-e", "--eval"},
arity = "0..*",
description = "Specifies optional evaluation config paths to be invoked by the pipeline iteratively. "
+ "Each evaluation configuration will be used with each optimization config. "
+ "If the path points to a directory, all files inside are chosen to get invoked.")
private Path[] evaluationConfigs;

/**
* Runs the optimization and evaluation pipelines based on the provided configuration files.
* It first loads the optimization and evaluation configurations, then executes the evaluation
* pipeline for each evaluation configuration. This is the unoptimized baseline evaluation. <br>
After that, it runs the optimization pipeline for each optimization configuration and subsequently
re-runs each evaluation configuration with the optimized prompt in place of the original one.
*/
@Override
public void run() {
List<Path> configsToOptimize = loadConfigs(optimizationConfigs);
List<Path> configsToEvaluate = loadConfigs(evaluationConfigs);
logger.info(
"Found {} optimization config files and {} evaluation config files to invoke",
configsToOptimize.size(),
configsToEvaluate.size());

for (Path evaluationConfig : configsToEvaluate) {
runEvaluation(evaluationConfig, "");
}

for (Path optimizationConfig : configsToOptimize) {
String optimizedPrompt = runOptimization(optimizationConfig);
if (optimizedPrompt.isEmpty()) {
logger.warn(
"Skipping evaluation for optimization config '{}' as no optimized prompt was generated.",
optimizationConfig);
continue;
}
for (Path evaluationConfig : configsToEvaluate) {
runEvaluation(evaluationConfig, optimizedPrompt);
}
}
}

/**
* Runs the optimization pipeline using the specified configuration file.
*
* @param optimizationConfig The path to the optimization configuration file
* @return The optimized prompt generated by the optimization pipeline
*/
private static String runOptimization(Path optimizationConfig) {
logger.info("Invoking the optimization pipeline with '{}'", optimizationConfig);
String optimizedPrompt = "";
try {
var optimization = new Optimization(optimizationConfig);
optimizedPrompt = optimization.run();
} catch (IOException e) {
logger.warn(
"Optimization configuration '{}' threw an exception: {} \n Maybe the file does not exist?",
optimizationConfig,
e.getMessage());
}
return optimizedPrompt;
}

private static void runEvaluation(Path evaluationConfig, String optimizedPrompt) {
logger.info("Invoking the evaluation pipeline with '{}'", evaluationConfig);
try {
var evaluation = new Evaluation(evaluationConfig, optimizedPrompt);
evaluation.run();
} catch (IOException e) {
logger.warn(
"Evaluation configuration '{}' threw an exception: {} \n Maybe the file does not exist?",
evaluationConfig,
e.getMessage());
}
}
}