Add prompt optimizers to LiSSA #44
Merged
51 commits
711d142 feat: add naive prompt optimizer with iterative optimization framework (DanielDango)
7887f8d fix: correct typos and improve documentation in various files as comm… (DanielDango)
4a7744a add feedback optimizer implementation (DanielDango)
75dddd1 wip: add more logging (DanielDango)
565a80e feat: enhance cache parameters to include classifier type in Reasonin… (DanielDango)
883ede1 feat: Enhance logging for cache operations and misclassification checks (DanielDango)
cb89983 Rework caching. Now it shall be ensured that Cacheparameters are used… (dfuchss)
fb1de00 Update ArchitectureTest.java (dfuchss)
3e7facf feat: Remove caching from GlobalMetric and PointwiseMetric for prompt… (DanielDango)
e31c6c1 Merge remote-tracking branch 'upstream/feature/simplify-caching' into… (DanielDango)
c5fff3f chore: update formatting (DanielDango)
abd5a4f chore: update cache parameter usage (DanielDango)
317d3cc feat: add additional logging (DanielDango)
63df5fb fix: revert reduced target store deduplication (DanielDango)
a8d1d47 fix: re-enable other e2e optimizer tests (DanielDango)
b09dea2 fix: enable correct element store deduplication and fix typo in itera… (DanielDango)
93f87e5 chore: bump license to 2026 (DanielDango)
4929a71 fix: make prompt key field statically available to resolve null poin… (DanielDango)
a14496b Merge remote-tracking branch 'upstream/main' into feature/add-prompt-… (DanielDango)
83b5b7b revert: Evaluator module only required for future gradient optimizer (DanielDango)
6cbd159 refactor: address review of copilot (DanielDango)
f6f1895 Merge remote-tracking branch 'upstream/main' into feature/add-prompt-… (DanielDango)
bc44e6b docs: add docstrings missed in #48 (DanielDango)
03d1d0d fix: revert evaluator param from javadoc (DanielDango)
f1da525 docs: add missing javadoc (DanielDango)
d0240e9 feat: introduce ModuleConfiguration.with() to overwrite the classific… (DanielDango)
1dc90b1 revert: apparently I accidentally reverted changes to CacheKey changes? (DanielDango)
e7bf61b revert: Classifier copyOf visibility returned to protected (DanielDango)
5c3acbd chore: undo updated license for not actually modified files (DanielDango)
2b82f4d Merge remote-tracking branch 'upstream/main' into feature/add-prompt-… (DanielDango)
b0f9e28 chore: implement various suggestions for improvement (DanielDango)
7b0be53 refactor: rename Configuration to EvaluationConfiguration (DanielDango)
7cff7f7 docs: add documentation for prompt optimizers (DanielDango)
c6267e6 Update docs/prompt-optimization.md (DanielDango)
71e0f7c Update docs/prompt-optimization.md (DanielDango)
e25257b chore: implement various feedbacks raised during meeting (DanielDango)
e4fbc17 refactor: discontinue usage of mock classifier in feedback optimizer … (DanielDango)
f3703c9 Merge remote-tracking branch 'upstream/main' into feature/add-prompt-… (DanielDango)
d2cedeb unify logger name (dfuchss)
c73670e Fix configuration hierarchy (dfuchss)
d47995c refactor: fix package hierarchy (DanielDango)
09dee58 fix: apply feedback from @copilot (DanielDango)
582173b refactor: Merge factory methods into the related interfaces (DanielDango)
95d905f refactor: Dont reimplement the f1-wheel (DanielDango)
6a3a0d3 apply spotless (DanielDango)
d5f758b revert: remove evaluator from config which is to be added with protegi (DanielDango)
23bac91 fix: ensure correct argument retrieval for ModuleConfiguration.with() (DanielDango)
822116d Modify configuration in constructor (dfuchss)
0356039 Assert -> if/except (dfuchss)
632a6c9 feat: use newly added fBeta metric instead of custom implementation (DanielDango)
80af533 docs: clarify Evaluation constructor behavior regarding prompt overwr… (DanielDango)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| # Prompt Optimization | ||
|
|
||
| ## Overview | ||
|
|
||
| Prompt optimization in LiSSA-RATLR enables the automatic, systematic refinement of prompts used for traceability link recovery. | ||
| Combining different optimization strategies and evaluation metrics can make prompts more effective, which may improve classification accuracy and overall performance. | ||
| It also makes it possible to quantify how much well-designed prompts matter for traceability link recovery. | ||
|
|
||
| ## Core Components | ||
|
|
||
| ### Prompt Metrics (`promptmetric` package) | ||
|
|
||
| A [`Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/Metric.java) is a numeric measure used to evaluate the quality of prompts during the optimization process. | ||
| Metrics guide the optimization by providing feedback on how well a prompt performs in generating accurate traceability links. | ||
| Currently, there are two types of metrics. | ||
| Global metrics evaluate the prompt's performance across the entire test dataset. | ||
| Pointwise metrics score the performance of a prompt on individual data points and reduce the results to a single numeric performance value. | ||
| If a pointwise metric is used, different scoring and reduction strategies can be configured and combined as desired. | ||
|
|
||
| Custom metrics can be added either by extending the [`GlobalMetric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/GlobalMetric.java) abstract class or by implementing new scoring and reduction strategies for pointwise metrics. | ||
|
|
||
| #### Available Metrics | ||
|
|
||
| - **[`Global Metrics`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/GlobalMetric.java)**: | ||
| - **F_Beta-Score** (`fBeta` or `f1`) | ||
| - **[`Pointwise Metrics`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/PointwiseMetric.java)** (`pointwise`): | ||
| - Scoring Strategies: | ||
| - Binary Scorer (Correct Classification / Incorrect Classification) | ||
| - Reduction Strategies: | ||
| - Mean | ||
| - **[`Mock Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/MockMetric.java)** (`mock`): Returns dummy values for testing purposes | ||
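To make the listed metrics concrete, here is a minimal, self-contained sketch of the arithmetic behind an F-beta score, a binary scorer, and a mean reduction. It is not taken from the pull request: the class `MetricSketch` and its method names are hypothetical and do not reflect the signatures of LiSSA's `GlobalMetric` or `PointwiseMetric`. Only the standard F-beta formula, (1 + β²)·P·R / (β²·P + R), is assumed.

```java
import java.util.List;

/** Illustrative only; hypothetical names, not LiSSA's metric classes. */
final class MetricSketch {

    /** F-beta from confusion counts: (1 + b^2) * P * R / (b^2 * P + R). */
    static double fBeta(int truePositives, int falsePositives, int falseNegatives, double beta) {
        if (truePositives == 0) {
            return 0.0; // without true positives the score is defined as zero here
        }
        double precision = truePositives / (double) (truePositives + falsePositives);
        double recall = truePositives / (double) (truePositives + falseNegatives);
        double betaSquared = beta * beta;
        return (1 + betaSquared) * precision * recall / (betaSquared * precision + recall);
    }

    /** Binary scorer: 1.0 for a correct classification, 0.0 for an incorrect one. */
    static double binaryScore(boolean predicted, boolean expected) {
        return predicted == expected ? 1.0 : 0.0;
    }

    /** Mean reduction: collapses per-item scores into one value (0.0 for an empty list). */
    static double mean(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Global view: 8 true positives, 2 false positives, 4 false negatives, beta = 1 (F1).
        System.out.println(fBeta(8, 2, 4, 1.0));

        // Pointwise view: score each classification, then reduce with the mean.
        List<Double> scores = List.of(
                binaryScore(true, true), binaryScore(false, true), binaryScore(false, false));
        System.out.println(mean(scores));
    }
}
```

The split into a per-item scorer and a separate reducer mirrors the description above: either part could be swapped without touching the other.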
|
|
||
| ### Optimizers (`promptoptimizer` package) | ||
|
|
||
| The [`Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/PromptOptimizer.java) module handles prompt optimization requests. | ||
| It offers several optimization strategies that improve prompts in different ways. | ||
| Most optimization approaches are iterative. | ||
| Prompts are refined over multiple iterations based on the feedback provided through the selected prompt metric. | ||
| Optimizers are highly configurable through the optimization configuration file. | ||
|
|
||
| Prompt optimizers also run the usual stages of the evaluation pipeline. | ||
| They use LiSSA's caching mechanism to provide consistent and reproducible results across different runs. | ||
|
|
||
| Custom optimizers can be added by implementing the [`Prompt Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/PromptOptimizer.java) interface. | ||
|
|
||
| #### Available Optimizers | ||
|
|
||
| - **[`Naive Iterative Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeOptimizer.java)** (`iterative` or `simple`): | ||
| The most basic optimizer that makes changes to the prompt in each iteration. | ||
| It simply queries the large language model to improve the current prompt using an optimization prompt. | ||
| The new prompt is naively carried over to the next iteration without any further checks. | ||
| - `simple`: Defaults to one (1) iteration | ||
| - `iterative`: Defaults to five (5) iterations | ||
| - **[`Feedback-Based Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeFeedbackOptimizer.java)** (`feedback`): | ||
| The iterative feedback optimizer improves prompts by leveraging feedback from the large language model. | ||
| In each iteration, it queries the model with an additional feedback text on the current prompt. | ||
| The optimizer carries the optimized prompt to the next iteration naively. | ||
| Trace links that were incorrectly classified in previous iterations are highlighted in the feedback text to guide the model towards better performance. | ||
| - **[`Mock Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/MockOptimizer.java)** (`mock`): Returns dummy optimized prompts for testing purposes | ||
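The naive carry-over described for the iterative optimizer can be sketched as a plain loop. This is an illustration under stated assumptions, not the code of `IterativeOptimizer`: the class `NaiveLoopSketch`, the method `optimize`, and the `improvePrompt` function (a stand-in for the query to the large language model) are all hypothetical.

```java
import java.util.function.UnaryOperator;

/** Illustrative only; hypothetical names, not LiSSA's IterativeOptimizer. */
final class NaiveLoopSketch {

    /**
     * Applies improvePrompt a fixed number of times and always carries the newest prompt forward.
     * improvePrompt stands in for the LLM call that rewrites the current prompt.
     */
    static String optimize(String initialPrompt, int iterations, UnaryOperator<String> improvePrompt) {
        String current = initialPrompt;
        for (int i = 0; i < iterations; i++) {
            // Naive step: the result is accepted without checking whether it scores better.
            current = improvePrompt.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> fakeModel = prompt -> prompt + " Be precise.";
        System.out.println(optimize("Are they related?", 1, fakeModel)); // one iteration, like `simple`
        System.out.println(optimize("Are they related?", 5, fakeModel)); // five iterations, like `iterative`
    }
}
```

Because nothing compares the new prompt against the previous one, a single poor rewrite propagates to all later iterations. A feedback-based variant would differ mainly in what is passed to the model in each step.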
|
|
||
| ## Configuration | ||
|
|
||
| ### Optimization Configuration Structure | ||
|
|
||
| The modules of an evaluation configuration file must also be configured in the optimization configuration file. | ||
| This excerpt shows the additional configuration options specific to prompt optimization. | ||
|
|
||
| ```json | ||
|
|
||
| { | ||
| [...] | ||
| "metric" : { | ||
| "name" : "mock", | ||
| "args" : {} | ||
| }, | ||
| "prompt_optimizer": { | ||
| "name" : "simple_openai", | ||
| "args" : { | ||
| "prompt": "Question: Here are two parts of software development artifacts.\n\n {source_type}: '''{source_content}'''\n\n {target_type}: '''{target_content}'''\n Are they related?\n\n Answer with 'yes' or 'no'.", | ||
| "model": "gpt-4o-mini-2024-07-18" | ||
| } | ||
| } | ||
| } | ||
|
|
||
| ``` | ||
|
|
||
| To see the configurable fields of any module in detail, refer to a prompt optimization result file. | ||
| After a minimal configuration has been executed, the resulting file contains the full configuration with all default values filled in. | ||
|
|
||
| ## Usage | ||
|
|
||
| Refer to the [CLI Documentation](cli.md#prompt-optimization) for instructions on how to run prompt optimization using the command line interface. | ||
|
|
||
| ### Optimization Process | ||
|
|
||
| The optimization process generally follows these steps: | ||
|
|
||
| 1. **Baseline Evaluation (Optional)**: If evaluation configurations are provided, the baseline performance of the original prompt is measured. | ||
| 2. **Prompt Optimization**: The prompt optimizer is executed using the specified optimization configuration. The prompt is refined iteratively based on the selected metric. | ||
| 3. **Post-Optimization Evaluation (Optional)**: If evaluation configurations are provided, the optimized prompt is evaluated to measure differences over the baseline. | ||
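The three steps above can be sketched as an ordered plan. The sketch assumes the ordering used by `OptimizeCommand.run()` in this pull request: all baseline evaluations first, then one optimization per optimization configuration, each followed by one evaluation per evaluation configuration, with evaluation skipped when the optimized prompt is empty. `ProcessSketch`, `plan`, and the string-based stand-ins for configurations are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Illustrative only; records the order of pipeline runs instead of executing them. */
final class ProcessSketch {

    /**
     * @param optimizationConfigs names standing in for optimization configuration files
     * @param evaluationConfigs names standing in for evaluation configuration files (may be empty)
     * @param optimize stand-in for the optimization pipeline; returns the optimized prompt
     */
    static List<String> plan(
            List<String> optimizationConfigs,
            List<String> evaluationConfigs,
            Function<String, String> optimize) {
        List<String> steps = new ArrayList<>();
        // Step 1: baseline evaluation with the original prompt.
        for (String evaluation : evaluationConfigs) {
            steps.add("baseline " + evaluation);
        }
        for (String optimization : optimizationConfigs) {
            // Step 2: prompt optimization.
            String optimizedPrompt = optimize.apply(optimization);
            steps.add("optimize " + optimization);
            if (optimizedPrompt.isEmpty()) {
                continue; // nothing to evaluate for this configuration
            }
            // Step 3: evaluation with the optimized prompt.
            for (String evaluation : evaluationConfigs) {
                steps.add("evaluate " + evaluation + " with prompt from " + optimization);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        plan(List.of("opt-a", "opt-b"), List.of("eval-1", "eval-2"), config -> "some prompt")
                .forEach(System.out::println);
    }
}
```

Under this ordering, m evaluation configurations and n optimization configurations that each yield a prompt lead to m + n·m evaluation runs.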
|
|
||
| ## Output and Results | ||
|
|
||
| ### Result Files | ||
|
|
||
| The prompt optimization results are stored as `results-prompt-optimization-<config_filename>.md`, in the same way as regular evaluation results. | ||
| They include the full configuration used for optimization as well as the optimized prompt. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
|
|
||
| { | ||
| "cache_dir": "./cache/WARC", | ||
|
|
||
| "gold_standard_configuration": { | ||
| "path": "./datasets/req2req/WARC/answer.csv", | ||
| "hasHeader": "true" | ||
| }, | ||
|
|
||
| "source_artifact_provider" : { | ||
| "name" : "text", | ||
| "args" : { | ||
| "artifact_type" : "requirement", | ||
| "path" : "./datasets/req2req/WARC/high" | ||
| } | ||
| }, | ||
| "target_artifact_provider" : { | ||
| "name" : "text", | ||
| "args" : { | ||
| "artifact_type" : "requirement", | ||
| "path" : "./datasets/req2req/WARC/low" | ||
| } | ||
| }, | ||
| "source_preprocessor" : { | ||
| "name" : "artifact", | ||
| "args" : {} | ||
| }, | ||
| "target_preprocessor" : { | ||
| "name" : "artifact", | ||
| "args" : {} | ||
| }, | ||
| "embedding_creator" : { | ||
| "name" : "openai", | ||
| "args" : { | ||
| "model": "text-embedding-3-large" | ||
| } | ||
| }, | ||
| "source_store" : { | ||
| "name" : "custom", | ||
| "args" : {} | ||
| }, | ||
| "target_store" : { | ||
| "name" : "cosine_similarity", | ||
| "args" : { | ||
| "max_results" : "4" | ||
| } | ||
| }, | ||
| "metric" : { | ||
| "name" : "mock", | ||
| "args" : {} | ||
| }, | ||
| "prompt_optimizer": { | ||
| "name" : "simple_openai", | ||
| "args" : { | ||
| "prompt": "Question: Here are two parts of software development artifacts.\n\n {source_type}: '''{source_content}'''\n\n {target_type}: '''{target_content}'''\n Are they related?\n\n Answer with 'yes' or 'no'.", | ||
| "model": "gpt-4o-mini-2024-07-18" | ||
| } | ||
| }, | ||
| "classifier" : { | ||
| "name" : "simple_openai", | ||
| "args" : { | ||
| "model": "gpt-4o-mini-2024-07-18" | ||
| } | ||
| }, | ||
| "result_aggregator" : { | ||
| "name" : "any_connection", | ||
| "args" : {} | ||
| }, | ||
| "tracelinkid_postprocessor" : { | ||
| "name" : "identity", | ||
| "args" : {} | ||
| } | ||
| } | ||
126 changes: 126 additions & 0 deletions
src/main/java/edu/kit/kastel/sdq/lissa/cli/command/OptimizeCommand.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,126 @@ | ||
| /* Licensed under MIT 2025-2026. */ | ||
| package edu.kit.kastel.sdq.lissa.cli.command; | ||
|
|
||
| import static edu.kit.kastel.sdq.lissa.cli.command.EvaluateCommand.loadConfigs; | ||
|
|
||
| import java.io.IOException; | ||
| import java.nio.file.Path; | ||
| import java.util.List; | ||
|
|
||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| import edu.kit.kastel.sdq.lissa.ratlr.Evaluation; | ||
| import edu.kit.kastel.sdq.lissa.ratlr.Optimization; | ||
|
|
||
| import picocli.CommandLine; | ||
|
|
||
| /** | ||
| * Command implementation for optimizing prompts used in trace link analysis configurations. | ||
| * This command processes one or more optimization configuration files to run the prompt | ||
| * optimization pipeline, and optionally evaluates the optimized prompts using specified | ||
| * evaluation configuration files. | ||
| */ | ||
| @CommandLine.Command( | ||
| name = "optimize", | ||
| mixinStandardHelpOptions = true, | ||
| description = "Optimizes a prompt for usage in the pipeline") | ||
| public class OptimizeCommand implements Runnable { | ||
|
|
||
| private static final Logger logger = LoggerFactory.getLogger(OptimizeCommand.class); | ||
|
|
||
| /** | ||
| * Array of optimization configuration file paths to be processed. | ||
| * If a path points to a directory, all files within that directory will be processed. | ||
| * This option is required to run the optimization command. | ||
| */ | ||
| @CommandLine.Option( | ||
| names = {"-c", "--configs"}, | ||
| arity = "1..*", | ||
| description = | ||
| "Specifies one or more config paths to be invoked by the pipeline iteratively. If the path points " | ||
| + "to a directory, all files inside are chosen to get invoked.") | ||
| private Path[] optimizationConfigs; | ||
|
|
||
| /** | ||
| * Array of evaluation configuration file paths to be processed. | ||
| * If a path points to a directory, all files within that directory will be processed. | ||
| * This option is optional; if not provided, no evaluation will be performed after optimization. | ||
| */ | ||
| @CommandLine.Option( | ||
| names = {"-e", "--eval"}, | ||
| arity = "0..*", | ||
| description = "Specifies optional evaluation config paths to be invoked by the pipeline iteratively. " | ||
| + "Each evaluation configuration will be used with each optimization config. " | ||
| + "If the path points to a directory, all files inside are chosen to get invoked.") | ||
| private Path[] evaluationConfigs; | ||
|
|
||
| /** | ||
| * Runs the optimization and evaluation pipelines based on the provided configuration files. | ||
| * It first loads the optimization and evaluation configurations, then executes the evaluation | ||
| * pipeline for each evaluation configuration. This is the unoptimized baseline evaluation. <br> | ||
| * After that, it runs the optimization pipeline for | ||
| * each optimization configuration, and subsequently evaluates the optimized prompt using each | ||
| * evaluation configuration once more with the optimized prompt instead of the original one. | ||
| */ | ||
| @Override | ||
| public void run() { | ||
| List<Path> configsToOptimize = loadConfigs(optimizationConfigs); | ||
| List<Path> configsToEvaluate = loadConfigs(evaluationConfigs); | ||
| logger.info( | ||
| "Found {} optimization config files and {} evaluation config files to invoke", | ||
| configsToOptimize.size(), | ||
| configsToEvaluate.size()); | ||
|
|
||
| for (Path evaluationConfig : configsToEvaluate) { | ||
| runEvaluation(evaluationConfig, ""); | ||
| } | ||
|
|
||
| for (Path optimizationConfig : configsToOptimize) { | ||
| String optimizedPrompt = runOptimization(optimizationConfig); | ||
| if (optimizedPrompt.isEmpty()) { | ||
| logger.warn( | ||
| "Skipping evaluation for optimization config '{}' as no optimized prompt was generated.", | ||
| optimizationConfig); | ||
| continue; | ||
| } | ||
| for (Path evaluationConfig : configsToEvaluate) { | ||
| runEvaluation(evaluationConfig, optimizedPrompt); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Runs the optimization pipeline using the specified configuration file. | ||
| * | ||
| * @param optimizationConfig The path to the optimization configuration file | ||
| * @return The optimized prompt generated by the optimization pipeline | ||
| */ | ||
| private static String runOptimization(Path optimizationConfig) { | ||
| logger.info("Invoking the optimization pipeline with '{}'", optimizationConfig); | ||
| String optimizedPrompt = ""; | ||
| try { | ||
| var optimization = new Optimization(optimizationConfig); | ||
| optimizedPrompt = optimization.run(); | ||
| } catch (IOException e) { | ||
| logger.warn( | ||
| "Optimization configuration '{}' threw an exception: {} \n Maybe the file does not exist?", | ||
| optimizationConfig, | ||
| e.getMessage()); | ||
| } | ||
| return optimizedPrompt; | ||
| } | ||
|
|
||
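| /** | ||
| * Runs the evaluation pipeline using the specified configuration file. | ||
| * | ||
| * @param evaluationConfig The path to the evaluation configuration file | ||
| * @param optimizedPrompt The prompt to evaluate with; an empty string is passed for the baseline run | ||
| */ | ||
| private static void runEvaluation(Path evaluationConfig, String optimizedPrompt) { | ||
| logger.info("Invoking the evaluation pipeline with '{}'", evaluationConfig); | ||
| try { | ||
| var evaluation = new Evaluation(evaluationConfig, optimizedPrompt); | ||
| evaluation.run(); | ||
| } catch (IOException e) { | ||
| logger.warn( | ||
| "Evaluation configuration '{}' threw an exception: {} \n Maybe the file does not exist?", | ||
| evaluationConfig, | ||
| e.getMessage()); | ||
| } | ||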
| } | ||
| } | ||
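The option descriptions above state that a path pointing to a directory selects all files inside it. `EvaluateCommand.loadConfigs` is not part of this diff, so the following is only a sketch of how such an expansion could look, not its actual implementation. `ConfigPathSketch` and `expand` are hypothetical names, and the non-recursive, sorted listing is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Illustrative only; not the loadConfigs method referenced by OptimizeCommand. */
final class ConfigPathSketch {

    /** Expands directories into their regular files (non-recursive); other paths are kept as-is. */
    static List<Path> expand(Path[] paths) {
        List<Path> result = new ArrayList<>();
        if (paths == null) {
            return result; // an option that was not given yields no configurations
        }
        for (Path path : paths) {
            if (Files.isDirectory(path)) {
                // Files.list must be closed, hence try-with-resources.
                try (Stream<Path> children = Files.list(path)) {
                    children.filter(Files::isRegularFile).sorted().forEach(result::add);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            } else {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Path[] paths = new Path[args.length];
        for (int i = 0; i < args.length; i++) {
            paths[i] = Path.of(args[i]);
        }
        expand(paths).forEach(System.out::println);
    }
}
```

Returning an empty list for a missing option keeps the caller's loops simple: with no evaluation configurations, both the baseline and the post-optimization evaluation are skipped without extra checks.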