diff --git a/data/prompts/code/code_prompt.yaml b/data/prompts/code/code_prompt.yaml
new file mode 100644
index 00000000..a2e54bf0
--- /dev/null
+++ b/data/prompts/code/code_prompt.yaml
@@ -0,0 +1,16 @@
+prompt: |
+  Below is an extract from a web page. Evaluate the instructional value of the extract for reasoning about programming / code using the additive 5‑point scoring system described below. Points are accumulated based on the depth and quality of reasoning support:
+
+  - Add 1 point if the extract contains source code relevant to programming topics but no substantial reasoning—e.g., raw or messy code with minimal/no comments, promotional snippet dumps, or copy‑pasted examples without context.
+  - Add another point if the code is cleanly structured and/or uses meaningful names or inline comments that reveal some abstraction or intent, even if no comprehensive narrative explanation is present.
+  - Award a third point if the extract provides a natural‑language explanation of what the code does—such as “explain in plain English” sections, problem–solution framing, or high‑level summaries clarifying purpose and behavior.
+  - Grant a fourth point if the extract includes a step‑by‑step/line‑by‑line trace or analysis that walks through variable states, control flow, or algorithmic steps, helping readers mentally execute the code.
+  - Bestow a fifth point if the extract delivers meta‑level or developmental reasoning—discussing design decisions, refactor motivations, trade‑offs, performance considerations, or describing the goals/strategies guiding the implementation process.
+
+  The extract:
+  {placeholder}
+  After examining the extract:
+  - Briefly justify your score, up to 100 words.
+  - Conclude with the score using the format: 'Code score: '
+
+prompt_name: code_content_filter
diff --git a/data/prompts/reasoning/reasoning_prompt.yaml b/data/prompts/reasoning/reasoning_prompt.yaml
new file mode 100644
index 00000000..8c9ce992
--- /dev/null
+++ b/data/prompts/reasoning/reasoning_prompt.yaml
@@ -0,0 +1,16 @@
+prompt: |
+  Below is an extract from a document. Evaluate the quality of its deliberative reasoning about a normative conflict using the additive 5‑point scoring system described below. Points are accumulated based on the depth and coherence of reflective‑equilibrium reasoning:
+
+  - Add 1 point if the extract explicitly identifies at least two norms, rules, or values in tension and describes the situation that creates the conflict.
+  - Add another point if it articulates the competing principles clearly and acknowledges the perspectives or interests of affected stakeholders.
+  - Award a third point if it supplies reason‑giving or empirical/contextual evidence that supports or challenges each principle, going beyond mere assertion.
+  - Grant a fourth point if it shows mutual adjustment toward coherence—weighing principles against each other, revising or limiting them, and addressing at least one objection.
+  - Bestow a fifth point if the extract reaches a reflective equilibrium: a balanced, publicly defensible resolution integrating principles, evidence, and stakeholder views.
+
+  The extract:
+  {placeholder}
+  After examining the extract:
+  - Briefly justify your score, up to 100 words.
+  - Conclude with the score using the format: 'Reasoning score: '
+
+prompt_name: reasoning_content_filter
diff --git a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml b/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
deleted file mode 100644
index 37e8f6ad..00000000
--- a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
+++ /dev/null
@@ -1,28 +0,0 @@
-prompt: Assess the following text to determine whether it is safe for inclusion in a pre-training dataset for a large-language model.
-
-  The text must NOT contain any of the following:
-
-  1. Harmful content: Language that encourages or incites harm, violence, or hatred.
-  2. Toxic content: Insulting, abusive, or demeaning language intended to provoke or offend others.
-  3. Racist or sexist content: Statements that promote discrimination, prejudice, or stereotypes based on race, gender, religion, or other identities.
-  4. Adult content: Explicit sexual material or content inappropriate for general audiences.
-
-  Instructions:
-
-  1. Review the text provided below.
-  2. Decide if the text is safe, unsafe, or unclear based on the criteria above.
-  3. Provide a brief justification (up to 100 words) for your decision.
-  4. Conclude with one of the following responses representing your final decision:
-  - Safe content: yes (the text meets all safety criteria)
-  - Safe content: no (the text violates safety criteria)
-  - Safe content: unsure (if the decision is unclear)
-
-Text to evaluate:
-{placeholder}
-
-Your response:
-
-Justification (up to 100 words):
-[Write here]
-
-Final decision: Safe content? [yes/no/unsure]
\ No newline at end of file
diff --git a/data/prompts/toxic/toxic_prompt.yaml b/data/prompts/toxic/toxic_prompt.yaml
new file mode 100644
index 00000000..c461d742
--- /dev/null
+++ b/data/prompts/toxic/toxic_prompt.yaml
@@ -0,0 +1,22 @@
+prompt: |
+  Below is an extract from a web page. Evaluate the toxicity level of the content using the additive 5-point scoring system described below.
+  Points are accumulated based on the severity of harmful content:
+
+  - Add 1 point if the text contains mild profanity or rudeness that doesn't target individuals or groups. This includes casual swearing in non-aggressive contexts ("This is fucking brilliant!"), mild frustration ("damn it"), or slightly rude language without malicious intent.
+  - Add another point if the text includes moderate personal attacks, hostile language, or aggressive behavior directed at individuals. Examples include calling someone stupid, telling someone to "shut up", or other insults that attack a person's character or intelligence.
+  - Award a third point if the text contains identity-based attacks, discrimination, or harassment targeting protected groups (race, religion, gender, sexuality, nationality, etc.). This includes stereotypes ("All X are Y"), slurs, or content promoting discrimination. Also includes persistent harassment or bullying.
+  - Grant a fourth point if the text includes severe harmful content such as death wishes, suicide encouragement, or graphic descriptions of violence. Examples include "Kill yourself", "I hope you die", or content that could cause serious psychological harm.
+  - Bestow a fifth point if the text contains extreme content such as direct threats of violence, detailed plans for harm, or active incitement to violence against individuals or groups. This represents immediate danger requiring intervention.
+
+  Special considerations:
+  • Quoted or reported speech should reduce the score by 1 point
+  • Profanity alone without harmful intent should not score above 1
+  • Context matters - academic or news discussions of harmful topics score lower
+
+  The extract: {placeholder}
+
+  After examining the extract:
+  - Briefly justify your score, up to 100 words.
+  - Conclude with the score using the format: 'Toxicity score: '
+
+prompt_name: toxicity_content_filter
\ No newline at end of file
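
Note on usage: the three new templates share a common contract. Each exposes a single `prompt` field with a `{placeholder}` slot for the extract and asks the judge model to close with a fixed score line ('Code score: ', 'Reasoning score: ', 'Toxicity score: '). The Python sketch below shows one way a consumer could fill a template and recover the score; the function names and the surrounding pipeline are illustrative assumptions and are not part of this diff.

    import re
    import yaml

    def load_prompt(path: str) -> str:
        # Read the 'prompt' field from one of the YAML files added in this change.
        with open(path, "r", encoding="utf-8") as f:
            return yaml.safe_load(f)["prompt"]

    def build_request(template: str, extract: str) -> str:
        # Substitute the document extract into the {placeholder} slot.
        return template.replace("{placeholder}", extract)

    def parse_score(response: str, label: str) -> int | None:
        # `label` is "Code score", "Reasoning score", or "Toxicity score";
        # returns None when the model did not emit the expected closing line.
        match = re.search(rf"{re.escape(label)}:\s*([0-5])", response)
        return int(match.group(1)) if match else None

    # Hypothetical usage; the judge-model call itself is outside the scope of this diff:
    # template = load_prompt("data/prompts/code/code_prompt.yaml")
    # request = build_request(template, extract_text)
    # score = parse_score(model_response, "Code score")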