From 6de2f73bbd51ca9d2421698f895f5bb849d39810 Mon Sep 17 00:00:00 2001
From: Abbas Khan
Date: Sun, 23 Feb 2025 03:04:48 +0100
Subject: [PATCH 1/2] chore: Updated the adult content filter prompt

---
 .../harmful_toxic_racist_sexist_adult.yaml | 44 +++++++------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml b/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
index 37e8f6ad..6b4e2aec 100644
--- a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
+++ b/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
@@ -1,28 +1,16 @@
-prompt: Assess the following text to determine whether it is safe for inclusion in a pre-training dataset for a large-language model.
-
- The text must NOT contain any of the following:
-
- 1. Harmful content: Language that encourages or incites harm, violence, or hatred.
- 2. Toxic content: Insulting, abusive, or demeaning language intended to provoke or offend others.
- 3. Racist or sexist content: Statements that promote discrimination, prejudice, or stereotypes based on race, gender, religion, or other identities.
- 4. Adult content: Explicit sexual material or content inappropriate for general audiences.
-
- Instructions:
-
- 1. Review the text provided below.
- 2. Decide if the text is safe, unsafe, or unclear based on the criteria above.
- 3. Provide a brief justification (up to 100 words) for your decision.
- 4. Conclude with one of the following responses representing your final decision:
- - Safe content: yes (the text meets all safety criteria)
- - Safe content: no (the text violates safety criteria)
- - Safe content: unsure (if the decision is unclear)
-
-Text to evaluate:
-{placeholder}
-
-Your response:
-
-Justification (up to 100 words):
-[Write here]
-
-Final decision: Safe content? [yes/no/unsure]
\ No newline at end of file
+prompt: |
+ Below is an extract from a web page. Evaluate the level of explicit sexual material or content inappropriate for general audiences using the additive 5-point scoring system described below.
+ Points are accumulated based on the degree of inappropriateness:
+
+ - Add 1 point if the extract contains mild sexual innuendo or suggestive content that is not explicit, generally acceptable for most audiences but possibly unsuitable for young children.
+ - Add another point if the extract includes moderate sexual content or references that are more direct but not graphic, inappropriate for children though potentially acceptable for teens.
+ - Award a third point if the extract contains explicit sexual references or mild explicit imagery, clearly unsuitable for minors.
+ - Grant a fourth point if the extract includes strong explicit sexual content or graphic imagery, intended for adults only.
+ - Bestow a fifth point if the extract is dominated by hardcore explicit sexual material, such as pornography, highly inappropriate for general audiences.
+ Note: Topics like sexual education, health-related sexual organ discussions, LGBTQ+ issues, contraception, or intimate hygiene should
+ be scored 0 if presented educationally and without explicit intent.
+ The extract:
+ {placeholder}
+ After examining the extract:
+ - Briefly justify your score, up to 100 words.
+ - Conclude with the score using the format: 'Adult score: '
\ No newline at end of file

From 8a555bbd30b49f581f46901d063f4d9c3c84eb5d Mon Sep 17 00:00:00 2001
From: Abbas Khan
Date: Mon, 24 Feb 2025 13:45:15 +0100
Subject: [PATCH 2/2] chore: Updated the adult content filter prompt and added the prompt_name

---
 data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml b/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
index 6b4e2aec..82f02e3e 100644
--- a/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
+++ b/data/prompts/safety/harmful_toxic_racist_sexist_adult.yaml
@@ -13,4 +13,6 @@ prompt: |
  {placeholder}
  After examining the extract:
  - Briefly justify your score, up to 100 words.
- - Conclude with the score using the format: 'Adult score: '
\ No newline at end of file
+ - Conclude with the score using the format: 'Adult score: '
+
+prompt_name: adult_content_filter
\ No newline at end of file
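These patches change the prompt's output contract from a yes/no/unsure verdict to an additive 0-5 score that ends with an 'Adult score:' line. The following is a minimal Python sketch of how a filtering pipeline might consume the new prompt. It is not code from this repository: the helper names (fill_prompt, parse_adult_score, keep_document), the regex, and the keep threshold of 2 are all hypothetical. It assumes the model writes a single digit from 0 to 5 after the colon, and it leaves out loading the YAML file (for example with PyYAML's yaml.safe_load) to stay standard-library only.

```python
import re
from typing import Optional

# Matches the closing line requested by the prompt, e.g. "Adult score: 3".
# Assumption: the model writes a single integer from 0 to 5 after the colon.
SCORE_PATTERN = re.compile(r"Adult score:\s*([0-5])\b", re.IGNORECASE)


def fill_prompt(template: str, extract: str) -> str:
    """Insert a web-page extract into the template's {placeholder} slot."""
    # str.replace touches only the literal token, so any other braces in the
    # template would not need the escaping that str.format requires.
    return template.replace("{placeholder}", extract)


def parse_adult_score(response: str) -> Optional[int]:
    """Return the last 'Adult score: N' value in a model response, or None."""
    # The justification comes first, so the final match is taken as the verdict.
    matches = SCORE_PATTERN.findall(response)
    return int(matches[-1]) if matches else None


def keep_document(response: str, max_score: int = 2) -> bool:
    """Hypothetical filter rule: keep extracts scored at or below max_score."""
    score = parse_adult_score(response)
    # Unparseable responses are dropped conservatively.
    return score is not None and score <= max_score
```

For example, parse_adult_score("Mild innuendo only.\nAdult score: 1") returns 1. Dropping responses that lack a parseable score is one conservative choice; a pipeline could instead queue them for re-scoring.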