- ALL TEXT CONTENT MUST BE ASCII-ONLY (characters 0-127)
- NO non-ASCII Unicode characters, emojis, or extended ASCII (128-255)
- NO smart quotes, em dashes, or fancy punctuation
- This applies to: source code, comments, documentation, data files
# ALWAYS specify ASCII encoding when writing files
with open("filename.txt", "w", encoding='ascii') as f:
    f.write(content)
# ALWAYS validate ASCII compliance when reading
try:
    with open("filename.txt", "r", encoding='ascii') as f:
        content = f.read()
except UnicodeDecodeError:
    raise ValueError("File contains non-ASCII characters")
# GOOD: ASCII quotes and apostrophes
text = "This is a 'good' example with ASCII quotes"
message = 'Another "good" example'
# BAD: Unicode quotes
text = "This is a 'bad' example"  # Unicode quotes (U+2018, U+2019) around 'bad' would violate the rule
# GOOD: ASCII-only comments
def process_data():
    """Process data using ASCII-safe methods"""
    # This is a regular comment with ASCII characters
    pass
# BAD: Non-ASCII in comments
def process_data():
    """Process data using smart quotes"""  # Unicode quotes (U+201C, U+201D) in a docstring would violate the rule
CRITICAL INSTRUCTION: Generate only ASCII characters (0-127).
- Use straight quotes: " '
- Use regular hyphens: -
- Use three dots instead of ellipsis: ...
- No smart quotes, em dashes, or Unicode symbols
- Validate all generated content is ASCII-compatible
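The last rule above, validating generated content, can be sketched as a small helper that reports where each offending character sits. This is a minimal illustration, not a prescribed implementation; the name `find_non_ascii` is hypothetical, and the helper only assumes the 0-127 range stated in this document.

```python
def find_non_ascii(text):
    """Return (line, column, char) for every non-ASCII character in text.

    Line and column numbers are 1-based. Hypothetical helper for illustration.
    """
    hits = []
    # Split on "\n" only, so Unicode line separators are reported, not consumed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col_no, char))
    return hits


def describe(hits):
    """Format the hits as ASCII-only report lines."""
    return [
        f"line {line_no}, col {col_no}: U+{ord(char):04X}"
        for line_no, col_no, char in hits
    ]
```

Calling `describe(find_non_ascii(generated_text))` on a generated string gives a list of report lines; an empty list means the text is ASCII-only.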
# Add this to your git pre-commit hook (it checks the text piped to stdin)
python -c "
import sys
for line in sys.stdin:
    for char in line:
        if ord(char) > 127:
            print(f'Non-ASCII character found: {repr(char)}')
            sys.exit(1)
"
- Left/right single quotes (U+2018, U+2019) -> straight apostrophe (')
- Left/right double quotes (U+201C, U+201D) -> straight quotes (")
- Em dash (U+2014) -> double hyphen (--)
- En dash (U+2013) -> single hyphen (-)
- Ellipsis (U+2026) -> three dots (...)
- Non-breaking space (U+00A0) -> regular space
- Bullet points (U+2022) -> asterisk (*) or hyphen (-)
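The replacements listed above could be applied automatically with a translation table. The sketch below is one way to do it; the names `ASCII_REPLACEMENTS` and `to_ascii_punctuation` are hypothetical, and the code points are inferred from the character names in the list. The source uses escape sequences so that the example itself stays ASCII-only.

```python
# Mapping assumed from the replacement list above; extend as needed.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u2022": "*",    # bullet
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def to_ascii_punctuation(text):
    """Replace the common typographic characters with ASCII equivalents."""
    return text.translate(_TABLE)
```

Characters outside the table pass through unchanged, so the result should still be checked for ASCII compliance afterwards.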
def validate_ascii_file(filepath):
    try:
        with open(filepath, 'r', encoding='ascii') as f:
            content = f.read()
        return True, "File is ASCII-compliant"
    except UnicodeDecodeError as e:
        return False, f"Non-ASCII characters found: {e}"
def is_ascii_only(text):
    return all(ord(char) <= 127 for char in text)
# Check all Python files
find . -name "*.py" -exec python -c "
import sys
path = sys.argv[1]  # file name passed by find as an argument, not spliced into the code
try:
    with open(path, 'r', encoding='ascii') as f:
        f.read()
    print('OK:', path)
except UnicodeDecodeError as e:
    print('FAIL:', path, '-', e)
    sys.exit(1)
" {} \;
- AI agents may have inconsistent Unicode handling
- ASCII ensures universal compatibility across all systems
- Reduces encoding-related bugs in automated workflows
- Windows, Linux, Mac all handle ASCII identically
- No encoding detection issues
- Consistent behavior in terminals and editors
- Git handles ASCII files consistently
- No encoding conflicts during merges
- Diffs are always readable
- Pure ASCII content is less distinctive for AI training
- Reduces unique fingerprints in scraped data
- Makes automated parsing less reliable
- All source files use ASCII encoding
- Comments and docstrings are ASCII-only
- String literals use straight quotes
- No Unicode characters in variable names
- Documentation files are ASCII-compliant
- Data files use ASCII encoding
- Pre-commit hooks validate ASCII compliance
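The last checklist item could be backed by a small script that scans files as raw bytes, which avoids any decoding step. This is a sketch under assumptions: the names `first_non_ascii_byte` and `main` are hypothetical, and how the file list reaches the script (for example, from the staged files of a commit) is left to the hook.

```python
def first_non_ascii_byte(path):
    """Return (line_number, byte_value) of the first non-ASCII byte, or None."""
    with open(path, "rb") as f:
        # Iterating a binary file yields chunks terminated by b"\n".
        for line_no, line in enumerate(f, start=1):
            for byte in line:
                if byte > 127:
                    return line_no, byte
    return None


def main(paths):
    """Check every path; return 1 if any file fails, else 0."""
    failed = False
    for path in paths:
        hit = first_non_ascii_byte(path)
        if hit is not None:
            line_no, byte = hit
            print(f"{path}:{line_no}: non-ASCII byte 0x{byte:02X}")
            failed = True
    return 1 if failed else 0
```

A hook could pass the file names to `main` and use the return value as its exit status, so that a non-zero result blocks the commit.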
IF non-ASCII is absolutely necessary:
- Isolate in separate data files
- Document the exception clearly
- Use ASCII-safe fallbacks
- Validate at runtime
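For the exception case above, one possible ASCII-safe fallback is to escape non-ASCII characters instead of discarding them, paired with a runtime check at the boundary. The sketch below uses the standard `backslashreplace` error handler; the function names `ascii_escape` and `require_ascii` are hypothetical.

```python
def ascii_escape(text):
    """Return an ASCII-only copy of text with non-ASCII characters escaped.

    Each non-ASCII character becomes a backslash escape (for example \\xe9
    or \\u20ac), so the original character can still be identified.
    """
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


def require_ascii(text, source="<input>"):
    """Runtime validation: raise ValueError if text is not pure ASCII."""
    if not text.isascii():  # str.isascii() is available from Python 3.7
        raise ValueError(f"{source} contains non-ASCII characters")
    return text
```

Escaping keeps more information than substituting a placeholder, at the cost of ambiguity when the input already contains literal backslash sequences; which trade-off fits depends on the data.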
def safe_text_processing(text):
    if not all(ord(char) <= 127 for char in text):
        # Convert to ASCII-safe equivalent: each non-ASCII character becomes '?'
        return text.encode('ascii', errors='replace').decode('ascii')
    return text
{
    "files.encoding": "utf8",
    "files.autoGuessEncoding": false,
    "[python]": {
        "files.encoding": "ascii"
    }
}
Add to your linter configuration to flag non-ASCII characters.
Remember: When in doubt, use ASCII. It's always safer for agent-based systems.