diff --git a/CLAUDE.md b/CLAUDE.md
index 073e826..ca67fac 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -16,11 +16,11 @@ This is a Python-based Reddit moderation log publisher that automatically scrape
 ## Development Commands
 
+**IMPORTANT**: Always use `/opt/.venv/redditbot/bin/python` for all Python commands in this project.
+
 ### Setup and Dependencies
 ```bash
-# Install dependencies
-pip install praw
-
+# Dependencies are pre-installed in the venv
 # Copy template config (required for first run)
 cp config_template.json config.json
 ```
@@ -28,22 +28,31 @@ cp config_template.json config.json
 ### Running the Application
 ```bash
 # Test connection and configuration
-python modlog_wiki_publisher.py --test
+/opt/.venv/redditbot/bin/python modlog_wiki_publisher.py --test
 
 # Single run
-python modlog_wiki_publisher.py --source-subreddit SUBREDDIT_NAME
+/opt/.venv/redditbot/bin/python modlog_wiki_publisher.py --source-subreddit SUBREDDIT_NAME
 
 # Continuous daemon mode
-python modlog_wiki_publisher.py --source-subreddit SUBREDDIT_NAME --continuous
+/opt/.venv/redditbot/bin/python modlog_wiki_publisher.py --source-subreddit SUBREDDIT_NAME --continuous
+
+# Force wiki update only (using existing database data)
+/opt/.venv/redditbot/bin/python modlog_wiki_publisher.py --source-subreddit SUBREDDIT_NAME --force-wiki
 
 # Debug authentication issues
-python debug_auth.py
+/opt/.venv/redditbot/bin/python debug_auth.py
 ```
 
 ### Database Operations
 ```bash
-# View recent processed actions
-sqlite3 modlog.db "SELECT * FROM processed_actions ORDER BY created_at DESC LIMIT 10;"
+# View recent processed actions with removal reasons
+sqlite3 modlog.db "SELECT action_id, action_type, moderator, removal_reason, subreddit, created_at FROM processed_actions ORDER BY created_at DESC LIMIT 10;"
+
+# View actions by subreddit
+sqlite3 modlog.db "SELECT action_type, moderator, target_author, removal_reason FROM processed_actions WHERE subreddit = 'usenet' ORDER BY created_at DESC LIMIT 5;"
+
+# Track content lifecycle by target ID
+sqlite3 modlog.db "SELECT target_id, action_type, moderator, removal_reason, datetime(created_at, 'unixepoch') FROM processed_actions WHERE target_id LIKE '%1mkz4jm%' ORDER BY created_at;"
 
 # Manual cleanup of old entries
 sqlite3 modlog.db "DELETE FROM processed_actions WHERE created_at < date('now', '-30 days');"
@@ -53,6 +62,7 @@ sqlite3 modlog.db "DELETE FROM processed_actions WHERE created_at < date('now',
 The application supports both JSON config files and CLI arguments (CLI overrides JSON):
 
+### Core Options
 - `--source-subreddit`: Target subreddit for reading/writing logs
 - `--wiki-page`: Wiki page name (default: "modlog")
 - `--retention-days`: Database cleanup period (default: 30)
@@ -60,6 +70,18 @@ The application supports both JSON config files and CLI arguments (CLI overrides
 - `--interval`: Seconds between updates in daemon mode (default: 300)
 - `--debug`: Enable verbose logging
 
+### Display Options
+- `anonymize_moderators`: Whether to show "HumanModerator" for human mods (default: true)
+  - `true` (default): Shows "AutoMod", "Reddit", or "HumanModerator"
+  - `false`: Shows actual moderator usernames
+
+### Database Features
+- **Multi-subreddit support**: Single database handles multiple subreddits safely
+- **Removal reason storage**: Full text/number handling from Reddit API
+- **Target author tracking**: Actual usernames stored and displayed
+- **Content ID extraction**: Unique IDs from permalinks for precise tracking
+- **Data separation**: Subreddit column prevents cross-contamination
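+
+As a minimal illustration of this data separation (assuming the `processed_actions` table and `subreddit` column created by the migrations in `modlog_wiki_publisher.py`), a per-subreddit count can confirm that entries are not mixed across communities:
+
+```bash
+# Count stored actions per subreddit to verify data separation
+sqlite3 modlog.db "SELECT subreddit, COUNT(*) AS actions FROM processed_actions GROUP BY subreddit ORDER BY actions DESC;"
+```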
+
 ## Authentication Requirements
 
 The bot account needs:
@@ -80,8 +102,70 @@ The bot account needs:
 Use `--test` flag to verify configuration and Reddit API connectivity without making changes.
 
+## Content Link Guidelines
+
+**CRITICAL**: Content links in the modlog should NEVER point to user profiles (`/u/username`). Links should only point to:
+- Actual removed posts (`/comments/postid/`)
+- Actual removed comments (`/comments/postid/_/commentid/`)
+- No link at all if no actual content is available
+
+User profile links are a privacy concern and not useful for modlog purposes.
+
+## Recent Improvements (v2.1)
+
+### Multi-Subreddit Database Support
+- ✅ Fixed critical error that prevented multi-subreddit databases from working
+- ✅ Single database now safely handles multiple subreddits with proper data separation
+- ✅ Per-subreddit wiki updates without cross-contamination
+- ✅ Subreddit-specific logging and error handling
+
+### Removal Reason Transparency
+- ✅ Fixed "Removal reason applied" showing instead of actual text
+- ✅ Full transparency - shows ALL available removal reason data including template numbers
+- ✅ Consistent handling between storage and display logic using correct Reddit API fields
+- ✅ Displays actual removal reasons like "Invites - No asking", "This comment has been filtered due to crowd control"
+
+### Unique Content ID Tracking
+- ✅ Fixed duplicate IDs in markdown tables where all comments showed same post ID
+- ✅ Comments now show unique comment IDs (e.g., "n7ravg2") for precise tracking
+- ✅ Posts show post IDs for clear content identification
+- ✅ Each modlog entry has a unique identifier for easy reference
+
+### Content Linking and Display
+- ✅ Content links point to actual Reddit posts/comments, never user profiles for privacy
+- ✅ Fixed target authors showing as [deleted] - now displays actual usernames
+- ✅ Proper content titles extracted from Reddit API data
+- ✅ AutoModerator displays as "AutoModerator" (not anonymized)
+- ✅ Configurable anonymization for human moderators
+
+### Data Integrity
+- ✅ Pipe character escaping for markdown table compatibility
+- ✅ Robust error handling for mixed subreddit scenarios
+- ✅ Database schema at version 5 with all required columns
+- ✅ Consistent Reddit API field usage (action.details vs action.description)
+
+## Development Guidelines
+
+### Git Workflow
+- If the branch is not main, you may commit and push as long as the PR is still a draft or has not been opened
+- Use conventional commits for all changes
+- Use multiple commits if needed, or patch if easier
+- Always update CLAUDE.md and README.md when making changes
+
+### Code Standards
+- Always escape pipe characters in markdown table values such as removal reasons
+- Store pipe-free data in database to prevent markdown issues
+- Compare the wiki page against the cached content hash, warn when it is unchanged, and interactively ask whether to force a refresh
+- Always use the specified virtual environment path
+
+### Documentation
+- Always update commands and flags in documentation
+- Remove CHANGELOG from CLAUDE.md (keep separate)
+- Create and update changelog based on git tags (should be scripted)
+
 ## Common Issues
-- 401 errors: Check app type is "script" and verify client_id/client_secret
-- Wiki permission denied: Ensure bot has moderator or wiki contributor access
-- Rate limiting: Increase `--interval` and/or reduce `--batch-size`
\ No newline at end of file
+- **401 errors**: Check app type is "script" and verify client_id/client_secret
+- **Wiki permission denied**: Ensure bot has moderator or wiki contributor access
+- **Rate limiting**: Increase `--interval` and/or reduce `--batch-size`
+- **Module not found**: Always use `/opt/.venv/redditbot/bin/python` instead of system python
\ No newline at end of file
diff --git a/README.md b/README.md
index d0758b3..b62e0f8 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,19 @@ Automatically publishes Reddit moderation logs to a subreddit wiki page with mod
 
 ## Features
 
-* 📊 Publishes modlogs as organized markdown tables
-* 📧 Pre-populated modmail links for removal inquiries
-* 🗄️ SQLite database for deduplication and retention
-* ⏰ Configurable update intervals
-* 🔒 Automatic cleanup of old entries
-* ⚡ Handles Reddit's 524KB wiki size limit
+* 📊 Publishes modlogs as organized markdown tables with unique content tracking IDs
+* 📧 Pre-populated modmail links for removal inquiries (formatted as clickable markdown links)
+* 🗄️ SQLite database for deduplication and retention with **multi-subreddit support**
+* ⏰ Configurable update intervals with continuous daemon mode
+* 🔒 Automatic cleanup of old entries with configurable retention
+* ⚡ Handles Reddit's 524KB wiki size limit automatically
 * 🧩 Fully CLI-configurable (no need to edit `config.json`)
-* 📁 Per-subreddit log files for debugging
+* 📁 Per-subreddit log files for debugging and monitoring
+* 🔒 Configurable moderator anonymization (AutoModerator/HumanModerator)
+* 📝 **Complete removal reason transparency** - AutoModerator rule text, addremovalreason descriptions, all actual removal text (never generic messages or template numbers)
+* 🔗 Links directly to actual content (posts/comments), never user profiles for privacy
+* 🆔 **Unique content IDs** - comments show comment IDs, posts show post IDs for precise tracking
+* ✅ **Multi-subreddit database support** - single database handles multiple subreddits safely
 
 ## Quick Start
 
@@ -67,33 +72,52 @@ Create `config.json`:
   "ignored_moderators": ["AutoModerator"],
   "update_interval": 300,
   "batch_size": 100,
-  "retention_days": 30
+  "retention_days": 30,
+  "anonymize_moderators": true
 }
 ```
 
 ### Configurable via CLI
 
-| CLI Option | JSON Key | Description | Default |
-| -------------------- | ------------------ | -------------------------------------- | ------------- |
-| `--source-subreddit` | `source_subreddit` | Subreddit to read and write logs | required |
-| `--wiki-page` | `wiki_page` | Wiki page name | `modlog` |
-| `--retention-days` | `retention_days` | Keep entries this many days | `30` |
-| `--batch-size` | `batch_size` | Entries to fetch per run | `100` |
-| `--interval` | `update_interval` | Seconds between updates in daemon mode | `300` |
-| `--config` | – | Path to config file | `config.json` |
+| CLI Option | JSON Key | Description | Default | Min | Max |
+|------------|----------|-------------|---------|-----|-----|
+| `--source-subreddit` | `source_subreddit` | Subreddit to read and write logs | required | - | - |
+| `--wiki-page` | `wiki_page` | Wiki page name | modlog | - | - |
+| `--retention-days` | `retention_days` | Keep entries this many days | 90 | 1 | 365 |
+| `--batch-size` | `batch_size` | Entries to fetch per run | 50 | 10 | 500 |
+| `--interval` | `update_interval` | Seconds between updates in daemon mode | 600 | 60 | 3600 |
+| `--config` | – | Path to config file | config.json | - | - |
+| `--debug` | – | Enable verbose output | false | - | - |
+| `--show-config-limits` | – | Show configuration limits and defaults | false | - | - |
+| `--force-migrate` | – | Force database migration | false | - | - |
+| `--no-auto-update-config` | – | Disable automatic config file updates | false | - | - |
 
 CLI values override config file values.
 
+## Configuration Limits
+
+All configuration values are automatically validated and enforced within safe limits. Use `--show-config-limits` to see current limits and defaults.
+
+## Automatic Config Updates
+
+The application automatically updates your config file when new configuration options are added, while preserving your existing settings. A backup is created before any changes. Use `--no-auto-update-config` to disable this behavior.
+
+## Database Migration
+
+The database will automatically migrate to the latest schema version on startup. Use `--force-migrate` to manually trigger migration.
+
 ## Wiki Output
 
 Sample wiki table output:
 
 ```markdown
-## 2025-01-15
+## 2025-08-09
 
-| Time | Action | Moderator | Content | Reason | Inquire |
-|------|--------|-----------|---------|--------|---------|
-| 14:25:33 UTC | removepost | ModName | [Post Title](url) | spam | [Contact Mods](modmail_url) |
+| Time | Action | ID | Moderator | Content | Reason | Inquire |
+|------|--------|----|-----------|---------|--------|---------|
+| 08:15:42 UTC | removecomment | n7ravg2 | AutoModerator | [Comment by u/user123](https://www.reddit.com/r/opensignups/comments/1ab2cd3/title/n7ravg2/) | Possibly requesting an invite - [invited] Offers must be [O] 3x Invites to MyAwesomeTracker | [Contact Mods](https://www.reddit.com/message/compose?to=/r/opensignups&subject=Comment%20Removal%20Inquiry...) |
+| 07:45:18 UTC | addremovalreason | 1ab2cd3 | Bakerboy448 | [Post title here](https://www.reddit.com/r/opensignups/comments/1ab2cd3/title/) | Invites - No asking | [Contact Mods](https://www.reddit.com/message/compose?to=/r/opensignups&subject=Removal%20Reason%20Inquiry...) |
+| 06:32:15 UTC | removelink | 1xy9def | AutoModerator | [Another post](https://www.reddit.com/r/opensignups/comments/1xy9def/another/) | No standalone URL in post body | [Contact Mods](https://www.reddit.com/message/compose?to=/r/opensignups&subject=Post%20Removal%20Inquiry...) |
 ```
 
 ## Logging
@@ -122,6 +146,38 @@ Options:
   --debug Enable debug logging
   --test Run a test and exit
   --continuous Run continuously
+  --force-modlog Fetch ALL actions from Reddit API and rebuild wiki
+  --force-wiki Update wiki even if content appears unchanged
+  --force-all Do both --force-modlog and --force-wiki
+```
+
+### Force Commands Explained
+
+**--force-modlog**: Complete rebuild from Reddit
+- Fetches ALL recent modlog actions from Reddit API
+- Stores them in database
+- Rebuilds entire wiki page from database
+- Use when: Starting fresh, major updates, or troubleshooting
+
+**--force-wiki**: Force wiki update only
+- Uses existing database data
+- Forces wiki update even if content hash matches
+- Use when: Format changes, modmail updates, or cache issues
+
+**--force-all**: Complete refresh (replaces old --force)
+- Combines both --force-modlog and --force-wiki
+- Fetches from Reddit AND forces wiki update
+- Use when: Major changes, troubleshooting, or unsure which force to use
+
+```bash
+# Complete rebuild from Reddit API
+python modlog_wiki_publisher.py --source-subreddit usenet --force-modlog
+
+# Update wiki with current database data (bypass cache)
+python modlog_wiki_publisher.py --source-subreddit usenet --force-wiki
+
+# Do both (equivalent to old --force)
+python modlog_wiki_publisher.py --source-subreddit usenet --force-all
 ```
 
 ## Database
@@ -129,13 +185,37 @@ Uses `modlog.db` (SQLite) for deduplication and history:
 ```bash
-# View recent actions
+# View recent actions with removal reasons
+sqlite3 modlog.db "SELECT action_id, action_type, moderator, removal_reason, subreddit, created_at FROM processed_actions ORDER BY created_at DESC LIMIT 10;"
+
+# View all columns including removal reasons and target author
 sqlite3 modlog.db "SELECT * FROM processed_actions ORDER BY created_at DESC LIMIT 10;"
 
+# View actions by subreddit
+sqlite3 modlog.db "SELECT action_type, moderator, target_author, removal_reason FROM processed_actions WHERE subreddit = 'usenet' ORDER BY created_at DESC LIMIT 5;"
+
+# Track content lifecycle by target ID
+sqlite3 modlog.db "SELECT target_id, action_type, moderator, removal_reason, datetime(created_at, 'unixepoch') FROM processed_actions WHERE target_id LIKE '%1mkz4jm%' ORDER BY created_at;"
+
+# View removal reasons that are text (not numbers)
+sqlite3 modlog.db "SELECT action_type, removal_reason FROM processed_actions WHERE removal_reason NOT LIKE '%[0-9]%' AND removal_reason != 'remove' LIMIT 5;"
+
 # Clean manually
 sqlite3 modlog.db "DELETE FROM processed_actions WHERE created_at < date('now', '-30 days');"
 ```
 
+### Database Schema
+
+The database includes comprehensive moderation data with full transparency:
+
+- **`removal_reason` column**: Stores actual removal reason text from Reddit's API
+  - AutoModerator actions: Full rule text (e.g., "Possibly requesting an invite - [invited] Offers must be [O]")
+  - addremovalreason actions: Readable removal reason (e.g., "Invites - No asking") instead of template numbers
+  - Manual removals: Moderator-provided text or rule details
+- **`target_author` column**: Actual usernames of content authors (never shows [deleted])
+- **`subreddit` column**: Multi-subreddit support with proper data separation
+- **Unique content IDs**: Comments show comment IDs (e.g., n7ravg2), posts show post IDs
+
 ## Systemd Service (Optional)
 
 ```ini
diff --git a/config_template.json b/config_template.json
index 14d53b0..600fa16 100644
--- a/config_template.json
+++ b/config_template.json
@@ -6,10 +6,18 @@
"password": "YOUR_BOT_PASSWORD" }, "source_subreddit": "YourSubreddit", - "target_subreddit": "YourSubreddit", "wiki_page": "modlog", - "ignored_moderators": ["AutoModerator", "BotDefense"], - "update_interval": 300, - "batch_size": 100, - "retention_days": 30 + "retention_days": 90, + "batch_size": 50, + "update_interval": 600, + "max_wiki_entries_per_page": 1000, + "max_continuous_errors": 5, + "rate_limit_buffer": 60, + "max_batch_retries": 3, + "archive_threshold_days": 7, + "ignored_moderators": ["AutoModerator"], + "display_format": { + "show_full_ids": false, + "id_format": "prefixed" + } } \ No newline at end of file diff --git a/modlog_wiki_publisher.py b/modlog_wiki_publisher.py index 145ff67..cf0c096 100644 --- a/modlog_wiki_publisher.py +++ b/modlog_wiki_publisher.py @@ -3,900 +3,1331 @@ Reddit Modlog Wiki Publisher Scrapes moderation logs and publishes them to a subreddit wiki page """ -import argparse +import os +import sys import json -import logging -import logging.handlers import sqlite3 -import sys import time -from datetime import datetime, timedelta, timezone -from pathlib import Path -from typing import Dict, List, Optional -from urllib.parse import quote +import argparse +import logging +import re +import hashlib +from datetime import datetime, timezone +from typing import Dict, List, Optional, Any import praw -# Global logger setup - will be enhanced with per-subreddit loggers -root_logger = logging.getLogger() -root_logger.setLevel(logging.INFO) - -# Console handler for general output -console_handler = logging.StreamHandler() -console_handler.setLevel(logging.INFO) -console_formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') -console_handler.setFormatter(console_formatter) -root_logger.addHandler(console_handler) - -# Main logger +DB_PATH = "modlog.db" +LOGS_DIR = "logs" +BASE_BACKOFF_WAIT = 30 +MAX_BACKOFF_WAIT = 300 logger = logging.getLogger(__name__) +# Configuration limits and defaults +CONFIG_LIMITS = { + 'retention_days': {'min': 1, 'max': 365, 'default': 90}, + 'batch_size': {'min': 10, 'max': 500, 'default': 50}, + 'update_interval': {'min': 60, 'max': 3600, 'default': 600}, + 'max_wiki_entries_per_page': {'min': 100, 'max': 2000, 'default': 1000}, + 'max_continuous_errors': {'min': 1, 'max': 50, 'default': 5}, + 'rate_limit_buffer': {'min': 30, 'max': 300, 'default': 60}, + 'max_batch_retries': {'min': 1, 'max': 10, 'default': 3}, + 'archive_threshold_days': {'min': 1, 'max': 30, 'default': 7} +} + +# Database schema version +CURRENT_DB_VERSION = 5 + +def get_db_version(): + """Get current database schema version""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + # Check if version table exists + cursor.execute(""" + SELECT name FROM sqlite_master + WHERE type='table' AND name='schema_version' + """) + + if not cursor.fetchone(): + conn.close() + return 0 + + cursor.execute("SELECT version FROM schema_version ORDER BY id DESC LIMIT 1") + result = cursor.fetchone() + conn.close() + + return result[0] if result else 0 + except Exception as e: + logger.warning(f"Could not determine database version: {e}") + return 0 -class ModlogDatabase: - """SQLite database for tracking processed actions""" - - def __init__(self, db_path: str = "modlog.db", retention_days: int = 30): - self.db_path = db_path - self.retention_days = retention_days - self.conn = None - self._init_db() - - def _init_db(self): - """Initialize database and create tables if needed""" - self.conn = sqlite3.connect(self.db_path) - - # Create 
migrations table first - self.conn.execute(''' - CREATE TABLE IF NOT EXISTS schema_migrations ( - id INTEGER PRIMARY KEY, - name TEXT NOT NULL, - applied_at DATETIME DEFAULT CURRENT_TIMESTAMP +def set_db_version(version): + """Set database schema version""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + cursor.execute(""" + CREATE TABLE IF NOT EXISTS schema_version ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + version INTEGER NOT NULL, + applied_at INTEGER DEFAULT (strftime('%s', 'now')) ) - ''') + """) + + cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (version,)) + conn.commit() + conn.close() + logger.info(f"Database schema version set to {version}") + except Exception as e: + logger.error(f"Failed to set database version: {e}") + raise - # Check if migration 0 is already applied - cursor = self.conn.execute("SELECT 1 FROM schema_migrations WHERE id = 0") - if not cursor.fetchone(): - logger.info("Applying Migration 0: Initial schema") - self.conn.execute(''' +def validate_config_value(key, value, config_limits): + """Validate and enforce configuration limits""" + if key not in config_limits: + return value + + limits = config_limits[key] + if value < limits['min']: + logger.warning(f"{key} value {value} below minimum {limits['min']}, using minimum") + return limits['min'] + elif value > limits['max']: + logger.warning(f"{key} value {value} above maximum {limits['max']}, using maximum") + return limits['max'] + + return value + +def apply_config_defaults_and_limits(config): + """Apply default values and enforce limits on configuration""" + for key, limits in CONFIG_LIMITS.items(): + if key not in config: + config[key] = limits['default'] + logger.info(f"Using default value for {key}: {limits['default']}") + else: + config[key] = validate_config_value(key, config[key], CONFIG_LIMITS) + + # Set default wiki actions if not specified + if 'wiki_actions' not in config: + config['wiki_actions'] = ['removelink', 'removecomment', 'addremovalreason', 'spamlink', 'spamcomment'] + logger.info("Using default wiki_actions: removals and removal reasons only") + + # Validate required fields + required_fields = ['reddit', 'source_subreddit'] + for field in required_fields: + if field not in config: + raise ValueError(f"Missing required configuration field: {field}") + + # Validate reddit credentials + reddit_config = config.get('reddit', {}) + required_reddit_fields = ['client_id', 'client_secret', 'username', 'password'] + for field in required_reddit_fields: + if field not in reddit_config or not reddit_config[field]: + raise ValueError(f"Missing required reddit configuration field: {field}") + + return config + +def migrate_database(): + """Run database migrations to current version""" + current_version = get_db_version() + target_version = CURRENT_DB_VERSION + + if current_version >= target_version: + logger.info(f"Database already at version {current_version}, no migration needed") + return + + logger.info(f"Migrating database from version {current_version} to {target_version}") + + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + # Migration from version 0 to 1: Initial schema + if current_version < 1: + logger.info("Applying migration: Initial schema (v0 -> v1)") + cursor.execute(""" CREATE TABLE IF NOT EXISTS processed_actions ( - action_id TEXT PRIMARY KEY, - action_type TEXT, - timestamp INTEGER, - created_at DATETIME DEFAULT CURRENT_TIMESTAMP + id INTEGER PRIMARY KEY AUTOINCREMENT, + action_id TEXT UNIQUE NOT NULL, + created_at 
INTEGER NOT NULL, + processed_at INTEGER DEFAULT (strftime('%s', 'now')) ) - ''') - self.conn.execute(''' - CREATE TABLE IF NOT EXISTS modlog_entries ( - action_id TEXT PRIMARY KEY, - timestamp INTEGER, - action_type TEXT, - moderator TEXT, - target_author TEXT, - title TEXT, - url TEXT, - removal_reason TEXT, - note TEXT, - modmail_url TEXT, - subreddit TEXT + """) + cursor.execute("CREATE INDEX IF NOT EXISTS idx_action_id ON processed_actions(action_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_created_at ON processed_actions(created_at)") + set_db_version(1) + + # Migration from version 1 to 2: Add tracking columns + if current_version < 2: + logger.info("Applying migration: Add tracking columns (v1 -> v2)") + + # Check if columns already exist to handle partial migrations + cursor.execute("PRAGMA table_info(processed_actions)") + existing_columns = [row[1] for row in cursor.fetchall()] + + columns_to_add = [ + ('action_type', 'TEXT'), + ('moderator', 'TEXT'), + ('target_id', 'TEXT'), + ('target_type', 'TEXT'), + ('display_id', 'TEXT'), + ('target_permalink', 'TEXT') + ] + + for column_name, column_type in columns_to_add: + if column_name not in existing_columns: + try: + cursor.execute(f"ALTER TABLE processed_actions ADD COLUMN {column_name} {column_type}") + logger.info(f"Added column: {column_name}") + except sqlite3.OperationalError as e: + if "duplicate column name" not in str(e): + raise + + # Add new indexes + cursor.execute("CREATE INDEX IF NOT EXISTS idx_display_id ON processed_actions(display_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_target_id ON processed_actions(target_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_target_type ON processed_actions(target_type)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_moderator ON processed_actions(moderator)") + + set_db_version(2) + + # Migration from version 2 to 3: Add removal reason column + if current_version < 3: + logger.info("Applying migration: Add removal reason column (v2 -> v3)") + + # Check if column already exists + cursor.execute("PRAGMA table_info(processed_actions)") + existing_columns = [row[1] for row in cursor.fetchall()] + + if 'removal_reason' not in existing_columns: + try: + cursor.execute("ALTER TABLE processed_actions ADD COLUMN removal_reason TEXT") + logger.info("Added column: removal_reason") + except sqlite3.OperationalError as e: + if "duplicate column name" not in str(e): + raise + + set_db_version(3) + + # Migration from version 3 to 4: Add wiki hash caching table + if current_version < 4: + logger.info("Applying migration: Add wiki hash caching table (v3 -> v4)") + + cursor.execute(""" + CREATE TABLE IF NOT EXISTS wiki_hash_cache ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + subreddit TEXT NOT NULL, + wiki_page TEXT NOT NULL, + content_hash TEXT NOT NULL, + last_updated INTEGER DEFAULT (strftime('%s', 'now')), + UNIQUE(subreddit, wiki_page) ) - ''') - self.conn.execute(''' - CREATE INDEX IF NOT EXISTS idx_modlog_timestamp - ON modlog_entries(timestamp) - ''') - self.conn.execute("INSERT INTO schema_migrations (id, name) VALUES (0, 'initial schema')") - self.conn.commit() - - # Apply migration 1 if not already applied - cursor = self.conn.execute("SELECT 1 FROM schema_migrations WHERE id = 1") - if not cursor.fetchone(): - logger.info("Applying Migration 1: Add subreddit column to modlog_entries") - try: - self.conn.execute("ALTER TABLE modlog_entries ADD COLUMN subreddit TEXT") - except sqlite3.OperationalError: - pass # Already exists or failed silently - 
self.conn.execute("INSERT INTO schema_migrations (id, name) VALUES (1, 'add subreddit column')") - self.conn.commit() - - logger.info("Database initialized at %s", self.db_path) - - def store_entry(self, entry: Dict): - """Insert or replace a modlog entry record""" - self.conn.execute(''' - INSERT OR REPLACE INTO modlog_entries ( - action_id, timestamp, action_type, moderator, target_author, - title, url, removal_reason, note, modmail_url, subreddit - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) - ''', ( - entry['id'], - entry['timestamp'], - entry['action_type'], - entry['moderator'], - entry['target_author'], - entry['title'], - entry['url'], - entry['removal_reason'], - entry['note'], - entry['modmail_url'], - entry['subreddit'] - )) - self.conn.commit() - - def get_recent_entries(self, cutoff_timestamp: float, subreddit: Optional[str] = None) -> List[Dict]: - """Return all modlog entries newer than the cutoff, optionally filtered by subreddit""" - query = ''' - SELECT action_id, timestamp, action_type, moderator, target_author, - title, url, removal_reason, note, modmail_url - FROM modlog_entries - WHERE timestamp >= ? - ''' - params = [cutoff_timestamp] - - if subreddit: - query += ' AND subreddit = ?' - params.append(subreddit) - - query += ' ORDER BY timestamp DESC' - - cursor = self.conn.execute(query, params) - rows = cursor.fetchall() - return [ - { - 'id': r[0], 'timestamp': r[1], 'action_type': r[2], 'moderator': r[3], - 'target_author': r[4], 'title': r[5], 'url': r[6], - 'removal_reason': r[7], 'note': r[8], 'modmail_url': r[9] - } for r in rows - ] - - def is_processed(self, action_id: str) -> bool: - """Check if an action has been processed""" - cursor = self.conn.execute( - "SELECT 1 FROM processed_actions WHERE action_id = ?", - (action_id,) - ) - return cursor.fetchone() is not None - - def mark_processed(self, action_id: str, action_type: str, timestamp: int): - """Mark an action as processed""" - try: - self.conn.execute( - "INSERT INTO processed_actions (action_id, action_type, timestamp) VALUES (?, ?, ?)", - (action_id, action_type, timestamp) - ) - self.conn.commit() - except sqlite3.IntegrityError: - # Already exists, ignore - pass - - def cleanup_old_entries(self): - """Remove entries older than retention period""" - cutoff_date = datetime.now() - timedelta(days=self.retention_days) - self.conn.execute( - "DELETE FROM processed_actions WHERE created_at < ?", - (cutoff_date.isoformat(),) - ) - self.conn.execute( - "DELETE FROM modlog_entries WHERE timestamp < ?", - (cutoff_date.timestamp(),) - ) - self.conn.commit() - # Vacuum occasionally to reclaim space - if time.time() % 86400 < 300: # Once per day approximately - self.conn.execute("VACUUM") - - def close(self): - """Close database connection""" - if self.conn: - self.conn.close() - - -class ModlogWikiPublisher: - """Main class for publishing modlogs to wiki""" - - # Actions that result in content removal - REMOVAL_ACTIONS = { - 'removelink', 'removecomment', 'spamlink', 'spamcomment', - 'removepost', 'removecontent', 'addremovalreason' - } - - # Actions to ignore - IGNORED_ACTIONS = { - 'addnote', 'adjust_post_crowd_control_level', 'approvecomment', 'approvelink', - 'banuser', 'community_welcome_page', 'community_widgets', 'deleterule', - 'distinguish', 'edit_comment_requirements', 'edit_post_requirements', - 'edit_saved_response', 'edited_widget', 'editrule', 'editsettings', - 'ignorereports', 'lock', 'marknsfw', 'reorderrules', 'setflair', 'spoiler', - 'sticky', 'unlock', 'unmarknsfw', 'unspoiler', 'unsticky', 
'wikirevise', - 'wikipermlevel', 'wikipagelisted', 'wikipageunlisted', 'createrule', 'editflair', - 'invitemoderator', 'acceptmoderatorinvite', 'removemoderator', 'rejectmoderatorinvite', - 'unbanuser', 'setsuggestedsort', 'muteuser', 'submit_scheduled_post' - } - - # Action groupings for statistics - ACTION_GROUPS = { - 'spam': ['spamlink', 'spamcomment'], - 'remove': ['removelink', 'removecomment', 'removepost', 'removecontent'], - 'reason': ['addremovalreason'], - } - - def __init__(self, config_path: str = "config.json", cli_args: Optional[argparse.Namespace] = None): - self.config = self._load_config(config_path, cli_args or argparse.Namespace()) - self._validate_config(self.config) - self.reddit = self._init_reddit() - self.db = ModlogDatabase(retention_days=self.config.get('retention_days', 30)) - self.wiki_char_limit = 524288 - self.batch_size = self.config.get('batch_size', 100) - self.subreddit_loggers = {} - self._setup_subreddit_logging() - - def _load_config(self, config_path: str, cli_args: argparse.Namespace) -> dict: - """Load JSON config, then override with CLI args""" - config = {} - try: - with open(config_path, 'r') as f: - config = json.load(f) - except FileNotFoundError: - logger.warning("No config file found at %s, using CLI only", config_path) - except json.JSONDecodeError as e: - logger.error("Invalid JSON in config: %s", e) - sys.exit(1) - - # CLI overrides - if hasattr(cli_args, 'source_subreddit') and cli_args.source_subreddit: - config['source_subreddit'] = cli_args.source_subreddit - if hasattr(cli_args, 'wiki_page') and cli_args.wiki_page: - config['wiki_page'] = cli_args.wiki_page - if hasattr(cli_args, 'retention_days') and cli_args.retention_days is not None: - config['retention_days'] = cli_args.retention_days - if hasattr(cli_args, 'batch_size') and cli_args.batch_size is not None: - config['batch_size'] = cli_args.batch_size - if hasattr(cli_args, 'interval') and cli_args.interval is not None: - config['update_interval'] = cli_args.interval - if 'target_subreddit' not in config: - config['target_subreddit'] = config.get('source_subreddit') - return config - - def _validate_config(self, config: dict) -> None: - """Validate configuration has required fields""" - required = ['reddit', 'source_subreddit'] - reddit_required = ['client_id', 'client_secret', 'username', 'password'] - - for field in required: - if field not in config: - raise ValueError(f"Missing required config field: {field}") - - if 'reddit' in config: - for field in reddit_required: - if field not in config['reddit']: - raise ValueError(f"Missing required reddit config: {field}") - - # Validate retention_days is reasonable - retention = config.get('retention_days', 30) - if not 1 <= retention <= 365: - logger.warning("Unusual retention_days: %s, using 30", retention) - config['retention_days'] = 30 - - def _setup_subreddit_logging(self): - """Setup per-subreddit logging with rotation""" - # Create logs directory if it doesn't exist - log_dir = Path(self.config.get('log_directory', 'logs')) - log_dir.mkdir(exist_ok=True) - - # Get subreddits to set up logging for - subreddits = [self.config['source_subreddit']] - if 'target_subreddit' in self.config and self.config['target_subreddit'] != self.config['source_subreddit']: - subreddits.append(self.config['target_subreddit']) - - for subreddit in subreddits: - # Create logger for this subreddit - sub_logger = logging.getLogger(f"modlog.{subreddit}") - sub_logger.setLevel(logging.DEBUG) # Let handlers control level + """) + cursor.execute("CREATE 
INDEX IF NOT EXISTS idx_subreddit_page ON wiki_hash_cache(subreddit, wiki_page)") + logger.info("Created wiki_hash_cache table") - # Prevent adding handlers multiple times - if sub_logger.handlers: - continue - - # Create rotating file handler - log_file = log_dir / f"{subreddit}_modlog.log" - file_handler = logging.handlers.RotatingFileHandler( - log_file, - maxBytes=self.config.get('log_max_bytes', 10 * 1024 * 1024), # 10MB default - backupCount=self.config.get('log_backup_count', 5), # Keep 5 backups - encoding='utf-8' - ) + set_db_version(4) + + # Migration from version 4 to 5: Add subreddit column + if current_version < 5: + logger.info("Applying migration: Add subreddit column (v4 -> v5)") - # Create formatter for subreddit logs - file_formatter = logging.Formatter( - '%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s' - ) - file_handler.setFormatter(file_formatter) - file_handler.setLevel(logging.DEBUG) + # Check if column already exists + cursor.execute("PRAGMA table_info(processed_actions)") + existing_columns = [row[1] for row in cursor.fetchall()] - # Add handler to logger - sub_logger.addHandler(file_handler) + if 'subreddit' not in existing_columns: + try: + cursor.execute("ALTER TABLE processed_actions ADD COLUMN subreddit TEXT") + logger.info("Added column: subreddit") + except sqlite3.OperationalError as e: + if "duplicate column name" not in str(e): + raise - # Store reference - self.subreddit_loggers[subreddit] = sub_logger + cursor.execute("CREATE INDEX IF NOT EXISTS idx_subreddit ON processed_actions(subreddit)") - logger.info("Setup logging for subreddit: %s -> %s", subreddit, log_file) + set_db_version(5) + + conn.commit() + conn.close() + logger.info(f"Database migration completed successfully to version {target_version}") - def get_subreddit_logger(self, subreddit: str) -> logging.Logger: - """Get logger for specific subreddit""" - return self.subreddit_loggers.get(subreddit, logger) - - def _init_reddit(self) -> praw.Reddit: - """Initialize Reddit API connection""" - reddit_config = self.config['reddit'] - - # Add debug logging - logger.debug("Attempting login with username: %s", reddit_config['username']) - logger.debug("Client ID: %s...", reddit_config['client_id'][:4]) # Show first 4 chars - - try: - reddit = praw.Reddit( - client_id=reddit_config['client_id'], - client_secret=reddit_config['client_secret'], - username=reddit_config['username'], - password=reddit_config['password'], - user_agent=f"ModlogWikiPublisher/1.0 by /u/{reddit_config['username']}" - ) - - # Force authentication test - me = reddit.user.me() - logger.info("Successfully authenticated as: %s", me.name) - return reddit - - except Exception as e: - logger.error("Authentication failed: %s", e) - logger.error("Error type: %s", type(e).__name__) - if hasattr(e, 'response'): - logger.error("Response status: %s", e.response.status_code) - logger.error("Response body: %s", e.response.text) - raise - - def test_connection(self) -> bool: - """Test Reddit connection and permissions""" - print("\n" + "="*50) - print("Testing Reddit API Connection") - print("="*50) - - try: - # Test authentication with detailed error catching - try: - me = self.reddit.user.me() - print(f"✓ Authenticated as: /u/{me.name}") - except Exception as auth_error: - print(f"❌ Authentication failed: {auth_error}") - if hasattr(auth_error, 'response'): - print(f" Status Code: {auth_error.response.status_code}") - print(f" Response: {auth_error.response.text}") - if '401' in str(auth_error): - 
print("\nCommon 401 causes:") - print(" - Incorrect client_id or client_secret") - print(" - Wrong username or password") - print(" - 2FA enabled (need app-specific password)") - print(" - Spaces/quotes in credentials") - return False - - # Test subreddit access - source_sub = self.reddit.subreddit(self.config['source_subreddit']) - _ = source_sub.created_utc - print(f"✓ Source subreddit exists: /r/{self.config['source_subreddit']}") - - # Check moderator status - is_mod = False - try: - for mod in source_sub.moderator(): - if mod.name.lower() == self.config['reddit']['username'].lower(): - is_mod = True - break - except: - pass - - if is_mod: - print(f"✓ User is moderator of /r/{self.config['source_subreddit']}") - else: - print(f"⚠ User is NOT moderator of /r/{self.config['source_subreddit']}") - print(" You need moderator access to read modlogs") - return False - - # Test modlog access - try: - log_entry = next(source_sub.mod.log(limit=1), None) - if log_entry: - print(f"✓ Can read modlog (latest action: {log_entry.action})") - else: - print("⚠ No modlog entries found (might be empty)") - except Exception as e: - print(f"❌ Cannot read modlog: {e}") - return False - - # Test wiki access - target_sub = self.reddit.subreddit(self.config['target_subreddit']) - wiki_page = self.config['wiki_page'] + except Exception as e: + logger.error(f"Database migration failed: {e}") + raise - try: - page = target_sub.wiki[wiki_page] - content = page.content_md - print(f"✓ Wiki page exists: /r/{self.config['target_subreddit']}/wiki/{wiki_page}") - print(f" Current size: {len(content)} characters") - except: - print(f"⚠ Wiki page doesn't exist yet: /r/{self.config['target_subreddit']}/wiki/{wiki_page}") - print(" It will be created on first run") +def setup_database(): + """Initialize and migrate database""" + try: + migrate_database() + update_missing_subreddits() + logger.info("Database setup completed successfully") + except Exception as e: + logger.error(f"Database setup failed: {e}") + raise - print("\n✓ All tests passed!") - return True +def get_content_hash(content: str) -> str: + """Calculate SHA-256 hash of content""" + return hashlib.sha256(content.encode('utf-8')).hexdigest() - except Exception as e: - print(f"❌ Connection test failed: {e}") - return False - - def sanitize_for_table(self, text: str) -> str: - """Sanitize text for markdown table display""" - if not text: - return '' - # Replace pipes with similar Unicode character and clean whitespace - return text.replace('|', '┃').strip() - - def get_action_group(self, action_type: str) -> str: - """Get the group name for an action type""" - for group, actions in self.ACTION_GROUPS.items(): - if action_type in actions: - return group - return 'other' - - def _format_timestamp(self, timestamp: float) -> str: - """Format timestamp as HH:MM:SS UTC""" - dt = datetime.fromtimestamp(timestamp, tz=timezone.utc) - return dt.strftime("%H:%M:%S UTC") - - def _format_date(self, timestamp: float) -> str: - """Format timestamp as YYYY-MM-DD""" - dt = datetime.fromtimestamp(timestamp, tz=timezone.utc) - return dt.strftime("%Y-%m-%d") - - def _generate_modmail_url(self, subreddit: str, action_type: str, title: str, url: str) -> str: - """Generate pre-populated modmail URL""" - # Determine removal type - type_map = { - 'removelink': 'Post', - 'removepost': 'Post', - 'removecomment': 'Comment', - 'spamlink': 'Spam Post', - 'spamcomment': 'Spam Comment', - 'removecontent': 'Content', - 'addremovalreason': 'Removal Reason', - } - removal_type = 
type_map.get(action_type, 'Content') - - # Truncate title if too long - max_title_length = 50 - if len(title) > max_title_length: - title = title[:max_title_length-3] + "..." - - # Create subject line - subject = f"{removal_type} Removal Inquiry - {title}" - body = ( - f"Hello Moderators of /r/{subreddit},\n\n" - f"I would like to inquire about the recent removal of the following {removal_type.lower()}:\n\n" - f"**Title:** {title}\n\n" - f"**Action Type:** {action_type}\n\n" - f"**Link:** {url}\n\n" - "Please provide details regarding this action.\n\n" - "Thank you!" +def get_cached_wiki_hash(subreddit: str, wiki_page: str) -> Optional[str]: + """Get cached wiki content hash for subreddit/page""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + cursor.execute( + "SELECT content_hash FROM wiki_hash_cache WHERE subreddit = ? AND wiki_page = ?", + (subreddit, wiki_page) ) + result = cursor.fetchone() + conn.close() + return result[0] if result else None + except Exception as e: + logger.warning(f"Failed to get cached wiki hash: {e}") + return None - # Generate modmail URL - url = f"https://www.reddit.com/message/compose?to=/r/{subreddit}&subject={quote(subject)}&message={quote(body)}" - return url - - def _process_modlog_entry(self, entry) -> Optional[Dict]: - """Process a single modlog entry""" - action_type = entry.action - - # Skip ignored actions - if action_type in self.IGNORED_ACTIONS: - logger.debug("Ignoring action: [%s] for entry %s by %s", action_type, entry.id, entry.mod.name) - return None - - # Skip ignored moderators - ignored_mods = self.config.get('ignored_moderators', []) - if entry.mod.name in ignored_mods: - logger.debug("Ignoring action by ignored moderator: [%s] for entry %s", entry.mod.name, entry.id) - return None - - # Check if already processed - action_id = f"{entry.id}_{entry.created_utc}" - if self.db.is_processed(action_id): - return None - - # Debug logging for non-removal actions - if action_type not in self.REMOVAL_ACTIONS: - logger.debug('Processing non-removal action: [%s] for entry %s by %s', action_type, entry.id, entry.mod.name) - logger.debug("Entry details: %s", entry.details) - logger.debug("Entry target author: %s", entry.target_author) - logger.debug("Entry target title: %s", entry.target_title) - logger.debug("Entry target permalink: %s", entry.target_permalink) - - # Get Mod Note - parsed_mod_note = '' - if hasattr(entry, 'mod_note') and entry.mod_note: - parsed_mod_note = entry.mod_note.strip() - elif hasattr(entry, 'description') and entry.description: - parsed_mod_note = entry.description.strip() - - # Process moderator name (FIXED BUG: using elif) - p_mod_name = '' - entry_mod = '' - if hasattr(entry, 'mod') and entry.mod: - entry_mod = entry.mod.name.strip() - - if entry_mod: - if entry_mod == '[deleted]': - p_mod_name = '[deletedHumanModerator]' - elif entry_mod == 'AutoModerator': - p_mod_name = 'AutoModerator' - elif entry_mod == 'reddit': - p_mod_name = 'reddit' - else: - p_mod_name = 'HumanModerator' - - # Process details - p_details = '' - if entry.details: - p_details = entry.details.strip() - if action_type in ['addremovalreason']: - p_details = parsed_mod_note.strip() - - # Check if comment (improved detection) - is_comment = bool(entry.target_permalink and '/comments/' in entry.target_permalink - and entry.target_permalink.count('/') > 6) - - # Determine Title for Wiki - formatted_title = '' - if is_comment and entry.target_title: - formatted_title = entry.target_title - elif is_comment and not entry.target_title: - 
formatted_title = f"Comment by u/{entry.target_author if entry.target_author else '[deleted]'}" - elif not is_comment and entry.target_title: - formatted_title = entry.target_title - elif not is_comment and not entry.target_title: - formatted_title = f"Post by u/{entry.target_author if entry.target_author else '[deleted]'}" +def update_cached_wiki_hash(subreddit: str, wiki_page: str, content_hash: str): + """Update cached wiki content hash for subreddit/page""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + cursor.execute(""" + INSERT OR REPLACE INTO wiki_hash_cache (subreddit, wiki_page, content_hash, last_updated) + VALUES (?, ?, ?, strftime('%s', 'now')) + """, (subreddit, wiki_page, content_hash)) + conn.commit() + conn.close() + logger.debug(f"Updated cached hash for /r/{subreddit}/wiki/{wiki_page}") + except Exception as e: + logger.warning(f"Failed to update cached wiki hash: {e}") + +def censor_email_addresses(text): + """Censor email addresses in removal reasons""" + if not text: + return text + import re + # Replace email addresses with [EMAIL] + return re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', text) + +def sanitize_for_markdown(text: str) -> str: + """Sanitize text for use in markdown tables by escaping pipe characters""" + if text is None: + return "" + return str(text).replace("|", " ") + +def get_config_with_default(config: Dict[str, Any], key: str) -> Any: + """Get config value with fallback to CONFIG_LIMITS default""" + if key not in CONFIG_LIMITS: + raise ValueError(f"Unknown config key: {key}") + return config.get(key, CONFIG_LIMITS[key]['default']) + +def get_action_datetime(action): + """Convert action.created_utc to datetime object regardless of input type""" + if isinstance(action.created_utc, (int, float)): + return datetime.fromtimestamp(action.created_utc, tz=timezone.utc) + else: + return action.created_utc + +def get_moderator_name(action, anonymize=True): + """Get moderator name with optional anonymization for human moderators""" + if not action.mod: + return None + + # Extract the actual moderator name + if isinstance(action.mod, str): + mod_name = action.mod + else: + mod_name = action.mod.name + + # Handle special cases - don't censor these, match main branch exactly + if mod_name.lower() in ['automoderator', 'reddit']: + if mod_name.lower() == 'automoderator': + return 'AutoModerator' # Match main branch exactly else: - formatted_title = 'UnknownTitle' - - formatted_link = '' - if entry.target_permalink: - formatted_link = f"https://www.reddit.com{entry.target_permalink}" - - # Build result with sanitization - result = { - 'id': action_id, - 'timestamp': entry.created_utc, - 'action_type': action_type, - 'moderator': self.sanitize_for_table(p_mod_name), - 'target_author': self.sanitize_for_table(entry.target_author or '[deleted]'), - 'removal_reason': self.sanitize_for_table(p_details), - 'note': self.sanitize_for_table(parsed_mod_note), - 'title': self.sanitize_for_table(formatted_title), - 'url': formatted_link # URLs don't need sanitization - } - - # Generate modmail URL for removals - if action_type in self.REMOVAL_ACTIONS: - result['modmail_url'] = self._generate_modmail_url( - self.config['target_subreddit'], - action_type, - result['title'], - result['url'] - ) + return 'Reddit' + + # For human moderators, show generic label or actual name based on config + if anonymize: + return 'HumanModerator' + else: + return mod_name + +def extract_target_id(action): + """Extract Reddit ID from action target - 
NEVER return user ID""" + # Priority order: get actual post/comment ID first + if hasattr(action, 'target_submission') and action.target_submission: + if hasattr(action.target_submission, 'id'): + return action.target_submission.id else: - logger.debug("Non-removal action, skipping modmail URL generation") - result['modmail_url'] = '' + # Extract ID from submission object string representation + target_str = str(action.target_submission) + if target_str.startswith('t3_'): + return target_str[3:] # Remove t3_ prefix + return target_str + elif hasattr(action, 'target_comment') and action.target_comment: + if hasattr(action.target_comment, 'id'): + return action.target_comment.id + else: + # Extract ID from comment object string representation + target_str = str(action.target_comment) + if target_str.startswith('t1_'): + return target_str[3:] # Remove t1_ prefix + return target_str + else: + # For user-related actions, use action ID instead of user ID + return action.id + +def get_target_type(action): + """Determine target type for ID prefix""" + if hasattr(action, 'target_submission') and action.target_submission: + return 'post' + elif hasattr(action, 'target_comment') and action.target_comment: + return 'comment' + elif hasattr(action, 'target_author'): + return 'user' + else: + return 'action' + +def generate_display_id(action): + """Generate human-readable display ID - NEVER use user ID""" + target_id = extract_target_id(action) + target_type = get_target_type(action) + + prefixes = { + 'post': 'P', + 'comment': 'C', + 'user': 'U', # Use 'A' for action ID when dealing with user actions + 'action': 'A' + } + + prefix = prefixes.get(target_type, 'ZZU') + + # Shorten long IDs for display + if len(str(target_id)) > 8 and target_type in ['post', 'comment']: + short_id = str(target_id)[:6] + return f"{prefix}{short_id}" + else: + return f"{prefix}{target_id}" + +def get_target_permalink(action): + """Get permalink for the target content - prioritize actual content over user profiles""" + # Check if we have a cached permalink from database + if hasattr(action, 'target_permalink_cached') and action.target_permalink_cached: + return action.target_permalink_cached + + try: + # Priority 1: get actual post/comment permalinks from Reddit API + if hasattr(action, 'target_submission') and action.target_submission: + if hasattr(action.target_submission, 'permalink'): + return f"https://reddit.com{action.target_submission.permalink}" + elif hasattr(action.target_submission, 'id'): + # Construct permalink from submission ID + return f"https://reddit.com/comments/{action.target_submission.id}/" + elif hasattr(action, 'target_comment') and action.target_comment: + if hasattr(action.target_comment, 'permalink'): + return f"https://reddit.com{action.target_comment.permalink}" + elif hasattr(action.target_comment, 'id') and hasattr(action.target_comment, 'submission'): + # For comments, construct proper permalink with submission ID + return f"https://reddit.com/comments/{action.target_comment.submission.id}/_/{action.target_comment.id}/" + elif hasattr(action.target_comment, 'id'): + # Fallback for comment without submission info + return f"https://reddit.com/comments/{action.target_comment.id}/" - return result - - def fetch_modlog_entries(self, limit: int = 100) -> List[Dict]: - """Fetch and process modlog entries with rate limit handling""" - subreddit = self.reddit.subreddit(self.config['source_subreddit']) - sub_logger = self.get_subreddit_logger(self.config['source_subreddit']) - entries = [] - - 
sub_logger.info("Starting to fetch modlog entries, limit: %s", limit) - try: - for entry in subreddit.mod.log(limit=limit): - try: - processed = self._process_modlog_entry(entry) - if processed: - processed['subreddit'] = subreddit.display_name - entries.append(processed) - sub_logger.debug("Processed entry: %s [%s] by %s", - processed['id'], processed['action_type'], processed['moderator']) - # Mark as processed - self.db.mark_processed( - processed['id'], - processed['action_type'], - processed['timestamp'] - ) - self.db.store_entry(processed) - except praw.exceptions.APIException as e: - if e.error_type == "RATELIMIT": - # Extract wait time from message - import re - match = re.search(r'(\d+) minute', str(e)) - wait_time = int(match.group(1)) * 60 if match else 60 - sub_logger.warning("Rate limited, waiting %s seconds", wait_time) - time.sleep(wait_time) - else: - raise - - # Sort by timestamp (newest first) - entries.sort(key=lambda x: x['timestamp'], reverse=True) - sub_logger.info("Successfully fetched %s modlog entries", len(entries)) + # Priority 2: Try to get content permalink from action.target_permalink if it's not a user profile + if hasattr(action, 'target_permalink') and action.target_permalink: + permalink = action.target_permalink + # Only use if it's actual content (contains /comments/) not user profile (/u/) + if '/comments/' in permalink and '/u/' not in permalink: + return f"https://reddit.com{permalink}" if not permalink.startswith('http') else permalink + + # NEVER fall back to user profiles - only link to actual content + except: + pass + return None - except Exception as e: - sub_logger.error("Error fetching modlog: %s", e) - logger.error("Error fetching modlog: %s", e) +def is_duplicate_action(action_id: str) -> bool: + """Check if action has already been processed""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + cursor.execute( + "SELECT 1 FROM processed_actions WHERE action_id = ? 
LIMIT 1", + (action_id,) + ) + + result = cursor.fetchone() is not None + conn.close() + return result + except Exception as e: + logger.error(f"Error checking duplicate action: {e}") + return False - return entries +def extract_subreddit_from_permalink(permalink): + """Extract subreddit name from Reddit permalink URL""" + if not permalink: + return None + + import re + # Match patterns like /r/subreddit/ or https://reddit.com/r/subreddit/ + match = re.search(r'/r/([^/]+)/', permalink) + return match.group(1) if match else None - def _format_table_row(self, entry: Dict) -> str: - """Format a single entry as a table row""" - # Format action with moderator - action = f"{entry['action_type']}" - moderator = entry['moderator'] +def store_processed_action(action, subreddit_name=None): + """Store processed action to prevent duplicates""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() - # Format title with URL - if entry['url']: - title = f"[{entry['title']}]({entry['url']})" - else: - title = f"{entry['title']}" - - # Format removal reason - reason = entry['removal_reason'] or entry['note'] or '-' + # Process removal reason properly - ALWAYS prefer actual text over numeric details + removal_reason = None - # Format inquire link - if entry['modmail_url']: - inquire = f"[Contact Mods]({entry['modmail_url']})" - else: - inquire = '-' - - # Format time - time_str = self._format_timestamp(entry['timestamp']) - return f"| {time_str} | {action} | {moderator} | {title} | {reason} | {inquire} |" + # For addremovalreason actions, use description field (contains actual text) + if action.action == 'addremovalreason' and hasattr(action, 'description') and action.description: + removal_reason = censor_email_addresses(str(action.description).strip()) + # First priority: mod_note (actual removal reason text) + elif hasattr(action, 'mod_note') and action.mod_note: + removal_reason = censor_email_addresses(str(action.mod_note).strip()) + # Second priority: details (accept ALL details text, including numbers) + elif hasattr(action, 'details') and action.details: + details_str = str(action.details).strip() + removal_reason = censor_email_addresses(details_str) + + # Extract subreddit from URL if not provided + target_permalink = get_target_permalink(action) + if not subreddit_name and target_permalink: + subreddit_name = extract_subreddit_from_permalink(target_permalink) + + # Add subreddit column if it doesn't exist + cursor.execute("PRAGMA table_info(processed_actions)") + columns = [row[1] for row in cursor.fetchall()] + if 'subreddit' not in columns: + cursor.execute("ALTER TABLE processed_actions ADD COLUMN subreddit TEXT") + + # Add target_author column if it doesn't exist + if 'target_author' not in columns: + cursor.execute("ALTER TABLE processed_actions ADD COLUMN target_author TEXT") + + # Extract target author + target_author = None + if hasattr(action, 'target_author') and action.target_author: + if hasattr(action.target_author, 'name'): + target_author = action.target_author.name + else: + target_author = str(action.target_author) + + cursor.execute(""" + INSERT OR REPLACE INTO processed_actions + (action_id, action_type, moderator, target_id, target_type, + display_id, target_permalink, removal_reason, target_author, created_at, subreddit) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
+ """, ( + action.id, + action.action, + get_moderator_name(action, False), # Store actual name in database + extract_target_id(action), + get_target_type(action), + generate_display_id(action), + target_permalink, + sanitize_for_markdown(removal_reason), # Store properly processed removal reason + target_author, + int(action.created_utc) if isinstance(action.created_utc, (int, float)) else int(action.created_utc.timestamp()), + subreddit_name or 'unknown' + )) + + conn.commit() + conn.close() + except Exception as e: + logger.error(f"Error storing processed action: {e}") + raise - def generate_wiki_content(self, entries: List[Dict]) -> str: - """Generate wiki page content with statistics""" - if not entries: - return "# Moderation Log\n\nNo moderation actions to display.\n\n*Last updated: {} UTC*".format( - datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S") +def update_missing_subreddits(): + """Update NULL subreddit entries by extracting from permalinks""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + # Get entries with NULL subreddit but valid permalink + cursor.execute(""" + SELECT id, target_permalink FROM processed_actions + WHERE subreddit IS NULL AND target_permalink IS NOT NULL + """) + + updates = [] + for row_id, permalink in cursor.fetchall(): + subreddit = extract_subreddit_from_permalink(permalink) + if subreddit: + updates.append((subreddit, row_id)) + + # Update entries in batches + if updates: + cursor.executemany( + "UPDATE processed_actions SET subreddit = ? WHERE id = ?", + updates ) + logger.info(f"Updated {len(updates)} entries with extracted subreddit names") + + conn.commit() + conn.close() + + except Exception as e: + logger.error(f"Error updating missing subreddits: {e}") - # Calculate statistics - total_actions = len(entries) - action_counts = {} - for entry in entries: - action = entry['action_type'] - action_counts[action] = action_counts.get(action, 0) + 1 - - # Group entries by date - grouped = {} - for entry in entries: - date = self._format_date(entry['timestamp']) - if date not in grouped: - grouped[date] = [] - grouped[date].append(entry) - - # Build content - lines = [ - "# Moderation Log", - "", - f"*Last updated: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')} UTC*", - f"*Total actions in period: {total_actions}*", - "" - ] - - # Add summary if there are actions - if action_counts and len(action_counts) > 1: # Only show if there's variety - lines.append("## Summary") - lines.append("") - # Sort by count descending, show top 5 - for action, count in sorted(action_counts.items(), key=lambda x: x[1], reverse=True)[:5]: - lines.append(f"- **{action}**: {count}") - if len(action_counts) > 5: - lines.append(f"- *...and {len(action_counts) - 5} other action types*") - lines.append("") - - # Add tables for each date - for date in sorted(grouped.keys(), reverse=True): - lines.append(f"## {date}") - lines.append("") - lines.append("| Time | Action | Moderator | Content | Reason | Inquire |") - lines.append("|------|--------|-----------|---------|--------|---------|") - - for entry in grouped[date]: - row = self._format_table_row(entry) - lines.append(row) - - lines.append("") - - content = "\n".join(lines) - - # Check size limit - if len(content) > self.wiki_char_limit: - logger.warning("Wiki content exceeds character limit, truncating...") - # Keep header and as many recent entries as possible - lines = lines[:4] # Keep header - lines.append("\n**Note: Content truncated due to size limits**\n") - # Add dates/entries 
until we approach the limit - for date in sorted(grouped.keys(), reverse=True): - date_section = [ - f"## {date}", - "", - "| Time | Action | Moderator | Content | Reason | Inquire |", - "|------|--------|-----------|---------|--------|---------|" - ] - for entry in grouped[date]: - row = self._format_table_row(entry) - date_section.append(row) - date_section.append("") - - section_text = "\n".join(date_section) - if len("\n".join(lines)) + len(section_text) < self.wiki_char_limit - 1000: - lines.extend(date_section) - else: - break - - content = "\n".join(lines) - - return content - - def update_wiki(self, new_entries: List[Dict]) -> bool: - """Merge with existing wiki content and update""" - target_sub = self.config['target_subreddit'] - sub_logger = self.get_subreddit_logger(target_sub) +def cleanup_old_entries(retention_days: int): + """Remove entries older than retention_days""" + if retention_days <= 0: + retention_days = CONFIG_LIMITS['retention_days']['default'] # No config object available here + + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() - try: - subreddit = self.reddit.subreddit(target_sub) - wiki_page = self.config.get('wiki_page', 'modlog') - - sub_logger.info("Updating wiki page: /r/%s/wiki/%s", target_sub, wiki_page) + cutoff_timestamp = int((datetime.now() - datetime.fromtimestamp(0)).total_seconds()) - (retention_days * 86400) + + cursor.execute( + "DELETE FROM processed_actions WHERE created_at < ?", + (cutoff_timestamp,) + ) + + deleted_count = cursor.rowcount + conn.commit() + conn.close() + + if deleted_count > 0: + logger.info(f"Cleaned up {deleted_count} old entries") + except Exception as e: + logger.error(f"Error during cleanup: {e}") - # Get current wiki content (for logging purposes) - try: - existing_content = subreddit.wiki[wiki_page].content_md - sub_logger.debug("Existing wiki content size: %s characters", len(existing_content)) - except Exception: - sub_logger.info("Wiki page doesn't exist yet, will create new") - - # Only use DB entries; wiki parsing no longer needed - cutoff = time.time() - self.config.get('retention_days', 30) * 86400 - retained = self.db.get_recent_entries(cutoff, subreddit=self.config['source_subreddit']) +def get_recent_actions_from_db(config: Dict[str, Any], force_all_actions: bool = False, show_only_removals: bool = True) -> List: + """Fetch recent actions from database for force refresh""" + try: + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + + # For force refresh, get ALL actions, not just wiki_actions filter + if force_all_actions: + # Get all unique action types in database + cursor.execute("SELECT DISTINCT action_type FROM processed_actions WHERE action_type IS NOT NULL") + wiki_actions = set(row[0] for row in cursor.fetchall()) + logger.info(f"Force refresh: including all action types: {wiki_actions}") + elif show_only_removals: + wiki_actions = set([ + 'removelink', 'removecomment', 'addremovalreason', 'spamlink', 'spamcomment' + ]) + else: + # Get configurable list of actions to show in wiki + wiki_actions = set(config.get('wiki_actions', [ + 'removelink', 'removecomment', 'addremovalreason', 'spamlink', 'spamcomment' + ])) + + # Get recent actions within retention period + retention_days = get_config_with_default(config, 'retention_days') + cutoff_timestamp = int((datetime.now() - datetime.fromtimestamp(0)).total_seconds()) - (retention_days * 86400) + + # Limit to max wiki entries + max_entries = get_config_with_default(config, 'max_wiki_entries_per_page') + + placeholders = 
','.join(['?'] * len(wiki_actions)) + # STRICT subreddit filtering - only exact matches, no nulls + subreddit_name = config.get('source_subreddit', '') + + logger.debug(f"Query parameters - cutoff: {cutoff_timestamp}, wiki_actions: {wiki_actions}, subreddit: '{subreddit_name}', max_entries: {max_entries}") + + # Check if actions exist for the requested subreddit + cursor.execute(""" + SELECT COUNT(*) FROM processed_actions + WHERE created_at >= ? AND action_type IN ({}) + AND LOWER(subreddit) = LOWER(?) + """.format(placeholders), [cutoff_timestamp] + list(wiki_actions) + [subreddit_name]) + + action_count = cursor.fetchone()[0] + + # If no actions exist for this subreddit, return empty list + if action_count == 0: + logger.info(f"No actions found for subreddit '{subreddit_name}' in the specified time range") + conn.close() + return [] + + logger.debug(f"Found {action_count} actions for subreddit '{subreddit_name}'") + + # Get list of all subreddits for informational purposes + cursor.execute(""" + SELECT DISTINCT LOWER(subreddit) FROM processed_actions + WHERE created_at >= ? AND subreddit IS NOT NULL + """, [cutoff_timestamp]) + + all_subreddits = [row[0] for row in cursor.fetchall() if row[0]] + if len(all_subreddits) > 1: + logger.info(f"Multi-subreddit database contains data for: {sorted(all_subreddits)}") + logger.info(f"Retrieving actions for subreddit: '{subreddit_name}'") + + query = f""" + SELECT action_id, action_type, moderator, target_id, target_type, + display_id, target_permalink, removal_reason, target_author, created_at + FROM processed_actions + WHERE created_at >= ? AND action_type IN ({placeholders}) + AND LOWER(subreddit) = LOWER(?) + ORDER BY created_at DESC + LIMIT ? + """ + + cursor.execute(query, [cutoff_timestamp] + list(wiki_actions) + [subreddit_name, max_entries]) + rows = cursor.fetchall() + conn.close() + + logger.debug(f"Database query returned {len(rows)} rows") + + # Convert database rows to mock action objects for compatibility with existing functions + mock_actions = [] + for row in rows: + action_id, action_type, moderator, target_id, target_type, display_id, target_permalink, removal_reason, target_author, created_at = row + logger.debug(f"Processing cached action: {action_type} by {moderator} at {created_at}") - sub_logger.debug("Retrieved %s entries from database for retention period", len(retained)) + # Create a mock action object with the data we have + class MockAction: + def __init__(self, action_id, action_type, moderator, target_id, target_type, display_id, target_permalink, removal_reason, target_author, created_at): + self.id = action_id + self.action = action_type + self.mod = moderator + # Use the created_at directly + self.created_utc = created_at + self.details = removal_reason + self.display_id = display_id + self.target_permalink = target_permalink.replace('https://reddit.com', '') if target_permalink and target_permalink.startswith('https://reddit.com') else target_permalink + self.target_permalink_cached = target_permalink + + # Use actual target_author from database + self.target_title = None + self.target_author = target_author # Use actual target_author from database + + mock_actions.append(MockAction(action_id, action_type, moderator, target_id, target_type, display_id, target_permalink, removal_reason, target_author, created_at)) + + logger.info(f"Retrieved {len(mock_actions)} actions from database for force refresh") + return mock_actions + + except Exception as e: + logger.error(f"Error fetching actions from database: {e}") + 
return [] - # Sort newest first - retained.sort(key=lambda x: x['timestamp'], reverse=True) +def format_content_link(action) -> str: + """Format content link for wiki table - matches main branch approach exactly""" + + # Use actual Reddit API data like main branch does + formatted_link = '' + if hasattr(action, 'target_permalink') and action.target_permalink: + formatted_link = f"https://www.reddit.com{action.target_permalink}" + elif hasattr(action, 'target_permalink_cached') and action.target_permalink_cached: + formatted_link = action.target_permalink_cached + + # Check if comment using main branch logic + is_comment = bool(hasattr(action, 'target_permalink') and action.target_permalink + and '/comments/' in action.target_permalink and action.target_permalink.count('/') > 6) + + # Determine title using main branch approach + formatted_title = '' + if is_comment and hasattr(action, 'target_title') and action.target_title: + formatted_title = action.target_title + elif is_comment and (not hasattr(action, 'target_title') or not action.target_title): + target_author = action.target_author if hasattr(action, 'target_author') and action.target_author else '[deleted]' + formatted_title = f"Comment by u/{target_author}" + elif not is_comment and hasattr(action, 'target_title') and action.target_title: + formatted_title = action.target_title + elif not is_comment and (not hasattr(action, 'target_title') or not action.target_title): + target_author = action.target_author if hasattr(action, 'target_author') and action.target_author else '[deleted]' + formatted_title = f"Post by u/{target_author}" + else: + formatted_title = 'Unknown content' + + # Format with link like main branch + if formatted_link: + formatted_title = f"[{formatted_title}]({formatted_link})" + return sanitize_for_markdown(formatted_title) + +def extract_content_id_from_permalink(permalink): + """Extract the actual post/comment ID from Reddit permalink URL""" + if not permalink: + return None + + import re + # Check for comment ID first - URLs like /comments/abc123/title/def456/ + comment_match = re.search(r'/comments/[a-zA-Z0-9]+/[^/]*/([a-zA-Z0-9]+)/?', permalink) + if comment_match: + return f"t1_{comment_match.group(1)}" + + # Extract post ID from URLs like /comments/abc123/ (only if no comment ID found) + post_match = re.search(r'/comments/([a-zA-Z0-9]+)/', permalink) + if post_match: + return f"t3_{post_match.group(1)}" + + return None - # Render content - content = self.generate_wiki_content(retained) +def format_modlog_entry(action, config: Dict[str, Any]) -> Dict[str, str]: + """Format modlog entry - matches main branch approach exactly""" + + # Handle removal reasons like main branch - match exact logic + reason_text = "-" + + # Get mod note first (like main branch parsed_mod_note) + parsed_mod_note = '' + if hasattr(action, 'mod_note') and action.mod_note: + parsed_mod_note = str(action.mod_note).strip() + elif hasattr(action, 'details') and action.details: + parsed_mod_note = str(action.details).strip() + + # Process details like main branch + if hasattr(action, 'details') and action.details: + reason_text = str(action.details).strip() + # For addremovalreason, use mod_note instead of details (main branch logic) + if action.action in ['addremovalreason']: + reason_text = parsed_mod_note if parsed_mod_note else reason_text + elif parsed_mod_note: + reason_text = parsed_mod_note + + # Extract content ID for tracking + content_id = "-" + if hasattr(action, 'target_permalink') and action.target_permalink: + extracted_id 
= extract_content_id_from_permalink(action.target_permalink) + if extracted_id: + content_id = extracted_id.replace('t3_', '').replace('t1_', '')[:8] # Short ID for table + + return { + 'time': get_action_datetime(action).strftime('%H:%M:%S UTC'), + 'action': action.action, + 'id': content_id, + 'moderator': get_moderator_name(action, config.get('anonymize_moderators', True)) or 'Unknown', + 'content': format_content_link(action), + 'reason': sanitize_for_markdown(str(reason_text)), + 'inquire': generate_modmail_link(config['source_subreddit'], action) + } - # Update the wiki - subreddit.wiki[wiki_page].edit( - content=content, - reason="Rolling modlog update with retention" - ) - sub_logger.info("Wiki page updated with %s entries, content size: %s chars", len(retained), len(content)) - logger.info("Wiki page updated with %s entries.", len(retained)) - return True - - except praw.exceptions.APIException as e: - if e.error_type == "RATELIMIT": - sub_logger.error("Rate limited when updating wiki: %s", e) - logger.error("Rate limited when updating wiki: %s", e) - return False - else: - raise - except Exception as e: - sub_logger.error("Failed to update wiki: %s", e) - logger.error("Failed to update wiki: %s", e) - return False +def generate_modmail_link(subreddit: str, action) -> str: + """Generate modmail link for user inquiries with content ID for tracking""" + from urllib.parse import quote + + # Determine removal type like main branch + type_map = { + 'removelink': 'Post', + 'removepost': 'Post', + 'removecomment': 'Comment', + 'spamlink': 'Spam Post', + 'spamcomment': 'Spam Comment', + 'removecontent': 'Content', + 'addremovalreason': 'Removal Reason', + } + removal_type = type_map.get(action.action, 'Content') + + # Get content ID for tracking + content_id = "-" + if hasattr(action, 'target_permalink') and action.target_permalink: + extracted_id = extract_content_id_from_permalink(action.target_permalink) + if extracted_id: + content_id = extracted_id.replace('t3_', '').replace('t1_', '')[:8] + + # Get title and truncate if needed + if hasattr(action, 'target_title') and action.target_title: + title = action.target_title + else: + title = f"Content by u/{action.target_author}" if hasattr(action, 'target_author') and action.target_author else "Unknown content" + + # Truncate title if too long + max_title_length = 50 + if len(title) > max_title_length: + title = title[:max_title_length-3] + "..." + + # Get URL + url = "" + if hasattr(action, 'target_permalink_cached') and action.target_permalink_cached: + url = action.target_permalink_cached + elif hasattr(action, 'target_permalink') and action.target_permalink: + url = f"https://www.reddit.com{action.target_permalink}" if not action.target_permalink.startswith('http') else action.target_permalink + + # Create subject line with content ID for tracking + subject = f"{removal_type} Removal Inquiry - {title} [ID: {content_id}]" + + # Create body with content ID for easier modmail tracking + body = ( + f"Hello Moderators of /r/{subreddit},\n\n" + f"I would like to inquire about the recent removal of the following {removal_type.lower()}:\n\n" + f"**Content ID:** {content_id}\n\n" + f"**Title:** {title}\n\n" + f"**Action Type:** {action.action}\n\n" + f"**Link:** {url}\n\n" + "Please provide details regarding this action.\n\n" + "Thank you!" 
+ ) + + modmail_url = f"https://www.reddit.com/message/compose?to=/r/{subreddit}&subject={quote(subject)}&message={quote(body)}" + return f"[Contact Mods]({modmail_url})" - def run_once(self): - """Run a single update cycle""" - source_sub = self.config['source_subreddit'] - sub_logger = self.get_subreddit_logger(source_sub) +def build_wiki_content(actions: List, config: Dict[str, Any]) -> str: + """Build wiki page content from actions""" + if not actions: + return "No recent moderation actions found." + + # CRITICAL: Validate all actions belong to the same subreddit before building content + target_subreddit = config.get('source_subreddit', '') + mixed_subreddits = set() + + for action in actions: + # Check if action has subreddit info and if it matches (case-insensitive) + if hasattr(action, 'subreddit') and action.subreddit: + if action.subreddit.lower() != target_subreddit.lower(): + mixed_subreddits.add(action.subreddit) + + if mixed_subreddits: + logger.error(f"CRITICAL: Mixed subreddit data in actions for {target_subreddit}: {mixed_subreddits}") + raise ValueError(f"Cannot build wiki content - mixed subreddit data detected: {mixed_subreddits}") + + # Enforce wiki entry limits + max_entries = get_config_with_default(config, 'max_wiki_entries_per_page') + if len(actions) > max_entries: + logger.warning(f"Truncating wiki content to {max_entries} entries (was {len(actions)})") + actions = actions[:max_entries] + + # Group actions by date + actions_by_date = {} + for action in actions: + date_str = get_action_datetime(action).strftime('%Y-%m-%d') + if date_str not in actions_by_date: + actions_by_date[date_str] = [] + actions_by_date[date_str].append(action) + + # Build content - include ID column for tracking actions across the table + content_parts = [] + for date_str in sorted(actions_by_date.keys(), reverse=True): + content_parts.append(f"## {date_str}") + content_parts.append("| Time | Action | ID | Moderator | Content | Reason | Inquire |") + content_parts.append("|------|--------|----|-----------|---------|--------|---------|") + + for action in sorted(actions_by_date[date_str], key=lambda x: x.created_utc, reverse=True): + entry = format_modlog_entry(action, config) + content_parts.append(f"| {entry['time']} | {entry['action']} | {entry['id']} | {entry['moderator']} | {entry['content']} | {entry['reason']} | {entry['inquire']} |") - logger.info("Starting modlog update cycle...") - sub_logger.info("=== Starting update cycle for /r/%s ===", source_sub) + content_parts.append("") # Empty line between dates + + # Add bot attribution footer after all content + content_parts.append("---") + content_parts.append("") + content_parts.append("*This modlog is automatically maintained by [RedditModLog](https://github.com/bakerboy448/RedditModLog) bot.*") + + return "\n".join(content_parts) - # Cleanup old database entries - self.db.cleanup_old_entries() +def setup_reddit_client(config: Dict[str, Any]): + """Initialize Reddit API client""" + try: + reddit = praw.Reddit( + client_id=config['reddit']['client_id'], + client_secret=config['reddit']['client_secret'], + username=config['reddit']['username'], + password=config['reddit']['password'], + user_agent=f"ModlogWikiPublisher/2.0 by /u/{config['reddit']['username']}" + ) + + # Test authentication + me = reddit.user.me() + logger.info(f"Successfully authenticated as: /u/{me.name}") + return reddit + except Exception as e: + logger.error(f"Failed to authenticate with Reddit: {e}") + raise - # Fetch recent modlog entries - entries = 
self.fetch_modlog_entries(limit=self.batch_size) +def update_wiki_page(reddit, subreddit_name: str, wiki_page: str, content: str, force: bool = False): + """Update wiki page with content, using hash caching to avoid unnecessary updates""" + try: + # Calculate content hash + content_hash = get_content_hash(content) + + # Check if content has changed (unless forced) + cached_hash = get_cached_wiki_hash(subreddit_name, wiki_page) + if cached_hash == content_hash: + if force: + logger.info(f"Wiki content unchanged, but you selected force for /r/{subreddit_name}/wiki/{wiki_page}, forcing update") + else: + logger.info(f"Wiki content unchanged for /r/{subreddit_name}/wiki/{wiki_page}, skipping update") + return False + + # Update the wiki page + subreddit = reddit.subreddit(subreddit_name) + subreddit.wiki[wiki_page].edit( + content=content, + reason="Automated modlog update" + ) + + # Update the cached hash + update_cached_wiki_hash(subreddit_name, wiki_page, content_hash) + + action_type = "force updated" if force else "updated" + logger.info(f"Successfully {action_type} wiki page: /r/{subreddit_name}/wiki/{wiki_page}") + return True + + except Exception as e: + logger.error(f"Failed to update wiki page: {e}") + raise - if entries: - logger.info("Processing %s new modlog entries", len(entries)) - sub_logger.info("Processing %s new modlog entries", len(entries)) - # Update wiki with current database content - self.update_wiki(entries) - else: - logger.info("No new modlog entries to process") - sub_logger.info("No new modlog entries to process") +def process_modlog_actions(reddit, config: Dict[str, Any]) -> List: + """Fetch and process new modlog actions""" + try: + # Validate batch size + batch_size = validate_config_value('batch_size', config.get('batch_size', 50), CONFIG_LIMITS) + if batch_size != config.get('batch_size'): + config['batch_size'] = batch_size + + subreddit = reddit.subreddit(config['source_subreddit']) + ignored_mods = set(config.get('ignored_moderators', [])) + + new_actions = [] + processed_count = 0 + + logger.info(f"Fetching modlog entries from /r/{config['source_subreddit']}") + + # Get configurable list of actions to show in wiki + wiki_actions = set(config.get('wiki_actions', [ + 'removelink', 'removecomment', 'addremovalreason', 'spamlink', 'spamcomment' + ])) + + for action in subreddit.mod.log(limit=batch_size): + mod_name = get_moderator_name(action, False) # Use actual name for ignore check + if mod_name and mod_name in ignored_mods: + continue - sub_logger.info("=== Completed update cycle for /r/%s ===", source_sub) - - def run_continuous(self): - """Run continuously with interval""" - interval = self.config.get('update_interval', 300) - logger.info("Starting continuous mode, updating every %s seconds", interval) + if is_duplicate_action(action.id): + continue + + # Store ALL actions to database to prevent duplicates + store_processed_action(action, config['source_subreddit']) + processed_count += 1 + + # Only include specific action types in the wiki display + if action.action in wiki_actions: + new_actions.append(action) + + if processed_count >= batch_size: + break + + logger.info(f"Processed {processed_count} new modlog actions") + return new_actions + except Exception as e: + logger.error(f"Error processing modlog actions: {e}") + raise - while True: +def load_config(config_path: str, auto_update: bool = True) -> Dict[str, Any]: + """Load and validate configuration""" + try: + # Load existing config + original_config = {} + config_updated = False + + try: + 
with open(config_path, 'r') as f: + original_config = json.load(f) + except FileNotFoundError: + logger.error(f"Config file not found: {config_path}") + raise + + # Store original config for comparison + config_before = original_config.copy() + + # Apply defaults and validate limits + config = apply_config_defaults_and_limits(original_config) + + # Check if any new defaults were added + for key, limits in CONFIG_LIMITS.items(): + if key not in config_before: + config_updated = True + logger.info(f"Added new configuration field '{key}' with default value: {limits['default']}") + + # Auto-update config file if new defaults were added and auto_update is enabled + if config_updated and auto_update: try: - self.run_once() + # Create backup of original config + backup_path = f"{config_path}.backup" + import shutil + shutil.copy2(config_path, backup_path) + logger.info(f"Created backup of original config: {backup_path}") + + # Write updated config + with open(config_path, 'w') as f: + json.dump(config, f, indent=2) + logger.info(f"Auto-updated config file '{config_path}' with new defaults") + except Exception as e: - logger.error("Error in update cycle: %s", e) - - logger.info("Sleeping for %s seconds...", interval) - time.sleep(interval) - - def cleanup(self): - """Cleanup resources""" - self.db.close() + logger.warning(f"Could not auto-update config file: {e}") + logger.info("Configuration will still work with in-memory defaults") + elif config_updated and not auto_update: + logger.info("Config file updates available but auto-update disabled. Run without --no-auto-update-config to update.") - # Close all subreddit loggers - for subreddit, sub_logger in self.subreddit_loggers.items(): - for handler in sub_logger.handlers[:]: - handler.close() - sub_logger.removeHandler(handler) - logger.debug("Closed logging for subreddit: %s", subreddit) + logger.info("Configuration loaded and validated successfully") + return config + + except json.JSONDecodeError as e: + logger.error(f"Invalid JSON in config file: {e}") + raise + except Exception as e: + logger.error(f"Error loading config: {e}") + logger.error("Please check your configuration file format and required fields") + raise + +def create_argument_parser(): + """Create command line argument parser""" + parser = argparse.ArgumentParser( + description='Reddit Modlog Wiki Publisher', + formatter_class=argparse.RawDescriptionHelpFormatter + ) + + parser.add_argument( + '--config', default='config.json', + help='Path to configuration file' + ) + parser.add_argument( + '--source-subreddit', + help='Source subreddit name' + ) + parser.add_argument( + '--wiki-page', default='modlog', + help='Wiki page name' + ) + parser.add_argument( + '--retention-days', type=int, + help='Database retention period in days' + ) + parser.add_argument( + '--batch-size', type=int, + help='Number of entries to fetch per run' + ) + parser.add_argument( + '--interval', type=int, + help='Update interval in seconds for continuous mode' + ) + parser.add_argument( + '--continuous', action='store_true', + help='Run continuously with interval updates' + ) + parser.add_argument( + '--test', action='store_true', + help='Test configuration and Reddit API access' + ) + parser.add_argument( + '--debug', action='store_true', + help='Enable debug logging' + ) + parser.add_argument( + '--show-config-limits', action='store_true', + help='Show configuration limits and defaults' + ) + parser.add_argument( + '--force-migrate', action='store_true', + help='Force database migration (use with 
caution)' + ) + parser.add_argument( + '--no-auto-update-config', action='store_true', + help='Disable automatic config file updates' + ) + parser.add_argument( + '--force-modlog', action='store_true', + help='Fetch ALL modlog actions from Reddit API and completely rebuild wiki from database' + ) + parser.add_argument( + '--force-wiki', action='store_true', + help='Force wiki page update even if content appears unchanged (bypasses hash check)' + ) + parser.add_argument( + '--force-all', action='store_true', + help='Equivalent to --force-modlog + --force-wiki (complete rebuild and force update)' + ) + + return parser +def setup_logging(debug: bool = False): + """Setup logging configuration""" + os.makedirs(LOGS_DIR, exist_ok=True) + + level = logging.DEBUG if debug else logging.INFO + logging.basicConfig( + level=level, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Set prawcore and urllib3 to TRACE level for Reddit API debugging when debug is enabled + if debug: + logging.getLogger("prawcore").setLevel(5) # TRACE level (below DEBUG) + logging.getLogger("urllib3.connectionpool").setLevel(5) # TRACE level + +def show_config_limits(): + """Display configuration limits and defaults""" + print("Configuration Limits and Defaults:") + print("=" * 50) + for key, limits in CONFIG_LIMITS.items(): + print(f"{key}:") + print(f" Default: {limits['default']}") + print(f" Minimum: {limits['min']}") + print(f" Maximum: {limits['max']}") + print() + + print("Required Configuration Fields:") + print("- reddit.client_id") + print("- reddit.client_secret") + print("- reddit.username") + print("- reddit.password") + print("- source_subreddit") + +def run_continuous_mode(reddit, config: Dict[str, Any], force: bool = False): + """Run in continuous monitoring mode""" + logger.info("Starting continuous mode...") + + error_count = 0 + max_errors = get_config_with_default(config, 'max_continuous_errors') + first_run_force = force + + while True: + try: + error_count = 0 # Reset on successful run + actions = process_modlog_actions(reddit, config) + + if actions: + content = build_wiki_content(actions, config) + wiki_page = config.get('wiki_page', 'modlog') + update_wiki_page(reddit, config['source_subreddit'], wiki_page, content, force=first_run_force) + first_run_force = False + + cleanup_old_entries(get_config_with_default(config, 'retention_days')) + + interval = validate_config_value('update_interval', + get_config_with_default(config, 'update_interval'), + CONFIG_LIMITS) + logger.info(f"Waiting {interval} seconds until next update...") + time.sleep(interval) + + except KeyboardInterrupt: + logger.info("Received interrupt signal, shutting down...") + break + except Exception as e: + error_count += 1 + logger.error(f"Error in continuous mode (attempt {error_count}/{max_errors}): {e}") + + if error_count >= max_errors: + logger.error(f"Maximum error count ({max_errors}) reached, shutting down") + break + + # Exponential backoff for errors + wait_time = min(BASE_BACKOFF_WAIT * (2 ** (error_count - 1)), MAX_BACKOFF_WAIT) # Max 5 minutes + logger.info(f"Waiting {wait_time} seconds before retry...") + time.sleep(wait_time) def main(): - """Main entry point""" - parser = argparse.ArgumentParser(description='Reddit Modlog Wiki Publisher') - parser.add_argument('--config', default='config.json', help='Path to configuration file') - parser.add_argument('--source-subreddit', help='Source subreddit (modlog source)') - parser.add_argument('--wiki-page', help='Wiki page name (default: modlog)') - 
parser.add_argument('--retention-days', type=int, help='Retention window in days') - parser.add_argument('--batch-size', type=int, help='Batch size to fetch per run') - parser.add_argument('--interval', type=int, help='Interval (seconds) for continuous mode') - parser.add_argument('--debug', action='store_true', help='Enable debug logging') - parser.add_argument('--continuous', action='store_true', help='Run continuously') - parser.add_argument('--test', action='store_true', help='Test configuration and exit') + parser = create_argument_parser() args = parser.parse_args() - - if args.debug: - logging.getLogger().setLevel(logging.DEBUG) - + + setup_logging(args.debug) + try: - # Create and run publisher - publisher = ModlogWikiPublisher(args.config, args) - + # Show configuration limits if requested + if args.show_config_limits: + show_config_limits() + return + + # Force migration if requested + if args.force_migrate: + logger.info("Forcing database migration...") + migrate_database() + logger.info("Database migration completed") + return + + setup_database() + + config = load_config(args.config, auto_update=not args.no_auto_update_config) + + # Override config with CLI args + if args.source_subreddit: + config['source_subreddit'] = args.source_subreddit + if args.wiki_page: + config['wiki_page'] = args.wiki_page + if args.retention_days is not None: + config['retention_days'] = args.retention_days + if args.batch_size is not None: + config['batch_size'] = args.batch_size + if args.interval is not None: + config['update_interval'] = args.interval + + reddit = setup_reddit_client(config) + if args.test: - # Test mode - just validate connection - success = publisher.test_connection() - sys.exit(0 if success else 1) - elif args.continuous: - # Continuous mode - publisher.run_continuous() + logger.info("Running connection test...") + # Basic test - try to fetch one modlog entry + subreddit = reddit.subreddit(config['source_subreddit']) + test_entry = next(subreddit.mod.log(limit=1), None) + if test_entry: + logger.info("✓ Successfully connected and can read modlog") + else: + logger.warning("⚠ Connected but no modlog entries found") + return + + # Handle force commands + if args.force_all: + args.force_modlog = True + args.force_wiki = True + logger.info("Force all requested - will fetch from Reddit AND force wiki update") + + if args.force_modlog: + logger.info("Force modlog requested - fetching ALL modlog actions from Reddit and rebuilding wiki...") + # First, fetch all recent modlog actions to populate database + logger.info("Fetching all modlog actions from Reddit...") + process_modlog_actions(reddit, config) + + # Then rebuild wiki from database (showing only removal actions) + logger.info("Rebuilding wiki from database...") + actions = get_recent_actions_from_db(config, force_all_actions=False,show_only_removals=True) + if actions: + logger.info(f"Found {len(actions)} removal actions in database for wiki") + content = build_wiki_content(actions, config) + wiki_page = config.get('wiki_page', 'modlog') + update_wiki_page(reddit, config['source_subreddit'], wiki_page, content, force=args.force_wiki) + else: + logger.warning("No removal actions found in database for wiki refresh") + return + + # Handle force-wiki: rebuild from database without hitting modlog API + if args.force_wiki and not args.force_modlog: + logger.info("Force wiki requested - rebuilding from database without API calls") + actions = get_recent_actions_from_db(config, force_all_actions=False) + if actions: + 
logger.info(f"Found {len(actions)} actions in database for wiki rebuild") + content = build_wiki_content(actions, config) + wiki_page = config.get('wiki_page', 'modlog') + update_wiki_page(reddit, config['source_subreddit'], wiki_page, content, force=True) + else: + logger.warning("No actions found in database for wiki rebuild") + return + + # Process modlog actions (normal operation) + new_actions = process_modlog_actions(reddit, config) + + if new_actions: + logger.info(f"Processed {len(new_actions)} new modlog actions") + + # Always rebuild wiki from ALL relevant actions in database (within retention period) + all_actions = get_recent_actions_from_db(config, force_all_actions=False, show_only_removals=True) + if all_actions: + logger.info(f"Found {len(all_actions)} total actions in database for wiki update") + content = build_wiki_content(all_actions, config) + wiki_page = config.get('wiki_page', 'modlog') + update_wiki_page(reddit, config['source_subreddit'], wiki_page, content, force=args.force_wiki) + else: + logger.warning("No actions found in database for wiki update") + + cleanup_old_entries(get_config_with_default(config, 'retention_days')) + + if args.continuous: + run_continuous_mode(reddit, config, force=args.force_wiki) else: - # Default: run once - publisher.run_once() + logger.info("Single run completed") + except KeyboardInterrupt: logger.info("Received interrupt signal, shutting down...") - except ValueError as e: - logger.error("Configuration error: %s", e) - sys.exit(1) + sys.exit(0) except Exception as e: - logger.error("Unexpected error: %s", e) + logger.error(f"Fatal error: {e}") sys.exit(1) - finally: - if 'publisher' in locals(): - publisher.cleanup() - if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/test_removal_reasons.py b/test_removal_reasons.py new file mode 100644 index 0000000..3050621 --- /dev/null +++ b/test_removal_reasons.py @@ -0,0 +1,178 @@ +#!/usr/bin/env python3 +""" +Test script to verify removal reason processing without Reddit API calls +Creates a local markdown file to demonstrate the functionality +""" +import sqlite3 +from datetime import datetime, timezone +import os +import sys + +# Add the current directory to path to import our module +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +from modlog_wiki_publisher import * + +# Mock Reddit action objects for testing +class MockRedditAction: + def __init__(self, action_id, action_type, details, mod_name, target_type='post', target_id='abc123'): + self.id = action_id + self.action = action_type + self.details = details + self.created_utc = int(datetime.now().timestamp()) + + # Mock moderator + class MockMod: + def __init__(self, name): + self.name = name + self.mod = MockMod(mod_name) + + # Mock targets based on type + if target_type == 'post': + self.target_submission = target_id + self.target_comment = None + self.target_author = 'testuser' + self.target_title = 'Test Post Title' + self.target_permalink = f'/r/test/comments/{target_id}/test_post/' + elif target_type == 'comment': + self.target_submission = None + self.target_comment = target_id + self.target_author = 'testuser' + self.target_title = None + self.target_permalink = f'/r/test/comments/parent123/test_post/{target_id}/' + +def test_removal_reasons(): + """Test removal reason processing and storage""" + print("Testing Removal Reason Processing") + print("=" * 50) + + # Clean up any existing test database + test_db = "test_modlog.db" + if os.path.exists(test_db): + 
os.remove(test_db) + + # Override the global DB_PATH for testing + global DB_PATH + original_db_path = DB_PATH + DB_PATH = test_db + + try: + # Initialize test database + print(" Setting up test database...") + setup_database() + + # Verify table was created + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='processed_actions'") + if not cursor.fetchone(): + print(" Database table not found, creating manually...") + cursor.execute(""" + CREATE TABLE processed_actions ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + action_id TEXT UNIQUE NOT NULL, + action_type TEXT, + moderator TEXT, + target_id TEXT, + target_type TEXT, + display_id TEXT, + target_permalink TEXT, + removal_reason TEXT, + created_at INTEGER NOT NULL, + processed_at INTEGER DEFAULT (strftime('%s', 'now')) + ) + """) + conn.commit() + conn.close() + + # Test cases with different removal reasons + test_actions = [ + MockRedditAction("test1", "removelink", "Rule 1: No spam", "HumanMod1", "post", "post123"), + MockRedditAction("test2", "removecomment", "Rule 2: Be civil", "HumanMod2", "comment", "comment456"), + MockRedditAction("test3", "spamlink", "Spam detection", "AutoModerator", "post", "post789"), + MockRedditAction("test4", "addremovalreason", "Adding removal reason for clarity", "HumanMod1", "post", "post999"), + MockRedditAction("test5", "removelink", None, "HumanMod3", "post", "post111"), # No removal reason + MockRedditAction("test6", "removecomment", " Rule 3: No off-topic ", "HumanMod2", "comment", "comment222"), # Test whitespace stripping + ] + + print("\n1. Storing test actions...") + for action in test_actions: + print(f" Storing: {action.action} - '{action.details}'") + store_processed_action(action) + + print("\n2. Verifying database storage...") + conn = sqlite3.connect(DB_PATH) + cursor = conn.cursor() + cursor.execute("SELECT action_id, action_type, removal_reason FROM processed_actions ORDER BY action_id") + results = cursor.fetchall() + conn.close() + + for action_id, action_type, removal_reason in results: + print(f" {action_id}: {action_type} -> '{removal_reason}'") + + print("\n3. Testing wiki content generation...") + + # Create a mock config for testing + mock_config = { + 'wiki_actions': ['removelink', 'removecomment', 'addremovalreason', 'spamlink'], + 'anonymize_moderators': True, + 'source_subreddit': 'test', + 'max_wiki_entries_per_page': 1000, + 'retention_days': 30 + } + + # Get actions from database (simulating force refresh) + actions = get_recent_actions_from_db(mock_config) + print(f" Retrieved {len(actions)} actions from database") + + # Generate wiki content + wiki_content = build_wiki_content(actions, mock_config) + + # Write to local markdown file + output_file = "test_modlog_output.md" + with open(output_file, 'w', encoding='utf-8') as f: + f.write(wiki_content) + + print(f"\n4. Wiki content written to: {output_file}") + print("\nFirst few lines of generated content:") + print("-" * 40) + lines = wiki_content.split('\n') + for i, line in enumerate(lines[:15]): + print(f"{i+1:2d}: {line}") + if len(lines) > 15: + print(" ... (truncated)") + + print("\n5. 
Checking removal reasons in wiki content...")
+        if "Rule 1: No spam" in wiki_content:
+            print(" ✓ Found 'Rule 1: No spam' in wiki content")
+        else:
+            print(" ❌ Missing 'Rule 1: No spam' in wiki content")
+
+        if "Rule 2: Be civil" in wiki_content:
+            print(" ✓ Found 'Rule 2: Be civil' in wiki content")
+        else:
+            print(" ❌ Missing 'Rule 2: Be civil' in wiki content")
+
+        if "Rule 3: No off-topic" in wiki_content:
+            print(" ✓ Found 'Rule 3: No off-topic' (whitespace stripped)")
+        else:
+            print(" ❌ Missing 'Rule 3: No off-topic' in wiki content")
+
+        if "No reason" in wiki_content:
+            print(" ✓ Found 'No reason' for action without details")
+        else:
+            print(" ❌ Missing 'No reason' fallback in wiki content")
+
+        print(f"\nTest completed successfully!")
+        print(f"Check '{output_file}' to see the full generated wiki content.")
+
+    finally:
+        # Restore original DB path
+        DB_PATH = original_db_path
+
+        # Clean up test database
+        if os.path.exists(test_db):
+            os.remove(test_db)
+
+if __name__ == "__main__":
+    test_removal_reasons()
\ No newline at end of file
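The new `test_removal_reasons.py` script can be exercised locally without any Reddit credentials; a minimal invocation sketch (assuming the project's Python environment can import `praw`, since the script imports `modlog_wiki_publisher` at module level) might look like:

```bash
# Offline test: builds a throwaway SQLite database of mock actions
# and writes the rendered wiki markdown to test_modlog_output.md
python test_removal_reasons.py

# Inspect the generated table output
head -n 40 test_modlog_output.md
```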