[P2] Add production observability, readiness checks, and operational data retention #288

@jjoonleo

Problem

The backend has little production observability beyond ad-hoc logs and an api_log table. It lacks a clear health/readiness model, metrics, tracing/correlation, alerting signals, and a retention/privacy policy for operational data.

Why this is not production ready

Operators need to know whether the service, database, Firebase, OAuth providers, scheduling, and async executors are healthy. Without metrics and retention rules, incidents are harder to diagnose and logs can become a privacy liability.

Evidence

  • spring-boot-starter-actuator is included, but no production management endpoint configuration was found.
  • /health returns a static string and does not check dependencies.
  • ApiLogService writes request metadata asynchronously, but no retention, aggregation, or privacy policy is defined for it.
  • Notification/provider failures are not exposed as metrics.
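
To illustrate the gap between a static /health string and a dependency-aware check, here is a minimal, framework-free sketch of readiness aggregation. The class name `ReadinessSketch`, the probe names, and the lambda probes are hypothetical; in the actual service this logic would more likely live in Spring Boot Actuator health contributors rather than in hand-written code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: aggregates named dependency probes into one readiness answer. */
final class ReadinessSketch {

    // Insertion-ordered so the per-dependency report is stable.
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();

    /** Registers a named probe, e.g. "database" backed by a cheap query (assumed). */
    ReadinessSketch register(String name, BooleanSupplier probe) {
        probes.put(name, probe);
        return this;
    }

    /** Runs every probe; a probe that throws is reported as down instead of failing the check. */
    Map<String, Boolean> checkAll() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, BooleanSupplier> entry : probes.entrySet()) {
            boolean up;
            try {
                up = entry.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                up = false;
            }
            result.put(entry.getKey(), up);
        }
        return result;
    }

    /** Ready only if no registered dependency is down. */
    boolean isReady() {
        return !checkAll().containsValue(false);
    }
}
```

A liveness check, by contrast, would normally not call any of these probes, so that a database outage takes the instance out of rotation without triggering a restart.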

Required work

  • Configure Spring Boot Actuator liveness/readiness endpoints for production.
  • Add health contributors for database, Firebase initialization, scheduler state, and required external provider readiness where appropriate.
  • Add metrics for request latency/error rate, auth failures, token refreshes, notification scheduling/sending, and provider calls.
  • Add correlation/request IDs to logs and error responses.
  • Define retention and privacy policy for api_log, IP addresses, and user identifiers.
  • Add alerting criteria for high error rates, notification failures, provider failures, and DB connectivity issues.
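
For the api_log retention and privacy item, one possible shape is a purge cutoff plus IP truncation before storage. This is only a sketch: the 30-day period, the class name `ApiLogRetentionSketch`, and the truncation widths are placeholder assumptions, not decisions recorded in this issue.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of retention and IP-masking rules for api_log rows. */
final class ApiLogRetentionSketch {

    // Assumption: 30 days is a placeholder; the real period is a policy decision.
    static final Duration RETENTION = Duration.ofDays(30);

    /** Rows created before this instant are eligible for deletion. */
    static Instant purgeCutoff(Instant now) {
        return now.minus(RETENTION);
    }

    static boolean isExpired(Instant createdAt, Instant now) {
        return createdAt.isBefore(purgeCutoff(now));
    }

    /**
     * Zeroes the trailing bytes of a literal IP address: the last octet for IPv4,
     * everything after the first 48 bits for IPv6. Pass only literal addresses,
     * because InetAddress.getByName would try to resolve a host name.
     */
    static String anonymizeIp(String literalIp) {
        try {
            byte[] bytes = InetAddress.getByName(literalIp).getAddress();
            int keep = bytes.length == 4 ? 3 : 6;
            for (int i = keep; i < bytes.length; i++) {
                bytes[i] = 0;
            }
            return InetAddress.getByAddress(bytes).getHostAddress();
        } catch (UnknownHostException ex) {
            throw new IllegalArgumentException("not a valid IP literal: " + literalIp, ex);
        }
    }
}
```

A scheduled job could then delete rows whose creation time falls before `purgeCutoff(Instant.now())`, while `anonymizeIp` would be applied before ApiLogService persists a row.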

Acceptance criteria

  • Production has meaningful liveness and readiness endpoints.
  • Operators can observe request/error/notification/provider health from metrics.
  • Logs include correlation IDs and exclude sensitive payloads.
  • api_log retention/privacy behavior is documented and implemented.
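
The correlation ID criterion could be met along these lines. The sketch below is an assumption-laden illustration: the `X-Request-Id` header name, the accepted character set, and the class name `CorrelationIdSketch` are not prescribed by this issue. In the real service the ID would typically be set in a request filter and attached to log output through the logging framework's MDC.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Hypothetical sketch: choose a correlation ID per request and tag log lines with it. */
final class CorrelationIdSketch {

    // Assumption: a commonly used header name, not one chosen by this issue.
    static final String HEADER = "X-Request-Id";

    // Accept only short, plain tokens so a client cannot inject newlines into logs.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9._-]{8,64}");

    /** Reuses a well-formed inbound ID; otherwise generates a fresh random UUID. */
    static String resolve(String inboundHeaderValue) {
        if (inboundHeaderValue != null && SAFE.matcher(inboundHeaderValue).matches()) {
            return inboundHeaderValue;
        }
        return UUID.randomUUID().toString();
    }

    /** Prefixes a log message with the correlation ID. */
    static String logLine(String correlationId, String message) {
        return "[" + correlationId + "] " + message;
    }
}
```

Returning the same ID in error responses lets an operator go from a user report straight to the matching log lines.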

Metadata

Labels

  • area:deployment (Build, config, deployment, infrastructure)
  • area:stability (Reliability and runtime stability)
  • priority:P2 (Medium: important hardening or operational maturity work)
  • production-readiness (Production readiness audit item)
  • type:ops (Operational readiness task)
