[7/7] Telemetry Testing and Documentation - FINAL LAYER #330
Open · samikshya-db wants to merge 19 commits into telemetry-6-integration from telemetry-7-documentation
Conversation
This is part 7 of 7 in the telemetry implementation stack - FINAL LAYER.

Documentation:
- README.md: Add telemetry overview section
- docs/TELEMETRY.md: Comprehensive telemetry documentation
- spec/telemetry-design.md: Detailed design document
- spec/telemetry-sprint-plan.md: Implementation plan
- spec/telemetry-test-completion-summary.md: Test coverage report

README.md Updates:
- Added telemetry overview section
- Configuration examples with all 7 options
- Privacy-first design highlights
- Link to detailed TELEMETRY.md

TELEMETRY.md - Complete User Guide:
- Overview and benefits
- Privacy-first design (what is/isn't collected)
- Configuration guide with examples
- Event types with JSON schemas
- Feature control (server-side flag + client override)
- Architecture overview
- Troubleshooting guide
- Privacy & compliance (GDPR, CCPA, SOC 2)
- Performance impact analysis
- FAQ (12 common questions)

Design Document (telemetry-design.md):
- Complete system architecture
- Component specifications
- Data flow diagrams
- Error handling requirements
- Testing strategy
- Implementation phases

Test Coverage Summary:
- 226 telemetry tests passing
- 97.76% line coverage
- 90.59% branch coverage
- 100% function coverage
- Critical requirements verified

Test Breakdown by Component:
- ExceptionClassifier: 51 tests (100% coverage)
- CircuitBreaker: 32 tests (100% functions)
- FeatureFlagCache: 29 tests (100% functions)
- TelemetryEventEmitter: 31 tests (100% functions)
- TelemetryClient: 31 tests (100% functions)
- TelemetryClientProvider: 31 tests (100% functions)
- MetricsAggregator: 32 tests (94% lines, 82% branches)
- DatabricksTelemetryExporter: 24 tests (96% statements)
- Integration: 11 E2E tests

Critical Test Verification:
✅ All exceptions swallowed (no propagation)
✅ Debug-only logging (no warn/error)
✅ No console logging
✅ Driver works when telemetry fails
✅ Reference counting correct
✅ Circuit breaker behavior correct

This completes the 7-layer telemetry implementation stack!

Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
Implement proper authentication for feature flag fetching and telemetry
export by adding getAuthHeaders() method to IClientContext.
- **IClientContext**: Add getAuthHeaders() method to expose auth headers
- **DBSQLClient**: Implement getAuthHeaders() using authProvider.authenticate()
- Returns empty object gracefully if no auth provider available
- **FeatureFlagCache**: Implement actual server API call
- Endpoint: GET /api/2.0/connector-service/feature-flags/OSS_NODEJS/{version}
- Uses context.getAuthHeaders() for authentication
- Parses JSON response with flags array
- Updates cache duration from server-provided ttl_seconds
- Looks for: databricks.partnerplatform.clientConfigsFeatureFlags.enableTelemetryForNodeJs
- All exceptions swallowed with debug logging only
- **DatabricksTelemetryExporter**: Add authentication to authenticated endpoint
- Uses context.getAuthHeaders() when authenticatedExport=true
- Properly authenticates POST to /api/2.0/sql/telemetry-ext
- Removes TODO comments about missing authentication
Follows same pattern as JDBC driver:
- Endpoint: /api/2.0/connector-service/feature-flags/OSS_JDBC/{version} (JDBC)
- Endpoint: /api/2.0/connector-service/feature-flags/OSS_NODEJS/{version} (Node.js)
- Auth headers from connection's authenticate() method
- Response format: { flags: [{ name, value }], ttl_seconds }
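A minimal sketch of that flow, assuming the global `fetch` available in Node 18+. `fetchTelemetryFlag` and `FeatureFlagResponse` are illustrative names, not the actual FeatureFlagCache internals, and note that a later commit in this stack changes the client type segment from OSS_NODEJS to NODEJS:

```typescript
// Illustrative sketch only; the real implementation lives in FeatureFlagCache.
interface FeatureFlagResponse {
  flags: Array<{ name: string; value: string }>;
  ttl_seconds?: number;
}

const TELEMETRY_FLAG =
  'databricks.partnerplatform.clientConfigsFeatureFlags.enableTelemetryForNodeJs';

async function fetchTelemetryFlag(
  host: string,
  version: string,
  getAuthHeaders: () => Promise<Record<string, string>>,
): Promise<{ enabled: boolean; ttlSeconds?: number } | undefined> {
  try {
    const headers = await getAuthHeaders(); // empty object if no auth provider
    const response = await fetch(
      `https://${host}/api/2.0/connector-service/feature-flags/OSS_NODEJS/${version}`,
      { headers },
    );
    const body = (await response.json()) as FeatureFlagResponse;
    const flag = body.flags?.find((f) => f.name === TELEMETRY_FLAG);
    return {
      enabled: flag?.value === 'true',
      ttlSeconds: body.ttl_seconds, // server-provided cache duration
    };
  } catch (error) {
    // CRITICAL requirement: all exceptions swallowed, debug-only logging
    return undefined;
  }
}
```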
- Build: ✅ Successful
- E2E: ✅ Verified with real credentials
- Feature flag fetch now fully functional
- Telemetry export now properly authenticated
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
- Fix event listener names: use 'connection.open' not 'telemetry.connection.open'
- Fix feature flag endpoint: use NODEJS client type instead of OSS_NODEJS
- Fix telemetry endpoints: use /telemetry-ext and /telemetry-unauth (not /api/2.0/sql/...)
- Update telemetry payload to match proto: use system_configuration with snake_case fields
- Add URL utility to handle hosts with or without protocol (see the sketch below)
- Add telemetryBatchSize and telemetryAuthenticatedExport config options
- Remove debug statements and temporary feature flag override

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
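A possible shape for that URL utility (the function name is illustrative, not the actual helper):

```typescript
// Normalize a host that may or may not include a protocol.
function normalizeHostUrl(host: string): string {
  // Prepend https:// only when no protocol is already present
  return /^[a-z]+:\/\//i.test(host) ? host : `https://${host}`;
}

// Both forms resolve to the same base URL:
// normalizeHostUrl('my-workspace.cloud.databricks.com')
// normalizeHostUrl('https://my-workspace.cloud.databricks.com')
```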
Added detailed documentation for:
- System configuration fields (osArch, runtimeVendor, localeName, charSetEncoding, processName) with JDBC equivalents
- protoLogs payload format matching JDBC TelemetryRequest structure
- Complete log object structure with all field descriptions
- Example JSON payloads showing actual format sent to server (see the sketch below)

Clarified that:
- Each log is JSON-stringified before adding to protoLogs array
- Connection events include full system_configuration
- Statement events include operation_latency_ms and sql_operation
- The items field is required but always empty

Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
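A hedged illustration of the payload shape described above: the values are invented and the snake_case names inside system_configuration are inferred from the camelCase fields listed, so treat docs/TELEMETRY.md as the authority.

```typescript
// Connection events carry the full system_configuration block (values invented).
const connectionLog = {
  system_configuration: {
    os_arch: 'x64',
    runtime_vendor: 'Node.js',
    locale_name: 'en-US',
    char_set_encoding: 'utf-8',
    process_name: 'node',
  },
};

// Statement events carry latency and the sql_operation block.
const statementLog = {
  operation_latency_ms: 42,
  sql_operation: { statement_type: 'EXECUTE_STATEMENT' },
};

// Matching the JDBC TelemetryRequest structure: each log is JSON-stringified
// before being pushed into protoLogs; items is required but always empty.
const requestBody = {
  items: [],
  protoLogs: [JSON.stringify(connectionLog), JSON.stringify(statementLog)],
};
```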
Added comprehensive section 6.5 explaining exactly when telemetry exports occur:
- Statement close: Aggregates metrics, exports only if batch full
- Connection close: ALWAYS exports all pending metrics via aggregator.close()
- Process exit: NO automatic export unless close() was called
- Batch size/timer: Automatic background exports

Included:
- Code examples showing actual implementation
- Summary table comparing all lifecycle events
- Best practices for ensuring telemetry export (SIGINT/SIGTERM handlers; see the sketch below)
- Key differences from JDBC (JVM shutdown hooks vs manual close)

Clarified that aggregator.close() does three things:
1. Stops the periodic flush timer
2. Completes any remaining incomplete statements
3. Performs final flush to export all buffered metrics

Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
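For example, a signal handler along these lines ensures the final flush runs (a sketch, assuming the standard `@databricks/sql` entry point):

```typescript
import { DBSQLClient } from '@databricks/sql';

const client = new DBSQLClient();
// ... connect, open sessions, execute statements ...

// No telemetry is exported at process exit unless close() runs, so flush
// pending metrics explicitly on shutdown signals.
async function shutdown() {
  await client.close(); // drives aggregator.close(): stops the flush timer,
                        // completes in-flight statements, exports buffered metrics
  process.exit(0);
}

process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);
```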
Changes:
- Track and export connection open latency (session creation time)
- Enable telemetry by default (was false), gated by feature flag
- Update design doc to document connection latency

Implementation:
- DBSQLClient.openSession(): Track start time and calculate latency
- TelemetryEventEmitter: Accept latencyMs in connection event
- MetricsAggregator: Include latency in connection metrics
- DatabricksTelemetryExporter: Export operation_latency_ms for connections

Config changes:
- telemetryEnabled: true by default (in DBSQLClient and types.ts)
- Feature flag check still gates initialization for safe rollout

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
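The tracking reduces to a start/stop timestamp around session creation; a simplified sketch (the helper name and emitter shape are illustrative, only the latencyMs field comes from the commit):

```typescript
// Simplified stand-in for the timing inside DBSQLClient.openSession().
async function timeSessionOpen<T>(
  createSession: () => Promise<T>,
  emitConnectionOpen: (event: { latencyMs: number }) => void,
): Promise<T> {
  const start = Date.now();
  const session = await createSession();                   // actual session creation
  emitConnectionOpen({ latencyMs: Date.now() - start });   // exported as operation_latency_ms
  return session;
}
```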
Fixes: - sql_operation now properly populated by fetching metadata before statement close - statement_id always populated from operation handle GUID - auth_type now included in driver_connection_params Changes: - DBSQLOperation: Fetch metadata before emitting statement.complete to ensure resultFormat is available for sql_operation field - DBSQLClient: Track authType from connection options and include in driver configuration - DatabricksTelemetryExporter: Export auth_type in driver_connection_params - types.ts: Add authType to DriverConfiguration interface - Design doc: Document auth_type, resultFormat population, and connection params Implementation details: - emitStatementComplete() is now async to await metadata fetch - Auth type defaults to 'access-token' if not specified - Result format fetched even if not explicitly requested by user - Handles metadata fetch failures gracefully (continues without resultFormat) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
- Convert 'access-token' (or undefined) to 'pat'
- Convert 'databricks-oauth' to 'external-browser' (U2M) or 'oauth-m2m' (M2M)
- Distinguish M2M from U2M by checking for oauthClientSecret
- Keep 'custom' as 'custom'

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
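The mapping as a sketch (the function name is illustrative):

```typescript
// Map the driver's auth type to the telemetry auth_type vocabulary.
function mapAuthTypeForTelemetry(
  authType: string | undefined,
  hasClientSecret: boolean, // true when oauthClientSecret is configured
): string {
  switch (authType) {
    case undefined:
    case 'access-token':
      return 'pat';
    case 'databricks-oauth':
      // M2M flows carry an oauthClientSecret; U2M goes through the browser
      return hasClientSecret ? 'oauth-m2m' : 'external-browser';
    default:
      return authType; // 'custom' stays 'custom'
  }
}
```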
- Add statement_type field from operationType
- Add is_compressed field from compression tracking
- Export both fields in sql_operation for statement metrics
- Fields populated from CloudFetch chunk events

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Exclude '00000000-0000-0000-0000-000000000000' from sql_statement_id
- Only include valid statement IDs in telemetry logs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- statement_type only included if operationType is set
- is_compressed only included if compressed value is set
- execution_result only included if resultFormat is set
- sql_operation object only created if any field is present

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
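A sketch of that conditional construction (the field names follow the commits above; the helper itself is illustrative):

```typescript
// Build sql_operation only from fields that are actually present.
function buildSqlOperation(metrics: {
  operationType?: string;
  compressed?: boolean;
  resultFormat?: string;
}): Record<string, unknown> | undefined {
  const sqlOperation: Record<string, unknown> = {};
  if (metrics.operationType !== undefined) sqlOperation.statement_type = metrics.operationType;
  if (metrics.compressed !== undefined) sqlOperation.is_compressed = metrics.compressed;
  if (metrics.resultFormat !== undefined) sqlOperation.execution_result = metrics.resultFormat;
  // Omit the object entirely when no field is set
  return Object.keys(sqlOperation).length > 0 ? sqlOperation : undefined;
}
```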
- Convert TOperationType (Thrift) to proto Operation.Type names
- EXECUTE_STATEMENT remains EXECUTE_STATEMENT
- GET_TYPE_INFO -> LIST_TYPE_INFO
- GET_CATALOGS -> LIST_CATALOGS
- GET_SCHEMAS -> LIST_SCHEMAS
- GET_TABLES -> LIST_TABLES
- GET_TABLE_TYPES -> LIST_TABLE_TYPES
- GET_COLUMNS -> LIST_COLUMNS
- GET_FUNCTIONS -> LIST_FUNCTIONS
- UNKNOWN -> TYPE_UNSPECIFIED

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- auth_type is field 5 at OssSqlDriverTelemetryLog level, not nested
- Remove driver_connection_params (not populated in Node.js driver)
- Export auth_type directly in sql_driver_log for connection events

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ARROW_BASED_SET -> INLINE_ARROW
- COLUMN_BASED_SET -> COLUMNAR_INLINE
- ROW_BASED_SET -> INLINE_JSON
- URL_BASED_SET -> EXTERNAL_LINKS

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Create lib/telemetry/telemetryTypeMappers.ts
- Move mapOperationTypeToTelemetryType (renamed from mapOperationTypeToProto)
- Move mapResultFormatToTelemetryType (renamed from mapResultFormatToProto)
- Keep all telemetry-specific mapping functions in one place

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
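Condensing the two tables above, the mappers plausibly reduce to lookup maps along these lines (enum handling simplified to string keys; only the exported function names come from the commit):

```typescript
// Thrift TOperationType name -> proto Operation.Type name
const OPERATION_TYPE_MAP: Record<string, string> = {
  EXECUTE_STATEMENT: 'EXECUTE_STATEMENT',
  GET_TYPE_INFO: 'LIST_TYPE_INFO',
  GET_CATALOGS: 'LIST_CATALOGS',
  GET_SCHEMAS: 'LIST_SCHEMAS',
  GET_TABLES: 'LIST_TABLES',
  GET_TABLE_TYPES: 'LIST_TABLE_TYPES',
  GET_COLUMNS: 'LIST_COLUMNS',
  GET_FUNCTIONS: 'LIST_FUNCTIONS',
  UNKNOWN: 'TYPE_UNSPECIFIED',
};

// Thrift result set format -> telemetry execution_result name
const RESULT_FORMAT_MAP: Record<string, string> = {
  ARROW_BASED_SET: 'INLINE_ARROW',
  COLUMN_BASED_SET: 'COLUMNAR_INLINE',
  ROW_BASED_SET: 'INLINE_JSON',
  URL_BASED_SET: 'EXTERNAL_LINKS',
};

export function mapOperationTypeToTelemetryType(thriftName: string): string {
  return OPERATION_TYPE_MAP[thriftName] ?? 'TYPE_UNSPECIFIED';
}

export function mapResultFormatToTelemetryType(thriftName: string): string | undefined {
  return RESULT_FORMAT_MAP[thriftName];
}
```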
- http_path: API endpoint path
- socket_timeout: Connection timeout in milliseconds
- enable_arrow: Whether Arrow format is enabled
- enable_direct_results: Whether direct results are enabled
- enable_metric_view_metadata: Whether metric view metadata is enabled
- Only populate fields that are present

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add section 14 detailing implemented and missing proto fields
- List all fields from OssSqlDriverTelemetryLog that are implemented
- Document which fields are not implemented and why
- Explain that missing fields require additional instrumentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Proto Field Coverage

Added proto fields:
Not implemented (not prioritized yet):
Part 7 of 7-part Telemetry Implementation Stack - COMPLETE! 🎉
This is the final layer completing the telemetry system with comprehensive documentation and test summaries.
Summary
Adds comprehensive user documentation, design documents, and test coverage summaries. This PR completes the 7-layer telemetry implementation stack.
Documentation Added
User Documentation
README.md Updates:
docs/TELEMETRY.md - Complete Guide (682 lines):
Technical Documentation
spec/telemetry-design.md:
spec/telemetry-sprint-plan.md:
spec/telemetry-test-completion-summary.md:
Test Coverage Summary
Overall Coverage:
Coverage by Component:
Critical Test Verification ✅
All CRITICAL requirements verified with tests:
Key Documentation Highlights
Privacy-First Design
Configuration Examples
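The full examples live in docs/TELEMETRY.md; a rough sketch using only the three option names that appear in this PR's commits (their placement in connect() and the values shown are assumptions):

```typescript
import { DBSQLClient } from '@databricks/sql';

const client = new DBSQLClient();
await client.connect({
  host: '********.databricks.com',
  path: '/sql/1.0/warehouses/****',
  token: 'dapi********',
  // Telemetry options named in this PR's commits; the full set of
  // seven options is documented in docs/TELEMETRY.md:
  telemetryEnabled: true,              // default true, gated by the server-side flag
  telemetryBatchSize: 100,             // example value: metrics per export batch
  telemetryAuthenticatedExport: true,  // POST to the authenticated endpoint
});
```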
Event Types
- connection.open - Connection established
- statement.start - Statement execution begins
- statement.complete - Statement finishes
- cloudfetch.chunk - CloudFetch chunk downloaded

Stack Complete! 🎉
This PR completes the 7-layer implementation stack:
Review Strategy
Recommended review order:
Each PR builds on the previous layer, creating a clean dependency stack.
Dependencies
Depends on all previous layers: #324, #325, #326, #327, #328, #329