
Conversation

Contributor

@m-abulazm commented Oct 28, 2025

Changes

What does this PR do?

  • Support Databricks credentials
  • Standardize usage of the credentials manager

Linked issues

Progresses #1008

Functionality

  • added relevant user documentation
  • modified existing command: databricks labs lakebridge ...

Tests

  • manually tested
  • added unit tests
  • added integration tests

@codecov

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 69.64286% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.37%. Comparing base (eebf284) to head (fef02c5).

Files with missing lines Patch % Lines
.../labs/lakebridge/connections/credential_manager.py 85.71% 3 Missing and 1 partial ⚠️
...abs/lakebridge/assessments/configure_assessment.py 33.33% 2 Missing ⚠️
src/databricks/labs/lakebridge/config.py 86.66% 1 Missing and 1 partial ⚠️
...s/assessments/synapse/dedicated_sqlpool_extract.py 0.00% 2 Missing ⚠️
.../assessments/synapse/monitoring_metrics_extract.py 0.00% 2 Missing ⚠️
.../assessments/synapse/serverless_sqlpool_extract.py 0.00% 2 Missing ⚠️
...resources/assessments/synapse/workspace_extract.py 0.00% 2 Missing ⚠️
...databricks/labs/lakebridge/assessments/profiler.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2123      +/-   ##
==========================================
+ Coverage   65.24%   65.37%   +0.12%     
==========================================
  Files         100      100              
  Lines        8506     8535      +29     
  Branches      876      878       +2     
==========================================
+ Hits         5550     5580      +30     
+ Misses       2769     2767       -2     
- Partials      187      188       +1     


@github-actions

github-actions bot commented Oct 28, 2025

✅ 51/51 passed, 11 flaky, 3m53s total

Flaky tests:

  • 🤪 test_validate_invalid_source_tech (183ms)
  • 🤪 test_validate_table_not_found (1ms)
  • 🤪 test_validate_non_empty_tables (7ms)
  • 🤪 test_validate_mixed_checks (293ms)
  • 🤪 test_validate_invalid_schema_path (1ms)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (20.681s)
  • 🤪 test_transpiles_informatica_to_sparksql (22.239s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (22.59s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (25.232s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (3.838s)
  • 🤪 test_transpile_teradata_sql (6.324s)

Running from acceptance #3107

@m-abulazm m-abulazm marked this pull request as ready for review November 10, 2025 14:31
@m-abulazm m-abulazm requested a review from a team as a code owner November 10, 2025 14:31
Contributor

@asnare left a comment

I've highlighted some style and design issues that I think need to be resolved, but appreciate that this is the start of what is needed for #1008. On the testing side I really like that we've eliminated some monkey-patching during tests. (Some integration tests would be nice.)

One big concern I have is that I don't see where we're using the new provider because we don't pass the WorkspaceClient in anywhere that I can see. Can you elaborate a bit on the situation there?

Comment on lines +69 to +74
raise UnicodeDecodeError(
"utf-8",
key_only.encode(),
0,
1,
f"Secret {key} has Base64 bytes that cannot be decoded to utf-8 string: {e}.",
Contributor

This should probably be ValueError (from e): we're signalling that there's a problem with a user-supplied argument (due to an underlying unicode issue).

Contributor Author

I don't think this is on the user: Databricks returned a malformed response rather than the UTF-8 Base64 value it should return.

@dataclass
class ReconcileCredentialConfig:
vault_type: str # supports local, env, databricks creds.
source_creds: dict[str, str]
Contributor

I think we need a better name for this: it doesn't hold the credentials… it's more of a vault configuration?
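One possible shape for such a rename, using a hypothetical name and the two fields shown above:

```python
from dataclasses import dataclass


@dataclass
class ReconcileVaultConfig:
    # Hypothetical rename: this describes where credentials live
    # ("local", "env", or "databricks"), not the credentials themselves.
    vault_type: str
    source_creds: dict[str, str]
```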

Contributor Author

Agreed, but renaming this would require changes across three PRs. I would rather rename it after all the PRs have been reviewed.

'local': LocalSecretProvider(),
'env': EnvSecretProvider(env_getter),
'databricks': DatabricksSecretProvider(),
def create_credential_manager(creds_or_path: dict | Path, ws: WorkspaceClient | None = None) -> CredentialManager:
Contributor

Although I can see some tests, I can't see where the ws argument is provided in any of the non-test call-sites. Aside from the tests, is it actually being used?

Contributor Author

Using the credential manager in reconcile, and supplying ws, is done in #2159.
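For illustration, a self-contained sketch of the registry pattern under discussion, where only the Databricks-backed provider actually needs the workspace client. All names here are simplified stand-ins, not the PR's code:

```python
class LocalSecretProvider:
    """Local mode: the configured value is the secret itself."""

    def get_secret(self, key: str) -> str:
        return key


class EnvSecretProvider:
    """Env mode: resolve the key through the supplied getter."""

    def __init__(self, env_getter):
        self._env = env_getter

    def get_secret(self, key: str) -> str:
        return self._env(key)


class DatabricksSecretProvider:
    """Databricks mode: requires a workspace client to resolve secrets."""

    def __init__(self, ws):
        if ws is None:
            raise ValueError("Databricks secrets require a WorkspaceClient")
        self._ws = ws

    def get_secret(self, key: str) -> str:
        return self._ws.lookup(key)  # stand-in for the real secrets API


def make_provider(vault_type: str, env_getter=lambda k: k, ws=None):
    # Only construct the Databricks provider when requested, so callers
    # without a workspace client can still use the local/env vaults.
    if vault_type == 'databricks':
        return DatabricksSecretProvider(ws)
    providers = {
        'local': LocalSecretProvider(),
        'env': EnvSecretProvider(env_getter),
    }
    try:
        return providers[vault_type]
    except KeyError as e:
        raise ValueError(f"Unsupported vault type: {vault_type}") from e
```

Constructing the Databricks provider lazily is one way to keep `ws` optional without every call-site having to pass a client it does not have.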

spark=spark,
ws=ws_client,
secret_scope=reconcile_config.secret_scope,
secret_scope=reconcile_config.creds.source_creds["__secret_scope"],
Contributor

Should this now be using the CredentialManager mechanism?

Contributor Author

Yes, and it is implemented in #2159 to keep the reviews more manageable; #2157 adds the prompts to configure the creds.

This PR can go to main first, since it is backwards-compatible without the other two.

Comment on lines 66 to 67
except NotFound as e:
raise KeyError(f'Secret does not exist with scope: {scope} and key: {key_only} : {e}') from e
Contributor

I think this is different to the other providers: they just return the key if the secret cannot be found, whereas here we raise an exception instead.

What do you think the providers should do? I think they need to be consistent.

Contributor Author

We should not raise an error: the return type should be optional, and it is up to the caller how to handle missing secrets.

I did not want to change too much in one go, so the implementation of DatabricksSecretProvider you see here is copied from src/databricks/labs/lakebridge/reconcile/connectors/secrets.py without changing how it works, which led to some inconsistency.

I would address your comment in a later PR, if you don't mind.

Co-authored-by: Andrew Snare <asnare@users.noreply.github.com>

Labels

  • internal: technical PRs, not end-user facing
  • tech debt: design flaws and other cascading effects
