Collibra Data Catalog Plugin

Designed by Agile Lab, witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. It enables businesses to discover, enhance, and productize their data, fostering the creation of automated data platforms that adhere to the highest standards of data governance. Want to know more about witboost? Check it out here or contact us!

This repository is part of our Starter Kit meant to showcase witboost's integration capabilities and provide a "batteries-included" product.

Overview

This project implements a Witboost Data Catalog Plugin for Collibra using Python & FastAPI.

What's a Data Catalog Plugin?

A Data Catalog Plugin is an extension point for Witboost that allows publishing entities to an external, pluggable Data Catalog. It is invoked at the end of the provisioning flow and receives the complete information about the entity: its descriptor, the provisioning info, and so on.

You can learn more about how Data Catalog plugins fit in the broader picture here.

Software stack

This microservice is written in Python 3.11, using FastAPI for the HTTP layer. The project is built with Poetry and can be packaged as a wheel or as a Docker image; the Docker image is suited to Kubernetes deployments, which are the preferred option.

Git hooks

Hooks are programs you can place in a hooks directory to trigger actions at certain points in git’s execution. Hooks that don’t have the executable bit set are ignored.

The hooks are all stored in the hooks subdirectory of the Git directory. In most projects, that’s .git/hooks.

Of the many hooks supported by Git, we use the pre-commit hook to check code changes before each commit. If the hook returns a non-zero exit status, the commit is aborted.
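The exit-status contract described above can be illustrated with a minimal sketch of the kind of check a hook might perform. This is not the hook used by this repository: the names `find_trailing_whitespace` and `run_check` are hypothetical, and the staged files are passed in as an in-memory dictionary rather than read from Git, purely to keep the example self-contained.

```python
import sys


def find_trailing_whitespace(text: str) -> list[int]:
    """Return the 1-based numbers of the lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def run_check(staged_files: dict[str, str]) -> int:
    """Return 0 if every file is clean, 1 otherwise.

    Git aborts the commit when a pre-commit hook exits with a non-zero status.
    """
    exit_status = 0
    for path, content in staged_files.items():
        for number in find_trailing_whitespace(content):
            print(f"{path}:{number}: trailing whitespace", file=sys.stderr)
            exit_status = 1
    return exit_status


# Hypothetical staged content; a real hook would obtain it from Git.
example = {"src/main.py": "import os  \nprint(os.getcwd())\n"}
status = run_check(example)
print(f"hook exit status: {status}")
```

An actual hook script would end by passing the status to `sys.exit` so that Git sees the result.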

Setup Pre-commit hooks

To use the pre-commit hook, you can rely on the pre-commit framework, which sets up and manages multi-language pre-commit hooks.

To set up pre-commit hooks, follow the steps below:

Install the pre-commit framework using either pip or Homebrew (if your operating system is macOS):
- Using pip:
```
pip install pre-commit
```
- Using homebrew:
```
brew install pre-commit
```
Once pre-commit is installed, you can execute the following:

pre-commit --version

If you see something like pre-commit 3.3.3, your installation is ready to use!

To use pre-commit, create a file named .pre-commit-config.yaml inside the project directory. This file tells pre-commit which hooks need to be installed. Below is an example configuration:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace

The above configuration says to download the pre-commit-hooks project and run its trailing-whitespace hook on the project.

Run the following command to install pre-commit into your Git hooks. pre-commit will then run on every commit.

pre-commit install

Building

Requirements:

- Python ~3.11 (this is a strict requirement as of now, due to uvloop 0.17.0)
- Poetry
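As a quick illustration of what the `~3.11` constraint means, the sketch below checks an interpreter version against it. In Poetry's constraint syntax, `~3.11` allows any 3.11.x release but not 3.12. The function name `satisfies_python_requirement` is hypothetical and not part of this project.

```python
import sys


def satisfies_python_requirement(version: tuple[int, ...]) -> bool:
    """Check a version tuple against the tilde constraint ~3.11 (>=3.11, <3.12)."""
    return tuple(version[:2]) == (3, 11)


# Report whether the interpreter running this snippet fits the requirement.
current = tuple(sys.version_info[:3])
label = ".".join(str(part) for part in current)
if satisfies_python_requirement(current):
    print(f"Python {label} satisfies ~3.11")
else:
    print(f"Python {label} does not satisfy ~3.11")
```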

If you don't have any Python 3 installation, you can install Python 3.11 directly by executing the following commands (shown for apt-based systems such as Ubuntu):

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
python3.11 --version
which python3.11

If you do have Python 3 installed, but not the version 3.11 required by this project, we recommend using pyenv to manage Python environments with different versions, as it is flexible and fully compatible with Poetry. You can install pyenv following the guide here. Then, to install Python 3.11 and configure the project to use it, simply execute:

pyenv install 3.11
pyenv local 3.11

Installing:

To set up a Python environment we use Poetry:

curl -sSL https://install.python-poetry.org | python3 -

Once Poetry is installed and in your $PATH, you can execute the following:

poetry --version

If you see something like Poetry (version x.x.x), your installation is ready to use!

Install the dependencies defined in pyproject.toml:

poetry env use 3.11
poetry install

Note: All the following commands are to be run in the Poetry project directory with the virtualenv enabled. If you use pyenv to manage multiple Python runtimes, make sure Poetry is using the right version. You can tell Poetry to use the Python version available in the current shell; see the Poetry docs page here.

Type check: handled by mypy:

poetry run mypy src/

Tests: handled by pytest:

poetry run pytest --cov=src/ tests/ --cov-report=xml

Artifacts & Docker image: the project leverages Poetry for packaging. Build the package with:

poetry build

The Docker image can be built with:

docker build .

More details can be found here.

Note: the version of the project is computed automatically from Git information, namely the branch name and tags. Unless you are on a release branch such as 1.2.x or on a tag such as v1.2.3, the version will end up being 0.0.0. You can follow this branch/tag convention or update the version computation to match your preferred strategy.
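The convention in the note above can be sketched roughly as follows. This is an illustration only, not the project's actual version computation: the function `compute_version` and both patterns are hypothetical, and the value produced for a release branch (`1.2.x` mapped to `1.2.0`) is an assumption, since the note only states when the fallback `0.0.0` applies.

```python
import re
from typing import Optional

TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")   # matches tags like v1.2.3
BRANCH_PATTERN = re.compile(r"(\d+)\.(\d+)\.x")     # matches branches like 1.2.x
FALLBACK_VERSION = "0.0.0"


def compute_version(branch: Optional[str], tag: Optional[str]) -> str:
    """Derive a version string from Git metadata (hypothetical helper)."""
    if tag:
        tag_match = TAG_PATTERN.fullmatch(tag)
        if tag_match:
            # A release tag carries the full version: v1.2.3 -> 1.2.3
            return ".".join(tag_match.groups())
    if branch:
        branch_match = BRANCH_PATTERN.fullmatch(branch)
        if branch_match:
            major, minor = branch_match.groups()
            # ASSUMPTION: a release branch maps to the first patch of its series.
            return f"{major}.{minor}.0"
    # Any other branch or tag falls back to 0.0.0, as described in the note.
    return FALLBACK_VERSION


for branch, tag in [("main", None), ("1.2.x", None), (None, "v1.2.3")]:
    print(branch, tag, compute_version(branch, tag))
```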

CI/CD: the pipeline is based on GitLab CI, as that's what we use internally. It's configured by the .gitlab-ci.yml file in the root of the repository. You can use that as a starting point for your customizations.

Configuring

Configuration is handled via Pydantic and is read from the application.yaml file located in the root folder of the repository. It allows setting custom domain and asset types for the provisioned descriptors, as well as the set of relation types and attributes related to each asset. Check Configuration for more information.

Running

To run the server locally, use:

source $(poetry env info --path)/bin/activate # only needed if venv is not already enabled
uvicorn src.main:app --host 127.0.0.1 --port 8091

By default, the server binds to port 8091 on localhost. After it's up and running you can make provisioning requests to this address. You can also check the API documentation served here.
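Before sending provisioning requests, it can be handy to confirm that something is listening on the default address. The following is a minimal sketch using only the standard library; the helper name `is_listening` is hypothetical, and the check only confirms that a TCP connection can be opened, not that the service behind it is this plugin.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Default address from the section above.
print(is_listening("127.0.0.1", 8091))
```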

Deploying

This microservice is meant to be deployed to a Kubernetes cluster with the included Helm chart and the scripts that can be found in the helm subdirectory. You can find more details here.

License

This project is available under the Apache License, Version 2.0; see LICENSE for full details.

About Witboost

Witboost is a cutting-edge Data Experience platform that streamlines complex data projects across various platforms, enabling seamless data production and consumption. This unified approach empowers you to fully utilize your data without platform-specific hurdles, fostering smoother collaboration across teams.

It seamlessly blends business-relevant information, data governance processes, and IT delivery, ensuring technically sound data projects aligned with strategic objectives. Witboost facilitates data-driven decision-making while maintaining data security, ethics, and regulatory compliance.

Moreover, Witboost maximizes data potential through automation, freeing resources for strategic initiatives. Apply your data for growth, innovation and competitive advantage.
