feat: Multi instance metrics #31

Open
CasperTeirlinck wants to merge 14 commits into main from feat/metric-multi-instance

Conversation

Contributor

@CasperTeirlinck commented Apr 24, 2026

This PR adds the ability to configure multiple instances for the same Metric class.

Context

#29 made it possible to configure metrics using the constructor, which is more intuitive than subclassing.
However, it is not yet possible to have more than one instance of the same metric. This is a common use case, for example for the GitTrackedFileCountMetric:

Before, you would define multiple metric classes that track certain files like this:

class CruftFileExistsMetric(GitTrackedFileCountMetric):
    name: ClassVar[str] = "cruft_file_exists"
    description: ClassVar[str] = "Whether .cruft.json exists"
    pattern: str = ".cruft.json"

class SomeConfigExistsMetric(GitTrackedFileCountMetric):
    name: ClassVar[str] = "config_exists"
    description: ClassVar[str] = "Whether config.yml exists"
    pattern: str = "config.yml"

class DagCountMetric(GitTrackedFileCountMetric):
    name: ClassVar[str] = "airflow_dag_count"
    description: ClassVar[str] = "Number of DAG files"
    pattern: str = "dags/*.py"

Because of #29, this is now possible by simply creating multiple instances of the GitTrackedFileCountMetric:

GitTrackedFileCountMetric(
    name="cruft_file_exists",
    description="Whether .cruft.json exists",
    pattern=".cruft.json",
)
GitTrackedFileCountMetric(
    name="config_exists",
    description="Whether config.yml exists",
    pattern="config.yml",
)
GitTrackedFileCountMetric(
    name="airflow_dag_count",
    description="Number of DAG files",
    pattern="dags/*.py",
)

except that the metric calculation still assumed one instance per metric class, so it would keep only a single Measurement out of the three metrics configured above.

Summary

  • Instance-based metric configuration: Changed name, description, and unit from ClassVar to instance fields on the Metric class. This allows creating multiple instances of the same metric class with different configurations.
  • Multiple metric instances support: Changed the measurements signature from dict[type[Metric], Measurement] to dict[type[Metric], list[Measurement]] to support multiple instances of the same metric class. It is the responsibility of a metric's calculate method to handle multiple Measurements of a dependent metric. Metric dependencies are still defined using Metric classes.
  • Refactored the executor module because it had grown too large (~540 lines).
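A minimal sketch of the instance-based shape described above. The class names `Metric` and `Measurement` and the instance fields follow the PR description, but the field details and the `record` helper here are assumptions for illustration, not the actual checkup code:

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical shape; the real fields live in the checkup codebase.
    metric_name: str
    value: float


@dataclass
class Metric:
    # name, description, and unit are now instance fields rather than
    # ClassVar, so several differently-configured instances of the same
    # metric class can coexist.
    name: str
    description: str
    unit: str = ""


# Measurements keyed by metric class, each mapping to a *list* of
# Measurements, as in the new dict[type[Metric], list[Measurement]] signature.
measurements: dict[type, list[Measurement]] = {}


def record(metric: Metric, value: float) -> None:
    # Append instead of overwrite, so multiple instances of the same
    # class no longer clobber each other's results.
    measurements.setdefault(type(metric), []).append(
        Measurement(metric_name=metric.name, value=value)
    )


record(Metric(name="cruft_file_exists", description="Whether .cruft.json exists"), 1.0)
record(Metric(name="config_exists", description="Whether config.yml exists"), 0.0)
```

Both measurements end up under the same `Metric` class key, which is exactly the case the old one-measurement-per-class assumption could not represent.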

@CasperTeirlinck marked this pull request as ready for review April 24, 2026 13:52
Member

@jvanbuel left a comment


To be quite honest, I don't find the list of measurements that elegant (especially that you now always need to index with zero to get the default behaviour of one measurement per metric), but I do understand the reasoning. The requirement that downstream metrics need to decide what to do with multiple measurements also makes sense. The utility function to assert that there is only one measurement seems like a good way to make dealing with this easier.

Maybe one idea: what if, instead of a list of measurements, the signature of measure used an iterator of measurements? Then the default behaviour becomes calling next(). I don't know why, but to me that feels a bit better. But that's a feeling, which is not really a sound basis for making decisions.
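For illustration, the two access patterns being compared (list indexing versus pulling from an iterator), plus the single-measurement utility mentioned above. `single_measurement` is a hypothetical name; the actual utility in the PR may differ:

```python
def single_measurement(measurements: list):
    # Hypothetical helper, assuming the utility asserts the common case
    # of exactly one measurement per metric class and then unwraps it.
    if len(measurements) != 1:
        raise ValueError(f"expected exactly one measurement, got {len(measurements)}")
    return measurements[0]


ms = ["m1"]

# With a list signature, the one-measurement default means indexing with zero:
value_from_list = ms[0]

# With an iterator signature, the default becomes calling next():
value_from_iter = next(iter(ms))
```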

logger.debug("Executing provider: %s", provider.name)
data = provider.provide()
context[provider.name] = data
if provider.is_tag_provider():
Member


The tag provider is treated as a special provider, but I'm not sure that's absolutely necessary. What prevents the tag provider from just enriching the context, and the metrics from fetching data from the context to populate their tags?

On the other hand, we do have a tag attribute for each metric, so tags are kind of special. Not blocking, just wanted to hear your thoughts on this.

Contributor Author

@CasperTeirlinck commented Apr 29, 2026


Yeah, I agree. I see no real reason to treat it as special when executing the providers; using the context is much more consistent and simplifies the code a bit too. In the MetricCalculator we can then just access the tags from the context and add them to the tags attribute of the measurements.
I refactored it here: #33
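As a sketch of the non-special-cased approach discussed above: every provider, the tag provider included, simply writes into the shared context, and tags are read back out on the metric side. The provider names and the `run_providers` function are hypothetical, not the refactored checkup code:

```python
from typing import Callable


def run_providers(providers: dict[str, Callable[[], dict]]) -> dict:
    # No special-casing: every provider, the tag provider included,
    # simply enriches the shared context under its own name.
    context: dict = {}
    for name, provide in providers.items():
        context[name] = provide()
    return context


providers = {
    "git": lambda: {"tracked_files": ["dags/a.py", "dags/b.py"]},
    "tags": lambda: {"team": "data-platform"},
}
context = run_providers(providers)

# A MetricCalculator could then read the tags back out of the context
# and attach them to each measurement's tags attribute.
measurement_tags = context.get("tags", {})
```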

Comment thread: src/checkup/validators.py (Outdated)
if len(instances) > 1
}
if duplicates:
# Report the first duplicate found
Member


Why only report the first duplicate?

Contributor Author


No reason, I think. I changed it so it reports all duplicates.
d22049a

Comment thread: src/checkup/validators.py (Outdated)
name, classes = next(iter(duplicates.items()))
raise DuplicateMetricNameError(name, classes)
name, instances = next(iter(duplicates.items()))
raise DuplicateMetricNameError(name, [type(m) for m in instances])
Member


Doesn't the iterator return a list of the same types? Or am I misinterpreting this?

Contributor Author


It gives a list of the Metric classes that share the same name, so they could be different types if you gave the same name to two different metrics, I think.
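A sketch of the duplicate-name check being discussed, matching the diff fragments above (group instances by name, keep every name claimed more than once). The `find_duplicate_names` function and the toy metric classes are assumptions for illustration, not the actual validator:

```python
from collections import defaultdict


class Metric:
    def __init__(self, name: str):
        self.name = name


class GitTrackedFileCountMetric(Metric):
    pass


class OtherMetric(Metric):
    pass


def find_duplicate_names(metrics: list) -> dict:
    # Group metric instances by their configured name.
    by_name: dict = defaultdict(list)
    for metric in metrics:
        by_name[metric.name].append(metric)
    # A name is duplicated when more than one instance claims it. The
    # classes in the reported list can differ, since two different
    # Metric subclasses may be given the same name.
    return {
        name: [type(m) for m in instances]
        for name, instances in by_name.items()
        if len(instances) > 1
    }


duplicates = find_duplicate_names([
    GitTrackedFileCountMetric("dag_count"),
    OtherMetric("dag_count"),
    GitTrackedFileCountMetric("config_exists"),
])
```

This is why the duplicates mapping reports classes rather than a single type: the clash can span different subclasses.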

@CasperTeirlinck
Contributor Author

To be quite honest, I don't find the list of measurements that elegant (especially that you now always need to index with zero to get the default behaviour of one measurement per metric), but I do understand the reasoning. The requirement that downstream metrics need to decide what to do with multiple measurements also makes sense. The utility function to assert that there is only one measurement seems like a good way to make dealing with this easier.

Maybe one idea: what if, instead of a list of measurements, the signature of measure used an iterator of measurements? Then the default behaviour becomes calling next(). I don't know why, but to me that feels a bit better. But that's a feeling, which is not really a sound basis for making decisions.

@jvanbuel I agree, it does not feel super intuitive and the list is a bit ugly, especially the long type hint. I propose an alternative using a Measurements class to hopefully provide a cleaner interface for the measurements here: #34
Internally it still uses a list because we need to append, but it is type-hinted as a Sequence to the outside.
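The list-inside, Sequence-outside idea could look roughly like this. This is a sketch of the approach, not the actual `Measurements` class from #34; the method names (`_append`, `single`) are assumptions:

```python
from collections.abc import Sequence


class Measurements(Sequence):
    """Hypothetical wrapper: backed by a list internally (so the
    executor can append while collecting), but exposed to callers as a
    read-only Sequence."""

    def __init__(self) -> None:
        self._items: list = []

    def _append(self, item) -> None:
        # Internal: used while collecting measurements.
        self._items.append(item)

    # Implementing __getitem__ and __len__ is enough for the Sequence
    # ABC to supply iteration, containment checks, etc.
    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    def single(self):
        # Convenience for the common one-measurement-per-metric case,
        # replacing the awkward [0] indexing.
        if len(self) != 1:
            raise ValueError(f"expected exactly one measurement, got {len(self)}")
        return self[0]


ms = Measurements()
ms._append("m1")
```

Subclassing `collections.abc.Sequence` keeps the read path list-like (`ms[0]`, `for m in ms`, `len(ms)`) without exposing mutation in the public type hint.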

