
Conversation

@ecomodeller
Member

@ecomodeller ecomodeller commented Nov 11, 2025

(image attachment)

@ryan-kipawa
Collaborator

The CI test for the latest Python version fails, and will likely continue to fail in the future, due to mikeio1d's Pythonnet dependency:

DHI/mikeio1d#187

@ryan-kipawa
Collaborator

Current problems with integrating MIKE IO 1D and ModelSkill

A user of MIKE 1D would most often have observational data from a monitor at a specific location (e.g. a flow meter in a pipe, a gauge in a water distribution network, or some operational gauges at a WWTP). Track observations are not relevant (to my knowledge).

ModelSkill has two available observation types: PointObservation and TrackObservation. It also has various model types: PointModelResult, TrackModelResult, DfsuModelResult, and GridModelResult.

A PointObservation is not sufficient to uniquely identify a location in a MIKE 1D result file, because:

  1. A manhole can have several inflow / outflow pipes that are represented as a single coordinate in a network result.
  2. There can be pipes that cross at the same point in space, but at different elevations (e.g. storm sewers and sanitary sewers).
  3. Precise coordinates of flow meters are often not known. It's more common to reference the location by a manhole and the specific pipe it's measuring. Approximate locations are known for plotting, but it is often not possible to distinguish between multiple closely located objects (e.g. within 1-2m of each other).
  4. It is not necessary for a network to have coordinates for computation, even though they often do.
  5. Model result grid points may not align with the observation point. It is not trivial to make these comparable in general (e.g. linearly interpolating water levels could be incorrect due to step changes in bottom elevation).
  6. Simplified models may use hydrologic results only, which requires comparing a catchment with a point.

The workflow today is to manually load observations and model results into PointObservation and PointModelResult objects, and then manually match them. This is awkward and annoying to use: exporting network results to dfs0 first is an unnecessary step.
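For illustration, a minimal sketch of that manual workflow (file paths, item names, and the reach/chainage values are made up; QueryDataReach is existing mikeio1d API and avoids the dfs0 round trip, but the per-sensor extraction and matching stay manual):

import modelskill as ms
from mikeio1d.res1d import Res1D, QueryDataReach

# 1. Manually extract the model time series at the sensor location
res = Res1D("data/network_v1.res1d")
df = res.read(QueryDataReach("Discharge", "12l1", 28.410))  # single-column DataFrame

# 2. Wrap observation and extracted model data as plain point objects
obs = ms.PointObservation("data/flow_meter.dfs0", item=0, name="Flow Meter A")
mod = ms.PointModelResult(df, name="network_v1")

# 3. Match them, one sensor at a time
cmp = ms.match(obs, mod)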

Therefore, there's a need for a new Observation and ModelResult type for networks.

Alternative designs

Design 1 - network location as kwargs, specific to MIKE 1D

obs_12l1 = ms.NetworkLocationObservation(
    data="data/flow_meter.dfs0",
    item=0,
    reach="12l1",       # alternatively catchment=, or node= ...
    chainage=28.410,    # alternatively, x,y coords to choose the nearest gridpoint, respecting graph connectivity
    name="Flow Meter A",
)

mod_12l1_v1 = ms.NetworkModelResult(
    data="data/network_v1.res1d",
)

mod_12l1_v2 = ms.NetworkModelResult(
    data="data/network_v2.res1d",
)


cc = ms.ComparerCollection([
    ms.match(obs_12l1, mod_12l1_v1),
    ms.match(obs_12l1, mod_12l1_v2),
])

Design 2 - generic location interface, implemented by each library

obs_12l1 = ms.NetworkLocationObservation(
    data="data/flow_meter.dfs0",
    item=0,
    # res is a mikeio1d Res1D object; the complexity of locating network objects is left to mikeio1d
    network_location=res.reaches["my_reach"][0],
    name="Flow Meter A",
)

mod_12l1_v1 = ms.NetworkModelResult(
    data="data/network_v1.res1d",
)

mod_12l1_v2 = ms.NetworkModelResult(
    data="data/network_v2.res1d",
)


cc = ms.ComparerCollection([
    ms.match(obs_12l1, mod_12l1_v1),
    ms.match(obs_12l1, mod_12l1_v2),
])

# This requires updates to mikeio1d (and other network libraries) to implement interfaces somewhat like this:

from dataclasses import dataclass
from typing import Protocol

import pandas as pd

@dataclass(frozen=True)
class NodeLocation:
    """Minimal explicit node location."""
    node_id: str
    group: str | None = None

@dataclass(frozen=True)
class ReachLocation:
    """Minimal explicit reach location."""
    reach_id: str
    chainage: float | None = None
    gridpoint_index: int | None = None  # either chainage or gridpoint_index must be provided
    group: str | None = None

NetworkLocation = NodeLocation | ReachLocation

class GetNetworkQuantityResult(Protocol):
    def __call__(self, location: NetworkLocation, quantity: str) -> pd.Series:
        ...
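For concreteness, a hedged sketch of how mikeio1d could satisfy this protocol. The Res1dBackend class is hypothetical; Res1D, QueryDataNode, and QueryDataReach are existing mikeio1d API:

import pandas as pd
from mikeio1d.res1d import Res1D, QueryDataNode, QueryDataReach

class Res1dBackend:
    """Hypothetical adapter exposing a res1d file via NetworkLocation."""

    def __init__(self, path: str) -> None:
        self._res = Res1D(path)

    def __call__(self, location: NetworkLocation, quantity: str) -> pd.Series:
        if isinstance(location, NodeLocation):
            query = QueryDataNode(quantity, location.node_id)
        else:  # ReachLocation; prefer chainage, as in Design 1
            query = QueryDataReach(quantity, location.reach_id, location.chainage)
        df = self._res.read(query)
        return df.iloc[:, 0]  # single-column result as a Series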

There would need to be a plugin mechanism through which ModelSkill discovers packages providing these capabilities. The backend could be selected by file extension, or by an explicit backend parameter where the extension is ambiguous. The discovery should happen behind the scenes.
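One possible mechanism, sketched below, is entry points via importlib.metadata; the group name "modelskill.network_backends" and the extensions attribute are hypothetical:

from importlib.metadata import entry_points

def find_network_backend(extension: str):
    """Return the first registered backend class claiming this file extension."""
    for ep in entry_points(group="modelskill.network_backends"):
        backend_cls = ep.load()
        if extension in getattr(backend_cls, "extensions", ()):
            return backend_cls
    raise ValueError(f"No network backend registered for '{extension}'")

# A package like mikeio1d would then declare in its pyproject.toml:
# [project.entry-points."modelskill.network_backends"]
# res1d = "mikeio1d.modelskill:Res1dBackend"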

Work in progress...

@jpalm3r
Contributor

jpalm3r commented Jan 14, 2026

Observations will typically be passed to modelskill in a tabular format (e.g. csv). Then, users will need to specify the name of the relevant column. Taking this into account: is there an actual need for a network-specific observation class? To me it looks like PointObservation already does the trick.

On the other hand, I do see that some sort of NetworkModelResult is necessary so we can correctly select a timeseries based on network coordinates. This is a workflow I have been working on:

quantity = "Water Level"
models = [
    ms.NetworkModelResult(path_to_res1d, quantity, node=7, name="model 1"),
    ms.NetworkModelResult(path_to_res1d, quantity, reach="100l1", gridpoint="start", name="model 2"),
    ms.NetworkModelResult(path_to_res1d, quantity, reach="54l1", gridpoint=0, name="model 3"),
    ms.NetworkModelResult(path_to_res1d, quantity, reach="54l1", chainage=23.65, name="model 4"),
]

obs1 = ms.PointObservation(observations, quantity=quantity, item="sensor_9")

comparer_1 = ms.match(obs1, models)

Still, I see that this approach is the opposite of @ecomodeller's suggestion here. There, the observations are accessed by reach and chainage, but the model results are accessed directly by name.

I will push my suggestions soon, just let me know if you have any thoughts.

@ecomodeller
Member Author

A model result covers the entire domain, while each named observation is located at a single position. So in order to assess the quality of a model, several observations distributed across the domain are used. Thus, the location of the observation has to be recorded somewhere. NetCDF can carry the location as metadata, while for other formats (csv, dfs0) this information is typically stored elsewhere and has to be added to the observation object in order to extract matching model data from the model result.
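For example, for a csv observation the coordinates are supplied by the user rather than read from the file (file name, column name, and coordinates illustrative; x and y are existing PointObservation parameters):

import pandas as pd
import modelskill as ms

# The csv itself carries no location, so x/y are attached to the observation object
df = pd.read_csv("data/observations.csv", index_col=0, parse_dates=True)
obs = ms.PointObservation(df, item="sensor_9", x=338500.0, y=6164000.0, name="Flow Meter A")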

@jpalm3r
Contributor

jpalm3r commented Jan 15, 2026

In that case we can compare multiple PointObservation objects to a single NetworkModelResult with a ComparerCollection.
My question is more about the benefits of developing a new object, NetworkLocationObservation, when essentially it would be the same as a PointObservation. Also, as far as I know, observations will typically come as a csv, so the user will need to know which column is the relevant one anyway.

@ryan-kipawa
Collaborator

In that case we can compare multiple PointObservation objects to a single NetworkModelResult with a ComparerCollection. My question is more about the benefits of developing a new object, NetworkLocationObservation, when essentially it would be the same as a PointObservation. Also, as far as I know, observations will typically come as a csv, so the user will need to know which column is the relevant one anyway.

In this design, a NetworkModelResult is a subset of the model results at a specific location that has a sensor; the location of the sensor is bundled into the model result object. By definition, the model result then no longer covers the entire network domain, but a very specific subset. So, if you have a single network result and 20 sensors, you would need 20 model objects and 20 corresponding observations, and then match them pairwise. If you instead put the sensor location information on the observation, you would only need 1 model object with 20 corresponding observations, and there is less risk of making mistakes when matching.

The most important aspect that I see is that we need to be consistent with the existing ModelSkill API. Here, the user expects the model result to cover an entire domain and the observation to carry sufficient spatial metadata to extract the right data from the model domain for comparison. For example, when you compare a dfsu model result with a point observation that has an x, y, the spatial information on the sensor is sufficient to unambiguously extract data from the dfsu. Unfortunately, with network data, an x,y coordinate is not enough to uniquely identify a time series.
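For reference, the existing pattern looks roughly like this (paths and coordinates illustrative; DfsuModelResult and PointObservation are existing ModelSkill API):

import modelskill as ms

# Spatial metadata lives on the observation; the model result covers the whole domain
mod = ms.DfsuModelResult("data/model.dfsu", item="Surface elevation")
obs = ms.PointObservation("data/station.dfs0", item=0, x=350000.0, y=6150000.0)
cmp = ms.match(obs, mod)  # model data is extracted at (x, y) automatically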

@jpalm3r
Contributor

jpalm3r commented Jan 15, 2026

My current implementation works like this.

quantity = "Water Level"
mod_item = ms.NetworkModelResult(path_to_res1d, quantity, node=7)
sens = [
    ms.PointObservation(observations, item="sensor_1"),
    ms.PointObservation(observations, item="sensor_2"),
    ms.PointObservation(observations, item="sensor_3"),
    ms.PointObservation(observations, item="sensor_4"),
]

cmp = ms.match(sens, mod_item)
cmp.skill()

In this case, I use one model and I compare it to multiple observations. I agree with being consistent with the ModelSkill API; this is the reason behind this approach.

@ryan-kipawa
Collaborator

I see. I think the common case is that the sensors would be located at different spatial locations in the network. For example, one might have 5-20 sensors that each exist at a unique location in the network. How does the implementation handle that case?

In the example above, I understand that these 4 sensors would all exist at a single location in the network (node 7).

Here's a small example of such a use case. In this network, there are two "flow meters" that measure discharge at two unique locations in the network (shown in the image below). If I understand correctly, the current implementation would need to handle it like this:

quantity = "Discharge"
mod_item1 = ms.NetworkModelResult(path_to_res1d, quantity, reach="pipe1", chainage=0)
mod_item2 = ms.NetworkModelResult(path_to_res1d, quantity, reach="pipe2", chainage=0)
sens = [
    ms.PointObservation(observations, item="sensor_1"),
    ms.PointObservation(observations, item="sensor_2"),
]

cmp1 = ms.match(sens[0], mod_item1)
cmp2 = ms.match(sens[1], mod_item2)

cmp1.skill()
cmp2.skill()
(image: network with the two flow meter locations)

@jpalm3r
Contributor

jpalm3r commented Jan 15, 2026

Oh, of course, that makes sense. Let me update my changes and get back to you.

@ryan-kipawa
Collaborator

In this case, I use one model and I compare it to multiple observations. I agree with being consistent with the ModelSkill API; this is the reason behind this approach.

I was reflecting on this. Maybe neither of these is the right abstraction yet, since neither was obvious. Perhaps there's a need for both a NetworkModelResult AND a NetworkModelPointResult to make the distinction explicit. The parallel in the existing ModelSkill API would be the Dfsu/Grid model results and PointModelResult, where the latter can be extracted from the former. Just thinking out loud for your consideration.
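Sketched in code (everything here is hypothetical API, mirroring how a PointModelResult can be extracted from a DfsuModelResult):

import modelskill as ms

# Full-domain result, analogous to DfsuModelResult
mod = ms.NetworkModelResult("data/network_v1.res1d", quantity="Discharge")

# Hypothetical extraction of a single-location NetworkModelPointResult
mod_at_sensor = mod.extract(reach="pipe1", chainage=0.0)

obs = ms.PointObservation(observations, item="sensor_1")
cmp = ms.match(obs, mod_at_sensor)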
