
Commit aa1deef

fix: handle DST transitions in I90 datetime parsing
Two changes to I90Sheet._preprocess and _normalize_datetime_columns:

1. DST column format: on DST days, I90 files use range headers such as "00-01", "01-02", "02-03a", "02-03b" instead of sequential integers. Added detection for this format (first column starts with "0") and assigned sequential 1-based indices, so that the period count (23/25 for hourly, 92/100 for QH) encodes the DST information.

2. UTC-first datetime construction: replaced tz_localize('Europe/Madrid', ambiguous='infer') with anchoring midnight in Europe/Madrid, converting it to UTC, and then adding the period offsets. This eliminates DST ambiguity entirely: each period maps to a unique UTC instant, whether the day has a repeated hour (fall-back) or a missing hour (spring-forward).

Tested with fall-back (2022-10-30: 25h/100QH), spring-forward (2022-03-27: 23h), and normal days (24h).
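The claim that the period count alone encodes the DST information can be checked directly: measured in absolute time, a Madrid calendar day lasts 23, 24, or 25 hours. The following is a minimal sketch, assuming pandas is available; `expected_periods` is a hypothetical helper, not part of the esios package.

```python
import pandas as pd


def expected_periods(date: str, minutes_per_period: int, tz: str = "Europe/Madrid") -> int:
    """Count the periods in one local calendar day (hypothetical helper).

    Local midnight is never ambiguous or missing in Europe/Madrid, because
    the clock changes happen at 02:00/03:00 local time, so both midnights
    can be localized directly and compared as absolute instants.
    """
    start = pd.Timestamp(date).tz_localize(tz)
    end = (pd.Timestamp(date) + pd.Timedelta(days=1)).tz_localize(tz)
    day_length = end - start  # true elapsed time: 23 h, 24 h or 25 h
    return int(day_length // pd.Timedelta(minutes=minutes_per_period))


for day in ["2022-10-30", "2022-03-27", "2022-06-15"]:
    print(day, expected_periods(day, 60), expected_periods(day, 15))
```

A parser could compare this value with the number of time columns as a sanity check; that check is not part of this commit.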
1 parent b85d4fd commit aa1deef

1 file changed

Lines changed: 32 additions & 11 deletions


src/esios/processing/i90.py

```diff
@@ -166,18 +166,33 @@ def _get_rows(self) -> np.ndarray:
     def _normalize_datetime_columns(self, columns: np.ndarray) -> np.ndarray:
         """Normalize time column headers to integer period indices.
 
-        Handles three column formats found in I90 files:
-        - Sequential integers 1–24 (hourly) or 1–96 (quarterly)
-        - H-Q format with dash notation: "1-1", "1-2", "1-3", "1-4", "2-1", …
-        - NaN-filler format: [1, NaN, NaN, NaN, 2, …] (one label per hour,
-          three trailing NaNs for quarters 2–4)
+        Handles four column formats found in I90 files:
+
+        1. Sequential integers: 1–24 (hourly) or 1–96 (quarterly)
+        2. H-Q format: "1-1", "1-2", "1-3", "1-4", "2-1", …
+        3. NaN-filler format: [1, NaN, NaN, NaN, 2, …]
+        4. Range format (DST days): "00-01", "01-02", "02-03a", "02-03b", …
+           where the first number is the start hour and a/b suffix marks
+           the repeated hour on fall-back days. Detected by the first
+           column starting with "0" (e.g. "00-01").
         """
         if any(pd.isna(columns)):
             self._n_columns_totals = 3
         else:
             self._n_columns_totals = 2
 
         series = pd.Series(columns, dtype=str).ffill()
+
+        # Range format (DST): "00-01", "01-02", "02-03a", "02-03b", ...
+        # Detected by first column starting with "0" (sequential ints start at 1).
+        first_val = str(columns[0]).strip()
+        if first_val.startswith("0") and "-" in first_val:
+            # Simply assign sequential 1-based indices.
+            # The count of columns (23, 24, or 25 for hourly; 92, 96, or 100
+            # for QH) already encodes the DST information. The datetime builder
+            # in _preprocess uses these as offsets from midnight UTC.
+            return np.arange(1, len(columns) + 1)
+
         parts = series.str.split("-")
         hours = parts.str[0].astype(float).astype(int)
 
```
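The range-format branch can be exercised on its own. This is a minimal sketch, assuming numpy. `normalize_range_headers` and `fall_back_headers` are hypothetical names. Only the first four labels ("00-01", "01-02", "02-03a", "02-03b") come from the source; the remaining labels are assumed to continue the same pattern up to "23-24".

```python
import numpy as np


def normalize_range_headers(columns):
    """Standalone sketch of the range-format branch (hypothetical helper).

    Mirrors the check added to _normalize_datetime_columns: when the first
    header starts with "0" and contains "-", each column gets its 1-based
    position. Returns None for any other header format.
    """
    first_val = str(columns[0]).strip()
    if first_val.startswith("0") and "-" in first_val:
        return np.arange(1, len(columns) + 1)
    return None


def fall_back_headers():
    """Hourly range headers for a fall-back day (assumed label pattern)."""
    headers = []
    for hour in range(24):
        label = f"{hour:02d}-{hour + 1:02d}"
        if hour == 2:
            # The repeated 02:00-03:00 hour appears twice, suffixed a and b.
            headers.extend([label + "a", label + "b"])
        else:
            headers.append(label)
    return headers


headers = fall_back_headers()
print(headers[:5], len(headers))
print(normalize_range_headers(headers)[:5])
```

Because the a/b suffixes are never parsed, the branch relies entirely on column order: a fall-back file yields indices 1 to 25. The check also relies on sequential-integer and H-Q headers never starting with "0".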
```diff
@@ -251,12 +266,18 @@ def _preprocess(self) -> pd.DataFrame:
             self.frequency = "hourly"
             time_deltas = columns_date * 60  # minutes
 
-        # Build datetime index
-        base_date = pd.to_datetime(self.metadata["date_data"])
-        columns_datetime = base_date + pd.to_timedelta(time_deltas, unit="m")
-        columns_datetime = pd.DatetimeIndex(columns_datetime).tz_localize(
-            "Europe/Madrid", ambiguous="infer"
-        )
+        # Build datetime index in UTC to avoid DST ambiguity.
+        # On fall-back days (Oct), I90 has 25 hourly periods (or 100 QH).
+        # Naïve offset arithmetic creates a single 02:00 that tz_localize
+        # cannot disambiguate. By anchoring midnight in Europe/Madrid,
+        # converting to UTC, then adding offsets, each period maps to a
+        # unique UTC instant — no ambiguity.
+        # On spring-forward days (Mar), I90 has 23 periods (or 92 QH)
+        # and this approach naturally skips the non-existent hour.
+        midnight_utc = pd.Timestamp(
+            self.metadata["date_data"], tz="Europe/Madrid"
+        ).tz_convert("UTC")
+        columns_datetime = midnight_utc + pd.to_timedelta(time_deltas, unit="m")
 
         data = pd.DataFrame(self.rows[idx + 1 :], columns=columns)
 
```
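The removed and added index constructions can be compared side by side. This is a minimal sketch, assuming pandas and 0-based hourly offsets (period k starts k hours after local midnight). `hourly_offsets`, `old_style`, `new_style`, and `raises` are hypothetical names. The real code derives its offsets from `columns_date * 60`, whose base is not shown in this diff.

```python
import pandas as pd


def hourly_offsets(n_periods: int) -> pd.TimedeltaIndex:
    # Assumption: period k (0-based) starts k hours after local midnight.
    return pd.to_timedelta([60 * k for k in range(n_periods)], unit="m")


def old_style(date: str, n_periods: int) -> pd.DatetimeIndex:
    # Removed approach: add offsets to a naive local date, then localize.
    naive = pd.Timestamp(date) + hourly_offsets(n_periods)
    return pd.DatetimeIndex(naive).tz_localize("Europe/Madrid", ambiguous="infer")


def new_style(date: str, n_periods: int) -> pd.DatetimeIndex:
    # Added approach: anchor local midnight, convert it to UTC, add offsets.
    midnight_utc = pd.Timestamp(date, tz="Europe/Madrid").tz_convert("UTC")
    return pd.DatetimeIndex(midnight_utc + hourly_offsets(n_periods))


def raises(fn, *args) -> bool:
    # The exception class depends on the pandas version, so catch broadly.
    try:
        fn(*args)
    except Exception:
        return True
    return False


for date, n in [("2022-10-30", 25), ("2022-03-27", 23), ("2022-06-15", 24)]:
    local = new_style(date, n).tz_convert("Europe/Madrid")
    print(date, "old raises:", raises(old_style, date, n),
          "local hours:", [t.hour for t in local])
```

On 2022-10-30, `old_style` fails because the naive sequence contains a single 02:00, and `ambiguous="infer"` needs that hour to appear twice. On 2022-03-27, it fails because 02:00 does not exist locally and `tz_localize` defaults to `nonexistent="raise"`. `new_style` returns 25 and 23 distinct instants respectively. Note that the new index is in UTC, whereas the removed code produced Europe/Madrid timestamps; callers that need local wall-clock labels can apply `.tz_convert("Europe/Madrid")`.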