The Parquet serializer has two issues. Both are illustrated by the following API query:
https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes[TettSted]=0801&valuecodes[ContentsCode]=Areal,Bosatte&valuecodes[Tid]=2025,2024&outputFormat=parquet
Resulting Parquet
| år | timestamp | tettsted | ContentsCode_Areal | ContentsCode_Areal_symbol | ContentsCode_Bosatte | ContentsCode_Bosatte_symbol |
| --- | --- | --- | --- | --- | --- | --- |
| 2024 | 2024-01-01T00:00:00.000 | 0801 | 275,87 | | 1110887 | |
| 2024 | 2024-01-01T00:00:00.000 | 0801 | 1110887 | | 276,3 | |
| 2025 | 2025-01-01T00:00:00.000 | 0801 | 276,3 | | 1098061 | |
| 2025 | 2025-01-01T00:00:00.000 | 0801 | 1098061 | | 1098061 | |
- Selecting two or more contents (Areal and Bosatte) creates too many rows in the resulting Parquet file. In this case there should have been two rows.
- Selecting years 2025,2024 is not the same as selecting 2024,2025. In this case the first row actually holds the 2025 figures. The reason is that the Parquet serializer uses TIMEVAL, and the px output below shows that TIMEVAL is the same when the years are swapped. The API does not sort any valuecodes. This is intentional in the new API.
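To make the second point concrete, here is a minimal Python sketch of how the labels can end up on the wrong rows. It is not the PCAxis.Serializers code; `label_rows` is a hypothetical helper, and the sketch assumes the data rows arrive in the requested order (2025 first) while the labels are taken from TIMEVAL.

```python
# Hypothetical illustration only; not the actual serializer implementation.
codes_order = ["2025", "2024"]    # CODES("år") for Tid=2025,2024
timeval_order = ["2024", "2025"]  # TIMEVAL("år") from the same response

# Figures per period in data order (the first data row holds the 2025 figures).
data_rows = [
    {"Areal": 275.87, "Bosatte": 1110887},
    {"Areal": 276.3, "Bosatte": 1098061},
]


def label_rows(labels, rows):
    """Pair each data row with the time label at the same position."""
    return {label: row for label, row in zip(labels, rows)}


by_codes = label_rows(codes_order, data_rows)
by_timeval = label_rows(timeval_order, data_rows)

print(by_codes["2025"]["Bosatte"])    # 1110887
print(by_timeval["2025"]["Bosatte"])  # 1098061
```

Labelling by CODES keeps the 2025 figures under 2025, while labelling by TIMEVAL swaps the two years.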
$ curl "https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes%5bTettSted%5d=0801&valuecodes%5bContentsCode%5d=Areal,Bosatte&valuecodes%5bTid%5d=2025,2024&outputFormat=px" -s -i | grep -E '(TIMEVAL|CODES|VALUES)'
VALUES("tettsted")="Oslo";
VALUES("statistikkvariabel")="Areal av tettsted (km?)","Bosatte";
VALUES("år")="2025","2024";
TIMEVAL("år")=TLIST(A1),"2024","2025";
CODES("tettsted")="0801";
CODES("statistikkvariabel")="Areal","Bosatte";
CODES("år")="2025","2024";
$ curl "https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes%5bTettSted%5d=0801&valuecodes%5bContentsCode%5d=Areal,Bosatte&valuecodes%5bTid%5d=2024,2025&outputFormat=px" -s -i | grep -E '(TIMEVAL|CODES|VALUES)'
VALUES("tettsted")="Oslo";
VALUES("statistikkvariabel")="Areal av tettsted (km?)","Bosatte";
VALUES("år")="2024","2025";
TIMEVAL("år")=TLIST(A1),"2024","2025";
CODES("tettsted")="0801";
CODES("statistikkvariabel")="Areal","Bosatte";
CODES("år")="2024","2025";
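The same comparison can be scripted. The following Python sketch extracts the quoted values from the CODES and TIMEVAL lines of the first response and checks whether they are in the same order. The helper names are hypothetical, and the parsing only handles single-line keywords like the ones shown here, not the full PX format.

```python
import re

# Lines copied from the first curl output (Tid=2025,2024).
PX_LINES = '''VALUES("år")="2025","2024";
TIMEVAL("år")=TLIST(A1),"2024","2025";
CODES("år")="2025","2024";'''


def quoted_values(line):
    """Return the quoted strings on the right-hand side of a PX keyword line."""
    _, _, rhs = line.partition("=")
    return re.findall(r'"([^"]*)"', rhs)


def keyword_values(text, keyword, variable):
    """Find KEYWORD("variable")=... in text and return its quoted values."""
    prefix = f'{keyword}("{variable}")='
    for line in text.splitlines():
        if line.startswith(prefix):
            return quoted_values(line)
    return None


codes = keyword_values(PX_LINES, "CODES", "år")
timeval = keyword_values(PX_LINES, "TIMEVAL", "år")
print(codes, timeval, codes == timeval)
# ['2025', '2024'] ['2024', '2025'] False
```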
The first issue, too many rows, is a clear bug. I have changed the tests and will try to fix the bug in PxTools/PCAxis.Serializers#181
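For reference, the expected shape is one row per combination of år and tettsted, with one column per selected content. A minimal Python sketch of that reshaping follows. It assumes the data cells arrive as a flat list with the contents varying fastest, which is an assumption about the layout and not taken from the serializer source; `rows_from_flat` is a hypothetical helper.

```python
def rows_from_flat(values, n_contents):
    """Split a flat list of cells into rows of n_contents cells each."""
    return [values[i:i + n_contents] for i in range(0, len(values), n_contents)]


# The four distinct cells from the Parquet output above, in their order of appearance.
flat = [275.87, 1110887, 276.3, 1098061]

rows = rows_from_flat(flat, n_contents=2)
print(len(rows))  # 2
print(rows)       # [[275.87, 1110887], [276.3, 1098061]]
```

Stepping through the cells by the number of contents gives the two expected rows. The first three rows of the table above resemble what a step of 1 would give, but that is only an observation about the output, not a confirmed diagnosis.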
For the second issue, it is not clear whether the bug is in the Parquet serializer or in PxWebApi for not sorting time in ascending order.
Is this a valid PX file according to the TIMEVAL documentation?