Csv options #834

ldecicco-USGS · 2025-12-19T22:02:14Z

Added a note to the function docs that properties = NA will return all columns.
Check properties against the data in sysdata.rda. If it seems a property doesn't exist, go check dynamically.
Make sure the query errors on Bad Request.
Make sure empty returns come back with the attributes (they were getting dropped).
Clean up some of the internal functions that were overly complex (due to not being refactored after other parts of the code were refactored).
Moved some functions into their own files to find them more easily.
Added a no_paging option for quicker, but sketchier downloads (not recommended unless you promise not to get mad at us if it doesn't come back with all the data).

ldecicco-USGS · 2025-12-19T22:25:47Z

I don't know why GH Actions are having so many problems with setup-r-dependencies. Sometimes that just seems to happen, and it gets fixed later. For now, you can verify that things are working on code.usgs.gov (the latest pipeline just passed the package checks).

ehinman

Still need to do some testing, and I've included a few thoughts/tiny edits so far.

After running these tests:

test <- read_waterdata_daily(monitoring_location_id = "USGS-11348500", properties = "dogs")
test <- read_waterdata_daily(monitoring_location_id = "USGS-11348500", properties = c("value", "time", "dogs"))

I get the nice error message:

Error in (function (service, properties = NA_character_, bbox = NA, limit = NA,  : 
  Invalid properties requested.

A couple thoughts:

I don't see a request sent out to grab the properties schema. Should I?
It doesn't tell me which properties are invalid/valid. How tricky would it be to add that?
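
One way the error could name the offending properties is sketched below. This is only an illustration: `check_properties_sketch()` is a hypothetical helper (not a dataRetrieval function), and the `valid` vector is a made-up stand-in for whatever list the package keeps in sysdata.rda.

```r
# Hypothetical helper, not part of dataRetrieval: report which requested
# properties are missing from a known set of valid names.
check_properties_sketch <- function(properties, valid) {
  # NA means "return all columns", so there is nothing to validate
  if (all(is.na(properties))) {
    return(invisible(properties))
  }
  bad <- setdiff(properties, valid)
  if (length(bad) > 0) {
    stop(
      "Invalid properties requested: ", paste(bad, collapse = ", "),
      ". Valid options are: ", paste(valid, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(properties)
}

# Made-up list of valid names, for illustration only
valid <- c("id", "time", "value", "unit_of_measure")

# Capture the error message instead of stopping the script
res <- tryCatch(
  check_properties_sketch(c("value", "time", "dogs"), valid),
  error = function(e) conditionMessage(e)
)
```

Using `setdiff()` keeps the message short: only the names that failed are listed first, followed by the full set of valid options.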

More later, nice work!!

DESCRIPTION

NEWS

R/get_ogc_data.R

ehinman · 2025-12-22T20:18:06Z

R/get_ogc_data.R

+#' Switch properties id
+#' 
+#' If a user asks for either "id" or "output_id", it is only included in the 
+#' properties if that's the only column requested. "id" will always come back,


Why does id always come back? Seems like you could leave it out of non-NA properties and it wouldn't come back, right? That's how drpy works.

It doesn't. It only comes back when it's the ONLY property requested (because if you left it out there, you'd get all of the properties), or the only property + geometry (because geometry will come no matter what unless you add skipGeometry to the URL):

properties <- c("id", "state_name", "country_name")
dataRetrieval:::switch_properties_id(properties, id = "monitoring_location_id")
[1] "state_name" "country_name"

properties2 <- c("monitoring_location_id", "state_name", "country_name")
dataRetrieval:::switch_properties_id(properties2, id = "monitoring_location_id")
[1] "state_name" "country_name"

properties3 <- c("monitoring_locations_id", "state_name", "country_name")
dataRetrieval:::switch_properties_id(properties3, id = "monitoring_location_id")
[1] "monitoring_locations_id" "state_name" "country_name"

properties4 <- c("monitoring_location_id")
dataRetrieval:::switch_properties_id(properties4, id = "monitoring_location_id")
[1] "id"

properties5 <- c("monitoring_location_id", "geometry")
dataRetrieval:::switch_properties_id(properties5, id = "monitoring_location_id")
[1] "id"
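
For readers who want the rule in one place, here is a minimal sketch that reproduces the five results above. It is not the package's implementation: `switch_properties_id_sketch()` is a hypothetical stand-in, and its behavior for inputs other than those five (for example NA, or geometry on its own) is an assumption.

```r
# Hypothetical re-creation of the behavior shown above; the real
# dataRetrieval:::switch_properties_id() may differ in its details.
switch_properties_id_sketch <- function(properties, id) {
  # Assumption: NA ("all columns") is passed through untouched
  if (all(is.na(properties))) {
    return(properties)
  }
  id_names <- c("id", id)
  asked_for_id <- any(properties %in% id_names)
  non_id <- properties[!properties %in% id_names]
  # If only the id (optionally plus geometry) was requested, ask for "id"
  if (asked_for_id && length(setdiff(non_id, "geometry")) == 0) {
    return("id")
  }
  # Otherwise drop the id from the request
  non_id
}

out1 <- switch_properties_id_sketch(
  c("id", "state_name", "country_name"),
  id = "monitoring_location_id"
)
out5 <- switch_properties_id_sketch(
  c("monitoring_location_id", "geometry"),
  id = "monitoring_location_id"
)
```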

ehinman · 2025-12-22T20:52:39Z

R/read_waterdata_daily.R

                                 time = NA_character_,
                                 bbox = NA,
                                 limit = NA,
                                 max_results = NA,


Does max_results make sense with the no_paging argument being added? It seems like they do similar things...though max_results lets you do LESS than 50k rows.

I'm torn on this...you could set it up this way. Since I just added the sf integration to the csv it might be OK.

Did you guys end up not doing a max_results in drpy?

I'm going to make a new PR for removing max_results after I merge this one.

R/read_waterdata_latest_daily.R

ehinman · 2025-12-22T21:06:44Z

R/rejigger_cols.R

+#' Convert columns if needed
+#' 
+#' These are columns that have caused problems in testing.
+#' Mostly if the columns are empty on 1 page, but not the next.


Out of curiosity: Do you have a good real life example?
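
Not a real-life case, but a toy sketch of the situation the roxygen comment describes: a column that is entirely missing on one page comes back with a different type than on the next page. `coerce_cols_sketch()` is a hypothetical helper (not the package's rejigger_cols code), and the page data, including the "ICE" qualifier, is invented for illustration.

```r
# Hypothetical helper, not the package's implementation: coerce selected
# columns to character so every page agrees on the column type.
coerce_cols_sketch <- function(df, cols) {
  for (col in intersect(cols, names(df))) {
    df[[col]] <- as.character(df[[col]])
  }
  df
}

# Toy pages: "qualifier" is all NA on page 1, so it is logical there
page1 <- data.frame(value = c(1.2, 3.4), qualifier = c(NA, NA),
                    stringsAsFactors = FALSE)
page2 <- data.frame(value = 5.6, qualifier = "ICE",
                    stringsAsFactors = FALSE)

# Coerce each page first, then combine
pages <- lapply(list(page1, page2), coerce_cols_sketch, cols = "qualifier")
combined <- do.call(rbind, pages)
```

Coercing each page before binding means the combined column type no longer depends on which page happened to contain values.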

R/walk_pages.R

ehinman · 2025-12-23T02:32:24Z

R/read_waterdata_continuous.R

 #' @param convertType logical, defaults to `TRUE`. If `TRUE`, the function
 #' will convert the data to dates and qualifier to string vector, and sepcifically
 #' order the returning data frame by time and monitoring_location_id.
+#' @param no_paging logical, defaults to `FALSE`. If `TRUE`, the data will


Might also want to say something about how this option does not return geometry.

It does return an x, y, however. I'll add some text.

Actually, I just added some code to convert the x, y to geometry - so should be equal now to the json option (or...closer to equal at least).

ehinman · 2025-12-23T02:37:17Z

R/read_waterdata_continuous.R

 #' @param properties A vector of requested columns to be returned from the query.
 #' Available options are: 
-#' `r schema <- check_OGC_requests(endpoint = "continuous", type = "schema"); paste(names(schema$properties)[!names(schema$properties) %in% c("id", "internal_id")], collapse = ", ")`
+#' `r dataRetrieval:::get_properties_for_docs("continuous", "continuous_id")`.


For some reason, ?read_waterdata_continuous is not showing up for me. Is this an issue on my end?

ehinman · 2025-12-23T02:50:45Z

For some reason, I am getting an error with this line:

test <- read_waterdata_continuous(monitoring_location_id = "USGS-11348500", parameter_code = "00065", time = "2024-01-01/..")

Requesting:
https://api.waterdata.usgs.gov/ogcapi/v0/collections/continuous/items?f=json&lang=en-US&skipGeometry=TRUE&limit=50000&monitoring_location_id=USGS-11348500&parameter_code=00065&time=2024-01-01%2F..
Error in walk_pages(req, max_results) : HTTP 500 Internal Server Error.

At first I was confused about why skipGeometry=TRUE, but now I remember that that's by design. Is this enumerated somewhere?

ldecicco-USGS · 2025-12-23T14:00:05Z

Good catch. After playing around, it looks like the continuous data needs the time of day specified, not just the date. If you use the vector approach (time = c(start, stop)), it adds that info automatically. We can add an issue to try to convert the time if someone enters a character string, but I kind of think most people will use the "vector" approach (since that's how the examples are shown):

# works:
test <- read_waterdata_continuous(monitoring_location_id = "USGS-11348500", 
                                  parameter_code = "00065", 
                                  time = c("2024-01-01", NA))
# works:
test <- read_waterdata_continuous(monitoring_location_id = "USGS-11348500", 
                                  parameter_code = "00065", 
                                  time = "2024-01-01T00:00:00Z/..")

# daily works with just a date:
test <- read_waterdata_daily(monitoring_location_id = "USGS-11348500", 
                                  parameter_code = "00060", 
                                  time = "2024-01-01/..")
# Doesn't work:
test <- read_waterdata_continuous(monitoring_location_id = "USGS-11348500", 
                             parameter_code = "00065", 
                             time = "2024-01-01/..")
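
If the character-string conversion mentioned above were added, it might look roughly like the sketch below. `expand_date_only_sketch()` is a hypothetical name, and padding with midnight UTC ("T00:00:00Z") is an assumption; a real fix would need to decide how to treat time zones and date-only end points.

```r
# Hypothetical helper, not part of dataRetrieval: pad date-only pieces of an
# interval string with a midnight UTC time so they become full datetimes.
expand_date_only_sketch <- function(time) {
  parts <- strsplit(time, "/", fixed = TRUE)[[1]]
  # Only touch pieces that are exactly YYYY-MM-DD; ".." and full datetimes pass through
  date_only <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", parts)
  parts[date_only] <- paste0(parts[date_only], "T00:00:00Z")
  paste(parts, collapse = "/")
}

open_ended <- expand_date_only_sketch("2024-01-01/..")
already_full <- expand_date_only_sketch("2024-01-01T00:00:00Z/..")
```

Here `open_ended` becomes the same string as the second "works" example above, and a string that already carries a time is returned unchanged.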

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

ldecicco-USGS · 2025-12-23T14:21:39Z

I merged some comment about returning 49k rows, not sure why you were seeing that, but you can get 50k rows when using the no_paging:

```r
> multi_site <- read_waterdata_daily(monitoring_location_id =  c("USGS-01491000",
+                                                                "USGS-01645000"),
+                                    parameter_code = c("00060", "00010"), 
+                                    no_paging = TRUE)
Setting no_paging to TRUE will only return the first 50000 rows of data with no indication of missing data
Requesting:
https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/items?f=csv&lang=en-US&limit=50000
Warning message:
In get_csv(req, max_results) :
  Missing data is probable. Use no_paging = FALSE to 
assure all requested data is returned.

> nrow(multi_site)
[1] 50000
```

ehinman · 2025-12-23T15:04:23Z

Oh, the comment I made was that this message says "....will only return the first 50000 rows..." but technically it returns up to 50000 rows, and may return fewer if there are actually fewer than 50000 rows of data. I was just anticipating a scenario where someone might be stumped that the message says 50k but their call returns fewer than 50k.
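
To make the "up to 50000" point concrete, here is a small sketch of a check that only warns when the row count actually reaches the limit. `check_truncation_sketch()` is a hypothetical function and is not how get_csv currently behaves; it assumes that hitting the limit exactly is the only sign of possible truncation.

```r
# Hypothetical helper: warn only when the number of rows reaches the limit,
# since fewer rows than the limit means nothing was cut off.
check_truncation_sketch <- function(n_rows, limit = 50000) {
  if (n_rows >= limit) {
    warning(
      "Returned ", n_rows, " rows, which reaches the limit of ", limit,
      ". Some data may be missing; use no_paging = FALSE to get everything.",
      call. = FALSE
    )
    return(TRUE)
  }
  FALSE
}

small_pull <- check_truncation_sketch(1200)
full_pull <- suppressWarnings(check_truncation_sketch(50000))
```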

ldecicco-USGS added 7 commits December 4, 2025 21:14

Merge branch 'continuous' into 'develop'

64ae491

Adding continuous data See merge request water/dataRetrieval!455

Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval i…

62e5bb7

…nto develop

Cleanup internal functions and property lists in the docs.

a13e982

Check properties only if they seem wrong.

61065cc

missed offset lib

22007c9

Make sure attributes get propogated

1c31ded

Allow users to get a no paging option.

f43d36d

ldecicco-USGS had a problem deploying to CI_config December 19, 2025 22:02 — with GitHub Actions Failure

ldecicco-USGS had a problem deploying to CI_config December 19, 2025 22:08 — with GitHub Actions Failure

update version

85c38a2

ldecicco-USGS had a problem deploying to CI_config December 19, 2025 22:15 — with GitHub Actions Failure

ldecicco-USGS requested a review from ehinman December 19, 2025 22:20

ehinman reviewed Dec 22, 2025

ehinman reviewed Dec 23, 2025

ehinman reviewed Dec 23, 2025

ehinman reviewed Dec 23, 2025

Update NEWS

c23810c

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

ldecicco-USGS temporarily deployed to CI_config December 23, 2025 14:07 — with GitHub Actions Inactive

Update R/walk_pages.R

0b402a9

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

ldecicco-USGS temporarily deployed to CI_config December 23, 2025 14:17 — with GitHub Actions Inactive

Update R/walk_pages.R

6664351

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

ldecicco-USGS temporarily deployed to CI_config December 23, 2025 14:18 — with GitHub Actions Inactive

ldecicco-USGS added 2 commits December 23, 2025 09:48

add time text

e161bbb

Add geometry to no_paging output

f3bf6f8

ldecicco-USGS added 2 commits December 23, 2025 10:40

cleanup

dc363f7

Better message

4d4f2fb

ldecicco-USGS temporarily deployed to CI_config December 23, 2025 17:00 — with GitHub Actions Inactive

Don't allow no_paging for metadata type retrievals.

7d8372f

ldecicco-USGS temporarily deployed to CI_config December 29, 2025 14:39 — with GitHub Actions Inactive

limit is not in ...

ed2b151

ldecicco-USGS temporarily deployed to CI_config December 29, 2025 15:00 — with GitHub Actions Inactive

ldecicco-USGS merged commit e397bbc into DOI-USGS:develop Dec 29, 2025
1 check passed
