fix(drive): write files.download --output contents #805
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the CLI's file download capabilities by enabling automatic resolution of Google Drive download URIs. When a user specifies an output file for a `drive.files.download` operation, the CLI now parses the API response to locate the download URL, validates its origin, and streams the binary data to the destination, while maintaining existing behavior for other API calls.
Code Review
This pull request implements logic to handle file downloads for Google Drive by extracting download URIs from API responses and executing the subsequent download request. The changes include URI validation to ensure only Google-hosted links are followed. Feedback indicates that the new download request should include the "x-goog-user-project" header for consistent quota and billing attribution, similar to other API calls in the executor.
```rust
let mut download_request = client.get(download_uri);
if let Some(token) = token {
    if auth_method == AuthMethod::OAuth {
        download_request = download_request.bearer_auth(token);
    }
}
```
The download request for the downloadUri is missing the x-goog-user-project header, which is required by many Google APIs for quota and billing attribution when using a service account or specific project-based credentials. For consistency with the primary API request built in build_http_request, this header should be included if a quota project is configured.
```rust
let mut download_request = client.get(download_uri);
if let Some(quota_project) = crate::auth::get_quota_project() {
    download_request = download_request.header("x-goog-user-project", quota_project);
}
if let Some(token) = token {
    if auth_method == AuthMethod::OAuth {
        download_request = download_request.bearer_auth(token);
    }
}
```
/gemini review
Code Review
This pull request introduces specialized handling for Google Drive file downloads by extracting download URIs from API responses and performing secondary requests to retrieve the file content. It includes security checks to validate that download URIs belong to Google domains and refactors quota project header logic into a helper function. Review feedback suggests ensuring the secondary download request inherits all necessary headers from the client and improving error context when reading failed download response bodies.
```rust
if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
    if let Some(download_uri) = extract_google_download_uri(&body_text)? {
        let mut download_request = add_quota_project_header(client.get(download_uri));
```
The download request is built using only the download_uri and the quota project header. It does not inherit other potentially important headers from the original request, such as User-Agent or custom headers that might be required by the storage backend. Ensure that the client used (built at line 481) has the necessary default headers configured.
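The inheritance concern can be illustrated with a dependency-free sketch: defaults configured once on the shared client should flow into every secondary request, with per-request headers layered on top. Plain `HashMap`s stand in for reqwest's header types here, and all names and values are illustrative only (in practice, reusing the same `reqwest::Client` applies its configured default headers automatically).

```rust
use std::collections::HashMap;

/// Illustrative sketch: merge client-level default headers with
/// per-request headers, letting per-request values win.
fn merge_headers(
    defaults: &HashMap<String, String>,
    per_request: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = defaults.clone();
    for (key, value) in per_request {
        // Per-request headers override client defaults of the same name.
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```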
```rust
if !download_status.is_success() {
    let error_body = download_response.text().await.unwrap_or_default();
```
Consider attaching error context when reading the failed download response body (for example via `.context(...)`) rather than silently discarding a read failure with `unwrap_or_default()`.
/gemini review
Code Review
This pull request implements support for Google Drive file downloads by extracting download URIs from API responses and performing secondary requests. It includes domain validation for security, refactors quota header logic into a helper function, and adds comprehensive unit tests. A critical feedback point was raised regarding the potential for 400 Bad Request errors when applying Authorization headers to Signed URLs, which already contain authentication parameters.
```rust
if let Some(token) = token {
    if *auth_method == AuthMethod::OAuth {
        request = request.bearer_auth(token);
    }
}
```
Adding an Authorization header to a Google download URI can cause a 400 Bad Request if the URI is a "Signed URL" (common for large file downloads from Google Cloud Storage). Signed URLs already contain authentication in their query parameters (e.g., GoogleAccessId or X-Goog-Signature), and providing a Bearer token simultaneously creates an authentication conflict. You should only add the bearer token if the URL does not appear to be signed.
```rust
if let Some(token) = token {
    if *auth_method == AuthMethod::OAuth {
        // Only add Authorization header if the URL doesn't appear to be a GCS Signed URL.
        // Signed URLs contain authentication in query parameters, and adding a Bearer
        // token causes a 400 Bad Request conflict.
        let is_signed_url = reqwest::Url::parse(download_uri)
            .map(|u| {
                u.query_pairs().any(|(k, _)| {
                    k == "GoogleAccessId" || k == "Signature" || k.starts_with("X-Goog-")
                })
            })
            .unwrap_or(false);
        if !is_signed_url {
            request = request.bearer_auth(token);
        }
    }
}
```
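The same signed-URL heuristic can also be expressed without a URL parser. This dependency-free sketch (the function name is illustrative) checks a raw query string for the same GCS signature parameters:

```rust
/// Illustrative sketch: detect GCS signed-URL parameters in a raw query string.
/// Signed URLs carry auth in parameters like GoogleAccessId, Signature, or X-Goog-*.
fn query_has_signature_params(query: &str) -> bool {
    query.split('&').any(|pair| {
        let key = pair.split('=').next().unwrap_or("");
        key == "GoogleAccessId" || key == "Signature" || key.starts_with("X-Goog-")
    })
}
```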
/gemini review
Code Review
This pull request introduces logic to handle secondary downloads for Google Drive by extracting download URIs from API responses and executing follow-up requests. It includes helper functions for URI validation and header management, along with corresponding unit tests. A security concern was raised regarding the potential leakage of Bearer tokens if the CLI misidentifies user-controlled JSON as an API response; it is recommended to verify the response type (e.g., checking for a 'kind' or 'done' field) before following URIs. Additionally, the method ID 'drive.files.download' should be verified against the Discovery Document to ensure the logic triggers as intended.
```rust
if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
    if let Some(download_uri) = extract_google_download_uri(&body_text)? {
```
There are two significant concerns with this logic:
- Method ID Accuracy: Please verify that `drive.files.download` is the correct method ID. In the standard Google Drive API v3 Discovery Document, the method for downloading file content is `drive.files.get` (typically used with `alt=media`). If the ID is actually `drive.files.get`, this block will never execute.
- Security/Correctness Risk: Following a `downloadUri` from a JSON body is dangerous if the response is actually the file content (e.g., a user-controlled JSON file) rather than an API Operation or metadata. If a user's file contains a `downloadUri` field pointing to a malicious but Google-hosted domain (like a subdomain of `googleusercontent.com`), the CLI might follow it and leak the user's Bearer token.

To fix this, verify that the JSON response is indeed an API response (e.g., by checking for a `"kind": "drive#operation"` field or the presence of the `"done"` field characteristic of Google Operations) before attempting to extract and follow a download URI.
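One way to approximate that gate is a dependency-free sketch like the following (the function name is hypothetical; a real implementation should parse the body with serde_json and inspect the top-level `"kind"` field, since substring matching can still be fooled by crafted file content):

```rust
/// Illustrative sketch: approximate the "is this a Drive operation?" check
/// by looking for the drive#operation kind marker in the raw JSON text.
fn looks_like_drive_operation(body_text: &str) -> bool {
    // Collapse whitespace so `"kind": "drive#operation"` and the compact
    // form both match; this is only an approximation of a real JSON check.
    let compact: String = body_text.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains(r#""kind":"drive#operation""#)
}
```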
/gemini review
Code Review
This pull request implements automatic following of download URIs for Google Drive file downloads. It adds several helper functions to extract and validate download URIs, ensuring they belong to trusted Google domains before proceeding. The implementation also refactors quota project header handling and includes comprehensive unit tests. Review feedback suggests simplifying the host validation logic in is_google_download_uri to remove redundant checks and explicitly include the base domain.
```rust
host == "storage.googleapis.com"
    || host.ends_with(".googleapis.com")
    || host.ends_with(".googleusercontent.com")
```
The check host == "storage.googleapis.com" is redundant because host.ends_with(".googleapis.com") already covers it. To ensure the host is either exactly the base domain or a proper subdomain, the check should include the base domain explicitly. Suggestions to add additional domains like googleusercontent.com are omitted to avoid scope creep.
```diff
-host == "storage.googleapis.com"
-    || host.ends_with(".googleapis.com")
-    || host.ends_with(".googleusercontent.com")
+host == "googleapis.com"
+    || host.ends_with(".googleapis.com")
```
References
- Avoid introducing changes that are outside the primary goal of a pull request to prevent scope creep.
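Both points above (the exact-match redundancy, and the need for an explicit base-domain clause) can be confirmed with plain string operations:

```rust
/// Illustrative sketch: the suffix check used in the review comment above.
fn suffix_check(host: &str) -> bool {
    host.ends_with(".googleapis.com")
}
```

Because `"storage.googleapis.com"` already ends with `".googleapis.com"`, the exact-match clause is covered by the suffix check, while the bare base domain `"googleapis.com"` is not and needs its own equality clause.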
/gemini review
Code Review
This pull request introduces logic to handle Google Drive downloads by detecting download URIs in API responses and performing secondary requests. It includes refactoring for quota headers and comprehensive unit tests. The review feedback highlights critical security and functional concerns, specifically regarding overly permissive host validation that could lead to token exfiltration, the improper inclusion of authorization and quota headers in requests to GCS or signed URLs, and the need for more specific detection of Drive operations to prevent false positives and SSRF risks.
```rust
fn is_google_download_uri(uri: &str) -> bool {
    let Ok(url) = reqwest::Url::parse(uri) else {
        return false;
    };
    if url.scheme() != "https" || !url.username().is_empty() || url.password().is_some() {
        return false;
    }
    let Some(host) = url.host_str() else {
        return false;
    };

    host == "googleapis.com" || host.ends_with(".googleapis.com")
}
```
The host validation logic is overly permissive and susceptible to token exfiltration. By allowing any host ending in .googleapis.com, the CLI will trust attacker-controlled subdomains such as attacker-bucket.storage.googleapis.com. An attacker could share a malicious JSON file that mimics a Drive operation, causing the CLI to send the user's OAuth Bearer token to the attacker's bucket during a download.
Restrict the allowed hosts to a strict allowlist of trusted Google API domains.
```rust
fn is_google_download_uri(uri: &str) -> bool {
    let Ok(url) = reqwest::Url::parse(uri) else {
        return false;
    };
    if url.scheme() != "https" || !url.username().is_empty() || url.password().is_some() {
        return false;
    }
    let Some(host) = url.host_str() else {
        return false;
    };
    // Restrict to known trusted Google API hosts to prevent token exfiltration
    // to attacker-controlled subdomains (e.g., GCS virtual-hosted buckets).
    matches!(host, "googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
}
```

```rust
fn build_download_request(
    client: &reqwest::Client,
    download_uri: &str,
    token: Option<&str>,
    auth_method: &AuthMethod,
) -> reqwest::RequestBuilder {
    // Keep secondary Drive downloads on the same reqwest client so they use
    // the same client-level configuration as API requests.
    let mut request = add_quota_project_header(client.get(download_uri));
    if let Some(token) = token {
        if *auth_method == AuthMethod::OAuth && !is_signed_download_uri(download_uri) {
            request = request.bearer_auth(token);
        }
    }
    request
}
```
This implementation has two significant issues:
- Signed URL Compatibility: Adding the x-goog-user-project header to a signed URL (like those from GCS) will cause a SignatureDoesNotMatch (403) error because the header was not part of the original signature.
- Security (Token Leakage): Sending Bearer tokens to storage.googleapis.com is dangerous as it allows token exfiltration to user-controlled buckets. Bearer tokens should only be sent to the main API surface (www.googleapis.com). GCS downloads should rely on signed URLs which do not require the Authorization header.
```rust
fn build_download_request(
    client: &reqwest::Client,
    download_uri: &str,
    token: Option<&str>,
    auth_method: &AuthMethod,
) -> reqwest::RequestBuilder {
    let mut request = client.get(download_uri);
    let is_signed = is_signed_download_uri(download_uri);
    // Only add auth/quota headers if the URI is NOT a signed URL.
    if !is_signed {
        request = add_quota_project_header(request);
        if let Some(token) = token {
            // Only send Bearer tokens to the main API surface.
            // GCS downloads (storage.googleapis.com) should use signed URLs.
            let is_api_host = reqwest::Url::parse(download_uri)
                .map(|u| matches!(u.host_str(), Some("www.googleapis.com" | "googleapis.com")))
                .unwrap_or(false);
            if *auth_method == AuthMethod::OAuth && is_api_host {
                request = request.bearer_auth(token);
            }
        }
    }
    request
}
```

```rust
fn is_drive_download_operation(json_val: &Value) -> bool {
    json_val.get("done").is_some()
        || json_val
            .get("kind")
            .and_then(Value::as_str)
            .is_some_and(|kind| kind == "drive#operation")
}
```
Checking for the presence of a done field is too generic and can lead to false positives when downloading user-provided JSON files. This increases the risk of the CLI misinterpreting file content as an API operation. Rely on the more specific kind field or other unique markers of a Drive operation response.
```rust
fn is_drive_download_operation(json_val: &Value) -> bool {
    json_val
        .get("kind")
        .and_then(Value::as_str)
        .is_some_and(|kind| kind == "drive#operation")
}
```

```rust
        .context("Failed to read response body")?;

    if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
        if let Some(download_uri) = extract_google_download_uri(&body_text)? {
```
There is a security risk here: if a user downloads a malicious JSON file that mimics a Drive operation response, the CLI will follow the downloadUri contained within that file. While the host check and token restrictions (suggested elsewhere) mitigate token theft, this still constitutes a Server-Side Request Forgery (SSRF) where the CLI makes unexpected requests based on untrusted content. Consider adding a check to ensure the response is an actual API response (e.g., by verifying the presence of Google-specific API headers like X-Goog-Api-Client) before attempting to extract a download URI.
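A hedged sketch of the header check the reviewer suggests follows; a `HashMap` stands in for the real response header map, and the function name is illustrative:

```rust
use std::collections::HashMap;

/// Illustrative sketch: treat a JSON body as a candidate API operation
/// response only when the response carries a Google API marker header.
fn has_google_api_marker(headers: &HashMap<String, String>) -> bool {
    headers
        .keys()
        .any(|k| k.eq_ignore_ascii_case("x-goog-api-client"))
}
```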
/gemini review
Code Review
This pull request implements support for following redirect URIs in Google Drive download operations. It introduces logic to extract download URIs from API responses, validate them against allowed Google domains, and perform the subsequent download request with appropriate authentication and quota headers. Review feedback suggests expanding the domain whitelist to include googleusercontent.com subdomains and allowing authentication tokens to be sent to storage.googleapis.com to correctly handle redirects to Google Cloud Storage.
```rust
fn is_google_download_uri(uri: &str) -> bool {
    matches!(
        parse_download_uri_host(uri).as_deref(),
        Some("googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
    )
}
```
The whitelist for following download URIs is too restrictive and will likely break legitimate Google Drive downloads. Google frequently uses various subdomains for content delivery, such as drive.googleusercontent.com or dynamically generated subdomains like doc-0k-8s-docs.googleusercontent.com.
It is safer to allow any subdomain of googleapis.com and googleusercontent.com for the purpose of following the redirect, as long as the authentication token is only sent to a more strictly validated set of hosts (which is handled separately in is_google_api_download_host).
```rust
fn is_google_download_uri(uri: &str) -> bool {
    parse_download_uri_host(uri).as_deref().is_some_and(|host| {
        host == "googleapis.com"
            || host.ends_with(".googleapis.com")
            || host.ends_with(".googleusercontent.com")
    })
}
```

```rust
fn is_google_api_download_host(uri: &str) -> bool {
    matches!(
        parse_download_uri_host(uri).as_deref(),
        Some("googleapis.com" | "www.googleapis.com")
    )
}
```
The storage.googleapis.com host should be included in the list of hosts allowed to receive the Bearer authentication token. Many Drive download operations redirect to Google Cloud Storage (GCS). If the GCS URL is not a signed URL (e.g., it's a direct link to a private object), the request will fail with a 401/403 error unless the OAuth2 token is provided.
Note that exact matching on storage.googleapis.com is safe, as it prevents sending the token to virtual-hosted style buckets (e.g., bucket.storage.googleapis.com) which could be attacker-controlled.
```diff
 fn is_google_api_download_host(uri: &str) -> bool {
     matches!(
         parse_download_uri_host(uri).as_deref(),
-        Some("googleapis.com" | "www.googleapis.com")
+        Some("googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
     )
 }
```
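Why exact matching is safe here can be demonstrated with a minimal sketch (the function name is illustrative): an exact-host allowlist accepts `storage.googleapis.com` itself but rejects virtual-hosted bucket subdomains.

```rust
/// Illustrative sketch: exact host match for token-bearing requests.
/// Rejects virtual-hosted buckets like bucket.storage.googleapis.com.
fn exact_api_host(host: &str) -> bool {
    matches!(
        host,
        "googleapis.com" | "www.googleapis.com" | "storage.googleapis.com"
    )
}
```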
Fixes #789.
Summary

- When `drive.files.download` returns an operation JSON with a `downloadUri` and the caller provided `--output`, follow the Google download URI and stream the actual bytes to the requested output file.
- Behavior is unchanged for `files.download` calls without `--output`.
- Validate `downloadUri` values before following them.

Testing
- `cargo fmt --check`
- `cargo test -p google-workspace-cli test_extract_`
- `cargo test -p google-workspace-cli executor::tests`
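The streaming step described in the Summary reduces, at its simplest, to writing the downloaded bytes to the `--output` path. A minimal std-only sketch (the real CLI would stream a reqwest response body chunk by chunk rather than buffer it):

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;

/// Illustrative sketch: write downloaded bytes to the requested output path.
fn write_output(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(bytes)?;
    file.flush()
}
```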