[WIP] Parsing current RHEL Streams from Product Pages #424
mruprich wants to merge 12 commits into packit:main
Conversation
Code Review
This pull request introduces a new module product_pages.py to fetch and process RHEL release stream data from the internal Product Pages API. The implementation includes logic for identifying current y-streams, current z-streams, and upcoming z-streams. Feedback was provided to improve the reliability and efficiency of the API interaction by adding timeouts, using a context manager for the session, consolidating multiple API calls into a single request, and enhancing error handling.
s = requests.Session()
auth = requests_gssapi.HTTPSPNEGOAuth(mutual_authentication=requests_gssapi.OPTIONAL)
auth_resp = s.post(_OIDC_AUTHENTICATE_URL, auth=auth)
_require_ok(auth_resp, "OIDC authenticate")

# Multiple active releases per major: lower stream is finishing; higher is main y-stream.
response_active = s.get(
    _RELEASES_API_URL,
    params={
        "fields": "shortname",
        "active": "",
        "product__shortname": "rhel",
    },
)
_require_ok(response_active, "active releases")
active_data = response_active.json()

current_y_streams = _build_current_y_streams(active_data)
upcoming_z_streams = _build_upcoming_z_streams(active_data)

response_zstream = s.get(
    _RELEASES_API_URL,
    params={
        "fields": "shortname,name_incl_maint,name",
        "product__shortname": "rhel",
    },
)
_require_ok(response_zstream, "releases for z-stream filtering")
z_data = response_zstream.json()

fields = [
    "shortname",
    "name_incl_maint",
    "name",
]
filtered = [
    {k: item[k] for k in fields}
    for item in z_data
    if _GA_ZSTREAM_RE.search(item.get("name_incl_maint") or "")
]

current_z_streams = _build_current_z_streams_ga_zstream(filtered)

return {
    "current_y_streams": current_y_streams,
    "current_z_streams": current_z_streams,
    "upcoming_z_streams": upcoming_z_streams,
}
The current implementation of _fetch_rhel_streams_snapshot_sync can be improved for reliability, efficiency, and resource management:
- Timeouts: Network requests using requests should always specify a timeout (e.g., 30 seconds) to prevent the application from hanging indefinitely if the server is unresponsive.
- Efficiency: Instead of making two separate calls to the releases API, you can fetch all RHEL releases in a single call by requesting the active field and then filtering the results in memory. This reduces network overhead and load on the Product Pages API.
- Resource Management: Using requests.Session as a context manager (with requests.Session() as s:) ensures that the underlying TCP connections are properly closed when the function exits.
- Error Handling: Wrapping the network calls and JSON parsing in a try-except block (catching requests.RequestException and ValueError) allows for more robust error reporting by wrapping these low-level errors into a ToolError with a descriptive message.
with requests.Session() as s:
    auth = requests_gssapi.HTTPSPNEGOAuth(mutual_authentication=requests_gssapi.OPTIONAL)
    try:
        # Always specify a timeout for network requests to avoid hanging indefinitely.
        auth_resp = s.post(_OIDC_AUTHENTICATE_URL, auth=auth, timeout=30)
        _require_ok(auth_resp, "OIDC authenticate")
        # Fetch all RHEL releases in a single call and filter in memory to reduce network overhead.
        response = s.get(
            _RELEASES_API_URL,
            params={
                "fields": "shortname,name_incl_maint,name,active",
                "product__shortname": "rhel",
            },
            timeout=30,
        )
        _require_ok(response, "fetch releases")
        all_data = response.json()
    except (requests.RequestException, ValueError) as e:
        raise ToolError(f"Failed to fetch RHEL streams from Product Pages: {e}") from e
    active_data = [item for item in all_data if item.get("active")]
    current_y_streams = _build_current_y_streams(active_data)
    upcoming_z_streams = _build_upcoming_z_streams(active_data)
    filtered_z = [
        item for item in all_data
        if _GA_ZSTREAM_RE.search(item.get("name_incl_maint") or "")
    ]
    current_z_streams = _build_current_z_streams_ga_zstream(filtered_z)
    return {
        "current_y_streams": current_y_streams,
        "current_z_streams": current_z_streams,
        "upcoming_z_streams": upcoming_z_streams,
    }
- added
- I don't think that is possible. I did not find a way to get all that data in a single API call. The active API call gives very different results from the second one that grabs everything.
- added
- added
TomasTomecek left a comment:
very nice!! it would be lovely to get at least a single test case that has mocked data with what PP returns (fake data obviously so we're not leaking anything here)
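A minimal sketch of the kind of mocked test being asked for, under stated assumptions: the helper below is a hypothetical, simplified stand-in for the PR's fetch logic (the real code lives in product_pages.py), and the payload is entirely invented so no internal data reaches the test suite.

```python
from unittest import mock


def fetch_active_shortnames(session):
    # Hypothetical stand-in: query the releases API for active RHEL
    # releases and collect their shortnames, as the PR's code does.
    resp = session.get(
        "https://pp.example.com/api/v7/releases/",  # placeholder URL
        params={"fields": "shortname", "active": "", "product__shortname": "rhel"},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["shortname"] for item in resp.json()]


def test_fetch_active_shortnames():
    # Fake Product Pages payload -- invented stream names only.
    fake_payload = [{"shortname": "rhel-9-7.0"}, {"shortname": "rhel-10-1.0"}]
    session = mock.Mock()
    session.get.return_value = mock.Mock(json=mock.Mock(return_value=fake_payload))

    assert fetch_active_shortnames(session) == ["rhel-9-7.0", "rhel-10-1.0"]

    # The request should carry the expected query parameters.
    _, kwargs = session.get.call_args
    assert kwargs["params"]["product__shortname"] == "rhel"


test_fetch_active_shortnames()
```

The same pattern extends to the y-stream/z-stream helpers: feed them fixed fake payloads and assert on the returned dict.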
timeout = _PRODUCT_PAGES_TIMEOUT
try:
    with requests.Session() as s:
        auth = requests_gssapi.HTTPSPNEGOAuth(mutual_authentication=requests_gssapi.OPTIONAL)
we need to make sure this works okay with our kerberos setup
This should work with an active and valid krb ticket on your system, but I realize that packit might have a different way to get a ticket, right?
Take a look at the init_kerberos_ticket function. That is the one used to obtain a Kerberos ticket or verify that one already exists.
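For local experimentation, the "verify" half of that behavior can be approximated with klist; this is only an illustration of the idea, not what init_kerberos_ticket actually does internally.

```python
import shutil
import subprocess


def have_valid_krb_ticket() -> bool:
    """Rough, illustrative check for an existing Kerberos ticket.

    `klist -s` exits 0 when the credential cache holds valid
    (non-expired) tickets, and nonzero otherwise. packit's
    init_kerberos_ticket does more than this: it can also obtain
    a ticket when none exists.
    """
    if shutil.which("klist") is None:
        # No Kerberos tooling installed at all.
        return False
    return subprocess.run(["klist", "-s"], check=False).returncode == 0
```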
Pushing another round of changes. I used init_kerberos_ticket to get a ticket; I hope I am using it right.
I also added a unit test for the product pages.
Just a heads-up, a related commit was merged recently: a0c211a
for more information, see https://pre-commit.ci
AFAIK it is used for z-stream builds where the repo is added into the Copr. If I remember correctly, we had some issues with Copr not having all the latest packages which were in the z-stream repo. With internal_repos_host defined, you always get the latest z-stream packages to build against.
It's the location of Brew repos, and it has nothing to do with Product Pages; it's in rhel-config.json only because it's an internal URL (and yes, it changes over time, so it can't be hardcoded).
Ok, thanks. So one solution would be to keep rhel-config.json at least for copr.py to use; that requires minimal changes in copr.py. Maybe rhel-config.json could be used in the future as some sort of cache, so that you don't have to contact PP every time you do something? That might save some bandwidth, and you would be able to work even when PP is unavailable. There would have to be a mechanism to make sure it is up to date, though.
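A minimal sketch of that caching idea, under stated assumptions: the cache file name, the TTL, and the fetch callback are all illustrative and not part of this PR.

```python
import json
import time
from pathlib import Path

# Assumed freshness window -- purely illustrative.
CACHE_TTL_SECONDS = 6 * 3600


def cached_snapshot(cache_path: Path, fetch):
    """Return a cached PP snapshot if it is fresh enough; otherwise call
    fetch() and rewrite the cache.

    Falls back to a stale cache when Product Pages is unreachable, so
    work stays possible while PP is down.
    """
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < CACHE_TTL_SECONDS:
            return json.loads(cache_path.read_text())
    try:
        snapshot = fetch()
    except Exception:
        if cache_path.exists():
            # Stale data beats no data when PP is unavailable.
            return json.loads(cache_path.read_text())
        raise
    cache_path.write_text(json.dumps(snapshot))
    return snapshot
```

A real version would also need explicit invalidation, as the comment about keeping the cache up to date suggests.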
nforro left a comment:
Shouldn't this be a privileged MCP tool as it accesses internal data?
Yes, we can rename the file later or use an environment variable for the URL instead, but that's out of scope of this PR.
Some caching mechanism would definitely be welcome, but I don't think it should be a file in …
@nforro the placement of product-pages.py is really up to you guys; I will move it if necessary.
It's more about not calling … Perhaps it would be easier for you to contribute just product-pages.py in the form of a standalone "library" and leave the rest to us. Kerberos ticket initialization would be the tool's responsibility, as would raising …
TODO:
… packit/packit.dev.

The Product Pages should be the source of truth for this, but getting current and upcoming z-streams and current y-streams is tricky since they overlap during the DevDocTest phases. My logic is as follows:
Anyway, the fetch_rhel_streams_snapshot() function should return a structure similar to what the load_rhel_config() function returns now, except for the package_instructions part. It will only work with a valid Kerberos ticket; otherwise it will raise a ToolError.
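For illustration, the returned structure would look roughly like this; the keys come from this description, but the stream values are invented placeholders, not real Product Pages data.

```python
# Shape only -- the values below are made up for illustration.
snapshot = {
    "current_y_streams": ["rhel-9-7.0", "rhel-10-1.0"],
    "current_z_streams": ["rhel-9-6.0.z"],
    "upcoming_z_streams": ["rhel-9-7.0.z"],
}

# Consumers can rely on exactly these three keys being present.
assert set(snapshot) == {
    "current_y_streams",
    "current_z_streams",
    "upcoming_z_streams",
}
```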
I am marking this as WIP for now, so let me have it. Let me know if I chose a good place for the code.
RELEASE NOTES BEGIN
Packit now supports automatic parsing of all the current RHEL streams.
RELEASE NOTES END