9 commits

Author SHA1 Message Date
Rohan McGovern
137388d475 Avoid spurious mypy failure in latest xml type hints
The startElement signature changed in the .pyi stubs for the XML classes,
triggering a mypy complaint here. Suppress it, as there is no actual
error.
2024-01-12 08:48:42 +10:00
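A minimal sketch of the kind of suppression this commit describes. The `TagCounter` class, the `count_tags` helper and the chosen annotations are hypothetical and are not taken from repo-autoindex. The commit does not state the exact mypy error, so a bare `type: ignore` is used here.

```python
import io
import xml.sax
from typing import Dict
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class TagCounter(ContentHandler):
    """Hypothetical handler that counts how often each element name occurs."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Dict[str, int] = {}

    # If the stub signature for startElement differs from the annotations
    # used here, mypy may report an incompatible override. The trailing
    # comment only silences the type checker; it has no runtime effect.
    def startElement(self, name: str, attrs: AttributesImpl) -> None:  # type: ignore
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_bytes: bytes) -> Dict[str, int]:
    """Parse a document from memory and return the per-element counts."""
    handler = TagCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.counts
```

With mypy's unused-ignore warning enabled, a suppression like this has to be removed again once the stubs and the code agree, which is the situation the older commits in this log deal with.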
Rohan McGovern
eac74ec1e4 Further reduce memory usage on large yum repos [RHELDST-20453]
The Fetcher type was designed to return a 'str'.
That wasn't a good idea because it implies that every fetched file must
be loaded into memory completely. On certain large yum repos,
decompressed primary XML can be hundreds of MB, and it's not appropriate
to require loading that all into memory at once.

Make the Fetcher type support returning a file-like object (a stream of
bytes). Since the SAX XML parser supports reading from a stream, this
makes it possible to avoid loading everything into memory at once.

A test of the repo-autoindex CLI against
/content/dist/rhel/server/7/7Server/x86_64/os showed a major
improvement in memory usage:

- before: ~1200MiB
- after:    ~80MiB

Note that achieving the full improvement requires any downstream users
of the library (e.g. exodus-gw) to update their Fetcher implementation
as well, to stop returning a 'str'.
2023-09-21 11:05:21 +10:00
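The idea in this commit can be sketched as follows, assuming a fetcher that returns a binary stream instead of a string. `StreamFetcher`, `PackageCounter` and `count_packages` are illustrative names, not the library's actual API, and the `package` element name assumes a simplified document without namespaces.

```python
import io
import xml.sax
from typing import BinaryIO, Callable, Optional
from xml.sax.handler import ContentHandler

# Hypothetical fetcher type for this sketch: given a URL, return a binary
# stream, or None if the content does not exist.
StreamFetcher = Callable[[str], Optional[BinaryIO]]


class PackageCounter(ContentHandler):
    """Counts <package> elements without keeping any of them in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.packages = 0

    def startElement(self, name, attrs):
        if name == "package":
            self.packages += 1


def count_packages(fetcher: StreamFetcher, url: str) -> int:
    stream = fetcher(url)
    if stream is None:
        return 0
    handler = PackageCounter()
    with stream:
        # xml.sax.parse accepts a file-like object, so the document is
        # never passed to the parser as a single in-memory string.
        xml.sax.parse(stream, handler)
    return handler.packages
```

A fetcher backed by a temporary file or an HTTP response body would fit the same shape. A fetcher that wraps an already-loaded string in `io.BytesIO` would still work, but would not save any memory.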
Rohan McGovern
6ffbe4736c Update and fix mypy with latest type hints again
3f478e76f7 added a "type: ignore" here due to a change in
typeshed. The commit message mentioned that the type hint may have been
wrong.

It looks like that was fixed in
https://github.com/python/typeshed/pull/9919/files,
so it's necessary to also remove the "type: ignore" now.
2023-05-15 08:45:01 +10:00
Rohan McGovern
3f478e76f7 Fix type check with latest mypy/typeshed
The following commit defined a return type hint for
getElementsByTagName:

3fc2f27990 (diff-f451f731d037ef9d79347194490b32ba613798ea7eaa2c160351a69625f05e08R150)

It defined the return type as a list of Node, while this code expects a
list of Element (Element is a subtype of Node).

Given that one would expect a getElements method to return
specifically *elements* and not other types of node, I think the
typeshed change may be incorrect, but it's hard to be sure since the
stdlib docs themselves are ambiguous.

Suppress it for now to unblock dependency updates.
2023-04-19 13:57:11 +10:00
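For comparison, here is a sketch of an alternative to suppression: narrowing the result at runtime so that the code holds regardless of whether the stubs declare Node or Element. This is not what the commit did. The helper names and the `location`/`href` example document are assumptions made for illustration.

```python
from typing import List
from xml.dom.minidom import Document, Element, parseString


def elements_by_tag(doc: Document, tag: str) -> List[Element]:
    """Return only the results that really are Element instances."""
    # The isinstance check lets a type checker treat each item as an
    # Element even if the stub's declared return type is a list of Node.
    return [n for n in doc.getElementsByTagName(tag) if isinstance(n, Element)]


def hrefs(xml_text: str) -> List[str]:
    """Collect the href attribute of every <location> element."""
    doc = parseString(xml_text)
    return [e.getAttribute("href") for e in elements_by_tag(doc, "location")]
```

The trade-off is a small runtime check on every node in exchange for not having to track changes in the stubs.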
Caleigh Runge-Hottman
b284ddbd9f Generate kickstart repo index [RHELDST-14528]
Because a kickstart repo contains a "repodata/repomd.xml" path,
repo-autoindex previously interpreted kickstart repos as yum
repos. As a result, a kickstart repo's index consisted solely of two
directories: "Packages" and "repodata".

While a kickstart repo does contain a yum repo, kickstart repos also
contain two additional repo entry points: treeinfo and extra_files.json.
Each entry point references additional files that should be included
in a kickstart repo's index. These files were previously ignored.

Now, when repo-autoindex encounters a kickstart repo, it produces
an index that reflects the content referenced in all three repo
entry points (repomd.xml, treeinfo, extra_files.json).
2023-03-31 17:53:57 -04:00
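The detection logic described in this commit might look roughly like the sketch below. It assumes that a repo counts as a kickstart repo only when both additional entry points are present alongside repomd.xml; that rule, the constant names and `detect_entry_points` are assumptions for illustration, not the actual implementation.

```python
from typing import Iterable, List

YUM_ENTRY_POINT = "repodata/repomd.xml"
# Additional entry points named in the commit message.
KICKSTART_ENTRY_POINTS = ["treeinfo", "extra_files.json"]


def detect_entry_points(existing_paths: Iterable[str]) -> List[str]:
    """Return the entry points an indexer should follow for a repo.

    existing_paths is the set of repo-relative paths known to exist.
    """
    present = set(existing_paths)
    if YUM_ENTRY_POINT not in present:
        # Not a yum repo, so (under this sketch's assumption) not a
        # kickstart repo either.
        return []
    if all(path in present for path in KICKSTART_ENTRY_POINTS):
        # Kickstart repo: index the content referenced by all three.
        return [YUM_ENTRY_POINT] + KICKSTART_ENTRY_POINTS
    # Plain yum repo: only repomd.xml is followed.
    return [YUM_ENTRY_POINT]
```

Checking for the more specific repo type first matters here, because every kickstart repo also matches the yum check.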
Rohan McGovern
117cabb0b7 Use SAX instead of pulldom for primary.xml parsing [RHELDST-14338]
Redo the parsing of packages from primary.xml to use SAX; previously it
was using pulldom. The motivation for the change is to reduce memory usage.

When parsing a larger yum repo such as that contained within
rhel-8-for-ppc64le-appstream-kickstart__8_DOT_4, the observed memory
usage of the repo-autoindex command was:

- pulldom: ~700MB
- SAX:     ~85MB

This does not affect the output of the indexing process, and is covered
by existing tests.
2022-10-20 09:51:30 +10:00
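To illustrate why SAX tends to use less memory here, the sketch below extracts package locations with a callback-style handler that keeps only the strings it needs and never builds a node tree. The handler name, the helper and the simplified, namespace-free document shape are assumptions; real primary.xml files carry more structure.

```python
import io
import xml.sax
from typing import List
from xml.sax.handler import ContentHandler


class LocationHandler(ContentHandler):
    """Records the href of each <location> found inside a <package>."""

    def __init__(self) -> None:
        super().__init__()
        self.locations: List[str] = []
        self._in_package = False

    def startElement(self, name, attrs):
        if name == "package":
            self._in_package = True
        elif name == "location" and self._in_package:
            href = attrs.get("href")
            if href is not None:
                self.locations.append(href)

    def endElement(self, name):
        if name == "package":
            # Nothing else about the package is retained after this point.
            self._in_package = False


def package_locations(xml_bytes: bytes) -> List[str]:
    handler = LocationHandler()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.locations
```

The only state that grows with the size of the repo is the list of hrefs, and a variant that processed each href as soon as it was seen would not need even that.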
Rohan McGovern
af99a34e39 Fix 'error: Unused "type: ignore" comment' with mypy 0.981
2022-10-05 10:38:32 +10:00
Rohan McGovern
293f5887b7 Implement error handling
Ultimately, all errors are propagated in some way, but it's important to
differentiate between "the content was invalid" vs "failed to fetch the
content".
2022-08-09 08:51:06 +10:00
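The distinction drawn in this commit can be sketched with two separate exception types. `FetchFailedError`, `InvalidContentError` and `load_repomd` are hypothetical names chosen for this example, and the validity check is deliberately crude.

```python
from typing import Callable, Optional


class FetchFailedError(Exception):
    """Hypothetical: the content could not be retrieved at all."""


class InvalidContentError(Exception):
    """Hypothetical: the content was retrieved but is not usable."""


def load_repomd(fetcher: Callable[[str], Optional[str]], url: str) -> Optional[str]:
    """Fetch a repomd document, classifying any failure along the way."""
    try:
        content = fetcher(url)
    except Exception as exc:
        # Whatever the fetcher raised, callers see one well-defined type,
        # with the original exception preserved as the cause.
        raise FetchFailedError(f"failed to fetch {url}") from exc
    if content is None:
        # In this sketch, None means the content does not exist.
        return None
    if "<repomd" not in content:
        raise InvalidContentError(f"{url} does not look like a repomd document")
    return content
```

A caller can then decide, for example, to retry on `FetchFailedError` while treating `InvalidContentError` as permanent.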
Rohan McGovern
787ba01a0e Rearrange sources to keep API separate, add a real test
2022-07-07 13:20:43 +10:00
Renamed from repo_autoindex/yum.py