Directories are generally expected to be listed first in directory
indexes. That was already working for yum and file repos, but wasn't the
case for kickstart repos due to their combination of different types of
content.
This commit applies a consistent sorting so that directories will always
come first, and entries will otherwise be sorted by name, for all repo
types.
The Fetcher type was designed to return a 'str'.
That wasn't a good idea because it implies that every fetched file must
be loaded into memory completely. On certain large yum repos,
decompressed primary XML can be hundreds of MB, and it's not appropriate
to require loading that all into memory at once.
Make it support a file-like object (stream of bytes). Since the SAX
XML parser supports reading from a stream, this makes it possible to
avoid loading everything into memory at once.
A test of repo-autoindex CLI against
/content/dist/rhel/server/7/7Server/x86_64/os showed major
improvement:
- before: ~1200MiB
- after: ~80MiB
Note that achieving the full improvement requires any downstream users
of the library (e.g. exodus-gw) to update their Fetcher implementation
as well, to stop returning a 'str'.
AppStream kickstart repos were missing from the initial collection
of repos used to test the kickstart repo index functionality. AppStream
repos uniquely do not contain "checksums" sections in their treeinfo
files. So, when attempting to run repo-autoindex against an AppStream
kickstart repo, "KeyError: 'checksums'" was raised.
Now, when encountering an AppStream kickstart repo, repo-autoindex
does not attempt to parse the "checksums" section.
Due to the presence of a "repodata/repomd.xml" path in a kickstart
repo, repo-autoindex previously interpreted kickstart repos as yum
repos. As such, a kickstart repo's index would solely consist of two
directories: "Packages" and "repodata".
While a kickstart repo does contain a yum repo, kickstart repos also
contain two additional repo entry points: treeinfo and extra_files.json.
Each entry point references additional files that should be included
in a kickstart repo's index. These files were previously ignored.
Now, when repo-autoindex encounters a kickstart repo, repo-autoindex
produces a repo index that reflects the content referenced in all
three repo entry points (repomd.xml, treeinfo, extra_files.json).
Ultimately, all errors are propagated in some way, but it's important to
differentiate between "the content was invalid" vs "failed to fetch the
content".