require libdeflate and unbundle it

kdelibs requires it too

Signed-off-by: Ivailo Monev <xakepa10@gmail.com>
Ivailo Monev 2023-05-29 18:39:48 +03:00
parent b7663fe761
commit b5c8de2b7e
45 changed files with 5 additions and 14020 deletions


@@ -116,9 +116,6 @@ set(KATIE_PKGCONFIG_PATH "${KATIE_LIBRARIES_PATH}/pkgconfig" CACHE PATH "pkg-con
set(KATIE_TOOLS_SUFFIX "" CACHE STRING "Tools (moc, uic, etc.) suffix")
# bundled packages
option(WITH_DEFLATE "Build with external libdeflate" ON)
add_feature_info(deflate WITH_DEFLATE "build with external libdeflate")
option(WITH_XXHASH "Build with external xxHash" OFF)
add_feature_info(xxhash WITH_XXHASH "build with external xxHash")
@@ -204,7 +201,7 @@ set_package_properties(Deflate PROPERTIES
PURPOSE "Required for compression and decompression support"
DESCRIPTION "Heavily optimized library for DEFLATE/zlib/gzip compression and decompression"
URL "https://github.com/ebiggers/libdeflate"
TYPE RECOMMENDED
TYPE REQUIRED
)
find_package(xxHash)


@@ -1,21 +0,0 @@
Copyright 2016 Eric Biggers
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,2 +0,0 @@
This is Git checkout 18d6cc22b75643ec52111efeb27a22b9d860a982
from https://github.com/ebiggers/libdeflate that has not been modified.


@@ -1,210 +0,0 @@
# Overview
libdeflate is a library for fast, whole-buffer DEFLATE-based compression and
decompression.
The supported formats are:
- DEFLATE (raw)
- zlib (a.k.a. DEFLATE with a zlib wrapper)
- gzip (a.k.a. DEFLATE with a gzip wrapper)
libdeflate is heavily optimized. It is significantly faster than the zlib
library, both for compression and decompression, and especially on x86
processors. In addition, libdeflate provides optional high compression modes
that provide a better compression ratio than zlib's "level 9".
libdeflate itself is a library. The following command-line programs which use
this library are also included:
* `libdeflate-gzip`, a program which can be a drop-in replacement for standard
`gzip` under some circumstances. Note that `libdeflate-gzip` has some
limitations; it is provided for convenience and is **not** meant to be the
main use case of libdeflate. It needs a lot of memory to process large files,
and it omits support for some infrequently-used options of GNU gzip.
* `benchmark`, a test program that does round-trip compression and decompression
of the provided data, and measures the compression and decompression speed.
It can use libdeflate, zlib, or a combination of the two.
* `checksum`, a test program that checksums the provided data with Adler-32 or
CRC-32, and optionally measures the speed. It can use libdeflate or zlib.
For the release notes, see the [NEWS file](NEWS.md).
## Table of Contents
- [Building](#building)
- [Using CMake](#using-cmake)
- [Directly integrating the library sources](#directly-integrating-the-library-sources)
- [API](#api)
- [Bindings for other programming languages](#bindings-for-other-programming-languages)
- [DEFLATE vs. zlib vs. gzip](#deflate-vs-zlib-vs-gzip)
- [Compression levels](#compression-levels)
- [Motivation](#motivation)
- [License](#license)
# Building
## Using CMake
libdeflate uses [CMake](https://cmake.org/). It can be built just like any
other CMake project, e.g. with:
cmake -B build && cmake --build build
By default the following targets are built:
- The static library (normally called `libdeflate.a`)
- The shared library (normally called `libdeflate.so`)
- The `libdeflate-gzip` program, including its alias `libdeflate-gunzip`
Besides the standard CMake build and installation options, there are some
libdeflate-specific build options. See `CMakeLists.txt` for the list of these
options. To set an option, add `-DOPTION=VALUE` to the `cmake` command.
Prebuilt Windows binaries can be downloaded from
https://github.com/ebiggers/libdeflate/releases.
## Directly integrating the library sources
Although the official build system is CMake, care has been taken to keep the
library source files compilable directly, without a prerequisite configuration
step. Therefore, it is also fine to just add the library source files directly
to your application, without using CMake.
You should compile both `lib/*.c` and `lib/*/*.c`. You don't need to worry
about excluding irrelevant architecture-specific code, as this is already
handled in the source files themselves using `#ifdef`s.
It is strongly recommended to use either gcc or clang, and to use `-O2`.
If you are doing a freestanding build with `-ffreestanding`, you must add
`-DFREESTANDING` as well (matching what the `CMakeLists.txt` does).
# API
libdeflate has a simple API that is not zlib-compatible. You can create
compressors and decompressors and use them to compress or decompress buffers.
See libdeflate.h for details.
There is currently no support for streaming. This has been considered, but it
always significantly increases complexity and slows down fast paths.
Unfortunately, at this point it remains a future TODO. So: if your application
compresses data in "chunks", say, less than 1 MB in size, then libdeflate is a
great choice for you; that's what it's designed to do. This is perfect for
certain use cases such as transparent filesystem compression. But if your
application compresses large files as a single compressed stream, similarly to
the `gzip` program, then libdeflate isn't for you.
Note that with chunk-based compression, you generally should have the
uncompressed size of each chunk stored outside of the compressed data itself.
This enables you to allocate an output buffer of the correct size without
guessing. However, libdeflate's decompression routines do optionally provide
the actual number of output bytes in case you need it.
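For orientation, here is a minimal sketch of a whole-buffer zlib round trip with this API (function names as declared in libdeflate.h; error handling and allocation checks kept deliberately thin):

/* Sketch: compress a buffer with the zlib wrapper, then decompress it.
 * The uncompressed size is known here; in a real chunk-based design it
 * would be stored next to the compressed data, as discussed above. */
#include <libdeflate.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	static const char in[] = "hello, hello, hello, hello!";
	const size_t in_nbytes = sizeof(in);

	struct libdeflate_compressor *c = libdeflate_alloc_compressor(6);
	size_t bound = libdeflate_zlib_compress_bound(c, in_nbytes);
	void *cbuf = malloc(bound);
	size_t csize = libdeflate_zlib_compress(c, in, in_nbytes, cbuf, bound);

	struct libdeflate_decompressor *d = libdeflate_alloc_decompressor();
	char *out = malloc(in_nbytes);
	size_t actual = 0;
	enum libdeflate_result res =
		libdeflate_zlib_decompress(d, cbuf, csize, out, in_nbytes, &actual);

	printf("%zu -> %zu -> %zu bytes, %s\n", in_nbytes, csize, actual,
	       res == LIBDEFLATE_SUCCESS ? "ok" : "error");

	libdeflate_free_compressor(c);
	libdeflate_free_decompressor(d);
	free(cbuf);
	free(out);
	return res != LIBDEFLATE_SUCCESS;
}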
Windows developers: note that the calling convention of libdeflate.dll is
"cdecl". (libdeflate v1.4 through v1.12 used "stdcall" instead.)
# Bindings for other programming languages
The libdeflate project itself only provides a C library. If you need to use
libdeflate from a programming language other than C or C++, consider using the
following bindings:
* C#: [LibDeflate.NET](https://github.com/jzebedee/LibDeflate.NET)
* Go: [go-libdeflate](https://github.com/4kills/go-libdeflate)
* Java: [libdeflate-java](https://github.com/astei/libdeflate-java)
* Julia: [LibDeflate.jl](https://github.com/jakobnissen/LibDeflate.jl)
* Nim: [libdeflate-nim](https://github.com/gemesa/libdeflate-nim)
* Perl: [Gzip::Libdeflate](https://github.com/benkasminbullock/gzip-libdeflate)
* Python: [deflate](https://github.com/dcwatson/deflate)
* Ruby: [libdeflate-ruby](https://github.com/kaorimatz/libdeflate-ruby)
* Rust: [libdeflater](https://github.com/adamkewley/libdeflater)
Note: these are third-party projects which haven't necessarily been vetted by
the authors of libdeflate. Please direct all questions, bugs, and improvements
for these bindings to their authors.
Also, unfortunately many of these bindings bundle or pin an old version of
libdeflate. To avoid known issues in old versions and to improve performance,
before using any of these bindings please ensure that the bundled or pinned
version of libdeflate has been upgraded to the latest release.
# DEFLATE vs. zlib vs. gzip
The DEFLATE format ([rfc1951](https://www.ietf.org/rfc/rfc1951.txt)), the zlib
format ([rfc1950](https://www.ietf.org/rfc/rfc1950.txt)), and the gzip format
([rfc1952](https://www.ietf.org/rfc/rfc1952.txt)) are commonly confused with
each other as well as with the [zlib software library](http://zlib.net), which
actually supports all three formats. libdeflate (this library) also supports
all three formats.
Briefly, DEFLATE is a raw compressed stream, whereas zlib and gzip are different
wrappers for this stream. Both zlib and gzip include checksums, but gzip can
include extra information such as the original filename. Generally, you should
choose a format as follows:
- If you are compressing whole files with no subdivisions, similar to the `gzip`
program, you probably should use the gzip format.
- Otherwise, if you don't need the features of the gzip header and footer but do
still want a checksum for corruption detection, you probably should use the
zlib format.
- Otherwise, you probably should use raw DEFLATE. This is ideal if you don't
need checksums, e.g. because they're simply not needed for your use case or
because you already compute your own checksums that are stored separately from
the compressed stream.
Note that gzip and zlib streams can be distinguished from each other based on
their starting bytes, but this is not necessarily true of raw DEFLATE streams.
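To make that last point concrete, here is a small sketch of classifying a buffer by its leading bytes (the enum and function names are illustrative, not part of libdeflate): gzip (RFC 1952) begins with the magic bytes 0x1F 0x8B, and a zlib header (RFC 1950) has compression method 8 in the low nibble of its first byte with the first two bytes, read big-endian, divisible by 31. Raw DEFLATE has no comparable signature.

#include <stddef.h>

enum wrapper_format { FORMAT_GZIP, FORMAT_ZLIB, FORMAT_UNKNOWN };

/* Classify a compressed buffer by its header bytes.  FORMAT_UNKNOWN may
 * simply mean raw DEFLATE, which carries no distinguishing signature. */
static enum wrapper_format sniff_wrapper(const unsigned char *p, size_t n)
{
	if (n >= 2 && p[0] == 0x1F && p[1] == 0x8B)
		return FORMAT_GZIP;
	if (n >= 2 && (p[0] & 0x0F) == 8 && ((p[0] << 8) | p[1]) % 31 == 0)
		return FORMAT_ZLIB;
	return FORMAT_UNKNOWN;
}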
# Compression levels
An often-underappreciated fact of compression formats such as DEFLATE is that
there are an enormous number of different ways that a given input could be
compressed. Different algorithms and different amounts of computation time will
result in different compression ratios, while remaining equally compatible with
the decompressor.
For this reason, the commonly used zlib library provides nine compression
levels. Level 1 is the fastest but provides the worst compression; level 9
provides the best compression but is the slowest. zlib defaults to level 6.
libdeflate uses this same design but is designed to improve on both zlib's
performance *and* compression ratio at every compression level. In addition,
libdeflate's levels go [up to 12](https://xkcd.com/670/) to make room for a
minimum-cost-path based algorithm (sometimes called "optimal parsing") that can
significantly improve on zlib's compression ratio.
If you are using DEFLATE (or zlib, or gzip) in your application, you should test
different levels to see which works best for your application.
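As a starting point for that testing, a sketch along these lines measures the compressed size at every level using libdeflate's own API (timing is omitted; the helper name is illustrative):

#include <libdeflate.h>
#include <stdio.h>
#include <stdlib.h>

/* Compress the same buffer at each libdeflate level and report sizes. */
static void compare_levels(const void *in, size_t in_nbytes)
{
	for (int level = 1; level <= 12; level++) {
		struct libdeflate_compressor *c = libdeflate_alloc_compressor(level);
		size_t bound = libdeflate_deflate_compress_bound(c, in_nbytes);
		void *out = malloc(bound);
		size_t out_nbytes = libdeflate_deflate_compress(c, in, in_nbytes,
								out, bound);
		printf("level %2d: %zu -> %zu bytes\n", level, in_nbytes, out_nbytes);
		free(out);
		libdeflate_free_compressor(c);
	}
}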
# Motivation
Despite DEFLATE's widespread use mainly through the zlib library, in the
compression community this format from the early 1990s is often considered
obsolete. And in a few significant ways, it is.
So why implement DEFLATE at all, instead of focusing entirely on
bzip2/LZMA/xz/LZ4/LZX/ZSTD/Brotli/LZHAM/LZFSE/[insert cool new format here]?
To do something better, you need to understand what came before. And it turns
out that most ideas from DEFLATE are still relevant. Many of the newer formats
share a structure similar to DEFLATE's, with different tweaks. The effects of
trivial but very useful tweaks, such as increasing the sliding window size, are
often confused with the effects of nontrivial but less useful tweaks. And
actually, many of these formats are similar enough that common algorithms and
optimizations (e.g. those dealing with LZ77 matchfinding) can be reused.
In addition, comparing compressors fairly is difficult because the performance
of a compressor depends heavily on optimizations which are not intrinsic to the
compression format itself. In this respect, the zlib library sometimes compares
poorly to certain newer code because zlib is not well optimized for modern
processors. libdeflate addresses this by providing an optimized DEFLATE
implementation which can be used for benchmarking purposes. And, of course,
real applications can use it as well.
# License
libdeflate is [MIT-licensed](COPYING).
I am not aware of any patents or patent applications relevant to libdeflate.


@@ -1,718 +0,0 @@
/*
* common_defs.h
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef COMMON_DEFS_H
#define COMMON_DEFS_H
#include "libdeflate.h"
#include <stdbool.h>
#include <stddef.h> /* for size_t */
#include <stdint.h>
#ifdef _MSC_VER
# include <intrin.h> /* for _BitScan*() and other intrinsics */
# include <stdlib.h> /* for _byteswap_*() */
/* Disable MSVC warnings that are expected. */
/* /W2 */
# pragma warning(disable : 4146) /* unary minus on unsigned type */
/* /W3 */
# pragma warning(disable : 4018) /* signed/unsigned mismatch */
# pragma warning(disable : 4244) /* possible loss of data */
# pragma warning(disable : 4267) /* possible loss of precision */
# pragma warning(disable : 4310) /* cast truncates constant value */
/* /W4 */
# pragma warning(disable : 4100) /* unreferenced formal parameter */
# pragma warning(disable : 4127) /* conditional expression is constant */
# pragma warning(disable : 4189) /* local variable initialized but not referenced */
# pragma warning(disable : 4232) /* nonstandard extension used */
# pragma warning(disable : 4245) /* conversion from 'int' to 'unsigned int' */
# pragma warning(disable : 4295) /* array too small to include terminating null */
#endif
#ifndef FREESTANDING
# include <string.h> /* for memcpy() */
#endif
/* ========================================================================== */
/* Target architecture */
/* ========================================================================== */
/* If possible, define a compiler-independent ARCH_* macro. */
#undef ARCH_X86_64
#undef ARCH_X86_32
#undef ARCH_ARM64
#undef ARCH_ARM32
#ifdef _MSC_VER
# if defined(_M_X64)
# define ARCH_X86_64
# elif defined(_M_IX86)
# define ARCH_X86_32
# elif defined(_M_ARM64)
# define ARCH_ARM64
# elif defined(_M_ARM)
# define ARCH_ARM32
# endif
#else
# if defined(__x86_64__)
# define ARCH_X86_64
# elif defined(__i386__)
# define ARCH_X86_32
# elif defined(__aarch64__)
# define ARCH_ARM64
# elif defined(__arm__)
# define ARCH_ARM32
# endif
#endif
/* ========================================================================== */
/* Type definitions */
/* ========================================================================== */
/* Fixed-width integer types */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
typedef int64_t s64;
/* ssize_t, if not available in <sys/types.h> */
#ifdef _MSC_VER
# ifdef _WIN64
typedef long long ssize_t;
# else
typedef long ssize_t;
# endif
#endif
/*
* Word type of the target architecture. Use 'size_t' instead of
* 'unsigned long' to account for platforms such as Windows that use 32-bit
* 'unsigned long' on 64-bit architectures.
*/
typedef size_t machine_word_t;
/* Number of bytes in a word */
#define WORDBYTES ((int)sizeof(machine_word_t))
/* Number of bits in a word */
#define WORDBITS (8 * WORDBYTES)
/* ========================================================================== */
/* Optional compiler features */
/* ========================================================================== */
/* Compiler version checks. Only use when absolutely necessary. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
# define GCC_PREREQ(major, minor) \
(__GNUC__ > (major) || \
(__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
# define GCC_PREREQ(major, minor) 0
#endif
#ifdef __clang__
# ifdef __apple_build_version__
# define CLANG_PREREQ(major, minor, apple_version) \
(__apple_build_version__ >= (apple_version))
# else
# define CLANG_PREREQ(major, minor, apple_version) \
(__clang_major__ > (major) || \
(__clang_major__ == (major) && __clang_minor__ >= (minor)))
# endif
#else
# define CLANG_PREREQ(major, minor, apple_version) 0
#endif
/*
* Macros to check for compiler support for attributes and builtins. clang
* implements these macros, but gcc doesn't, so generally any use of one of
* these macros must also be combined with a gcc version check.
*/
#ifndef __has_attribute
# define __has_attribute(attribute) 0
#endif
#ifndef __has_builtin
# define __has_builtin(builtin) 0
#endif
/* inline - suggest that a function be inlined */
#ifdef _MSC_VER
# define inline __inline
#endif /* else assume 'inline' is usable as-is */
/* forceinline - force a function to be inlined, if possible */
#if defined(__GNUC__) || __has_attribute(always_inline)
# define forceinline inline __attribute__((always_inline))
#elif defined(_MSC_VER)
# define forceinline __forceinline
#else
# define forceinline inline
#endif
/* MAYBE_UNUSED - mark a function or variable as maybe unused */
#if defined(__GNUC__) || __has_attribute(unused)
# define MAYBE_UNUSED __attribute__((unused))
#else
# define MAYBE_UNUSED
#endif
/*
* restrict - hint that writes only occur through the given pointer.
*
* Don't use MSVC's __restrict, since it has nonstandard behavior.
* Standard restrict is okay, if it is supported.
*/
#if !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 201112L)
# if defined(__GNUC__) || defined(__clang__)
# define restrict __restrict__
# else
# define restrict
# endif
#endif /* else assume 'restrict' is usable as-is */
/* likely(expr) - hint that an expression is usually true */
#if defined(__GNUC__) || __has_builtin(__builtin_expect)
# define likely(expr) __builtin_expect(!!(expr), 1)
#else
# define likely(expr) (expr)
#endif
/* unlikely(expr) - hint that an expression is usually false */
#if defined(__GNUC__) || __has_builtin(__builtin_expect)
# define unlikely(expr) __builtin_expect(!!(expr), 0)
#else
# define unlikely(expr) (expr)
#endif
/* prefetchr(addr) - prefetch into L1 cache for read */
#undef prefetchr
#if defined(__GNUC__) || __has_builtin(__builtin_prefetch)
# define prefetchr(addr) __builtin_prefetch((addr), 0)
#elif defined(_MSC_VER)
# if defined(ARCH_X86_32) || defined(ARCH_X86_64)
# define prefetchr(addr) _mm_prefetch((addr), _MM_HINT_T0)
# elif defined(ARCH_ARM64)
# define prefetchr(addr) __prefetch2((addr), 0x00 /* prfop=PLDL1KEEP */)
# elif defined(ARCH_ARM32)
# define prefetchr(addr) __prefetch(addr)
# endif
#endif
#ifndef prefetchr
# define prefetchr(addr)
#endif
/* prefetchw(addr) - prefetch into L1 cache for write */
#undef prefetchw
#if defined(__GNUC__) || __has_builtin(__builtin_prefetch)
# define prefetchw(addr) __builtin_prefetch((addr), 1)
#elif defined(_MSC_VER)
# if defined(ARCH_X86_32) || defined(ARCH_X86_64)
# define prefetchw(addr) _m_prefetchw(addr)
# elif defined(ARCH_ARM64)
# define prefetchw(addr) __prefetch2((addr), 0x10 /* prfop=PSTL1KEEP */)
# elif defined(ARCH_ARM32)
# define prefetchw(addr) __prefetchw(addr)
# endif
#endif
#ifndef prefetchw
# define prefetchw(addr)
#endif
/*
* _aligned_attribute(n) - declare that the annotated variable, or variables of
* the annotated type, must be aligned on n-byte boundaries.
*/
#undef _aligned_attribute
#if defined(__GNUC__) || __has_attribute(aligned)
# define _aligned_attribute(n) __attribute__((aligned(n)))
#elif defined(_MSC_VER)
# define _aligned_attribute(n) __declspec(align(n))
#endif
/*
* _target_attribute(attrs) - override the compilation target for a function.
*
* This accepts one or more comma-separated suffixes to the -m prefix jointly
* forming the name of a machine-dependent option. On gcc-like compilers, this
* enables codegen for the given targets, including arbitrary compiler-generated
* code as well as the corresponding intrinsics. On other compilers this macro
* expands to nothing, though MSVC allows intrinsics to be used anywhere anyway.
*/
#if GCC_PREREQ(4, 4) || __has_attribute(target)
# define _target_attribute(attrs) __attribute__((target(attrs)))
# define COMPILER_SUPPORTS_TARGET_FUNCTION_ATTRIBUTE 1
#else
# define _target_attribute(attrs)
# define COMPILER_SUPPORTS_TARGET_FUNCTION_ATTRIBUTE 0
#endif
/* ========================================================================== */
/* Miscellaneous macros */
/* ========================================================================== */
#define ARRAY_LEN(A) (sizeof(A) / sizeof((A)[0]))
#define MIN(a, b) ((a) <= (b) ? (a) : (b))
#define MAX(a, b) ((a) >= (b) ? (a) : (b))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define STATIC_ASSERT(expr) ((void)sizeof(char[1 - 2 * !(expr)]))
#define ALIGN(n, a) (((n) + (a) - 1) & ~((a) - 1))
#define ROUND_UP(n, d) ((d) * DIV_ROUND_UP((n), (d)))
/* ========================================================================== */
/* Endianness handling */
/* ========================================================================== */
/*
* CPU_IS_LITTLE_ENDIAN() - 1 if the CPU is little endian, or 0 if it is big
* endian. When possible this is a compile-time macro that can be used in
* preprocessor conditionals. As a fallback, a generic method is used that
* can't be used in preprocessor conditionals but should still be optimized out.
*/
#if defined(__BYTE_ORDER__) /* gcc v4.6+ and clang */
# define CPU_IS_LITTLE_ENDIAN() (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#elif defined(_MSC_VER)
# define CPU_IS_LITTLE_ENDIAN() true
#else
static forceinline bool CPU_IS_LITTLE_ENDIAN(void)
{
union {
u32 w;
u8 b;
} u;
u.w = 1;
return u.b;
}
#endif
/* bswap16(v) - swap the bytes of a 16-bit integer */
static forceinline u16 bswap16(u16 v)
{
#if GCC_PREREQ(4, 8) || __has_builtin(__builtin_bswap16)
return __builtin_bswap16(v);
#elif defined(_MSC_VER)
return _byteswap_ushort(v);
#else
return (v << 8) | (v >> 8);
#endif
}
/* bswap32(v) - swap the bytes of a 32-bit integer */
static forceinline u32 bswap32(u32 v)
{
#if GCC_PREREQ(4, 3) || __has_builtin(__builtin_bswap32)
return __builtin_bswap32(v);
#elif defined(_MSC_VER)
return _byteswap_ulong(v);
#else
return ((v & 0x000000FF) << 24) |
((v & 0x0000FF00) << 8) |
((v & 0x00FF0000) >> 8) |
((v & 0xFF000000) >> 24);
#endif
}
/* bswap64(v) - swap the bytes of a 64-bit integer */
static forceinline u64 bswap64(u64 v)
{
#if GCC_PREREQ(4, 3) || __has_builtin(__builtin_bswap64)
return __builtin_bswap64(v);
#elif defined(_MSC_VER)
return _byteswap_uint64(v);
#else
return ((v & 0x00000000000000FF) << 56) |
((v & 0x000000000000FF00) << 40) |
((v & 0x0000000000FF0000) << 24) |
((v & 0x00000000FF000000) << 8) |
((v & 0x000000FF00000000) >> 8) |
((v & 0x0000FF0000000000) >> 24) |
((v & 0x00FF000000000000) >> 40) |
((v & 0xFF00000000000000) >> 56);
#endif
}
#define le16_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? (v) : bswap16(v))
#define le32_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? (v) : bswap32(v))
#define le64_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? (v) : bswap64(v))
#define be16_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? bswap16(v) : (v))
#define be32_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? bswap32(v) : (v))
#define be64_bswap(v) (CPU_IS_LITTLE_ENDIAN() ? bswap64(v) : (v))
/* ========================================================================== */
/* Unaligned memory accesses */
/* ========================================================================== */
/*
* UNALIGNED_ACCESS_IS_FAST() - 1 if unaligned memory accesses can be performed
* efficiently on the target platform, otherwise 0.
*/
#if (defined(__GNUC__) || defined(__clang__)) && \
(defined(ARCH_X86_64) || defined(ARCH_X86_32) || \
defined(__ARM_FEATURE_UNALIGNED) || defined(__powerpc64__) || \
/*
* For all compilation purposes, WebAssembly behaves like any other CPU
* instruction set. Even though WebAssembly engine might be running on
* top of different actual CPU architectures, the WebAssembly spec
* itself permits unaligned access and it will be fast on most of those
* platforms, and simulated at the engine level on others, so it's
* worth treating it as a CPU architecture with fast unaligned access.
*/ defined(__wasm__))
# define UNALIGNED_ACCESS_IS_FAST 1
#elif defined(_MSC_VER)
# define UNALIGNED_ACCESS_IS_FAST 1
#else
# define UNALIGNED_ACCESS_IS_FAST 0
#endif
/*
* Implementing unaligned memory accesses using memcpy() is portable, and it
* usually gets optimized appropriately by modern compilers. I.e., each
* memcpy() of 1, 2, 4, or WORDBYTES bytes gets compiled to a load or store
* instruction, not to an actual function call.
*
* We no longer use the "packed struct" approach to unaligned accesses, as that
* is nonstandard, has unclear semantics, and doesn't receive enough testing
* (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).
*
* arm32 with __ARM_FEATURE_UNALIGNED in gcc 5 and earlier is a known exception
* where memcpy() generates inefficient code
* (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366). However, we no longer
* consider that one case important enough to maintain different code for.
* If you run into it, please just use a newer version of gcc (or use clang).
*/
#ifdef FREESTANDING
# define MEMCOPY __builtin_memcpy
#else
# define MEMCOPY memcpy
#endif
/* Unaligned loads and stores without endianness conversion */
#define DEFINE_UNALIGNED_TYPE(type) \
static forceinline type \
load_##type##_unaligned(const void *p) \
{ \
type v; \
\
MEMCOPY(&v, p, sizeof(v)); \
return v; \
} \
\
static forceinline void \
store_##type##_unaligned(type v, void *p) \
{ \
MEMCOPY(p, &v, sizeof(v)); \
}
DEFINE_UNALIGNED_TYPE(u16)
DEFINE_UNALIGNED_TYPE(u32)
DEFINE_UNALIGNED_TYPE(u64)
DEFINE_UNALIGNED_TYPE(machine_word_t)
#undef MEMCOPY
#define load_word_unaligned load_machine_word_t_unaligned
#define store_word_unaligned store_machine_word_t_unaligned
/* Unaligned loads with endianness conversion */
static forceinline u16
get_unaligned_le16(const u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST)
return le16_bswap(load_u16_unaligned(p));
else
return ((u16)p[1] << 8) | p[0];
}
static forceinline u16
get_unaligned_be16(const u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST)
return be16_bswap(load_u16_unaligned(p));
else
return ((u16)p[0] << 8) | p[1];
}
static forceinline u32
get_unaligned_le32(const u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST)
return le32_bswap(load_u32_unaligned(p));
else
return ((u32)p[3] << 24) | ((u32)p[2] << 16) |
((u32)p[1] << 8) | p[0];
}
static forceinline u32
get_unaligned_be32(const u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST)
return be32_bswap(load_u32_unaligned(p));
else
return ((u32)p[0] << 24) | ((u32)p[1] << 16) |
((u32)p[2] << 8) | p[3];
}
static forceinline u64
get_unaligned_le64(const u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST)
return le64_bswap(load_u64_unaligned(p));
else
return ((u64)p[7] << 56) | ((u64)p[6] << 48) |
((u64)p[5] << 40) | ((u64)p[4] << 32) |
((u64)p[3] << 24) | ((u64)p[2] << 16) |
((u64)p[1] << 8) | p[0];
}
static forceinline machine_word_t
get_unaligned_leword(const u8 *p)
{
STATIC_ASSERT(WORDBITS == 32 || WORDBITS == 64);
if (WORDBITS == 32)
return get_unaligned_le32(p);
else
return get_unaligned_le64(p);
}
/* Unaligned stores with endianness conversion */
static forceinline void
put_unaligned_le16(u16 v, u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST) {
store_u16_unaligned(le16_bswap(v), p);
} else {
p[0] = (u8)(v >> 0);
p[1] = (u8)(v >> 8);
}
}
static forceinline void
put_unaligned_be16(u16 v, u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST) {
store_u16_unaligned(be16_bswap(v), p);
} else {
p[0] = (u8)(v >> 8);
p[1] = (u8)(v >> 0);
}
}
static forceinline void
put_unaligned_le32(u32 v, u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST) {
store_u32_unaligned(le32_bswap(v), p);
} else {
p[0] = (u8)(v >> 0);
p[1] = (u8)(v >> 8);
p[2] = (u8)(v >> 16);
p[3] = (u8)(v >> 24);
}
}
static forceinline void
put_unaligned_be32(u32 v, u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST) {
store_u32_unaligned(be32_bswap(v), p);
} else {
p[0] = (u8)(v >> 24);
p[1] = (u8)(v >> 16);
p[2] = (u8)(v >> 8);
p[3] = (u8)(v >> 0);
}
}
static forceinline void
put_unaligned_le64(u64 v, u8 *p)
{
if (UNALIGNED_ACCESS_IS_FAST) {
store_u64_unaligned(le64_bswap(v), p);
} else {
p[0] = (u8)(v >> 0);
p[1] = (u8)(v >> 8);
p[2] = (u8)(v >> 16);
p[3] = (u8)(v >> 24);
p[4] = (u8)(v >> 32);
p[5] = (u8)(v >> 40);
p[6] = (u8)(v >> 48);
p[7] = (u8)(v >> 56);
}
}
static forceinline void
put_unaligned_leword(machine_word_t v, u8 *p)
{
STATIC_ASSERT(WORDBITS == 32 || WORDBITS == 64);
if (WORDBITS == 32)
put_unaligned_le32(v, p);
else
put_unaligned_le64(v, p);
}
/* ========================================================================== */
/* Bit manipulation functions */
/* ========================================================================== */
/*
* Bit Scan Reverse (BSR) - find the 0-based index (relative to the least
* significant end) of the *most* significant 1 bit in the input value. The
* input value must be nonzero!
*/
static forceinline unsigned
bsr32(u32 v)
{
#if defined(__GNUC__) || __has_builtin(__builtin_clz)
return 31 - __builtin_clz(v);
#elif defined(_MSC_VER)
unsigned long i;
_BitScanReverse(&i, v);
return i;
#else
unsigned i = 0;
while ((v >>= 1) != 0)
i++;
return i;
#endif
}
static forceinline unsigned
bsr64(u64 v)
{
#if defined(__GNUC__) || __has_builtin(__builtin_clzll)
return 63 - __builtin_clzll(v);
#elif defined(_MSC_VER) && defined(_WIN64)
unsigned long i;
_BitScanReverse64(&i, v);
return i;
#else
unsigned i = 0;
while ((v >>= 1) != 0)
i++;
return i;
#endif
}
static forceinline unsigned
bsrw(machine_word_t v)
{
STATIC_ASSERT(WORDBITS == 32 || WORDBITS == 64);
if (WORDBITS == 32)
return bsr32(v);
else
return bsr64(v);
}
/*
* Bit Scan Forward (BSF) - find the 0-based index (relative to the least
* significant end) of the *least* significant 1 bit in the input value. The
* input value must be nonzero!
*/
static forceinline unsigned
bsf32(u32 v)
{
#if defined(__GNUC__) || __has_builtin(__builtin_ctz)
return __builtin_ctz(v);
#elif defined(_MSC_VER)
unsigned long i;
_BitScanForward(&i, v);
return i;
#else
unsigned i = 0;
for (; (v & 1) == 0; v >>= 1)
i++;
return i;
#endif
}
static forceinline unsigned
bsf64(u64 v)
{
#if defined(__GNUC__) || __has_builtin(__builtin_ctzll)
return __builtin_ctzll(v);
#elif defined(_MSC_VER) && defined(_WIN64)
unsigned long i;
_BitScanForward64(&i, v);
return i;
#else
unsigned i = 0;
for (; (v & 1) == 0; v >>= 1)
i++;
return i;
#endif
}
static forceinline unsigned
bsfw(machine_word_t v)
{
STATIC_ASSERT(WORDBITS == 32 || WORDBITS == 64);
if (WORDBITS == 32)
return bsf32(v);
else
return bsf64(v);
}
/*
* rbit32(v): reverse the bits in a 32-bit integer. This doesn't have a
* fallback implementation; use '#ifdef rbit32' to check if this is available.
*/
#undef rbit32
#if (defined(__GNUC__) || defined(__clang__)) && defined(ARCH_ARM32) && \
(__ARM_ARCH >= 7 || (__ARM_ARCH == 6 && defined(__ARM_ARCH_6T2__)))
static forceinline u32
rbit32(u32 v)
{
__asm__("rbit %0, %1" : "=r" (v) : "r" (v));
return v;
}
#define rbit32 rbit32
#elif (defined(__GNUC__) || defined(__clang__)) && defined(ARCH_ARM64)
static forceinline u32
rbit32(u32 v)
{
__asm__("rbit %w0, %w1" : "=r" (v) : "r" (v));
return v;
}
#define rbit32 rbit32
#endif
#endif /* COMMON_DEFS_H */


@@ -1,130 +0,0 @@
/*
* adler32.c - Adler-32 checksum algorithm
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "lib_common.h"
/* The Adler-32 divisor, or "base", value */
#define DIVISOR 65521
/*
* MAX_CHUNK_LEN is the most bytes that can be processed without the possibility
* of s2 overflowing when it is represented as an unsigned 32-bit integer. This
* value was computed using the following Python script:
*
* divisor = 65521
* count = 0
* s1 = divisor - 1
* s2 = divisor - 1
* while True:
* s1 += 0xFF
* s2 += s1
* if s2 > 0xFFFFFFFF:
* break
* count += 1
* print(count)
*
* Note that to get the correct worst-case value, we must assume that every byte
* has value 0xFF and that s1 and s2 started with the highest possible values
* modulo the divisor.
*/
#define MAX_CHUNK_LEN 5552
static u32 MAYBE_UNUSED
adler32_generic(u32 adler, const u8 *p, size_t len)
{
u32 s1 = adler & 0xFFFF;
u32 s2 = adler >> 16;
const u8 * const end = p + len;
while (p != end) {
size_t chunk_len = MIN(end - p, MAX_CHUNK_LEN);
const u8 *chunk_end = p + chunk_len;
size_t num_unrolled_iterations = chunk_len / 4;
while (num_unrolled_iterations--) {
s1 += *p++;
s2 += s1;
s1 += *p++;
s2 += s1;
s1 += *p++;
s2 += s1;
s1 += *p++;
s2 += s1;
}
while (p != chunk_end) {
s1 += *p++;
s2 += s1;
}
s1 %= DIVISOR;
s2 %= DIVISOR;
}
return (s2 << 16) | s1;
}
/* Include architecture-specific implementation(s) if available. */
#undef DEFAULT_IMPL
#undef arch_select_adler32_func
typedef u32 (*adler32_func_t)(u32 adler, const u8 *p, size_t len);
#if defined(ARCH_ARM32) || defined(ARCH_ARM64)
# include "arm/adler32_impl.h"
#elif defined(ARCH_X86_32) || defined(ARCH_X86_64)
# include "x86/adler32_impl.h"
#endif
#ifndef DEFAULT_IMPL
# define DEFAULT_IMPL adler32_generic
#endif
#ifdef arch_select_adler32_func
static u32 dispatch_adler32(u32 adler, const u8 *p, size_t len);
static volatile adler32_func_t adler32_impl = dispatch_adler32;
/* Choose the best implementation at runtime. */
static u32 dispatch_adler32(u32 adler, const u8 *p, size_t len)
{
adler32_func_t f = arch_select_adler32_func();
if (f == NULL)
f = DEFAULT_IMPL;
adler32_impl = f;
return f(adler, p, len);
}
#else
/* The best implementation is statically known, so call it directly. */
#define adler32_impl DEFAULT_IMPL
#endif
LIBDEFLATEAPI u32
libdeflate_adler32(u32 adler, const void *buffer, size_t len)
{
if (buffer == NULL) /* Return initial value. */
return 1;
return adler32_impl(adler, buffer, len);
}


@@ -1,123 +0,0 @@
/*
* adler32_vec_template.h - template for vectorized Adler-32 implementations
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* This file contains a template for vectorized Adler-32 implementations.
*
* The inner loop between reductions modulo 65521 of an unvectorized Adler-32
* implementation looks something like this:
*
* do {
* s1 += *p;
* s2 += s1;
* } while (++p != chunk_end);
*
* For vectorized calculation of s1, we only need to sum the input bytes. They
* can be accumulated into multiple counters which are eventually summed
* together.
*
* For vectorized calculation of s2, the basic idea is that for each iteration
* that processes N bytes, we can perform the following vectorizable
* calculation:
*
* s2 += N*byte_1 + (N-1)*byte_2 + (N-2)*byte_3 + ... + 1*byte_N
*
* Or, equivalently, we can sum the byte_1...byte_N for each iteration into N
* separate counters, then do the multiplications by N...1 just once at the end
* rather than once per iteration.
*
* Also, we must account for how previous bytes will affect s2 by doing the
* following at beginning of each iteration:
*
* s2 += s1 * N
*
* Furthermore, like s1, "s2" can actually be multiple counters which are
* eventually summed together.
*/
static u32 ATTRIBUTES MAYBE_UNUSED
FUNCNAME(u32 adler, const u8 *p, size_t len)
{
const size_t max_chunk_len =
MIN(MAX_CHUNK_LEN, IMPL_MAX_CHUNK_LEN) -
(MIN(MAX_CHUNK_LEN, IMPL_MAX_CHUNK_LEN) % IMPL_SEGMENT_LEN);
u32 s1 = adler & 0xFFFF;
u32 s2 = adler >> 16;
const u8 * const end = p + len;
const u8 *vend;
/* Process a byte at a time until the needed alignment is reached. */
if (p != end && (uintptr_t)p % IMPL_ALIGNMENT) {
do {
s1 += *p++;
s2 += s1;
} while (p != end && (uintptr_t)p % IMPL_ALIGNMENT);
s1 %= DIVISOR;
s2 %= DIVISOR;
}
/*
* Process "chunks" of bytes using vector instructions. Chunk lengths
* are limited to MAX_CHUNK_LEN, which guarantees that s1 and s2 never
* overflow before being reduced modulo DIVISOR. For vector processing,
* chunk lengths are also made evenly divisible by IMPL_SEGMENT_LEN and
* may be further limited to IMPL_MAX_CHUNK_LEN.
*/
STATIC_ASSERT(IMPL_SEGMENT_LEN % IMPL_ALIGNMENT == 0);
vend = end - ((size_t)(end - p) % IMPL_SEGMENT_LEN);
while (p != vend) {
size_t chunk_len = MIN((size_t)(vend - p), max_chunk_len);
s2 += s1 * chunk_len;
FUNCNAME_CHUNK((const void *)p, (const void *)(p + chunk_len),
&s1, &s2);
p += chunk_len;
s1 %= DIVISOR;
s2 %= DIVISOR;
}
/* Process any remaining bytes. */
if (p != end) {
do {
s1 += *p++;
s2 += s1;
} while (p != end);
s1 %= DIVISOR;
s2 %= DIVISOR;
}
return (s2 << 16) | s1;
}
#undef FUNCNAME
#undef FUNCNAME_CHUNK
#undef ATTRIBUTES
#undef IMPL_ALIGNMENT
#undef IMPL_SEGMENT_LEN
#undef IMPL_MAX_CHUNK_LEN


@@ -1,272 +0,0 @@
/*
* arm/adler32_impl.h - ARM implementations of Adler-32 checksum algorithm
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_ARM_ADLER32_IMPL_H
#define LIB_ARM_ADLER32_IMPL_H
#include "cpu_features.h"
/* Regular NEON implementation */
#if HAVE_NEON_INTRIN && CPU_IS_LITTLE_ENDIAN()
# define adler32_neon adler32_neon
# define FUNCNAME adler32_neon
# define FUNCNAME_CHUNK adler32_neon_chunk
# define IMPL_ALIGNMENT 16
# define IMPL_SEGMENT_LEN 64
/* Prevent unsigned overflow of the 16-bit precision byte counters */
# define IMPL_MAX_CHUNK_LEN (64 * (0xFFFF / 0xFF))
# if HAVE_NEON_NATIVE
# define ATTRIBUTES
# else
# ifdef ARCH_ARM32
# define ATTRIBUTES _target_attribute("fpu=neon")
# else
# define ATTRIBUTES _target_attribute("+simd")
# endif
# endif
# include <arm_neon.h>
static forceinline ATTRIBUTES void
adler32_neon_chunk(const uint8x16_t *p, const uint8x16_t * const end,
u32 *s1, u32 *s2)
{
static const u16 _aligned_attribute(16) mults[64] = {
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33,
32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17,
16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
};
const uint16x8_t mults_a = vld1q_u16(&mults[0]);
const uint16x8_t mults_b = vld1q_u16(&mults[8]);
const uint16x8_t mults_c = vld1q_u16(&mults[16]);
const uint16x8_t mults_d = vld1q_u16(&mults[24]);
const uint16x8_t mults_e = vld1q_u16(&mults[32]);
const uint16x8_t mults_f = vld1q_u16(&mults[40]);
const uint16x8_t mults_g = vld1q_u16(&mults[48]);
const uint16x8_t mults_h = vld1q_u16(&mults[56]);
uint32x4_t v_s1 = vdupq_n_u32(0);
uint32x4_t v_s2 = vdupq_n_u32(0);
/*
* v_byte_sums_* contain the sum of the bytes at index i across all
* 64-byte segments, for each index 0..63.
*/
uint16x8_t v_byte_sums_a = vdupq_n_u16(0);
uint16x8_t v_byte_sums_b = vdupq_n_u16(0);
uint16x8_t v_byte_sums_c = vdupq_n_u16(0);
uint16x8_t v_byte_sums_d = vdupq_n_u16(0);
uint16x8_t v_byte_sums_e = vdupq_n_u16(0);
uint16x8_t v_byte_sums_f = vdupq_n_u16(0);
uint16x8_t v_byte_sums_g = vdupq_n_u16(0);
uint16x8_t v_byte_sums_h = vdupq_n_u16(0);
do {
/* Load the next 64 bytes. */
const uint8x16_t bytes1 = *p++;
const uint8x16_t bytes2 = *p++;
const uint8x16_t bytes3 = *p++;
const uint8x16_t bytes4 = *p++;
uint16x8_t tmp;
/*
* Accumulate the previous s1 counters into the s2 counters.
* The needed multiplication by 64 is delayed to later.
*/
v_s2 = vaddq_u32(v_s2, v_s1);
/*
* Add the 64 bytes to their corresponding v_byte_sums counters,
* while also accumulating the sums of each adjacent set of 4
* bytes into v_s1.
*/
tmp = vpaddlq_u8(bytes1);
v_byte_sums_a = vaddw_u8(v_byte_sums_a, vget_low_u8(bytes1));
v_byte_sums_b = vaddw_u8(v_byte_sums_b, vget_high_u8(bytes1));
tmp = vpadalq_u8(tmp, bytes2);
v_byte_sums_c = vaddw_u8(v_byte_sums_c, vget_low_u8(bytes2));
v_byte_sums_d = vaddw_u8(v_byte_sums_d, vget_high_u8(bytes2));
tmp = vpadalq_u8(tmp, bytes3);
v_byte_sums_e = vaddw_u8(v_byte_sums_e, vget_low_u8(bytes3));
v_byte_sums_f = vaddw_u8(v_byte_sums_f, vget_high_u8(bytes3));
tmp = vpadalq_u8(tmp, bytes4);
v_byte_sums_g = vaddw_u8(v_byte_sums_g, vget_low_u8(bytes4));
v_byte_sums_h = vaddw_u8(v_byte_sums_h, vget_high_u8(bytes4));
v_s1 = vpadalq_u16(v_s1, tmp);
} while (p != end);
/* s2 = 64*s2 + (64*bytesum0 + 63*bytesum1 + ... + 1*bytesum63) */
#ifdef ARCH_ARM32
# define umlal2(a, b, c) vmlal_u16((a), vget_high_u16(b), vget_high_u16(c))
#else
# define umlal2 vmlal_high_u16
#endif
v_s2 = vqshlq_n_u32(v_s2, 6);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_a), vget_low_u16(mults_a));
v_s2 = umlal2(v_s2, v_byte_sums_a, mults_a);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_b), vget_low_u16(mults_b));
v_s2 = umlal2(v_s2, v_byte_sums_b, mults_b);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_c), vget_low_u16(mults_c));
v_s2 = umlal2(v_s2, v_byte_sums_c, mults_c);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_d), vget_low_u16(mults_d));
v_s2 = umlal2(v_s2, v_byte_sums_d, mults_d);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_e), vget_low_u16(mults_e));
v_s2 = umlal2(v_s2, v_byte_sums_e, mults_e);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_f), vget_low_u16(mults_f));
v_s2 = umlal2(v_s2, v_byte_sums_f, mults_f);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_g), vget_low_u16(mults_g));
v_s2 = umlal2(v_s2, v_byte_sums_g, mults_g);
v_s2 = vmlal_u16(v_s2, vget_low_u16(v_byte_sums_h), vget_low_u16(mults_h));
v_s2 = umlal2(v_s2, v_byte_sums_h, mults_h);
#undef umlal2
/* Horizontal sum to finish up */
#ifdef ARCH_ARM32
*s1 += vgetq_lane_u32(v_s1, 0) + vgetq_lane_u32(v_s1, 1) +
vgetq_lane_u32(v_s1, 2) + vgetq_lane_u32(v_s1, 3);
*s2 += vgetq_lane_u32(v_s2, 0) + vgetq_lane_u32(v_s2, 1) +
vgetq_lane_u32(v_s2, 2) + vgetq_lane_u32(v_s2, 3);
#else
*s1 += vaddvq_u32(v_s1);
*s2 += vaddvq_u32(v_s2);
#endif
}
# include "../adler32_vec_template.h"
#endif /* Regular NEON implementation */
/* NEON+dotprod implementation */
#if HAVE_DOTPROD_INTRIN && CPU_IS_LITTLE_ENDIAN()
# define adler32_neon_dotprod adler32_neon_dotprod
# define FUNCNAME adler32_neon_dotprod
# define FUNCNAME_CHUNK adler32_neon_dotprod_chunk
# define IMPL_ALIGNMENT 16
# define IMPL_SEGMENT_LEN 64
# define IMPL_MAX_CHUNK_LEN MAX_CHUNK_LEN
# if HAVE_DOTPROD_NATIVE
# define ATTRIBUTES
# else
# ifdef __clang__
# define ATTRIBUTES _target_attribute("dotprod")
/*
* With gcc, arch=armv8.2-a is needed for dotprod intrinsics, unless the
* default target is armv8.3-a or later in which case it must be omitted.
* armv8.3-a or later can be detected by checking for __ARM_FEATURE_JCVT.
*/
# elif defined(__ARM_FEATURE_JCVT)
# define ATTRIBUTES _target_attribute("+dotprod")
# else
# define ATTRIBUTES _target_attribute("arch=armv8.2-a+dotprod")
# endif
# endif
# include <arm_neon.h>
static forceinline ATTRIBUTES void
adler32_neon_dotprod_chunk(const uint8x16_t *p, const uint8x16_t * const end,
u32 *s1, u32 *s2)
{
static const u8 _aligned_attribute(16) mults[64] = {
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33,
32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17,
16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
};
const uint8x16_t mults_a = vld1q_u8(&mults[0]);
const uint8x16_t mults_b = vld1q_u8(&mults[16]);
const uint8x16_t mults_c = vld1q_u8(&mults[32]);
const uint8x16_t mults_d = vld1q_u8(&mults[48]);
const uint8x16_t ones = vdupq_n_u8(1);
uint32x4_t v_s1_a = vdupq_n_u32(0);
uint32x4_t v_s1_b = vdupq_n_u32(0);
uint32x4_t v_s1_c = vdupq_n_u32(0);
uint32x4_t v_s1_d = vdupq_n_u32(0);
uint32x4_t v_s2_a = vdupq_n_u32(0);
uint32x4_t v_s2_b = vdupq_n_u32(0);
uint32x4_t v_s2_c = vdupq_n_u32(0);
uint32x4_t v_s2_d = vdupq_n_u32(0);
uint32x4_t v_s1_sums_a = vdupq_n_u32(0);
uint32x4_t v_s1_sums_b = vdupq_n_u32(0);
uint32x4_t v_s1_sums_c = vdupq_n_u32(0);
uint32x4_t v_s1_sums_d = vdupq_n_u32(0);
uint32x4_t v_s1;
uint32x4_t v_s2;
uint32x4_t v_s1_sums;
do {
uint8x16_t bytes_a = *p++;
uint8x16_t bytes_b = *p++;
uint8x16_t bytes_c = *p++;
uint8x16_t bytes_d = *p++;
v_s1_sums_a = vaddq_u32(v_s1_sums_a, v_s1_a);
v_s1_a = vdotq_u32(v_s1_a, bytes_a, ones);
v_s2_a = vdotq_u32(v_s2_a, bytes_a, mults_a);
v_s1_sums_b = vaddq_u32(v_s1_sums_b, v_s1_b);
v_s1_b = vdotq_u32(v_s1_b, bytes_b, ones);
v_s2_b = vdotq_u32(v_s2_b, bytes_b, mults_b);
v_s1_sums_c = vaddq_u32(v_s1_sums_c, v_s1_c);
v_s1_c = vdotq_u32(v_s1_c, bytes_c, ones);
v_s2_c = vdotq_u32(v_s2_c, bytes_c, mults_c);
v_s1_sums_d = vaddq_u32(v_s1_sums_d, v_s1_d);
v_s1_d = vdotq_u32(v_s1_d, bytes_d, ones);
v_s2_d = vdotq_u32(v_s2_d, bytes_d, mults_d);
} while (p != end);
v_s1 = vaddq_u32(vaddq_u32(v_s1_a, v_s1_b), vaddq_u32(v_s1_c, v_s1_d));
v_s2 = vaddq_u32(vaddq_u32(v_s2_a, v_s2_b), vaddq_u32(v_s2_c, v_s2_d));
v_s1_sums = vaddq_u32(vaddq_u32(v_s1_sums_a, v_s1_sums_b),
vaddq_u32(v_s1_sums_c, v_s1_sums_d));
v_s2 = vaddq_u32(v_s2, vqshlq_n_u32(v_s1_sums, 6));
*s1 += vaddvq_u32(v_s1);
*s2 += vaddvq_u32(v_s2);
}
# include "../adler32_vec_template.h"
#endif /* NEON+dotprod implementation */
#if defined(adler32_neon_dotprod) && HAVE_DOTPROD_NATIVE
#define DEFAULT_IMPL adler32_neon_dotprod
#else
static inline adler32_func_t
arch_select_adler32_func(void)
{
const u32 features MAYBE_UNUSED = get_arm_cpu_features();
#ifdef adler32_neon_dotprod
if (HAVE_NEON(features) && HAVE_DOTPROD(features))
return adler32_neon_dotprod;
#endif
#ifdef adler32_neon
if (HAVE_NEON(features))
return adler32_neon;
#endif
return NULL;
}
#define arch_select_adler32_func arch_select_adler32_func
#endif
#endif /* LIB_ARM_ADLER32_IMPL_H */


@@ -1,212 +0,0 @@
/*
* arm/cpu_features.c - feature detection for ARM CPUs
*
* Copyright 2018 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* ARM CPUs don't have a standard way for unprivileged programs to detect CPU
* features. But an OS-specific way can be used when available.
*/
#ifdef __APPLE__
# undef _ANSI_SOURCE
# undef _DARWIN_C_SOURCE
# define _DARWIN_C_SOURCE /* for sysctlbyname() */
#endif
#include "../cpu_features_common.h" /* must be included first */
#include "cpu_features.h"
#if HAVE_DYNAMIC_ARM_CPU_FEATURES
#ifdef __linux__
/*
* On Linux, arm32 and arm64 CPU features can be detected by reading the
* AT_HWCAP and AT_HWCAP2 values from /proc/self/auxv.
*
* Ideally we'd use the C library function getauxval(), but it's not guaranteed
* to be available: it was only added to glibc in 2.16, and in Android it was
* added to API level 18 for arm32 and level 21 for arm64.
*/
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#define AT_HWCAP 16
#define AT_HWCAP2 26
static void scan_auxv(unsigned long *hwcap, unsigned long *hwcap2)
{
int fd;
unsigned long auxbuf[32];
int filled = 0;
int i;
fd = open("/proc/self/auxv", O_RDONLY);
if (fd < 0)
return;
for (;;) {
do {
int ret = read(fd, &((char *)auxbuf)[filled],
sizeof(auxbuf) - filled);
if (ret <= 0) {
if (ret < 0 && errno == EINTR)
continue;
goto out;
}
filled += ret;
} while (filled < 2 * sizeof(long));
i = 0;
do {
unsigned long type = auxbuf[i];
unsigned long value = auxbuf[i + 1];
if (type == AT_HWCAP)
*hwcap = value;
else if (type == AT_HWCAP2)
*hwcap2 = value;
i += 2;
filled -= 2 * sizeof(long);
} while (filled >= 2 * sizeof(long));
memmove(auxbuf, &auxbuf[i], filled);
}
out:
close(fd);
}
static u32 query_arm_cpu_features(void)
{
u32 features = 0;
unsigned long hwcap = 0;
unsigned long hwcap2 = 0;
scan_auxv(&hwcap, &hwcap2);
#ifdef ARCH_ARM32
STATIC_ASSERT(sizeof(long) == 4);
if (hwcap & (1 << 12)) /* HWCAP_NEON */
features |= ARM_CPU_FEATURE_NEON;
if (hwcap2 & (1 << 1)) /* HWCAP2_PMULL */
features |= ARM_CPU_FEATURE_PMULL;
if (hwcap2 & (1 << 4)) /* HWCAP2_CRC32 */
features |= ARM_CPU_FEATURE_CRC32;
#else
STATIC_ASSERT(sizeof(long) == 8);
if (hwcap & (1 << 1)) /* HWCAP_ASIMD */
features |= ARM_CPU_FEATURE_NEON;
if (hwcap & (1 << 4)) /* HWCAP_PMULL */
features |= ARM_CPU_FEATURE_PMULL;
if (hwcap & (1 << 7)) /* HWCAP_CRC32 */
features |= ARM_CPU_FEATURE_CRC32;
if (hwcap & (1 << 17)) /* HWCAP_SHA3 */
features |= ARM_CPU_FEATURE_SHA3;
if (hwcap & (1 << 20)) /* HWCAP_ASIMDDP */
features |= ARM_CPU_FEATURE_DOTPROD;
#endif
return features;
}
#elif defined(__APPLE__)
/* On Apple platforms, arm64 CPU features can be detected via sysctlbyname(). */
#include <sys/types.h>
#include <sys/sysctl.h>
static const struct {
const char *name;
u32 feature;
} feature_sysctls[] = {
{ "hw.optional.neon", ARM_CPU_FEATURE_NEON },
{ "hw.optional.AdvSIMD", ARM_CPU_FEATURE_NEON },
{ "hw.optional.arm.FEAT_PMULL", ARM_CPU_FEATURE_PMULL },
{ "hw.optional.armv8_crc32", ARM_CPU_FEATURE_CRC32 },
{ "hw.optional.armv8_2_sha3", ARM_CPU_FEATURE_SHA3 },
{ "hw.optional.arm.FEAT_SHA3", ARM_CPU_FEATURE_SHA3 },
{ "hw.optional.arm.FEAT_DotProd", ARM_CPU_FEATURE_DOTPROD },
};
static u32 query_arm_cpu_features(void)
{
u32 features = 0;
size_t i;
for (i = 0; i < ARRAY_LEN(feature_sysctls); i++) {
const char *name = feature_sysctls[i].name;
u32 val = 0;
size_t valsize = sizeof(val);
if (sysctlbyname(name, &val, &valsize, NULL, 0) == 0 &&
valsize == sizeof(val) && val == 1)
features |= feature_sysctls[i].feature;
}
return features;
}
#elif defined(_WIN32)
#include <windows.h>
static u32 query_arm_cpu_features(void)
{
u32 features = ARM_CPU_FEATURE_NEON;
if (IsProcessorFeaturePresent(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE))
features |= ARM_CPU_FEATURE_PMULL;
if (IsProcessorFeaturePresent(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE))
features |= ARM_CPU_FEATURE_CRC32;
/* FIXME: detect SHA3 and DOTPROD support too. */
return features;
}
#else
#error "unhandled case"
#endif
static const struct cpu_feature arm_cpu_feature_table[] = {
{ARM_CPU_FEATURE_NEON, "neon"},
{ARM_CPU_FEATURE_PMULL, "pmull"},
{ARM_CPU_FEATURE_CRC32, "crc32"},
{ARM_CPU_FEATURE_SHA3, "sha3"},
{ARM_CPU_FEATURE_DOTPROD, "dotprod"},
};
volatile u32 libdeflate_arm_cpu_features = 0;
void libdeflate_init_arm_cpu_features(void)
{
u32 features = query_arm_cpu_features();
disable_cpu_features_for_testing(&features, arm_cpu_feature_table,
ARRAY_LEN(arm_cpu_feature_table));
libdeflate_arm_cpu_features = features | ARM_CPU_FEATURES_KNOWN;
}
#endif /* HAVE_DYNAMIC_ARM_CPU_FEATURES */


@@ -1,265 +0,0 @@
/*
* arm/cpu_features.h - feature detection for ARM CPUs
*
* Copyright 2018 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_ARM_CPU_FEATURES_H
#define LIB_ARM_CPU_FEATURES_H
#include "../lib_common.h"
#define HAVE_DYNAMIC_ARM_CPU_FEATURES 0
#if defined(ARCH_ARM32) || defined(ARCH_ARM64)
#if !defined(FREESTANDING) && \
(COMPILER_SUPPORTS_TARGET_FUNCTION_ATTRIBUTE || defined(_MSC_VER)) && \
(defined(__linux__) || \
(defined(__APPLE__) && defined(ARCH_ARM64)) || \
(defined(_WIN32) && defined(ARCH_ARM64)))
# undef HAVE_DYNAMIC_ARM_CPU_FEATURES
# define HAVE_DYNAMIC_ARM_CPU_FEATURES 1
#endif
#define ARM_CPU_FEATURE_NEON 0x00000001
#define ARM_CPU_FEATURE_PMULL 0x00000002
#define ARM_CPU_FEATURE_CRC32 0x00000004
#define ARM_CPU_FEATURE_SHA3 0x00000008
#define ARM_CPU_FEATURE_DOTPROD 0x00000010
#define HAVE_NEON(features) (HAVE_NEON_NATIVE || ((features) & ARM_CPU_FEATURE_NEON))
#define HAVE_PMULL(features) (HAVE_PMULL_NATIVE || ((features) & ARM_CPU_FEATURE_PMULL))
#define HAVE_CRC32(features) (HAVE_CRC32_NATIVE || ((features) & ARM_CPU_FEATURE_CRC32))
#define HAVE_SHA3(features) (HAVE_SHA3_NATIVE || ((features) & ARM_CPU_FEATURE_SHA3))
#define HAVE_DOTPROD(features) (HAVE_DOTPROD_NATIVE || ((features) & ARM_CPU_FEATURE_DOTPROD))
#if HAVE_DYNAMIC_ARM_CPU_FEATURES
#define ARM_CPU_FEATURES_KNOWN 0x80000000
extern volatile u32 libdeflate_arm_cpu_features;
void libdeflate_init_arm_cpu_features(void);
static inline u32 get_arm_cpu_features(void)
{
if (libdeflate_arm_cpu_features == 0)
libdeflate_init_arm_cpu_features();
return libdeflate_arm_cpu_features;
}
#else /* HAVE_DYNAMIC_ARM_CPU_FEATURES */
static inline u32 get_arm_cpu_features(void) { return 0; }
#endif /* !HAVE_DYNAMIC_ARM_CPU_FEATURES */
/* NEON */
#if defined(__ARM_NEON) || defined(ARCH_ARM64)
# define HAVE_NEON_NATIVE 1
#else
# define HAVE_NEON_NATIVE 0
#endif
/*
* With both gcc and clang, NEON intrinsics require that the main target has
* NEON enabled already. Exception: with gcc 6.1 and later (r230411 for arm32,
* r226563 for arm64), hardware floating point support is sufficient.
*/
#if HAVE_NEON_NATIVE || \
(HAVE_DYNAMIC_ARM_CPU_FEATURES && GCC_PREREQ(6, 1) && defined(__ARM_FP))
# define HAVE_NEON_INTRIN 1
#else
# define HAVE_NEON_INTRIN 0
#endif
/* PMULL */
#ifdef __ARM_FEATURE_CRYPTO
# define HAVE_PMULL_NATIVE 1
#else
# define HAVE_PMULL_NATIVE 0
#endif
#if HAVE_PMULL_NATIVE || \
(HAVE_DYNAMIC_ARM_CPU_FEATURES && \
HAVE_NEON_INTRIN /* needed to exclude soft float arm32 case */ && \
(GCC_PREREQ(6, 1) || CLANG_PREREQ(3, 5, 6010000) || \
defined(_MSC_VER)) && \
/*
* On arm32 with clang, the crypto intrinsics (which include pmull)
* are not defined, even when using -mfpu=crypto-neon-fp-armv8,
* because clang's <arm_neon.h> puts their definitions behind
* __aarch64__.
*/ \
!(defined(ARCH_ARM32) && defined(__clang__)))
# define HAVE_PMULL_INTRIN CPU_IS_LITTLE_ENDIAN() /* untested on big endian */
/* Work around MSVC's vmull_p64() taking poly64x1_t instead of poly64_t */
# ifdef _MSC_VER
# define compat_vmull_p64(a, b) vmull_p64(vcreate_p64(a), vcreate_p64(b))
# else
# define compat_vmull_p64(a, b) vmull_p64((a), (b))
# endif
#else
# define HAVE_PMULL_INTRIN 0
#endif
/*
* Set USE_PMULL_TARGET_EVEN_IF_NATIVE if a workaround for a gcc bug that was
* fixed by commit 11a113d501ff ("aarch64: Simplify feature definitions") in gcc
* 13 is needed. A minimal program that fails to build due to this bug when
* compiled with -mcpu=emag, at least with gcc 10 through 12, is:
*
* static inline __attribute__((always_inline,target("+crypto"))) void f() {}
* void g() { f(); }
*
* The error is:
*
* error: inlining failed in call to always_inline f: target specific option mismatch
*
* The workaround is to explicitly add the crypto target to the non-inline
* function g(), even though this should not be required due to -mcpu=emag
* enabling 'crypto' natively and causing __ARM_FEATURE_CRYPTO to be defined.
*/
#if HAVE_PMULL_NATIVE && defined(ARCH_ARM64) && \
GCC_PREREQ(6, 1) && !GCC_PREREQ(13, 1)
# define USE_PMULL_TARGET_EVEN_IF_NATIVE 1
#else
# define USE_PMULL_TARGET_EVEN_IF_NATIVE 0
#endif
/* CRC32 */
#ifdef __ARM_FEATURE_CRC32
# define HAVE_CRC32_NATIVE 1
#else
# define HAVE_CRC32_NATIVE 0
#endif
#undef HAVE_CRC32_INTRIN
#if HAVE_CRC32_NATIVE
# define HAVE_CRC32_INTRIN 1
#elif HAVE_DYNAMIC_ARM_CPU_FEATURES
# if GCC_PREREQ(1, 0)
/*
* Support for ARM CRC32 intrinsics when CRC32 instructions are not enabled
* in the main target has been affected by two gcc bugs, which we must avoid
* by only allowing gcc versions that have the corresponding fixes. First,
* gcc commit 943766d37ae4 ("[arm] Fix use of CRC32 intrinsics with Armv8-a
* and hard-float"), i.e. gcc 8.4+, 9.3+, 10.1+, or 11+, is needed. Second,
* gcc commit c1cdabe3aab8 ("arm: reorder assembler architecture directives
* [PR101723]"), i.e. gcc 9.5+, 10.4+, 11.3+, or 12+, is needed when
* binutils is 2.34 or later, due to
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104439. We use the second
* set of prerequisites, as they are stricter and we have no way to detect
* the binutils version directly from a C source file.
*
* Also exclude the cases where the main target arch is armv6kz or armv7e-m.
* In those cases, gcc doesn't let functions that use the main arch be
* inlined into functions that are targeted to armv8-a+crc. (armv8-a is
* necessary for crc to be accepted at all.) That causes build errors.
* This issue happens for these specific sub-archs because they are not a
* subset of armv8-a. Note: clang does not have this limitation.
*/
# if (GCC_PREREQ(11, 3) || \
(GCC_PREREQ(10, 4) && !GCC_PREREQ(11, 0)) || \
(GCC_PREREQ(9, 5) && !GCC_PREREQ(10, 0))) && \
!defined(__ARM_ARCH_6KZ__) && \
!defined(__ARM_ARCH_7EM__)
# define HAVE_CRC32_INTRIN 1
# endif
# elif CLANG_PREREQ(3, 4, 6000000)
# define HAVE_CRC32_INTRIN 1
# elif defined(_MSC_VER)
# define HAVE_CRC32_INTRIN 1
# endif
#endif
#ifndef HAVE_CRC32_INTRIN
# define HAVE_CRC32_INTRIN 0
#endif
/* SHA3 (needed for the eor3 instruction) */
#if defined(ARCH_ARM64) && !defined(_MSC_VER)
# ifdef __ARM_FEATURE_SHA3
# define HAVE_SHA3_NATIVE 1
# else
# define HAVE_SHA3_NATIVE 0
# endif
# define HAVE_SHA3_TARGET (HAVE_DYNAMIC_ARM_CPU_FEATURES && \
(GCC_PREREQ(8, 1) /* r256478 */ || \
CLANG_PREREQ(7, 0, 10010463) /* r338010 */))
# define HAVE_SHA3_INTRIN (HAVE_NEON_INTRIN && \
(HAVE_SHA3_NATIVE || HAVE_SHA3_TARGET) && \
(GCC_PREREQ(9, 1) /* r268049 */ || \
CLANG_PREREQ(13, 0, 13160000)))
#else
# define HAVE_SHA3_NATIVE 0
# define HAVE_SHA3_TARGET 0
# define HAVE_SHA3_INTRIN 0
#endif
/* dotprod */
#ifdef ARCH_ARM64
# ifdef __ARM_FEATURE_DOTPROD
# define HAVE_DOTPROD_NATIVE 1
# else
# define HAVE_DOTPROD_NATIVE 0
# endif
# if HAVE_DOTPROD_NATIVE || \
(HAVE_DYNAMIC_ARM_CPU_FEATURES && \
(GCC_PREREQ(8, 1) || CLANG_PREREQ(7, 0, 10010000) || \
defined(_MSC_VER)))
# define HAVE_DOTPROD_INTRIN 1
# else
# define HAVE_DOTPROD_INTRIN 0
# endif
#else
# define HAVE_DOTPROD_NATIVE 0
# define HAVE_DOTPROD_INTRIN 0
#endif
/*
* Work around bugs in arm_acle.h and arm_neon.h where sometimes intrinsics are
* only defined when the corresponding __ARM_FEATURE_* macro is defined. The
* intrinsics actually work in target attribute functions too if they are
* defined, though, so work around this by temporarily defining the
* corresponding __ARM_FEATURE_* macros while including the headers.
*/
#if HAVE_CRC32_INTRIN && !HAVE_CRC32_NATIVE && \
(defined(__clang__) || defined(ARCH_ARM32))
# define __ARM_FEATURE_CRC32 1
#endif
#if HAVE_SHA3_INTRIN && !HAVE_SHA3_NATIVE && defined(__clang__)
# define __ARM_FEATURE_SHA3 1
#endif
#if HAVE_DOTPROD_INTRIN && !HAVE_DOTPROD_NATIVE && defined(__clang__)
# define __ARM_FEATURE_DOTPROD 1
#endif
#if HAVE_CRC32_INTRIN && !HAVE_CRC32_NATIVE && \
(defined(__clang__) || defined(ARCH_ARM32))
# include <arm_acle.h>
# undef __ARM_FEATURE_CRC32
#endif
#if HAVE_SHA3_INTRIN && !HAVE_SHA3_NATIVE && defined(__clang__)
# include <arm_neon.h>
# undef __ARM_FEATURE_SHA3
#endif
#if HAVE_DOTPROD_INTRIN && !HAVE_DOTPROD_NATIVE && defined(__clang__)
# include <arm_neon.h>
# undef __ARM_FEATURE_DOTPROD
#endif
#endif /* ARCH_ARM32 || ARCH_ARM64 */
#endif /* LIB_ARM_CPU_FEATURES_H */

View file

@@ -1,682 +0,0 @@
/*
* arm/crc32_impl.h - ARM implementations of the gzip CRC-32 algorithm
*
* Copyright 2022 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_ARM_CRC32_IMPL_H
#define LIB_ARM_CRC32_IMPL_H
#include "cpu_features.h"
/*
* crc32_arm_crc() - implementation using crc32 instructions (only)
*
* In general this implementation is straightforward. However, naive use of the
* crc32 instructions is serial: one of the two inputs to each crc32 instruction
* is the output of the previous one. To take advantage of CPUs that can
* execute multiple crc32 instructions in parallel, when possible we interleave
* the checksumming of several adjacent chunks, then combine their CRCs.
*
* However, without pmull, combining CRCs is fairly slow. So in this pmull-less
* version, we only use a large chunk length, and thus we only do chunked
* processing if there is a lot of data to checksum. This also means that a
* variable chunk length wouldn't help much, so we just support a fixed length.
*/
#if HAVE_CRC32_INTRIN
# if HAVE_CRC32_NATIVE
# define ATTRIBUTES
# else
# ifdef ARCH_ARM32
# ifdef __clang__
# define ATTRIBUTES _target_attribute("armv8-a,crc")
# elif defined(__ARM_PCS_VFP)
/*
* +simd is needed to avoid a "selected architecture lacks an FPU"
* error with Debian arm-linux-gnueabihf-gcc when -mfpu is not
* explicitly specified on the command line.
*/
# define ATTRIBUTES _target_attribute("arch=armv8-a+crc+simd")
# else
# define ATTRIBUTES _target_attribute("arch=armv8-a+crc")
# endif
# else
# ifdef __clang__
# define ATTRIBUTES _target_attribute("crc")
# else
# define ATTRIBUTES _target_attribute("+crc")
# endif
# endif
# endif
#ifndef _MSC_VER
# include <arm_acle.h>
#endif
/*
* Combine the CRCs for 4 adjacent chunks of length L = CRC32_FIXED_CHUNK_LEN
* bytes each by computing:
*
* [ crc0*x^(3*8*L) + crc1*x^(2*8*L) + crc2*x^(1*8*L) + crc3 ] mod G(x)
*
* This has been optimized in several ways:
*
* - The needed multipliers (x to some power, reduced mod G(x)) were
* precomputed.
*
* - The 3 multiplications are interleaved.
*
* - The reduction mod G(x) is delayed to the end and done using __crc32d.
* Note that the use of __crc32d introduces an extra factor of x^32. To
* cancel that out along with the extra factor of x^1 that gets introduced
* because of how the 63-bit products are aligned in their 64-bit integers,
* the multipliers are actually x^(j*8*L - 33) instead of x^(j*8*L).
*/
static forceinline ATTRIBUTES u32
combine_crcs_slow(u32 crc0, u32 crc1, u32 crc2, u32 crc3)
{
u64 res0 = 0, res1 = 0, res2 = 0;
int i;
/* Multiply crc{0,1,2} by CRC32_FIXED_CHUNK_MULT_{3,2,1}. */
for (i = 0; i < 32; i++) {
if (CRC32_FIXED_CHUNK_MULT_3 & (1U << i))
res0 ^= (u64)crc0 << i;
if (CRC32_FIXED_CHUNK_MULT_2 & (1U << i))
res1 ^= (u64)crc1 << i;
if (CRC32_FIXED_CHUNK_MULT_1 & (1U << i))
res2 ^= (u64)crc2 << i;
}
/* Add the different parts and reduce mod G(x). */
return __crc32d(0, res0 ^ res1 ^ res2) ^ crc3;
}
#define crc32_arm_crc crc32_arm_crc
static u32 ATTRIBUTES MAYBE_UNUSED
crc32_arm_crc(u32 crc, const u8 *p, size_t len)
{
if (len >= 64) {
const size_t align = -(uintptr_t)p & 7;
/* Align p to the next 8-byte boundary. */
if (align) {
if (align & 1)
crc = __crc32b(crc, *p++);
if (align & 2) {
crc = __crc32h(crc, le16_bswap(*(u16 *)p));
p += 2;
}
if (align & 4) {
crc = __crc32w(crc, le32_bswap(*(u32 *)p));
p += 4;
}
len -= align;
}
/*
* Interleave the processing of multiple adjacent data chunks to
* take advantage of instruction-level parallelism.
*
* Some CPUs don't prefetch the data if it's being fetched in
* multiple interleaved streams, so do explicit prefetching.
*/
while (len >= CRC32_NUM_CHUNKS * CRC32_FIXED_CHUNK_LEN) {
const u64 *wp0 = (const u64 *)p;
const u64 * const wp0_end =
(const u64 *)(p + CRC32_FIXED_CHUNK_LEN);
u32 crc1 = 0, crc2 = 0, crc3 = 0;
STATIC_ASSERT(CRC32_NUM_CHUNKS == 4);
STATIC_ASSERT(CRC32_FIXED_CHUNK_LEN % (4 * 8) == 0);
do {
prefetchr(&wp0[64 + 0*CRC32_FIXED_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 1*CRC32_FIXED_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 2*CRC32_FIXED_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 3*CRC32_FIXED_CHUNK_LEN/8]);
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_FIXED_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_FIXED_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_FIXED_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_FIXED_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_FIXED_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_FIXED_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_FIXED_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_FIXED_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_FIXED_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_FIXED_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_FIXED_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_FIXED_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_FIXED_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_FIXED_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_FIXED_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_FIXED_CHUNK_LEN/8]));
wp0++;
} while (wp0 != wp0_end);
crc = combine_crcs_slow(crc, crc1, crc2, crc3);
p += CRC32_NUM_CHUNKS * CRC32_FIXED_CHUNK_LEN;
len -= CRC32_NUM_CHUNKS * CRC32_FIXED_CHUNK_LEN;
}
/*
* Due to the large fixed chunk length used above, there might
* still be a lot of data left. So use a 64-byte loop here,
* instead of a loop that is less unrolled.
*/
while (len >= 64) {
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 0)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 8)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 16)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 24)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 32)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 40)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 48)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 56)));
p += 64;
len -= 64;
}
}
if (len & 32) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
crc = __crc32d(crc, get_unaligned_le64(p + 16));
crc = __crc32d(crc, get_unaligned_le64(p + 24));
p += 32;
}
if (len & 16) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
p += 16;
}
if (len & 8) {
crc = __crc32d(crc, get_unaligned_le64(p));
p += 8;
}
if (len & 4) {
crc = __crc32w(crc, get_unaligned_le32(p));
p += 4;
}
if (len & 2) {
crc = __crc32h(crc, get_unaligned_le16(p));
p += 2;
}
if (len & 1)
crc = __crc32b(crc, *p);
return crc;
}
#undef ATTRIBUTES
#endif /* crc32_arm_crc() */
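/*
 * For contrast with the interleaved loop above, a sketch (not part of the
 * original file) of the "naive" serial use of the crc32 instructions that the
 * comment at the top of crc32_arm_crc() refers to; each __crc32d depends on
 * the previous result, so speed is limited by the instruction's latency
 * rather than its throughput:
 *
 *	while (len >= 8) {
 *		crc = __crc32d(crc, get_unaligned_le64(p));
 *		p += 8;
 *		len -= 8;
 *	}
 *	while (len--)
 *		crc = __crc32b(crc, *p++);
 */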
/*
* crc32_arm_crc_pmullcombine() - implementation using crc32 instructions, plus
* pmull instructions for CRC combining
*
* This is similar to crc32_arm_crc(), but it enables the use of pmull
* (carryless multiplication) instructions for the steps where the CRCs of
* adjacent data chunks are combined. As this greatly speeds up CRC
* combination, this implementation also differs from crc32_arm_crc() in that it
* uses a variable chunk length which can get fairly small. The precomputed
* multipliers needed for the selected chunk length are loaded from a table.
*
* Note that pmull is used here only for combining the CRCs of separately
* checksummed chunks, not for folding the data itself. See crc32_arm_pmull*()
* for implementations that use pmull for folding the data itself.
*/
#if HAVE_CRC32_INTRIN && HAVE_PMULL_INTRIN
# if HAVE_CRC32_NATIVE && HAVE_PMULL_NATIVE && !USE_PMULL_TARGET_EVEN_IF_NATIVE
# define ATTRIBUTES
# else
# ifdef ARCH_ARM32
# define ATTRIBUTES _target_attribute("arch=armv8-a+crc,fpu=crypto-neon-fp-armv8")
# else
# ifdef __clang__
# define ATTRIBUTES _target_attribute("crc,aes")
# else
# define ATTRIBUTES _target_attribute("+crc,+crypto")
# endif
# endif
# endif
#ifndef _MSC_VER
# include <arm_acle.h>
#endif
#include <arm_neon.h>
/* Do carryless multiplication of two 32-bit values. */
static forceinline ATTRIBUTES u64
clmul_u32(u32 a, u32 b)
{
uint64x2_t res = vreinterpretq_u64_p128(
compat_vmull_p64((poly64_t)a, (poly64_t)b));
return vgetq_lane_u64(res, 0);
}
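/*
 * Reference sketch (not part of the original source): a portable
 * bit-at-a-time carryless multiplication that computes the same product as
 * clmul_u32() above; handy for sanity-checking the pmull-based version.
 */
static forceinline MAYBE_UNUSED u64
clmul_u32_ref(u32 a, u32 b)
{
	u64 res = 0;
	int i;

	/* XOR in 'a' shifted by each set bit position of 'b' (no carries). */
	for (i = 0; i < 32; i++) {
		if (b & (1U << i))
			res ^= (u64)a << i;
	}
	return res;
}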
/*
* Like combine_crcs_slow(), but uses vmull_p64 to do the multiplications more
* quickly, and supports a variable chunk length. The chunk length is
* 'i * CRC32_MIN_VARIABLE_CHUNK_LEN'
* where 1 <= i < ARRAY_LEN(crc32_mults_for_chunklen).
*/
static forceinline ATTRIBUTES u32
combine_crcs_fast(u32 crc0, u32 crc1, u32 crc2, u32 crc3, size_t i)
{
u64 res0 = clmul_u32(crc0, crc32_mults_for_chunklen[i][0]);
u64 res1 = clmul_u32(crc1, crc32_mults_for_chunklen[i][1]);
u64 res2 = clmul_u32(crc2, crc32_mults_for_chunklen[i][2]);
return __crc32d(0, res0 ^ res1 ^ res2) ^ crc3;
}
#define crc32_arm_crc_pmullcombine crc32_arm_crc_pmullcombine
static u32 ATTRIBUTES MAYBE_UNUSED
crc32_arm_crc_pmullcombine(u32 crc, const u8 *p, size_t len)
{
const size_t align = -(uintptr_t)p & 7;
if (len >= align + CRC32_NUM_CHUNKS * CRC32_MIN_VARIABLE_CHUNK_LEN) {
/* Align p to the next 8-byte boundary. */
if (align) {
if (align & 1)
crc = __crc32b(crc, *p++);
if (align & 2) {
crc = __crc32h(crc, le16_bswap(*(u16 *)p));
p += 2;
}
if (align & 4) {
crc = __crc32w(crc, le32_bswap(*(u32 *)p));
p += 4;
}
len -= align;
}
/*
* Handle CRC32_MAX_VARIABLE_CHUNK_LEN specially, so that better
* code is generated for it.
*/
while (len >= CRC32_NUM_CHUNKS * CRC32_MAX_VARIABLE_CHUNK_LEN) {
const u64 *wp0 = (const u64 *)p;
const u64 * const wp0_end =
(const u64 *)(p + CRC32_MAX_VARIABLE_CHUNK_LEN);
u32 crc1 = 0, crc2 = 0, crc3 = 0;
STATIC_ASSERT(CRC32_NUM_CHUNKS == 4);
STATIC_ASSERT(CRC32_MAX_VARIABLE_CHUNK_LEN % (4 * 8) == 0);
do {
prefetchr(&wp0[64 + 0*CRC32_MAX_VARIABLE_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 1*CRC32_MAX_VARIABLE_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 2*CRC32_MAX_VARIABLE_CHUNK_LEN/8]);
prefetchr(&wp0[64 + 3*CRC32_MAX_VARIABLE_CHUNK_LEN/8]);
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
wp0++;
crc = __crc32d(crc, le64_bswap(wp0[0*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc1 = __crc32d(crc1, le64_bswap(wp0[1*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc2 = __crc32d(crc2, le64_bswap(wp0[2*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
crc3 = __crc32d(crc3, le64_bswap(wp0[3*CRC32_MAX_VARIABLE_CHUNK_LEN/8]));
wp0++;
} while (wp0 != wp0_end);
crc = combine_crcs_fast(crc, crc1, crc2, crc3,
ARRAY_LEN(crc32_mults_for_chunklen) - 1);
p += CRC32_NUM_CHUNKS * CRC32_MAX_VARIABLE_CHUNK_LEN;
len -= CRC32_NUM_CHUNKS * CRC32_MAX_VARIABLE_CHUNK_LEN;
}
/* Handle up to one variable-length chunk. */
if (len >= CRC32_NUM_CHUNKS * CRC32_MIN_VARIABLE_CHUNK_LEN) {
const size_t i = len / (CRC32_NUM_CHUNKS *
CRC32_MIN_VARIABLE_CHUNK_LEN);
const size_t chunk_len =
i * CRC32_MIN_VARIABLE_CHUNK_LEN;
const u64 *wp0 = (const u64 *)(p + 0*chunk_len);
const u64 *wp1 = (const u64 *)(p + 1*chunk_len);
const u64 *wp2 = (const u64 *)(p + 2*chunk_len);
const u64 *wp3 = (const u64 *)(p + 3*chunk_len);
const u64 * const wp0_end = wp1;
u32 crc1 = 0, crc2 = 0, crc3 = 0;
STATIC_ASSERT(CRC32_NUM_CHUNKS == 4);
STATIC_ASSERT(CRC32_MIN_VARIABLE_CHUNK_LEN % (4 * 8) == 0);
do {
prefetchr(wp0 + 64);
prefetchr(wp1 + 64);
prefetchr(wp2 + 64);
prefetchr(wp3 + 64);
crc = __crc32d(crc, le64_bswap(*wp0++));
crc1 = __crc32d(crc1, le64_bswap(*wp1++));
crc2 = __crc32d(crc2, le64_bswap(*wp2++));
crc3 = __crc32d(crc3, le64_bswap(*wp3++));
crc = __crc32d(crc, le64_bswap(*wp0++));
crc1 = __crc32d(crc1, le64_bswap(*wp1++));
crc2 = __crc32d(crc2, le64_bswap(*wp2++));
crc3 = __crc32d(crc3, le64_bswap(*wp3++));
crc = __crc32d(crc, le64_bswap(*wp0++));
crc1 = __crc32d(crc1, le64_bswap(*wp1++));
crc2 = __crc32d(crc2, le64_bswap(*wp2++));
crc3 = __crc32d(crc3, le64_bswap(*wp3++));
crc = __crc32d(crc, le64_bswap(*wp0++));
crc1 = __crc32d(crc1, le64_bswap(*wp1++));
crc2 = __crc32d(crc2, le64_bswap(*wp2++));
crc3 = __crc32d(crc3, le64_bswap(*wp3++));
} while (wp0 != wp0_end);
crc = combine_crcs_fast(crc, crc1, crc2, crc3, i);
p += CRC32_NUM_CHUNKS * chunk_len;
len -= CRC32_NUM_CHUNKS * chunk_len;
}
while (len >= 32) {
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 0)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 8)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 16)));
crc = __crc32d(crc, le64_bswap(*(u64 *)(p + 24)));
p += 32;
len -= 32;
}
} else {
while (len >= 32) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
crc = __crc32d(crc, get_unaligned_le64(p + 16));
crc = __crc32d(crc, get_unaligned_le64(p + 24));
p += 32;
len -= 32;
}
}
if (len & 16) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
p += 16;
}
if (len & 8) {
crc = __crc32d(crc, get_unaligned_le64(p));
p += 8;
}
if (len & 4) {
crc = __crc32w(crc, get_unaligned_le32(p));
p += 4;
}
if (len & 2) {
crc = __crc32h(crc, get_unaligned_le16(p));
p += 2;
}
if (len & 1)
crc = __crc32b(crc, *p);
return crc;
}
#undef ATTRIBUTES
#endif /* crc32_arm_crc_pmullcombine() */
/*
* crc32_arm_pmullx4() - implementation using "folding" with pmull instructions
*
* This implementation is intended for CPUs that support pmull instructions but
* not crc32 instructions.
*/
#if HAVE_PMULL_INTRIN
# define crc32_arm_pmullx4 crc32_arm_pmullx4
# define SUFFIX _pmullx4
# if HAVE_PMULL_NATIVE && !USE_PMULL_TARGET_EVEN_IF_NATIVE
# define ATTRIBUTES
# else
# ifdef ARCH_ARM32
# define ATTRIBUTES _target_attribute("fpu=crypto-neon-fp-armv8")
# else
# ifdef __clang__
/*
* This used to use "crypto", but that stopped working with clang 16.
* Now only "aes" works. "aes" works with older versions too, so use
* that. No "+" prefix; clang 15 and earlier don't accept that.
*/
# define ATTRIBUTES _target_attribute("aes")
# else
/*
* With gcc, only "+crypto" works. Both the "+" prefix and the
* "crypto" (not "aes") are essential...
*/
# define ATTRIBUTES _target_attribute("+crypto")
# endif
# endif
# endif
# define ENABLE_EOR3 0
# include "crc32_pmull_helpers.h"
static u32 ATTRIBUTES MAYBE_UNUSED
crc32_arm_pmullx4(u32 crc, const u8 *p, size_t len)
{
static const u64 _aligned_attribute(16) mults[3][2] = {
CRC32_1VECS_MULTS,
CRC32_4VECS_MULTS,
CRC32_2VECS_MULTS,
};
static const u64 _aligned_attribute(16) final_mults[3][2] = {
{ CRC32_FINAL_MULT, 0 },
{ CRC32_BARRETT_CONSTANT_1, 0 },
{ CRC32_BARRETT_CONSTANT_2, 0 },
};
const uint8x16_t zeroes = vdupq_n_u8(0);
const uint8x16_t mask32 = vreinterpretq_u8_u64(vdupq_n_u64(0xFFFFFFFF));
const poly64x2_t multipliers_1 = load_multipliers(mults[0]);
uint8x16_t v0, v1, v2, v3;
if (len < 64 + 15) {
if (len < 16)
return crc32_slice1(crc, p, len);
v0 = veorq_u8(vld1q_u8(p), u32_to_bytevec(crc));
p += 16;
len -= 16;
while (len >= 16) {
v0 = fold_vec(v0, vld1q_u8(p), multipliers_1);
p += 16;
len -= 16;
}
} else {
const poly64x2_t multipliers_4 = load_multipliers(mults[1]);
const poly64x2_t multipliers_2 = load_multipliers(mults[2]);
const size_t align = -(uintptr_t)p & 15;
const uint8x16_t *vp;
v0 = veorq_u8(vld1q_u8(p), u32_to_bytevec(crc));
p += 16;
/* Align p to the next 16-byte boundary. */
if (align) {
v0 = fold_partial_vec(v0, p, align, multipliers_1);
p += align;
len -= align;
}
vp = (const uint8x16_t *)p;
v1 = *vp++;
v2 = *vp++;
v3 = *vp++;
while (len >= 64 + 64) {
v0 = fold_vec(v0, *vp++, multipliers_4);
v1 = fold_vec(v1, *vp++, multipliers_4);
v2 = fold_vec(v2, *vp++, multipliers_4);
v3 = fold_vec(v3, *vp++, multipliers_4);
len -= 64;
}
v0 = fold_vec(v0, v2, multipliers_2);
v1 = fold_vec(v1, v3, multipliers_2);
if (len & 32) {
v0 = fold_vec(v0, *vp++, multipliers_2);
v1 = fold_vec(v1, *vp++, multipliers_2);
}
v0 = fold_vec(v0, v1, multipliers_1);
if (len & 16)
v0 = fold_vec(v0, *vp++, multipliers_1);
p = (const u8 *)vp;
len &= 15;
}
/* Handle any remaining partial block now before reducing to 32 bits. */
if (len)
v0 = fold_partial_vec(v0, p, len, multipliers_1);
/*
* Fold 128 => 96 bits. This also implicitly appends 32 zero bits,
* which is equivalent to multiplying by x^32. This is needed because
* the CRC is defined as M(x)*x^32 mod G(x), not just M(x) mod G(x).
*/
v0 = veorq_u8(vextq_u8(v0, zeroes, 8),
clmul_high(vextq_u8(zeroes, v0, 8), multipliers_1));
/* Fold 96 => 64 bits. */
v0 = veorq_u8(vextq_u8(v0, zeroes, 4),
clmul_low(vandq_u8(v0, mask32),
load_multipliers(final_mults[0])));
/* Reduce 64 => 32 bits using Barrett reduction. */
v1 = clmul_low(vandq_u8(v0, mask32), load_multipliers(final_mults[1]));
v1 = clmul_low(vandq_u8(v1, mask32), load_multipliers(final_mults[2]));
return vgetq_lane_u32(vreinterpretq_u32_u8(veorq_u8(v0, v1)), 1);
}
#undef SUFFIX
#undef ATTRIBUTES
#undef ENABLE_EOR3
#endif /* crc32_arm_pmullx4() */
/*
* crc32_arm_pmullx12_crc() - large-stride implementation using "folding" with
* pmull instructions, where crc32 instructions are also available
*
* See crc32_pmull_wide.h for explanation.
*/
#if defined(ARCH_ARM64) && HAVE_PMULL_INTRIN && HAVE_CRC32_INTRIN
# define crc32_arm_pmullx12_crc crc32_arm_pmullx12_crc
# define SUFFIX _pmullx12_crc
# if HAVE_PMULL_NATIVE && HAVE_CRC32_NATIVE && !USE_PMULL_TARGET_EVEN_IF_NATIVE
# define ATTRIBUTES
# else
# ifdef __clang__
# define ATTRIBUTES _target_attribute("aes,crc")
# else
# define ATTRIBUTES _target_attribute("+crypto,+crc")
# endif
# endif
# define ENABLE_EOR3 0
# include "crc32_pmull_wide.h"
#endif
/*
* crc32_arm_pmullx12_crc_eor3()
*
* This is like crc32_arm_pmullx12_crc(), but it adds the eor3 instruction (from
* the sha3 extension) for even better performance.
*
* Note: we require HAVE_SHA3_TARGET (or HAVE_SHA3_NATIVE) rather than
* HAVE_SHA3_INTRIN, as we have an inline asm fallback for eor3.
*/
#if defined(ARCH_ARM64) && HAVE_PMULL_INTRIN && HAVE_CRC32_INTRIN && \
(HAVE_SHA3_TARGET || HAVE_SHA3_NATIVE)
# define crc32_arm_pmullx12_crc_eor3 crc32_arm_pmullx12_crc_eor3
# define SUFFIX _pmullx12_crc_eor3
# if HAVE_PMULL_NATIVE && HAVE_CRC32_NATIVE && HAVE_SHA3_NATIVE && \
!USE_PMULL_TARGET_EVEN_IF_NATIVE
# define ATTRIBUTES
# else
# ifdef __clang__
# define ATTRIBUTES _target_attribute("aes,crc,sha3")
/*
* With gcc, arch=armv8.2-a is needed for the sha3 intrinsics, unless the
* default target is armv8.3-a or later in which case it must be omitted.
* armv8.3-a or later can be detected by checking for __ARM_FEATURE_JCVT.
*/
# elif defined(__ARM_FEATURE_JCVT)
# define ATTRIBUTES _target_attribute("+crypto,+crc,+sha3")
# else
# define ATTRIBUTES _target_attribute("arch=armv8.2-a+crypto+crc+sha3")
# endif
# endif
# define ENABLE_EOR3 1
# include "crc32_pmull_wide.h"
#endif
/*
* On the Apple M1 processor, crc32 instructions max out at about 25.5 GB/s in
* the best case of using a 3-way or greater interleaved chunked implementation,
* whereas a pmull-based implementation achieves 68 GB/s provided that the
* stride length is large enough (about 10+ vectors with eor3, or 12+ without).
*
* For now we assume that crc32 instructions are preferable in other cases.
*/
#define PREFER_PMULL_TO_CRC 0
#ifdef __APPLE__
# include <TargetConditionals.h>
# if TARGET_OS_OSX
# undef PREFER_PMULL_TO_CRC
# define PREFER_PMULL_TO_CRC 1
# endif
#endif
/*
* If the best implementation is statically available, use it unconditionally.
* Otherwise choose the best implementation at runtime.
*/
#if PREFER_PMULL_TO_CRC && defined(crc32_arm_pmullx12_crc_eor3) && \
HAVE_PMULL_NATIVE && HAVE_CRC32_NATIVE && HAVE_SHA3_NATIVE
# define DEFAULT_IMPL crc32_arm_pmullx12_crc_eor3
#elif !PREFER_PMULL_TO_CRC && defined(crc32_arm_crc_pmullcombine) && \
HAVE_CRC32_NATIVE && HAVE_PMULL_NATIVE
# define DEFAULT_IMPL crc32_arm_crc_pmullcombine
#else
static inline crc32_func_t
arch_select_crc32_func(void)
{
const u32 features MAYBE_UNUSED = get_arm_cpu_features();
#if PREFER_PMULL_TO_CRC && defined(crc32_arm_pmullx12_crc_eor3)
if (HAVE_PMULL(features) && HAVE_CRC32(features) && HAVE_SHA3(features))
return crc32_arm_pmullx12_crc_eor3;
#endif
#if PREFER_PMULL_TO_CRC && defined(crc32_arm_pmullx12_crc)
if (HAVE_PMULL(features) && HAVE_CRC32(features))
return crc32_arm_pmullx12_crc;
#endif
#ifdef crc32_arm_crc_pmullcombine
if (HAVE_CRC32(features) && HAVE_PMULL(features))
return crc32_arm_crc_pmullcombine;
#endif
#ifdef crc32_arm_crc
if (HAVE_CRC32(features))
return crc32_arm_crc;
#endif
#ifdef crc32_arm_pmullx4
if (HAVE_PMULL(features))
return crc32_arm_pmullx4;
#endif
return NULL;
}
#define arch_select_crc32_func arch_select_crc32_func
#endif
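/*
 * Sketch of how the above is consumed (hypothetical names; the actual
 * dispatcher lives in the generic crc32.c, which is not shown here):
 *
 *	#ifdef DEFAULT_IMPL
 *		crc = DEFAULT_IMPL(crc, p, len);
 *	#else
 *		crc32_func_t f = arch_select_crc32_func();
 *
 *		if (f == NULL)
 *			f = crc32_generic;	// hypothetical portable fallback
 *		crc = f(crc, p, len);
 *	#endif
 *
 * i.e. a statically-known best implementation is called directly, and a
 * function pointer is chosen from the CPU features only otherwise.
 */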
#endif /* LIB_ARM_CRC32_IMPL_H */

View file

@@ -1,184 +0,0 @@
/*
* arm/crc32_pmull_helpers.h - helper functions for CRC-32 folding with PMULL
*
* Copyright 2022 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* This file is a "template" for instantiating helper functions for CRC folding
* with pmull instructions. It accepts the following parameters:
*
* SUFFIX:
* Name suffix to append to all instantiated functions.
* ATTRIBUTES:
* Target function attributes to use.
* ENABLE_EOR3:
* Use the eor3 instruction (from the sha3 extension).
*/
#include <arm_neon.h>
/* Create a vector with 'a' in the first 4 bytes, and the rest zeroed out. */
#undef u32_to_bytevec
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(u32_to_bytevec)(u32 a)
{
return vreinterpretq_u8_u32(vsetq_lane_u32(a, vdupq_n_u32(0), 0));
}
#define u32_to_bytevec ADD_SUFFIX(u32_to_bytevec)
/* Load two 64-bit values into a vector. */
#undef load_multipliers
static forceinline ATTRIBUTES poly64x2_t
ADD_SUFFIX(load_multipliers)(const u64 p[2])
{
return vreinterpretq_p64_u64(vld1q_u64(p));
}
#define load_multipliers ADD_SUFFIX(load_multipliers)
/* Do carryless multiplication of the low halves of two vectors. */
#undef clmul_low
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(clmul_low)(uint8x16_t a, poly64x2_t b)
{
return vreinterpretq_u8_p128(
compat_vmull_p64(vgetq_lane_p64(vreinterpretq_p64_u8(a), 0),
vgetq_lane_p64(b, 0)));
}
#define clmul_low ADD_SUFFIX(clmul_low)
/* Do carryless multiplication of the high halves of two vectors. */
#undef clmul_high
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(clmul_high)(uint8x16_t a, poly64x2_t b)
{
#if defined(__clang__) && defined(ARCH_ARM64)
/*
* Use inline asm to ensure that pmull2 is really used. This works
* around clang bug https://github.com/llvm/llvm-project/issues/52868.
*/
uint8x16_t res;
__asm__("pmull2 %0.1q, %1.2d, %2.2d" : "=w" (res) : "w" (a), "w" (b));
return res;
#else
return vreinterpretq_u8_p128(vmull_high_p64(vreinterpretq_p64_u8(a), b));
#endif
}
#define clmul_high ADD_SUFFIX(clmul_high)
#undef eor3
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(eor3)(uint8x16_t a, uint8x16_t b, uint8x16_t c)
{
#if ENABLE_EOR3
#if HAVE_SHA3_INTRIN
return veor3q_u8(a, b, c);
#else
uint8x16_t res;
__asm__("eor3 %0.16b, %1.16b, %2.16b, %3.16b"
: "=w" (res) : "w" (a), "w" (b), "w" (c));
return res;
#endif
#else /* ENABLE_EOR3 */
return veorq_u8(veorq_u8(a, b), c);
#endif /* !ENABLE_EOR3 */
}
#define eor3 ADD_SUFFIX(eor3)
#undef fold_vec
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(fold_vec)(uint8x16_t src, uint8x16_t dst, poly64x2_t multipliers)
{
uint8x16_t a = clmul_low(src, multipliers);
uint8x16_t b = clmul_high(src, multipliers);
return eor3(a, b, dst);
}
#define fold_vec ADD_SUFFIX(fold_vec)
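/*
 * Informal restatement (not from the original source): in polynomial terms
 * over GF(2), fold_vec() computes
 *
 *	result = (src_lo * mult_lo) ^ (src_hi * mult_hi) ^ dst
 *
 * where src_lo/src_hi are the two 64-bit halves of 'src', and the two
 * multipliers are precomputed powers of x reduced mod G(x), chosen so that
 * the result stays congruent (mod G(x)) to the data processed so far.  This
 * is the standard CRC "folding" step that absorbs one 16-byte block per call.
 */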
#undef vtbl
static forceinline ATTRIBUTES uint8x16_t
ADD_SUFFIX(vtbl)(uint8x16_t table, uint8x16_t indices)
{
#ifdef ARCH_ARM64
return vqtbl1q_u8(table, indices);
#else
uint8x8x2_t tab2;
tab2.val[0] = vget_low_u8(table);
tab2.val[1] = vget_high_u8(table);
return vcombine_u8(vtbl2_u8(tab2, vget_low_u8(indices)),
vtbl2_u8(tab2, vget_high_u8(indices)));
#endif
}
#define vtbl ADD_SUFFIX(vtbl)
/*
* Given v containing a 16-byte polynomial, and a pointer 'p' that points to the
* next '1 <= len <= 15' data bytes, rearrange the concatenation of v and the
* data into vectors x0 and x1 that contain 'len' bytes and 16 bytes,
* respectively. Then fold x0 into x1 and return the result. Assumes that
* 'p + len - 16' is in-bounds.
*/
#undef fold_partial_vec
static forceinline ATTRIBUTES MAYBE_UNUSED uint8x16_t
ADD_SUFFIX(fold_partial_vec)(uint8x16_t v, const u8 *p, size_t len,
poly64x2_t multipliers_1)
{
/*
* vtbl(v, shift_tab[len..len+15]) left shifts v by 16-len bytes.
* vtbl(v, shift_tab[len+16..len+31]) right shifts v by len bytes.
*/
static const u8 shift_tab[48] = {
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};
const uint8x16_t lshift = vld1q_u8(&shift_tab[len]);
const uint8x16_t rshift = vld1q_u8(&shift_tab[len + 16]);
uint8x16_t x0, x1, bsl_mask;
/* x0 = v left-shifted by '16 - len' bytes */
x0 = vtbl(v, lshift);
/* Create a vector of '16 - len' 0x00 bytes, then 'len' 0xff bytes. */
bsl_mask = vreinterpretq_u8_s8(
vshrq_n_s8(vreinterpretq_s8_u8(rshift), 7));
/*
* x1 = the last '16 - len' bytes from v (i.e. v right-shifted by 'len'
* bytes) followed by the remaining data.
*/
x1 = vbslq_u8(bsl_mask /* 0 bits select from arg3, 1 bits from arg2 */,
vld1q_u8(p + len - 16), vtbl(v, rshift));
return fold_vec(x0, x1, multipliers_1);
}
#define fold_partial_vec ADD_SUFFIX(fold_partial_vec)
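/*
 * Worked example (not from the original source): for len == 5,
 *
 *	lshift = shift_tab[5..20]  = 11 x 0xff, then 0x00..0x04
 *	rshift = shift_tab[21..36] = 0x05..0x0f, then 5 x 0xff
 *
 * so vtbl(v, lshift) moves v[0..4] into the top 5 byte lanes (a left shift by
 * 11 = 16 - len bytes), and vtbl(v, rshift) yields v[5..15] in the low 11
 * lanes (a right shift by len bytes), since 0xff indices select zero bytes.
 */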

View file

@@ -1,227 +0,0 @@
/*
* arm/crc32_pmull_wide.h - gzip CRC-32 with PMULL (extra-wide version)
*
* Copyright 2022 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* This file is a "template" for instantiating PMULL-based crc32_arm functions.
* The "parameters" are:
*
* SUFFIX:
* Name suffix to append to all instantiated functions.
* ATTRIBUTES:
* Target function attributes to use.
* ENABLE_EOR3:
* Use the eor3 instruction (from the sha3 extension).
*
* This is the extra-wide version; it uses an unusually large stride length of
* 12, and it assumes that crc32 instructions are available too. It's intended
* for powerful CPUs that support both pmull and crc32 instructions, but where
* throughput of pmull and xor (given enough instructions issued in parallel) is
* significantly higher than that of crc32, thus making the crc32 instructions
* (counterintuitively) not actually the fastest way to compute the CRC-32. The
* Apple M1 processor is an example of such a CPU.
*/
#ifndef _MSC_VER
# include <arm_acle.h>
#endif
#include <arm_neon.h>
#include "crc32_pmull_helpers.h"
static u32 ATTRIBUTES MAYBE_UNUSED
ADD_SUFFIX(crc32_arm)(u32 crc, const u8 *p, size_t len)
{
uint8x16_t v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11;
if (len < 3 * 192) {
static const u64 _aligned_attribute(16) mults[3][2] = {
CRC32_4VECS_MULTS, CRC32_2VECS_MULTS, CRC32_1VECS_MULTS,
};
poly64x2_t multipliers_4, multipliers_2, multipliers_1;
if (len < 64)
goto tail;
multipliers_4 = load_multipliers(mults[0]);
multipliers_2 = load_multipliers(mults[1]);
multipliers_1 = load_multipliers(mults[2]);
/*
* Short length; don't bother aligning the pointer, and fold
* 64 bytes (4 vectors) at a time, at most.
*/
v0 = veorq_u8(vld1q_u8(p + 0), u32_to_bytevec(crc));
v1 = vld1q_u8(p + 16);
v2 = vld1q_u8(p + 32);
v3 = vld1q_u8(p + 48);
p += 64;
len -= 64;
while (len >= 64) {
v0 = fold_vec(v0, vld1q_u8(p + 0), multipliers_4);
v1 = fold_vec(v1, vld1q_u8(p + 16), multipliers_4);
v2 = fold_vec(v2, vld1q_u8(p + 32), multipliers_4);
v3 = fold_vec(v3, vld1q_u8(p + 48), multipliers_4);
p += 64;
len -= 64;
}
v0 = fold_vec(v0, v2, multipliers_2);
v1 = fold_vec(v1, v3, multipliers_2);
if (len >= 32) {
v0 = fold_vec(v0, vld1q_u8(p + 0), multipliers_2);
v1 = fold_vec(v1, vld1q_u8(p + 16), multipliers_2);
p += 32;
len -= 32;
}
v0 = fold_vec(v0, v1, multipliers_1);
} else {
static const u64 _aligned_attribute(16) mults[4][2] = {
CRC32_12VECS_MULTS, CRC32_6VECS_MULTS,
CRC32_3VECS_MULTS, CRC32_1VECS_MULTS,
};
const poly64x2_t multipliers_12 = load_multipliers(mults[0]);
const poly64x2_t multipliers_6 = load_multipliers(mults[1]);
const poly64x2_t multipliers_3 = load_multipliers(mults[2]);
const poly64x2_t multipliers_1 = load_multipliers(mults[3]);
const size_t align = -(uintptr_t)p & 15;
const uint8x16_t *vp;
/* Align p to the next 16-byte boundary. */
if (align) {
if (align & 1)
crc = __crc32b(crc, *p++);
if (align & 2) {
crc = __crc32h(crc, le16_bswap(*(u16 *)p));
p += 2;
}
if (align & 4) {
crc = __crc32w(crc, le32_bswap(*(u32 *)p));
p += 4;
}
if (align & 8) {
crc = __crc32d(crc, le64_bswap(*(u64 *)p));
p += 8;
}
len -= align;
}
vp = (const uint8x16_t *)p;
v0 = veorq_u8(*vp++, u32_to_bytevec(crc));
v1 = *vp++;
v2 = *vp++;
v3 = *vp++;
v4 = *vp++;
v5 = *vp++;
v6 = *vp++;
v7 = *vp++;
v8 = *vp++;
v9 = *vp++;
v10 = *vp++;
v11 = *vp++;
len -= 192;
/* Fold 192 bytes (12 vectors) at a time. */
do {
v0 = fold_vec(v0, *vp++, multipliers_12);
v1 = fold_vec(v1, *vp++, multipliers_12);
v2 = fold_vec(v2, *vp++, multipliers_12);
v3 = fold_vec(v3, *vp++, multipliers_12);
v4 = fold_vec(v4, *vp++, multipliers_12);
v5 = fold_vec(v5, *vp++, multipliers_12);
v6 = fold_vec(v6, *vp++, multipliers_12);
v7 = fold_vec(v7, *vp++, multipliers_12);
v8 = fold_vec(v8, *vp++, multipliers_12);
v9 = fold_vec(v9, *vp++, multipliers_12);
v10 = fold_vec(v10, *vp++, multipliers_12);
v11 = fold_vec(v11, *vp++, multipliers_12);
len -= 192;
} while (len >= 192);
/*
* Fewer than 192 bytes left. Fold v0-v11 down to just v0,
* while processing up to 144 more bytes.
*/
v0 = fold_vec(v0, v6, multipliers_6);
v1 = fold_vec(v1, v7, multipliers_6);
v2 = fold_vec(v2, v8, multipliers_6);
v3 = fold_vec(v3, v9, multipliers_6);
v4 = fold_vec(v4, v10, multipliers_6);
v5 = fold_vec(v5, v11, multipliers_6);
if (len >= 96) {
v0 = fold_vec(v0, *vp++, multipliers_6);
v1 = fold_vec(v1, *vp++, multipliers_6);
v2 = fold_vec(v2, *vp++, multipliers_6);
v3 = fold_vec(v3, *vp++, multipliers_6);
v4 = fold_vec(v4, *vp++, multipliers_6);
v5 = fold_vec(v5, *vp++, multipliers_6);
len -= 96;
}
v0 = fold_vec(v0, v3, multipliers_3);
v1 = fold_vec(v1, v4, multipliers_3);
v2 = fold_vec(v2, v5, multipliers_3);
if (len >= 48) {
v0 = fold_vec(v0, *vp++, multipliers_3);
v1 = fold_vec(v1, *vp++, multipliers_3);
v2 = fold_vec(v2, *vp++, multipliers_3);
len -= 48;
}
v0 = fold_vec(v0, v1, multipliers_1);
v0 = fold_vec(v0, v2, multipliers_1);
p = (const u8 *)vp;
}
/* Reduce 128 to 32 bits using crc32 instructions. */
crc = __crc32d(0, vgetq_lane_u64(vreinterpretq_u64_u8(v0), 0));
crc = __crc32d(crc, vgetq_lane_u64(vreinterpretq_u64_u8(v0), 1));
tail:
/* Finish up the remainder using crc32 instructions. */
if (len & 32) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
crc = __crc32d(crc, get_unaligned_le64(p + 16));
crc = __crc32d(crc, get_unaligned_le64(p + 24));
p += 32;
}
if (len & 16) {
crc = __crc32d(crc, get_unaligned_le64(p + 0));
crc = __crc32d(crc, get_unaligned_le64(p + 8));
p += 16;
}
if (len & 8) {
crc = __crc32d(crc, get_unaligned_le64(p));
p += 8;
}
if (len & 4) {
crc = __crc32w(crc, get_unaligned_le32(p));
p += 4;
}
if (len & 2) {
crc = __crc32h(crc, get_unaligned_le16(p));
p += 2;
}
if (len & 1)
crc = __crc32b(crc, *p);
return crc;
}
#undef SUFFIX
#undef ATTRIBUTES
#undef ENABLE_EOR3

View file

@@ -1,79 +0,0 @@
/*
* arm/matchfinder_impl.h - ARM implementations of matchfinder functions
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_ARM_MATCHFINDER_IMPL_H
#define LIB_ARM_MATCHFINDER_IMPL_H
#include "cpu_features.h"
#if HAVE_NEON_NATIVE
# include <arm_neon.h>
static forceinline void
matchfinder_init_neon(mf_pos_t *data, size_t size)
{
int16x8_t *p = (int16x8_t *)data;
int16x8_t v = vdupq_n_s16(MATCHFINDER_INITVAL);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
p[0] = v;
p[1] = v;
p[2] = v;
p[3] = v;
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_init matchfinder_init_neon
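/*
 * For reference (scalar restatement, not in the original file): the portable
 * equivalent of matchfinder_init_neon() is simply
 *
 *	for (i = 0; i < size / sizeof(mf_pos_t); i++)
 *		data[i] = MATCHFINDER_INITVAL;
 *
 * The NEON loop above fills 32 positions (4 vectors of 8 x s16) per
 * iteration; matchfinder_rebase_neon() below is the analogous saturating
 * subtraction of MATCHFINDER_WINDOW_SIZE from every stored position.
 */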
static forceinline void
matchfinder_rebase_neon(mf_pos_t *data, size_t size)
{
int16x8_t *p = (int16x8_t *)data;
int16x8_t v = vdupq_n_s16((u16)-MATCHFINDER_WINDOW_SIZE);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
p[0] = vqaddq_s16(p[0], v);
p[1] = vqaddq_s16(p[1], v);
p[2] = vqaddq_s16(p[2], v);
p[3] = vqaddq_s16(p[3], v);
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_rebase matchfinder_rebase_neon
#endif /* HAVE_NEON_NATIVE */
#endif /* LIB_ARM_MATCHFINDER_IMPL_H */

View file

@@ -1,342 +0,0 @@
/*
* bt_matchfinder.h - Lempel-Ziv matchfinding with a hash table of binary trees
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* ----------------------------------------------------------------------------
*
* This is a Binary Trees (bt) based matchfinder.
*
* The main data structure is a hash table where each hash bucket contains a
* binary tree of sequences whose first 4 bytes share the same hash code. Each
* sequence is identified by its starting position in the input buffer. Each
* binary tree is always sorted such that each left child represents a sequence
* lexicographically lesser than its parent and each right child represents a
* sequence lexicographically greater than its parent.
*
* The algorithm processes the input buffer sequentially. At each byte
* position, the hash code of the first 4 bytes of the sequence beginning at
* that position (the sequence being matched against) is computed. This
* identifies the hash bucket to use for that position. Then, a new binary tree
* node is created to represent the current sequence. Then, in a single tree
* traversal, the hash bucket's binary tree is searched for matches and is
* re-rooted at the new node.
*
* Compared to the simpler algorithm that uses linked lists instead of binary
* trees (see hc_matchfinder.h), the binary tree version gains more information
* at each node visitation. Ideally, the binary tree version will examine only
* 'log(n)' nodes to find the same matches that the linked list version will
* find by examining 'n' nodes. In addition, the binary tree version can
* examine fewer bytes at each node by taking advantage of the common prefixes
* that result from the sort order, whereas the linked list version may have to
* examine up to the full length of the match at each node.
*
* However, it is not always best to use the binary tree version. It requires
* nearly twice as much memory as the linked list version, and it takes time to
* keep the binary trees sorted, even at positions where the compressor does not
* need matches. Generally, when doing fast compression on small buffers,
* binary trees are the wrong approach. They are best suited for thorough
* compression and/or large buffers.
*
* ----------------------------------------------------------------------------
*/
#ifndef LIB_BT_MATCHFINDER_H
#define LIB_BT_MATCHFINDER_H
#include "matchfinder_common.h"
#define BT_MATCHFINDER_HASH3_ORDER 16
#define BT_MATCHFINDER_HASH3_WAYS 2
#define BT_MATCHFINDER_HASH4_ORDER 16
#define BT_MATCHFINDER_TOTAL_HASH_SIZE \
(((1UL << BT_MATCHFINDER_HASH3_ORDER) * BT_MATCHFINDER_HASH3_WAYS + \
(1UL << BT_MATCHFINDER_HASH4_ORDER)) * sizeof(mf_pos_t))
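/*
 * Worked numbers (assuming MATCHFINDER_WINDOW_SIZE == 32768 and
 * sizeof(mf_pos_t) == 2, as used for DEFLATE): hash3_tab is
 * 65536 * 2 * 2 = 256 KiB, hash4_tab is 65536 * 2 = 128 KiB, and child_tab
 * below is 2 * 32768 * 2 = 128 KiB, so the whole structure is about 512 KiB,
 * roughly twice the footprint of the linked-list matchfinder as noted above.
 */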
/* Representation of a match found by the bt_matchfinder */
struct lz_match {
/* The number of bytes matched. */
u16 length;
/* The offset back from the current position that was matched. */
u16 offset;
};
struct MATCHFINDER_ALIGNED bt_matchfinder {
/* The hash table for finding length 3 matches */
mf_pos_t hash3_tab[1UL << BT_MATCHFINDER_HASH3_ORDER][BT_MATCHFINDER_HASH3_WAYS];
/* The hash table which contains the roots of the binary trees for
* finding length 4+ matches */
mf_pos_t hash4_tab[1UL << BT_MATCHFINDER_HASH4_ORDER];
/* The child node references for the binary trees. The left and right
* children of the node for the sequence with position 'pos' are
* 'child_tab[pos * 2]' and 'child_tab[pos * 2 + 1]', respectively. */
mf_pos_t child_tab[2UL * MATCHFINDER_WINDOW_SIZE];
};
/* Prepare the matchfinder for a new input buffer. */
static forceinline void
bt_matchfinder_init(struct bt_matchfinder *mf)
{
STATIC_ASSERT(BT_MATCHFINDER_TOTAL_HASH_SIZE %
MATCHFINDER_SIZE_ALIGNMENT == 0);
matchfinder_init((mf_pos_t *)mf, BT_MATCHFINDER_TOTAL_HASH_SIZE);
}
static forceinline void
bt_matchfinder_slide_window(struct bt_matchfinder *mf)
{
STATIC_ASSERT(sizeof(*mf) % MATCHFINDER_SIZE_ALIGNMENT == 0);
matchfinder_rebase((mf_pos_t *)mf, sizeof(*mf));
}
static forceinline mf_pos_t *
bt_left_child(struct bt_matchfinder *mf, s32 node)
{
return &mf->child_tab[2 * (node & (MATCHFINDER_WINDOW_SIZE - 1)) + 0];
}
static forceinline mf_pos_t *
bt_right_child(struct bt_matchfinder *mf, s32 node)
{
return &mf->child_tab[2 * (node & (MATCHFINDER_WINDOW_SIZE - 1)) + 1];
}
/* The minimum permissible value of 'max_len' for bt_matchfinder_get_matches()
* and bt_matchfinder_skip_byte(). There must be sufficiently many bytes
* remaining to load a 32-bit integer from the *next* position. */
#define BT_MATCHFINDER_REQUIRED_NBYTES 5
/* Advance the binary tree matchfinder by one byte, optionally recording
* matches. @record_matches should be a compile-time constant. */
static forceinline struct lz_match *
bt_matchfinder_advance_one_byte(struct bt_matchfinder * const mf,
const u8 * const in_base,
const ptrdiff_t cur_pos,
const u32 max_len,
const u32 nice_len,
const u32 max_search_depth,
u32 * const next_hashes,
struct lz_match *lz_matchptr,
const bool record_matches)
{
const u8 *in_next = in_base + cur_pos;
u32 depth_remaining = max_search_depth;
const s32 cutoff = cur_pos - MATCHFINDER_WINDOW_SIZE;
u32 next_hashseq;
u32 hash3;
u32 hash4;
s32 cur_node;
#if BT_MATCHFINDER_HASH3_WAYS >= 2
s32 cur_node_2;
#endif
const u8 *matchptr;
mf_pos_t *pending_lt_ptr, *pending_gt_ptr;
u32 best_lt_len, best_gt_len;
u32 len;
u32 best_len = 3;
STATIC_ASSERT(BT_MATCHFINDER_HASH3_WAYS >= 1 &&
BT_MATCHFINDER_HASH3_WAYS <= 2);
next_hashseq = get_unaligned_le32(in_next + 1);
hash3 = next_hashes[0];
hash4 = next_hashes[1];
next_hashes[0] = lz_hash(next_hashseq & 0xFFFFFF, BT_MATCHFINDER_HASH3_ORDER);
next_hashes[1] = lz_hash(next_hashseq, BT_MATCHFINDER_HASH4_ORDER);
prefetchw(&mf->hash3_tab[next_hashes[0]]);
prefetchw(&mf->hash4_tab[next_hashes[1]]);
cur_node = mf->hash3_tab[hash3][0];
mf->hash3_tab[hash3][0] = cur_pos;
#if BT_MATCHFINDER_HASH3_WAYS >= 2
cur_node_2 = mf->hash3_tab[hash3][1];
mf->hash3_tab[hash3][1] = cur_node;
#endif
if (record_matches && cur_node > cutoff) {
u32 seq3 = load_u24_unaligned(in_next);
if (seq3 == load_u24_unaligned(&in_base[cur_node])) {
lz_matchptr->length = 3;
lz_matchptr->offset = in_next - &in_base[cur_node];
lz_matchptr++;
}
#if BT_MATCHFINDER_HASH3_WAYS >= 2
else if (cur_node_2 > cutoff &&
seq3 == load_u24_unaligned(&in_base[cur_node_2]))
{
lz_matchptr->length = 3;
lz_matchptr->offset = in_next - &in_base[cur_node_2];
lz_matchptr++;
}
#endif
}
cur_node = mf->hash4_tab[hash4];
mf->hash4_tab[hash4] = cur_pos;
pending_lt_ptr = bt_left_child(mf, cur_pos);
pending_gt_ptr = bt_right_child(mf, cur_pos);
if (cur_node <= cutoff) {
*pending_lt_ptr = MATCHFINDER_INITVAL;
*pending_gt_ptr = MATCHFINDER_INITVAL;
return lz_matchptr;
}
best_lt_len = 0;
best_gt_len = 0;
len = 0;
for (;;) {
matchptr = &in_base[cur_node];
if (matchptr[len] == in_next[len]) {
len = lz_extend(in_next, matchptr, len + 1, max_len);
if (!record_matches || len > best_len) {
if (record_matches) {
best_len = len;
lz_matchptr->length = len;
lz_matchptr->offset = in_next - matchptr;
lz_matchptr++;
}
if (len >= nice_len) {
*pending_lt_ptr = *bt_left_child(mf, cur_node);
*pending_gt_ptr = *bt_right_child(mf, cur_node);
return lz_matchptr;
}
}
}
if (matchptr[len] < in_next[len]) {
*pending_lt_ptr = cur_node;
pending_lt_ptr = bt_right_child(mf, cur_node);
cur_node = *pending_lt_ptr;
best_lt_len = len;
if (best_gt_len < len)
len = best_gt_len;
} else {
*pending_gt_ptr = cur_node;
pending_gt_ptr = bt_left_child(mf, cur_node);
cur_node = *pending_gt_ptr;
best_gt_len = len;
if (best_lt_len < len)
len = best_lt_len;
}
if (cur_node <= cutoff || !--depth_remaining) {
*pending_lt_ptr = MATCHFINDER_INITVAL;
*pending_gt_ptr = MATCHFINDER_INITVAL;
return lz_matchptr;
}
}
}
/*
* Retrieve a list of matches for the sequence at the current position.
*
* @mf
* The matchfinder structure.
* @in_base
* Pointer to the next byte in the input buffer to process _at the last
* time bt_matchfinder_init() or bt_matchfinder_slide_window() was called_.
* @cur_pos
* The current position in the input buffer relative to @in_base (the
* position of the sequence being matched against).
* @max_len
* The maximum permissible match length at this position. Must be >=
* BT_MATCHFINDER_REQUIRED_NBYTES.
* @nice_len
* Stop searching if a match of at least this length is found.
* Must be <= @max_len.
* @max_search_depth
* Limit on the number of potential matches to consider. Must be >= 1.
* @next_hashes
* The precomputed hash codes for the sequence beginning at @in_next.
* These will be used and then updated with the precomputed hashcodes for
* the sequence beginning at @in_next + 1.
* @lz_matchptr
* An array in which this function will record the matches. The recorded
* matches will be sorted by strictly increasing length and (non-strictly)
* increasing offset. The maximum number of matches that may be found is
* 'nice_len - 2'.
*
* The return value is a pointer to the next available slot in the @lz_matchptr
* array. (If no matches were found, this will be the same as @lz_matchptr.)
*/
static forceinline struct lz_match *
bt_matchfinder_get_matches(struct bt_matchfinder *mf,
const u8 *in_base,
ptrdiff_t cur_pos,
u32 max_len,
u32 nice_len,
u32 max_search_depth,
u32 next_hashes[2],
struct lz_match *lz_matchptr)
{
return bt_matchfinder_advance_one_byte(mf,
in_base,
cur_pos,
max_len,
nice_len,
max_search_depth,
next_hashes,
lz_matchptr,
true);
}
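/*
 * Typical usage sketch (hypothetical names and sizes, not from the original
 * file).  A compressor collects the candidates for one position, then either
 * emits the best match or advances with bt_matchfinder_skip_byte():
 *
 *	struct lz_match matches[258];	// at least nice_len - 2 slots
 *	u32 next_hashes[2] = { 0, 0 };
 *	struct lz_match *end;
 *
 *	bt_matchfinder_init(mf);
 *	...
 *	end = bt_matchfinder_get_matches(mf, in_base, cur_pos, max_len,
 *					 nice_len, max_search_depth,
 *					 next_hashes, matches);
 *	// 'end - matches' candidates, sorted by strictly increasing length
 */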
/*
* Advance the matchfinder, but don't record any matches.
*
* This is very similar to bt_matchfinder_get_matches() because both functions
* must do hashing and tree re-rooting.
*/
static forceinline void
bt_matchfinder_skip_byte(struct bt_matchfinder *mf,
const u8 *in_base,
ptrdiff_t cur_pos,
u32 nice_len,
u32 max_search_depth,
u32 next_hashes[2])
{
bt_matchfinder_advance_one_byte(mf,
in_base,
cur_pos,
nice_len,
nice_len,
max_search_depth,
next_hashes,
NULL,
false);
}
#endif /* LIB_BT_MATCHFINDER_H */

View file

@@ -1,93 +0,0 @@
/*
* cpu_features_common.h - code shared by all lib/$arch/cpu_features.c
*
* Copyright 2020 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_CPU_FEATURES_COMMON_H
#define LIB_CPU_FEATURES_COMMON_H
#if defined(TEST_SUPPORT__DO_NOT_USE) && !defined(FREESTANDING)
/* for strdup() and strtok_r() */
# undef _ANSI_SOURCE
# ifndef __APPLE__
# undef _GNU_SOURCE
# define _GNU_SOURCE
# endif
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
#endif
#include "lib_common.h"
struct cpu_feature {
u32 bit;
const char *name;
};
#if defined(TEST_SUPPORT__DO_NOT_USE) && !defined(FREESTANDING)
/* Disable any features that are listed in $LIBDEFLATE_DISABLE_CPU_FEATURES. */
static inline void
disable_cpu_features_for_testing(u32 *features,
const struct cpu_feature *feature_table,
size_t feature_table_length)
{
char *env_value, *strbuf, *p, *saveptr = NULL;
size_t i;
env_value = getenv("LIBDEFLATE_DISABLE_CPU_FEATURES");
if (!env_value)
return;
strbuf = strdup(env_value);
if (!strbuf)
abort();
p = strtok_r(strbuf, ",", &saveptr);
while (p) {
for (i = 0; i < feature_table_length; i++) {
if (strcmp(p, feature_table[i].name) == 0) {
*features &= ~feature_table[i].bit;
break;
}
}
if (i == feature_table_length) {
fprintf(stderr,
"unrecognized feature in LIBDEFLATE_DISABLE_CPU_FEATURES: \"%s\"\n",
p);
abort();
}
p = strtok_r(NULL, ",", &saveptr);
}
free(strbuf);
}
#else /* TEST_SUPPORT__DO_NOT_USE */
static inline void
disable_cpu_features_for_testing(u32 *features,
const struct cpu_feature *feature_table,
size_t feature_table_length)
{
}
#endif /* !TEST_SUPPORT__DO_NOT_USE */
#endif /* LIB_CPU_FEATURES_COMMON_H */
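/*
 * Hedged example (not part of the original header): how an
 * architecture-specific lib/$arch/cpu_features.c might use the helper
 * above.  The feature names and bit values are made up for illustration;
 * real callers define their own table and pass in the bitmask they
 * detected.  disable_cpu_features_for_testing() only does real work in
 * test builds (TEST_SUPPORT__DO_NOT_USE); there, with
 * LIBDEFLATE_DISABLE_CPU_FEATURES="avx2,bmi2" set in the environment, the
 * matching bits would be cleared from the returned mask.
 */
static const struct cpu_feature example_feature_table[] = {
	{ 0x01, "sse2" },
	{ 0x02, "avx2" },
	{ 0x04, "bmi2" },
};

static u32
example_filter_cpu_features(u32 detected_bits)
{
	u32 features = detected_bits;

	disable_cpu_features_for_testing(&features, example_feature_table,
					 sizeof(example_feature_table) /
					 sizeof(example_feature_table[0]));
	return features;
}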

View file

@@ -1,262 +0,0 @@
/*
* crc32.c - CRC-32 checksum algorithm for the gzip format
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* High-level description of CRC
* =============================
*
* Consider a bit sequence 'bits[1...len]'. Interpret 'bits' as the "message"
* polynomial M(x) with coefficients in GF(2) (the field of integers modulo 2),
* where the coefficient of 'x^i' is 'bits[len - i]'. Then, compute:
*
* R(x) = M(x)*x^n mod G(x)
*
* where G(x) is a selected "generator" polynomial of degree 'n'. The remainder
* R(x) is a polynomial of max degree 'n - 1'. The CRC of 'bits' is R(x)
* interpreted as a bitstring of length 'n'.
*
* CRC used in gzip
* ================
*
* In the gzip format (RFC 1952):
*
* - The bitstring to checksum is formed from the bytes of the uncompressed
* data by concatenating the bits from the bytes in order, proceeding
* from the low-order bit to the high-order bit within each byte.
*
* - The generator polynomial G(x) is: x^32 + x^26 + x^23 + x^22 + x^16 +
* x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1.
* Consequently, the CRC length is 32 bits ("CRC-32").
*
* - The highest order 32 coefficients of M(x)*x^n are inverted.
*
* - All 32 coefficients of R(x) are inverted.
*
* The two inversions cause added leading and trailing zero bits to affect the
* resulting CRC, whereas with a regular CRC such bits would have no effect on
* the CRC.
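 *
 * (With the polynomial<=>bitstring mapping used below, where the highest
 * order bit holds the coefficient of x^0, the coefficients of G(x) other
 * than the x^32 term pack into the 32-bit constant 0xEDB88320, which
 * appears as 'divisor' in the example implementations that follow.)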
*
* Computation and optimizations
* =============================
*
* We can compute R(x) through "long division", maintaining only 32 bits of
* state at any given time. Multiplication by 'x' can be implemented as
* right-shifting by 1 (assuming the polynomial<=>bitstring mapping where the
* highest order bit represents the coefficient of x^0), and both addition and
* subtraction can be implemented as bitwise exclusive OR (since we are working
* in GF(2)). Here is an unoptimized implementation:
*
* static u32 crc32_gzip(const u8 *p, size_t len)
* {
* u32 crc = 0;
* const u32 divisor = 0xEDB88320;
*
* for (size_t i = 0; i < len * 8 + 32; i++) {
* int bit;
* u32 multiple;
*
* if (i < len * 8)
* bit = (p[i / 8] >> (i % 8)) & 1;
* else
* bit = 0; // one of the 32 appended 0 bits
*
* if (i < 32) // the first 32 bits are inverted
* bit ^= 1;
*
* if (crc & 1)
* multiple = divisor;
* else
* multiple = 0;
*
* crc >>= 1;
* crc |= (u32)bit << 31;
* crc ^= multiple;
* }
*
* return ~crc;
* }
*
* In this implementation, the 32-bit integer 'crc' maintains the remainder of
* the currently processed portion of the message (with 32 zero bits appended)
* when divided by the generator polynomial. 'crc' is the representation of
* R(x), and 'divisor' is the representation of G(x) excluding the x^32
* coefficient. For each bit to process, we multiply R(x) by 'x^1', then add
* 'x^0' if the new bit is a 1. If this causes R(x) to gain a nonzero x^32
* term, then we subtract G(x) from R(x).
*
* We can speed this up by taking advantage of the fact that XOR is commutative
* and associative, so the order in which we combine the inputs into 'crc' is
* unimportant. And since each message bit we add doesn't affect the choice of
* 'multiple' until 32 bits later, we need not actually add each message bit
* until that point:
*
* static u32 crc32_gzip(const u8 *p, size_t len)
* {
* u32 crc = ~0;
* const u32 divisor = 0xEDB88320;
*
* for (size_t i = 0; i < len * 8; i++) {
* int bit;
* u32 multiple;
*
* bit = (p[i / 8] >> (i % 8)) & 1;
* crc ^= bit;
* if (crc & 1)
* multiple = divisor;
* else
* multiple = 0;
* crc >>= 1;
* crc ^= multiple;
* }
*
* return ~crc;
* }
*
* With the above implementation we get the effect of 32 appended 0 bits for
* free; they never affect the choice of a divisor, nor would they change the
* value of 'crc' if they were to be actually XOR'ed in. And by starting with a
* remainder of all 1 bits, we get the effect of complementing the first 32
* message bits.
*
* The next optimization is to process the input in multi-bit units. Suppose
* that we insert the next 'n' message bits into the remainder. Then we get an
* intermediate remainder of length '32 + n' bits, and the CRC of the extra 'n'
* bits is the amount by which the low 32 bits of the remainder will change as a
* result of cancelling out those 'n' bits. Taking n=8 (one byte) and
* precomputing a table containing the CRC of each possible byte, we get
* crc32_slice1() defined below.
*
* As a further optimization, we could increase the multi-bit unit size to 16.
* However, that is inefficient because the table size explodes from 256 entries
* (1024 bytes) to 65536 entries (262144 bytes), which wastes memory and won't
* fit in L1 cache on typical processors.
*
* However, we can actually process 4 bytes at a time using 4 different tables
* with 256 entries each. Logically, we form a 64-bit intermediate remainder
* and cancel out the high 32 bits in 8-bit chunks. Bits 32-39 are cancelled
 * out by the CRC of those bits, whereas bits 40-47 are cancelled out by the
* CRC of those bits with 8 zero bits appended, and so on.
*
* In crc32_slice8(), this method is extended to 8 bytes at a time. The
* intermediate remainder (which we never actually store explicitly) is 96 bits.
*
* On CPUs that support fast carryless multiplication, CRCs can be computed even
* more quickly via "folding". See e.g. the x86 PCLMUL implementation.
*/
#include "lib_common.h"
#include "crc32_multipliers.h"
#include "crc32_tables.h"
/* This is the default implementation. It uses the slice-by-8 method. */
static u32 MAYBE_UNUSED
crc32_slice8(u32 crc, const u8 *p, size_t len)
{
const u8 * const end = p + len;
const u8 *end64;
for (; ((uintptr_t)p & 7) && p != end; p++)
crc = (crc >> 8) ^ crc32_slice8_table[(u8)crc ^ *p];
end64 = p + ((end - p) & ~7);
for (; p != end64; p += 8) {
u32 v1 = le32_bswap(*(const u32 *)(p + 0));
u32 v2 = le32_bswap(*(const u32 *)(p + 4));
crc = crc32_slice8_table[0x700 + (u8)((crc ^ v1) >> 0)] ^
crc32_slice8_table[0x600 + (u8)((crc ^ v1) >> 8)] ^
crc32_slice8_table[0x500 + (u8)((crc ^ v1) >> 16)] ^
crc32_slice8_table[0x400 + (u8)((crc ^ v1) >> 24)] ^
crc32_slice8_table[0x300 + (u8)(v2 >> 0)] ^
crc32_slice8_table[0x200 + (u8)(v2 >> 8)] ^
crc32_slice8_table[0x100 + (u8)(v2 >> 16)] ^
crc32_slice8_table[0x000 + (u8)(v2 >> 24)];
}
for (; p != end; p++)
crc = (crc >> 8) ^ crc32_slice8_table[(u8)crc ^ *p];
return crc;
}
/*
* This is a more lightweight generic implementation, which can be used as a
* subroutine by architecture-specific implementations to process small amounts
* of unaligned data at the beginning and/or end of the buffer.
*/
static forceinline u32 MAYBE_UNUSED
crc32_slice1(u32 crc, const u8 *p, size_t len)
{
size_t i;
for (i = 0; i < len; i++)
crc = (crc >> 8) ^ crc32_slice1_table[(u8)crc ^ p[i]];
return crc;
}
/* Include architecture-specific implementation(s) if available. */
#undef DEFAULT_IMPL
#undef arch_select_crc32_func
typedef u32 (*crc32_func_t)(u32 crc, const u8 *p, size_t len);
#if defined(ARCH_ARM32) || defined(ARCH_ARM64)
# include "arm/crc32_impl.h"
#elif defined(ARCH_X86_32) || defined(ARCH_X86_64)
# include "x86/crc32_impl.h"
#endif
#ifndef DEFAULT_IMPL
# define DEFAULT_IMPL crc32_slice8
#endif
#ifdef arch_select_crc32_func
static u32 dispatch_crc32(u32 crc, const u8 *p, size_t len);
static volatile crc32_func_t crc32_impl = dispatch_crc32;
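/*
 * Note: crc32_impl initially points to dispatch_crc32(), so the first call
 * runs the CPU-feature check below, caches the selected function back into
 * crc32_impl, and forwards that call; subsequent calls go directly to the
 * chosen implementation.
 */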
/* Choose the best implementation at runtime. */
static u32 dispatch_crc32(u32 crc, const u8 *p, size_t len)
{
crc32_func_t f = arch_select_crc32_func();
if (f == NULL)
f = DEFAULT_IMPL;
crc32_impl = f;
return f(crc, p, len);
}
#else
/* The best implementation is statically known, so call it directly. */
#define crc32_impl DEFAULT_IMPL
#endif
LIBDEFLATEAPI u32
libdeflate_crc32(u32 crc, const void *p, size_t len)
{
if (p == NULL) /* Return initial value. */
return 0;
return ~crc32_impl(~crc, p, len);
}
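/*
 * Hedged usage example (not part of the original file): the public entry
 * point above starts from crc == 0 (or from a previous return value when
 * checksumming a buffer in pieces) and can be verified against the
 * standard CRC-32 check value, CRC32("123456789") == 0xCBF43926.
 */
static int MAYBE_UNUSED
example_crc32_check(void)
{
	static const u8 msg[9] = "123456789";
	u32 whole, split;

	/* One-shot computation over the whole buffer. */
	whole = libdeflate_crc32(0, msg, sizeof(msg));

	/* Incremental computation over two chunks gives the same result. */
	split = libdeflate_crc32(0, msg, 4);
	split = libdeflate_crc32(split, msg + 4, sizeof(msg) - 4);

	return whole == 0xCBF43926 && split == whole;
}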

View file

@@ -1,329 +0,0 @@
/*
* crc32_multipliers.h - constants for CRC-32 folding
*
* THIS FILE WAS GENERATED BY gen_crc32_multipliers.c. DO NOT EDIT.
*/
#define CRC32_1VECS_MULT_1 0xae689191 /* x^159 mod G(x) */
#define CRC32_1VECS_MULT_2 0xccaa009e /* x^95 mod G(x) */
#define CRC32_1VECS_MULTS { CRC32_1VECS_MULT_1, CRC32_1VECS_MULT_2 }
#define CRC32_2VECS_MULT_1 0xf1da05aa /* x^287 mod G(x) */
#define CRC32_2VECS_MULT_2 0x81256527 /* x^223 mod G(x) */
#define CRC32_2VECS_MULTS { CRC32_2VECS_MULT_1, CRC32_2VECS_MULT_2 }
#define CRC32_3VECS_MULT_1 0x3db1ecdc /* x^415 mod G(x) */
#define CRC32_3VECS_MULT_2 0xaf449247 /* x^351 mod G(x) */
#define CRC32_3VECS_MULTS { CRC32_3VECS_MULT_1, CRC32_3VECS_MULT_2 }
#define CRC32_4VECS_MULT_1 0x8f352d95 /* x^543 mod G(x) */
#define CRC32_4VECS_MULT_2 0x1d9513d7 /* x^479 mod G(x) */
#define CRC32_4VECS_MULTS { CRC32_4VECS_MULT_1, CRC32_4VECS_MULT_2 }
#define CRC32_5VECS_MULT_1 0x1c279815 /* x^671 mod G(x) */
#define CRC32_5VECS_MULT_2 0xae0b5394 /* x^607 mod G(x) */
#define CRC32_5VECS_MULTS { CRC32_5VECS_MULT_1, CRC32_5VECS_MULT_2 }
#define CRC32_6VECS_MULT_1 0xdf068dc2 /* x^799 mod G(x) */
#define CRC32_6VECS_MULT_2 0x57c54819 /* x^735 mod G(x) */
#define CRC32_6VECS_MULTS { CRC32_6VECS_MULT_1, CRC32_6VECS_MULT_2 }
#define CRC32_7VECS_MULT_1 0x31f8303f /* x^927 mod G(x) */
#define CRC32_7VECS_MULT_2 0x0cbec0ed /* x^863 mod G(x) */
#define CRC32_7VECS_MULTS { CRC32_7VECS_MULT_1, CRC32_7VECS_MULT_2 }
#define CRC32_8VECS_MULT_1 0x33fff533 /* x^1055 mod G(x) */
#define CRC32_8VECS_MULT_2 0x910eeec1 /* x^991 mod G(x) */
#define CRC32_8VECS_MULTS { CRC32_8VECS_MULT_1, CRC32_8VECS_MULT_2 }
#define CRC32_9VECS_MULT_1 0x26b70c3d /* x^1183 mod G(x) */
#define CRC32_9VECS_MULT_2 0x3f41287a /* x^1119 mod G(x) */
#define CRC32_9VECS_MULTS { CRC32_9VECS_MULT_1, CRC32_9VECS_MULT_2 }
#define CRC32_10VECS_MULT_1 0xe3543be0 /* x^1311 mod G(x) */
#define CRC32_10VECS_MULT_2 0x9026d5b1 /* x^1247 mod G(x) */
#define CRC32_10VECS_MULTS { CRC32_10VECS_MULT_1, CRC32_10VECS_MULT_2 }
#define CRC32_11VECS_MULT_1 0x5a1bb05d /* x^1439 mod G(x) */
#define CRC32_11VECS_MULT_2 0xd1df2327 /* x^1375 mod G(x) */
#define CRC32_11VECS_MULTS { CRC32_11VECS_MULT_1, CRC32_11VECS_MULT_2 }
#define CRC32_12VECS_MULT_1 0x596c8d81 /* x^1567 mod G(x) */
#define CRC32_12VECS_MULT_2 0xf5e48c85 /* x^1503 mod G(x) */
#define CRC32_12VECS_MULTS { CRC32_12VECS_MULT_1, CRC32_12VECS_MULT_2 }
#define CRC32_FINAL_MULT 0xb8bc6765 /* x^63 mod G(x) */
#define CRC32_BARRETT_CONSTANT_1 0x00000001f7011641ULL /* floor(x^64 / G(x)) */
#define CRC32_BARRETT_CONSTANT_2 0x00000001db710641ULL /* G(x) */
#define CRC32_BARRETT_CONSTANTS { CRC32_BARRETT_CONSTANT_1, CRC32_BARRETT_CONSTANT_2 }
#define CRC32_NUM_CHUNKS 4
#define CRC32_MIN_VARIABLE_CHUNK_LEN 128UL
#define CRC32_MAX_VARIABLE_CHUNK_LEN 16384UL
/* Multipliers for implementations that use a variable chunk length */
static const u32 crc32_mults_for_chunklen[][CRC32_NUM_CHUNKS - 1] MAYBE_UNUSED = {
{ 0 /* unused row */ },
/* chunk_len=128 */
{ 0xd31343ea /* x^3039 mod G(x) */, 0xe95c1271 /* x^2015 mod G(x) */, 0x910eeec1 /* x^991 mod G(x) */, },
/* chunk_len=256 */
{ 0x1d6708a0 /* x^6111 mod G(x) */, 0x0c30f51d /* x^4063 mod G(x) */, 0xe95c1271 /* x^2015 mod G(x) */, },
/* chunk_len=384 */
{ 0xdb3839f3 /* x^9183 mod G(x) */, 0x1d6708a0 /* x^6111 mod G(x) */, 0xd31343ea /* x^3039 mod G(x) */, },
/* chunk_len=512 */
{ 0x1753ab84 /* x^12255 mod G(x) */, 0xbbf2f6d6 /* x^8159 mod G(x) */, 0x0c30f51d /* x^4063 mod G(x) */, },
/* chunk_len=640 */
{ 0x3796455c /* x^15327 mod G(x) */, 0xb8e0e4a8 /* x^10207 mod G(x) */, 0xc352f6de /* x^5087 mod G(x) */, },
/* chunk_len=768 */
{ 0x3954de39 /* x^18399 mod G(x) */, 0x1753ab84 /* x^12255 mod G(x) */, 0x1d6708a0 /* x^6111 mod G(x) */, },
/* chunk_len=896 */
{ 0x632d78c5 /* x^21471 mod G(x) */, 0x3fc33de4 /* x^14303 mod G(x) */, 0x9a1b53c8 /* x^7135 mod G(x) */, },
/* chunk_len=1024 */
{ 0xa0decef3 /* x^24543 mod G(x) */, 0x7b4aa8b7 /* x^16351 mod G(x) */, 0xbbf2f6d6 /* x^8159 mod G(x) */, },
/* chunk_len=1152 */
{ 0xe9c09bb0 /* x^27615 mod G(x) */, 0x3954de39 /* x^18399 mod G(x) */, 0xdb3839f3 /* x^9183 mod G(x) */, },
/* chunk_len=1280 */
{ 0xd51917a4 /* x^30687 mod G(x) */, 0xcae68461 /* x^20447 mod G(x) */, 0xb8e0e4a8 /* x^10207 mod G(x) */, },
/* chunk_len=1408 */
{ 0x154a8a62 /* x^33759 mod G(x) */, 0x41e7589c /* x^22495 mod G(x) */, 0x3e9a43cd /* x^11231 mod G(x) */, },
/* chunk_len=1536 */
{ 0xf196555d /* x^36831 mod G(x) */, 0xa0decef3 /* x^24543 mod G(x) */, 0x1753ab84 /* x^12255 mod G(x) */, },
/* chunk_len=1664 */
{ 0x8eec2999 /* x^39903 mod G(x) */, 0xefb0a128 /* x^26591 mod G(x) */, 0x6044fbb0 /* x^13279 mod G(x) */, },
/* chunk_len=1792 */
{ 0x27892abf /* x^42975 mod G(x) */, 0x48d72bb1 /* x^28639 mod G(x) */, 0x3fc33de4 /* x^14303 mod G(x) */, },
/* chunk_len=1920 */
{ 0x77bc2419 /* x^46047 mod G(x) */, 0xd51917a4 /* x^30687 mod G(x) */, 0x3796455c /* x^15327 mod G(x) */, },
/* chunk_len=2048 */
{ 0xcea114a5 /* x^49119 mod G(x) */, 0x68c0a2c5 /* x^32735 mod G(x) */, 0x7b4aa8b7 /* x^16351 mod G(x) */, },
/* chunk_len=2176 */
{ 0xa1077e85 /* x^52191 mod G(x) */, 0x188cc628 /* x^34783 mod G(x) */, 0x0c21f835 /* x^17375 mod G(x) */, },
/* chunk_len=2304 */
{ 0xc5ed75e1 /* x^55263 mod G(x) */, 0xf196555d /* x^36831 mod G(x) */, 0x3954de39 /* x^18399 mod G(x) */, },
/* chunk_len=2432 */
{ 0xca4fba3f /* x^58335 mod G(x) */, 0x0acfa26f /* x^38879 mod G(x) */, 0x6cb21510 /* x^19423 mod G(x) */, },
/* chunk_len=2560 */
{ 0xcf5bcdc4 /* x^61407 mod G(x) */, 0x4fae7fc0 /* x^40927 mod G(x) */, 0xcae68461 /* x^20447 mod G(x) */, },
/* chunk_len=2688 */
{ 0xf36b9d16 /* x^64479 mod G(x) */, 0x27892abf /* x^42975 mod G(x) */, 0x632d78c5 /* x^21471 mod G(x) */, },
/* chunk_len=2816 */
{ 0xf76fd988 /* x^67551 mod G(x) */, 0xed5c39b1 /* x^45023 mod G(x) */, 0x41e7589c /* x^22495 mod G(x) */, },
/* chunk_len=2944 */
{ 0x6c45d92e /* x^70623 mod G(x) */, 0xff809fcd /* x^47071 mod G(x) */, 0x0c46baec /* x^23519 mod G(x) */, },
/* chunk_len=3072 */
{ 0x6116b82b /* x^73695 mod G(x) */, 0xcea114a5 /* x^49119 mod G(x) */, 0xa0decef3 /* x^24543 mod G(x) */, },
/* chunk_len=3200 */
{ 0x4d9899bb /* x^76767 mod G(x) */, 0x9f9d8d9c /* x^51167 mod G(x) */, 0x53deb236 /* x^25567 mod G(x) */, },
/* chunk_len=3328 */
{ 0x3e7c93b9 /* x^79839 mod G(x) */, 0x6666b805 /* x^53215 mod G(x) */, 0xefb0a128 /* x^26591 mod G(x) */, },
/* chunk_len=3456 */
{ 0x388b20ac /* x^82911 mod G(x) */, 0xc5ed75e1 /* x^55263 mod G(x) */, 0xe9c09bb0 /* x^27615 mod G(x) */, },
/* chunk_len=3584 */
{ 0x0956d953 /* x^85983 mod G(x) */, 0x97fbdb14 /* x^57311 mod G(x) */, 0x48d72bb1 /* x^28639 mod G(x) */, },
/* chunk_len=3712 */
{ 0x55cb4dfe /* x^89055 mod G(x) */, 0x1b37c832 /* x^59359 mod G(x) */, 0xc07331b3 /* x^29663 mod G(x) */, },
/* chunk_len=3840 */
{ 0x52222fea /* x^92127 mod G(x) */, 0xcf5bcdc4 /* x^61407 mod G(x) */, 0xd51917a4 /* x^30687 mod G(x) */, },
/* chunk_len=3968 */
{ 0x0603989b /* x^95199 mod G(x) */, 0xb03c8112 /* x^63455 mod G(x) */, 0x5e04b9a5 /* x^31711 mod G(x) */, },
/* chunk_len=4096 */
{ 0x4470c029 /* x^98271 mod G(x) */, 0x2339d155 /* x^65503 mod G(x) */, 0x68c0a2c5 /* x^32735 mod G(x) */, },
/* chunk_len=4224 */
{ 0xb6f35093 /* x^101343 mod G(x) */, 0xf76fd988 /* x^67551 mod G(x) */, 0x154a8a62 /* x^33759 mod G(x) */, },
/* chunk_len=4352 */
{ 0xc46805ba /* x^104415 mod G(x) */, 0x416f9449 /* x^69599 mod G(x) */, 0x188cc628 /* x^34783 mod G(x) */, },
/* chunk_len=4480 */
{ 0xc3876592 /* x^107487 mod G(x) */, 0x4b809189 /* x^71647 mod G(x) */, 0xc35cf6e7 /* x^35807 mod G(x) */, },
/* chunk_len=4608 */
{ 0x5b0c98b9 /* x^110559 mod G(x) */, 0x6116b82b /* x^73695 mod G(x) */, 0xf196555d /* x^36831 mod G(x) */, },
/* chunk_len=4736 */
{ 0x30d13e5f /* x^113631 mod G(x) */, 0x4c5a315a /* x^75743 mod G(x) */, 0x8c224466 /* x^37855 mod G(x) */, },
/* chunk_len=4864 */
{ 0x54afca53 /* x^116703 mod G(x) */, 0xbccfa2c1 /* x^77791 mod G(x) */, 0x0acfa26f /* x^38879 mod G(x) */, },
/* chunk_len=4992 */
{ 0x93102436 /* x^119775 mod G(x) */, 0x3e7c93b9 /* x^79839 mod G(x) */, 0x8eec2999 /* x^39903 mod G(x) */, },
/* chunk_len=5120 */
{ 0xbd2655a8 /* x^122847 mod G(x) */, 0x3e116c9d /* x^81887 mod G(x) */, 0x4fae7fc0 /* x^40927 mod G(x) */, },
/* chunk_len=5248 */
{ 0x70cd7f26 /* x^125919 mod G(x) */, 0x408e57f2 /* x^83935 mod G(x) */, 0x1691be45 /* x^41951 mod G(x) */, },
/* chunk_len=5376 */
{ 0x2d546c53 /* x^128991 mod G(x) */, 0x0956d953 /* x^85983 mod G(x) */, 0x27892abf /* x^42975 mod G(x) */, },
/* chunk_len=5504 */
{ 0xb53410a8 /* x^132063 mod G(x) */, 0x42ebf0ad /* x^88031 mod G(x) */, 0x161f3c12 /* x^43999 mod G(x) */, },
/* chunk_len=5632 */
{ 0x67a93f75 /* x^135135 mod G(x) */, 0xcf3233e4 /* x^90079 mod G(x) */, 0xed5c39b1 /* x^45023 mod G(x) */, },
/* chunk_len=5760 */
{ 0x9830ac33 /* x^138207 mod G(x) */, 0x52222fea /* x^92127 mod G(x) */, 0x77bc2419 /* x^46047 mod G(x) */, },
/* chunk_len=5888 */
{ 0xb0b6fc3e /* x^141279 mod G(x) */, 0x2fde73f8 /* x^94175 mod G(x) */, 0xff809fcd /* x^47071 mod G(x) */, },
/* chunk_len=6016 */
{ 0x84170f16 /* x^144351 mod G(x) */, 0xced90d99 /* x^96223 mod G(x) */, 0x30de0f98 /* x^48095 mod G(x) */, },
/* chunk_len=6144 */
{ 0xd7017a0c /* x^147423 mod G(x) */, 0x4470c029 /* x^98271 mod G(x) */, 0xcea114a5 /* x^49119 mod G(x) */, },
/* chunk_len=6272 */
{ 0xadb25de6 /* x^150495 mod G(x) */, 0x84f40beb /* x^100319 mod G(x) */, 0x2b7e0e1b /* x^50143 mod G(x) */, },
/* chunk_len=6400 */
{ 0x8282fddc /* x^153567 mod G(x) */, 0xec855937 /* x^102367 mod G(x) */, 0x9f9d8d9c /* x^51167 mod G(x) */, },
/* chunk_len=6528 */
{ 0x46362bee /* x^156639 mod G(x) */, 0xc46805ba /* x^104415 mod G(x) */, 0xa1077e85 /* x^52191 mod G(x) */, },
/* chunk_len=6656 */
{ 0xb9077a01 /* x^159711 mod G(x) */, 0xdf7a24ac /* x^106463 mod G(x) */, 0x6666b805 /* x^53215 mod G(x) */, },
/* chunk_len=6784 */
{ 0xf51d9bc6 /* x^162783 mod G(x) */, 0x2b52dc39 /* x^108511 mod G(x) */, 0x7e774cf6 /* x^54239 mod G(x) */, },
/* chunk_len=6912 */
{ 0x4ca19a29 /* x^165855 mod G(x) */, 0x5b0c98b9 /* x^110559 mod G(x) */, 0xc5ed75e1 /* x^55263 mod G(x) */, },
/* chunk_len=7040 */
{ 0xdc0fc3fc /* x^168927 mod G(x) */, 0xb939fcdf /* x^112607 mod G(x) */, 0x3678fed2 /* x^56287 mod G(x) */, },
/* chunk_len=7168 */
{ 0x63c3d167 /* x^171999 mod G(x) */, 0x70f9947d /* x^114655 mod G(x) */, 0x97fbdb14 /* x^57311 mod G(x) */, },
/* chunk_len=7296 */
{ 0x5851d254 /* x^175071 mod G(x) */, 0x54afca53 /* x^116703 mod G(x) */, 0xca4fba3f /* x^58335 mod G(x) */, },
/* chunk_len=7424 */
{ 0xfeacf2a1 /* x^178143 mod G(x) */, 0x7a3c0a6a /* x^118751 mod G(x) */, 0x1b37c832 /* x^59359 mod G(x) */, },
/* chunk_len=7552 */
{ 0x93b7edc8 /* x^181215 mod G(x) */, 0x1fea4d2a /* x^120799 mod G(x) */, 0x58fa96ee /* x^60383 mod G(x) */, },
/* chunk_len=7680 */
{ 0x5539e44a /* x^184287 mod G(x) */, 0xbd2655a8 /* x^122847 mod G(x) */, 0xcf5bcdc4 /* x^61407 mod G(x) */, },
/* chunk_len=7808 */
{ 0xde32a3d2 /* x^187359 mod G(x) */, 0x4ff61aa1 /* x^124895 mod G(x) */, 0x6a6a3694 /* x^62431 mod G(x) */, },
/* chunk_len=7936 */
{ 0xf0baeeb6 /* x^190431 mod G(x) */, 0x7ae2f6f4 /* x^126943 mod G(x) */, 0xb03c8112 /* x^63455 mod G(x) */, },
/* chunk_len=8064 */
{ 0xbe15887f /* x^193503 mod G(x) */, 0x2d546c53 /* x^128991 mod G(x) */, 0xf36b9d16 /* x^64479 mod G(x) */, },
/* chunk_len=8192 */
{ 0x64f34a05 /* x^196575 mod G(x) */, 0xe0ee5efe /* x^131039 mod G(x) */, 0x2339d155 /* x^65503 mod G(x) */, },
/* chunk_len=8320 */
{ 0x1b6d1aea /* x^199647 mod G(x) */, 0xfeafb67c /* x^133087 mod G(x) */, 0x4fb001a8 /* x^66527 mod G(x) */, },
/* chunk_len=8448 */
{ 0x82adb0b8 /* x^202719 mod G(x) */, 0x67a93f75 /* x^135135 mod G(x) */, 0xf76fd988 /* x^67551 mod G(x) */, },
/* chunk_len=8576 */
{ 0x694587c7 /* x^205791 mod G(x) */, 0x3b34408b /* x^137183 mod G(x) */, 0xeccb2978 /* x^68575 mod G(x) */, },
/* chunk_len=8704 */
{ 0xd2fc57c3 /* x^208863 mod G(x) */, 0x07fcf8c6 /* x^139231 mod G(x) */, 0x416f9449 /* x^69599 mod G(x) */, },
/* chunk_len=8832 */
{ 0x9dd6837c /* x^211935 mod G(x) */, 0xb0b6fc3e /* x^141279 mod G(x) */, 0x6c45d92e /* x^70623 mod G(x) */, },
/* chunk_len=8960 */
{ 0x3a9d1f97 /* x^215007 mod G(x) */, 0xefd033b2 /* x^143327 mod G(x) */, 0x4b809189 /* x^71647 mod G(x) */, },
/* chunk_len=9088 */
{ 0x1eee1d2a /* x^218079 mod G(x) */, 0xf2a6e46e /* x^145375 mod G(x) */, 0x55b4c814 /* x^72671 mod G(x) */, },
/* chunk_len=9216 */
{ 0xb57c7728 /* x^221151 mod G(x) */, 0xd7017a0c /* x^147423 mod G(x) */, 0x6116b82b /* x^73695 mod G(x) */, },
/* chunk_len=9344 */
{ 0xf2fc5d61 /* x^224223 mod G(x) */, 0x242aac86 /* x^149471 mod G(x) */, 0x05245cf0 /* x^74719 mod G(x) */, },
/* chunk_len=9472 */
{ 0x26387824 /* x^227295 mod G(x) */, 0xc15c4ca5 /* x^151519 mod G(x) */, 0x4c5a315a /* x^75743 mod G(x) */, },
/* chunk_len=9600 */
{ 0x8c151e77 /* x^230367 mod G(x) */, 0x8282fddc /* x^153567 mod G(x) */, 0x4d9899bb /* x^76767 mod G(x) */, },
/* chunk_len=9728 */
{ 0x8ea1f680 /* x^233439 mod G(x) */, 0xf5ff6cdd /* x^155615 mod G(x) */, 0xbccfa2c1 /* x^77791 mod G(x) */, },
/* chunk_len=9856 */
{ 0xe8cf3d2a /* x^236511 mod G(x) */, 0x338b1fb1 /* x^157663 mod G(x) */, 0xeda61f70 /* x^78815 mod G(x) */, },
/* chunk_len=9984 */
{ 0x21f15b59 /* x^239583 mod G(x) */, 0xb9077a01 /* x^159711 mod G(x) */, 0x3e7c93b9 /* x^79839 mod G(x) */, },
/* chunk_len=10112 */
{ 0x6f68d64a /* x^242655 mod G(x) */, 0x901b0161 /* x^161759 mod G(x) */, 0xb9fd3537 /* x^80863 mod G(x) */, },
/* chunk_len=10240 */
{ 0x71b74d95 /* x^245727 mod G(x) */, 0xf5ddd5ad /* x^163807 mod G(x) */, 0x3e116c9d /* x^81887 mod G(x) */, },
/* chunk_len=10368 */
{ 0x4c2e7261 /* x^248799 mod G(x) */, 0x4ca19a29 /* x^165855 mod G(x) */, 0x388b20ac /* x^82911 mod G(x) */, },
/* chunk_len=10496 */
{ 0x8a2d38e8 /* x^251871 mod G(x) */, 0xd27ee0a1 /* x^167903 mod G(x) */, 0x408e57f2 /* x^83935 mod G(x) */, },
/* chunk_len=10624 */
{ 0x7e58ca17 /* x^254943 mod G(x) */, 0x69dfedd2 /* x^169951 mod G(x) */, 0x3a76805e /* x^84959 mod G(x) */, },
/* chunk_len=10752 */
{ 0xf997967f /* x^258015 mod G(x) */, 0x63c3d167 /* x^171999 mod G(x) */, 0x0956d953 /* x^85983 mod G(x) */, },
/* chunk_len=10880 */
{ 0x48215963 /* x^261087 mod G(x) */, 0x71e1dfe0 /* x^174047 mod G(x) */, 0x42a6d410 /* x^87007 mod G(x) */, },
/* chunk_len=11008 */
{ 0xa704b94c /* x^264159 mod G(x) */, 0x679f198a /* x^176095 mod G(x) */, 0x42ebf0ad /* x^88031 mod G(x) */, },
/* chunk_len=11136 */
{ 0x1d699056 /* x^267231 mod G(x) */, 0xfeacf2a1 /* x^178143 mod G(x) */, 0x55cb4dfe /* x^89055 mod G(x) */, },
/* chunk_len=11264 */
{ 0x6800bcc5 /* x^270303 mod G(x) */, 0x16024f15 /* x^180191 mod G(x) */, 0xcf3233e4 /* x^90079 mod G(x) */, },
/* chunk_len=11392 */
{ 0x2d48e4ca /* x^273375 mod G(x) */, 0xbe61582f /* x^182239 mod G(x) */, 0x46026283 /* x^91103 mod G(x) */, },
/* chunk_len=11520 */
{ 0x4c4c2b55 /* x^276447 mod G(x) */, 0x5539e44a /* x^184287 mod G(x) */, 0x52222fea /* x^92127 mod G(x) */, },
/* chunk_len=11648 */
{ 0xd8ce94cb /* x^279519 mod G(x) */, 0xbc613c26 /* x^186335 mod G(x) */, 0x33776b4b /* x^93151 mod G(x) */, },
/* chunk_len=11776 */
{ 0xd0b5a02b /* x^282591 mod G(x) */, 0x490d3cc6 /* x^188383 mod G(x) */, 0x2fde73f8 /* x^94175 mod G(x) */, },
/* chunk_len=11904 */
{ 0xa223f7ec /* x^285663 mod G(x) */, 0xf0baeeb6 /* x^190431 mod G(x) */, 0x0603989b /* x^95199 mod G(x) */, },
/* chunk_len=12032 */
{ 0x58de337a /* x^288735 mod G(x) */, 0x3bf3d597 /* x^192479 mod G(x) */, 0xced90d99 /* x^96223 mod G(x) */, },
/* chunk_len=12160 */
{ 0x37f5d8f4 /* x^291807 mod G(x) */, 0x4d5b699b /* x^194527 mod G(x) */, 0xd7262e5f /* x^97247 mod G(x) */, },
/* chunk_len=12288 */
{ 0xfa8a435d /* x^294879 mod G(x) */, 0x64f34a05 /* x^196575 mod G(x) */, 0x4470c029 /* x^98271 mod G(x) */, },
/* chunk_len=12416 */
{ 0x238709fe /* x^297951 mod G(x) */, 0x52e7458f /* x^198623 mod G(x) */, 0x9a174cd3 /* x^99295 mod G(x) */, },
/* chunk_len=12544 */
{ 0x9e1ba6f5 /* x^301023 mod G(x) */, 0xef0272f7 /* x^200671 mod G(x) */, 0x84f40beb /* x^100319 mod G(x) */, },
/* chunk_len=12672 */
{ 0xcd8b57fa /* x^304095 mod G(x) */, 0x82adb0b8 /* x^202719 mod G(x) */, 0xb6f35093 /* x^101343 mod G(x) */, },
/* chunk_len=12800 */
{ 0x0aed142f /* x^307167 mod G(x) */, 0xb1650290 /* x^204767 mod G(x) */, 0xec855937 /* x^102367 mod G(x) */, },
/* chunk_len=12928 */
{ 0xd1f064db /* x^310239 mod G(x) */, 0x6e7340d3 /* x^206815 mod G(x) */, 0x5c28cb52 /* x^103391 mod G(x) */, },
/* chunk_len=13056 */
{ 0x464ac895 /* x^313311 mod G(x) */, 0xd2fc57c3 /* x^208863 mod G(x) */, 0xc46805ba /* x^104415 mod G(x) */, },
/* chunk_len=13184 */
{ 0xa0e6beea /* x^316383 mod G(x) */, 0xcfeec3d0 /* x^210911 mod G(x) */, 0x0225d214 /* x^105439 mod G(x) */, },
/* chunk_len=13312 */
{ 0x78703ce0 /* x^319455 mod G(x) */, 0xc60f6075 /* x^212959 mod G(x) */, 0xdf7a24ac /* x^106463 mod G(x) */, },
/* chunk_len=13440 */
{ 0xfea48165 /* x^322527 mod G(x) */, 0x3a9d1f97 /* x^215007 mod G(x) */, 0xc3876592 /* x^107487 mod G(x) */, },
/* chunk_len=13568 */
{ 0xdb89b8db /* x^325599 mod G(x) */, 0xa6172211 /* x^217055 mod G(x) */, 0x2b52dc39 /* x^108511 mod G(x) */, },
/* chunk_len=13696 */
{ 0x7ca03731 /* x^328671 mod G(x) */, 0x1db42849 /* x^219103 mod G(x) */, 0xc5df246e /* x^109535 mod G(x) */, },
/* chunk_len=13824 */
{ 0x8801d0aa /* x^331743 mod G(x) */, 0xb57c7728 /* x^221151 mod G(x) */, 0x5b0c98b9 /* x^110559 mod G(x) */, },
/* chunk_len=13952 */
{ 0xf89cd7f0 /* x^334815 mod G(x) */, 0xcc396a0b /* x^223199 mod G(x) */, 0xdb799c51 /* x^111583 mod G(x) */, },
/* chunk_len=14080 */
{ 0x1611a808 /* x^337887 mod G(x) */, 0xaeae6105 /* x^225247 mod G(x) */, 0xb939fcdf /* x^112607 mod G(x) */, },
/* chunk_len=14208 */
{ 0xe3cdb888 /* x^340959 mod G(x) */, 0x26387824 /* x^227295 mod G(x) */, 0x30d13e5f /* x^113631 mod G(x) */, },
/* chunk_len=14336 */
{ 0x552a4cf6 /* x^344031 mod G(x) */, 0xee2d04bb /* x^229343 mod G(x) */, 0x70f9947d /* x^114655 mod G(x) */, },
/* chunk_len=14464 */
{ 0x85e248e9 /* x^347103 mod G(x) */, 0x0a79663f /* x^231391 mod G(x) */, 0x53339cf7 /* x^115679 mod G(x) */, },
/* chunk_len=14592 */
{ 0x1c61c3e9 /* x^350175 mod G(x) */, 0x8ea1f680 /* x^233439 mod G(x) */, 0x54afca53 /* x^116703 mod G(x) */, },
/* chunk_len=14720 */
{ 0xb14cfc2b /* x^353247 mod G(x) */, 0x2e073302 /* x^235487 mod G(x) */, 0x10897992 /* x^117727 mod G(x) */, },
/* chunk_len=14848 */
{ 0x6ec444cc /* x^356319 mod G(x) */, 0x9e819f13 /* x^237535 mod G(x) */, 0x7a3c0a6a /* x^118751 mod G(x) */, },
/* chunk_len=14976 */
{ 0xe2fa5f80 /* x^359391 mod G(x) */, 0x21f15b59 /* x^239583 mod G(x) */, 0x93102436 /* x^119775 mod G(x) */, },
/* chunk_len=15104 */
{ 0x6d33f4c6 /* x^362463 mod G(x) */, 0x31a27455 /* x^241631 mod G(x) */, 0x1fea4d2a /* x^120799 mod G(x) */, },
/* chunk_len=15232 */
{ 0xb6dec609 /* x^365535 mod G(x) */, 0x4d437056 /* x^243679 mod G(x) */, 0x42eb1e2a /* x^121823 mod G(x) */, },
/* chunk_len=15360 */
{ 0x1846c518 /* x^368607 mod G(x) */, 0x71b74d95 /* x^245727 mod G(x) */, 0xbd2655a8 /* x^122847 mod G(x) */, },
/* chunk_len=15488 */
{ 0x9f947f8a /* x^371679 mod G(x) */, 0x2b501619 /* x^247775 mod G(x) */, 0xa4924b0e /* x^123871 mod G(x) */, },
/* chunk_len=15616 */
{ 0xb7442f4d /* x^374751 mod G(x) */, 0xba30a5d8 /* x^249823 mod G(x) */, 0x4ff61aa1 /* x^124895 mod G(x) */, },
/* chunk_len=15744 */
{ 0xe2c93242 /* x^377823 mod G(x) */, 0x8a2d38e8 /* x^251871 mod G(x) */, 0x70cd7f26 /* x^125919 mod G(x) */, },
/* chunk_len=15872 */
{ 0xcd6863df /* x^380895 mod G(x) */, 0x78fd88dc /* x^253919 mod G(x) */, 0x7ae2f6f4 /* x^126943 mod G(x) */, },
/* chunk_len=16000 */
{ 0xd512001d /* x^383967 mod G(x) */, 0xe6612dff /* x^255967 mod G(x) */, 0x5c4d0ca9 /* x^127967 mod G(x) */, },
/* chunk_len=16128 */
{ 0x4e8d6b6c /* x^387039 mod G(x) */, 0xf997967f /* x^258015 mod G(x) */, 0x2d546c53 /* x^128991 mod G(x) */, },
/* chunk_len=16256 */
{ 0xfa653ba1 /* x^390111 mod G(x) */, 0xc99014d4 /* x^260063 mod G(x) */, 0xa0c9fd27 /* x^130015 mod G(x) */, },
/* chunk_len=16384 */
{ 0x49893408 /* x^393183 mod G(x) */, 0x29c2448b /* x^262111 mod G(x) */, 0xe0ee5efe /* x^131039 mod G(x) */, },
};
/* Multipliers for implementations that use a large fixed chunk length */
#define CRC32_FIXED_CHUNK_LEN 32768UL
#define CRC32_FIXED_CHUNK_MULT_1 0x29c2448b /* x^262111 mod G(x) */
#define CRC32_FIXED_CHUNK_MULT_2 0x4b912f53 /* x^524255 mod G(x) */
#define CRC32_FIXED_CHUNK_MULT_3 0x454c93be /* x^786399 mod G(x) */

View file

@@ -1,587 +0,0 @@
/*
* crc32_tables.h - data tables for CRC-32 computation
*
* THIS FILE WAS GENERATED BY gen_crc32_tables.c. DO NOT EDIT.
*/
static const u32 crc32_slice1_table[] MAYBE_UNUSED = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
};
static const u32 crc32_slice8_table[] MAYBE_UNUSED = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
0x00000000, 0x191b3141, 0x32366282, 0x2b2d53c3,
0x646cc504, 0x7d77f445, 0x565aa786, 0x4f4196c7,
0xc8d98a08, 0xd1c2bb49, 0xfaefe88a, 0xe3f4d9cb,
0xacb54f0c, 0xb5ae7e4d, 0x9e832d8e, 0x87981ccf,
0x4ac21251, 0x53d92310, 0x78f470d3, 0x61ef4192,
0x2eaed755, 0x37b5e614, 0x1c98b5d7, 0x05838496,
0x821b9859, 0x9b00a918, 0xb02dfadb, 0xa936cb9a,
0xe6775d5d, 0xff6c6c1c, 0xd4413fdf, 0xcd5a0e9e,
0x958424a2, 0x8c9f15e3, 0xa7b24620, 0xbea97761,
0xf1e8e1a6, 0xe8f3d0e7, 0xc3de8324, 0xdac5b265,
0x5d5daeaa, 0x44469feb, 0x6f6bcc28, 0x7670fd69,
0x39316bae, 0x202a5aef, 0x0b07092c, 0x121c386d,
0xdf4636f3, 0xc65d07b2, 0xed705471, 0xf46b6530,
0xbb2af3f7, 0xa231c2b6, 0x891c9175, 0x9007a034,
0x179fbcfb, 0x0e848dba, 0x25a9de79, 0x3cb2ef38,
0x73f379ff, 0x6ae848be, 0x41c51b7d, 0x58de2a3c,
0xf0794f05, 0xe9627e44, 0xc24f2d87, 0xdb541cc6,
0x94158a01, 0x8d0ebb40, 0xa623e883, 0xbf38d9c2,
0x38a0c50d, 0x21bbf44c, 0x0a96a78f, 0x138d96ce,
0x5ccc0009, 0x45d73148, 0x6efa628b, 0x77e153ca,
0xbabb5d54, 0xa3a06c15, 0x888d3fd6, 0x91960e97,
0xded79850, 0xc7cca911, 0xece1fad2, 0xf5facb93,
0x7262d75c, 0x6b79e61d, 0x4054b5de, 0x594f849f,
0x160e1258, 0x0f152319, 0x243870da, 0x3d23419b,
0x65fd6ba7, 0x7ce65ae6, 0x57cb0925, 0x4ed03864,
0x0191aea3, 0x188a9fe2, 0x33a7cc21, 0x2abcfd60,
0xad24e1af, 0xb43fd0ee, 0x9f12832d, 0x8609b26c,
0xc94824ab, 0xd05315ea, 0xfb7e4629, 0xe2657768,
0x2f3f79f6, 0x362448b7, 0x1d091b74, 0x04122a35,
0x4b53bcf2, 0x52488db3, 0x7965de70, 0x607eef31,
0xe7e6f3fe, 0xfefdc2bf, 0xd5d0917c, 0xcccba03d,
0x838a36fa, 0x9a9107bb, 0xb1bc5478, 0xa8a76539,
0x3b83984b, 0x2298a90a, 0x09b5fac9, 0x10aecb88,
0x5fef5d4f, 0x46f46c0e, 0x6dd93fcd, 0x74c20e8c,
0xf35a1243, 0xea412302, 0xc16c70c1, 0xd8774180,
0x9736d747, 0x8e2de606, 0xa500b5c5, 0xbc1b8484,
0x71418a1a, 0x685abb5b, 0x4377e898, 0x5a6cd9d9,
0x152d4f1e, 0x0c367e5f, 0x271b2d9c, 0x3e001cdd,
0xb9980012, 0xa0833153, 0x8bae6290, 0x92b553d1,
0xddf4c516, 0xc4eff457, 0xefc2a794, 0xf6d996d5,
0xae07bce9, 0xb71c8da8, 0x9c31de6b, 0x852aef2a,
0xca6b79ed, 0xd37048ac, 0xf85d1b6f, 0xe1462a2e,
0x66de36e1, 0x7fc507a0, 0x54e85463, 0x4df36522,
0x02b2f3e5, 0x1ba9c2a4, 0x30849167, 0x299fa026,
0xe4c5aeb8, 0xfdde9ff9, 0xd6f3cc3a, 0xcfe8fd7b,
0x80a96bbc, 0x99b25afd, 0xb29f093e, 0xab84387f,
0x2c1c24b0, 0x350715f1, 0x1e2a4632, 0x07317773,
0x4870e1b4, 0x516bd0f5, 0x7a468336, 0x635db277,
0xcbfad74e, 0xd2e1e60f, 0xf9ccb5cc, 0xe0d7848d,
0xaf96124a, 0xb68d230b, 0x9da070c8, 0x84bb4189,
0x03235d46, 0x1a386c07, 0x31153fc4, 0x280e0e85,
0x674f9842, 0x7e54a903, 0x5579fac0, 0x4c62cb81,
0x8138c51f, 0x9823f45e, 0xb30ea79d, 0xaa1596dc,
0xe554001b, 0xfc4f315a, 0xd7626299, 0xce7953d8,
0x49e14f17, 0x50fa7e56, 0x7bd72d95, 0x62cc1cd4,
0x2d8d8a13, 0x3496bb52, 0x1fbbe891, 0x06a0d9d0,
0x5e7ef3ec, 0x4765c2ad, 0x6c48916e, 0x7553a02f,
0x3a1236e8, 0x230907a9, 0x0824546a, 0x113f652b,
0x96a779e4, 0x8fbc48a5, 0xa4911b66, 0xbd8a2a27,
0xf2cbbce0, 0xebd08da1, 0xc0fdde62, 0xd9e6ef23,
0x14bce1bd, 0x0da7d0fc, 0x268a833f, 0x3f91b27e,
0x70d024b9, 0x69cb15f8, 0x42e6463b, 0x5bfd777a,
0xdc656bb5, 0xc57e5af4, 0xee530937, 0xf7483876,
0xb809aeb1, 0xa1129ff0, 0x8a3fcc33, 0x9324fd72,
0x00000000, 0x01c26a37, 0x0384d46e, 0x0246be59,
0x0709a8dc, 0x06cbc2eb, 0x048d7cb2, 0x054f1685,
0x0e1351b8, 0x0fd13b8f, 0x0d9785d6, 0x0c55efe1,
0x091af964, 0x08d89353, 0x0a9e2d0a, 0x0b5c473d,
0x1c26a370, 0x1de4c947, 0x1fa2771e, 0x1e601d29,
0x1b2f0bac, 0x1aed619b, 0x18abdfc2, 0x1969b5f5,
0x1235f2c8, 0x13f798ff, 0x11b126a6, 0x10734c91,
0x153c5a14, 0x14fe3023, 0x16b88e7a, 0x177ae44d,
0x384d46e0, 0x398f2cd7, 0x3bc9928e, 0x3a0bf8b9,
0x3f44ee3c, 0x3e86840b, 0x3cc03a52, 0x3d025065,
0x365e1758, 0x379c7d6f, 0x35dac336, 0x3418a901,
0x3157bf84, 0x3095d5b3, 0x32d36bea, 0x331101dd,
0x246be590, 0x25a98fa7, 0x27ef31fe, 0x262d5bc9,
0x23624d4c, 0x22a0277b, 0x20e69922, 0x2124f315,
0x2a78b428, 0x2bbade1f, 0x29fc6046, 0x283e0a71,
0x2d711cf4, 0x2cb376c3, 0x2ef5c89a, 0x2f37a2ad,
0x709a8dc0, 0x7158e7f7, 0x731e59ae, 0x72dc3399,
0x7793251c, 0x76514f2b, 0x7417f172, 0x75d59b45,
0x7e89dc78, 0x7f4bb64f, 0x7d0d0816, 0x7ccf6221,
0x798074a4, 0x78421e93, 0x7a04a0ca, 0x7bc6cafd,
0x6cbc2eb0, 0x6d7e4487, 0x6f38fade, 0x6efa90e9,
0x6bb5866c, 0x6a77ec5b, 0x68315202, 0x69f33835,
0x62af7f08, 0x636d153f, 0x612bab66, 0x60e9c151,
0x65a6d7d4, 0x6464bde3, 0x662203ba, 0x67e0698d,
0x48d7cb20, 0x4915a117, 0x4b531f4e, 0x4a917579,
0x4fde63fc, 0x4e1c09cb, 0x4c5ab792, 0x4d98dda5,
0x46c49a98, 0x4706f0af, 0x45404ef6, 0x448224c1,
0x41cd3244, 0x400f5873, 0x4249e62a, 0x438b8c1d,
0x54f16850, 0x55330267, 0x5775bc3e, 0x56b7d609,
0x53f8c08c, 0x523aaabb, 0x507c14e2, 0x51be7ed5,
0x5ae239e8, 0x5b2053df, 0x5966ed86, 0x58a487b1,
0x5deb9134, 0x5c29fb03, 0x5e6f455a, 0x5fad2f6d,
0xe1351b80, 0xe0f771b7, 0xe2b1cfee, 0xe373a5d9,
0xe63cb35c, 0xe7fed96b, 0xe5b86732, 0xe47a0d05,
0xef264a38, 0xeee4200f, 0xeca29e56, 0xed60f461,
0xe82fe2e4, 0xe9ed88d3, 0xebab368a, 0xea695cbd,
0xfd13b8f0, 0xfcd1d2c7, 0xfe976c9e, 0xff5506a9,
0xfa1a102c, 0xfbd87a1b, 0xf99ec442, 0xf85cae75,
0xf300e948, 0xf2c2837f, 0xf0843d26, 0xf1465711,
0xf4094194, 0xf5cb2ba3, 0xf78d95fa, 0xf64fffcd,
0xd9785d60, 0xd8ba3757, 0xdafc890e, 0xdb3ee339,
0xde71f5bc, 0xdfb39f8b, 0xddf521d2, 0xdc374be5,
0xd76b0cd8, 0xd6a966ef, 0xd4efd8b6, 0xd52db281,
0xd062a404, 0xd1a0ce33, 0xd3e6706a, 0xd2241a5d,
0xc55efe10, 0xc49c9427, 0xc6da2a7e, 0xc7184049,
0xc25756cc, 0xc3953cfb, 0xc1d382a2, 0xc011e895,
0xcb4dafa8, 0xca8fc59f, 0xc8c97bc6, 0xc90b11f1,
0xcc440774, 0xcd866d43, 0xcfc0d31a, 0xce02b92d,
0x91af9640, 0x906dfc77, 0x922b422e, 0x93e92819,
0x96a63e9c, 0x976454ab, 0x9522eaf2, 0x94e080c5,
0x9fbcc7f8, 0x9e7eadcf, 0x9c381396, 0x9dfa79a1,
0x98b56f24, 0x99770513, 0x9b31bb4a, 0x9af3d17d,
0x8d893530, 0x8c4b5f07, 0x8e0de15e, 0x8fcf8b69,
0x8a809dec, 0x8b42f7db, 0x89044982, 0x88c623b5,
0x839a6488, 0x82580ebf, 0x801eb0e6, 0x81dcdad1,
0x8493cc54, 0x8551a663, 0x8717183a, 0x86d5720d,
0xa9e2d0a0, 0xa820ba97, 0xaa6604ce, 0xaba46ef9,
0xaeeb787c, 0xaf29124b, 0xad6fac12, 0xacadc625,
0xa7f18118, 0xa633eb2f, 0xa4755576, 0xa5b73f41,
0xa0f829c4, 0xa13a43f3, 0xa37cfdaa, 0xa2be979d,
0xb5c473d0, 0xb40619e7, 0xb640a7be, 0xb782cd89,
0xb2cddb0c, 0xb30fb13b, 0xb1490f62, 0xb08b6555,
0xbbd72268, 0xba15485f, 0xb853f606, 0xb9919c31,
0xbcde8ab4, 0xbd1ce083, 0xbf5a5eda, 0xbe9834ed,
0x00000000, 0xb8bc6765, 0xaa09c88b, 0x12b5afee,
0x8f629757, 0x37def032, 0x256b5fdc, 0x9dd738b9,
0xc5b428ef, 0x7d084f8a, 0x6fbde064, 0xd7018701,
0x4ad6bfb8, 0xf26ad8dd, 0xe0df7733, 0x58631056,
0x5019579f, 0xe8a530fa, 0xfa109f14, 0x42acf871,
0xdf7bc0c8, 0x67c7a7ad, 0x75720843, 0xcdce6f26,
0x95ad7f70, 0x2d111815, 0x3fa4b7fb, 0x8718d09e,
0x1acfe827, 0xa2738f42, 0xb0c620ac, 0x087a47c9,
0xa032af3e, 0x188ec85b, 0x0a3b67b5, 0xb28700d0,
0x2f503869, 0x97ec5f0c, 0x8559f0e2, 0x3de59787,
0x658687d1, 0xdd3ae0b4, 0xcf8f4f5a, 0x7733283f,
0xeae41086, 0x525877e3, 0x40edd80d, 0xf851bf68,
0xf02bf8a1, 0x48979fc4, 0x5a22302a, 0xe29e574f,
0x7f496ff6, 0xc7f50893, 0xd540a77d, 0x6dfcc018,
0x359fd04e, 0x8d23b72b, 0x9f9618c5, 0x272a7fa0,
0xbafd4719, 0x0241207c, 0x10f48f92, 0xa848e8f7,
0x9b14583d, 0x23a83f58, 0x311d90b6, 0x89a1f7d3,
0x1476cf6a, 0xaccaa80f, 0xbe7f07e1, 0x06c36084,
0x5ea070d2, 0xe61c17b7, 0xf4a9b859, 0x4c15df3c,
0xd1c2e785, 0x697e80e0, 0x7bcb2f0e, 0xc377486b,
0xcb0d0fa2, 0x73b168c7, 0x6104c729, 0xd9b8a04c,
0x446f98f5, 0xfcd3ff90, 0xee66507e, 0x56da371b,
0x0eb9274d, 0xb6054028, 0xa4b0efc6, 0x1c0c88a3,
0x81dbb01a, 0x3967d77f, 0x2bd27891, 0x936e1ff4,
0x3b26f703, 0x839a9066, 0x912f3f88, 0x299358ed,
0xb4446054, 0x0cf80731, 0x1e4da8df, 0xa6f1cfba,
0xfe92dfec, 0x462eb889, 0x549b1767, 0xec277002,
0x71f048bb, 0xc94c2fde, 0xdbf98030, 0x6345e755,
0x6b3fa09c, 0xd383c7f9, 0xc1366817, 0x798a0f72,
0xe45d37cb, 0x5ce150ae, 0x4e54ff40, 0xf6e89825,
0xae8b8873, 0x1637ef16, 0x048240f8, 0xbc3e279d,
0x21e91f24, 0x99557841, 0x8be0d7af, 0x335cb0ca,
0xed59b63b, 0x55e5d15e, 0x47507eb0, 0xffec19d5,
0x623b216c, 0xda874609, 0xc832e9e7, 0x708e8e82,
0x28ed9ed4, 0x9051f9b1, 0x82e4565f, 0x3a58313a,
0xa78f0983, 0x1f336ee6, 0x0d86c108, 0xb53aa66d,
0xbd40e1a4, 0x05fc86c1, 0x1749292f, 0xaff54e4a,
0x322276f3, 0x8a9e1196, 0x982bbe78, 0x2097d91d,
0x78f4c94b, 0xc048ae2e, 0xd2fd01c0, 0x6a4166a5,
0xf7965e1c, 0x4f2a3979, 0x5d9f9697, 0xe523f1f2,
0x4d6b1905, 0xf5d77e60, 0xe762d18e, 0x5fdeb6eb,
0xc2098e52, 0x7ab5e937, 0x680046d9, 0xd0bc21bc,
0x88df31ea, 0x3063568f, 0x22d6f961, 0x9a6a9e04,
0x07bda6bd, 0xbf01c1d8, 0xadb46e36, 0x15080953,
0x1d724e9a, 0xa5ce29ff, 0xb77b8611, 0x0fc7e174,
0x9210d9cd, 0x2aacbea8, 0x38191146, 0x80a57623,
0xd8c66675, 0x607a0110, 0x72cfaefe, 0xca73c99b,
0x57a4f122, 0xef189647, 0xfdad39a9, 0x45115ecc,
0x764dee06, 0xcef18963, 0xdc44268d, 0x64f841e8,
0xf92f7951, 0x41931e34, 0x5326b1da, 0xeb9ad6bf,
0xb3f9c6e9, 0x0b45a18c, 0x19f00e62, 0xa14c6907,
0x3c9b51be, 0x842736db, 0x96929935, 0x2e2efe50,
0x2654b999, 0x9ee8defc, 0x8c5d7112, 0x34e11677,
0xa9362ece, 0x118a49ab, 0x033fe645, 0xbb838120,
0xe3e09176, 0x5b5cf613, 0x49e959fd, 0xf1553e98,
0x6c820621, 0xd43e6144, 0xc68bceaa, 0x7e37a9cf,
0xd67f4138, 0x6ec3265d, 0x7c7689b3, 0xc4caeed6,
0x591dd66f, 0xe1a1b10a, 0xf3141ee4, 0x4ba87981,
0x13cb69d7, 0xab770eb2, 0xb9c2a15c, 0x017ec639,
0x9ca9fe80, 0x241599e5, 0x36a0360b, 0x8e1c516e,
0x866616a7, 0x3eda71c2, 0x2c6fde2c, 0x94d3b949,
0x090481f0, 0xb1b8e695, 0xa30d497b, 0x1bb12e1e,
0x43d23e48, 0xfb6e592d, 0xe9dbf6c3, 0x516791a6,
0xccb0a91f, 0x740cce7a, 0x66b96194, 0xde0506f1,
0x00000000, 0x3d6029b0, 0x7ac05360, 0x47a07ad0,
0xf580a6c0, 0xc8e08f70, 0x8f40f5a0, 0xb220dc10,
0x30704bc1, 0x0d106271, 0x4ab018a1, 0x77d03111,
0xc5f0ed01, 0xf890c4b1, 0xbf30be61, 0x825097d1,
0x60e09782, 0x5d80be32, 0x1a20c4e2, 0x2740ed52,
0x95603142, 0xa80018f2, 0xefa06222, 0xd2c04b92,
0x5090dc43, 0x6df0f5f3, 0x2a508f23, 0x1730a693,
0xa5107a83, 0x98705333, 0xdfd029e3, 0xe2b00053,
0xc1c12f04, 0xfca106b4, 0xbb017c64, 0x866155d4,
0x344189c4, 0x0921a074, 0x4e81daa4, 0x73e1f314,
0xf1b164c5, 0xccd14d75, 0x8b7137a5, 0xb6111e15,
0x0431c205, 0x3951ebb5, 0x7ef19165, 0x4391b8d5,
0xa121b886, 0x9c419136, 0xdbe1ebe6, 0xe681c256,
0x54a11e46, 0x69c137f6, 0x2e614d26, 0x13016496,
0x9151f347, 0xac31daf7, 0xeb91a027, 0xd6f18997,
0x64d15587, 0x59b17c37, 0x1e1106e7, 0x23712f57,
0x58f35849, 0x659371f9, 0x22330b29, 0x1f532299,
0xad73fe89, 0x9013d739, 0xd7b3ade9, 0xead38459,
0x68831388, 0x55e33a38, 0x124340e8, 0x2f236958,
0x9d03b548, 0xa0639cf8, 0xe7c3e628, 0xdaa3cf98,
0x3813cfcb, 0x0573e67b, 0x42d39cab, 0x7fb3b51b,
0xcd93690b, 0xf0f340bb, 0xb7533a6b, 0x8a3313db,
0x0863840a, 0x3503adba, 0x72a3d76a, 0x4fc3feda,
0xfde322ca, 0xc0830b7a, 0x872371aa, 0xba43581a,
0x9932774d, 0xa4525efd, 0xe3f2242d, 0xde920d9d,
0x6cb2d18d, 0x51d2f83d, 0x167282ed, 0x2b12ab5d,
0xa9423c8c, 0x9422153c, 0xd3826fec, 0xeee2465c,
0x5cc29a4c, 0x61a2b3fc, 0x2602c92c, 0x1b62e09c,
0xf9d2e0cf, 0xc4b2c97f, 0x8312b3af, 0xbe729a1f,
0x0c52460f, 0x31326fbf, 0x7692156f, 0x4bf23cdf,
0xc9a2ab0e, 0xf4c282be, 0xb362f86e, 0x8e02d1de,
0x3c220dce, 0x0142247e, 0x46e25eae, 0x7b82771e,
0xb1e6b092, 0x8c869922, 0xcb26e3f2, 0xf646ca42,
0x44661652, 0x79063fe2, 0x3ea64532, 0x03c66c82,
0x8196fb53, 0xbcf6d2e3, 0xfb56a833, 0xc6368183,
0x74165d93, 0x49767423, 0x0ed60ef3, 0x33b62743,
0xd1062710, 0xec660ea0, 0xabc67470, 0x96a65dc0,
0x248681d0, 0x19e6a860, 0x5e46d2b0, 0x6326fb00,
0xe1766cd1, 0xdc164561, 0x9bb63fb1, 0xa6d61601,
0x14f6ca11, 0x2996e3a1, 0x6e369971, 0x5356b0c1,
0x70279f96, 0x4d47b626, 0x0ae7ccf6, 0x3787e546,
0x85a73956, 0xb8c710e6, 0xff676a36, 0xc2074386,
0x4057d457, 0x7d37fde7, 0x3a978737, 0x07f7ae87,
0xb5d77297, 0x88b75b27, 0xcf1721f7, 0xf2770847,
0x10c70814, 0x2da721a4, 0x6a075b74, 0x576772c4,
0xe547aed4, 0xd8278764, 0x9f87fdb4, 0xa2e7d404,
0x20b743d5, 0x1dd76a65, 0x5a7710b5, 0x67173905,
0xd537e515, 0xe857cca5, 0xaff7b675, 0x92979fc5,
0xe915e8db, 0xd475c16b, 0x93d5bbbb, 0xaeb5920b,
0x1c954e1b, 0x21f567ab, 0x66551d7b, 0x5b3534cb,
0xd965a31a, 0xe4058aaa, 0xa3a5f07a, 0x9ec5d9ca,
0x2ce505da, 0x11852c6a, 0x562556ba, 0x6b457f0a,
0x89f57f59, 0xb49556e9, 0xf3352c39, 0xce550589,
0x7c75d999, 0x4115f029, 0x06b58af9, 0x3bd5a349,
0xb9853498, 0x84e51d28, 0xc34567f8, 0xfe254e48,
0x4c059258, 0x7165bbe8, 0x36c5c138, 0x0ba5e888,
0x28d4c7df, 0x15b4ee6f, 0x521494bf, 0x6f74bd0f,
0xdd54611f, 0xe03448af, 0xa794327f, 0x9af41bcf,
0x18a48c1e, 0x25c4a5ae, 0x6264df7e, 0x5f04f6ce,
0xed242ade, 0xd044036e, 0x97e479be, 0xaa84500e,
0x4834505d, 0x755479ed, 0x32f4033d, 0x0f942a8d,
0xbdb4f69d, 0x80d4df2d, 0xc774a5fd, 0xfa148c4d,
0x78441b9c, 0x4524322c, 0x028448fc, 0x3fe4614c,
0x8dc4bd5c, 0xb0a494ec, 0xf704ee3c, 0xca64c78c,
0x00000000, 0xcb5cd3a5, 0x4dc8a10b, 0x869472ae,
0x9b914216, 0x50cd91b3, 0xd659e31d, 0x1d0530b8,
0xec53826d, 0x270f51c8, 0xa19b2366, 0x6ac7f0c3,
0x77c2c07b, 0xbc9e13de, 0x3a0a6170, 0xf156b2d5,
0x03d6029b, 0xc88ad13e, 0x4e1ea390, 0x85427035,
0x9847408d, 0x531b9328, 0xd58fe186, 0x1ed33223,
0xef8580f6, 0x24d95353, 0xa24d21fd, 0x6911f258,
0x7414c2e0, 0xbf481145, 0x39dc63eb, 0xf280b04e,
0x07ac0536, 0xccf0d693, 0x4a64a43d, 0x81387798,
0x9c3d4720, 0x57619485, 0xd1f5e62b, 0x1aa9358e,
0xebff875b, 0x20a354fe, 0xa6372650, 0x6d6bf5f5,
0x706ec54d, 0xbb3216e8, 0x3da66446, 0xf6fab7e3,
0x047a07ad, 0xcf26d408, 0x49b2a6a6, 0x82ee7503,
0x9feb45bb, 0x54b7961e, 0xd223e4b0, 0x197f3715,
0xe82985c0, 0x23755665, 0xa5e124cb, 0x6ebdf76e,
0x73b8c7d6, 0xb8e41473, 0x3e7066dd, 0xf52cb578,
0x0f580a6c, 0xc404d9c9, 0x4290ab67, 0x89cc78c2,
0x94c9487a, 0x5f959bdf, 0xd901e971, 0x125d3ad4,
0xe30b8801, 0x28575ba4, 0xaec3290a, 0x659ffaaf,
0x789aca17, 0xb3c619b2, 0x35526b1c, 0xfe0eb8b9,
0x0c8e08f7, 0xc7d2db52, 0x4146a9fc, 0x8a1a7a59,
0x971f4ae1, 0x5c439944, 0xdad7ebea, 0x118b384f,
0xe0dd8a9a, 0x2b81593f, 0xad152b91, 0x6649f834,
0x7b4cc88c, 0xb0101b29, 0x36846987, 0xfdd8ba22,
0x08f40f5a, 0xc3a8dcff, 0x453cae51, 0x8e607df4,
0x93654d4c, 0x58399ee9, 0xdeadec47, 0x15f13fe2,
0xe4a78d37, 0x2ffb5e92, 0xa96f2c3c, 0x6233ff99,
0x7f36cf21, 0xb46a1c84, 0x32fe6e2a, 0xf9a2bd8f,
0x0b220dc1, 0xc07ede64, 0x46eaacca, 0x8db67f6f,
0x90b34fd7, 0x5bef9c72, 0xdd7beedc, 0x16273d79,
0xe7718fac, 0x2c2d5c09, 0xaab92ea7, 0x61e5fd02,
0x7ce0cdba, 0xb7bc1e1f, 0x31286cb1, 0xfa74bf14,
0x1eb014d8, 0xd5ecc77d, 0x5378b5d3, 0x98246676,
0x852156ce, 0x4e7d856b, 0xc8e9f7c5, 0x03b52460,
0xf2e396b5, 0x39bf4510, 0xbf2b37be, 0x7477e41b,
0x6972d4a3, 0xa22e0706, 0x24ba75a8, 0xefe6a60d,
0x1d661643, 0xd63ac5e6, 0x50aeb748, 0x9bf264ed,
0x86f75455, 0x4dab87f0, 0xcb3ff55e, 0x006326fb,
0xf135942e, 0x3a69478b, 0xbcfd3525, 0x77a1e680,
0x6aa4d638, 0xa1f8059d, 0x276c7733, 0xec30a496,
0x191c11ee, 0xd240c24b, 0x54d4b0e5, 0x9f886340,
0x828d53f8, 0x49d1805d, 0xcf45f2f3, 0x04192156,
0xf54f9383, 0x3e134026, 0xb8873288, 0x73dbe12d,
0x6eded195, 0xa5820230, 0x2316709e, 0xe84aa33b,
0x1aca1375, 0xd196c0d0, 0x5702b27e, 0x9c5e61db,
0x815b5163, 0x4a0782c6, 0xcc93f068, 0x07cf23cd,
0xf6999118, 0x3dc542bd, 0xbb513013, 0x700de3b6,
0x6d08d30e, 0xa65400ab, 0x20c07205, 0xeb9ca1a0,
0x11e81eb4, 0xdab4cd11, 0x5c20bfbf, 0x977c6c1a,
0x8a795ca2, 0x41258f07, 0xc7b1fda9, 0x0ced2e0c,
0xfdbb9cd9, 0x36e74f7c, 0xb0733dd2, 0x7b2fee77,
0x662adecf, 0xad760d6a, 0x2be27fc4, 0xe0beac61,
0x123e1c2f, 0xd962cf8a, 0x5ff6bd24, 0x94aa6e81,
0x89af5e39, 0x42f38d9c, 0xc467ff32, 0x0f3b2c97,
0xfe6d9e42, 0x35314de7, 0xb3a53f49, 0x78f9ecec,
0x65fcdc54, 0xaea00ff1, 0x28347d5f, 0xe368aefa,
0x16441b82, 0xdd18c827, 0x5b8cba89, 0x90d0692c,
0x8dd55994, 0x46898a31, 0xc01df89f, 0x0b412b3a,
0xfa1799ef, 0x314b4a4a, 0xb7df38e4, 0x7c83eb41,
0x6186dbf9, 0xaada085c, 0x2c4e7af2, 0xe712a957,
0x15921919, 0xdececabc, 0x585ab812, 0x93066bb7,
0x8e035b0f, 0x455f88aa, 0xc3cbfa04, 0x089729a1,
0xf9c19b74, 0x329d48d1, 0xb4093a7f, 0x7f55e9da,
0x6250d962, 0xa90c0ac7, 0x2f987869, 0xe4c4abcc,
0x00000000, 0xa6770bb4, 0x979f1129, 0x31e81a9d,
0xf44f2413, 0x52382fa7, 0x63d0353a, 0xc5a73e8e,
0x33ef4e67, 0x959845d3, 0xa4705f4e, 0x020754fa,
0xc7a06a74, 0x61d761c0, 0x503f7b5d, 0xf64870e9,
0x67de9cce, 0xc1a9977a, 0xf0418de7, 0x56368653,
0x9391b8dd, 0x35e6b369, 0x040ea9f4, 0xa279a240,
0x5431d2a9, 0xf246d91d, 0xc3aec380, 0x65d9c834,
0xa07ef6ba, 0x0609fd0e, 0x37e1e793, 0x9196ec27,
0xcfbd399c, 0x69ca3228, 0x582228b5, 0xfe552301,
0x3bf21d8f, 0x9d85163b, 0xac6d0ca6, 0x0a1a0712,
0xfc5277fb, 0x5a257c4f, 0x6bcd66d2, 0xcdba6d66,
0x081d53e8, 0xae6a585c, 0x9f8242c1, 0x39f54975,
0xa863a552, 0x0e14aee6, 0x3ffcb47b, 0x998bbfcf,
0x5c2c8141, 0xfa5b8af5, 0xcbb39068, 0x6dc49bdc,
0x9b8ceb35, 0x3dfbe081, 0x0c13fa1c, 0xaa64f1a8,
0x6fc3cf26, 0xc9b4c492, 0xf85cde0f, 0x5e2bd5bb,
0x440b7579, 0xe27c7ecd, 0xd3946450, 0x75e36fe4,
0xb044516a, 0x16335ade, 0x27db4043, 0x81ac4bf7,
0x77e43b1e, 0xd19330aa, 0xe07b2a37, 0x460c2183,
0x83ab1f0d, 0x25dc14b9, 0x14340e24, 0xb2430590,
0x23d5e9b7, 0x85a2e203, 0xb44af89e, 0x123df32a,
0xd79acda4, 0x71edc610, 0x4005dc8d, 0xe672d739,
0x103aa7d0, 0xb64dac64, 0x87a5b6f9, 0x21d2bd4d,
0xe47583c3, 0x42028877, 0x73ea92ea, 0xd59d995e,
0x8bb64ce5, 0x2dc14751, 0x1c295dcc, 0xba5e5678,
0x7ff968f6, 0xd98e6342, 0xe86679df, 0x4e11726b,
0xb8590282, 0x1e2e0936, 0x2fc613ab, 0x89b1181f,
0x4c162691, 0xea612d25, 0xdb8937b8, 0x7dfe3c0c,
0xec68d02b, 0x4a1fdb9f, 0x7bf7c102, 0xdd80cab6,
0x1827f438, 0xbe50ff8c, 0x8fb8e511, 0x29cfeea5,
0xdf879e4c, 0x79f095f8, 0x48188f65, 0xee6f84d1,
0x2bc8ba5f, 0x8dbfb1eb, 0xbc57ab76, 0x1a20a0c2,
0x8816eaf2, 0x2e61e146, 0x1f89fbdb, 0xb9fef06f,
0x7c59cee1, 0xda2ec555, 0xebc6dfc8, 0x4db1d47c,
0xbbf9a495, 0x1d8eaf21, 0x2c66b5bc, 0x8a11be08,
0x4fb68086, 0xe9c18b32, 0xd82991af, 0x7e5e9a1b,
0xefc8763c, 0x49bf7d88, 0x78576715, 0xde206ca1,
0x1b87522f, 0xbdf0599b, 0x8c184306, 0x2a6f48b2,
0xdc27385b, 0x7a5033ef, 0x4bb82972, 0xedcf22c6,
0x28681c48, 0x8e1f17fc, 0xbff70d61, 0x198006d5,
0x47abd36e, 0xe1dcd8da, 0xd034c247, 0x7643c9f3,
0xb3e4f77d, 0x1593fcc9, 0x247be654, 0x820cede0,
0x74449d09, 0xd23396bd, 0xe3db8c20, 0x45ac8794,
0x800bb91a, 0x267cb2ae, 0x1794a833, 0xb1e3a387,
0x20754fa0, 0x86024414, 0xb7ea5e89, 0x119d553d,
0xd43a6bb3, 0x724d6007, 0x43a57a9a, 0xe5d2712e,
0x139a01c7, 0xb5ed0a73, 0x840510ee, 0x22721b5a,
0xe7d525d4, 0x41a22e60, 0x704a34fd, 0xd63d3f49,
0xcc1d9f8b, 0x6a6a943f, 0x5b828ea2, 0xfdf58516,
0x3852bb98, 0x9e25b02c, 0xafcdaab1, 0x09baa105,
0xfff2d1ec, 0x5985da58, 0x686dc0c5, 0xce1acb71,
0x0bbdf5ff, 0xadcafe4b, 0x9c22e4d6, 0x3a55ef62,
0xabc30345, 0x0db408f1, 0x3c5c126c, 0x9a2b19d8,
0x5f8c2756, 0xf9fb2ce2, 0xc813367f, 0x6e643dcb,
0x982c4d22, 0x3e5b4696, 0x0fb35c0b, 0xa9c457bf,
0x6c636931, 0xca146285, 0xfbfc7818, 0x5d8b73ac,
0x03a0a617, 0xa5d7ada3, 0x943fb73e, 0x3248bc8a,
0xf7ef8204, 0x519889b0, 0x6070932d, 0xc6079899,
0x304fe870, 0x9638e3c4, 0xa7d0f959, 0x01a7f2ed,
0xc400cc63, 0x6277c7d7, 0x539fdd4a, 0xf5e8d6fe,
0x647e3ad9, 0xc209316d, 0xf3e12bf0, 0x55962044,
0x90311eca, 0x3646157e, 0x07ae0fe3, 0xa1d90457,
0x579174be, 0xf1e67f0a, 0xc00e6597, 0x66796e23,
0xa3de50ad, 0x05a95b19, 0x34414184, 0x92364a30,
0x00000000, 0xccaa009e, 0x4225077d, 0x8e8f07e3,
0x844a0efa, 0x48e00e64, 0xc66f0987, 0x0ac50919,
0xd3e51bb5, 0x1f4f1b2b, 0x91c01cc8, 0x5d6a1c56,
0x57af154f, 0x9b0515d1, 0x158a1232, 0xd92012ac,
0x7cbb312b, 0xb01131b5, 0x3e9e3656, 0xf23436c8,
0xf8f13fd1, 0x345b3f4f, 0xbad438ac, 0x767e3832,
0xaf5e2a9e, 0x63f42a00, 0xed7b2de3, 0x21d12d7d,
0x2b142464, 0xe7be24fa, 0x69312319, 0xa59b2387,
0xf9766256, 0x35dc62c8, 0xbb53652b, 0x77f965b5,
0x7d3c6cac, 0xb1966c32, 0x3f196bd1, 0xf3b36b4f,
0x2a9379e3, 0xe639797d, 0x68b67e9e, 0xa41c7e00,
0xaed97719, 0x62737787, 0xecfc7064, 0x205670fa,
0x85cd537d, 0x496753e3, 0xc7e85400, 0x0b42549e,
0x01875d87, 0xcd2d5d19, 0x43a25afa, 0x8f085a64,
0x562848c8, 0x9a824856, 0x140d4fb5, 0xd8a74f2b,
0xd2624632, 0x1ec846ac, 0x9047414f, 0x5ced41d1,
0x299dc2ed, 0xe537c273, 0x6bb8c590, 0xa712c50e,
0xadd7cc17, 0x617dcc89, 0xeff2cb6a, 0x2358cbf4,
0xfa78d958, 0x36d2d9c6, 0xb85dde25, 0x74f7debb,
0x7e32d7a2, 0xb298d73c, 0x3c17d0df, 0xf0bdd041,
0x5526f3c6, 0x998cf358, 0x1703f4bb, 0xdba9f425,
0xd16cfd3c, 0x1dc6fda2, 0x9349fa41, 0x5fe3fadf,
0x86c3e873, 0x4a69e8ed, 0xc4e6ef0e, 0x084cef90,
0x0289e689, 0xce23e617, 0x40ace1f4, 0x8c06e16a,
0xd0eba0bb, 0x1c41a025, 0x92cea7c6, 0x5e64a758,
0x54a1ae41, 0x980baedf, 0x1684a93c, 0xda2ea9a2,
0x030ebb0e, 0xcfa4bb90, 0x412bbc73, 0x8d81bced,
0x8744b5f4, 0x4beeb56a, 0xc561b289, 0x09cbb217,
0xac509190, 0x60fa910e, 0xee7596ed, 0x22df9673,
0x281a9f6a, 0xe4b09ff4, 0x6a3f9817, 0xa6959889,
0x7fb58a25, 0xb31f8abb, 0x3d908d58, 0xf13a8dc6,
0xfbff84df, 0x37558441, 0xb9da83a2, 0x7570833c,
0x533b85da, 0x9f918544, 0x111e82a7, 0xddb48239,
0xd7718b20, 0x1bdb8bbe, 0x95548c5d, 0x59fe8cc3,
0x80de9e6f, 0x4c749ef1, 0xc2fb9912, 0x0e51998c,
0x04949095, 0xc83e900b, 0x46b197e8, 0x8a1b9776,
0x2f80b4f1, 0xe32ab46f, 0x6da5b38c, 0xa10fb312,
0xabcaba0b, 0x6760ba95, 0xe9efbd76, 0x2545bde8,
0xfc65af44, 0x30cfafda, 0xbe40a839, 0x72eaa8a7,
0x782fa1be, 0xb485a120, 0x3a0aa6c3, 0xf6a0a65d,
0xaa4de78c, 0x66e7e712, 0xe868e0f1, 0x24c2e06f,
0x2e07e976, 0xe2ade9e8, 0x6c22ee0b, 0xa088ee95,
0x79a8fc39, 0xb502fca7, 0x3b8dfb44, 0xf727fbda,
0xfde2f2c3, 0x3148f25d, 0xbfc7f5be, 0x736df520,
0xd6f6d6a7, 0x1a5cd639, 0x94d3d1da, 0x5879d144,
0x52bcd85d, 0x9e16d8c3, 0x1099df20, 0xdc33dfbe,
0x0513cd12, 0xc9b9cd8c, 0x4736ca6f, 0x8b9ccaf1,
0x8159c3e8, 0x4df3c376, 0xc37cc495, 0x0fd6c40b,
0x7aa64737, 0xb60c47a9, 0x3883404a, 0xf42940d4,
0xfeec49cd, 0x32464953, 0xbcc94eb0, 0x70634e2e,
0xa9435c82, 0x65e95c1c, 0xeb665bff, 0x27cc5b61,
0x2d095278, 0xe1a352e6, 0x6f2c5505, 0xa386559b,
0x061d761c, 0xcab77682, 0x44387161, 0x889271ff,
0x825778e6, 0x4efd7878, 0xc0727f9b, 0x0cd87f05,
0xd5f86da9, 0x19526d37, 0x97dd6ad4, 0x5b776a4a,
0x51b26353, 0x9d1863cd, 0x1397642e, 0xdf3d64b0,
0x83d02561, 0x4f7a25ff, 0xc1f5221c, 0x0d5f2282,
0x079a2b9b, 0xcb302b05, 0x45bf2ce6, 0x89152c78,
0x50353ed4, 0x9c9f3e4a, 0x121039a9, 0xdeba3937,
0xd47f302e, 0x18d530b0, 0x965a3753, 0x5af037cd,
0xff6b144a, 0x33c114d4, 0xbd4e1337, 0x71e413a9,
0x7b211ab0, 0xb78b1a2e, 0x39041dcd, 0xf5ae1d53,
0x2c8e0fff, 0xe0240f61, 0x6eab0882, 0xa201081c,
0xa8c40105, 0x646e019b, 0xeae10678, 0x264b06e6,
};

@@ -1,777 +0,0 @@
/*
* decompress_template.h
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* This is the actual DEFLATE decompression routine, lifted out of
* deflate_decompress.c so that it can be compiled multiple times with different
* target instruction sets.
*/
#ifndef ATTRIBUTES
# define ATTRIBUTES
#endif
#ifndef EXTRACT_VARBITS
# define EXTRACT_VARBITS(word, count) ((word) & BITMASK(count))
#endif
#ifndef EXTRACT_VARBITS8
# define EXTRACT_VARBITS8(word, count) ((word) & BITMASK((u8)(count)))
#endif
static enum libdeflate_result ATTRIBUTES MAYBE_UNUSED
FUNCNAME(struct libdeflate_decompressor * restrict d,
const void * restrict in, size_t in_nbytes,
void * restrict out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret, size_t *actual_out_nbytes_ret)
{
u8 *out_next = out;
u8 * const out_end = out_next + out_nbytes_avail;
u8 * const out_fastloop_end =
out_end - MIN(out_nbytes_avail, FASTLOOP_MAX_BYTES_WRITTEN);
/* Input bitstream state; see deflate_decompress.c for documentation */
const u8 *in_next = in;
const u8 * const in_end = in_next + in_nbytes;
const u8 * const in_fastloop_end =
in_end - MIN(in_nbytes, FASTLOOP_MAX_BYTES_READ);
bitbuf_t bitbuf = 0;
bitbuf_t saved_bitbuf;
u32 bitsleft = 0;
size_t overread_count = 0;
bool is_final_block;
unsigned block_type;
unsigned num_litlen_syms;
unsigned num_offset_syms;
bitbuf_t litlen_tablemask;
u32 entry;
next_block:
/* Starting to read the next block */
;
STATIC_ASSERT(CAN_CONSUME(1 + 2 + 5 + 5 + 4 + 3));
REFILL_BITS();
/* BFINAL: 1 bit */
is_final_block = bitbuf & BITMASK(1);
/* BTYPE: 2 bits */
block_type = (bitbuf >> 1) & BITMASK(2);
if (block_type == DEFLATE_BLOCKTYPE_DYNAMIC_HUFFMAN) {
/* Dynamic Huffman block */
/* The order in which precode lengths are stored */
static const u8 deflate_precode_lens_permutation[DEFLATE_NUM_PRECODE_SYMS] = {
16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};
unsigned num_explicit_precode_lens;
unsigned i;
/* Read the codeword length counts. */
STATIC_ASSERT(DEFLATE_NUM_LITLEN_SYMS == 257 + BITMASK(5));
num_litlen_syms = 257 + ((bitbuf >> 3) & BITMASK(5));
STATIC_ASSERT(DEFLATE_NUM_OFFSET_SYMS == 1 + BITMASK(5));
num_offset_syms = 1 + ((bitbuf >> 8) & BITMASK(5));
STATIC_ASSERT(DEFLATE_NUM_PRECODE_SYMS == 4 + BITMASK(4));
num_explicit_precode_lens = 4 + ((bitbuf >> 13) & BITMASK(4));
d->static_codes_loaded = false;
/*
* Read the precode codeword lengths.
*
* A 64-bit bitbuffer is just one bit too small to hold the
* maximum number of precode lens, so to minimize branches we
* merge one len with the previous fields.
*/
STATIC_ASSERT(DEFLATE_MAX_PRE_CODEWORD_LEN == (1 << 3) - 1);
if (CAN_CONSUME(3 * (DEFLATE_NUM_PRECODE_SYMS - 1))) {
d->u.precode_lens[deflate_precode_lens_permutation[0]] =
(bitbuf >> 17) & BITMASK(3);
bitbuf >>= 20;
bitsleft -= 20;
REFILL_BITS();
i = 1;
do {
d->u.precode_lens[deflate_precode_lens_permutation[i]] =
bitbuf & BITMASK(3);
bitbuf >>= 3;
bitsleft -= 3;
} while (++i < num_explicit_precode_lens);
} else {
bitbuf >>= 17;
bitsleft -= 17;
i = 0;
do {
if ((u8)bitsleft < 3)
REFILL_BITS();
d->u.precode_lens[deflate_precode_lens_permutation[i]] =
bitbuf & BITMASK(3);
bitbuf >>= 3;
bitsleft -= 3;
} while (++i < num_explicit_precode_lens);
}
for (; i < DEFLATE_NUM_PRECODE_SYMS; i++)
d->u.precode_lens[deflate_precode_lens_permutation[i]] = 0;
/* Build the decode table for the precode. */
SAFETY_CHECK(build_precode_decode_table(d));
/* Decode the litlen and offset codeword lengths. */
i = 0;
do {
unsigned presym;
u8 rep_val;
unsigned rep_count;
if ((u8)bitsleft < DEFLATE_MAX_PRE_CODEWORD_LEN + 7)
REFILL_BITS();
/*
* The code below assumes that the precode decode table
* doesn't have any subtables.
*/
STATIC_ASSERT(PRECODE_TABLEBITS == DEFLATE_MAX_PRE_CODEWORD_LEN);
/* Decode the next precode symbol. */
entry = d->u.l.precode_decode_table[
bitbuf & BITMASK(DEFLATE_MAX_PRE_CODEWORD_LEN)];
bitbuf >>= (u8)entry;
bitsleft -= entry; /* optimization: subtract full entry */
presym = entry >> 16;
if (presym < 16) {
/* Explicit codeword length */
d->u.l.lens[i++] = presym;
continue;
}
/* Run-length encoded codeword lengths */
/*
* Note: we don't need to immediately verify that the
* repeat count doesn't overflow the number of elements,
* since we've sized the lens array to have enough extra
* space to allow for the worst-case overrun (138 zeroes
* when only 1 length was remaining).
*
* In the case of the small repeat counts (presyms 16
* and 17), it is fastest to always write the maximum
* number of entries. That gets rid of branches that
* would otherwise be required.
*
* Our checks go in the order 'presym < 16', 'presym ==
* 16', then 'presym == 17' not merely because of the
* numerical order: for typical data, this is also the
* order from most frequent to least frequent case.
*/
STATIC_ASSERT(DEFLATE_MAX_LENS_OVERRUN == 138 - 1);
if (presym == 16) {
/* Repeat the previous length 3 - 6 times. */
SAFETY_CHECK(i != 0);
rep_val = d->u.l.lens[i - 1];
STATIC_ASSERT(3 + BITMASK(2) == 6);
rep_count = 3 + (bitbuf & BITMASK(2));
bitbuf >>= 2;
bitsleft -= 2;
d->u.l.lens[i + 0] = rep_val;
d->u.l.lens[i + 1] = rep_val;
d->u.l.lens[i + 2] = rep_val;
d->u.l.lens[i + 3] = rep_val;
d->u.l.lens[i + 4] = rep_val;
d->u.l.lens[i + 5] = rep_val;
i += rep_count;
} else if (presym == 17) {
/* Repeat zero 3 - 10 times. */
STATIC_ASSERT(3 + BITMASK(3) == 10);
rep_count = 3 + (bitbuf & BITMASK(3));
bitbuf >>= 3;
bitsleft -= 3;
d->u.l.lens[i + 0] = 0;
d->u.l.lens[i + 1] = 0;
d->u.l.lens[i + 2] = 0;
d->u.l.lens[i + 3] = 0;
d->u.l.lens[i + 4] = 0;
d->u.l.lens[i + 5] = 0;
d->u.l.lens[i + 6] = 0;
d->u.l.lens[i + 7] = 0;
d->u.l.lens[i + 8] = 0;
d->u.l.lens[i + 9] = 0;
i += rep_count;
} else {
/* Repeat zero 11 - 138 times. */
STATIC_ASSERT(11 + BITMASK(7) == 138);
rep_count = 11 + (bitbuf & BITMASK(7));
bitbuf >>= 7;
bitsleft -= 7;
memset(&d->u.l.lens[i], 0,
rep_count * sizeof(d->u.l.lens[i]));
i += rep_count;
}
} while (i < num_litlen_syms + num_offset_syms);
/* Unnecessary, but check this for consistency with zlib. */
SAFETY_CHECK(i == num_litlen_syms + num_offset_syms);
} else if (block_type == DEFLATE_BLOCKTYPE_UNCOMPRESSED) {
u16 len, nlen;
/*
* Uncompressed block: copy 'len' bytes literally from the input
* buffer to the output buffer.
*/
bitsleft -= 3; /* for BTYPE and BFINAL */
/*
* Align the bitstream to the next byte boundary. This means
* the next byte boundary as if we were reading a byte at a
* time. Therefore, we have to rewind 'in_next' by any bytes
* that have been refilled but not actually consumed yet (not
* counting overread bytes, which don't increment 'in_next').
*/
bitsleft = (u8)bitsleft;
SAFETY_CHECK(overread_count <= (bitsleft >> 3));
in_next -= (bitsleft >> 3) - overread_count;
overread_count = 0;
bitbuf = 0;
bitsleft = 0;
SAFETY_CHECK(in_end - in_next >= 4);
len = get_unaligned_le16(in_next);
nlen = get_unaligned_le16(in_next + 2);
in_next += 4;
SAFETY_CHECK(len == (u16)~nlen);
if (unlikely(len > out_end - out_next))
return LIBDEFLATE_INSUFFICIENT_SPACE;
SAFETY_CHECK(len <= in_end - in_next);
memcpy(out_next, in_next, len);
in_next += len;
out_next += len;
goto block_done;
} else {
unsigned i;
SAFETY_CHECK(block_type == DEFLATE_BLOCKTYPE_STATIC_HUFFMAN);
/*
* Static Huffman block: build the decode tables for the static
* codes. Skip doing so if the tables are already set up from
* an earlier static block; this speeds up decompression of
* degenerate input of many empty or very short static blocks.
*
* Afterwards, the remainder is the same as decompressing a
* dynamic Huffman block.
*/
bitbuf >>= 3; /* for BTYPE and BFINAL */
bitsleft -= 3;
if (d->static_codes_loaded)
goto have_decode_tables;
d->static_codes_loaded = true;
STATIC_ASSERT(DEFLATE_NUM_LITLEN_SYMS == 288);
STATIC_ASSERT(DEFLATE_NUM_OFFSET_SYMS == 32);
for (i = 0; i < 144; i++)
d->u.l.lens[i] = 8;
for (; i < 256; i++)
d->u.l.lens[i] = 9;
for (; i < 280; i++)
d->u.l.lens[i] = 7;
for (; i < 288; i++)
d->u.l.lens[i] = 8;
for (; i < 288 + 32; i++)
d->u.l.lens[i] = 5;
num_litlen_syms = 288;
num_offset_syms = 32;
}
/* Decompressing a Huffman block (either dynamic or static) */
SAFETY_CHECK(build_offset_decode_table(d, num_litlen_syms, num_offset_syms));
SAFETY_CHECK(build_litlen_decode_table(d, num_litlen_syms, num_offset_syms));
have_decode_tables:
litlen_tablemask = BITMASK(d->litlen_tablebits);
/*
* This is the "fastloop" for decoding literals and matches. It does
* bounds checks on in_next and out_next in the loop conditions so that
* additional bounds checks aren't needed inside the loop body.
*
* To reduce latency, the bitbuffer is refilled and the next litlen
* decode table entry is preloaded before each loop iteration.
*/
if (in_next >= in_fastloop_end || out_next >= out_fastloop_end)
goto generic_loop;
REFILL_BITS_IN_FASTLOOP();
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
do {
u32 length, offset, lit;
const u8 *src;
u8 *dst;
/*
* Consume the bits for the litlen decode table entry. Save the
* original bitbuf for later, in case the extra match length
* bits need to be extracted from it.
*/
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry; /* optimization: subtract full entry */
/*
* Begin by checking for a "fast" literal, i.e. a literal that
* doesn't need a subtable.
*/
if (entry & HUFFDEC_LITERAL) {
/*
* On 64-bit platforms, we decode up to 2 extra fast
* literals in addition to the primary item, as this
* increases performance and still leaves enough bits
* remaining for what follows. We could do 3, assuming
* LITLEN_TABLEBITS=11, but that actually
* decreases performance slightly (perhaps by messing
* with the branch prediction of the conditional refill
* that happens later while decoding the match offset).
*
* Note: the definitions of FASTLOOP_MAX_BYTES_WRITTEN
* and FASTLOOP_MAX_BYTES_READ need to be updated if the
* number of extra literals decoded here is changed.
*/
if (/* enough bits for 2 fast literals + length + offset preload? */
CAN_CONSUME_AND_THEN_PRELOAD(2 * LITLEN_TABLEBITS +
LENGTH_MAXBITS,
OFFSET_TABLEBITS) &&
/* enough bits for 2 fast literals + slow literal + litlen preload? */
CAN_CONSUME_AND_THEN_PRELOAD(2 * LITLEN_TABLEBITS +
DEFLATE_MAX_LITLEN_CODEWORD_LEN,
LITLEN_TABLEBITS)) {
/* 1st extra fast literal */
lit = entry >> 16;
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry;
*out_next++ = lit;
if (entry & HUFFDEC_LITERAL) {
/* 2nd extra fast literal */
lit = entry >> 16;
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry;
*out_next++ = lit;
if (entry & HUFFDEC_LITERAL) {
/*
* Another fast literal, but
* this one is in lieu of the
* primary item, so it doesn't
* count as one of the extras.
*/
lit = entry >> 16;
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
REFILL_BITS_IN_FASTLOOP();
*out_next++ = lit;
continue;
}
}
} else {
/*
* Decode a literal. While doing so, preload
* the next litlen decode table entry and refill
* the bitbuffer. To reduce latency, we've
* arranged for there to be enough "preloadable"
* bits remaining to do the table preload
* independently of the refill.
*/
STATIC_ASSERT(CAN_CONSUME_AND_THEN_PRELOAD(
LITLEN_TABLEBITS, LITLEN_TABLEBITS));
lit = entry >> 16;
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
REFILL_BITS_IN_FASTLOOP();
*out_next++ = lit;
continue;
}
}
/*
* It's not a literal entry, so it can be a length entry, a
* subtable pointer entry, or an end-of-block entry. Detect the
* two unlikely cases by testing the HUFFDEC_EXCEPTIONAL flag.
*/
if (unlikely(entry & HUFFDEC_EXCEPTIONAL)) {
/* Subtable pointer or end-of-block entry */
if (unlikely(entry & HUFFDEC_END_OF_BLOCK))
goto block_done;
/*
* A subtable is required. Load and consume the
* subtable entry. The subtable entry can be of any
* type: literal, length, or end-of-block.
*/
entry = d->u.litlen_decode_table[(entry >> 16) +
EXTRACT_VARBITS(bitbuf, (entry >> 8) & 0x3F)];
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry;
/*
* 32-bit platforms that use the byte-at-a-time refill
* method have to do a refill here for there to always
* be enough bits to decode a literal that requires a
* subtable, then preload the next litlen decode table
* entry; or to decode a match length that requires a
* subtable, then preload the offset decode table entry.
*/
if (!CAN_CONSUME_AND_THEN_PRELOAD(DEFLATE_MAX_LITLEN_CODEWORD_LEN,
LITLEN_TABLEBITS) ||
!CAN_CONSUME_AND_THEN_PRELOAD(LENGTH_MAXBITS,
OFFSET_TABLEBITS))
REFILL_BITS_IN_FASTLOOP();
if (entry & HUFFDEC_LITERAL) {
/* Decode a literal that required a subtable. */
lit = entry >> 16;
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
REFILL_BITS_IN_FASTLOOP();
*out_next++ = lit;
continue;
}
if (unlikely(entry & HUFFDEC_END_OF_BLOCK))
goto block_done;
/* Else, it's a length that required a subtable. */
}
/*
* Decode the match length: the length base value associated
* with the litlen symbol (which we extract from the decode
* table entry), plus the extra length bits. We don't need to
* consume the extra length bits here, as they were included in
* the bits consumed by the entry earlier. We also don't need
* to check for too-long matches here, as this is inside the
* fastloop where it's already been verified that the output
* buffer has enough space remaining to copy a max-length match.
*/
length = entry >> 16;
length += EXTRACT_VARBITS8(saved_bitbuf, entry) >> (u8)(entry >> 8);
/*
* Decode the match offset. There are enough "preloadable" bits
* remaining to preload the offset decode table entry, but a
* refill might be needed before consuming it.
*/
STATIC_ASSERT(CAN_CONSUME_AND_THEN_PRELOAD(LENGTH_MAXFASTBITS,
OFFSET_TABLEBITS));
entry = d->offset_decode_table[bitbuf & BITMASK(OFFSET_TABLEBITS)];
if (CAN_CONSUME_AND_THEN_PRELOAD(OFFSET_MAXBITS,
LITLEN_TABLEBITS)) {
/*
* Decoding a match offset on a 64-bit platform. We may
* need to refill once, but then we can decode the whole
* offset and preload the next litlen table entry.
*/
if (unlikely(entry & HUFFDEC_EXCEPTIONAL)) {
/* Offset codeword requires a subtable */
if (unlikely((u8)bitsleft < OFFSET_MAXBITS +
LITLEN_TABLEBITS - PRELOAD_SLACK))
REFILL_BITS_IN_FASTLOOP();
bitbuf >>= OFFSET_TABLEBITS;
bitsleft -= OFFSET_TABLEBITS;
entry = d->offset_decode_table[(entry >> 16) +
EXTRACT_VARBITS(bitbuf, (entry >> 8) & 0x3F)];
} else if (unlikely((u8)bitsleft < OFFSET_MAXFASTBITS +
LITLEN_TABLEBITS - PRELOAD_SLACK))
REFILL_BITS_IN_FASTLOOP();
} else {
/* Decoding a match offset on a 32-bit platform */
REFILL_BITS_IN_FASTLOOP();
if (unlikely(entry & HUFFDEC_EXCEPTIONAL)) {
/* Offset codeword requires a subtable */
bitbuf >>= OFFSET_TABLEBITS;
bitsleft -= OFFSET_TABLEBITS;
entry = d->offset_decode_table[(entry >> 16) +
EXTRACT_VARBITS(bitbuf, (entry >> 8) & 0x3F)];
REFILL_BITS_IN_FASTLOOP();
/* No further refill needed before extra bits */
STATIC_ASSERT(CAN_CONSUME(
OFFSET_MAXBITS - OFFSET_TABLEBITS));
} else {
/* No refill needed before extra bits */
STATIC_ASSERT(CAN_CONSUME(OFFSET_MAXFASTBITS));
}
}
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry; /* optimization: subtract full entry */
offset = entry >> 16;
offset += EXTRACT_VARBITS8(saved_bitbuf, entry) >> (u8)(entry >> 8);
/* Validate the match offset; needed even in the fastloop. */
SAFETY_CHECK(offset <= out_next - (const u8 *)out);
src = out_next - offset;
dst = out_next;
out_next += length;
/*
* Before starting to issue the instructions to copy the match,
* refill the bitbuffer and preload the litlen decode table
* entry for the next loop iteration. This can increase
* performance by allowing the latency of the match copy to
* overlap with these other operations. To further reduce
* latency, we've arranged for there to be enough bits remaining
* to do the table preload independently of the refill, except
* on 32-bit platforms using the byte-at-a-time refill method.
*/
if (!CAN_CONSUME_AND_THEN_PRELOAD(
MAX(OFFSET_MAXBITS - OFFSET_TABLEBITS,
OFFSET_MAXFASTBITS),
LITLEN_TABLEBITS) &&
unlikely((u8)bitsleft < LITLEN_TABLEBITS - PRELOAD_SLACK))
REFILL_BITS_IN_FASTLOOP();
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
REFILL_BITS_IN_FASTLOOP();
/*
* Copy the match. On most CPUs the fastest method is a
* word-at-a-time copy, unconditionally copying about 5 words
* since this is enough for most matches without being too much.
*
* The normal word-at-a-time copy works for offset >= WORDBYTES,
* which is most cases. The case of offset == 1 is also common
* and is worth optimizing for, since it is just RLE encoding of
* the previous byte, which is the result of compressing long
* runs of the same byte.
*
* Writing past the match 'length' is allowed here, since it's
* been ensured there is enough output space left for a slight
* overrun. FASTLOOP_MAX_BYTES_WRITTEN needs to be updated if
* the maximum possible overrun here is changed.
*/
if (UNALIGNED_ACCESS_IS_FAST && offset >= WORDBYTES) {
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
while (dst < out_next) {
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
store_word_unaligned(load_word_unaligned(src), dst);
src += WORDBYTES;
dst += WORDBYTES;
}
} else if (UNALIGNED_ACCESS_IS_FAST && offset == 1) {
machine_word_t v;
/*
* This part tends to get auto-vectorized, so keep it
* copying a multiple of 16 bytes at a time.
*/
v = (machine_word_t)0x0101010101010101 * src[0];
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
while (dst < out_next) {
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
store_word_unaligned(v, dst);
dst += WORDBYTES;
}
} else if (UNALIGNED_ACCESS_IS_FAST) {
store_word_unaligned(load_word_unaligned(src), dst);
src += offset;
dst += offset;
store_word_unaligned(load_word_unaligned(src), dst);
src += offset;
dst += offset;
do {
store_word_unaligned(load_word_unaligned(src), dst);
src += offset;
dst += offset;
store_word_unaligned(load_word_unaligned(src), dst);
src += offset;
dst += offset;
} while (dst < out_next);
} else {
*dst++ = *src++;
*dst++ = *src++;
do {
*dst++ = *src++;
} while (dst < out_next);
}
} while (in_next < in_fastloop_end && out_next < out_fastloop_end);
/*
* This is the generic loop for decoding literals and matches. This
* handles cases where in_next and out_next are close to the end of
* their respective buffers. Usually this loop isn't performance-
* critical, as most time is spent in the fastloop above instead. We
* therefore omit some optimizations here in favor of smaller code.
*/
generic_loop:
for (;;) {
u32 length, offset;
const u8 *src;
u8 *dst;
REFILL_BITS();
entry = d->u.litlen_decode_table[bitbuf & litlen_tablemask];
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry;
if (unlikely(entry & HUFFDEC_SUBTABLE_POINTER)) {
entry = d->u.litlen_decode_table[(entry >> 16) +
EXTRACT_VARBITS(bitbuf, (entry >> 8) & 0x3F)];
saved_bitbuf = bitbuf;
bitbuf >>= (u8)entry;
bitsleft -= entry;
}
length = entry >> 16;
if (entry & HUFFDEC_LITERAL) {
if (unlikely(out_next == out_end))
return LIBDEFLATE_INSUFFICIENT_SPACE;
*out_next++ = length;
continue;
}
if (unlikely(entry & HUFFDEC_END_OF_BLOCK))
goto block_done;
length += EXTRACT_VARBITS8(saved_bitbuf, entry) >> (u8)(entry >> 8);
if (unlikely(length > out_end - out_next))
return LIBDEFLATE_INSUFFICIENT_SPACE;
if (!CAN_CONSUME(LENGTH_MAXBITS + OFFSET_MAXBITS))
REFILL_BITS();
entry = d->offset_decode_table[bitbuf & BITMASK(OFFSET_TABLEBITS)];
if (unlikely(entry & HUFFDEC_EXCEPTIONAL)) {
bitbuf >>= OFFSET_TABLEBITS;
bitsleft -= OFFSET_TABLEBITS;
entry = d->offset_decode_table[(entry >> 16) +
EXTRACT_VARBITS(bitbuf, (entry >> 8) & 0x3F)];
if (!CAN_CONSUME(OFFSET_MAXBITS))
REFILL_BITS();
}
offset = entry >> 16;
offset += EXTRACT_VARBITS8(bitbuf, entry) >> (u8)(entry >> 8);
bitbuf >>= (u8)entry;
bitsleft -= entry;
SAFETY_CHECK(offset <= out_next - (const u8 *)out);
src = out_next - offset;
dst = out_next;
out_next += length;
STATIC_ASSERT(DEFLATE_MIN_MATCH_LEN == 3);
*dst++ = *src++;
*dst++ = *src++;
do {
*dst++ = *src++;
} while (dst < out_next);
}
block_done:
/* Finished decoding a block */
if (!is_final_block)
goto next_block;
/* That was the last block. */
bitsleft = (u8)bitsleft;
/*
* If any of the implicit appended zero bytes were consumed (not just
* refilled) before hitting end of stream, then the data is bad.
*/
SAFETY_CHECK(overread_count <= (bitsleft >> 3));
/* Optionally return the actual number of bytes consumed. */
if (actual_in_nbytes_ret) {
/* Don't count bytes that were refilled but not consumed. */
in_next -= (bitsleft >> 3) - overread_count;
*actual_in_nbytes_ret = in_next - (u8 *)in;
}
/* Optionally return the actual number of bytes written. */
if (actual_out_nbytes_ret) {
*actual_out_nbytes_ret = out_next - (u8 *)out;
} else {
if (out_next != out_end)
return LIBDEFLATE_SHORT_OUTPUT;
}
return LIBDEFLATE_SUCCESS;
}
#undef FUNCNAME
#undef ATTRIBUTES
#undef EXTRACT_VARBITS
#undef EXTRACT_VARBITS8
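
The template above is only meaningful together with the translation unit that includes it. As a rough, hypothetical sketch of the multiple-inclusion pattern it is built for (the function names and the target attribute here are illustrative assumptions, not necessarily what libdeflate's deflate_decompress.c actually uses):

/* Hypothetical sketch: instantiate the template twice, once portable and
 * once with a target-specific attribute. */
#define FUNCNAME deflate_decompress_default
#include "decompress_template.h"    /* defines deflate_decompress_default() */

#if defined(__GNUC__) && defined(__x86_64__)
#  define FUNCNAME   deflate_decompress_bmi2
#  define ATTRIBUTES __attribute__((target("bmi2")))
#  include "decompress_template.h"  /* defines deflate_decompress_bmi2() */
#endif

The #undef lines at the end of the template are what allow a second inclusion to redefine FUNCNAME and ATTRIBUTES.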

File diff suppressed because it is too large

@@ -1,15 +0,0 @@
#ifndef LIB_DEFLATE_COMPRESS_H
#define LIB_DEFLATE_COMPRESS_H
#include "lib_common.h"
/*
* DEFLATE compression is private to deflate_compress.c, but we do need to be
* able to query the compression level for zlib and gzip header generation.
*/
struct libdeflate_compressor;
unsigned int libdeflate_get_compression_level(struct libdeflate_compressor *c);
#endif /* LIB_DEFLATE_COMPRESS_H */

@@ -1,56 +0,0 @@
/*
* deflate_constants.h - constants for the DEFLATE compression format
*/
#ifndef LIB_DEFLATE_CONSTANTS_H
#define LIB_DEFLATE_CONSTANTS_H
/* Valid block types */
#define DEFLATE_BLOCKTYPE_UNCOMPRESSED 0
#define DEFLATE_BLOCKTYPE_STATIC_HUFFMAN 1
#define DEFLATE_BLOCKTYPE_DYNAMIC_HUFFMAN 2
/* Minimum and maximum supported match lengths (in bytes) */
#define DEFLATE_MIN_MATCH_LEN 3
#define DEFLATE_MAX_MATCH_LEN 258
/* Maximum supported match offset (in bytes) */
#define DEFLATE_MAX_MATCH_OFFSET 32768
/* log2 of DEFLATE_MAX_MATCH_OFFSET */
#define DEFLATE_WINDOW_ORDER 15
/* Number of symbols in each Huffman code. Note: for the literal/length
* and offset codes, these are actually the maximum values; a given block
* might use fewer symbols. */
#define DEFLATE_NUM_PRECODE_SYMS 19
#define DEFLATE_NUM_LITLEN_SYMS 288
#define DEFLATE_NUM_OFFSET_SYMS 32
/* The maximum number of symbols across all codes */
#define DEFLATE_MAX_NUM_SYMS 288
/* Division of symbols in the literal/length code */
#define DEFLATE_NUM_LITERALS 256
#define DEFLATE_END_OF_BLOCK 256
#define DEFLATE_FIRST_LEN_SYM 257
/* Maximum codeword length, in bits, within each Huffman code */
#define DEFLATE_MAX_PRE_CODEWORD_LEN 7
#define DEFLATE_MAX_LITLEN_CODEWORD_LEN 15
#define DEFLATE_MAX_OFFSET_CODEWORD_LEN 15
/* The maximum codeword length across all codes */
#define DEFLATE_MAX_CODEWORD_LEN 15
/* Maximum possible overrun when decoding codeword lengths */
#define DEFLATE_MAX_LENS_OVERRUN 137
/*
* Maximum number of extra bits that may be required to represent a match
* length or offset.
*/
#define DEFLATE_MAX_EXTRA_LENGTH_BITS 5
#define DEFLATE_MAX_EXTRA_OFFSET_BITS 13
#endif /* LIB_DEFLATE_CONSTANTS_H */

File diff suppressed because it is too large

@@ -1,90 +0,0 @@
/*
* gzip_compress.c - compress with a gzip wrapper
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "deflate_compress.h"
#include "gzip_constants.h"
LIBDEFLATEAPI size_t
libdeflate_gzip_compress(struct libdeflate_compressor *c,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail)
{
u8 *out_next = out;
unsigned compression_level;
u8 xfl;
size_t deflate_size;
if (out_nbytes_avail <= GZIP_MIN_OVERHEAD)
return 0;
/* ID1 */
*out_next++ = GZIP_ID1;
/* ID2 */
*out_next++ = GZIP_ID2;
/* CM */
*out_next++ = GZIP_CM_DEFLATE;
/* FLG */
*out_next++ = 0;
/* MTIME */
put_unaligned_le32(GZIP_MTIME_UNAVAILABLE, out_next);
out_next += 4;
/* XFL */
xfl = 0;
compression_level = libdeflate_get_compression_level(c);
if (compression_level < 2)
xfl |= GZIP_XFL_FASTEST_COMPRESSION;
else if (compression_level >= 8)
xfl |= GZIP_XFL_SLOWEST_COMPRESSION;
*out_next++ = xfl;
/* OS */
*out_next++ = GZIP_OS_UNKNOWN; /* OS */
/* Compressed data */
deflate_size = libdeflate_deflate_compress(c, in, in_nbytes, out_next,
out_nbytes_avail - GZIP_MIN_OVERHEAD);
if (deflate_size == 0)
return 0;
out_next += deflate_size;
/* CRC32 */
put_unaligned_le32(libdeflate_crc32(0, in, in_nbytes), out_next);
out_next += 4;
/* ISIZE */
put_unaligned_le32((u32)in_nbytes, out_next);
out_next += 4;
return out_next - (u8 *)out;
}
LIBDEFLATEAPI size_t
libdeflate_gzip_compress_bound(struct libdeflate_compressor *c,
size_t in_nbytes)
{
return GZIP_MIN_OVERHEAD +
libdeflate_deflate_compress_bound(c, in_nbytes);
}
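
For context, a minimal caller of this wrapper through libdeflate's public API might look as follows; the helper name gzip_buffer is hypothetical and error handling is trimmed to the essentials:

#include <libdeflate.h>
#include <stdlib.h>

/* Compress 'in' into a freshly allocated gzip buffer; returns the compressed
 * size, or 0 on failure. */
static size_t gzip_buffer(const void *in, size_t in_len, void **out_ret)
{
    struct libdeflate_compressor *c = libdeflate_alloc_compressor(6);
    size_t bound, n;
    void *out;

    if (c == NULL)
        return 0;
    /* Size the buffer so libdeflate_gzip_compress() cannot run out of space. */
    bound = libdeflate_gzip_compress_bound(c, in_len);
    out = malloc(bound);
    if (out == NULL) {
        libdeflate_free_compressor(c);
        return 0;
    }
    n = libdeflate_gzip_compress(c, in, in_len, out, bound);
    libdeflate_free_compressor(c);
    if (n == 0) {          /* 0 means the output buffer was too small */
        free(out);
        return 0;
    }
    *out_ret = out;
    return n;
}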

@@ -1,45 +0,0 @@
/*
* gzip_constants.h - constants for the gzip wrapper format
*/
#ifndef LIB_GZIP_CONSTANTS_H
#define LIB_GZIP_CONSTANTS_H
#define GZIP_MIN_HEADER_SIZE 10
#define GZIP_FOOTER_SIZE 8
#define GZIP_MIN_OVERHEAD (GZIP_MIN_HEADER_SIZE + GZIP_FOOTER_SIZE)
#define GZIP_ID1 0x1F
#define GZIP_ID2 0x8B
#define GZIP_CM_DEFLATE 8
#define GZIP_FTEXT 0x01
#define GZIP_FHCRC 0x02
#define GZIP_FEXTRA 0x04
#define GZIP_FNAME 0x08
#define GZIP_FCOMMENT 0x10
#define GZIP_FRESERVED 0xE0
#define GZIP_MTIME_UNAVAILABLE 0
#define GZIP_XFL_SLOWEST_COMPRESSION 0x02
#define GZIP_XFL_FASTEST_COMPRESSION 0x04
#define GZIP_OS_FAT 0
#define GZIP_OS_AMIGA 1
#define GZIP_OS_VMS 2
#define GZIP_OS_UNIX 3
#define GZIP_OS_VM_CMS 4
#define GZIP_OS_ATARI_TOS 5
#define GZIP_OS_HPFS 6
#define GZIP_OS_MACINTOSH 7
#define GZIP_OS_Z_SYSTEM 8
#define GZIP_OS_CP_M 9
#define GZIP_OS_TOPS_20 10
#define GZIP_OS_NTFS 11
#define GZIP_OS_QDOS 12
#define GZIP_OS_RISCOS 13
#define GZIP_OS_UNKNOWN 255
#endif /* LIB_GZIP_CONSTANTS_H */

@@ -1,144 +0,0 @@
/*
* gzip_decompress.c - decompress with a gzip wrapper
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "lib_common.h"
#include "gzip_constants.h"
LIBDEFLATEAPI enum libdeflate_result
libdeflate_gzip_decompress_ex(struct libdeflate_decompressor *d,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret,
size_t *actual_out_nbytes_ret)
{
const u8 *in_next = in;
const u8 * const in_end = in_next + in_nbytes;
u8 flg;
size_t actual_in_nbytes;
size_t actual_out_nbytes;
enum libdeflate_result result;
if (in_nbytes < GZIP_MIN_OVERHEAD)
return LIBDEFLATE_BAD_DATA;
/* ID1 */
if (*in_next++ != GZIP_ID1)
return LIBDEFLATE_BAD_DATA;
/* ID2 */
if (*in_next++ != GZIP_ID2)
return LIBDEFLATE_BAD_DATA;
/* CM */
if (*in_next++ != GZIP_CM_DEFLATE)
return LIBDEFLATE_BAD_DATA;
flg = *in_next++;
/* MTIME */
in_next += 4;
/* XFL */
in_next += 1;
/* OS */
in_next += 1;
if (flg & GZIP_FRESERVED)
return LIBDEFLATE_BAD_DATA;
/* Extra field */
if (flg & GZIP_FEXTRA) {
u16 xlen = get_unaligned_le16(in_next);
in_next += 2;
if (in_end - in_next < (u32)xlen + GZIP_FOOTER_SIZE)
return LIBDEFLATE_BAD_DATA;
in_next += xlen;
}
/* Original file name (zero terminated) */
if (flg & GZIP_FNAME) {
while (*in_next++ != 0 && in_next != in_end)
;
if (in_end - in_next < GZIP_FOOTER_SIZE)
return LIBDEFLATE_BAD_DATA;
}
/* File comment (zero terminated) */
if (flg & GZIP_FCOMMENT) {
while (*in_next++ != 0 && in_next != in_end)
;
if (in_end - in_next < GZIP_FOOTER_SIZE)
return LIBDEFLATE_BAD_DATA;
}
/* CRC16 for gzip header */
if (flg & GZIP_FHCRC) {
in_next += 2;
if (in_end - in_next < GZIP_FOOTER_SIZE)
return LIBDEFLATE_BAD_DATA;
}
/* Compressed data */
result = libdeflate_deflate_decompress_ex(d, in_next,
in_end - GZIP_FOOTER_SIZE - in_next,
out, out_nbytes_avail,
&actual_in_nbytes,
actual_out_nbytes_ret);
if (result != LIBDEFLATE_SUCCESS)
return result;
if (actual_out_nbytes_ret)
actual_out_nbytes = *actual_out_nbytes_ret;
else
actual_out_nbytes = out_nbytes_avail;
in_next += actual_in_nbytes;
/* CRC32 */
if (libdeflate_crc32(0, out, actual_out_nbytes) !=
get_unaligned_le32(in_next))
return LIBDEFLATE_BAD_DATA;
in_next += 4;
/* ISIZE */
if ((u32)actual_out_nbytes != get_unaligned_le32(in_next))
return LIBDEFLATE_BAD_DATA;
in_next += 4;
if (actual_in_nbytes_ret)
*actual_in_nbytes_ret = in_next - (u8 *)in;
return LIBDEFLATE_SUCCESS;
}
LIBDEFLATEAPI enum libdeflate_result
libdeflate_gzip_decompress(struct libdeflate_decompressor *d,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_out_nbytes_ret)
{
return libdeflate_gzip_decompress_ex(d, in, in_nbytes,
out, out_nbytes_avail,
NULL, actual_out_nbytes_ret);
}
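
A matching decompression sketch, again hedged: gunzip_buffer is a hypothetical helper, and it trusts the gzip ISIZE trailer field to size the output buffer, which is only reliable for uncompressed data smaller than 4 GiB:

#include <libdeflate.h>
#include <stdint.h>
#include <stdlib.h>

static void *gunzip_buffer(const void *in, size_t in_len, size_t *out_len)
{
    const uint8_t *p = (const uint8_t *)in;
    struct libdeflate_decompressor *d;
    size_t isize;
    void *out;

    if (in_len < 18)    /* GZIP_MIN_OVERHEAD: 10-byte header + 8-byte footer */
        return NULL;
    /* ISIZE: little-endian uncompressed size mod 2^32, last 4 bytes. */
    isize = (size_t)p[in_len - 4] | ((size_t)p[in_len - 3] << 8) |
            ((size_t)p[in_len - 2] << 16) | ((size_t)p[in_len - 1] << 24);
    d = libdeflate_alloc_decompressor();
    if (d == NULL)
        return NULL;
    out = malloc(isize ? isize : 1);
    if (out != NULL &&
        libdeflate_gzip_decompress(d, in, in_len, out, isize,
                                   out_len) != LIBDEFLATE_SUCCESS) {
        free(out);
        out = NULL;
    }
    libdeflate_free_decompressor(d);
    return out;
}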

@@ -1,401 +0,0 @@
/*
* hc_matchfinder.h - Lempel-Ziv matchfinding with a hash table of linked lists
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* ---------------------------------------------------------------------------
*
* Algorithm
*
* This is a Hash Chains (hc) based matchfinder.
*
* The main data structure is a hash table where each hash bucket contains a
* linked list (or "chain") of sequences whose first 4 bytes share the same hash
* code. Each sequence is identified by its starting position in the input
* buffer.
*
* The algorithm processes the input buffer sequentially. At each byte
* position, the hash code of the first 4 bytes of the sequence beginning at
* that position (the sequence being matched against) is computed. This
* identifies the hash bucket to use for that position. Then, this hash
* bucket's linked list is searched for matches. Then, a new linked list node
* is created to represent the current sequence and is prepended to the list.
*
* This algorithm has several useful properties:
*
* - It only finds true Lempel-Ziv matches; i.e., those where the matching
* sequence occurs prior to the sequence being matched against.
*
* - The sequences in each linked list are always sorted by decreasing starting
* position. Therefore, the closest (smallest offset) matches are found
* first, which in many compression formats tend to be the cheapest to encode.
*
* - Although fast running time is not guaranteed due to the possibility of the
* lists getting very long, the worst degenerate behavior can be easily
* prevented by capping the number of nodes searched at each position.
*
* - If the compressor decides not to search for matches at a certain position,
* then that position can be quickly inserted without searching the list.
*
* - The algorithm is adaptable to sliding windows: just store the positions
* relative to a "base" value that is updated from time to time, and stop
* searching each list when the sequences get too far away.
*
* ----------------------------------------------------------------------------
*
* Optimizations
*
* The main hash table and chains handle length 4+ matches. Length 3 matches
* are handled by a separate hash table with no chains. This works well for
* typical "greedy" or "lazy"-style compressors, where length 3 matches are
* often only helpful if they have small offsets. Instead of searching a full
* chain for length 3+ matches, the algorithm just checks for one close length 3
* match, then focuses on finding length 4+ matches.
*
* The longest_match() and skip_bytes() functions are inlined into the
* compressors that use them. This isn't just about saving the overhead of a
* function call. These functions are intended to be called from the inner
* loops of compressors, where giving the compiler more control over register
* allocation is very helpful. There is also significant benefit to be gained
* from allowing the CPU to predict branches independently at each call site.
* For example, "lazy"-style compressors can be written with two calls to
* longest_match(), each of which starts with a different 'best_len' and
* therefore has significantly different performance characteristics.
*
* Although any hash function can be used, a multiplicative hash is fast and
* works well.
*
* On some processors, it is significantly faster to extend matches by whole
* words (32 or 64 bits) instead of by individual bytes. For this to be the
* case, the processor must implement unaligned memory accesses efficiently and
* must have either a fast "find first set bit" instruction or a fast "find last
* set bit" instruction, depending on the processor's endianness.
*
* The code uses one loop for finding the first match and one loop for finding a
* longer match. Each of these loops is tuned for its respective task and in
* combination are faster than a single generalized loop that handles both
* tasks.
*
* The code also uses a tight inner loop that only compares the last and first
* bytes of a potential match. It is only when these bytes match that a full
* match extension is attempted.
*
* ----------------------------------------------------------------------------
*/
#ifndef LIB_HC_MATCHFINDER_H
#define LIB_HC_MATCHFINDER_H
#include "matchfinder_common.h"
#define HC_MATCHFINDER_HASH3_ORDER 15
#define HC_MATCHFINDER_HASH4_ORDER 16
#define HC_MATCHFINDER_TOTAL_HASH_SIZE \
(((1UL << HC_MATCHFINDER_HASH3_ORDER) + \
(1UL << HC_MATCHFINDER_HASH4_ORDER)) * sizeof(mf_pos_t))
struct MATCHFINDER_ALIGNED hc_matchfinder {
/* The hash table for finding length 3 matches */
mf_pos_t hash3_tab[1UL << HC_MATCHFINDER_HASH3_ORDER];
/* The hash table which contains the first nodes of the linked lists for
* finding length 4+ matches */
mf_pos_t hash4_tab[1UL << HC_MATCHFINDER_HASH4_ORDER];
/* The "next node" references for the linked lists. The "next node" of
* the node for the sequence with position 'pos' is 'next_tab[pos]'. */
mf_pos_t next_tab[MATCHFINDER_WINDOW_SIZE];
};
/* Prepare the matchfinder for a new input buffer. */
static forceinline void
hc_matchfinder_init(struct hc_matchfinder *mf)
{
STATIC_ASSERT(HC_MATCHFINDER_TOTAL_HASH_SIZE %
MATCHFINDER_SIZE_ALIGNMENT == 0);
matchfinder_init((mf_pos_t *)mf, HC_MATCHFINDER_TOTAL_HASH_SIZE);
}
static forceinline void
hc_matchfinder_slide_window(struct hc_matchfinder *mf)
{
STATIC_ASSERT(sizeof(*mf) % MATCHFINDER_SIZE_ALIGNMENT == 0);
matchfinder_rebase((mf_pos_t *)mf, sizeof(*mf));
}
/*
* Find the longest match longer than 'best_len' bytes.
*
* @mf
* The matchfinder structure.
* @in_base_p
* Location of a pointer which points to the place in the input data the
* matchfinder currently stores positions relative to. This may be updated
* by this function.
* @in_next
* Pointer to the next position in the input buffer, i.e. the sequence
* being matched against.
* @best_len
* Require a match longer than this length.
* @max_len
* The maximum permissible match length at this position.
* @nice_len
* Stop searching if a match of at least this length is found.
* Must be <= @max_len.
* @max_search_depth
* Limit on the number of potential matches to consider. Must be >= 1.
* @next_hashes
* The precomputed hash codes for the sequence beginning at @in_next.
* These will be used and then updated with the precomputed hashcodes for
* the sequence beginning at @in_next + 1.
* @offset_ret
* If a match is found, its offset is returned in this location.
*
* Return the length of the match found, or 'best_len' if no match longer than
* 'best_len' was found.
*/
static forceinline u32
hc_matchfinder_longest_match(struct hc_matchfinder * const mf,
const u8 ** const in_base_p,
const u8 * const in_next,
u32 best_len,
const u32 max_len,
const u32 nice_len,
const u32 max_search_depth,
u32 * const next_hashes,
u32 * const offset_ret)
{
u32 depth_remaining = max_search_depth;
const u8 *best_matchptr = in_next;
mf_pos_t cur_node3, cur_node4;
u32 hash3, hash4;
u32 next_hashseq;
u32 seq4;
const u8 *matchptr;
u32 len;
u32 cur_pos = in_next - *in_base_p;
const u8 *in_base;
mf_pos_t cutoff;
if (cur_pos == MATCHFINDER_WINDOW_SIZE) {
hc_matchfinder_slide_window(mf);
*in_base_p += MATCHFINDER_WINDOW_SIZE;
cur_pos = 0;
}
in_base = *in_base_p;
cutoff = cur_pos - MATCHFINDER_WINDOW_SIZE;
if (unlikely(max_len < 5)) /* can we read 4 bytes from 'in_next + 1'? */
goto out;
/* Get the precomputed hash codes. */
hash3 = next_hashes[0];
hash4 = next_hashes[1];
/* From the hash buckets, get the first node of each linked list. */
cur_node3 = mf->hash3_tab[hash3];
cur_node4 = mf->hash4_tab[hash4];
/* Update for length 3 matches. This replaces the singleton node in the
* 'hash3' bucket with the node for the current sequence. */
mf->hash3_tab[hash3] = cur_pos;
/* Update for length 4 matches. This prepends the node for the current
* sequence to the linked list in the 'hash4' bucket. */
mf->hash4_tab[hash4] = cur_pos;
mf->next_tab[cur_pos] = cur_node4;
/* Compute the next hash codes. */
next_hashseq = get_unaligned_le32(in_next + 1);
next_hashes[0] = lz_hash(next_hashseq & 0xFFFFFF, HC_MATCHFINDER_HASH3_ORDER);
next_hashes[1] = lz_hash(next_hashseq, HC_MATCHFINDER_HASH4_ORDER);
prefetchw(&mf->hash3_tab[next_hashes[0]]);
prefetchw(&mf->hash4_tab[next_hashes[1]]);
if (best_len < 4) { /* No match of length >= 4 found yet? */
/* Check for a length 3 match if needed. */
if (cur_node3 <= cutoff)
goto out;
seq4 = load_u32_unaligned(in_next);
if (best_len < 3) {
matchptr = &in_base[cur_node3];
if (load_u24_unaligned(matchptr) == loaded_u32_to_u24(seq4)) {
best_len = 3;
best_matchptr = matchptr;
}
}
/* Check for a length 4 match. */
if (cur_node4 <= cutoff)
goto out;
for (;;) {
/* No length 4 match found yet. Check the first 4 bytes. */
matchptr = &in_base[cur_node4];
if (load_u32_unaligned(matchptr) == seq4)
break;
/* The first 4 bytes did not match. Keep trying. */
cur_node4 = mf->next_tab[cur_node4 & (MATCHFINDER_WINDOW_SIZE - 1)];
if (cur_node4 <= cutoff || !--depth_remaining)
goto out;
}
/* Found a match of length >= 4. Extend it to its full length. */
best_matchptr = matchptr;
best_len = lz_extend(in_next, best_matchptr, 4, max_len);
if (best_len >= nice_len)
goto out;
cur_node4 = mf->next_tab[cur_node4 & (MATCHFINDER_WINDOW_SIZE - 1)];
if (cur_node4 <= cutoff || !--depth_remaining)
goto out;
} else {
if (cur_node4 <= cutoff || best_len >= nice_len)
goto out;
}
/* Check for matches of length >= 5. */
for (;;) {
for (;;) {
matchptr = &in_base[cur_node4];
/* Already found a length 4 match. Try for a longer
* match; start by checking either the last 4 bytes and
* the first 4 bytes, or the last byte. (The last byte,
* the one which would extend the match length by 1, is
* the most important.) */
#if UNALIGNED_ACCESS_IS_FAST
if ((load_u32_unaligned(matchptr + best_len - 3) ==
load_u32_unaligned(in_next + best_len - 3)) &&
(load_u32_unaligned(matchptr) ==
load_u32_unaligned(in_next)))
#else
if (matchptr[best_len] == in_next[best_len])
#endif
break;
/* Continue to the next node in the list. */
cur_node4 = mf->next_tab[cur_node4 & (MATCHFINDER_WINDOW_SIZE - 1)];
if (cur_node4 <= cutoff || !--depth_remaining)
goto out;
}
#if UNALIGNED_ACCESS_IS_FAST
len = 4;
#else
len = 0;
#endif
len = lz_extend(in_next, matchptr, len, max_len);
if (len > best_len) {
/* This is the new longest match. */
best_len = len;
best_matchptr = matchptr;
if (best_len >= nice_len)
goto out;
}
/* Continue to the next node in the list. */
cur_node4 = mf->next_tab[cur_node4 & (MATCHFINDER_WINDOW_SIZE - 1)];
if (cur_node4 <= cutoff || !--depth_remaining)
goto out;
}
out:
*offset_ret = in_next - best_matchptr;
return best_len;
}
/*
* Advance the matchfinder, but don't search for matches.
*
* @mf
* The matchfinder structure.
* @in_base_p
* Location of a pointer which points to the place in the input data the
* matchfinder currently stores positions relative to. This may be updated
* by this function.
* @in_next
* Pointer to the next position in the input buffer.
* @in_end
* Pointer to the end of the input buffer.
* @count
* The number of bytes to advance. Must be > 0.
* @next_hashes
* The precomputed hash codes for the sequence beginning at @in_next.
* These will be used and then updated with the precomputed hashcodes for
* the sequence beginning at @in_next + @count.
*/
static forceinline void
hc_matchfinder_skip_bytes(struct hc_matchfinder * const mf,
const u8 ** const in_base_p,
const u8 *in_next,
const u8 * const in_end,
const u32 count,
u32 * const next_hashes)
{
u32 cur_pos;
u32 hash3, hash4;
u32 next_hashseq;
u32 remaining = count;
if (unlikely(count + 5 > in_end - in_next))
return;
cur_pos = in_next - *in_base_p;
hash3 = next_hashes[0];
hash4 = next_hashes[1];
do {
if (cur_pos == MATCHFINDER_WINDOW_SIZE) {
hc_matchfinder_slide_window(mf);
*in_base_p += MATCHFINDER_WINDOW_SIZE;
cur_pos = 0;
}
mf->hash3_tab[hash3] = cur_pos;
mf->next_tab[cur_pos] = mf->hash4_tab[hash4];
mf->hash4_tab[hash4] = cur_pos;
next_hashseq = get_unaligned_le32(++in_next);
hash3 = lz_hash(next_hashseq & 0xFFFFFF, HC_MATCHFINDER_HASH3_ORDER);
hash4 = lz_hash(next_hashseq, HC_MATCHFINDER_HASH4_ORDER);
cur_pos++;
} while (--remaining);
prefetchw(&mf->hash3_tab[hash3]);
prefetchw(&mf->hash4_tab[hash4]);
next_hashes[0] = hash3;
next_hashes[1] = hash4;
}
#endif /* LIB_HC_MATCHFINDER_H */
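
To make the chain walk described above concrete, here is a deliberately simplified, self-contained toy (fixed parameters, no window sliding, buffers assumed to be at most 64 KiB with at least 4 readable bytes at every queried position). It illustrates the idea only; it is not libdeflate's implementation:

#include <stdint.h>
#include <string.h>

#define TOY_HASH_ORDER 14
#define TOY_NO_POS     (-1)

struct toy_hc {
    int32_t head[1 << TOY_HASH_ORDER]; /* hash bucket -> most recent position */
    int32_t next[1 << 16];             /* position -> older position, same hash */
};

static void toy_hc_init(struct toy_hc *mf)
{
    for (size_t i = 0; i < (size_t)1 << TOY_HASH_ORDER; i++)
        mf->head[i] = TOY_NO_POS;
}

static uint32_t toy_hash4(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, 4);                  /* hash the first 4 bytes of the sequence */
    return (v * 0x9E3779B1u) >> (32 - TOY_HASH_ORDER);
}

/* Search this position's chain for the longest earlier match, then insert it. */
static int toy_hc_longest_match(struct toy_hc *mf, const uint8_t *buf,
                                int32_t pos, int32_t end, int max_depth,
                                int32_t *offset_ret)
{
    uint32_t h = toy_hash4(&buf[pos]);
    int32_t cur = mf->head[h];
    int best_len = 0;

    while (cur != TOY_NO_POS && max_depth-- > 0) {
        int len = 0;

        while (pos + len < end && buf[cur + len] == buf[pos + len])
            len++;
        if (len > best_len) {
            best_len = len;
            *offset_ret = pos - cur;   /* closest (smallest offset) match wins ties */
        }
        cur = mf->next[cur];           /* follow the chain to an older position */
    }
    mf->next[pos] = mf->head[h];       /* prepend the current position */
    mf->head[h] = pos;
    return best_len;
}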

@@ -1,234 +0,0 @@
/*
* ht_matchfinder.h - Lempel-Ziv matchfinding with a hash table
*
* Copyright 2022 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* ---------------------------------------------------------------------------
*
* This is a Hash Table (ht) matchfinder.
*
* This is a variant of the Hash Chains (hc) matchfinder that is optimized for
* very fast compression. The ht_matchfinder stores the hash chains inline in
* the hash table, whereas the hc_matchfinder stores them in a separate array.
* Storing the hash chains inline is the faster method when max_search_depth
* (the maximum chain length) is very small. It is not appropriate when
* max_search_depth is larger, as then it uses too much memory.
*
* Due to its focus on speed, the ht_matchfinder doesn't support length 3
* matches. It also doesn't allow max_search_depth to vary at runtime; it is
* fixed at build time as HT_MATCHFINDER_BUCKET_SIZE.
*
* See hc_matchfinder.h for more information.
*/
#ifndef LIB_HT_MATCHFINDER_H
#define LIB_HT_MATCHFINDER_H
#include "matchfinder_common.h"
#define HT_MATCHFINDER_HASH_ORDER 15
#define HT_MATCHFINDER_BUCKET_SIZE 2
#define HT_MATCHFINDER_MIN_MATCH_LEN 4
/* Minimum value of max_len for ht_matchfinder_longest_match() */
#define HT_MATCHFINDER_REQUIRED_NBYTES 5
struct MATCHFINDER_ALIGNED ht_matchfinder {
mf_pos_t hash_tab[1UL << HT_MATCHFINDER_HASH_ORDER]
[HT_MATCHFINDER_BUCKET_SIZE];
};
static forceinline void
ht_matchfinder_init(struct ht_matchfinder *mf)
{
STATIC_ASSERT(sizeof(*mf) % MATCHFINDER_SIZE_ALIGNMENT == 0);
matchfinder_init((mf_pos_t *)mf, sizeof(*mf));
}
static forceinline void
ht_matchfinder_slide_window(struct ht_matchfinder *mf)
{
matchfinder_rebase((mf_pos_t *)mf, sizeof(*mf));
}
/* Note: max_len must be >= HT_MATCHFINDER_REQUIRED_NBYTES */
static forceinline u32
ht_matchfinder_longest_match(struct ht_matchfinder * const mf,
const u8 ** const in_base_p,
const u8 * const in_next,
const u32 max_len,
const u32 nice_len,
u32 * const next_hash,
u32 * const offset_ret)
{
u32 best_len = 0;
const u8 *best_matchptr = in_next;
u32 cur_pos = in_next - *in_base_p;
const u8 *in_base;
mf_pos_t cutoff;
u32 hash;
u32 seq;
mf_pos_t cur_node;
const u8 *matchptr;
#if HT_MATCHFINDER_BUCKET_SIZE > 1
mf_pos_t to_insert;
u32 len;
#endif
#if HT_MATCHFINDER_BUCKET_SIZE > 2
int i;
#endif
/* This is assumed throughout this function. */
STATIC_ASSERT(HT_MATCHFINDER_MIN_MATCH_LEN == 4);
if (cur_pos == MATCHFINDER_WINDOW_SIZE) {
ht_matchfinder_slide_window(mf);
*in_base_p += MATCHFINDER_WINDOW_SIZE;
cur_pos = 0;
}
in_base = *in_base_p;
cutoff = cur_pos - MATCHFINDER_WINDOW_SIZE;
hash = *next_hash;
STATIC_ASSERT(HT_MATCHFINDER_REQUIRED_NBYTES == 5);
*next_hash = lz_hash(get_unaligned_le32(in_next + 1),
HT_MATCHFINDER_HASH_ORDER);
seq = load_u32_unaligned(in_next);
prefetchw(&mf->hash_tab[*next_hash]);
#if HT_MATCHFINDER_BUCKET_SIZE == 1
/* Hand-unrolled version for BUCKET_SIZE == 1 */
cur_node = mf->hash_tab[hash][0];
mf->hash_tab[hash][0] = cur_pos;
if (cur_node <= cutoff)
goto out;
matchptr = &in_base[cur_node];
if (load_u32_unaligned(matchptr) == seq) {
best_len = lz_extend(in_next, matchptr, 4, max_len);
best_matchptr = matchptr;
}
#elif HT_MATCHFINDER_BUCKET_SIZE == 2
/*
* Hand-unrolled version for BUCKET_SIZE == 2. The logic here also
* differs slightly in that it copies the first entry to the second even
* if nice_len is reached on the first, as this can be slightly faster.
*/
cur_node = mf->hash_tab[hash][0];
mf->hash_tab[hash][0] = cur_pos;
if (cur_node <= cutoff)
goto out;
matchptr = &in_base[cur_node];
to_insert = cur_node;
cur_node = mf->hash_tab[hash][1];
mf->hash_tab[hash][1] = to_insert;
if (load_u32_unaligned(matchptr) == seq) {
best_len = lz_extend(in_next, matchptr, 4, max_len);
best_matchptr = matchptr;
if (cur_node <= cutoff || best_len >= nice_len)
goto out;
matchptr = &in_base[cur_node];
if (load_u32_unaligned(matchptr) == seq &&
load_u32_unaligned(matchptr + best_len - 3) ==
load_u32_unaligned(in_next + best_len - 3)) {
len = lz_extend(in_next, matchptr, 4, max_len);
if (len > best_len) {
best_len = len;
best_matchptr = matchptr;
}
}
} else {
if (cur_node <= cutoff)
goto out;
matchptr = &in_base[cur_node];
if (load_u32_unaligned(matchptr) == seq) {
best_len = lz_extend(in_next, matchptr, 4, max_len);
best_matchptr = matchptr;
}
}
#else
/* Generic version for HT_MATCHFINDER_BUCKET_SIZE > 2 */
to_insert = cur_pos;
for (i = 0; i < HT_MATCHFINDER_BUCKET_SIZE; i++) {
cur_node = mf->hash_tab[hash][i];
mf->hash_tab[hash][i] = to_insert;
if (cur_node <= cutoff)
goto out;
matchptr = &in_base[cur_node];
if (load_u32_unaligned(matchptr) == seq) {
len = lz_extend(in_next, matchptr, 4, max_len);
if (len > best_len) {
best_len = len;
best_matchptr = matchptr;
if (best_len >= nice_len)
goto out;
}
}
to_insert = cur_node;
}
#endif
out:
*offset_ret = in_next - best_matchptr;
return best_len;
}
static forceinline void
ht_matchfinder_skip_bytes(struct ht_matchfinder * const mf,
const u8 ** const in_base_p,
const u8 *in_next,
const u8 * const in_end,
const u32 count,
u32 * const next_hash)
{
s32 cur_pos = in_next - *in_base_p;
u32 hash;
u32 remaining = count;
int i;
if (unlikely(count + HT_MATCHFINDER_REQUIRED_NBYTES > in_end - in_next))
return;
if (cur_pos + count - 1 >= MATCHFINDER_WINDOW_SIZE) {
ht_matchfinder_slide_window(mf);
*in_base_p += MATCHFINDER_WINDOW_SIZE;
cur_pos -= MATCHFINDER_WINDOW_SIZE;
}
hash = *next_hash;
do {
for (i = HT_MATCHFINDER_BUCKET_SIZE - 1; i > 0; i--)
mf->hash_tab[hash][i] = mf->hash_tab[hash][i - 1];
mf->hash_tab[hash][0] = cur_pos;
hash = lz_hash(get_unaligned_le32(++in_next),
HT_MATCHFINDER_HASH_ORDER);
cur_pos++;
} while (--remaining);
prefetchw(&mf->hash_tab[hash]);
*next_hash = hash;
}
#endif /* LIB_HT_MATCHFINDER_H */
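
The "chain stored inline in the hash table" idea above boils down to a small shift on insertion. A tiny, hypothetical illustration (not libdeflate code): each bucket keeps its most recent positions newest-first, and inserting simply pushes the oldest entry out:

#include <stdint.h>

#define TOY_BUCKET_SIZE 2

static void toy_ht_insert(int32_t bucket[TOY_BUCKET_SIZE], int32_t pos)
{
    int i;

    for (i = TOY_BUCKET_SIZE - 1; i > 0; i--)
        bucket[i] = bucket[i - 1];     /* the oldest entry falls off the end */
    bucket[0] = pos;                   /* the newest entry goes in front */
}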

@@ -1,94 +0,0 @@
/*
* lib_common.h - internal header included by all library code
*/
#ifndef LIB_LIB_COMMON_H
#define LIB_LIB_COMMON_H
#ifdef LIBDEFLATE_H
/*
* When building the library, LIBDEFLATEAPI needs to be defined properly before
* including libdeflate.h.
*/
# error "lib_common.h must always be included before libdeflate.h"
#endif
#if defined(LIBDEFLATE_DLL) && (defined(_WIN32) || defined(__CYGWIN__))
# define LIBDEFLATE_EXPORT_SYM __declspec(dllexport)
#elif defined(__GNUC__)
# define LIBDEFLATE_EXPORT_SYM __attribute__((visibility("default")))
#else
# define LIBDEFLATE_EXPORT_SYM
#endif
/*
* On i386, gcc assumes that the stack is 16-byte aligned at function entry.
* However, some compilers (e.g. MSVC) and programming languages (e.g. Delphi)
* only guarantee 4-byte alignment when calling functions. This is mainly an
* issue on Windows, but it has been seen on Linux too. Work around this ABI
* incompatibility by realigning the stack pointer when entering libdeflate.
* This prevents crashes in SSE/AVX code.
*/
#if defined(__GNUC__) && defined(__i386__)
# define LIBDEFLATE_ALIGN_STACK __attribute__((force_align_arg_pointer))
#else
# define LIBDEFLATE_ALIGN_STACK
#endif
#define LIBDEFLATEAPI LIBDEFLATE_EXPORT_SYM LIBDEFLATE_ALIGN_STACK
#include "../common_defs.h"
void *libdeflate_malloc(size_t size);
void libdeflate_free(void *ptr);
void *libdeflate_aligned_malloc(size_t alignment, size_t size);
void libdeflate_aligned_free(void *ptr);
#ifdef FREESTANDING
/*
* With -ffreestanding, <string.h> may be missing, and we must provide
* implementations of memset(), memcpy(), memmove(), and memcmp().
* See https://gcc.gnu.org/onlinedocs/gcc/Standards.html
*
* Also, -ffreestanding disables interpreting calls to these functions as
* built-ins. E.g., calling memcpy(&v, p, WORDBYTES) will make a function call,
* not be optimized to a single load instruction. For performance reasons we
* don't want that. So, declare these functions as macros that expand to the
* corresponding built-ins. This approach is recommended in the gcc man page.
* We still need the actual function definitions in case gcc calls them.
*/
void *memset(void *s, int c, size_t n);
#define memset(s, c, n) __builtin_memset((s), (c), (n))
void *memcpy(void *dest, const void *src, size_t n);
#define memcpy(dest, src, n) __builtin_memcpy((dest), (src), (n))
void *memmove(void *dest, const void *src, size_t n);
#define memmove(dest, src, n) __builtin_memmove((dest), (src), (n))
int memcmp(const void *s1, const void *s2, size_t n);
#define memcmp(s1, s2, n) __builtin_memcmp((s1), (s2), (n))
#undef LIBDEFLATE_ENABLE_ASSERTIONS
#else
#include <string.h>
#endif
/*
* Runtime assertion support. Don't enable this in production builds; it may
* hurt performance significantly.
*/
#ifdef LIBDEFLATE_ENABLE_ASSERTIONS
void libdeflate_assertion_failed(const char *expr, const char *file, int line);
#define ASSERT(expr) { if (unlikely(!(expr))) \
libdeflate_assertion_failed(#expr, __FILE__, __LINE__); }
#else
#define ASSERT(expr) (void)(expr)
#endif
#define CONCAT_IMPL(a, b) a##b
#define CONCAT(a, b) CONCAT_IMPL(a, b)
#define ADD_SUFFIX(name) CONCAT(name, SUFFIX)
#endif /* LIB_LIB_COMMON_H */

@@ -1,199 +0,0 @@
/*
* matchfinder_common.h - common code for Lempel-Ziv matchfinding
*/
#ifndef LIB_MATCHFINDER_COMMON_H
#define LIB_MATCHFINDER_COMMON_H
#include "lib_common.h"
#ifndef MATCHFINDER_WINDOW_ORDER
# error "MATCHFINDER_WINDOW_ORDER must be defined!"
#endif
/*
* Given a 32-bit value that was loaded with the platform's native endianness,
* return a 32-bit value whose high-order 8 bits are 0 and whose low-order 24
* bits contain the first 3 bytes, arranged in octets in a platform-dependent
* order, at the memory location from which the input 32-bit value was loaded.
*/
static forceinline u32
loaded_u32_to_u24(u32 v)
{
if (CPU_IS_LITTLE_ENDIAN())
return v & 0xFFFFFF;
else
return v >> 8;
}
/*
* Load the next 3 bytes from @p into the 24 low-order bits of a 32-bit value.
* The order in which the 3 bytes will be arranged as octets in the 24 bits is
* platform-dependent. At least 4 bytes (not 3) must be available at @p.
*/
static forceinline u32
load_u24_unaligned(const u8 *p)
{
#if UNALIGNED_ACCESS_IS_FAST
return loaded_u32_to_u24(load_u32_unaligned(p));
#else
if (CPU_IS_LITTLE_ENDIAN())
return ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
else
return ((u32)p[2] << 0) | ((u32)p[1] << 8) | ((u32)p[0] << 16);
#endif
}
#define MATCHFINDER_WINDOW_SIZE (1UL << MATCHFINDER_WINDOW_ORDER)
typedef s16 mf_pos_t;
#define MATCHFINDER_INITVAL ((mf_pos_t)-MATCHFINDER_WINDOW_SIZE)
/*
* Required alignment of the matchfinder buffer pointer and size. The values
* here come from the AVX-2 implementation, which is the worst case.
*/
#define MATCHFINDER_MEM_ALIGNMENT 32
#define MATCHFINDER_SIZE_ALIGNMENT 128
#undef matchfinder_init
#undef matchfinder_rebase
#ifdef _aligned_attribute
# define MATCHFINDER_ALIGNED _aligned_attribute(MATCHFINDER_MEM_ALIGNMENT)
# if defined(ARCH_ARM32) || defined(ARCH_ARM64)
# include "arm/matchfinder_impl.h"
# elif defined(ARCH_X86_32) || defined(ARCH_X86_64)
# include "x86/matchfinder_impl.h"
# endif
#else
# define MATCHFINDER_ALIGNED
#endif
/*
* Initialize the hash table portion of the matchfinder.
*
* Essentially, this is an optimized memset().
*
* 'data' must be aligned to a MATCHFINDER_MEM_ALIGNMENT boundary, and
* 'size' must be a multiple of MATCHFINDER_SIZE_ALIGNMENT.
*/
#ifndef matchfinder_init
static forceinline void
matchfinder_init(mf_pos_t *data, size_t size)
{
size_t num_entries = size / sizeof(*data);
size_t i;
for (i = 0; i < num_entries; i++)
data[i] = MATCHFINDER_INITVAL;
}
#endif
/*
* Slide the matchfinder by MATCHFINDER_WINDOW_SIZE bytes.
*
* This must be called just after each MATCHFINDER_WINDOW_SIZE bytes have been
* run through the matchfinder.
*
* This subtracts MATCHFINDER_WINDOW_SIZE bytes from each entry in the given
* array, making the entries be relative to the current position rather than the
* position MATCHFINDER_WINDOW_SIZE bytes prior. To avoid integer underflows,
* entries that would become less than -MATCHFINDER_WINDOW_SIZE stay at
* -MATCHFINDER_WINDOW_SIZE, keeping them permanently out of bounds.
*
* The given array must contain all matchfinder data that is position-relative:
* the hash table(s) as well as any hash chain or binary tree links. Its
* address must be aligned to a MATCHFINDER_MEM_ALIGNMENT boundary, and its size
* must be a multiple of MATCHFINDER_SIZE_ALIGNMENT.
*/
#ifndef matchfinder_rebase
static forceinline void
matchfinder_rebase(mf_pos_t *data, size_t size)
{
size_t num_entries = size / sizeof(*data);
size_t i;
if (MATCHFINDER_WINDOW_SIZE == 32768) {
/*
* Branchless version for 32768-byte windows. Clear all bits if
* the value was already negative, then set the sign bit. This
* is equivalent to subtracting 32768 with signed saturation.
*/
for (i = 0; i < num_entries; i++)
data[i] = 0x8000 | (data[i] & ~(data[i] >> 15));
} else {
for (i = 0; i < num_entries; i++) {
if (data[i] >= 0)
data[i] -= (mf_pos_t)-MATCHFINDER_WINDOW_SIZE;
else
data[i] = (mf_pos_t)-MATCHFINDER_WINDOW_SIZE;
}
}
}
#endif
/*
* The hash function: given a sequence prefix held in the low-order bits of a
* 32-bit value, multiply by a carefully-chosen large constant. Discard any
* bits of the product that don't fit in a 32-bit value, but take the
* next-highest @num_bits bits of the product as the hash value, as those have
* the most randomness.
*/
static forceinline u32
lz_hash(u32 seq, unsigned num_bits)
{
return (u32)(seq * 0x1E35A7BD) >> (32 - num_bits);
}
/*
* Return the number of bytes at @matchptr that match the bytes at @strptr, up
* to a maximum of @max_len. Initially, @start_len bytes are matched.
*/
static forceinline unsigned
lz_extend(const u8 * const strptr, const u8 * const matchptr,
const unsigned start_len, const unsigned max_len)
{
unsigned len = start_len;
machine_word_t v_word;
if (UNALIGNED_ACCESS_IS_FAST) {
if (likely(max_len - len >= 4 * WORDBYTES)) {
#define COMPARE_WORD_STEP \
v_word = load_word_unaligned(&matchptr[len]) ^ \
load_word_unaligned(&strptr[len]); \
if (v_word != 0) \
goto word_differs; \
len += WORDBYTES; \
COMPARE_WORD_STEP
COMPARE_WORD_STEP
COMPARE_WORD_STEP
COMPARE_WORD_STEP
#undef COMPARE_WORD_STEP
}
while (len + WORDBYTES <= max_len) {
v_word = load_word_unaligned(&matchptr[len]) ^
load_word_unaligned(&strptr[len]);
if (v_word != 0)
goto word_differs;
len += WORDBYTES;
}
}
while (len < max_len && matchptr[len] == strptr[len])
len++;
return len;
word_differs:
if (CPU_IS_LITTLE_ENDIAN())
len += (bsfw(v_word) >> 3);
else
len += (WORDBITS - 1 - bsrw(v_word)) >> 3;
return len;
}
#endif /* LIB_MATCHFINDER_COMMON_H */
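
The multiplicative hash above is self-contained enough to demonstrate on its own: multiply the packed byte prefix by 0x1E35A7BD and keep the top num_bits bits of the 32-bit product, which are the best-mixed ones. A standalone sketch (the constant and shift come from lz_hash() above; the sample input and table size are made up):

#include <stdint.h>
#include <stdio.h>

static uint32_t toy_lz_hash(uint32_t seq, unsigned num_bits)
{
    /* Same formula as lz_hash(): high bits of the 32-bit product. */
    return (uint32_t)(seq * 0x1E35A7BD) >> (32 - num_bits);
}

int main(void)
{
    /* Three bytes 'a','b','c' packed into the low 24 bits, little-endian. */
    uint32_t seq = (uint32_t)'a' | ((uint32_t)'b' << 8) | ((uint32_t)'c' << 16);
    unsigned order = 15;                       /* a 2^15-entry hash table */

    printf("bucket %u of %u\n", toy_lz_hash(seq, order), 1u << order);
    return 0;
}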

View file

@@ -1,151 +0,0 @@
/*
* utils.c - utility functions for libdeflate
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "lib_common.h"
#ifdef FREESTANDING
# define malloc NULL
# define free NULL
#else
# include <stdlib.h>
#endif
static void *(*libdeflate_malloc_func)(size_t) = malloc;
static void (*libdeflate_free_func)(void *) = free;
void *
libdeflate_malloc(size_t size)
{
return (*libdeflate_malloc_func)(size);
}
void
libdeflate_free(void *ptr)
{
(*libdeflate_free_func)(ptr);
}
void *
libdeflate_aligned_malloc(size_t alignment, size_t size)
{
void *ptr = libdeflate_malloc(sizeof(void *) + alignment - 1 + size);
if (ptr) {
void *orig_ptr = ptr;
ptr = (void *)ALIGN((uintptr_t)ptr + sizeof(void *), alignment);
((void **)ptr)[-1] = orig_ptr;
}
return ptr;
}
void
libdeflate_aligned_free(void *ptr)
{
if (ptr)
libdeflate_free(((void **)ptr)[-1]);
}
LIBDEFLATEAPI void
libdeflate_set_memory_allocator(void *(*malloc_func)(size_t),
void (*free_func)(void *))
{
libdeflate_malloc_func = malloc_func;
libdeflate_free_func = free_func;
}
/*
* Implementations of libc functions for freestanding library builds.
* Normal library builds don't use these. Not optimized yet; usually the
* compiler expands these functions and doesn't actually call them anyway.
*/
#ifdef FREESTANDING
#undef memset
void * __attribute__((weak))
memset(void *s, int c, size_t n)
{
u8 *p = s;
size_t i;
for (i = 0; i < n; i++)
p[i] = c;
return s;
}
#undef memcpy
void * __attribute__((weak))
memcpy(void *dest, const void *src, size_t n)
{
u8 *d = dest;
const u8 *s = src;
size_t i;
for (i = 0; i < n; i++)
d[i] = s[i];
return dest;
}
#undef memmove
void * __attribute__((weak))
memmove(void *dest, const void *src, size_t n)
{
u8 *d = dest;
const u8 *s = src;
size_t i;
if (d <= s)
return memcpy(d, s, n);
for (i = n; i > 0; i--)
d[i - 1] = s[i - 1];
return dest;
}
#undef memcmp
int __attribute__((weak))
memcmp(const void *s1, const void *s2, size_t n)
{
const u8 *p1 = s1;
const u8 *p2 = s2;
size_t i;
for (i = 0; i < n; i++) {
if (p1[i] != p2[i])
return (int)p1[i] - (int)p2[i];
}
return 0;
}
#endif /* FREESTANDING */
#ifdef LIBDEFLATE_ENABLE_ASSERTIONS
#include <stdio.h>
#include <stdlib.h>
void
libdeflate_assertion_failed(const char *expr, const char *file, int line)
{
fprintf(stderr, "Assertion failed: %s at %s:%d\n", expr, file, line);
abort();
}
#endif /* LIBDEFLATE_ENABLE_ASSERTIONS */
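
libdeflate_aligned_malloc()/libdeflate_aligned_free() above use the classic over-allocate-and-stash trick: allocate sizeof(void *) + alignment - 1 extra bytes, round the pointer up, and store the original pointer in the slot just before the aligned block so it can be recovered on free. A self-contained sketch of the same idea on top of plain malloc()/free() (illustrative names; alignment assumed to be a nonzero power of two):

#include <stdint.h>
#include <stdlib.h>

static void *aligned_malloc_sketch(size_t alignment, size_t size)
{
    /* Extra room: the saved pointer plus worst-case alignment padding. */
    void *raw = malloc(sizeof(void *) + alignment - 1 + size);
    uintptr_t aligned;

    if (raw == NULL)
        return NULL;
    aligned = ((uintptr_t)raw + sizeof(void *) + alignment - 1) &
              ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;       /* stash the original pointer */
    return (void *)aligned;
}

static void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);       /* recover and free the original */
}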

View file

@@ -1,287 +0,0 @@
/*
* x86/adler32_impl.h - x86 implementations of Adler-32 checksum algorithm
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_X86_ADLER32_IMPL_H
#define LIB_X86_ADLER32_IMPL_H
#include "cpu_features.h"
/*
* The following macros horizontally sum the s1 counters and add them to the
* real s1, and likewise for s2. They do this via a series of reductions, each
* of which halves the vector length, until just one counter remains.
*
* The s1 reductions don't depend on the s2 reductions and vice versa, so for
* efficiency they are interleaved. Also, every other s1 counter is 0 due to
* the 'psadbw' instruction (_mm_sad_epu8) summing groups of 8 bytes rather than
* 4; hence, one of the s1 reductions is skipped when going from 128 => 32 bits.
*/
#define ADLER32_FINISH_VEC_CHUNK_128(s1, s2, v_s1, v_s2) \
{ \
__m128i /* __v4su */ s1_last = (v_s1), s2_last = (v_s2); \
\
/* 128 => 32 bits */ \
s2_last = _mm_add_epi32(s2_last, _mm_shuffle_epi32(s2_last, 0x31)); \
s1_last = _mm_add_epi32(s1_last, _mm_shuffle_epi32(s1_last, 0x02)); \
s2_last = _mm_add_epi32(s2_last, _mm_shuffle_epi32(s2_last, 0x02)); \
\
*(s1) += (u32)_mm_cvtsi128_si32(s1_last); \
*(s2) += (u32)_mm_cvtsi128_si32(s2_last); \
}
#define ADLER32_FINISH_VEC_CHUNK_256(s1, s2, v_s1, v_s2) \
{ \
__m128i /* __v4su */ s1_128bit, s2_128bit; \
\
/* 256 => 128 bits */ \
s1_128bit = _mm_add_epi32(_mm256_extracti128_si256((v_s1), 0), \
_mm256_extracti128_si256((v_s1), 1)); \
s2_128bit = _mm_add_epi32(_mm256_extracti128_si256((v_s2), 0), \
_mm256_extracti128_si256((v_s2), 1)); \
\
ADLER32_FINISH_VEC_CHUNK_128((s1), (s2), s1_128bit, s2_128bit); \
}
/*
* This is a very silly partial workaround for gcc bug
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892. The bug causes gcc to
* generate extra move instructions in some loops containing vector intrinsics.
*
* An alternate workaround would be to use gcc native vector operations instead
* of vector intrinsics. But that would result in MSVC needing its own code.
*/
#if GCC_PREREQ(1, 0)
# define GCC_UPDATE_VARS(a, b, c, d, e, f) \
__asm__("" : "+x" (a), "+x" (b), "+x" (c), "+x" (d), "+x" (e), "+x" (f))
#else
# define GCC_UPDATE_VARS(a, b, c, d, e, f) \
(void)a, (void)b, (void)c, (void)d, (void)e, (void)f
#endif
/* SSE2 implementation */
#if HAVE_SSE2_INTRIN
# define adler32_sse2 adler32_sse2
# define FUNCNAME adler32_sse2
# define FUNCNAME_CHUNK adler32_sse2_chunk
# define IMPL_ALIGNMENT 16
# define IMPL_SEGMENT_LEN 32
/*
* The 16-bit precision byte counters must not be allowed to undergo *signed*
* overflow, otherwise the signed multiplications at the end (_mm_madd_epi16)
* would behave incorrectly.
*/
# define IMPL_MAX_CHUNK_LEN (32 * (0x7FFF / 0xFF))
# if HAVE_SSE2_NATIVE
# define ATTRIBUTES
# else
# define ATTRIBUTES _target_attribute("sse2")
# endif
# include <emmintrin.h>
static forceinline ATTRIBUTES void
adler32_sse2_chunk(const __m128i *p, const __m128i *const end, u32 *s1, u32 *s2)
{
const __m128i zeroes = _mm_setzero_si128();
const __m128i /* __v8hu */ mults_a =
_mm_setr_epi16(32, 31, 30, 29, 28, 27, 26, 25);
const __m128i /* __v8hu */ mults_b =
_mm_setr_epi16(24, 23, 22, 21, 20, 19, 18, 17);
const __m128i /* __v8hu */ mults_c =
_mm_setr_epi16(16, 15, 14, 13, 12, 11, 10, 9);
const __m128i /* __v8hu */ mults_d =
_mm_setr_epi16(8, 7, 6, 5, 4, 3, 2, 1);
/* s1 counters: 32-bit, sum of bytes */
__m128i /* __v4su */ v_s1 = zeroes;
/* s2 counters: 32-bit, sum of s1 values */
__m128i /* __v4su */ v_s2 = zeroes;
/*
* Thirty-two 16-bit counters for byte sums. Each accumulates the bytes
* that eventually need to be multiplied by a number 32...1 for addition
* into s2.
*/
__m128i /* __v8hu */ v_byte_sums_a = zeroes;
__m128i /* __v8hu */ v_byte_sums_b = zeroes;
__m128i /* __v8hu */ v_byte_sums_c = zeroes;
__m128i /* __v8hu */ v_byte_sums_d = zeroes;
do {
/* Load the next 32 bytes. */
const __m128i bytes1 = *p++;
const __m128i bytes2 = *p++;
/*
* Accumulate the previous s1 counters into the s2 counters.
* Logically, this really should be v_s2 += v_s1 * 32, but we
* can do the multiplication (or left shift) later.
*/
v_s2 = _mm_add_epi32(v_s2, v_s1);
/*
* s1 update: use "Packed Sum of Absolute Differences" to add
* the bytes horizontally with 8 bytes per sum. Then add the
* sums to the s1 counters.
*/
v_s1 = _mm_add_epi32(v_s1, _mm_sad_epu8(bytes1, zeroes));
v_s1 = _mm_add_epi32(v_s1, _mm_sad_epu8(bytes2, zeroes));
/*
* Also accumulate the bytes into 32 separate counters that have
* 16-bit precision.
*/
v_byte_sums_a = _mm_add_epi16(
v_byte_sums_a, _mm_unpacklo_epi8(bytes1, zeroes));
v_byte_sums_b = _mm_add_epi16(
v_byte_sums_b, _mm_unpackhi_epi8(bytes1, zeroes));
v_byte_sums_c = _mm_add_epi16(
v_byte_sums_c, _mm_unpacklo_epi8(bytes2, zeroes));
v_byte_sums_d = _mm_add_epi16(
v_byte_sums_d, _mm_unpackhi_epi8(bytes2, zeroes));
GCC_UPDATE_VARS(v_s1, v_s2, v_byte_sums_a, v_byte_sums_b,
v_byte_sums_c, v_byte_sums_d);
} while (p != end);
/* Finish calculating the s2 counters. */
v_s2 = _mm_slli_epi32(v_s2, 5);
v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(v_byte_sums_a, mults_a));
v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(v_byte_sums_b, mults_b));
v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(v_byte_sums_c, mults_c));
v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(v_byte_sums_d, mults_d));
/* Add the counters to the real s1 and s2. */
ADLER32_FINISH_VEC_CHUNK_128(s1, s2, v_s1, v_s2);
}
# include "../adler32_vec_template.h"
#endif /* HAVE_SSE2_INTRIN */
/*
* AVX2 implementation. Basically the same as the SSE2 one, but with the vector
* width doubled.
*/
#if HAVE_AVX2_INTRIN
# define adler32_avx2 adler32_avx2
# define FUNCNAME adler32_avx2
# define FUNCNAME_CHUNK adler32_avx2_chunk
# define IMPL_ALIGNMENT 32
# define IMPL_SEGMENT_LEN 64
# define IMPL_MAX_CHUNK_LEN (64 * (0x7FFF / 0xFF))
# if HAVE_AVX2_NATIVE
# define ATTRIBUTES
# else
# define ATTRIBUTES _target_attribute("avx2")
# endif
# include <immintrin.h>
/*
* With clang in MSVC compatibility mode, immintrin.h incorrectly skips
* including some sub-headers.
*/
# if defined(__clang__) && defined(_MSC_VER)
# include <avxintrin.h>
# include <avx2intrin.h>
# endif
static forceinline ATTRIBUTES void
adler32_avx2_chunk(const __m256i *p, const __m256i *const end, u32 *s1, u32 *s2)
{
const __m256i zeroes = _mm256_setzero_si256();
/*
* Note, the multipliers have to be in this order because
* _mm256_unpack{lo,hi}_epi8 work on each 128-bit lane separately.
*/
const __m256i /* __v16hu */ mults_a =
_mm256_setr_epi16(64, 63, 62, 61, 60, 59, 58, 57,
48, 47, 46, 45, 44, 43, 42, 41);
const __m256i /* __v16hu */ mults_b =
_mm256_setr_epi16(56, 55, 54, 53, 52, 51, 50, 49,
40, 39, 38, 37, 36, 35, 34, 33);
const __m256i /* __v16hu */ mults_c =
_mm256_setr_epi16(32, 31, 30, 29, 28, 27, 26, 25,
16, 15, 14, 13, 12, 11, 10, 9);
const __m256i /* __v16hu */ mults_d =
_mm256_setr_epi16(24, 23, 22, 21, 20, 19, 18, 17,
8, 7, 6, 5, 4, 3, 2, 1);
__m256i /* __v8su */ v_s1 = zeroes;
__m256i /* __v8su */ v_s2 = zeroes;
__m256i /* __v16hu */ v_byte_sums_a = zeroes;
__m256i /* __v16hu */ v_byte_sums_b = zeroes;
__m256i /* __v16hu */ v_byte_sums_c = zeroes;
__m256i /* __v16hu */ v_byte_sums_d = zeroes;
do {
const __m256i bytes1 = *p++;
const __m256i bytes2 = *p++;
v_s2 = _mm256_add_epi32(v_s2, v_s1);
v_s1 = _mm256_add_epi32(v_s1, _mm256_sad_epu8(bytes1, zeroes));
v_s1 = _mm256_add_epi32(v_s1, _mm256_sad_epu8(bytes2, zeroes));
v_byte_sums_a = _mm256_add_epi16(
v_byte_sums_a, _mm256_unpacklo_epi8(bytes1, zeroes));
v_byte_sums_b = _mm256_add_epi16(
v_byte_sums_b, _mm256_unpackhi_epi8(bytes1, zeroes));
v_byte_sums_c = _mm256_add_epi16(
v_byte_sums_c, _mm256_unpacklo_epi8(bytes2, zeroes));
v_byte_sums_d = _mm256_add_epi16(
v_byte_sums_d, _mm256_unpackhi_epi8(bytes2, zeroes));
GCC_UPDATE_VARS(v_s1, v_s2, v_byte_sums_a, v_byte_sums_b,
v_byte_sums_c, v_byte_sums_d);
} while (p != end);
v_s2 = _mm256_slli_epi32(v_s2, 6);
v_s2 = _mm256_add_epi32(v_s2, _mm256_madd_epi16(v_byte_sums_a, mults_a));
v_s2 = _mm256_add_epi32(v_s2, _mm256_madd_epi16(v_byte_sums_b, mults_b));
v_s2 = _mm256_add_epi32(v_s2, _mm256_madd_epi16(v_byte_sums_c, mults_c));
v_s2 = _mm256_add_epi32(v_s2, _mm256_madd_epi16(v_byte_sums_d, mults_d));
ADLER32_FINISH_VEC_CHUNK_256(s1, s2, v_s1, v_s2);
}
# include "../adler32_vec_template.h"
#endif /* HAVE_AVX2_INTRIN */
#if defined(adler32_avx2) && HAVE_AVX2_NATIVE
#define DEFAULT_IMPL adler32_avx2
#else
static inline adler32_func_t
arch_select_adler32_func(void)
{
const u32 features MAYBE_UNUSED = get_x86_cpu_features();
#ifdef adler32_avx2
if (HAVE_AVX2(features))
return adler32_avx2;
#endif
#ifdef adler32_sse2
if (HAVE_SSE2(features))
return adler32_sse2;
#endif
return NULL;
}
#define arch_select_adler32_func arch_select_adler32_func
#endif
#endif /* LIB_X86_ADLER32_IMPL_H */
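
For readers tracing the vectorized s1/s2 bookkeeping above, this is the scalar Adler-32 definition it has to reproduce: s1 starts at 1 and accumulates the bytes, s2 accumulates the successive s1 values, both modulo 65521, and the checksum is (s2 << 16) | s1. A deliberately naive reference sketch, not the removed code and without its deferred-modulo optimizations:

#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u        /* largest prime below 2^16 */

static uint32_t adler32_naive(const uint8_t *p, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        s1 = (s1 + p[i]) % ADLER_MOD;   /* running sum of bytes */
        s2 = (s2 + s1) % ADLER_MOD;     /* running sum of the s1 values */
    }
    return (s2 << 16) | s1;
}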

View file

@@ -1,157 +0,0 @@
/*
* x86/cpu_features.c - feature detection for x86 CPUs
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "../cpu_features_common.h" /* must be included first */
#include "cpu_features.h"
#if HAVE_DYNAMIC_X86_CPU_FEATURES
/* With old GCC versions we have to manually save and restore the x86_32 PIC
* register (ebx). See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47602 */
#if defined(ARCH_X86_32) && defined(__PIC__)
# define EBX_CONSTRAINT "=&r"
#else
# define EBX_CONSTRAINT "=b"
#endif
/* Execute the CPUID instruction. */
static inline void
cpuid(u32 leaf, u32 subleaf, u32 *a, u32 *b, u32 *c, u32 *d)
{
#ifdef _MSC_VER
int result[4];
__cpuidex(result, leaf, subleaf);
*a = result[0];
*b = result[1];
*c = result[2];
*d = result[3];
#else
__asm__ volatile(".ifnc %%ebx, %1; mov %%ebx, %1; .endif\n"
"cpuid \n"
".ifnc %%ebx, %1; xchg %%ebx, %1; .endif\n"
: "=a" (*a), EBX_CONSTRAINT (*b), "=c" (*c), "=d" (*d)
: "a" (leaf), "c" (subleaf));
#endif
}
/* Read an extended control register. */
static inline u64
read_xcr(u32 index)
{
#ifdef _MSC_VER
return _xgetbv(index);
#else
u32 edx, eax;
/*
* Execute the "xgetbv" instruction. Old versions of binutils do not
* recognize this instruction, so list the raw bytes instead.
*
* This must be 'volatile' to prevent this code from being moved out
* from under the check for OSXSAVE.
*/
__asm__ volatile(".byte 0x0f, 0x01, 0xd0" :
"=d" (edx), "=a" (eax) : "c" (index));
return ((u64)edx << 32) | eax;
#endif
}
#undef BIT
#define BIT(nr) (1UL << (nr))
#define XCR0_BIT_SSE BIT(1)
#define XCR0_BIT_AVX BIT(2)
#define IS_SET(reg, nr) ((reg) & BIT(nr))
#define IS_ALL_SET(reg, mask) (((reg) & (mask)) == (mask))
static const struct cpu_feature x86_cpu_feature_table[] = {
{X86_CPU_FEATURE_SSE2, "sse2"},
{X86_CPU_FEATURE_PCLMUL, "pclmul"},
{X86_CPU_FEATURE_AVX, "avx"},
{X86_CPU_FEATURE_AVX2, "avx2"},
{X86_CPU_FEATURE_BMI2, "bmi2"},
};
volatile u32 libdeflate_x86_cpu_features = 0;
/* Initialize libdeflate_x86_cpu_features. */
void libdeflate_init_x86_cpu_features(void)
{
u32 features = 0;
u32 dummy1, dummy2, dummy3, dummy4;
u32 max_function;
u32 features_1, features_2, features_3, features_4;
bool os_avx_support = false;
/* Get maximum supported function */
cpuid(0, 0, &max_function, &dummy2, &dummy3, &dummy4);
if (max_function < 1)
goto out;
/* Standard feature flags */
cpuid(1, 0, &dummy1, &dummy2, &features_2, &features_1);
if (IS_SET(features_1, 26))
features |= X86_CPU_FEATURE_SSE2;
if (IS_SET(features_2, 1))
features |= X86_CPU_FEATURE_PCLMUL;
if (IS_SET(features_2, 27)) { /* OSXSAVE set? */
u64 xcr0 = read_xcr(0);
os_avx_support = IS_ALL_SET(xcr0,
XCR0_BIT_SSE |
XCR0_BIT_AVX);
}
if (os_avx_support && IS_SET(features_2, 28))
features |= X86_CPU_FEATURE_AVX;
if (max_function < 7)
goto out;
/* Extended feature flags */
cpuid(7, 0, &dummy1, &features_3, &features_4, &dummy4);
if (os_avx_support && IS_SET(features_3, 5))
features |= X86_CPU_FEATURE_AVX2;
if (IS_SET(features_3, 8))
features |= X86_CPU_FEATURE_BMI2;
out:
disable_cpu_features_for_testing(&features, x86_cpu_feature_table,
ARRAY_LEN(x86_cpu_feature_table));
libdeflate_x86_cpu_features = features | X86_CPU_FEATURES_KNOWN;
}
#endif /* HAVE_DYNAMIC_X86_CPU_FEATURES */
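
The detection above issues CPUID and XGETBV by hand so that it also works with MSVC and old toolchains, and so it can verify OS support for AVX state via XCR0. On GCC and recent Clang alone, a much shorter (and less thorough) check can lean on the compiler's own runtime support; a rough sketch, not a drop-in replacement for the removed code:

/* GCC/Clang only: __builtin_cpu_supports() reads CPUID internally.
 * Note this shortcut may not replicate the XCR0/OS-support check above. */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();
    printf("sse2:   %d\n", __builtin_cpu_supports("sse2") != 0);
    printf("pclmul: %d\n", __builtin_cpu_supports("pclmul") != 0);
    printf("avx2:   %d\n", __builtin_cpu_supports("avx2") != 0);
    printf("bmi2:   %d\n", __builtin_cpu_supports("bmi2") != 0);
    return 0;
}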

View file

@@ -1,151 +0,0 @@
/*
* x86/cpu_features.h - feature detection for x86 CPUs
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_X86_CPU_FEATURES_H
#define LIB_X86_CPU_FEATURES_H
#include "../lib_common.h"
#define HAVE_DYNAMIC_X86_CPU_FEATURES 0
#if defined(ARCH_X86_32) || defined(ARCH_X86_64)
#if COMPILER_SUPPORTS_TARGET_FUNCTION_ATTRIBUTE || defined(_MSC_VER)
# undef HAVE_DYNAMIC_X86_CPU_FEATURES
# define HAVE_DYNAMIC_X86_CPU_FEATURES 1
#endif
#define X86_CPU_FEATURE_SSE2 0x00000001
#define X86_CPU_FEATURE_PCLMUL 0x00000002
#define X86_CPU_FEATURE_AVX 0x00000004
#define X86_CPU_FEATURE_AVX2 0x00000008
#define X86_CPU_FEATURE_BMI2 0x00000010
#define HAVE_SSE2(features) (HAVE_SSE2_NATIVE || ((features) & X86_CPU_FEATURE_SSE2))
#define HAVE_PCLMUL(features) (HAVE_PCLMUL_NATIVE || ((features) & X86_CPU_FEATURE_PCLMUL))
#define HAVE_AVX(features) (HAVE_AVX_NATIVE || ((features) & X86_CPU_FEATURE_AVX))
#define HAVE_AVX2(features) (HAVE_AVX2_NATIVE || ((features) & X86_CPU_FEATURE_AVX2))
#define HAVE_BMI2(features) (HAVE_BMI2_NATIVE || ((features) & X86_CPU_FEATURE_BMI2))
#if HAVE_DYNAMIC_X86_CPU_FEATURES
#define X86_CPU_FEATURES_KNOWN 0x80000000
extern volatile u32 libdeflate_x86_cpu_features;
void libdeflate_init_x86_cpu_features(void);
static inline u32 get_x86_cpu_features(void)
{
if (libdeflate_x86_cpu_features == 0)
libdeflate_init_x86_cpu_features();
return libdeflate_x86_cpu_features;
}
#else /* HAVE_DYNAMIC_X86_CPU_FEATURES */
static inline u32 get_x86_cpu_features(void) { return 0; }
#endif /* !HAVE_DYNAMIC_X86_CPU_FEATURES */
/*
* Prior to gcc 4.9 (r200349) and clang 3.8 (r239883), x86 intrinsics not
* available in the main target couldn't be used in 'target' attribute
* functions. Unfortunately clang has no feature test macro for this, so we
* have to check its version.
*/
#if HAVE_DYNAMIC_X86_CPU_FEATURES && \
(GCC_PREREQ(4, 9) || CLANG_PREREQ(3, 8, 7030000) || defined(_MSC_VER))
# define HAVE_TARGET_INTRINSICS 1
#else
# define HAVE_TARGET_INTRINSICS 0
#endif
/* SSE2 */
#if defined(__SSE2__) || \
(defined(_MSC_VER) && \
(defined(ARCH_X86_64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)))
# define HAVE_SSE2_NATIVE 1
#else
# define HAVE_SSE2_NATIVE 0
#endif
#define HAVE_SSE2_INTRIN (HAVE_SSE2_NATIVE || HAVE_TARGET_INTRINSICS)
/* PCLMUL */
#if defined(__PCLMUL__) || (defined(_MSC_VER) && defined(__AVX2__))
# define HAVE_PCLMUL_NATIVE 1
#else
# define HAVE_PCLMUL_NATIVE 0
#endif
#if HAVE_PCLMUL_NATIVE || (HAVE_TARGET_INTRINSICS && \
(GCC_PREREQ(4, 4) || CLANG_PREREQ(3, 2, 0) || \
defined(_MSC_VER)))
# define HAVE_PCLMUL_INTRIN 1
#else
# define HAVE_PCLMUL_INTRIN 0
#endif
/* AVX */
#ifdef __AVX__
# define HAVE_AVX_NATIVE 1
#else
# define HAVE_AVX_NATIVE 0
#endif
#if HAVE_AVX_NATIVE || (HAVE_TARGET_INTRINSICS && \
(GCC_PREREQ(4, 6) || CLANG_PREREQ(3, 0, 0) || \
defined(_MSC_VER)))
# define HAVE_AVX_INTRIN 1
#else
# define HAVE_AVX_INTRIN 0
#endif
/* AVX2 */
#ifdef __AVX2__
# define HAVE_AVX2_NATIVE 1
#else
# define HAVE_AVX2_NATIVE 0
#endif
#if HAVE_AVX2_NATIVE || (HAVE_TARGET_INTRINSICS && \
(GCC_PREREQ(4, 7) || CLANG_PREREQ(3, 1, 0) || \
defined(_MSC_VER)))
# define HAVE_AVX2_INTRIN 1
#else
# define HAVE_AVX2_INTRIN 0
#endif
/* BMI2 */
#if defined(__BMI2__) || (defined(_MSC_VER) && defined(__AVX2__))
# define HAVE_BMI2_NATIVE 1
#else
# define HAVE_BMI2_NATIVE 0
#endif
#if HAVE_BMI2_NATIVE || (HAVE_TARGET_INTRINSICS && \
(GCC_PREREQ(4, 7) || CLANG_PREREQ(3, 1, 0) || \
defined(_MSC_VER)))
# define HAVE_BMI2_INTRIN 1
#else
# define HAVE_BMI2_INTRIN 0
#endif
#endif /* ARCH_X86_32 || ARCH_X86_64 */
#endif /* LIB_X86_CPU_FEATURES_H */

View file

@@ -1,96 +0,0 @@
/*
* x86/crc32_impl.h - x86 implementations of the gzip CRC-32 algorithm
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_X86_CRC32_IMPL_H
#define LIB_X86_CRC32_IMPL_H
#include "cpu_features.h"
/* PCLMUL implementation */
#if HAVE_PCLMUL_INTRIN
# define crc32_x86_pclmul crc32_x86_pclmul
# define SUFFIX _pclmul
# if HAVE_PCLMUL_NATIVE
# define ATTRIBUTES
# else
# define ATTRIBUTES _target_attribute("pclmul")
# endif
# define FOLD_PARTIAL_VECS 0
# include "crc32_pclmul_template.h"
#endif
/*
* PCLMUL/AVX implementation. This implementation has two benefits over the
* regular PCLMUL one. First, simply compiling against the AVX target can
* improve performance significantly (e.g. 10100 MB/s to 16700 MB/s on Skylake)
* without actually using any AVX intrinsics, probably due to the availability
* of non-destructive VEX-encoded instructions. Second, AVX support implies
* SSSE3 and SSE4.1 support, and we can use SSSE3 and SSE4.1 intrinsics for
* efficient handling of partial blocks. (We *could* compile a variant with
* PCLMUL+SSSE3+SSE4.1 w/o AVX, but for simplicity we don't currently bother.)
*
* FIXME: with MSVC, this isn't actually compiled with AVX code generation
* enabled yet. That would require that this be moved to its own .c file.
*/
#if HAVE_PCLMUL_INTRIN && HAVE_AVX_INTRIN
# define crc32_x86_pclmul_avx crc32_x86_pclmul_avx
# define SUFFIX _pclmul_avx
# if HAVE_PCLMUL_NATIVE && HAVE_AVX_NATIVE
# define ATTRIBUTES
# else
# define ATTRIBUTES _target_attribute("pclmul,avx")
# endif
# define FOLD_PARTIAL_VECS 1
# include "crc32_pclmul_template.h"
#endif
/*
* If the best implementation is statically available, use it unconditionally.
* Otherwise choose the best implementation at runtime.
*/
#if defined(crc32_x86_pclmul_avx) && HAVE_PCLMUL_NATIVE && HAVE_AVX_NATIVE
#define DEFAULT_IMPL crc32_x86_pclmul_avx
#else
static inline crc32_func_t
arch_select_crc32_func(void)
{
const u32 features MAYBE_UNUSED = get_x86_cpu_features();
#ifdef crc32_x86_pclmul_avx
if (HAVE_PCLMUL(features) && HAVE_AVX(features))
return crc32_x86_pclmul_avx;
#endif
#ifdef crc32_x86_pclmul
if (HAVE_PCLMUL(features))
return crc32_x86_pclmul;
#endif
return NULL;
}
#define arch_select_crc32_func arch_select_crc32_func
#endif
#endif /* LIB_X86_CRC32_IMPL_H */
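
crc32_impl.h above, like adler32_impl.h and decompress_impl.h, follows one dispatch pattern: when the best variant is compiled in natively it becomes DEFAULT_IMPL and is called directly; otherwise arch_select_*_func() consults the CPU feature bits at runtime and the caller caches the returned function pointer. A generic sketch of that shape with made-up names and placeholder bodies:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t (*checksum_fn)(uint32_t, const uint8_t *, size_t);

/* Placeholder "portable" and "accelerated" variants. */
static uint32_t checksum_portable(uint32_t v, const uint8_t *p, size_t n)
{
    while (n--)
        v += *p++;
    return v;
}

static uint32_t checksum_accel(uint32_t v, const uint8_t *p, size_t n)
{
    return checksum_portable(v, p, n);  /* pretend this is the fast path */
}

static int cpu_has_accel(void)
{
    return 0;                           /* stand-in for the CPUID check */
}

/* Resolve once, then cache, mirroring the arch_select_*_func() shape. */
static checksum_fn select_checksum(void)
{
    return cpu_has_accel() ? checksum_accel : checksum_portable;
}

int main(void)
{
    static checksum_fn impl;
    const uint8_t data[] = { 1, 2, 3 };

    if (impl == NULL)
        impl = select_checksum();
    printf("%u\n", impl(0, data, sizeof(data)));
    return 0;
}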

View file

@@ -1,354 +0,0 @@
/*
* x86/crc32_pclmul_template.h - gzip CRC-32 with PCLMULQDQ instructions
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* This file is a "template" for instantiating PCLMULQDQ-based crc32_x86
* functions. The "parameters" are:
*
* SUFFIX:
* Name suffix to append to all instantiated functions.
* ATTRIBUTES:
* Target function attributes to use.
* FOLD_PARTIAL_VECS:
* Use vector instructions to handle any partial blocks at the beginning
* and end, instead of falling back to scalar instructions for those parts.
* Requires SSSE3 and SSE4.1 intrinsics.
*
* The overall algorithm used is CRC folding with carryless multiplication
* instructions. Note that the x86 crc32 instruction cannot be used, as it is
* for a different polynomial, not the gzip one. For an explanation of CRC
* folding with carryless multiplication instructions, see
* scripts/gen_crc32_multipliers.c and the following paper:
*
* "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
* https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
*/
#include <immintrin.h>
/*
* With clang in MSVC compatibility mode, immintrin.h incorrectly skips
* including some sub-headers.
*/
#if defined(__clang__) && defined(_MSC_VER)
# include <tmmintrin.h>
# include <smmintrin.h>
# include <wmmintrin.h>
#endif
#undef fold_vec
static forceinline ATTRIBUTES __m128i
ADD_SUFFIX(fold_vec)(__m128i src, __m128i dst, __m128i /* __v2di */ multipliers)
{
/*
* The immediate constant for PCLMULQDQ specifies which 64-bit halves of
* the 128-bit vectors to multiply:
*
* 0x00 means low halves (higher degree polynomial terms for us)
* 0x11 means high halves (lower degree polynomial terms for us)
*/
dst = _mm_xor_si128(dst, _mm_clmulepi64_si128(src, multipliers, 0x00));
dst = _mm_xor_si128(dst, _mm_clmulepi64_si128(src, multipliers, 0x11));
return dst;
}
#define fold_vec ADD_SUFFIX(fold_vec)
#if FOLD_PARTIAL_VECS
/*
* Given v containing a 16-byte polynomial, and a pointer 'p' that points to the
* next '1 <= len <= 15' data bytes, rearrange the concatenation of v and the
* data into vectors x0 and x1 that contain 'len' bytes and 16 bytes,
* respectively. Then fold x0 into x1 and return the result. Assumes that
* 'p + len - 16' is in-bounds.
*/
#undef fold_partial_vec
static forceinline ATTRIBUTES __m128i
ADD_SUFFIX(fold_partial_vec)(__m128i v, const u8 *p, size_t len,
__m128i /* __v2du */ multipliers_1)
{
/*
* pshufb(v, shift_tab[len..len+15]) left shifts v by 16-len bytes.
* pshufb(v, shift_tab[len+16..len+31]) right shifts v by len bytes.
*/
static const u8 shift_tab[48] = {
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};
__m128i lshift = _mm_loadu_si128((const void *)&shift_tab[len]);
__m128i rshift = _mm_loadu_si128((const void *)&shift_tab[len + 16]);
__m128i x0, x1;
/* x0 = v left-shifted by '16 - len' bytes */
x0 = _mm_shuffle_epi8(v, lshift);
/*
* x1 = the last '16 - len' bytes from v (i.e. v right-shifted by 'len'
* bytes) followed by the remaining data.
*/
x1 = _mm_blendv_epi8(_mm_shuffle_epi8(v, rshift),
_mm_loadu_si128((const void *)(p + len - 16)),
/* msb 0/1 of each byte selects byte from arg1/2 */
rshift);
return fold_vec(x0, x1, multipliers_1);
}
#define fold_partial_vec ADD_SUFFIX(fold_partial_vec)
#endif /* FOLD_PARTIAL_VECS */
static u32 ATTRIBUTES MAYBE_UNUSED
ADD_SUFFIX(crc32_x86)(u32 crc, const u8 *p, size_t len)
{
const __m128i /* __v2du */ multipliers_8 =
_mm_set_epi64x(CRC32_8VECS_MULT_2, CRC32_8VECS_MULT_1);
const __m128i /* __v2du */ multipliers_4 =
_mm_set_epi64x(CRC32_4VECS_MULT_2, CRC32_4VECS_MULT_1);
const __m128i /* __v2du */ multipliers_2 =
_mm_set_epi64x(CRC32_2VECS_MULT_2, CRC32_2VECS_MULT_1);
const __m128i /* __v2du */ multipliers_1 =
_mm_set_epi64x(CRC32_1VECS_MULT_2, CRC32_1VECS_MULT_1);
const __m128i /* __v2du */ final_multiplier =
_mm_set_epi64x(0, CRC32_FINAL_MULT);
const __m128i mask32 = _mm_set_epi32(0, 0, 0, 0xFFFFFFFF);
const __m128i /* __v2du */ barrett_reduction_constants =
_mm_set_epi64x(CRC32_BARRETT_CONSTANT_2,
CRC32_BARRETT_CONSTANT_1);
__m128i v0, v1, v2, v3, v4, v5, v6, v7;
/*
* There are two overall code paths. The first path supports all
* lengths, but is intended for short lengths; it uses unaligned loads
* and does at most 4-way folds. The second path only supports longer
* lengths, aligns the pointer in order to do aligned loads, and does up
* to 8-way folds. The length check below decides which path to take.
*/
if (len < 1024) {
if (len < 16)
return crc32_slice1(crc, p, len);
v0 = _mm_xor_si128(_mm_loadu_si128((const void *)p),
_mm_cvtsi32_si128(crc));
p += 16;
if (len >= 64) {
v1 = _mm_loadu_si128((const void *)(p + 0));
v2 = _mm_loadu_si128((const void *)(p + 16));
v3 = _mm_loadu_si128((const void *)(p + 32));
p += 48;
while (len >= 64 + 64) {
v0 = fold_vec(v0, _mm_loadu_si128((const void *)(p + 0)),
multipliers_4);
v1 = fold_vec(v1, _mm_loadu_si128((const void *)(p + 16)),
multipliers_4);
v2 = fold_vec(v2, _mm_loadu_si128((const void *)(p + 32)),
multipliers_4);
v3 = fold_vec(v3, _mm_loadu_si128((const void *)(p + 48)),
multipliers_4);
p += 64;
len -= 64;
}
v0 = fold_vec(v0, v2, multipliers_2);
v1 = fold_vec(v1, v3, multipliers_2);
if (len & 32) {
v0 = fold_vec(v0, _mm_loadu_si128((const void *)(p + 0)),
multipliers_2);
v1 = fold_vec(v1, _mm_loadu_si128((const void *)(p + 16)),
multipliers_2);
p += 32;
}
v0 = fold_vec(v0, v1, multipliers_1);
if (len & 16) {
v0 = fold_vec(v0, _mm_loadu_si128((const void *)p),
multipliers_1);
p += 16;
}
} else {
if (len >= 32) {
v0 = fold_vec(v0, _mm_loadu_si128((const void *)p),
multipliers_1);
p += 16;
if (len >= 48) {
v0 = fold_vec(v0, _mm_loadu_si128((const void *)p),
multipliers_1);
p += 16;
}
}
}
} else {
const size_t align = -(uintptr_t)p & 15;
const __m128i *vp;
#if FOLD_PARTIAL_VECS
v0 = _mm_xor_si128(_mm_loadu_si128((const void *)p),
_mm_cvtsi32_si128(crc));
p += 16;
/* Align p to the next 16-byte boundary. */
if (align) {
v0 = fold_partial_vec(v0, p, align, multipliers_1);
p += align;
len -= align;
}
vp = (const __m128i *)p;
#else
/* Align p to the next 16-byte boundary. */
if (align) {
crc = crc32_slice1(crc, p, align);
p += align;
len -= align;
}
vp = (const __m128i *)p;
v0 = _mm_xor_si128(*vp++, _mm_cvtsi32_si128(crc));
#endif
v1 = *vp++;
v2 = *vp++;
v3 = *vp++;
v4 = *vp++;
v5 = *vp++;
v6 = *vp++;
v7 = *vp++;
do {
v0 = fold_vec(v0, *vp++, multipliers_8);
v1 = fold_vec(v1, *vp++, multipliers_8);
v2 = fold_vec(v2, *vp++, multipliers_8);
v3 = fold_vec(v3, *vp++, multipliers_8);
v4 = fold_vec(v4, *vp++, multipliers_8);
v5 = fold_vec(v5, *vp++, multipliers_8);
v6 = fold_vec(v6, *vp++, multipliers_8);
v7 = fold_vec(v7, *vp++, multipliers_8);
len -= 128;
} while (len >= 128 + 128);
v0 = fold_vec(v0, v4, multipliers_4);
v1 = fold_vec(v1, v5, multipliers_4);
v2 = fold_vec(v2, v6, multipliers_4);
v3 = fold_vec(v3, v7, multipliers_4);
if (len & 64) {
v0 = fold_vec(v0, *vp++, multipliers_4);
v1 = fold_vec(v1, *vp++, multipliers_4);
v2 = fold_vec(v2, *vp++, multipliers_4);
v3 = fold_vec(v3, *vp++, multipliers_4);
}
v0 = fold_vec(v0, v2, multipliers_2);
v1 = fold_vec(v1, v3, multipliers_2);
if (len & 32) {
v0 = fold_vec(v0, *vp++, multipliers_2);
v1 = fold_vec(v1, *vp++, multipliers_2);
}
v0 = fold_vec(v0, v1, multipliers_1);
if (len & 16)
v0 = fold_vec(v0, *vp++, multipliers_1);
p = (const u8 *)vp;
}
len &= 15;
/*
* If fold_partial_vec() is available, handle any remaining partial
* block now before reducing to 32 bits.
*/
#if FOLD_PARTIAL_VECS
if (len)
v0 = fold_partial_vec(v0, p, len, multipliers_1);
#endif
/*
* Fold 128 => 96 bits. This also implicitly appends 32 zero bits,
* which is equivalent to multiplying by x^32. This is needed because
* the CRC is defined as M(x)*x^32 mod G(x), not just M(x) mod G(x).
*/
v0 = _mm_xor_si128(_mm_srli_si128(v0, 8),
_mm_clmulepi64_si128(v0, multipliers_1, 0x10));
/* Fold 96 => 64 bits. */
v0 = _mm_xor_si128(_mm_srli_si128(v0, 4),
_mm_clmulepi64_si128(_mm_and_si128(v0, mask32),
final_multiplier, 0x00));
/*
* Reduce 64 => 32 bits using Barrett reduction.
*
* Let M(x) = A(x)*x^32 + B(x) be the remaining message. The goal is to
* compute R(x) = M(x) mod G(x). Since degree(B(x)) < degree(G(x)):
*
* R(x) = (A(x)*x^32 + B(x)) mod G(x)
* = (A(x)*x^32) mod G(x) + B(x)
*
* Then, by the Division Algorithm there exists a unique q(x) such that:
*
* A(x)*x^32 mod G(x) = A(x)*x^32 - q(x)*G(x)
*
* Since the left-hand side is of maximum degree 31, the right-hand side
* must be too. This implies that we can apply 'mod x^32' to the
* right-hand side without changing its value:
*
* (A(x)*x^32 - q(x)*G(x)) mod x^32 = q(x)*G(x) mod x^32
*
* Note that '+' is equivalent to '-' in polynomials over GF(2).
*
* We also know that:
*
* / A(x)*x^32 \
* q(x) = floor ( --------- )
* \ G(x) /
*
* To compute this efficiently, we can multiply the top and bottom by
* x^32 and move the division by G(x) to the top:
*
* / A(x) * floor(x^64 / G(x)) \
* q(x) = floor ( ------------------------- )
* \ x^32 /
*
* Note that floor(x^64 / G(x)) is a constant.
*
* So finally we have:
*
* / A(x) * floor(x^64 / G(x)) \
* R(x) = B(x) + G(x)*floor ( ------------------------- )
* \ x^32 /
*/
v1 = _mm_clmulepi64_si128(_mm_and_si128(v0, mask32),
barrett_reduction_constants, 0x00);
v1 = _mm_clmulepi64_si128(_mm_and_si128(v1, mask32),
barrett_reduction_constants, 0x10);
v0 = _mm_xor_si128(v0, v1);
#if FOLD_PARTIAL_VECS
crc = _mm_extract_epi32(v0, 1);
#else
crc = _mm_cvtsi128_si32(_mm_shuffle_epi32(v0, 0x01));
/* Process up to 15 bytes left over at the end. */
crc = crc32_slice1(crc, p, len);
#endif
return crc;
}
#undef SUFFIX
#undef ATTRIBUTES
#undef FOLD_PARTIAL_VECS
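
All of the folded variants produced by this template must agree with the plain bit-at-a-time gzip CRC-32 (reflected generator 0xEDB88320), the same value the crc32_slice1() fallback used above produces more cleverly. A deliberately slow reference sketch for comparison, not the removed code:

#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
    int k;

    crc = ~crc;                          /* pre-invert, per the gzip spec */
    while (len--) {
        crc ^= *p++;
        for (k = 0; k < 8; k++)          /* one generator step per bit */
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return ~crc;                         /* post-invert */
}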

View file

@@ -1,54 +0,0 @@
#ifndef LIB_X86_DECOMPRESS_IMPL_H
#define LIB_X86_DECOMPRESS_IMPL_H
#include "cpu_features.h"
/*
* BMI2 optimized version
*
* FIXME: with MSVC, this isn't actually compiled with BMI2 code generation
* enabled yet. That would require that this be moved to its own .c file.
*/
#if HAVE_BMI2_INTRIN
# define deflate_decompress_bmi2 deflate_decompress_bmi2
# define FUNCNAME deflate_decompress_bmi2
# if !HAVE_BMI2_NATIVE
# define ATTRIBUTES _target_attribute("bmi2")
# endif
/*
* Even with __attribute__((target("bmi2"))), gcc doesn't reliably use the
* bzhi instruction for 'word & BITMASK(count)'. So use the bzhi intrinsic
* explicitly. EXTRACT_VARBITS() is equivalent to 'word & BITMASK(count)';
* EXTRACT_VARBITS8() is equivalent to 'word & BITMASK((u8)count)'.
* Nevertheless, their implementation using the bzhi intrinsic is identical,
* as the bzhi instruction truncates the count to 8 bits implicitly.
*/
# ifndef __clang__
# include <immintrin.h>
# ifdef ARCH_X86_64
# define EXTRACT_VARBITS(word, count) _bzhi_u64((word), (count))
# define EXTRACT_VARBITS8(word, count) _bzhi_u64((word), (count))
# else
# define EXTRACT_VARBITS(word, count) _bzhi_u32((word), (count))
# define EXTRACT_VARBITS8(word, count) _bzhi_u32((word), (count))
# endif
# endif
# include "../decompress_template.h"
#endif /* HAVE_BMI2_INTRIN */
#if defined(deflate_decompress_bmi2) && HAVE_BMI2_NATIVE
#define DEFAULT_IMPL deflate_decompress_bmi2
#else
static inline decompress_func_t
arch_select_decompress_func(void)
{
#ifdef deflate_decompress_bmi2
if (HAVE_BMI2(get_x86_cpu_features()))
return deflate_decompress_bmi2;
#endif
return NULL;
}
#define arch_select_decompress_func arch_select_decompress_func
#endif
#endif /* LIB_X86_DECOMPRESS_IMPL_H */
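
The EXTRACT_VARBITS() macros above map "keep the low count bits of word" onto the BMI2 bzhi instruction. A portable sketch of the same operation for comparison (unlike bzhi, a C shift by the full word width is undefined, so this version assumes count is less than 64):

#include <stdint.h>
#include <stdio.h>

static uint64_t extract_varbits_portable(uint64_t word, unsigned count)
{
    return word & (((uint64_t)1 << count) - 1);   /* low 'count' bits */
}

int main(void)
{
    printf("0x%llx\n",
           (unsigned long long)extract_varbits_portable(0xDEADBEEFull, 12));
    return 0;
}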

View file

@@ -1,124 +0,0 @@
/*
* x86/matchfinder_impl.h - x86 implementations of matchfinder functions
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef LIB_X86_MATCHFINDER_IMPL_H
#define LIB_X86_MATCHFINDER_IMPL_H
#include "cpu_features.h"
#if HAVE_AVX2_NATIVE
# include <immintrin.h>
static forceinline void
matchfinder_init_avx2(mf_pos_t *data, size_t size)
{
__m256i *p = (__m256i *)data;
__m256i v = _mm256_set1_epi16(MATCHFINDER_INITVAL);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
p[0] = v;
p[1] = v;
p[2] = v;
p[3] = v;
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_init matchfinder_init_avx2
static forceinline void
matchfinder_rebase_avx2(mf_pos_t *data, size_t size)
{
__m256i *p = (__m256i *)data;
__m256i v = _mm256_set1_epi16((u16)-MATCHFINDER_WINDOW_SIZE);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
/* PADDSW: Add Packed Signed Integers With Signed Saturation */
p[0] = _mm256_adds_epi16(p[0], v);
p[1] = _mm256_adds_epi16(p[1], v);
p[2] = _mm256_adds_epi16(p[2], v);
p[3] = _mm256_adds_epi16(p[3], v);
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_rebase matchfinder_rebase_avx2
#elif HAVE_SSE2_NATIVE
# include <emmintrin.h>
static forceinline void
matchfinder_init_sse2(mf_pos_t *data, size_t size)
{
__m128i *p = (__m128i *)data;
__m128i v = _mm_set1_epi16(MATCHFINDER_INITVAL);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
p[0] = v;
p[1] = v;
p[2] = v;
p[3] = v;
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_init matchfinder_init_sse2
static forceinline void
matchfinder_rebase_sse2(mf_pos_t *data, size_t size)
{
__m128i *p = (__m128i *)data;
__m128i v = _mm_set1_epi16((u16)-MATCHFINDER_WINDOW_SIZE);
STATIC_ASSERT(MATCHFINDER_MEM_ALIGNMENT % sizeof(*p) == 0);
STATIC_ASSERT(MATCHFINDER_SIZE_ALIGNMENT % (4 * sizeof(*p)) == 0);
STATIC_ASSERT(sizeof(mf_pos_t) == 2);
do {
/* PADDSW: Add Packed Signed Integers With Signed Saturation */
p[0] = _mm_adds_epi16(p[0], v);
p[1] = _mm_adds_epi16(p[1], v);
p[2] = _mm_adds_epi16(p[2], v);
p[3] = _mm_adds_epi16(p[3], v);
p += 4;
size -= 4 * sizeof(*p);
} while (size != 0);
}
#define matchfinder_rebase matchfinder_rebase_sse2
#endif /* HAVE_SSE2_NATIVE */
#endif /* LIB_X86_MATCHFINDER_IMPL_H */
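
Both rebase variants above lean on PADDSW (_mm_adds_epi16/_mm256_adds_epi16): signed 16-bit addition of -MATCHFINDER_WINDOW_SIZE with saturation, so in-range positions shift down while already-expired positions stay pinned at the minimum. A scalar model of what a single lane does (illustrative only):

#include <stdint.h>
#include <stdio.h>

/* One PADDSW lane: signed 16-bit add with saturation. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;

    if (sum < INT16_MIN)
        return INT16_MIN;
    if (sum > INT16_MAX)
        return INT16_MAX;
    return (int16_t)sum;
}

int main(void)
{
    /* Rebasing by a 32768-byte window: */
    printf("%d\n", sat_add16(1000, -32768));      /* -31768: still usable */
    printf("%d\n", sat_add16(-32768, -32768));    /* -32768: stays pinned */
    return 0;
}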

View file

@@ -1,82 +0,0 @@
/*
* zlib_compress.c - compress with a zlib wrapper
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "deflate_compress.h"
#include "zlib_constants.h"
LIBDEFLATEAPI size_t
libdeflate_zlib_compress(struct libdeflate_compressor *c,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail)
{
u8 *out_next = out;
u16 hdr;
unsigned compression_level;
unsigned level_hint;
size_t deflate_size;
if (out_nbytes_avail <= ZLIB_MIN_OVERHEAD)
return 0;
/* 2 byte header: CMF and FLG */
hdr = (ZLIB_CM_DEFLATE << 8) | (ZLIB_CINFO_32K_WINDOW << 12);
compression_level = libdeflate_get_compression_level(c);
if (compression_level < 2)
level_hint = ZLIB_FASTEST_COMPRESSION;
else if (compression_level < 6)
level_hint = ZLIB_FAST_COMPRESSION;
else if (compression_level < 8)
level_hint = ZLIB_DEFAULT_COMPRESSION;
else
level_hint = ZLIB_SLOWEST_COMPRESSION;
hdr |= level_hint << 6;
hdr |= 31 - (hdr % 31);
put_unaligned_be16(hdr, out_next);
out_next += 2;
/* Compressed data */
deflate_size = libdeflate_deflate_compress(c, in, in_nbytes, out_next,
out_nbytes_avail - ZLIB_MIN_OVERHEAD);
if (deflate_size == 0)
return 0;
out_next += deflate_size;
/* ADLER32 */
put_unaligned_be32(libdeflate_adler32(1, in, in_nbytes), out_next);
out_next += 4;
return out_next - (u8 *)out;
}
LIBDEFLATEAPI size_t
libdeflate_zlib_compress_bound(struct libdeflate_compressor *c,
size_t in_nbytes)
{
return ZLIB_MIN_OVERHEAD +
libdeflate_deflate_compress_bound(c, in_nbytes);
}
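
The header construction above follows RFC 1950 exactly: CMF carries CM = 8 and CINFO = 7, FLG carries the level hint plus an FCHECK value chosen so the 16-bit CMF/FLG pair is divisible by 31. A tiny standalone sketch of the same arithmetic (the level hint 2, i.e. ZLIB_DEFAULT_COMPRESSION, is just an example):

#include <stdio.h>

int main(void)
{
    unsigned hdr = (8u << 8) | (7u << 12);   /* CM = 8 (deflate), CINFO = 7 (32K) */

    hdr |= 2u << 6;                          /* FLEVEL hint: "default" */
    hdr |= 31 - (hdr % 31);                  /* FCHECK: make CMF/FLG % 31 == 0 */
    printf("CMF/FLG = 0x%04X (mod 31 = %u)\n", hdr, hdr % 31);
    /* Prints 0x789C, the familiar default zlib header. */
    return 0;
}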

View file

@@ -1,21 +0,0 @@
/*
* zlib_constants.h - constants for the zlib wrapper format
*/
#ifndef LIB_ZLIB_CONSTANTS_H
#define LIB_ZLIB_CONSTANTS_H
#define ZLIB_MIN_HEADER_SIZE 2
#define ZLIB_FOOTER_SIZE 4
#define ZLIB_MIN_OVERHEAD (ZLIB_MIN_HEADER_SIZE + ZLIB_FOOTER_SIZE)
#define ZLIB_CM_DEFLATE 8
#define ZLIB_CINFO_32K_WINDOW 7
#define ZLIB_FASTEST_COMPRESSION 0
#define ZLIB_FAST_COMPRESSION 1
#define ZLIB_DEFAULT_COMPRESSION 2
#define ZLIB_SLOWEST_COMPRESSION 3
#endif /* LIB_ZLIB_CONSTANTS_H */

View file

@@ -1,104 +0,0 @@
/*
* zlib_decompress.c - decompress with a zlib wrapper
*
* Copyright 2016 Eric Biggers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#include "lib_common.h"
#include "zlib_constants.h"
LIBDEFLATEAPI enum libdeflate_result
libdeflate_zlib_decompress_ex(struct libdeflate_decompressor *d,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret,
size_t *actual_out_nbytes_ret)
{
const u8 *in_next = in;
const u8 * const in_end = in_next + in_nbytes;
u16 hdr;
size_t actual_in_nbytes;
size_t actual_out_nbytes;
enum libdeflate_result result;
if (in_nbytes < ZLIB_MIN_OVERHEAD)
return LIBDEFLATE_BAD_DATA;
/* 2 byte header: CMF and FLG */
hdr = get_unaligned_be16(in_next);
in_next += 2;
/* FCHECK */
if ((hdr % 31) != 0)
return LIBDEFLATE_BAD_DATA;
/* CM */
if (((hdr >> 8) & 0xF) != ZLIB_CM_DEFLATE)
return LIBDEFLATE_BAD_DATA;
/* CINFO */
if ((hdr >> 12) > ZLIB_CINFO_32K_WINDOW)
return LIBDEFLATE_BAD_DATA;
/* FDICT */
if ((hdr >> 5) & 1)
return LIBDEFLATE_BAD_DATA;
/* Compressed data */
result = libdeflate_deflate_decompress_ex(d, in_next,
in_end - ZLIB_FOOTER_SIZE - in_next,
out, out_nbytes_avail,
&actual_in_nbytes, actual_out_nbytes_ret);
if (result != LIBDEFLATE_SUCCESS)
return result;
if (actual_out_nbytes_ret)
actual_out_nbytes = *actual_out_nbytes_ret;
else
actual_out_nbytes = out_nbytes_avail;
in_next += actual_in_nbytes;
/* ADLER32 */
if (libdeflate_adler32(1, out, actual_out_nbytes) !=
get_unaligned_be32(in_next))
return LIBDEFLATE_BAD_DATA;
in_next += 4;
if (actual_in_nbytes_ret)
*actual_in_nbytes_ret = in_next - (u8 *)in;
return LIBDEFLATE_SUCCESS;
}
LIBDEFLATEAPI enum libdeflate_result
libdeflate_zlib_decompress(struct libdeflate_decompressor *d,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_out_nbytes_ret)
{
return libdeflate_zlib_decompress_ex(d, in, in_nbytes,
out, out_nbytes_avail,
NULL, actual_out_nbytes_ret);
}

View file

@@ -1,368 +0,0 @@
/*
* libdeflate.h - public header for libdeflate
*/
#ifndef LIBDEFLATE_H
#define LIBDEFLATE_H
#include <stddef.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
#define LIBDEFLATE_VERSION_MAJOR 1
#define LIBDEFLATE_VERSION_MINOR 18
#define LIBDEFLATE_VERSION_STRING "1.18"
/*
* Users of libdeflate.dll on Windows can define LIBDEFLATE_DLL to cause
* __declspec(dllimport) to be used. This should be done when it's easy to do.
* Otherwise it's fine to skip it, since it is a very minor performance
* optimization that is irrelevant for most use cases of libdeflate.
*/
#ifndef LIBDEFLATEAPI
# if defined(LIBDEFLATE_DLL) && (defined(_WIN32) || defined(__CYGWIN__))
# define LIBDEFLATEAPI __declspec(dllimport)
# else
# define LIBDEFLATEAPI
# endif
#endif
/* ========================================================================== */
/* Compression */
/* ========================================================================== */
struct libdeflate_compressor;
/*
* libdeflate_alloc_compressor() allocates a new compressor that supports
* DEFLATE, zlib, and gzip compression. 'compression_level' is the compression
* level on a zlib-like scale but with a higher maximum value (1 = fastest, 6 =
* medium/default, 9 = slow, 12 = slowest). Level 0 is also supported and means
* "no compression", specifically "create a valid stream, but only emit
* uncompressed blocks" (this will expand the data slightly).
*
* The return value is a pointer to the new compressor, or NULL if out of memory
* or if the compression level is invalid (i.e. outside the range [0, 12]).
*
* Note: for compression, the sliding window size is defined at compilation time
* to 32768, the largest size permissible in the DEFLATE format. It cannot be
* changed at runtime.
*
* A single compressor is not safe to use by multiple threads concurrently.
* However, different threads may use different compressors concurrently.
*/
LIBDEFLATEAPI struct libdeflate_compressor *
libdeflate_alloc_compressor(int compression_level);
/*
* libdeflate_deflate_compress() performs raw DEFLATE compression on a buffer of
* data. It attempts to compress 'in_nbytes' bytes of data located at 'in' and
* write the result to 'out', which has space for 'out_nbytes_avail' bytes. The
* return value is the compressed size in bytes, or 0 if the data could not be
* compressed to 'out_nbytes_avail' bytes or fewer (but see note below).
*
* If compression is successful, then the output data is guaranteed to be a
* valid DEFLATE stream that decompresses to the input data. No other
* guarantees are made about the output data. Notably, different versions of
* libdeflate can produce different compressed data for the same uncompressed
* data, even at the same compression level. Do ***NOT*** do things like
* writing tests that compare compressed data to a golden output, as this can
* break when libdeflate is updated. (This property isn't specific to
* libdeflate; the same is true for zlib and other compression libraries too.)
*
* Note: due to a performance optimization, libdeflate_deflate_compress()
* currently needs a small amount of slack space at the end of the output
* buffer. As a result, it can't actually report compressed sizes very close to
* 'out_nbytes_avail'. This doesn't matter in real-world use cases, and
* libdeflate_deflate_compress_bound() already includes the slack space.
* However, it does mean that testing code that redundantly compresses data
* using an exact-sized output buffer won't work as might be expected:
*
* out_nbytes = libdeflate_deflate_compress(c, in, in_nbytes, out,
* libdeflate_deflate_compress_bound(in_nbytes));
* // The following assertion will fail.
* assert(libdeflate_deflate_compress(c, in, in_nbytes, out, out_nbytes) != 0);
*
* To avoid this, either don't write tests like the above, or make sure to
* include at least 9 bytes of slack space in 'out_nbytes_avail'.
*/
LIBDEFLATEAPI size_t
libdeflate_deflate_compress(struct libdeflate_compressor *compressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail);
/*
* libdeflate_deflate_compress_bound() returns a worst-case upper bound on the
* number of bytes of compressed data that may be produced by compressing any
* buffer of length less than or equal to 'in_nbytes' using
* libdeflate_deflate_compress() with the specified compressor. This bound will
* necessarily be a number greater than or equal to 'in_nbytes'. It may be an
* overestimate of the true upper bound. The return value is guaranteed to be
* the same for all invocations with the same compressor and same 'in_nbytes'.
*
* As a special case, 'compressor' may be NULL. This causes the bound to be
* taken across *any* libdeflate_compressor that could ever be allocated with
* this build of the library, with any options.
*
* Note that this function is not necessary in many applications. With
* block-based compression, it is usually preferable to separately store the
* uncompressed size of each block and to store any blocks that did not compress
* to less than their original size uncompressed. In that scenario, there is no
* need to know the worst-case compressed size, since the maximum number of
* bytes of compressed data that may be used would always be one less than the
* input length. You can just pass a buffer of that size to
* libdeflate_deflate_compress() and store the data uncompressed if
* libdeflate_deflate_compress() returns 0, indicating that the compressed data
* did not fit into the provided output buffer.
*/
LIBDEFLATEAPI size_t
libdeflate_deflate_compress_bound(struct libdeflate_compressor *compressor,
size_t in_nbytes);
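/*
 * Example (editorial sketch, not from the libdeflate sources): the
 * block-storage pattern described above. The output buffer is one byte
 * smaller than the input, so a return value of 0 means "store this block
 * uncompressed". The helper name and parameters are illustrative.
 */
#include <string.h>
#include <libdeflate.h>

/* Returns the stored size: the compressed size, or in_nbytes if stored raw. */
static size_t example_store_block(struct libdeflate_compressor *c,
                                  const void *in, size_t in_nbytes,
                                  void *out, /* at least in_nbytes bytes */
                                  int *stored_compressed)
{
    size_t n = 0;

    if (in_nbytes > 1)
        n = libdeflate_deflate_compress(c, in, in_nbytes, out, in_nbytes - 1);
    if (n == 0) {
        /* Did not shrink (or tiny input): keep the original bytes. */
        memcpy(out, in, in_nbytes);
        *stored_compressed = 0;
        return in_nbytes;
    }
    *stored_compressed = 1;
    return n;
}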
/*
* Like libdeflate_deflate_compress(), but uses the zlib wrapper format instead
* of raw DEFLATE.
*/
LIBDEFLATEAPI size_t
libdeflate_zlib_compress(struct libdeflate_compressor *compressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail);
/*
* Like libdeflate_deflate_compress_bound(), but assumes the data will be
* compressed with libdeflate_zlib_compress() rather than with
* libdeflate_deflate_compress().
*/
LIBDEFLATEAPI size_t
libdeflate_zlib_compress_bound(struct libdeflate_compressor *compressor,
size_t in_nbytes);
/*
* Like libdeflate_deflate_compress(), but uses the gzip wrapper format instead
* of raw DEFLATE.
*/
LIBDEFLATEAPI size_t
libdeflate_gzip_compress(struct libdeflate_compressor *compressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail);
/*
* Like libdeflate_deflate_compress_bound(), but assumes the data will be
* compressed with libdeflate_gzip_compress() rather than with
* libdeflate_deflate_compress().
*/
LIBDEFLATEAPI size_t
libdeflate_gzip_compress_bound(struct libdeflate_compressor *compressor,
size_t in_nbytes);
/*
* libdeflate_free_compressor() frees a compressor that was allocated with
* libdeflate_alloc_compressor(). If a NULL pointer is passed in, no action is
* taken.
*/
LIBDEFLATEAPI void
libdeflate_free_compressor(struct libdeflate_compressor *compressor);
/* ========================================================================== */
/* Decompression */
/* ========================================================================== */
struct libdeflate_decompressor;
/*
* libdeflate_alloc_decompressor() allocates a new decompressor that can be used
* for DEFLATE, zlib, and gzip decompression. The return value is a pointer to
* the new decompressor, or NULL if out of memory.
*
* This function takes no parameters, and the returned decompressor is valid for
* decompressing data that was compressed at any compression level and with any
* sliding window size.
*
* A single decompressor is not safe to use by multiple threads concurrently.
* However, different threads may use different decompressors concurrently.
*/
LIBDEFLATEAPI struct libdeflate_decompressor *
libdeflate_alloc_decompressor(void);
/*
* Result of a call to libdeflate_deflate_decompress(),
* libdeflate_zlib_decompress(), or libdeflate_gzip_decompress().
*/
enum libdeflate_result {
/* Decompression was successful. */
LIBDEFLATE_SUCCESS = 0,
/* Decompression failed because the compressed data was invalid,
* corrupt, or otherwise unsupported. */
LIBDEFLATE_BAD_DATA = 1,
/* A NULL 'actual_out_nbytes_ret' was provided, but the data would have
* decompressed to fewer than 'out_nbytes_avail' bytes. */
LIBDEFLATE_SHORT_OUTPUT = 2,
/* The data would have decompressed to more than 'out_nbytes_avail'
* bytes. */
LIBDEFLATE_INSUFFICIENT_SPACE = 3,
};
/*
* libdeflate_deflate_decompress() decompresses a DEFLATE stream from the buffer
* 'in' with compressed size up to 'in_nbytes' bytes. The uncompressed data is
* written to 'out', a buffer with size 'out_nbytes_avail' bytes. If
* decompression succeeds, then 0 (LIBDEFLATE_SUCCESS) is returned. Otherwise,
* a nonzero result code such as LIBDEFLATE_BAD_DATA is returned, and the
* contents of the output buffer are undefined.
*
* Decompression stops at the end of the DEFLATE stream (as indicated by the
* BFINAL flag), even if it is actually shorter than 'in_nbytes' bytes.
*
* libdeflate_deflate_decompress() can be used in cases where the actual
* uncompressed size is known (recommended) or unknown (not recommended):
*
* - If the actual uncompressed size is known, then pass the actual
* uncompressed size as 'out_nbytes_avail' and pass NULL for
* 'actual_out_nbytes_ret'. This makes libdeflate_deflate_decompress() fail
* with LIBDEFLATE_SHORT_OUTPUT if the data decompressed to fewer than the
* specified number of bytes.
*
* - If the actual uncompressed size is unknown, then provide a non-NULL
* 'actual_out_nbytes_ret' and provide a buffer with some size
* 'out_nbytes_avail' that you think is large enough to hold all the
* uncompressed data. In this case, if the data decompresses to less than
* or equal to 'out_nbytes_avail' bytes, then
* libdeflate_deflate_decompress() will write the actual uncompressed size
* to *actual_out_nbytes_ret and return 0 (LIBDEFLATE_SUCCESS). Otherwise,
* it will return LIBDEFLATE_INSUFFICIENT_SPACE if the provided buffer was
* not large enough but no other problems were encountered, or another
* nonzero result code if decompression failed for another reason.
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_deflate_decompress(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_out_nbytes_ret);
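/*
 * Example (editorial sketch, not from the libdeflate sources): the "known
 * uncompressed size" usage recommended above. Passing NULL for
 * 'actual_out_nbytes_ret' makes a short output show up as an error result.
 * The helper name is illustrative.
 */
#include <libdeflate.h>

/* Returns nonzero on success; 'out' must hold exactly 'uncompressed_size' bytes. */
static int example_inflate_exact(struct libdeflate_decompressor *d,
                                 const void *in, size_t in_nbytes,
                                 void *out, size_t uncompressed_size)
{
    return libdeflate_deflate_decompress(d, in, in_nbytes,
                                         out, uncompressed_size,
                                         NULL) == LIBDEFLATE_SUCCESS;
}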
/*
* Like libdeflate_deflate_decompress(), but adds the 'actual_in_nbytes_ret'
* argument. If decompression succeeds and 'actual_in_nbytes_ret' is not NULL,
* then the actual compressed size of the DEFLATE stream (aligned to the next
* byte boundary) is written to *actual_in_nbytes_ret.
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_deflate_decompress_ex(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret,
size_t *actual_out_nbytes_ret);
/*
* Like libdeflate_deflate_decompress(), but assumes the zlib wrapper format
* instead of raw DEFLATE.
*
* Decompression will stop at the end of the zlib stream, even if it is shorter
* than 'in_nbytes'. If you need to know exactly where the zlib stream ended,
* use libdeflate_zlib_decompress_ex().
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_zlib_decompress(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_out_nbytes_ret);
/*
* Like libdeflate_zlib_decompress(), but adds the 'actual_in_nbytes_ret'
* argument. If 'actual_in_nbytes_ret' is not NULL and the decompression
* succeeds (indicating that the first zlib-compressed stream in the input
* buffer was decompressed), then the actual number of input bytes consumed is
* written to *actual_in_nbytes_ret.
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_zlib_decompress_ex(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret,
size_t *actual_out_nbytes_ret);
/*
* Like libdeflate_deflate_decompress(), but assumes the gzip wrapper format
* instead of raw DEFLATE.
*
* If multiple gzip-compressed members are concatenated, then only the first
* will be decompressed. Use libdeflate_gzip_decompress_ex() if you need
* multi-member support.
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_gzip_decompress(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_out_nbytes_ret);
/*
* Like libdeflate_gzip_decompress(), but adds the 'actual_in_nbytes_ret'
* argument. If 'actual_in_nbytes_ret' is not NULL and the decompression
* succeeds (indicating that the first gzip-compressed member in the input
* buffer was decompressed), then the actual number of input bytes consumed is
* written to *actual_in_nbytes_ret.
*/
LIBDEFLATEAPI enum libdeflate_result
libdeflate_gzip_decompress_ex(struct libdeflate_decompressor *decompressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail,
size_t *actual_in_nbytes_ret,
size_t *actual_out_nbytes_ret);
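/*
 * Example (editorial sketch, not from the libdeflate sources): walking
 * concatenated gzip members with the _ex variant, advancing by the consumed
 * input size it reports. Output handling is simplified: this sketch assumes
 * every member fits in 'out_nbytes_avail' bytes and overwrites 'out' each
 * time; a real caller would consume 'out_used' bytes per iteration. The
 * helper name is illustrative.
 */
#include <libdeflate.h>

static int example_inflate_all_members(struct libdeflate_decompressor *d,
                                       const unsigned char *in, size_t in_nbytes,
                                       void *out, size_t out_nbytes_avail)
{
    while (in_nbytes > 0) {
        size_t in_used = 0, out_used = 0;

        if (libdeflate_gzip_decompress_ex(d, in, in_nbytes,
                                          out, out_nbytes_avail,
                                          &in_used, &out_used) != LIBDEFLATE_SUCCESS)
            return 0;
        in += in_used;
        in_nbytes -= in_used;
    }
    return 1;
}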
/*
* libdeflate_free_decompressor() frees a decompressor that was allocated with
* libdeflate_alloc_decompressor(). If a NULL pointer is passed in, no action
* is taken.
*/
LIBDEFLATEAPI void
libdeflate_free_decompressor(struct libdeflate_decompressor *decompressor);
/* ========================================================================== */
/* Checksums */
/* ========================================================================== */
/*
* libdeflate_adler32() updates a running Adler-32 checksum with 'len' bytes of
* data and returns the updated checksum. When starting a new checksum, the
* required initial value for 'adler' is 1. This value is also returned when
* 'buffer' is specified as NULL.
*/
LIBDEFLATEAPI uint32_t
libdeflate_adler32(uint32_t adler, const void *buffer, size_t len);
/*
* libdeflate_crc32() updates a running CRC-32 checksum with 'len' bytes of data
* and returns the updated checksum. When starting a new checksum, the required
* initial value for 'crc' is 0. This value is also returned when 'buffer' is
* specified as NULL.
*/
LIBDEFLATEAPI uint32_t
libdeflate_crc32(uint32_t crc, const void *buffer, size_t len);
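/*
 * Example (editorial sketch, not from the libdeflate sources): running both
 * checksums over data split into two chunks. Note the different initial
 * values documented above: 1 for Adler-32, 0 for CRC-32. The helper name is
 * illustrative.
 */
#include <stdint.h>
#include <libdeflate.h>

static void example_checksums(const void *a, size_t a_len,
                              const void *b, size_t b_len,
                              uint32_t *adler_out, uint32_t *crc_out)
{
    uint32_t adler = 1; /* required Adler-32 starting value */
    uint32_t crc = 0;   /* required CRC-32 starting value */

    adler = libdeflate_adler32(adler, a, a_len);
    adler = libdeflate_adler32(adler, b, b_len);
    crc = libdeflate_crc32(crc, a, a_len);
    crc = libdeflate_crc32(crc, b, b_len);
    *adler_out = adler;
    *crc_out = crc;
}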
/* ========================================================================== */
/* Custom memory allocator */
/* ========================================================================== */
/*
* Install a custom memory allocator which libdeflate will use for all memory
* allocations. 'malloc_func' is a function that must behave like malloc(), and
* 'free_func' is a function that must behave like free().
*
* There must not be any libdeflate_compressor or libdeflate_decompressor
* structures in existence when calling this function.
*/
LIBDEFLATEAPI void
libdeflate_set_memory_allocator(void *(*malloc_func)(size_t),
void (*free_func)(void *));
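/*
 * Example (editorial sketch, not from the libdeflate sources): installing a
 * counting allocator. As noted above, this must happen while no compressor or
 * decompressor objects exist. The counter and wrapper names are illustrative.
 */
#include <stdlib.h>
#include <libdeflate.h>

static size_t example_live_allocations;

static void *example_counting_malloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        example_live_allocations++;
    return p;
}

static void example_counting_free(void *p)
{
    if (p != NULL)
        example_live_allocations--;
    free(p);
}

static void example_install_allocator(void)
{
    libdeflate_set_memory_allocator(example_counting_malloc,
                                    example_counting_free);
}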
#ifdef __cplusplus
}
#endif
#endif /* LIBDEFLATE_H */

View file

@@ -2,6 +2,7 @@
set(EXTRA_CORE_LIBS
${ICU_LIBRARIES}
${JANSSON_LIBRARIES}
${DEFLATE_LIBRARIES}
${CMAKE_THREAD_LIBS_INIT}
# libm should be present by default because this is C++
m
@@ -92,6 +93,7 @@ include_directories(
${CMAKE_BINARY_DIR}/include/QtCore
${ICU_INCLUDES}
${JANSSON_INCLUDES}
${DEFLATE_INCLUDES}
)
set(CORE_HEADERS
@@ -325,33 +327,6 @@ if(WITH_EXECINFO AND EXECINFO_FOUND)
include_directories(${EXECINFO_INCLUDES})
endif()
if(WITH_DEFLATE AND DEFLATE_FOUND)
set(EXTRA_CORE_LIBS
${EXTRA_CORE_LIBS}
${DEFLATE_LIBRARIES}
)
include_directories(${DEFLATE_INCLUDES})
else()
set(CORE_SOURCES
${CORE_SOURCES}
# common files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/deflate_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/deflate_compress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/utils.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/arm/cpu_features.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/x86/cpu_features.c
# zlib wrapper files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/adler32.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/zlib_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/zlib_compress.c
# gzip wrapper files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/crc32.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/gzip_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/gzip_compress.c
)
include_directories(${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate)
endif()
katie_unity_exclude(
${CMAKE_CURRENT_SOURCE_DIR}/global/qt_error_string.cpp
)

View file

@@ -4,6 +4,7 @@ set(EXTRA_GUI_LIBS
${FREETYPE_LIBRARIES}
${X11_X11_LIB}
${PNG_LIBRARIES}
${DEFLATE_LIBRARIES}
)
set(GUI_PUBLIC_HEADERS
@@ -205,6 +206,7 @@ include_directories(
${FREETYPE_INCLUDE_DIRS}
${X11_INCLUDE_DIR}
${PNG_INCLUDE_DIRS}
${DEFLATE_INCLUDES}
)
set(GUI_HEADERS
@@ -884,33 +886,6 @@ if(WITH_FONTCONFIG AND FONTCONFIG_FOUND)
add_definitions(${FONTCONFIG_DEFINITIONS})
endif()
if(WITH_DEFLATE AND DEFLATE_FOUND)
set(EXTRA_GUI_LIBS
${EXTRA_GUI_LIBS}
${DEFLATE_LIBRARIES}
)
include_directories(${DEFLATE_INCLUDES})
else()
set(GUI_SOURCES
${GUI_SOURCES}
# common files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/deflate_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/deflate_compress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/utils.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/arm/cpu_features.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/x86/cpu_features.c
# zlib wrapper files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/adler32.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/zlib_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/zlib_compress.c
# gzip wrapper files
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/crc32.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/gzip_decompress.c
${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate/lib/gzip_compress.c
)
include_directories(${CMAKE_SOURCE_DIR}/src/3rdparty/libdeflate)
endif()
# anything that includes qt_x11_p.h is known to break unity build
katie_unity_exclude(
${CMAKE_CURRENT_SOURCE_DIR}/dialogs/qdialog.cpp