linux-erofs.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression
       [not found] <20200229045017.12424-1-hsiangkao.ref@aol.com>
@ 2020-02-29  4:50 ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 01/11] ez: add basic source files Gao Xiang via Linux-erofs
                     ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

From: Gao Xiang <gaoxiang25@huawei.com>

This is a WIP PREVIEW patchset, just for archiving to open
source community only.

For now, it implements LZMA SDK-like GetOptimumFast approach
and GetOptimum is still on schedule.

It's still buggy, lack of formal APIs and actively under
development for a while...

Usage:
$ ./run.sh
$ ./a.out output.bin.lzma infile

It will compress the beginning as much as possible into
4k RAW LZMA block.

Thanks,
Gao Xiang

Gao Xiang (11):
  ez: add basic source files
  ez: add helpers for unaligned accesses
  ez: introduce bitops header file
  ez: lzma: add range encoder
  ez: lzma: add common header file
  ez: lzma: add byte hashtable generated with CRC32
  ez: lzma: add LZMA matchfinder
  ez: lzma: add LZMA encoder
  ez: lzma: checkpoint feature for range encoder
  ez: lzma: add fixed-sized output compression
  ez: lzma: add test program

-- 
2.20.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 01/11] ez: add basic source files
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 02/11] ez: add helpers for unaligned accesses Gao Xiang via Linux-erofs
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

Add common headers to provide basic definitions and
helpers. Open source license is included as well.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 include/ez/defs.h | 76 +++++++++++++++++++++++++++++++++++++++++++++++
 include/ez/util.h | 25 ++++++++++++++++
 2 files changed, 101 insertions(+)
 create mode 100644 include/ez/defs.h
 create mode 100644 include/ez/util.h

diff --git a/include/ez/defs.h b/include/ez/defs.h
new file mode 100644
index 0000000..d5acccf
--- /dev/null
+++ b/include/ez/defs.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/include/ez/defs.h
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_DEFS_H
+#define __EZ_DEFS_H
+
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+
+#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
+
+#define min(x, y) ({				\
+	typeof(x) _min1 = (x);			\
+	typeof(y) _min2 = (y);			\
+	(void) (&_min1 == &_min2);		\
+	_min1 < _min2 ? _min1 : _min2; })
+
+#define max(x, y) ({				\
+	typeof(x) _max1 = (x);			\
+	typeof(y) _max2 = (y);			\
+	(void) (&_max1 == &_max2);		\
+	_max1 > _max2 ? _max1 : _max2; })
+
+/*
+ * ..and if you can't take the strict types, you can specify one yourself.
+ * Or don't use min/max at all, of course.
+ */
+#define min_t(type, x, y) ({			\
+	type __min1 = (x);			\
+	type __min2 = (y);			\
+	__min1 < __min2 ? __min1: __min2; })
+
+#define max_t(type, x, y) ({			\
+	type __max1 = (x);			\
+	type __max2 = (y);			\
+	__max1 > __max2 ? __max1: __max2; })
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#ifndef __OPTIMIZE__
+#define BUILD_BUG_ON(condition)	((void)sizeof(char[1 - 2 * !!(condition)]))
+#else
+#define BUILD_BUG_ON(condition)	assert(condition)
+#endif
+
+#define BUG_ON(cond)	assert(!(cond))
+
+#ifdef NDEBUG
+#define DBG_BUGON(condition)	((void)(condition))
+#else
+#define DBG_BUGON(condition)	BUG_ON(condition)
+#endif
+
+#ifndef __maybe_unused
+#define __maybe_unused		__attribute__((__unused__))
+#endif
+
+#define ARRAY_SIZE(arr)		(sizeof(arr) / sizeof((arr)[0]))
+
+#ifndef likely
+#define likely(x)	__builtin_expect(!!(x), 1)
+#endif
+
+#ifndef unlikely
+#define unlikely(x)	__builtin_expect(!!(x), 0)
+#endif
+
+#endif
+
diff --git a/include/ez/util.h b/include/ez/util.h
new file mode 100644
index 0000000..9cb6e62
--- /dev/null
+++ b/include/ez/util.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/include/ez/util.h
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_UTIL_H
+#define __EZ_UTIL_H
+
+#include "defs.h"
+
+static inline const uint8_t *ez_memcmp(const void *ptr1, const void *ptr2,
+				       const void *buf1end)
+{
+	const uint8_t *buf1 = ptr1;
+	const uint8_t *buf2 = ptr2;
+
+	for (; buf1 != buf1end; ++buf1, ++buf2)
+		if (*buf1 != *buf2)
+			break;
+	return buf1;
+}
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 02/11] ez: add helpers for unaligned accesses
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 01/11] ez: add basic source files Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 03/11] ez: introduce bitops header file Gao Xiang via Linux-erofs
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 include/ez/unaligned.h | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 include/ez/unaligned.h

diff --git a/include/ez/unaligned.h b/include/ez/unaligned.h
new file mode 100644
index 0000000..f7c7f4f
--- /dev/null
+++ b/include/ez/unaligned.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/include/ez/unaligned.h
+ *
+ * Copyright (C) 2019 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_UNALIGNED_H
+#define __EZ_UNALIGNED_H
+
+#include <stdint.h>
+
+/*
+ * __pack instructions are safer, but compiler specific, hence potentially
+ * problematic for some compilers (workable on gcc, clang).
+ */
+static inline uint16_t get_unaligned16(const void *ptr)
+{
+	const struct { uint16_t v; } __attribute__((packed)) *unalign = ptr;
+
+	return unalign->v;
+}
+
+static inline uint32_t get_unaligned32(const void *ptr)
+{
+	const struct { uint32_t v; } __attribute__((packed)) *unalign = ptr;
+
+	return unalign->v;
+}
+
+static inline unsigned int __is_little_endian(void)
+{
+#ifdef __LITTLE_ENDIAN
+	return 1;
+#elif defined(__BIG_ENDIAN)
+	return 0;
+#else
+	/* don't use static : performance detrimental */
+	const union { uint32_t u; uint8_t c[4]; } one = { 1 };
+
+	return one.c[0];
+#endif
+}
+
+static inline uint32_t get_unaligned_le32(const void *ptr)
+{
+	if (!__is_little_endian()) {
+		const uint8_t *p = (const uint8_t *)ptr;
+
+		return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
+	}
+	return get_unaligned32(ptr);
+}
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 03/11] ez: introduce bitops header file
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 01/11] ez: add basic source files Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 02/11] ez: add helpers for unaligned accesses Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 04/11] ez: lzma: add range encoder Gao Xiang via Linux-erofs
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

Currently it's used to find most significant set bit.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 include/ez/bitops.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 include/ez/bitops.h

diff --git a/include/ez/bitops.h b/include/ez/bitops.h
new file mode 100644
index 0000000..5d8858e
--- /dev/null
+++ b/include/ez/bitops.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/include/ez/bitops.h
+ *
+ * Copyright (C) 2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_BITOPS_H
+#define __EZ_BITOPS_H
+
+/**
+ * fls - find last (most-significant) bit set
+ * @x: the word to search
+ *
+ * This is defined the same way as ffs.
+ * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
+ */
+static inline int fls(unsigned int x)
+{
+	return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
+}
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 04/11] ez: lzma: add range encoder
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (2 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 03/11] ez: introduce bitops header file Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 05/11] ez: lzma: add common header file Gao Xiang via Linux-erofs
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

The implementation is greatly inspired by XZ Utils,
which is created by Lasse Collin <lasse.collin@tukaani.org>.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/rc_common.h  |  48 ++++++++++
 lzma/rc_encoder.h | 221 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 269 insertions(+)
 create mode 100644 lzma/rc_common.h
 create mode 100644 lzma/rc_encoder.h

diff --git a/lzma/rc_common.h b/lzma/rc_common.h
new file mode 100644
index 0000000..b424fa1
--- /dev/null
+++ b/lzma/rc_common.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: Unlicense */
+/*
+ * ez/lzma/rc_common.h - common definitions for range coder
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ *
+ * Authors: Igor Pavlov <http://7-zip.org/>
+ *          Lasse Collin <lasse.collin@tukaani.org>
+ *          Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_LZMA_RC_COMMON_H
+#define __EZ_LZMA_RC_COMMON_H
+
+#include <ez/defs.h>
+
+/* Constants */
+#define RC_SHIFT_BITS		8
+#define RC_TOP_BITS		24
+#define RC_TOP_VALUE		(1U << RC_TOP_BITS)
+#define RC_BIT_MODEL_TOTAL_BITS	11
+#define RC_BIT_MODEL_TOTAL	(1U << RC_BIT_MODEL_TOTAL_BITS)
+#define RC_MOVE_BITS		5
+
+/* Type definitions */
+
+/*
+ * Type of probabilities used with range coder
+ *
+ * This needs to be at least 12-bit, so uint16_t is a logical choice. However,
+ * on some architecture and compiler combinations, a bigger type may give
+ * better speed since the probability variables are accessed a lot.
+ * On the other hand, bigger probability type increases cache footprint,
+ * since there are 2 to 14 thousand probability variables in LZMA (assuming
+ * the limit of lc + lp <= 4; with lc + lp <= 12 there would be about 1.5
+ * million variables).
+ *
+ * I will stick unless some specific architectures are *much* faster (20-50%)
+ * with uint32_t than uint16_t.
+ */
+typedef uint16_t probability;
+
+static inline uint32_t rc_bound(uint32_t range, probability prob)
+{
+	return (range >> RC_BIT_MODEL_TOTAL_BITS) * prob;
+}
+
+#endif
+
diff --git a/lzma/rc_encoder.h b/lzma/rc_encoder.h
new file mode 100644
index 0000000..98bf34d
--- /dev/null
+++ b/lzma/rc_encoder.h
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: Unlicense */
+/*
+ * ez/lzma/rc_encoder.h - range code encoder
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ *
+ * Authors: Igor Pavlov <http://7-zip.org/>
+ *          Lasse Collin <lasse.collin@tukaani.org>
+ *          Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_LZMA_RC_ENCODER_H
+#define __EZ_LZMA_RC_ENCODER_H
+
+#include "rc_common.h"
+
+/*
+ * Maximum number of symbols that can be put pending into lzma_range_encoder
+ * structure between calls to lzma_rc_encode(). For LZMA, 52+5 is enough
+ * (match with big distance and length followed by range encoder flush).
+ */
+#define RC_SYMBOLS_MAX 58
+
+#define RC_BIT_0	0
+#define RC_BIT_1	1
+#define RC_DIRECT_0	2
+#define RC_DIRECT_1	3
+#define RC_FLUSH	4
+
+struct lzma_rc_encoder {
+	uint64_t low;
+	uint64_t extended_bytes;
+	uint32_t range;
+	uint8_t firstbyte;
+
+	/* Number of symbols in the tables */
+	uint8_t count;
+
+	/* rc_encode()'s position in the tables */
+	uint8_t pos;
+
+	/* Symbols to encode (use uint8_t so can be in a single cacheline.) */
+	uint8_t symbols[RC_SYMBOLS_MAX];
+
+	/* Probabilities associated with RC_BIT_0 or RC_BIT_1 */
+	probability *probs[RC_SYMBOLS_MAX];
+};
+
+static inline void rc_reset(struct lzma_rc_encoder *rc)
+{
+	*rc = (struct lzma_rc_encoder) {
+		.range = UINT32_MAX,
+		/* .firstbyte = 0, */
+	};
+}
+
+static inline void rc_bit(struct lzma_rc_encoder *rc,
+			  probability *prob, uint32_t bit)
+{
+	rc->symbols[rc->count] = bit;
+	rc->probs[rc->count] = prob;
+	++rc->count;
+}
+
+static inline void rc_bittree(struct lzma_rc_encoder *rc,
+			      probability *probs, uint32_t nbits,
+			      uint32_t symbol)
+{
+	uint32_t model_index = 1;
+
+	do {
+		const uint32_t bit = (symbol >> --nbits) & 1;
+
+		rc_bit(rc, &probs[model_index], bit);
+		model_index = (model_index << 1) + bit;
+	} while (nbits);
+}
+
+static inline void rc_bittree_reverse(struct lzma_rc_encoder *rc,
+				      probability *probs,
+				      uint32_t nbits, uint32_t symbol)
+{
+	uint32_t model_index = 1;
+
+	do {
+		const uint32_t bit = symbol & 1;
+
+		symbol >>= 1;
+		rc_bit(rc, &probs[model_index], bit);
+		model_index = (model_index << 1) + bit;
+	} while (--nbits);
+}
+
+static inline void rc_direct(struct lzma_rc_encoder *rc,
+			     uint32_t val, uint32_t nbits)
+{
+	do {
+		rc->symbols[rc->count] = RC_DIRECT_0 + ((val >> --nbits) & 1);
+		++rc->count;
+	} while (nbits);
+}
+
+
+static inline void rc_flush(struct lzma_rc_encoder *rc)
+{
+	unsigned int i;
+
+	for (i = 0; i < 5; ++i)
+		rc->symbols[rc->count++] = RC_FLUSH;
+}
+
+static inline bool rc_shift_low(struct lzma_rc_encoder *rc,
+				uint8_t **ppos, uint8_t *oend)
+{
+	if (rc->low >> 24 != UINT8_MAX) {
+		const uint32_t carrybit = rc->low >> 32;
+
+		DBG_BUGON(carrybit > 1);
+
+		/* first or interrupted byte */
+		if (unlikely(*ppos >= oend))
+			return true;
+		*(*ppos)++ = rc->firstbyte + carrybit;
+
+		while (rc->extended_bytes) {
+			--rc->extended_bytes;
+			if (unlikely(*ppos >= oend)) {
+				rc->firstbyte = UINT8_MAX;
+				return true;
+			}
+			*(*ppos)++ = carrybit - 1;
+		}
+		rc->firstbyte = rc->low >> 24;
+	} else {
+		++rc->extended_bytes;
+	}
+	rc->low = (rc->low & 0x00FFFFFF) << RC_SHIFT_BITS;
+	return false;
+}
+
+static inline bool rc_encode(struct lzma_rc_encoder *rc,
+			     uint8_t **ppos, uint8_t *oend)
+{
+	DBG_BUGON(rc->count > RC_SYMBOLS_MAX);
+
+	while (rc->pos < rc->count) {
+		/* Normalize */
+		if (rc->range < RC_TOP_VALUE) {
+			if (rc_shift_low(rc, ppos, oend))
+				return true;
+
+			rc->range <<= RC_SHIFT_BITS;
+		}
+
+		/* Encode a bit */
+		switch (rc->symbols[rc->pos]) {
+		case RC_BIT_0: {
+			probability prob = *rc->probs[rc->pos];
+
+			rc->range = rc_bound(rc->range, prob);
+			prob += (RC_BIT_MODEL_TOTAL - prob) >> RC_MOVE_BITS;
+			*rc->probs[rc->pos] = prob;
+			break;
+		}
+
+		case RC_BIT_1: {
+			probability prob = *rc->probs[rc->pos];
+			const uint32_t bound = rc_bound(rc->range, prob);
+
+			rc->low += bound;
+			rc->range -= bound;
+			prob -= prob >> RC_MOVE_BITS;
+			*rc->probs[rc->pos] = prob;
+			break;
+		}
+
+		case RC_DIRECT_0:
+			rc->range >>= 1;
+			break;
+
+		case RC_DIRECT_1:
+			rc->range >>= 1;
+			rc->low += rc->range;
+			break;
+
+		case RC_FLUSH:
+			/* Prevent further normalizations */
+			rc->range = UINT32_MAX;
+
+			/* Flush the last five bytes (see rc_flush()) */
+			do {
+				if (rc_shift_low(rc, ppos, oend))
+					return true;
+			} while (++rc->pos < rc->count);
+
+			/*
+			 * Reset the range encoder so we are ready to continue
+			 * encoding if we weren't finishing the stream.
+			 */
+			rc_reset(rc);
+			return false;
+
+		default:
+			DBG_BUGON(1);
+			break;
+		}
+		++rc->pos;
+	}
+
+	rc->count = 0;
+	rc->pos = 0;
+	return false;
+}
+
+
+static inline uint64_t rc_pending(const struct lzma_rc_encoder *rc)
+{
+	return rc->extended_bytes + 5;
+}
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 05/11] ez: lzma: add common header file
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (3 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 04/11] ez: lzma: add range encoder Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 06/11] ez: lzma: add byte hashtable generated with CRC32 Gao Xiang via Linux-erofs
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

Note that meaningful names mentioned in LZMA implementation
keep in sync with LZMA SDK rather than XZ utils so that we
can follow latest offical LZMA SDK easily.

Apart from that, Linux kernel coding style is applied to
other parts of ez.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/lzma_common.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 lzma/lzma_common.h

diff --git a/lzma/lzma_common.h b/lzma/lzma_common.h
new file mode 100644
index 0000000..14df123
--- /dev/null
+++ b/lzma/lzma_common.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: Unlicense */
+/*
+ * ez/lzma/lzma_common.h - common definitions of LZMA encoder
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ *
+ * Authors: Igor Pavlov <http://7-zip.org/>
+ *          Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_LZMA_LZMA_COMMON_H
+#define __EZ_LZMA_LZMA_COMMON_H
+
+#include <ez/defs.h>
+#include <ez/unaligned.h>
+
+/*
+ * LZMA Matchlength
+ */
+
+/* Minimum length of a match is two bytes. */
+#define kMatchMinLen	2
+
+/*
+ * Match length is encoded with 4, 5, or 10 bits.
+ *
+ * Length    Bits
+ *    2-9     4 = (Choice = 0) + 3 bits
+ *  10-17     5 = (Choice = 1) + (Choice2 = 0) + 3 bits
+ * 18-273    10 = (Choice = 1) + (Choice2 = 1) + 8 bits
+ */
+#define kLenNumLowBits		3
+#define kLenNumLowSymbols	(1 << kLenNumLowBits)
+#define kLenNumHighBits		8
+#define kLenNumHighSymbols	(1 << kLenNumHighBits)
+#define kLenNumSymbolsTotal	(kLenNumLowSymbols * 2 + kLenNumHighSymbols)
+
+/*
+ * Maximum length of a match is 273 which is a result
+ * of the encoding described above.
+ */
+#define kMatchMaxLen	(kMatchMinLen + kLenNumSymbolsTotal - 1)
+
+/*
+ * LZMA remembers the four most recent match distances.
+ * Reusing these distances tend to take less space than
+ * re-encoding the actual distance value.
+ */
+#define LZMA_NUM_REPS	4
+
+#define MARK_LIT ((uint32_t)-1)
+
+/*
+ * LZMA_REQUIRED_INPUT_MAX = number of required input bytes for worst case.
+ * Num bits = log2((2^11 / 31) ^ 22) + 26 < 134 + 26 = 160;
+ */
+#define LZMA_REQUIRED_INPUT_MAX 20
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 06/11] ez: lzma: add byte hashtable generated with CRC32
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (4 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 05/11] ez: lzma: add common header file Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 07/11] ez: lzma: add LZMA matchfinder Gao Xiang via Linux-erofs
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

which is also directly taken from XZ Utils for now.
Maybe it could be improved later.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/bytehash.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 lzma/bytehash.h

diff --git a/lzma/bytehash.h b/lzma/bytehash.h
new file mode 100644
index 0000000..671c878
--- /dev/null
+++ b/lzma/bytehash.h
@@ -0,0 +1,69 @@
+/* This file has been automatically generated by crc32_tablegen.c. */
+
+static const uint32_t crc32_byte_hashtable[256] = {
+	0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA,
+	0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3,
+	0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988,
+	0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91,
+	0x1DB71064, 0x6AB020F2, 0xF3B97148, 0x84BE41DE,
+	0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7,
+	0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC,
+	0x14015C4F, 0x63066CD9, 0xFA0F3D63, 0x8D080DF5,
+	0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172,
+	0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B,
+	0x35B5A8FA, 0x42B2986C, 0xDBBBC9D6, 0xACBCF940,
+	0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59,
+	0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116,
+	0x21B4F4B5, 0x56B3C423, 0xCFBA9599, 0xB8BDA50F,
+	0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924,
+	0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D,
+	0x76DC4190, 0x01DB7106, 0x98D220BC, 0xEFD5102A,
+	0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433,
+	0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818,
+	0x7F6A0DBB, 0x086D3D2D, 0x91646C97, 0xE6635C01,
+	0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E,
+	0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457,
+	0x65B0D9C6, 0x12B7E950, 0x8BBEB8EA, 0xFCB9887C,
+	0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65,
+	0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2,
+	0x4ADFA541, 0x3DD895D7, 0xA4D1C46D, 0xD3D6F4FB,
+	0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0,
+	0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9,
+	0x5005713C, 0x270241AA, 0xBE0B1010, 0xC90C2086,
+	0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
+	0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4,
+	0x59B33D17, 0x2EB40D81, 0xB7BD5C3B, 0xC0BA6CAD,
+	0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A,
+	0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683,
+	0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8,
+	0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1,
+	0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE,
+	0xF762575D, 0x806567CB, 0x196C3671, 0x6E6B06E7,
+	0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC,
+	0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5,
+	0xD6D6A3E8, 0xA1D1937E, 0x38D8C2C4, 0x4FDFF252,
+	0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B,
+	0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60,
+	0xDF60EFC3, 0xA867DF55, 0x316E8EEF, 0x4669BE79,
+	0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236,
+	0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F,
+	0xC5BA3BBE, 0xB2BD0B28, 0x2BB45A92, 0x5CB36A04,
+	0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D,
+	0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A,
+	0x9C0906A9, 0xEB0E363F, 0x72076785, 0x05005713,
+	0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38,
+	0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21,
+	0x86D3D2D4, 0xF1D4E242, 0x68DDB3F8, 0x1FDA836E,
+	0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777,
+	0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C,
+	0x8F659EFF, 0xF862AE69, 0x616BFFD3, 0x166CCF45,
+	0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2,
+	0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB,
+	0xAED16A4A, 0xD9D65ADC, 0x40DF0B66, 0x37D83BF0,
+	0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
+	0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6,
+	0xBAD03605, 0xCDD70693, 0x54DE5729, 0x23D967BF,
+	0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94,
+	0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D
+};
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 07/11] ez: lzma: add LZMA matchfinder
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (5 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 06/11] ez: lzma: add byte hashtable generated with CRC32 Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 08/11] ez: lzma: add LZMA encoder Gao Xiang via Linux-erofs
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

It implements HC4 matchfinder for now
with some minor optimization.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/mf.c | 311 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lzma/mf.h |  73 +++++++++++++
 2 files changed, 384 insertions(+)
 create mode 100644 lzma/mf.c
 create mode 100644 lzma/mf.h

diff --git a/lzma/mf.c b/lzma/mf.c
new file mode 100644
index 0000000..e7bcc40
--- /dev/null
+++ b/lzma/mf.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: Apache-2.0
+/*
+ * ez/lzma/mf.c - LZMA matchfinder
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#include <stdlib.h>
+#include <ez/unaligned.h>
+#include <ez/bitops.h>
+#include "mf.h"
+#include "bytehash.h"
+
+#define LZMA_HASH_2_SZ		(1U << 10)
+#define LZMA_HASH_3_SZ		(1U << 16)
+
+#define LZMA_HASH_3_BASE	(LZMA_HASH_2_SZ)
+#define LZMA_HASH_4_BASE	(LZMA_HASH_2_SZ + LZMA_HASH_3_SZ)
+
+static inline uint32_t mt_calc_dualhash(const uint8_t cur[2])
+{
+	return crc32_byte_hashtable[cur[0]] ^ cur[1];
+}
+
+static inline uint32_t mt_calc_hash_3(const uint8_t cur[3],
+				      const uint32_t dualhash)
+{
+	return (dualhash ^ (cur[2] << 8)) & (LZMA_HASH_3_SZ - 1);
+}
+
+static inline uint32_t mt_calc_hash_4(const uint8_t cur[4], unsigned int nbits)
+{
+	const uint32_t golden_ratio_32 = 0x61C88647;
+
+	return (get_unaligned_le32(cur) * golden_ratio_32) >> (32 - nbits);
+}
+
+/* Mark the current byte as processed from point of view of the match finder. */
+static void mf_move(struct lzma_mf *mf)
+{
+	if (mf->chaincur + 1 > mf->max_distance)
+		mf->chaincur = 0;
+	else
+		++mf->chaincur;
+
+	++mf->cur;
+	DBG_BUGON(mf->buffer + mf->cur > mf->iend);
+}
+
+static unsigned int lzma_mf_do_hc4_find(struct lzma_mf *mf,
+					struct lzma_match *matches)
+{
+	const uint32_t cur = mf->cur;
+	const uint8_t *ip = mf->buffer + cur;
+	const uint32_t pos = cur + mf->offset;
+	const uint32_t nice_len = mf->nice_len;
+	const uint8_t *ilimit =
+		ip + nice_len < mf->iend ? ip + nice_len : mf->iend;
+
+	const uint32_t dualhash = mt_calc_dualhash(ip);
+	const uint32_t hash_2 = dualhash & (LZMA_HASH_2_SZ - 1);
+	const uint32_t delta2 = pos - mf->hash[hash_2];
+	const uint32_t hash_3 = mt_calc_hash_3(ip, dualhash);
+	const uint32_t delta3 = pos - mf->hash[LZMA_HASH_3_BASE + hash_3];
+	const uint32_t hash_value = mt_calc_hash_4(ip, mf->hashbits);
+	uint32_t cur_match = mf->hash[LZMA_HASH_4_BASE + hash_value];
+	unsigned int bestlen, depth;
+	const uint8_t *matchend;
+	struct lzma_match *mp;
+
+	mf->hash[hash_2] = pos;
+	mf->hash[LZMA_HASH_3_BASE + hash_3] = pos;
+	mf->hash[LZMA_HASH_4_BASE + hash_value] = pos;
+	mf->chain[mf->chaincur] = cur_match;
+
+	mp = matches;
+	bestlen = 0;
+
+	/* check the 2-byte match */
+	if (delta2 <= mf->max_distance && *(ip - delta2) == *ip) {
+		matchend = ez_memcmp(ip + 2, ip - delta2 + 2, ilimit);
+
+		bestlen = matchend - ip;
+		*(mp++) = (struct lzma_match) { .len = bestlen,
+						.dist = delta2 };
+
+		if (matchend >= ilimit)
+			goto out;
+	}
+
+	/* check the 3-byte match */
+	if (delta2 != delta3 && delta3 <= mf->max_distance &&
+	    *(ip - delta3) == *ip) {
+		matchend = ez_memcmp(ip + 3, ip - delta3 + 3, ilimit);
+
+		if (matchend - ip > bestlen) {
+			bestlen = matchend - ip;
+			*(mp++) = (struct lzma_match) { .len = bestlen,
+							.dist = delta3 };
+
+			if (matchend >= ilimit)
+				goto out;
+		}
+	}
+
+	/* check 4 or more byte matches, traversal the whole hash chain */
+	for (depth = mf->depth; depth; --depth) {
+		const uint32_t delta = pos - cur_match;
+		const uint8_t *match = ip - delta;
+		uint32_t nextcur;
+
+		if (delta > mf->max_distance)
+			break;
+
+		nextcur = (mf->chaincur >= delta ? mf->chaincur - delta :
+			   mf->max_distance + 1 + mf->chaincur - delta);
+		cur_match = mf->chain[nextcur];
+
+		if (get_unaligned32(match) == get_unaligned32(ip) &&
+		    match[bestlen] == ip[bestlen]) {
+			matchend = ez_memcmp(ip + 4, match + 4, ilimit);
+
+			if (matchend - ip <= bestlen)
+				continue;
+
+			bestlen = matchend - ip;
+			*(mp++) = (struct lzma_match) { .len = bestlen,
+							.dist = delta };
+
+			if (matchend >= ilimit)
+				break;
+		}
+	}
+
+out:
+	return mp - matches;
+}
+
+/* aka. lzma_mf_hc4_skip */
+void lzma_mf_skip(struct lzma_mf *mf, unsigned int bytetotal)
+{
+	const unsigned int hashbits = mf->hashbits;
+	unsigned int unhashedskip = mf->unhashedskip;
+	unsigned int bytecount = 0;
+
+	if (unhashedskip) {
+		bytetotal += unhashedskip;
+		mf->cur -= unhashedskip;
+		mf->lookahead -= unhashedskip;
+		mf->unhashedskip = 0;
+	}
+
+	if (unlikely(!bytetotal))
+		return;
+
+	do {
+		const uint8_t *ip = mf->buffer + mf->cur;
+		uint32_t pos, dualhash, hash_2, hash_3, hash_value;
+
+		if (mf->iend - ip < 4) {
+			unhashedskip = bytetotal - bytecount;
+
+			mf->unhashedskip = unhashedskip;
+			mf->cur += unhashedskip;
+			break;
+		}
+
+		pos = mf->cur + mf->offset;
+
+		dualhash = mt_calc_dualhash(ip);
+		hash_2 = dualhash & (LZMA_HASH_2_SZ - 1);
+		mf->hash[hash_2] = pos;
+
+		hash_3 = mt_calc_hash_3(ip, dualhash);
+		mf->hash[LZMA_HASH_3_BASE + hash_3] = pos;
+
+		hash_value = mt_calc_hash_4(ip, hashbits);
+
+		mf->chain[mf->chaincur] =
+			mf->hash[LZMA_HASH_4_BASE + hash_value];
+		mf->hash[LZMA_HASH_4_BASE + hash_value] = pos;
+
+		mf_move(mf);
+	} while (++bytecount < bytetotal);
+
+	mf->lookahead += bytetotal;
+}
+
+static int lzma_mf_hc4_find(struct lzma_mf *mf,
+			    struct lzma_match *matches, bool finish)
+{
+	int ret;
+
+	if (mf->iend - &mf->buffer[mf->cur] < 4) {
+		if (!finish)
+			return -ERANGE;
+
+		mf->eod = true;
+		if (mf->buffer + mf->cur == mf->iend)
+			return -ERANGE;
+	}
+
+	if (!mf->eod) {
+		ret = lzma_mf_do_hc4_find(mf, matches);
+	} else {
+		ret = 0;
+		/* ++mf->unhashedskip; */
+		mf->unhashedskip = 0;	/* bypass all lzma_mf_skip(mf, 0); */
+	}
+	mf_move(mf);
+	++mf->lookahead;
+	return ret;
+}
+
+int lzma_mf_find(struct lzma_mf *mf, struct lzma_match *matches, bool finish)
+{
+	const uint8_t *ip = mf->buffer + mf->cur;
+	const uint8_t *iend = max((const uint8_t *)mf->iend,
+				  ip + kMatchMaxLen);
+	unsigned int i;
+	int ret;
+
+	/* if (mf->unhashedskip && !mf->eod) */
+	if (mf->unhashedskip)
+		lzma_mf_skip(mf, 0);
+
+	ret = lzma_mf_hc4_find(mf, matches, finish);
+	if (ret <= 0)
+		return ret;
+
+	i = ret;
+	do {
+		const uint8_t *cur = ip + matches[--i].len;
+
+		if (matches[i].len < mf->nice_len || cur >= iend)
+			break;
+
+		/* extend the candicated match as far as it can */
+		matches[i].len = ez_memcmp(cur, cur - matches[i].dist,
+					   iend) - ip;
+	} while (i);
+	return ret;
+}
+
+void lzma_mf_fill(struct lzma_mf *mf, const uint8_t *in, unsigned int size)
+{
+	DBG_BUGON(mf->buffer + mf->cur > mf->iend);
+
+	/* move the sliding window in advance if needed */
+	//if (mf->cur >= mf->size - mf->keep_size_after)
+	//	move_window(mf);
+
+	memcpy(mf->iend, in, size);
+	mf->iend += size;
+}
+
+int lzma_mf_reset(struct lzma_mf *mf, const struct lzma_mf_properties *p)
+{
+	const uint32_t dictsize = p->dictsize;
+	unsigned int new_hashbits;
+
+	if (!dictsize)
+		return -EINVAL;
+
+	if (dictsize < UINT16_MAX) {
+		new_hashbits = 16;
+	/* most significant set bit + 1 of distsize to derive hashbits */
+	} else {
+		const unsigned int hs = fls(dictsize);
+
+		new_hashbits = hs - (1 << (hs - 1) == dictsize);
+		if (new_hashbits > 31)
+			new_hashbits = 31;
+	}
+
+	if (new_hashbits != mf->hashbits ||
+	    mf->max_distance != dictsize - 1) {
+		if (mf->hash)
+			free(mf->hash);
+		if (mf->chain)
+			free(mf->chain);
+
+		mf->hashbits = 0;
+		mf->hash = calloc(LZMA_HASH_4_BASE + (1 << new_hashbits),
+				  sizeof(mf->hash[0]));
+		if (!mf->hash)
+			return -ENOMEM;
+
+		mf->chain = malloc(sizeof(mf->chain[0]) * (dictsize - 1));
+		if (!mf->chain) {
+			free(mf->hash);
+			return -ENOMEM;
+		}
+		mf->hashbits = new_hashbits;
+	}
+
+	mf->max_distance = dictsize - 1;
+	/*
+	 * Set the initial value as mf->max_distance + 1.
+	 * This would avoid hash zero initialization.
+	 */
+	mf->offset = mf->max_distance + 1;
+
+	mf->nice_len = p->nice_len;
+	mf->depth = p->depth;
+
+	mf->cur = 0;
+	mf->lookahead = 0;
+	mf->chaincur = 0;
+	return 0;
+}
+
diff --git a/lzma/mf.h b/lzma/mf.h
new file mode 100644
index 0000000..3df5043
--- /dev/null
+++ b/lzma/mf.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/lzma/mf.h - header file for LZMA matchfinder
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __LZMA_MF_H
+#define __LZMA_MF_H
+
+#include <ez/util.h>
+#include "lzma_common.h"
+
+struct lzma_mf_properties {
+	uint32_t dictsize;
+
+	uint32_t nice_len, depth;
+};
+
+/*
+ * an array used used by the LZ-based encoder to hold
+ * the length-distance pairs found by LZMA matchfinder.
+ */
+struct lzma_match {
+	unsigned int len;
+	unsigned int dist;
+};
+
+struct lzma_mf {
+	/* pointer to buffer with data to be compressed */
+	uint8_t *buffer;
+
+	/* size of the whole LZMA matchbuffer */
+	uint32_t size;
+
+	uint32_t offset;
+
+	/* indicate the next byte to run through the match finder */
+	uint32_t cur;
+
+	/* maximum length of a match that the matchfinder will try to find. */
+	uint32_t nice_len;
+
+	/* indicate the first byte that doesn't contain valid input data */
+	uint8_t *iend;
+
+	/* indicate the number of bytes still not encoded */
+	uint32_t lookahead;
+
+	/* LZ matchfinder hash chain representation */
+	uint32_t *hash, *chain;
+
+	/* indicate the next byte in chain (0 ~ max_distance) */
+	uint32_t chaincur;
+	uint8_t hashbits;
+
+	/* maximum number of loops in the match finder */
+	uint8_t depth;
+
+	uint32_t max_distance;
+
+	/* the number of bytes unhashed, and wait to roll back later */
+	uint32_t unhashedskip;
+
+	bool eod;
+};
+
+int lzma_mf_find(struct lzma_mf *mf, struct lzma_match *matches, bool finish);
+void lzma_mf_skip(struct lzma_mf *mf, unsigned int n);
+void lzma_mf_fill(struct lzma_mf *mf, const uint8_t *in, unsigned int size);
+int lzma_mf_reset(struct lzma_mf *mf, const struct lzma_mf_properties *p);
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 08/11] ez: lzma: add LZMA encoder
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (6 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 07/11] ez: lzma: add LZMA matchfinder Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 09/11] ez: lzma: checkpoint feature for range encoder Gao Xiang via Linux-erofs
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

It adds an improved fast optimizer which implements
lazy matching and a LZMA symbol encoder.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/lzma_encoder.c | 628 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 628 insertions(+)
 create mode 100644 lzma/lzma_encoder.c

diff --git a/lzma/lzma_encoder.c b/lzma/lzma_encoder.c
new file mode 100644
index 0000000..b213504
--- /dev/null
+++ b/lzma/lzma_encoder.c
@@ -0,0 +1,628 @@
+// SPDX-License-Identifier: Apache-2.0
+/*
+ * ez/lzma/lzma_encoder.c
+ *
+ * Copyright (C) 2019-2020 Gao Xiang <hsiangkao@aol.com>
+ *
+ * Authors: Igor Pavlov <http://7-zip.org/>
+ *          Lasse Collin <lasse.collin@tukaani.org>
+ *          Gao Xiang <hsiangkao@aol.com>
+ */
+#include <stdlib.h>
+#include <ez/bitops.h>
+#include "rc_encoder.h"
+#include "lzma_common.h"
+#include "mf.h"
+
+#define kNumBitModelTotalBits	11
+#define kBitModelTotal		(1 << kNumBitModelTotalBits)
+#define kProbInitValue		(kBitModelTotal >> 1)
+
+#define kNumStates		12
+#define LZMA_PB_MAX		4
+#define LZMA_NUM_PB_STATES_MAX	(1 << LZMA_PB_MAX)
+
+#define kNumLenToPosStates	4
+#define kNumPosSlotBits		6
+
+#define kStartPosModelIndex	4
+#define kEndPosModelIndex	14
+#define kNumFullDistances	(1 << (kEndPosModelIndex >> 1))
+
+#define kNumAlignBits		4
+#define kAlignTableSize		(1 << kNumAlignBits)
+#define kAlignMask		(kAlignTableSize - 1)
+
+#define kNumLenToPosStates	4
+
+#define is_literal_state(state) ((state) < 7)
+
+/* note that here dist is an zero-based distance */
+static unsigned int get_pos_slot2(unsigned int dist)
+{
+	const unsigned int zz = fls(dist) - 1;
+
+	return (zz + zz) + ((dist >> (zz - 1)) & 1);
+}
+
+static unsigned int get_pos_slot(unsigned int dist)
+{
+	return dist <= 4 ? dist : get_pos_slot2(dist);
+}
+
+/* aka. GetLenToPosState in LZMA */
+static inline unsigned int get_len_state(unsigned int len)
+{
+	if (len < kNumLenToPosStates - 1 + kMatchMinLen)
+		return len - kMatchMinLen;
+
+	return kNumLenToPosStates - 1;
+}
+
+struct lzma_properties {
+	uint32_t lc;	/* 0 <= lc <= 8, default = 3 */
+	uint32_t lp;	/* 0 <= lp <= 4, default = 0 */
+	uint32_t pb;	/* 0 <= pb <= 4, default = 2 */
+
+	struct lzma_mf_properties mf;
+};
+
+struct lzma_length_encoder {
+	probability low[LZMA_NUM_PB_STATES_MAX << (kLenNumLowBits + 1)];
+	probability high[kLenNumHighSymbols];
+};
+
+struct lzma_encoder {
+	struct lzma_mf mf;
+	struct lzma_rc_encoder rc;
+
+	uint8_t *op, *oend;
+	bool finish;
+
+	unsigned int state;
+
+	/* the four most recent match distances */
+	uint32_t reps[LZMA_NUM_REPS];
+
+	unsigned int pbMask, lpMask;
+
+	unsigned int lc, lp;
+
+	/* the following names came from lzma-specification.txt */
+	probability isMatch[kNumStates][LZMA_NUM_PB_STATES_MAX];
+	probability isRep[kNumStates];
+	probability isRepG0[kNumStates];
+	probability isRepG1[kNumStates];
+	probability isRepG2[kNumStates];
+	probability isRep0Long[kNumStates][LZMA_NUM_PB_STATES_MAX];
+
+	probability posSlotEncoder[kNumLenToPosStates][1 << kNumPosSlotBits];
+	probability posEncoders[kNumFullDistances];
+	probability posAlignEncoder[1 << kNumAlignBits];
+
+	probability *literal;
+
+	struct lzma_length_encoder lenEnc;
+	struct lzma_length_encoder repLenEnc;
+
+	struct {
+		struct lzma_match matches[kMatchMaxLen];
+		unsigned int matches_count;
+	} fast;
+};
+
+#define change_pair(smalldist, bigdist) (((bigdist) >> 7) > (smalldist))
+
+static int lzma_get_optimum_fast(struct lzma_encoder *lzma,
+				 uint32_t *back_res, uint32_t *len_res)
+{
+	struct lzma_mf *const mf = &lzma->mf;
+	const uint32_t nice_len = mf->nice_len;
+
+	unsigned int matches_count, i;
+	unsigned int longest_match_length, longest_match_back;
+	unsigned int best_replen, best_rep;
+	const uint8_t *ip, *ilimit, *ista;
+	uint32_t len;
+	int ret;
+
+	if (!mf->lookahead) {
+		ret = lzma_mf_find(mf, lzma->fast.matches, lzma->finish);
+
+		if (ret < 0)
+			return ret;
+
+		matches_count = ret;
+	} else {
+		matches_count = lzma->fast.matches_count;
+	}
+
+	ip = mf->buffer + mf->cur - mf->lookahead;
+
+	/* no valid match found by matchfinder */
+	if (!matches_count ||
+	/* not enough input left to encode a match */
+	   mf->iend - ip <= 2)
+		goto out_literal;
+
+	ilimit = (mf->iend <= ip + kMatchMaxLen ?
+		  mf->iend : ip + kMatchMaxLen);
+
+	best_replen = 0;
+
+	/* look for all valid repeat matches */
+	for (i = 0; i < LZMA_NUM_REPS; ++i) {
+		const uint8_t *const repp = ip - lzma->reps[i];
+
+		/* the first two bytes (MATCH_LEN_MIN == 2) do not match */
+		if (get_unaligned16(ip) != get_unaligned16(repp))
+			continue;
+
+		len = ez_memcmp(ip + 2, repp + 2, ilimit) - ip;
+		/* a repeated match at least nice_len, return it immediately */
+		if (len >= nice_len) {
+			*back_res = i;
+			*len_res = len;
+			lzma_mf_skip(mf, len - 1);
+			return 0;
+		}
+
+		if (len > best_replen) {
+			best_rep = i;
+			best_replen = len;
+		}
+	}
+
+	/*
+	 * although we didn't find a long enough repeated match,
+	 * the normal match is long enough to use directly.
+	 */
+	longest_match_length = lzma->fast.matches[matches_count - 1].len;
+	longest_match_back = lzma->fast.matches[matches_count - 1].dist;
+	if (longest_match_length >= nice_len) {
+		/* it's encoded as 0-based match distances */
+		*back_res = LZMA_NUM_REPS + longest_match_back - 1;
+		*len_res = longest_match_length;
+		lzma_mf_skip(mf, longest_match_length - 1);
+		return 0;
+	}
+
+	while (matches_count > 1) {
+		const struct lzma_match *const victim =
+			&lzma->fast.matches[matches_count - 2];
+
+		/* only (longest_match_length - 1) would be considered */
+		if (longest_match_length > victim->len + 1)
+			break;
+
+		if (!change_pair(victim->dist, longest_match_back))
+			break;
+
+		--matches_count;
+		longest_match_length = victim->len;
+		longest_match_back = victim->dist;
+	}
+
+	if (longest_match_length > best_replen + 1) {
+		best_replen = 0;
+
+		if (longest_match_length < 3 &&
+		    longest_match_back > 0x80)
+			goto out_literal;
+	} else {
+		longest_match_length = best_replen;
+		longest_match_back = 0;
+	}
+
+	ista = ip;
+
+	while (1) {
+		const struct lzma_match *victim;
+
+		ret = lzma_mf_find(mf, lzma->fast.matches, lzma->finish);
+
+		if (ret < 0) {
+			lzma->fast.matches_count = 0;
+			break;
+		}
+
+		lzma->fast.matches_count = ret;
+		if (!ret)
+			break;
+
+		victim = &lzma->fast.matches[lzma->fast.matches_count - 1];
+
+		/* both sides have eliminated '+ nlits' */
+		if (victim->len + 1 < longest_match_length)
+			break;
+
+		if (!best_replen) {
+			/* victim->len (should) >= longest_match_length - 1 */
+			const uint8_t *ip1 = ip + 1;
+			const uint32_t rl = max(2U, longest_match_length - 1);
+
+			/* TODO: lazy match for this */
+			for (i = 0; i < LZMA_NUM_REPS; ++i) {
+				if (!memcmp(ip1, ip1 - lzma->reps[i], rl)) {
+					*len_res = 0;
+					return ip1 - ista;
+				}
+			}
+
+			len = UINT32_MAX;
+		} else {
+			len = 0;
+		}
+
+		for (i = 0; i < LZMA_NUM_REPS; ++i) {
+			if (lzma->reps[i] == victim->dist) {
+				len = victim->len;
+				break;
+			}
+		}
+
+		/* if the previous match is a rep, this should be longer */
+		if (len <= best_replen)
+			break;
+
+		/* if it's not a rep */
+		if (len == UINT32_MAX) {
+			if (victim->len + 1 == longest_match_length &&
+			    !change_pair(victim->dist, longest_match_back))
+				break;
+
+			if (victim->len == longest_match_length &&
+			    get_pos_slot(victim->dist - 1) >=
+			    get_pos_slot(longest_match_back))
+				break;
+			len = 0;
+		}
+		longest_match_length = victim->len;
+		longest_match_back = victim->dist;
+		best_replen = len;
+		best_rep = i;
+		++ip;
+	}
+
+	/* it's encoded as 0-based match distances */
+	if (best_replen)
+		*back_res = best_rep;
+	else
+		*back_res = LZMA_NUM_REPS + longest_match_back - 1;
+
+	*len_res = longest_match_length;
+	lzma_mf_skip(mf, longest_match_length - 2 + (ret < 0));
+	return ip - ista;
+
+out_literal:
+	*len_res = 0;
+	return 1;
+}
+
+static void literal_matched(struct lzma_rc_encoder *rc, probability *probs,
+			    uint32_t match_byte, uint32_t symbol)
+{
+	uint32_t offset = 0x100;
+
+	symbol += 0x100;
+	do {
+		const unsigned int bit = (symbol >> 7) & 1;
+		const unsigned int match_bit = (match_byte <<= 1) & offset;
+
+		rc_bit(rc, &probs[offset + match_bit + (symbol >> 8)], bit);
+		symbol <<= 1;
+		offset &= ~(match_byte ^ symbol);
+	} while (symbol < 0x10000);
+}
+
+static void literal(struct lzma_encoder *lzma, uint32_t position)
+{
+	static const unsigned char kLiteralNextStates[] = {
+		0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5
+	};
+	struct lzma_mf *mf = &lzma->mf;
+	const uint8_t *ptr = &mf->buffer[mf->cur - mf->lookahead];
+	const unsigned int state = lzma->state;
+
+	probability *probs = lzma->literal +
+		3 * ((((position << 8) + ptr[-1]) & lzma->lpMask) << lzma->lc);
+
+	if (is_literal_state(state)) {
+		/*
+		 * Previous LZMA-symbol was a literal. Encode a normal
+		 * literal without a match byte.
+		 */
+		rc_bittree(&lzma->rc, probs, 8, *ptr);
+	} else {
+		/*
+		 * Previous LZMA-symbol was a match. Use the byte + 1
+		 * of the last match as a "match byte". That is, compare
+		 * the bits of the current literal and the match byte.
+		 */
+		const uint8_t match_byte = *(ptr - lzma->reps[0]);
+
+		literal_matched(&lzma->rc, probs, match_byte, *ptr);
+	}
+
+	lzma->state = kLiteralNextStates[state];
+}
+
+/* LenEnc_Encode */
+static void length(struct lzma_rc_encoder *rc,
+		   struct lzma_length_encoder *lc,
+		   const uint32_t pos_state, const uint32_t len)
+{
+	uint32_t sym = len - kMatchMinLen;
+	probability *probs = lc->low;
+
+	if (sym >= kLenNumLowSymbols) {
+		rc_bit(rc, probs, 1);
+		probs += kLenNumLowSymbols;
+		if (sym >= kLenNumLowSymbols * 2 /* + kLenNumMidSymbols */) {
+			rc_bit(rc, probs, 1);
+			rc_bittree(rc, lc->high, kLenNumHighBits,
+				   sym - kLenNumLowSymbols * 2);
+			return;
+		}
+		sym -= kLenNumLowSymbols;
+	}
+	rc_bit(rc, probs, 0);
+	rc_bittree(rc, probs + (pos_state << (kLenNumLowBits + 1)),
+		   kLenNumLowBits, sym);
+}
+
+/* Match */
+static void match(struct lzma_encoder *lzma, const uint32_t pos_state,
+		  const uint32_t dist, const uint32_t len)
+{
+	const uint32_t posSlot = get_pos_slot(dist);
+	const uint32_t lenState = get_len_state(len);
+
+	lzma->state = (is_literal_state(lzma->state) ? 7 : 10);
+	length(&lzma->rc, &lzma->lenEnc, pos_state, len);
+
+	/* - unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec); */
+	rc_bittree(&lzma->rc, lzma->posSlotEncoder[lenState],
+		   kNumPosSlotBits, posSlot);
+
+	if (dist >= kStartPosModelIndex) {
+		const uint32_t footer_bits = (posSlot >> 1) - 1;
+		const uint32_t base = (2 | (posSlot & 1)) << footer_bits;
+
+		if (dist < kNumFullDistances) {
+			rc_bittree_reverse(&lzma->rc,
+					   lzma->posEncoders + base,
+					   footer_bits, dist);
+		} else {
+			const uint32_t dist_reduced = dist - base;
+
+			rc_direct(&lzma->rc, dist_reduced >> kNumAlignBits,
+				  footer_bits - kNumAlignBits);
+
+#if 0
+			rc_bittree_reverse(&lzma->rc, lzma->posAlignEncoder,
+					   kNumAlignBits,
+					   dist_reduced & kAlignMask);
+#endif
+			rc_bittree_reverse(&lzma->rc, lzma->posAlignEncoder,
+					   kNumAlignBits, dist);
+		}
+	}
+	lzma->reps[3] = lzma->reps[2];
+	lzma->reps[2] = lzma->reps[1];
+	lzma->reps[1] = lzma->reps[0];
+	lzma->reps[0] = dist + 1;
+}
+
+static void rep_match(struct lzma_encoder *lzma, const uint32_t pos_state,
+		      const uint32_t rep, const uint32_t len)
+{
+	const unsigned int state = lzma->state;
+
+	if (rep == 0) {
+		rc_bit(&lzma->rc, &lzma->isRepG0[state], 0);
+		rc_bit(&lzma->rc, &lzma->isRep0Long[state][pos_state],
+		       len != 1);
+	} else {
+		const uint32_t distance = lzma->reps[rep];
+
+		rc_bit(&lzma->rc, &lzma->isRepG0[state], 1);
+		if (rep == 1) {
+			rc_bit(&lzma->rc, &lzma->isRepG1[state], 0);
+		} else {
+			rc_bit(&lzma->rc, &lzma->isRepG1[state], 1);
+			rc_bit(&lzma->rc, &lzma->isRepG2[state], rep - 2);
+
+			if (rep == 3)
+				lzma->reps[3] = lzma->reps[2];
+			lzma->reps[2] = lzma->reps[1];
+		}
+		lzma->reps[1] = lzma->reps[0];
+		lzma->reps[0] = distance;
+	}
+
+	if (len == 1) {
+		lzma->state = is_literal_state(state) ? 9 : 11;
+	} else {
+		length(&lzma->rc, &lzma->repLenEnc, pos_state, len);
+		lzma->state = is_literal_state(state) ? 8 : 11;
+	}
+}
+
+static void encode_eopm(struct lzma_encoder *lzma)
+{
+	const uint32_t pos_state =
+		(lzma->mf.cur - lzma->mf.lookahead) & lzma->pbMask;
+	const unsigned int state = lzma->state;
+
+	rc_bit(&lzma->rc, &lzma->isMatch[state][pos_state], 1);
+	rc_bit(&lzma->rc, &lzma->isRep[state], 0);
+	match(lzma, pos_state, UINT32_MAX, kMatchMinLen);
+}
+
+static int flush_symbol(struct lzma_encoder *lzma)
+{
+	return rc_encode(&lzma->rc, &lzma->op, lzma->oend) ? -ENOSPC : 0;
+}
+
+static int encode_symbol(struct lzma_encoder *lzma, uint32_t back,
+			 uint32_t len, uint32_t *position)
+{
+	int err = flush_symbol(lzma);
+
+	if (!err) {
+		const uint32_t pos_state = *position & lzma->pbMask;
+		const unsigned int state = lzma->state;
+		struct lzma_mf *const mf = &lzma->mf;
+
+		if (back == MARK_LIT) {
+			/* literal i.e. 8-bit byte */
+			rc_bit(&lzma->rc, &lzma->isMatch[state][pos_state], 0);
+			literal(lzma, *position);
+			len = 1;
+		} else {
+			rc_bit(&lzma->rc, &lzma->isMatch[state][pos_state], 1);
+
+			if (back < LZMA_NUM_REPS) {
+				/* repeated match */
+				rc_bit(&lzma->rc, &lzma->isRep[state], 1);
+				rep_match(lzma, pos_state, back, len);
+			} else {
+				/* normal match */
+				rc_bit(&lzma->rc, &lzma->isRep[state], 0);
+				match(lzma, pos_state,
+				      back - LZMA_NUM_REPS, len);
+			}
+		}
+
+		/* len bytes has been consumed by encoder */
+		DBG_BUGON(mf->lookahead < len);
+		mf->lookahead -= len;
+		*position += len;
+	}
+	return err;
+}
+
+/* encode sequence (literal, match) */
+static int encode_sequence(struct lzma_encoder *lzma, unsigned int nliterals,
+			   uint32_t back, uint32_t len, uint32_t *position)
+{
+	while (nliterals) {
+		int err = encode_symbol(lzma, MARK_LIT, 0, position);
+
+		if (err)
+			return err;
+		--nliterals;
+	}
+	if (!len)	/* no match */
+		return 0;
+	return encode_symbol(lzma, back, len, position);
+}
+
+static int __lzma_encode(struct lzma_encoder *lzma)
+{
+	uint32_t pos32 = lzma->mf.cur - lzma->mf.lookahead;
+	int err;
+
+	do {
+		uint32_t back, len;
+		int nlits;
+
+		nlits = lzma_get_optimum_fast(lzma, &back, &len);
+
+		if (nlits < 0) {
+			err = nlits;
+			break;
+		}
+
+		err = encode_sequence(lzma, nlits, back, len, &pos32);
+	} while (!err);
+	return err;
+}
+
+static void lzma_length_encoder_reset(struct lzma_length_encoder *lc)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(lc->low); i++)
+		lc->low[i] = kProbInitValue;
+
+	for (i = 0; i < ARRAY_SIZE(lc->high); i++)
+		lc->high[i] = kProbInitValue;
+}
+
+static int lzma_encoder_reset(struct lzma_encoder *lzma,
+			      const struct lzma_properties *props)
+{
+	unsigned int i, j, oldlclp, lclp;
+
+	lzma_mf_reset(&lzma->mf, &props->mf);
+	rc_reset(&lzma->rc);
+
+	/* refer to "The main loop of decoder" of lzma specification */
+	lzma->state = 0;
+	lzma->reps[0] = lzma->reps[1] = lzma->reps[2] =
+		lzma->reps[3] = 1;
+
+	/* reset all LZMA probability matrices */
+	for (i = 0; i < kNumStates; ++i) {
+		for (j = 0; j < LZMA_NUM_PB_STATES_MAX; ++j) {
+			lzma->isMatch[i][j] = kProbInitValue;
+			lzma->isRep0Long[i][j] = kProbInitValue;
+		}
+		lzma->isRep[i] = kProbInitValue;
+		lzma->isRepG0[i] = kProbInitValue;
+		lzma->isRepG1[i] = kProbInitValue;
+		lzma->isRepG2[i] = kProbInitValue;
+	}
+
+	for (i = 0; i < kNumLenToPosStates; ++i)
+		for (j = 0; j < (1 << kNumPosSlotBits); j++)
+			lzma->posSlotEncoder[i][j] = kProbInitValue;
+
+	for (i = 0; i < ARRAY_SIZE(lzma->posEncoders); i++)
+		lzma->posEncoders[i] = kProbInitValue;
+
+	for (i = 0; i < ARRAY_SIZE(lzma->posAlignEncoder); i++)
+		lzma->posAlignEncoder[i] = kProbInitValue;
+
+	/* set up LZMA literal probabilities */
+	oldlclp = lzma->lc + lzma->lp;
+	lclp = props->lc + props->lp;
+	lzma->lc = props->lc;
+	lzma->lp = props->lp;
+
+	if (lzma->literal && lclp != oldlclp) {
+		free(lzma->literal);
+		lzma->literal = NULL;
+	}
+
+	if (!lzma->literal) {
+		lzma->literal = malloc((0x300 << lclp) * sizeof(probability));
+		if (!lzma->literal)
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < (0x300 << lclp); i++)
+		lzma->literal[i] = kProbInitValue;
+
+	lzma->pbMask = (1 << props->pb) - 1;
+	lzma->lpMask = (0x100 << props->lp) - (0x100 >> props->lc);
+
+	lzma_length_encoder_reset(&lzma->lenEnc);
+	lzma_length_encoder_reset(&lzma->repLenEnc);
+	return 0;
+}
+
+void lzma_default_properties(struct lzma_properties *p, int level)
+{
+	if (level < 0)
+		level = 5;
+
+	p->lc = 3;
+	p->lp = 0;
+	p->pb = 2;
+	p->mf.nice_len = (level < 7 ? 32 : 64);	/* LZMA SDK numFastBytes */
+	p->mf.depth = (16 + (p->mf.nice_len >> 1)) >> 1;
+}
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 09/11] ez: lzma: checkpoint feature for range encoder
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (7 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 08/11] ez: lzma: add LZMA encoder Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 10/11] ez: lzma: add fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

which is used to save the current state of range
encoder for later recovery in order to implement
LZMA fixed-sized output compression.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/rc_encoder_ckpt.h | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 lzma/rc_encoder_ckpt.h

diff --git a/lzma/rc_encoder_ckpt.h b/lzma/rc_encoder_ckpt.h
new file mode 100644
index 0000000..415bf0c
--- /dev/null
+++ b/lzma/rc_encoder_ckpt.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * ez/lzma/rc_encoder_ckpt.h - checkpoint for range encoder
+ *
+ * Copyright (C) 2020 Gao Xiang <hsiangkao@aol.com>
+ */
+#ifndef __EZ_LZMA_RC_ENCODER_CKPT_H
+#define __EZ_LZMA_RC_ENCODER_CKPT_H
+
+#include "rc_encoder.h"
+
+struct lzma_rc_ckpt {
+	uint64_t low;
+	uint64_t extended_bytes;
+	uint32_t range;
+	uint8_t firstbyte;
+};
+
+void rc_write_checkpoint(struct lzma_rc_encoder *rc, struct lzma_rc_ckpt *cp)
+{
+	*cp = (struct lzma_rc_ckpt) { .low = rc->low,
+				      .extended_bytes = rc->extended_bytes,
+				      .range = rc->range,
+				      .firstbyte = rc->firstbyte
+	};
+}
+
+void rc_restore_checkpoint(struct lzma_rc_encoder *rc, struct lzma_rc_ckpt *cp)
+{
+	rc->low = cp->low;
+	rc->extended_bytes = cp->extended_bytes;
+	rc->range = cp->range;
+	rc->firstbyte = cp->firstbyte;
+
+	rc->pos = 0;
+	rc->count = 0;
+}
+
+#endif
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 10/11] ez: lzma: add fixed-sized output compression
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (8 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 09/11] ez: lzma: checkpoint feature for range encoder Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 11/11] ez: lzma: add test program Gao Xiang via Linux-erofs
  2020-02-29  4:58   ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Eric Biggers
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

After this patch, compressed data can be as
much as close to destsize but not exceed.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/lzma_encoder.c | 133 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 1 deletion(-)

diff --git a/lzma/lzma_encoder.c b/lzma/lzma_encoder.c
index b213504..98cde22 100644
--- a/lzma/lzma_encoder.c
+++ b/lzma/lzma_encoder.c
@@ -10,7 +10,7 @@
  */
 #include <stdlib.h>
 #include <ez/bitops.h>
-#include "rc_encoder.h"
+#include "rc_encoder_ckpt.h"
 #include "lzma_common.h"
 #include "mf.h"
 
@@ -72,12 +72,23 @@ struct lzma_length_encoder {
 	probability high[kLenNumHighSymbols];
 };
 
+struct lzma_encoder_destsize {
+	struct lzma_rc_ckpt cp;
+
+	uint8_t *op;
+	uint32_t capacity;
+
+	uint32_t esz;
+	uint8_t ending[LZMA_REQUIRED_INPUT_MAX + 5];
+};
+
 struct lzma_encoder {
 	struct lzma_mf mf;
 	struct lzma_rc_encoder rc;
 
 	uint8_t *op, *oend;
 	bool finish;
+	bool need_eopm;
 
 	unsigned int state;
 
@@ -109,6 +120,8 @@ struct lzma_encoder {
 		struct lzma_match matches[kMatchMaxLen];
 		unsigned int matches_count;
 	} fast;
+
+	struct lzma_encoder_destsize *dstsize;
 };
 
 #define change_pair(smalldist, bigdist) (((bigdist) >> 7) > (smalldist))
@@ -449,6 +462,46 @@ static void rep_match(struct lzma_encoder *lzma, const uint32_t pos_state,
 	}
 }
 
+struct lzma_endstate {
+	struct lzma_length_encoder lenEnc;
+
+	probability simpleMatch[2];
+	probability posSlot[kNumPosSlotBits];
+	probability posAlign[kNumAlignBits];
+};
+
+static void encode_eopm_stateless(struct lzma_encoder *lzma,
+				  struct lzma_endstate *endstate)
+{
+	const uint32_t pos_state =
+		(lzma->mf.cur - lzma->mf.lookahead) & lzma->pbMask;
+	const unsigned int state = lzma->state;
+	unsigned int i;
+
+	endstate->simpleMatch[0] = lzma->isMatch[state][pos_state];
+	endstate->simpleMatch[1] = lzma->isRep[state];
+	endstate->lenEnc = lzma->lenEnc;
+
+	rc_bit(&lzma->rc, endstate->simpleMatch, 1);
+	rc_bit(&lzma->rc, endstate->simpleMatch + 1, 0);
+	length(&lzma->rc, &endstate->lenEnc, pos_state, kMatchMinLen);
+
+	for (i = 0; i < kNumPosSlotBits; ++i) {
+		endstate->posSlot[i] =
+			lzma->posSlotEncoder[0][(1 << (i + 1)) - 1];
+		rc_bit(&lzma->rc, endstate->posSlot + i, 1);
+	}
+
+	rc_direct(&lzma->rc, (1 << (30 - kNumAlignBits)) - 1,
+		  30 - kNumAlignBits);
+
+	for (i = 0; i < kNumAlignBits; ++i) {
+		endstate->posAlign[i] =
+			lzma->posAlignEncoder[(1 << (i + 1)) - 1];
+		rc_bit(&lzma->rc, endstate->posAlign + i, 1);
+	}
+}
+
 static void encode_eopm(struct lzma_encoder *lzma)
 {
 	const uint32_t pos_state =
@@ -460,8 +513,86 @@ static void encode_eopm(struct lzma_encoder *lzma)
 	match(lzma, pos_state, UINT32_MAX, kMatchMinLen);
 }
 
+static int __flush_symbol_destsize(struct lzma_encoder *lzma)
+{
+	uint8_t *op2;
+	unsigned int symbols_size;
+	unsigned int esz = 0;
+
+	if (lzma->dstsize->capacity < 5)
+		return -ENOSPC;
+
+	if (!lzma->rc.pos) {
+		rc_write_checkpoint(&lzma->rc, &lzma->dstsize->cp);
+		lzma->dstsize->op = lzma->op;
+	}
+
+	if (rc_encode(&lzma->rc, &lzma->op, lzma->oend))
+		return -ENOSPC;
+
+	op2 = lzma->op;
+	symbols_size = op2 - lzma->dstsize->op;
+	if (lzma->dstsize->capacity < symbols_size + 5)
+		goto err_enospc;
+
+	if (!lzma->need_eopm)
+		goto out;
+
+	if (lzma->dstsize->capacity < symbols_size +
+	    LZMA_REQUIRED_INPUT_MAX + 5) {
+		struct lzma_rc_ckpt cp2;
+		struct lzma_endstate endstate;
+		uint8_t ending[sizeof(lzma->dstsize->ending)];
+		uint8_t *ep;
+
+		rc_write_checkpoint(&lzma->rc, &cp2);
+		encode_eopm_stateless(lzma, &endstate);
+		rc_flush(&lzma->rc);
+
+		ep = ending;
+		if (rc_encode(&lzma->rc, &ep, ending + sizeof(ending)))
+			DBG_BUGON(1);
+
+		esz = ep - ending;
+
+		if (lzma->dstsize->capacity < symbols_size + esz)
+			goto err_enospc;
+		rc_restore_checkpoint(&lzma->rc, &cp2);
+
+		memcpy(lzma->dstsize->ending, ending, sizeof(ending));
+		lzma->dstsize->esz = esz;
+	}
+
+out:
+	lzma->dstsize->capacity -= symbols_size;
+	lzma->dstsize->esz = esz;
+	return 0;
+
+err_enospc:
+	rc_restore_checkpoint(&lzma->rc, &lzma->dstsize->cp);
+	lzma->op = lzma->dstsize->op;
+	lzma->dstsize->capacity = 0;
+	return -ENOSPC;
+}
+
 static int flush_symbol(struct lzma_encoder *lzma)
 {
+	if (lzma->rc.count && lzma->dstsize) {
+		const unsigned int safemargin =
+			5 + (LZMA_REQUIRED_INPUT_MAX << !!lzma->need_eopm);
+		uint8_t *op;
+		bool ret;
+
+		if (lzma->dstsize->capacity < safemargin)
+			return __flush_symbol_destsize(lzma);
+
+		op = lzma->op;
+		ret = rc_encode(&lzma->rc, &lzma->op, lzma->oend);
+
+		lzma->dstsize->capacity -= lzma->op - op;
+		return ret ? -ENOSPC : 0;
+	}
+
 	return rc_encode(&lzma->rc, &lzma->op, lzma->oend) ? -ENOSPC : 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [WIP] [PATCH v0.0-20200229 11/11] ez: lzma: add test program
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (9 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 10/11] ez: lzma: add fixed-sized output compression Gao Xiang via Linux-erofs
@ 2020-02-29  4:50   ` Gao Xiang via Linux-erofs
  2020-02-29  4:58   ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Eric Biggers
  11 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang via Linux-erofs @ 2020-02-29  4:50 UTC (permalink / raw)
  To: linux-erofs

Usage:
$ ./run.sh
$ ./a.out output.bin.lzma infile

It will compress the beginning as much
as possible into 4k RAW LZMA block.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lzma/lzma_encoder.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
 lzma/run.sh         |   1 +
 2 files changed, 116 insertions(+)
 create mode 100755 lzma/run.sh

diff --git a/lzma/lzma_encoder.c b/lzma/lzma_encoder.c
index 98cde22..7c4b51d 100644
--- a/lzma/lzma_encoder.c
+++ b/lzma/lzma_encoder.c
@@ -757,3 +757,118 @@ void lzma_default_properties(struct lzma_properties *p, int level)
 	p->mf.depth = (16 + (p->mf.nice_len >> 1)) >> 1;
 }
 
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#if 0
+const char text[] = "HABEABDABABABHHHEAAAAAAAA";
+#elif 0
+const char text[] = "abcde_bcdefgh_abcdefghxxxxxxx";
+#else
+const char text[] = "The only time we actually leave the path spinning is if we're truncating "
+"a small amount and don't actually free an extent, which is not a common "
+"occurrence.  We have to set the path blocking in order to add the "
+"delayed ref anyway, so the first extent we find we set the path to "
+"blocking and stay blocking for the duration of the operation.  With the "
+"upcoming file extent map stuff there will be another case that we have "
+"to have the path blocking, so just swap to blocking always.";
+#endif
+
+static const uint8_t lzma_header[] = {
+	0x5D,				/* LZMA model properties (lc, lp, pb) in encoded form */
+	0x00, 0x00, 0x80, 0x00,		/* Dictionary size (32-bit unsigned integer, little-endian) */
+	0xFF, 0xFF, 0xFF, 0xFF,
+	0xFF, 0xFF, 0xFF, 0xFF,		/* Uncompressed size (64-bit unsigned integer, little-endian) */
+};
+
+int main(int argc, char *argv[])
+{
+	char *outfile;
+	struct lzma_encoder lzmaenc = {0};
+	struct lzma_properties props = {
+		.mf.dictsize = 65536,
+	};
+	struct lzma_encoder_destsize dstsize;
+	uint8_t buf[4096];
+	int inf, outf;
+
+	int err;
+
+	lzmaenc.mf.buffer = malloc(65536) + 1;
+	lzmaenc.mf.buffer[-1] = 0;
+
+	if (argc >= 3) {
+		int len;
+
+		inf = open(argv[2], O_RDONLY);
+
+		len = lseek(inf, 0, SEEK_END);
+		if (len >= 65535)
+			len = 65535;
+
+		printf("len: %d\n", len);
+
+		lseek(inf, 0, SEEK_SET);
+		read(inf, lzmaenc.mf.buffer, len);
+		close(inf);
+		lzmaenc.mf.iend = lzmaenc.mf.buffer + len;
+	} else {
+		memcpy(lzmaenc.mf.buffer, text, sizeof(text));
+		lzmaenc.mf.iend = lzmaenc.mf.buffer + sizeof(text);
+	}
+	lzmaenc.op = buf;
+	lzmaenc.oend = buf + sizeof(buf);
+	lzmaenc.finish = true;
+
+	lzmaenc.need_eopm = true;
+	dstsize.capacity = 4096 - sizeof(lzma_header); //UINT32_MAX;
+	lzmaenc.dstsize = &dstsize;
+
+
+	lzma_default_properties(&props, 5);
+	lzma_encoder_reset(&lzmaenc, &props);
+
+	err = __lzma_encode(&lzmaenc);
+
+	printf("%d\n", err);
+
+	rc_encode(&lzmaenc.rc, &lzmaenc.op, lzmaenc.oend);
+
+	if (err != -ERANGE) {
+		memcpy(lzmaenc.op, dstsize.ending, dstsize.esz);
+		lzmaenc.op += dstsize.esz;
+	} else {
+		encode_eopm(&lzmaenc);
+		rc_flush(&lzmaenc.rc);
+
+		rc_encode(&lzmaenc.rc, &lzmaenc.op, lzmaenc.oend);
+	}
+
+	printf("encoded length: %lu + %lu\n", lzmaenc.op - buf,
+	       sizeof(lzma_header));
+
+	if (argc < 2)
+		outfile = "output.bin.lzma";
+	else
+		outfile = argv[1];
+
+	outf = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
+	write(outf, lzma_header, sizeof(lzma_header));
+	write(outf, buf, lzmaenc.op - buf);
+	close(outf);
+
+#if 0
+	nliterals = lzma_get_optimum_fast(&lzmaenc, &back_res, &len_res);
+	printf("nlits %d (%d %d)\n", nliterals, back_res, len_res);
+	encode_sequence(&lzmaenc, nliterals, back_res, len_res, &position);
+	nliterals = lzma_get_optimum_fast(&lzmaenc, &back_res, &len_res);
+	printf("nlits %d (%d %d)\n", nliterals, back_res, len_res);
+	nliterals = lzma_get_optimum_fast(&lzmaenc, &back_res, &len_res);
+	printf("nlits %d (%d %d)\n", nliterals, back_res, len_res);
+	nliterals = lzma_get_optimum_fast(&lzmaenc, &back_res, &len_res);
+	printf("nlits %d (%d %d)\n", nliterals, back_res, len_res);
+#endif
+}
+
diff --git a/lzma/run.sh b/lzma/run.sh
new file mode 100755
index 0000000..57adc12
--- /dev/null
+++ b/lzma/run.sh
@@ -0,0 +1 @@
+gcc -Wall -g -I ../include lzma_encoder.c mf.c
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression
  2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
                     ` (10 preceding siblings ...)
  2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 11/11] ez: lzma: add test program Gao Xiang via Linux-erofs
@ 2020-02-29  4:58   ` Eric Biggers
  2020-02-29  5:27     ` Gao Xiang
  11 siblings, 1 reply; 14+ messages in thread
From: Eric Biggers @ 2020-02-29  4:58 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs

On Sat, Feb 29, 2020 at 12:50:06PM +0800, Gao Xiang via Linux-erofs wrote:
> From: Gao Xiang <gaoxiang25@huawei.com>
> 
> This is a WIP PREVIEW patchset, just for archiving to open
> source community only.
> 
> For now, it implements LZMA SDK-like GetOptimumFast approach
> and GetOptimum is still on schedule.
> 
> It's still buggy, lack of formal APIs and actively under
> development for a while...
> 
> Usage:
> $ ./run.sh
> $ ./a.out output.bin.lzma infile
> 
> It will compress the beginning as much as possible into
> 4k RAW LZMA block.

Why not just use liblzma?

Also, if you care enough about compression ratio to use LZMA instead of
Zstandard, why use only a 4 KB blocksize?

- Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression
  2020-02-29  4:58   ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Eric Biggers
@ 2020-02-29  5:27     ` Gao Xiang
  0 siblings, 0 replies; 14+ messages in thread
From: Gao Xiang @ 2020-02-29  5:27 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Miao Xie, linux-erofs

Hi Eric,

On Fri, Feb 28, 2020 at 08:58:42PM -0800, Eric Biggers wrote:
> On Sat, Feb 29, 2020 at 12:50:06PM +0800, Gao Xiang via Linux-erofs wrote:
> > From: Gao Xiang <gaoxiang25@huawei.com>
> > 
> > This is a WIP PREVIEW patchset, just for archiving to open
> > source community only.
> > 
> > For now, it implements LZMA SDK-like GetOptimumFast approach
> > and GetOptimum is still on schedule.
> > 
> > It's still buggy, lack of formal APIs and actively under
> > development for a while...
> > 
> > Usage:
> > $ ./run.sh
> > $ ./a.out output.bin.lzma infile
> > 
> > It will compress the beginning as much as possible into
> > 4k RAW LZMA block.
> 
> Why not just use liblzma?

I discuss with Lasse and Igor in private, they have no recent plan to
develop fixed-sized output compression for now so it could not have
a formal upstream release in the near future.

After I digged into liblzma, it has many filters and LZMA2 wrapper so
it needs some more hack for now. I tend that Lasse could implement it
(Lasse said no recent, but it can be potential upstreamed. In fact,
liblzma is odd fixes status recent years compared with LZMA SDK...
However LZMA SDK coding style is so weird.)

On the other hand, it's just a product when I learned LZMA internals,
just like libdeflate in some extent. Bacause I need to do some coding
so I can learn LZMA internals and get how to do next step by step
because I'm not a LZMA expert at first.

> 
> Also, if you care enough about compression ratio to use LZMA instead of
> Zstandard, why use only a 4 KB blocksize?

They are all on schedule. I need some priority because I only have 24
hours a day and the content of my HUAWEI job is not all about EROFS. :(

If Zstandard developers have more interests on it. I'm happy to integrate
Zstd to EROFS as well. Otherwise, I need to do them myself. As a quick
glance, Zstandard uses multi-pass approach, so it may be hard for me to
do a quick hack. And I don't know all Zstandard internals as well.

Why use only a 4 KB blocksize --- It's just that I need to find a cleverer
and elegant way to do in-place decompression for larger physical cluster.
I need to think it over otherwise it could have potential compatibility
concerns and make EROFS more complex in the future.

In fact, Lasse raised a clever idea month ago. I will sort it out after
LZMA algortithm for EROFS is ready.

LZMA and larger physical cluster will be implemented this year If nothing
unexpected happens. I'm happy more experts and volunteers could have interest
in it but if only me, I only can do my best in my spare time...

Thanks,
Gao Xiang

> 
> - Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-02-29  5:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200229045017.12424-1-hsiangkao.ref@aol.com>
2020-02-29  4:50 ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 01/11] ez: add basic source files Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 02/11] ez: add helpers for unaligned accesses Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 03/11] ez: introduce bitops header file Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 04/11] ez: lzma: add range encoder Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 05/11] ez: lzma: add common header file Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 06/11] ez: lzma: add byte hashtable generated with CRC32 Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 07/11] ez: lzma: add LZMA matchfinder Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 08/11] ez: lzma: add LZMA encoder Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 09/11] ez: lzma: checkpoint feature for range encoder Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 10/11] ez: lzma: add fixed-sized output compression Gao Xiang via Linux-erofs
2020-02-29  4:50   ` [WIP] [PATCH v0.0-20200229 11/11] ez: lzma: add test program Gao Xiang via Linux-erofs
2020-02-29  4:58   ` [WIP] [PATCH v0.0-20200229 00/11] ez: LZMA fixed-sized output compression Eric Biggers
2020-02-29  5:27     ` Gao Xiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).