linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] S390 hardware compression support for kernel zlib
@ 2019-11-26 14:41 Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 1/5] lib/zlib: Add s390 hardware support for kernel zlib_deflate Mikhail Zaslonko
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

With IBM z15 mainframe the new DFLTCC instruction is available. It
implements deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patch-set adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:
https://github.com/madler/zlib/pull/410
The coding style is also preserved for future maintainability.
There is only limited set of userspace zlib functions represented in
kernel. Apart from that, all the memory allocation should be performed
in advance. Thus, the workarea structures are extended with the parameter
lists required for the DEFLATE CONVENTION CALL instruction.
Since kernel zlib itself does not support gzip headers, only Adler-32
checksum is processed (also can be produced by DFLTCC facility).
Like it was implemented for userspace, kernel zlib will compress in
hardware on level 1, and in software on all other levels. Decompression
will always happen in hardware (when enabled).
Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. However it does always produce
the standard conform output which can be inflated anyway.
The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:
    Format: { on | off | def_only | inf_only | always }
     on:       s390 zlib hardware support for compression on
               level 1 and decompression (default)
     off:      No s390 zlib hardware support
     def_only: s390 zlib hardware support for deflate
               only (compression on level 1)
     inf_only: s390 zlib hardware support for inflate
               only (decompression)
     always:   Same as 'on' but ignores the selected compression
               level always using hardware support (used for debugging)

The main purpose of the integration of the NXU support into the kernel zlib
is the use of hardware deflate in btrfs filesystem with on-the-fly
compression enabled. Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image 

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch 5)
the following performance results have been achieved using the ramdisk
with btrfs. These are relative numbers based on throughput rate and
compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6,21           3.33           0.94
  binary data (vmlinux)   8,37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (refers to
all compression  levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using DD
command-line utility on btrfs on a Fedora 30 based internal driver in native
LPAR on a z15 system. Results may vary based on individual workload,
configuration and software levels.

Mikhail Zaslonko (5):
  lib/zlib: Add s390 hardware support for kernel zlib_deflate
  s390/boot: Rename HEAP_SIZE due to name collision
  lib/zlib: Add s390 hardware support for kernel zlib_inflate
  s390/boot: Add dfltcc= kernel command line parameter
  btrfs: Increase buffer size for zlib functions

 .../admin-guide/kernel-parameters.txt         |  12 +
 arch/s390/boot/compressed/decompressor.c      |   8 +-
 arch/s390/boot/ipl_parm.c                     |  14 +
 arch/s390/include/asm/setup.h                 |   7 +
 arch/s390/kernel/setup.c                      |   1 +
 fs/btrfs/compression.c                        |   2 +-
 fs/btrfs/zlib.c                               | 109 ++++---
 lib/Kconfig                                   |  22 ++
 lib/Makefile                                  |   1 +
 lib/decompress_inflate.c                      |  13 +
 lib/zlib_deflate/deflate.c                    |  79 ++---
 lib/zlib_deflate/deftree.c                    |  54 ----
 lib/zlib_deflate/defutil.h                    | 134 ++++++++-
 lib/zlib_dfltcc/Makefile                      |  11 +
 lib/zlib_dfltcc/dfltcc.c                      |  55 ++++
 lib/zlib_dfltcc/dfltcc.h                      | 144 +++++++++
 lib/zlib_dfltcc/dfltcc_deflate.c              | 280 ++++++++++++++++++
 lib/zlib_dfltcc/dfltcc_inflate.c              | 149 ++++++++++
 lib/zlib_dfltcc/dfltcc_syms.c                 |  17 ++
 lib/zlib_dfltcc/dfltcc_util.h                 | 124 ++++++++
 lib/zlib_inflate/inflate.c                    |  32 +-
 lib/zlib_inflate/inflate.h                    |   8 +
 lib/zlib_inflate/infutil.h                    |  18 +-
 23 files changed, 1137 insertions(+), 157 deletions(-)
 create mode 100644 lib/zlib_dfltcc/Makefile
 create mode 100644 lib/zlib_dfltcc/dfltcc.c
 create mode 100644 lib/zlib_dfltcc/dfltcc.h
 create mode 100644 lib/zlib_dfltcc/dfltcc_deflate.c
 create mode 100644 lib/zlib_dfltcc/dfltcc_inflate.c
 create mode 100644 lib/zlib_dfltcc/dfltcc_syms.c
 create mode 100644 lib/zlib_dfltcc/dfltcc_util.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] lib/zlib: Add s390 hardware support for kernel zlib_deflate
  2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
@ 2019-11-26 14:41 ` Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 2/5] s390/boot: Rename HEAP_SIZE due to name collision Mikhail Zaslonko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions.
Update zlib_deflate functions with the hooks for s390 hardware support
and adjust workspace structures with extra parameter lists required
for hardware deflate.

Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
---
 lib/Kconfig                      |  14 ++
 lib/Makefile                     |   1 +
 lib/zlib_deflate/deflate.c       |  79 ++++-----
 lib/zlib_deflate/deftree.c       |  54 ------
 lib/zlib_deflate/defutil.h       | 134 +++++++++++++--
 lib/zlib_dfltcc/Makefile         |  11 ++
 lib/zlib_dfltcc/dfltcc.c         |  52 ++++++
 lib/zlib_dfltcc/dfltcc.h         | 115 +++++++++++++
 lib/zlib_dfltcc/dfltcc_deflate.c | 274 +++++++++++++++++++++++++++++++
 lib/zlib_dfltcc/dfltcc_syms.c    |  17 ++
 lib/zlib_dfltcc/dfltcc_util.h    | 119 ++++++++++++++
 11 files changed, 768 insertions(+), 102 deletions(-)
 create mode 100644 lib/zlib_dfltcc/Makefile
 create mode 100644 lib/zlib_dfltcc/dfltcc.c
 create mode 100644 lib/zlib_dfltcc/dfltcc.h
 create mode 100644 lib/zlib_dfltcc/dfltcc_deflate.c
 create mode 100644 lib/zlib_dfltcc/dfltcc_syms.c
 create mode 100644 lib/zlib_dfltcc/dfltcc_util.h

diff --git a/lib/Kconfig b/lib/Kconfig
index 3321d04dfa5a..7ccaa54006c3 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -278,6 +278,20 @@ config ZLIB_DEFLATE
 	tristate
 	select BITREVERSE
 
+config ZLIB_DFLTCC
+	def_bool y
+	depends on S390
+	prompt "Enable s390x DEFLATE CONVERSION CALL support for zlib"
+	help
+	 Enable s390x hardware support for zlib in the kernel.
+
+config ZLIB_DFLTCC_DEFLATE
+	def_bool y
+	depends on ZLIB_DFLTCC && ZLIB_DEFLATE
+	prompt "Enable s390x hardware support for zlib deflate functions"
+	help
+	 Enable s390x hardware support for zlib deflate in the kernel.
+
 config LZO_COMPRESS
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index c5892807e06f..97aba321bb36 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -139,6 +139,7 @@ obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
diff --git a/lib/zlib_deflate/deflate.c b/lib/zlib_deflate/deflate.c
index d20ef458f137..9595b32b944a 100644
--- a/lib/zlib_deflate/deflate.c
+++ b/lib/zlib_deflate/deflate.c
@@ -52,16 +52,18 @@
 #include <linux/zutil.h>
 #include "defutil.h"
 
+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC_DEFLATE
+#  include "../zlib_dfltcc/dfltcc.h"
+#else
+#define DEFLATE_RESET_HOOK(strm) do {} while (0)
+#define DEFLATE_HOOK(strm, flush, bstate) 0
+#define DEFLATE_NEED_CHECKSUM(strm) 1
+#endif
 
 /* ===========================================================================
  *  Function prototypes.
  */
-typedef enum {
-    need_more,      /* block not completed, need more input or more output */
-    block_done,     /* block flush performed */
-    finish_started, /* finish started, need only more output at next deflate */
-    finish_done     /* finish done, accept no more input or output */
-} block_state;
 
 typedef block_state (*compress_func) (deflate_state *s, int flush);
 /* Compression function. Returns the block state after the call. */
@@ -72,7 +74,6 @@ static block_state deflate_fast   (deflate_state *s, int flush);
 static block_state deflate_slow   (deflate_state *s, int flush);
 static void lm_init        (deflate_state *s);
 static void putShortMSB    (deflate_state *s, uInt b);
-static void flush_pending  (z_streamp strm);
 static int read_buf        (z_streamp strm, Byte *buf, unsigned size);
 static uInt longest_match  (deflate_state *s, IPos cur_match);
 
@@ -98,6 +99,25 @@ static  void check_match (deflate_state *s, IPos start, IPos match,
  * See deflate.c for comments about the MIN_MATCH+1.
  */
 
+/* Workspace to be allocated for deflate processing */
+typedef struct deflate_workspace {
+    /* State memory for the deflator */
+    deflate_state deflate_memory;
+#ifdef CONFIG_ZLIB_DFLTCC_DEFLATE
+    /* State memory for s390 hardware deflate */
+    struct dfltcc_state dfltcc_memory;
+#endif
+    Byte *window_memory;
+    Pos *prev_memory;
+    Pos *head_memory;
+    char *overlay_memory;
+} deflate_workspace;
+
+#ifdef CONFIG_ZLIB_DFLTCC_DEFLATE
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct deflate_workspace, dfltcc_memory) % 8 == 0);
+#endif
+
 /* Values for max_lazy_match, good_match and max_chain_length, depending on
  * the desired pack level (0..9). The values given below have been tuned to
  * exclude worst case performance for pathological files. Better values may be
@@ -207,7 +227,15 @@ int zlib_deflateInit2(
      */
     next = (char *) mem;
     next += sizeof(*mem);
+#ifdef CONFIG_ZLIB_DFLTCC_DEFLATE
+    /*
+     *  DFLTCC requires the window to be page aligned.
+     *  Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    mem->window_memory = (Byte *) PTR_ALIGN(next, PAGE_SIZE);
+#else
     mem->window_memory = (Byte *) next;
+#endif
     next += zlib_deflate_window_memsize(windowBits);
     mem->prev_memory = (Pos *) next;
     next += zlib_deflate_prev_memsize(windowBits);
@@ -277,6 +305,8 @@ int zlib_deflateReset(
     zlib_tr_init(s);
     lm_init(s);
 
+    DEFLATE_RESET_HOOK(strm);
+
     return Z_OK;
 }
 
@@ -294,35 +324,6 @@ static void putShortMSB(
     put_byte(s, (Byte)(b & 0xff));
 }   
 
-/* =========================================================================
- * Flush as much pending output as possible. All deflate() output goes
- * through this function so some applications may wish to modify it
- * to avoid allocating a large strm->next_out buffer and copying into it.
- * (See also read_buf()).
- */
-static void flush_pending(
-	z_streamp strm
-)
-{
-    deflate_state *s = (deflate_state *) strm->state;
-    unsigned len = s->pending;
-
-    if (len > strm->avail_out) len = strm->avail_out;
-    if (len == 0) return;
-
-    if (strm->next_out != NULL) {
-	memcpy(strm->next_out, s->pending_out, len);
-	strm->next_out += len;
-    }
-    s->pending_out += len;
-    strm->total_out += len;
-    strm->avail_out  -= len;
-    s->pending -= len;
-    if (s->pending == 0) {
-        s->pending_out = s->pending_buf;
-    }
-}
-
 /* ========================================================================= */
 int zlib_deflate(
 	z_streamp strm,
@@ -404,7 +405,8 @@ int zlib_deflate(
         (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
         block_state bstate;
 
-	bstate = (*(configuration_table[s->level].func))(s, flush);
+	bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
+		 (*(configuration_table[s->level].func))(s, flush);
 
         if (bstate == finish_started || bstate == finish_done) {
             s->status = FINISH_STATE;
@@ -503,7 +505,8 @@ static int read_buf(
 
     strm->avail_in  -= len;
 
-    if (!((deflate_state *)(strm->state))->noheader) {
+    if (!DEFLATE_NEED_CHECKSUM(strm)) {}
+    else if (!((deflate_state *)(strm->state))->noheader) {
         strm->adler = zlib_adler32(strm->adler, strm->next_in, len);
     }
     memcpy(buf, strm->next_in, len);
diff --git a/lib/zlib_deflate/deftree.c b/lib/zlib_deflate/deftree.c
index 9b1756b12743..a4a34da512fe 100644
--- a/lib/zlib_deflate/deftree.c
+++ b/lib/zlib_deflate/deftree.c
@@ -76,11 +76,6 @@ static const uch bl_order[BL_CODES]
  * probability, to avoid transmitting the lengths for unused bit length codes.
  */
 
-#define Buf_size (8 * 2*sizeof(char))
-/* Number of bits used within bi_buf. (bi_buf might be implemented on
- * more than 16 bits on some systems.)
- */
-
 /* ===========================================================================
  * Local data. These are initialized only once.
  */
@@ -147,7 +142,6 @@ static void send_all_trees (deflate_state *s, int lcodes, int dcodes,
 static void compress_block (deflate_state *s, ct_data *ltree,
                            ct_data *dtree);
 static void set_data_type  (deflate_state *s);
-static void bi_windup      (deflate_state *s);
 static void bi_flush       (deflate_state *s);
 static void copy_block     (deflate_state *s, char *buf, unsigned len,
                            int header);
@@ -169,54 +163,6 @@ static void copy_block     (deflate_state *s, char *buf, unsigned len,
  * used.
  */
 
-/* ===========================================================================
- * Send a value on a given number of bits.
- * IN assertion: length <= 16 and value fits in length bits.
- */
-#ifdef DEBUG_ZLIB
-static void send_bits      (deflate_state *s, int value, int length);
-
-static void send_bits(
-	deflate_state *s,
-	int value,  /* value to send */
-	int length  /* number of bits */
-)
-{
-    Tracevv((stderr," l %2d v %4x ", length, value));
-    Assert(length > 0 && length <= 15, "invalid length");
-    s->bits_sent += (ulg)length;
-
-    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
-     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
-     * unused bits in value.
-     */
-    if (s->bi_valid > (int)Buf_size - length) {
-        s->bi_buf |= (value << s->bi_valid);
-        put_short(s, s->bi_buf);
-        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
-        s->bi_valid += length - Buf_size;
-    } else {
-        s->bi_buf |= value << s->bi_valid;
-        s->bi_valid += length;
-    }
-}
-#else /* !DEBUG_ZLIB */
-
-#define send_bits(s, value, length) \
-{ int len = length;\
-  if (s->bi_valid > (int)Buf_size - len) {\
-    int val = value;\
-    s->bi_buf |= (val << s->bi_valid);\
-    put_short(s, s->bi_buf);\
-    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
-    s->bi_valid += len - Buf_size;\
-  } else {\
-    s->bi_buf |= (value) << s->bi_valid;\
-    s->bi_valid += len;\
-  }\
-}
-#endif /* DEBUG_ZLIB */
-
 /* ===========================================================================
  * Initialize the various 'constant' tables. In a multi-threaded environment,
  * this function may be called by two threads concurrently, but this is
diff --git a/lib/zlib_deflate/defutil.h b/lib/zlib_deflate/defutil.h
index a8c370897c9f..abf50099dfb0 100644
--- a/lib/zlib_deflate/defutil.h
+++ b/lib/zlib_deflate/defutil.h
@@ -1,5 +1,7 @@
+#ifndef DEFUTIL_H
+#define DEFUTIL_H
 
-
+#include <linux/zutil.h>
 
 #define Assert(err, str) 
 #define Trace(dummy) 
@@ -238,17 +240,13 @@ typedef struct deflate_state {
 
 } deflate_state;
 
-typedef struct deflate_workspace {
-    /* State memory for the deflator */
-    deflate_state deflate_memory;
-    Byte *window_memory;
-    Pos *prev_memory;
-    Pos *head_memory;
-    char *overlay_memory;
-} deflate_workspace;
-
+#ifdef CONFIG_ZLIB_DFLTCC_DEFLATE
+#define zlib_deflate_window_memsize(windowBits) \
+	(2 * (1 << (windowBits)) * sizeof(Byte) + PAGE_SIZE)
+#else
 #define zlib_deflate_window_memsize(windowBits) \
 	(2 * (1 << (windowBits)) * sizeof(Byte))
+#endif
 #define zlib_deflate_prev_memsize(windowBits) \
 	((1 << (windowBits)) * sizeof(Pos))
 #define zlib_deflate_head_memsize(memLevel) \
@@ -292,6 +290,24 @@ void zlib_tr_stored_type_only (deflate_state *);
     put_byte(s, (uch)((ush)(w) >> 8)); \
 }
 
+/* ===========================================================================
+ * Reverse the first len bits of a code, using straightforward code (a faster
+ * method would use a table)
+ * IN assertion: 1 <= len <= 15
+ */
+static inline unsigned  bi_reverse(
+    unsigned code, /* the value to invert */
+    int len        /* its bit length */
+)
+{
+    register unsigned res = 0;
+    do {
+        res |= code & 1;
+        code >>= 1, res <<= 1;
+    } while (--len > 0);
+    return res >> 1;
+}
+
 /* ===========================================================================
  * Flush the bit buffer, keeping at most 7 bits in it.
  */
@@ -325,3 +341,101 @@ static inline void bi_windup(deflate_state *s)
 #endif
 }
 
+typedef enum {
+    need_more,      /* block not completed, need more input or more output */
+    block_done,     /* block flush performed */
+    finish_started, /* finish started, need only more output at next deflate */
+    finish_done     /* finish done, accept no more input or output */
+} block_state;
+
+#define Buf_size (8 * 2*sizeof(char))
+/* Number of bits used within bi_buf. (bi_buf might be implemented on
+ * more than 16 bits on some systems.)
+ */
+
+/* ===========================================================================
+ * Send a value on a given number of bits.
+ * IN assertion: length <= 16 and value fits in length bits.
+ */
+#ifdef DEBUG_ZLIB
+static void send_bits      (deflate_state *s, int value, int length);
+
+static void send_bits(
+    deflate_state *s,
+    int value,  /* value to send */
+    int length  /* number of bits */
+)
+{
+    Tracevv((stderr," l %2d v %4x ", length, value));
+    Assert(length > 0 && length <= 15, "invalid length");
+    s->bits_sent += (ulg)length;
+
+    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
+     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
+     * unused bits in value.
+     */
+    if (s->bi_valid > (int)Buf_size - length) {
+        s->bi_buf |= (value << s->bi_valid);
+        put_short(s, s->bi_buf);
+        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
+        s->bi_valid += length - Buf_size;
+    } else {
+        s->bi_buf |= value << s->bi_valid;
+        s->bi_valid += length;
+    }
+}
+#else /* !DEBUG_ZLIB */
+
+#define send_bits(s, value, length) \
+{ int len = length;\
+  if (s->bi_valid > (int)Buf_size - len) {\
+    int val = value;\
+    s->bi_buf |= (val << s->bi_valid);\
+    put_short(s, s->bi_buf);\
+    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
+    s->bi_valid += len - Buf_size;\
+  } else {\
+    s->bi_buf |= (value) << s->bi_valid;\
+    s->bi_valid += len;\
+  }\
+}
+#endif /* DEBUG_ZLIB */
+
+static inline void zlib_tr_send_bits(
+    deflate_state *s,
+    int value,
+    int length
+)
+{
+    send_bits(s, value, length);
+}
+
+/* =========================================================================
+ * Flush as much pending output as possible. All deflate() output goes
+ * through this function so some applications may wish to modify it
+ * to avoid allocating a large strm->next_out buffer and copying into it.
+ * (See also read_buf()).
+ */
+static inline void flush_pending(
+	z_streamp strm
+)
+{
+    deflate_state *s = (deflate_state *) strm->state;
+    unsigned len = s->pending;
+
+    if (len > strm->avail_out) len = strm->avail_out;
+    if (len == 0) return;
+
+    if (strm->next_out != NULL) {
+	memcpy(strm->next_out, s->pending_out, len);
+	strm->next_out += len;
+    }
+    s->pending_out += len;
+    strm->total_out += len;
+    strm->avail_out  -= len;
+    s->pending -= len;
+    if (s->pending == 0) {
+        s->pending_out = s->pending_buf;
+    }
+}
+#endif /* DEFUTIL_H */
diff --git a/lib/zlib_dfltcc/Makefile b/lib/zlib_dfltcc/Makefile
new file mode 100644
index 000000000000..863d3b37e09d
--- /dev/null
+++ b/lib/zlib_dfltcc/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# This is a modified version of zlib, which does all memory
+# allocation ahead of time.
+#
+# This is the code for s390 zlib hardware support.
+#
+
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o
+
+zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_syms.o
diff --git a/lib/zlib_dfltcc/dfltcc.c b/lib/zlib_dfltcc/dfltcc.c
new file mode 100644
index 000000000000..7f77e5bb01c6
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: Zlib
+/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
+
+#include <linux/zutil.h>
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+
+char *oesc_msg(
+    char *buf,
+    int oesc
+)
+{
+    if (oesc == 0x00)
+        return NULL; /* Successful completion */
+    else {
+#ifdef STATIC
+        return NULL; /* Ignore for pre-boot decompressor */
+#else
+        sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
+        return buf;
+#endif
+    }
+}
+
+void dfltcc_reset(
+    z_streamp strm,
+    uInt size
+)
+{
+    struct dfltcc_state *dfltcc_state =
+        (struct dfltcc_state *)((char *)strm->state + size);
+    struct dfltcc_qaf_param *param =
+        (struct dfltcc_qaf_param *)&dfltcc_state->param;
+
+    /* Initialize available functions */
+    if (is_dfltcc_enabled()) {
+        dfltcc(DFLTCC_QAF, param, NULL, NULL, NULL, NULL, NULL);
+        memmove(&dfltcc_state->af, param, sizeof(dfltcc_state->af));
+    } else
+        memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
+
+    /* Initialize parameter block */
+    memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
+    dfltcc_state->param.nt = 1;
+
+    /* Initialize tuning parameters */
+    dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+    dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
+    dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
+    dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
+    dfltcc_state->param.ribm = DFLTCC_RIBM;
+}
diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
new file mode 100644
index 000000000000..18fed7a444bc
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_H
+#define DFLTCC_H
+
+#include "../zlib_deflate/defutil.h"
+
+/*
+ * Tuning parameters.
+ */
+#define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_BLOCK_SIZE 1048576
+#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
+#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
+#define DFLTCC_RIBM 0
+
+/*
+ * Parameter Block for Query Available Functions.
+ */
+struct dfltcc_qaf_param {
+    char fns[16];
+    char reserved1[8];
+    char fmts[2];
+    char reserved2[6];
+};
+
+static_assert(sizeof(struct dfltcc_qaf_param) == 32);
+
+#define DFLTCC_FMT0 0
+
+/*
+ * Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
+ */
+struct dfltcc_param_v0 {
+    uint16_t pbvn;                     /* Parameter-Block-Version Number */
+    uint8_t mvn;                       /* Model-Version Number */
+    uint8_t ribm;                      /* Reserved for IBM use */
+    unsigned reserved32 : 31;
+    unsigned cf : 1;                   /* Continuation Flag */
+    uint8_t reserved64[8];
+    unsigned nt : 1;                   /* New Task */
+    unsigned reserved129 : 1;
+    unsigned cvt : 1;                  /* Check Value Type */
+    unsigned reserved131 : 1;
+    unsigned htt : 1;                  /* Huffman-Table Type */
+    unsigned bcf : 1;                  /* Block-Continuation Flag */
+    unsigned bcc : 1;                  /* Block Closing Control */
+    unsigned bhf : 1;                  /* Block Header Final */
+    unsigned reserved136 : 1;
+    unsigned reserved137 : 1;
+    unsigned dhtgc : 1;                /* DHT Generation Control */
+    unsigned reserved139 : 5;
+    unsigned reserved144 : 5;
+    unsigned sbb : 3;                  /* Sub-Byte Boundary */
+    uint8_t oesc;                      /* Operation-Ending-Supplemental Code */
+    unsigned reserved160 : 12;
+    unsigned ifs : 4;                  /* Incomplete-Function Status */
+    uint16_t ifl;                      /* Incomplete-Function Length */
+    uint8_t reserved192[8];
+    uint8_t reserved256[8];
+    uint8_t reserved320[4];
+    uint16_t hl;                       /* History Length */
+    unsigned reserved368 : 1;
+    uint16_t ho : 15;                  /* History Offset */
+    uint32_t cv;                       /* Check Value */
+    unsigned eobs : 15;                /* End-of-block Symbol */
+    unsigned reserved431: 1;
+    uint8_t eobl : 4;                  /* End-of-block Length */
+    unsigned reserved436 : 12;
+    unsigned reserved448 : 4;
+    uint16_t cdhtl : 12;               /* Compressed-Dynamic-Huffman Table
+                                          Length */
+    uint8_t reserved464[6];
+    uint8_t cdht[288];
+    uint8_t reserved[32];
+    uint8_t csb[1152];
+};
+
+static_assert(sizeof(struct dfltcc_param_v0) == 1536);
+
+#define CVT_CRC32 0
+#define CVT_ADLER32 1
+#define HTT_FIXED 0
+#define HTT_DYNAMIC 1
+
+/*
+ *  Extension of inflate_state and deflate_state for DFLTCC.
+ */
+struct dfltcc_state {
+    struct dfltcc_param_v0 param;      /* Parameter block */
+    struct dfltcc_qaf_param af;        /* Available functions */
+    uLong level_mask;                  /* Levels on which to use DFLTCC */
+    uLong block_size;                  /* New block each X bytes */
+    uLong block_threshold;             /* New block after total_in > X */
+    uLong dht_threshold;               /* New block only if avail_in >= X */
+    char msg[64];                      /* Buffer for strm->msg */
+};
+
+/* Resides right after inflate_state or deflate_state */
+#define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)((state) + 1))
+
+/* External functions */
+int dfltcc_can_deflate(z_streamp strm);
+int dfltcc_deflate(z_streamp strm,
+                   int flush,
+                   block_state *result);
+void dfltcc_reset(z_streamp strm, uInt size);
+
+#define DEFLATE_RESET_HOOK(strm) \
+    dfltcc_reset((strm), sizeof(deflate_state))
+
+#define DEFLATE_HOOK dfltcc_deflate
+
+#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
+
+#endif /* DFLTCC_H */
diff --git a/lib/zlib_dfltcc/dfltcc_deflate.c b/lib/zlib_dfltcc/dfltcc_deflate.c
new file mode 100644
index 000000000000..b9d16917c30a
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc_deflate.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: Zlib
+
+#include "../zlib_deflate/defutil.h"
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+#include <linux/zutil.h>
+
+/*
+ * Compress.
+ */
+int dfltcc_can_deflate(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    /* Unsupported compression settings */
+    if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
+                              dfltcc_state->level_mask))
+        return 0;
+
+    /* Unsupported hardware */
+    if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) ||
+            !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) ||
+            !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0))
+        return 0;
+
+    return 1;
+}
+
+static void dfltcc_gdht(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = avail_in = strm->avail_in;
+
+    dfltcc(DFLTCC_GDHT,
+           param, NULL, NULL,
+           &strm->next_in, &avail_in, NULL);
+}
+
+static dfltcc_cc dfltcc_cmpr(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+    size_t avail_out = strm->avail_out;
+    dfltcc_cc cc;
+
+    cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
+                param, &strm->next_out, &avail_out,
+                &strm->next_in, &avail_in, state->window);
+    strm->total_in += (strm->avail_in - avail_in);
+    strm->total_out += (strm->avail_out - avail_out);
+    strm->avail_in = avail_in;
+    strm->avail_out = avail_out;
+    return cc;
+}
+
+static void send_eobs(
+    z_streamp strm,
+    const struct dfltcc_param_v0 *param
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+
+    zlib_tr_send_bits(
+          state,
+          bi_reverse(param->eobs >> (15 - param->eobl), param->eobl),
+          param->eobl);
+    flush_pending(strm);
+    if (state->pending != 0) {
+        /* The remaining data is located in pending_out[0:pending]. If someone
+         * calls put_byte() - this might happen in deflate() - the byte will be
+         * placed into pending_buf[pending], which is incorrect. Move the
+         * remaining data to the beginning of pending_buf so that put_byte() is
+         * usable again.
+         */
+        memmove(state->pending_buf, state->pending_out, state->pending);
+        state->pending_out = state->pending_buf;
+    }
+#ifdef ZLIB_DEBUG
+    state->compressed_len += param->eobl;
+#endif
+}
+
+int dfltcc_deflate(
+    z_streamp strm,
+    int flush,
+    block_state *result
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+    struct dfltcc_param_v0 *param = &dfltcc_state->param;
+    uInt masked_avail_in;
+    dfltcc_cc cc;
+    int need_empty_block;
+    int soft_bcc;
+    int no_flush;
+
+    if (!dfltcc_can_deflate(strm))
+        return 0;
+
+again:
+    masked_avail_in = 0;
+    soft_bcc = 0;
+    no_flush = flush == Z_NO_FLUSH;
+
+    /* Trailing empty block. Switch to software, except when Continuation Flag
+     * is set, which means that DFLTCC has buffered some output in the
+     * parameter block and needs to be called again in order to flush it.
+     */
+    if (flush == Z_FINISH && strm->avail_in == 0 && !param->cf) {
+        if (param->bcf) {
+            /* A block is still open, and the hardware does not support closing
+             * blocks without adding data. Thus, close it manually.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+        }
+        return 0;
+    }
+
+    if (strm->avail_in == 0 && !param->cf) {
+        *result = need_more;
+        return 1;
+    }
+
+    /* There is an open non-BFINAL block, we are not going to close it just
+     * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see
+     * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new
+     * DHT in order to adapt to a possibly changed input data distribution.
+     */
+    if (param->bcf && no_flush &&
+            strm->total_in > dfltcc_state->block_threshold &&
+            strm->avail_in >= dfltcc_state->dht_threshold) {
+        if (param->cf) {
+            /* We need to flush the DFLTCC buffer before writing the
+             * End-of-block Symbol. Mask the input data and proceed as usual.
+             */
+            masked_avail_in += strm->avail_in;
+            strm->avail_in = 0;
+            no_flush = 0;
+        } else {
+            /* DFLTCC buffer is empty, so we can manually write the
+             * End-of-block Symbol right away.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+            if (strm->avail_out == 0) {
+                *result = need_more;
+                return 1;
+            }
+        }
+    }
+
+    /* The caller gave us too much data. Pass only one block worth of
+     * uncompressed data to DFLTCC and mask the rest, so that on the next
+     * iteration we start a new block.
+     */
+    if (no_flush && strm->avail_in > dfltcc_state->block_size) {
+        masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
+        strm->avail_in = dfltcc_state->block_size;
+    }
+
+    /* When we have an open non-BFINAL deflate block and caller indicates that
+     * the stream is ending, we need to close an open deflate block and open a
+     * BFINAL one.
+     */
+    need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
+
+    /* Translate stream to parameter block */
+    param->cvt = CVT_ADLER32;
+    if (!no_flush)
+        /* We need to close a block. Always do this in software - when there is
+         * no input data, the hardware will not nohor BCC. */
+        soft_bcc = 1;
+    if (flush == Z_FINISH && !param->bcf)
+        /* We are about to open a BFINAL block, set Block Header Final bit
+         * until the stream ends.
+         */
+        param->bhf = 1;
+    /* DFLTCC-CMPR will write to next_out, so make sure that buffers with
+     * higher precedence are empty.
+     */
+    Assert(state->pending == 0, "There must be no pending bytes");
+    Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
+    param->sbb = (unsigned int)state->bi_valid;
+    if (param->sbb > 0)
+        *strm->next_out = (Byte)state->bi_buf;
+    if (param->hl)
+        param->nt = 0; /* Honor history */
+    param->cv = strm->adler;
+
+    /* When opening a block, choose a Huffman-Table Type */
+    if (!param->bcf) {
+        if (strm->total_in == 0 && dfltcc_state->block_threshold > 0) {
+            param->htt = HTT_FIXED;
+        }
+        else {
+            param->htt = HTT_DYNAMIC;
+            dfltcc_gdht(strm);
+        }
+    }
+
+    /* Deflate */
+    do {
+        cc = dfltcc_cmpr(strm);
+        if (strm->avail_in < 4096 && masked_avail_in > 0)
+            /* We are about to call DFLTCC with a small input buffer, which is
+             * inefficient. Since there is masked data, there will be at least
+             * one more DFLTCC call, so skip the current one and make the next
+             * one handle more data.
+             */
+            break;
+    } while (cc == DFLTCC_CC_AGAIN);
+
+    /* Translate parameter block to stream */
+    strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
+    state->bi_valid = param->sbb;
+    if (state->bi_valid == 0)
+        state->bi_buf = 0; /* Avoid accessing next_out */
+    else
+        state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
+    strm->adler = param->cv;
+
+    /* Unmask the input data */
+    strm->avail_in += masked_avail_in;
+    masked_avail_in = 0;
+
+    /* If we encounter an error, it means there is a bug in DFLTCC call */
+    Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
+
+    /* Update Block-Continuation Flag. It will be used to check whether to call
+     * GDHT the next time.
+     */
+    if (cc == DFLTCC_CC_OK) {
+        if (soft_bcc) {
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+        } else
+            param->bcf = 1;
+        if (flush == Z_FINISH) {
+            if (need_empty_block)
+                /* Make the current deflate() call also close the stream */
+                return 0;
+            else {
+                bi_windup(state);
+                *result = finish_done;
+            }
+        } else {
+            if (flush == Z_FULL_FLUSH)
+                param->hl = 0; /* Clear history */
+            *result = flush == Z_NO_FLUSH ? need_more : block_done;
+        }
+    } else {
+        param->bcf = 1;
+        *result = need_more;
+    }
+    if (strm->avail_in != 0 && strm->avail_out != 0)
+        goto again; /* deflate() must use all input or all output */
+    return 1;
+}
+
diff --git a/lib/zlib_dfltcc/dfltcc_syms.c b/lib/zlib_dfltcc/dfltcc_syms.c
new file mode 100644
index 000000000000..6f23481804c1
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc_syms.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/lib/zlib_dfltcc/dfltcc_syms.c
+ *
+ * Exported symbols for the s390 zlib dfltcc support.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/zlib.h>
+#include "dfltcc.h"
+
+EXPORT_SYMBOL(dfltcc_can_deflate);
+EXPORT_SYMBOL(dfltcc_deflate);
+EXPORT_SYMBOL(dfltcc_reset);
+MODULE_LICENSE("GPL");
diff --git a/lib/zlib_dfltcc/dfltcc_util.h b/lib/zlib_dfltcc/dfltcc_util.h
new file mode 100644
index 000000000000..1930b43c1915
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc_util.h
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_UTIL_H
+#define DFLTCC_UTIL_H
+
+#include <linux/zutil.h>
+
+/*
+ * C wrapper for the DEFLATE CONVERSION CALL instruction.
+ */
+typedef enum {
+    DFLTCC_CC_OK = 0,
+    DFLTCC_CC_OP1_TOO_SHORT = 1,
+    DFLTCC_CC_OP2_TOO_SHORT = 2,
+    DFLTCC_CC_OP2_CORRUPT = 2,
+    DFLTCC_CC_AGAIN = 3,
+} dfltcc_cc;
+
+#define DFLTCC_QAF 0
+#define DFLTCC_GDHT 1
+#define DFLTCC_CMPR 2
+#define DFLTCC_XPND 4
+#define HBT_CIRCULAR (1 << 7)
+#define HB_BITS 15
+#define HB_SIZE (1 << HB_BITS)
+#define DFLTCC_FACILITY 151
+
+static inline dfltcc_cc dfltcc(
+    int fn,
+    void *param,
+    Byte **op1,
+    size_t *len1,
+    const Byte **op2,
+    size_t *len2,
+    void *hist
+)
+{
+    Byte *t2 = op1 ? *op1 : NULL;
+    size_t t3 = len1 ? *len1 : 0;
+    const Byte *t4 = op2 ? *op2 : NULL;
+    size_t t5 = len2 ? *len2 : 0;
+    register int r0 __asm__("r0") = fn;
+    register void *r1 __asm__("r1") = param;
+    register Byte *r2 __asm__("r2") = t2;
+    register size_t r3 __asm__("r3") = t3;
+    register const Byte *r4 __asm__("r4") = t4;
+    register size_t r5 __asm__("r5") = t5;
+    int cc;
+
+    __asm__ volatile(
+                     ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
+                     "ipm %[cc]\n"
+                     : [r2] "+r" (r2)
+                     , [r3] "+r" (r3)
+                     , [r4] "+r" (r4)
+                     , [r5] "+r" (r5)
+                     , [cc] "=r" (cc)
+                     : [r0] "r" (r0)
+                     , [r1] "r" (r1)
+                     , [hist] "r" (hist)
+                     : "cc", "memory");
+    t2 = r2; t3 = r3; t4 = r4; t5 = r5;
+
+    if (op1)
+        *op1 = t2;
+    if (len1)
+        *len1 = t3;
+    if (op2)
+        *op2 = t4;
+    if (len2)
+        *len2 = t5;
+    return (cc >> 28) & 3;
+}
+
+static inline int is_bit_set(
+    const char *bits,
+    int n
+)
+{
+    return bits[n / 8] & (1 << (7 - (n % 8)));
+}
+
+static inline void turn_bit_off(
+    char *bits,
+    int n
+)
+{
+    bits[n / 8] &= ~(1 << (7 - (n % 8)));
+}
+
+static inline int dfltcc_are_params_ok(
+    int level,
+    uInt window_bits,
+    int strategy,
+    uLong level_mask
+)
+{
+    return (level_mask & (1 << level)) != 0 &&
+        (window_bits == HB_BITS) &&
+        (strategy == Z_DEFAULT_STRATEGY);
+}
+
+static inline int is_dfltcc_enabled(void)
+{
+    uint64_t facilities[(DFLTCC_FACILITY / 64) + 1];
+    register char r0 __asm__("r0");
+
+    memset(facilities, 0, sizeof(facilities));
+    r0 = sizeof(facilities) / sizeof(facilities[0]) - 1;
+    __asm__ volatile("stfle %[facilities]\n"
+                     : [facilities] "=Q" (facilities)
+                     , [r0] "+r" (r0)
+                     :
+                     : "cc");
+    return is_bit_set((const char *)facilities, DFLTCC_FACILITY);
+}
+
+char *oesc_msg(char *buf, int oesc);
+
+#endif /* DFLTCC_UTIL_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] s390/boot: Rename HEAP_SIZE due to name collision
  2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 1/5] lib/zlib: Add s390 hardware support for kernel zlib_deflate Mikhail Zaslonko
@ 2019-11-26 14:41 ` Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 3/5] lib/zlib: Add s390 hardware support for kernel zlib_inflate Mikhail Zaslonko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Change the conflicting macro name in preparation for zlib_inflate
hardware support.

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
---
 arch/s390/boot/compressed/decompressor.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/s390/boot/compressed/decompressor.c b/arch/s390/boot/compressed/decompressor.c
index 45046630c56a..368fd372c875 100644
--- a/arch/s390/boot/compressed/decompressor.c
+++ b/arch/s390/boot/compressed/decompressor.c
@@ -30,13 +30,13 @@ extern unsigned char _compressed_start[];
 extern unsigned char _compressed_end[];
 
 #ifdef CONFIG_HAVE_KERNEL_BZIP2
-#define HEAP_SIZE	0x400000
+#define BOOT_HEAP_SIZE	0x400000
 #else
-#define HEAP_SIZE	0x10000
+#define BOOT_HEAP_SIZE	0x10000
 #endif
 
 static unsigned long free_mem_ptr = (unsigned long) _end;
-static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE;
+static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE;
 
 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"
@@ -62,7 +62,7 @@ static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE;
 #include "../../../../lib/decompress_unxz.c"
 #endif
 
-#define decompress_offset ALIGN((unsigned long)_end + HEAP_SIZE, PAGE_SIZE)
+#define decompress_offset ALIGN((unsigned long)_end + BOOT_HEAP_SIZE, PAGE_SIZE)
 
 unsigned long mem_safe_offset(void)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] lib/zlib: Add s390 hardware support for kernel zlib_inflate
  2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 1/5] lib/zlib: Add s390 hardware support for kernel zlib_deflate Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 2/5] s390/boot: Rename HEAP_SIZE due to name collision Mikhail Zaslonko
@ 2019-11-26 14:41 ` Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 4/5] s390/boot: Add dfltcc= kernel command line parameter Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 5/5] btrfs: Increase buffer size for zlib functions Mikhail Zaslonko
  4 siblings, 0 replies; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Add decompression functions to zlib_dfltcc library.
Update zlib_inflate functions with the hooks for s390 hardware support
and adjust workspace structures with extra parameter lists required
for hardware inflate decompression.

Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
---
 lib/Kconfig                      |   8 ++
 lib/decompress_inflate.c         |  13 +++
 lib/zlib_dfltcc/Makefile         |   2 +-
 lib/zlib_dfltcc/dfltcc.h         |  28 ++++++
 lib/zlib_dfltcc/dfltcc_inflate.c | 143 +++++++++++++++++++++++++++++++
 lib/zlib_inflate/inflate.c       |  32 +++++--
 lib/zlib_inflate/inflate.h       |   8 ++
 lib/zlib_inflate/infutil.h       |  18 +++-
 8 files changed, 241 insertions(+), 11 deletions(-)
 create mode 100644 lib/zlib_dfltcc/dfltcc_inflate.c

diff --git a/lib/Kconfig b/lib/Kconfig
index 7ccaa54006c3..a29264d6e98d 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -292,6 +292,14 @@ config ZLIB_DFLTCC_DEFLATE
 	help
 	 Enable s390x hardware support for zlib deflate in the kernel.
 
+config ZLIB_DFLTCC_INFLATE
+	def_bool y
+	depends on ZLIB_DFLTCC && ZLIB_INFLATE
+	prompt "Enable s390x hardware support for zlib inflate functions"
+	help
+	 Enable s390x hardware support for zlib inflate in the kernel.
+
+
 config LZO_COMPRESS
 	tristate
 
diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c
index 63b4b7eee138..523651e23c06 100644
--- a/lib/decompress_inflate.c
+++ b/lib/decompress_inflate.c
@@ -10,6 +10,10 @@
 #include "zlib_inflate/inftrees.c"
 #include "zlib_inflate/inffast.c"
 #include "zlib_inflate/inflate.c"
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+#include "zlib_dfltcc/dfltcc.c"
+#include "zlib_dfltcc/dfltcc_inflate.c"
+#endif
 
 #else /* STATIC */
 /* initramfs et al: linked */
@@ -76,7 +80,12 @@ STATIC int INIT __gunzip(unsigned char *buf, long len,
 	}
 
 	strm->workspace = malloc(flush ? zlib_inflate_workspacesize() :
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+	/* Always allocate the full workspace for DFLTCC */
+				 zlib_inflate_workspacesize());
+#else
 				 sizeof(struct inflate_state));
+#endif
 	if (strm->workspace == NULL) {
 		error("Out of memory while allocating workspace");
 		goto gunzip_nomem4;
@@ -123,10 +132,14 @@ STATIC int INIT __gunzip(unsigned char *buf, long len,
 
 	rc = zlib_inflateInit2(strm, -MAX_WBITS);
 
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+	/* Always keep the window for DFLTCC */
+#else
 	if (!flush) {
 		WS(strm)->inflate_state.wsize = 0;
 		WS(strm)->inflate_state.window = NULL;
 	}
+#endif
 
 	while (rc == Z_OK) {
 		if (strm->avail_in == 0) {
diff --git a/lib/zlib_dfltcc/Makefile b/lib/zlib_dfltcc/Makefile
index 863d3b37e09d..8e4d5afbbb10 100644
--- a/lib/zlib_dfltcc/Makefile
+++ b/lib/zlib_dfltcc/Makefile
@@ -8,4 +8,4 @@
 
 obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o
 
-zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_syms.o
+zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_inflate.o dfltcc_syms.o
diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
index 18fed7a444bc..4782c92bb2ff 100644
--- a/lib/zlib_dfltcc/dfltcc.h
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -104,6 +104,14 @@ int dfltcc_deflate(z_streamp strm,
                    int flush,
                    block_state *result);
 void dfltcc_reset(z_streamp strm, uInt size);
+int dfltcc_can_inflate(z_streamp strm);
+typedef enum {
+    DFLTCC_INFLATE_CONTINUE,
+    DFLTCC_INFLATE_BREAK,
+    DFLTCC_INFLATE_SOFTWARE,
+} dfltcc_inflate_action;
+dfltcc_inflate_action dfltcc_inflate(z_streamp strm,
+                                     int flush, int *ret);
 
 #define DEFLATE_RESET_HOOK(strm) \
     dfltcc_reset((strm), sizeof(deflate_state))
@@ -112,4 +120,24 @@ void dfltcc_reset(z_streamp strm, uInt size);
 
 #define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
 
+#define INFLATE_RESET_HOOK(strm) \
+    dfltcc_reset((strm), sizeof(struct inflate_state))
+
+#define INFLATE_TYPEDO_HOOK(strm, flush) \
+    if (dfltcc_can_inflate((strm))) { \
+        dfltcc_inflate_action action; \
+\
+        RESTORE(); \
+        action = dfltcc_inflate((strm), (flush), &ret); \
+        LOAD(); \
+        if (action == DFLTCC_INFLATE_CONTINUE) \
+            break; \
+        else if (action == DFLTCC_INFLATE_BREAK) \
+            goto inf_leave; \
+    }
+
+#define INFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_inflate((strm)))
+
+#define INFLATE_NEED_UPDATEWINDOW(strm) (!dfltcc_can_inflate((strm)))
+
 #endif /* DFLTCC_H */
diff --git a/lib/zlib_dfltcc/dfltcc_inflate.c b/lib/zlib_dfltcc/dfltcc_inflate.c
new file mode 100644
index 000000000000..12a93a06bd61
--- /dev/null
+++ b/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: Zlib
+
+#include "../zlib_inflate/inflate.h"
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+#include <linux/zutil.h>
+
+/*
+ * Expand.
+ */
+int dfltcc_can_inflate(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    /* Unsupported compression settings */
+    if (state->wbits != HB_BITS)
+        return 0;
+
+    /* Unsupported hardware */
+    return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) &&
+               is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
+}
+
+static int dfltcc_was_inflate_used(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+
+    return !param->nt;
+}
+
+static int dfltcc_inflate_disable(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    if (!dfltcc_can_inflate(strm))
+        return 0;
+    if (dfltcc_was_inflate_used(strm))
+        /* DFLTCC has already decompressed some data. Since there is not
+         * enough information to resume decompression in software, the call
+         * must fail.
+         */
+        return 1;
+    /* DFLTCC was not used yet - decompress in software */
+    memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
+    return 0;
+}
+
+static dfltcc_cc dfltcc_xpnd(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+    size_t avail_out = strm->avail_out;
+    dfltcc_cc cc;
+
+    cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR,
+                param, &strm->next_out, &avail_out,
+                &strm->next_in, &avail_in, state->window);
+    strm->avail_in = avail_in;
+    strm->avail_out = avail_out;
+    return cc;
+}
+
+dfltcc_inflate_action dfltcc_inflate(
+    z_streamp strm,
+    int flush,
+    int *ret
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+    struct dfltcc_param_v0 *param = &dfltcc_state->param;
+    dfltcc_cc cc;
+
+    if (flush == Z_BLOCK) {
+        /* DFLTCC does not support stopping on block boundaries */
+        if (dfltcc_inflate_disable(strm)) {
+            *ret = Z_STREAM_ERROR;
+            return DFLTCC_INFLATE_BREAK;
+        } else
+            return DFLTCC_INFLATE_SOFTWARE;
+    }
+
+    if (state->last) {
+        if (state->bits != 0) {
+            strm->next_in++;
+            strm->avail_in--;
+            state->bits = 0;
+        }
+        state->mode = CHECK;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+
+    if (strm->avail_in == 0 && !param->cf)
+        return DFLTCC_INFLATE_BREAK;
+
+    if (!state->window || state->wsize == 0) {
+        state->mode = MEM;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+
+    /* Translate stream to parameter block */
+    param->cvt = CVT_ADLER32;
+    param->sbb = state->bits;
+    param->hl = state->whave; /* Software and hardware history formats match */
+    param->ho = (state->write - state->whave) & ((1 << HB_BITS) - 1);
+    if (param->hl)
+        param->nt = 0; /* Honor history for the first block */
+    param->cv = state->flags ? REVERSE(state->check) : state->check;
+
+    /* Inflate */
+    do {
+        cc = dfltcc_xpnd(strm);
+    } while (cc == DFLTCC_CC_AGAIN);
+
+    /* Translate parameter block to stream */
+    strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
+    state->last = cc == DFLTCC_CC_OK;
+    state->bits = param->sbb;
+    state->whave = param->hl;
+    state->write = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
+    state->check = state->flags ? REVERSE(param->cv) : param->cv;
+    if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
+        /* Report an error if stream is corrupted */
+        state->mode = BAD;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+    state->mode = TYPEDO;
+    /* Break if operands are exhausted, otherwise continue looping */
+    return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
+        DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
+}
diff --git a/lib/zlib_inflate/inflate.c b/lib/zlib_inflate/inflate.c
index 48f14cd58c77..59de6c18f2fc 100644
--- a/lib/zlib_inflate/inflate.c
+++ b/lib/zlib_inflate/inflate.c
@@ -15,6 +15,16 @@
 #include "inffast.h"
 #include "infutil.h"
 
+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+#  include "../zlib_dfltcc/dfltcc.h"
+#else
+#define INFLATE_RESET_HOOK(strm) do {} while (0)
+#define INFLATE_TYPEDO_HOOK(strm, flush) do {} while (0)
+#define INFLATE_NEED_UPDATEWINDOW(strm) 1
+#define INFLATE_NEED_CHECKSUM(strm) 1
+#endif
+
 int zlib_inflate_workspacesize(void)
 {
     return sizeof(struct inflate_workspace);
@@ -42,6 +52,7 @@ int zlib_inflateReset(z_streamp strm)
     state->write = 0;
     state->whave = 0;
 
+    INFLATE_RESET_HOOK(strm);
     return Z_OK;
 }
 
@@ -66,7 +77,15 @@ int zlib_inflateInit2(z_streamp strm, int windowBits)
         return Z_STREAM_ERROR;
     }
     state->wbits = (unsigned)windowBits;
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+    /*
+     * DFLTCC requires the window to be page aligned.
+     * Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    state->window = PTR_ALIGN(&WS(strm)->working_window[0], PAGE_SIZE);
+#else
     state->window = &WS(strm)->working_window[0];
+#endif
 
     return zlib_inflateReset(strm);
 }
@@ -227,11 +246,6 @@ static int zlib_inflateSyncPacket(z_streamp strm)
         bits -= bits & 7; \
     } while (0)
 
-/* Reverse the bytes in a 32-bit value */
-#define REVERSE(q) \
-    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
-     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
-
 /*
    inflate() uses a state machine to process as much input data and generate as
    much output data as possible before returning.  The state machine is
@@ -395,6 +409,7 @@ int zlib_inflate(z_streamp strm, int flush)
             if (flush == Z_BLOCK) goto inf_leave;
 	    /* fall through */
         case TYPEDO:
+            INFLATE_TYPEDO_HOOK(strm, flush);
             if (state->last) {
                 BYTEBITS();
                 state->mode = CHECK;
@@ -692,7 +707,7 @@ int zlib_inflate(z_streamp strm, int flush)
                 out -= left;
                 strm->total_out += out;
                 state->total += out;
-                if (out)
+                if (INFLATE_NEED_CHECKSUM(strm) && out)
                     strm->adler = state->check =
                         UPDATE(state->check, put - out, out);
                 out = left;
@@ -726,7 +741,8 @@ int zlib_inflate(z_streamp strm, int flush)
      */
   inf_leave:
     RESTORE();
-    if (state->wsize || (state->mode < CHECK && out != strm->avail_out))
+    if (INFLATE_NEED_UPDATEWINDOW(strm) &&
+            (state->wsize || (state->mode < CHECK && out != strm->avail_out)))
         zlib_updatewindow(strm, out);
 
     in -= strm->avail_in;
@@ -734,7 +750,7 @@ int zlib_inflate(z_streamp strm, int flush)
     strm->total_in += in;
     strm->total_out += out;
     state->total += out;
-    if (state->wrap && out)
+    if (INFLATE_NEED_CHECKSUM(strm) && state->wrap && out)
         strm->adler = state->check =
             UPDATE(state->check, strm->next_out - out, out);
 
diff --git a/lib/zlib_inflate/inflate.h b/lib/zlib_inflate/inflate.h
index 3d17b3d1b21f..f79337ddf98c 100644
--- a/lib/zlib_inflate/inflate.h
+++ b/lib/zlib_inflate/inflate.h
@@ -11,6 +11,8 @@
    subject to change. Applications should only use zlib.h.
  */
 
+#include "inftrees.h"
+
 /* Possible inflate modes between inflate() calls */
 typedef enum {
     HEAD,       /* i: waiting for magic header */
@@ -108,4 +110,10 @@ struct inflate_state {
     unsigned short work[288];   /* work area for code table building */
     code codes[ENOUGH];         /* space for code tables */
 };
+
+/* Reverse the bytes in a 32-bit value */
+#define REVERSE(q) \
+    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
+     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
+
 #endif
diff --git a/lib/zlib_inflate/infutil.h b/lib/zlib_inflate/infutil.h
index eb1a9007bd86..0264b15c9192 100644
--- a/lib/zlib_inflate/infutil.h
+++ b/lib/zlib_inflate/infutil.h
@@ -12,14 +12,28 @@
 #define _INFUTIL_H
 
 #include <linux/zlib.h>
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+#include "../zlib_dfltcc/dfltcc.h"
+#include <asm/page.h>
+#endif
 
 /* memory allocation for inflation */
 
 struct inflate_workspace {
 	struct inflate_state inflate_state;
-	unsigned char working_window[1 << MAX_WBITS];
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+	struct dfltcc_state dfltcc_state;
+	unsigned char working_window[(1 << MAX_WBITS) + PAGE_SIZE];
+#else
+	unsigned char working_window[(1 << MAX_WBITS)];
+#endif
 };
 
-#define WS(z) ((struct inflate_workspace *)(z->workspace))
+#ifdef CONFIG_ZLIB_DFLTCC_INFLATE
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct inflate_workspace, dfltcc_state) % 8 == 0);
+#endif
+
+#define WS(strm) ((struct inflate_workspace *)(strm->workspace))
 
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] s390/boot: Add dfltcc= kernel command line parameter
  2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
                   ` (2 preceding siblings ...)
  2019-11-26 14:41 ` [PATCH 3/5] lib/zlib: Add s390 hardware support for kernel zlib_inflate Mikhail Zaslonko
@ 2019-11-26 14:41 ` Mikhail Zaslonko
  2019-11-26 14:41 ` [PATCH 5/5] btrfs: Increase buffer size for zlib functions Mikhail Zaslonko
  4 siblings, 0 replies; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Add the new kernel command line parameter 'dfltcc=' to configure
s390 zlib hardware support.
Format: { on | off | def_only | inf_only | always }
 on:       s390 zlib hardware support for compression on
           level 1 and decompression (default)
 off:      No s390 zlib hardware support
 def_only: s390 zlib hardware support for deflate
           only (compression on level 1)
 inf_only: s390 zlib hardware support for inflate
           only (decompression)
 always:   Same as 'on' but ignores the selected compression
           level always using hardware support (used for debugging)

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++++
 arch/s390/boot/ipl_parm.c                       | 14 ++++++++++++++
 arch/s390/include/asm/setup.h                   |  7 +++++++
 arch/s390/kernel/setup.c                        |  1 +
 lib/zlib_dfltcc/dfltcc.c                        |  5 ++++-
 lib/zlib_dfltcc/dfltcc.h                        |  1 +
 lib/zlib_dfltcc/dfltcc_deflate.c                |  6 ++++++
 lib/zlib_dfltcc/dfltcc_inflate.c                |  6 ++++++
 lib/zlib_dfltcc/dfltcc_util.h                   |  5 +++++
 9 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dee8f68fe15..953376347a3a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -836,6 +836,18 @@
 			dump out devices still on the deferred probe list after
 			retrying.
 
+	dfltcc=		[HW,S390]
+			Format: { on | off | def_only | inf_only | always }
+			on:       s390 zlib hardware support for compression on
+			          level 1 and decompression (default)
+			off:      No s390 zlib hardware support
+			def_only: s390 zlib hardware support for deflate
+			          only (compression on level 1)
+			inf_only: s390 zlib hardware support for inflate
+			          only (decompression)
+			always:   Same as 'on' but ignores the selected compression
+			          level always using hardware support (used for debugging)
+
 	dhash_entries=	[KNL]
 			Set number of hash buckets for dentry cache.
 
diff --git a/arch/s390/boot/ipl_parm.c b/arch/s390/boot/ipl_parm.c
index 24ef67eb1cef..357adad991d2 100644
--- a/arch/s390/boot/ipl_parm.c
+++ b/arch/s390/boot/ipl_parm.c
@@ -14,6 +14,7 @@
 char __bootdata(early_command_line)[COMMAND_LINE_SIZE];
 struct ipl_parameter_block __bootdata_preserved(ipl_block);
 int __bootdata_preserved(ipl_block_valid);
+unsigned int __bootdata_preserved(zlib_dfltcc_support) = ZLIB_DFLTCC_FULL;
 
 unsigned long __bootdata(vmalloc_size) = VMALLOC_DEFAULT_SIZE;
 unsigned long __bootdata(memory_end);
@@ -229,6 +230,19 @@ void parse_boot_command_line(void)
 		if (!strcmp(param, "vmalloc") && val)
 			vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE);
 
+		if (!strcmp(param, "dfltcc")) {
+			if (!strcmp(val, "off"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_DISABLED;
+			else if (!strcmp(val, "on"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_FULL;
+			else if (!strcmp(val, "def_only"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_DEFLATE_ONLY;
+			else if (!strcmp(val, "inf_only"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_INFLATE_ONLY;
+			else if (!strcmp(val, "always"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_FULL_DEBUG;
+		}
+
 		if (!strcmp(param, "noexec")) {
 			rc = kstrtobool(val, &enabled);
 			if (!rc && !enabled)
diff --git a/arch/s390/include/asm/setup.h b/arch/s390/include/asm/setup.h
index 6dc6c4fbc8e2..bd2cca097421 100644
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -80,6 +80,13 @@ struct parmarea {
 	char command_line[ARCH_COMMAND_LINE_SIZE];	/* 0x10480 */
 };
 
+extern unsigned int zlib_dfltcc_support;
+#define ZLIB_DFLTCC_DISABLED		0
+#define ZLIB_DFLTCC_FULL		1
+#define ZLIB_DFLTCC_DEFLATE_ONLY	2
+#define ZLIB_DFLTCC_INFLATE_ONLY	3
+#define ZLIB_DFLTCC_FULL_DEBUG		4
+
 extern int noexec_disabled;
 extern int memory_end_set;
 extern unsigned long memory_end;
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 3ff291bc63b7..e1a0cf580f6f 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -111,6 +111,7 @@ unsigned long __bootdata_preserved(__etext_dma);
 unsigned long __bootdata_preserved(__sdma);
 unsigned long __bootdata_preserved(__edma);
 unsigned long __bootdata_preserved(__kaslr_offset);
+unsigned int __bootdata_preserved(zlib_dfltcc_support);
 
 unsigned long VMALLOC_START;
 EXPORT_SYMBOL(VMALLOC_START);
diff --git a/lib/zlib_dfltcc/dfltcc.c b/lib/zlib_dfltcc/dfltcc.c
index 7f77e5bb01c6..c30de430b30c 100644
--- a/lib/zlib_dfltcc/dfltcc.c
+++ b/lib/zlib_dfltcc/dfltcc.c
@@ -44,7 +44,10 @@ void dfltcc_reset(
     dfltcc_state->param.nt = 1;
 
     /* Initialize tuning parameters */
-    dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_FULL_DEBUG)
+        dfltcc_state->level_mask = DFLTCC_LEVEL_MASK_DEBUG;
+    else
+        dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
     dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
     dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
     dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
index 4782c92bb2ff..be70c807b62f 100644
--- a/lib/zlib_dfltcc/dfltcc.h
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -8,6 +8,7 @@
  * Tuning parameters.
  */
 #define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_LEVEL_MASK_DEBUG 0x3fe /* DFLTCC compression for all levels */
 #define DFLTCC_BLOCK_SIZE 1048576
 #define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
 #define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
diff --git a/lib/zlib_dfltcc/dfltcc_deflate.c b/lib/zlib_dfltcc/dfltcc_deflate.c
index b9d16917c30a..aa4539f5c071 100644
--- a/lib/zlib_dfltcc/dfltcc_deflate.c
+++ b/lib/zlib_dfltcc/dfltcc_deflate.c
@@ -3,6 +3,7 @@
 #include "../zlib_deflate/defutil.h"
 #include "dfltcc_util.h"
 #include "dfltcc.h"
+#include <asm/setup.h>
 #include <linux/zutil.h>
 
 /*
@@ -15,6 +16,11 @@ int dfltcc_can_deflate(
     deflate_state *state = (deflate_state *)strm->state;
     struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
 
+    /* Check for kernel dfltcc command line parameter */
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+            zlib_dfltcc_support == ZLIB_DFLTCC_INFLATE_ONLY)
+        return 0;
+
     /* Unsupported compression settings */
     if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
                               dfltcc_state->level_mask))
diff --git a/lib/zlib_dfltcc/dfltcc_inflate.c b/lib/zlib_dfltcc/dfltcc_inflate.c
index 12a93a06bd61..aa9ef23474df 100644
--- a/lib/zlib_dfltcc/dfltcc_inflate.c
+++ b/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -3,6 +3,7 @@
 #include "../zlib_inflate/inflate.h"
 #include "dfltcc_util.h"
 #include "dfltcc.h"
+#include <asm/setup.h>
 #include <linux/zutil.h>
 
 /*
@@ -15,6 +16,11 @@ int dfltcc_can_inflate(
     struct inflate_state *state = (struct inflate_state *)strm->state;
     struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
 
+    /* Check for kernel dfltcc command line parameter */
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+            zlib_dfltcc_support == ZLIB_DFLTCC_DEFLATE_ONLY)
+        return 0;
+
     /* Unsupported compression settings */
     if (state->wbits != HB_BITS)
         return 0;
diff --git a/lib/zlib_dfltcc/dfltcc_util.h b/lib/zlib_dfltcc/dfltcc_util.h
index 1930b43c1915..43419d7d181b 100644
--- a/lib/zlib_dfltcc/dfltcc_util.h
+++ b/lib/zlib_dfltcc/dfltcc_util.h
@@ -3,6 +3,7 @@
 #define DFLTCC_UTIL_H
 
 #include <linux/zutil.h>
+#include <asm/setup.h>
 
 /*
  * C wrapper for the DEFLATE CONVERSION CALL instruction.
@@ -104,6 +105,10 @@ static inline int is_dfltcc_enabled(void)
     uint64_t facilities[(DFLTCC_FACILITY / 64) + 1];
     register char r0 __asm__("r0");
 
+    /* Check for kernel dfltcc command line parameter */
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED)
+        return 0;
+
     memset(facilities, 0, sizeof(facilities));
     r0 = sizeof(facilities) / sizeof(facilities[0]) - 1;
     __asm__ volatile("stfle %[facilities]\n"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
                   ` (3 preceding siblings ...)
  2019-11-26 14:41 ` [PATCH 4/5] s390/boot: Add dfltcc= kernel command line parameter Mikhail Zaslonko
@ 2019-11-26 14:41 ` Mikhail Zaslonko
  2019-11-26 15:52   ` Josef Bacik
  4 siblings, 1 reply; 13+ messages in thread
From: Mikhail Zaslonko @ 2019-11-26 14:41 UTC (permalink / raw)
  To: Andrew Morton, Chris Mason, Josef Bacik, David Sterba, Richard Purdie
  Cc: Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Due to the small size of zlib buffer (1 page) set in btrfs code, s390
hardware compression is rather limited in terms of performance. Increasing
the buffer size to 4 pages would bring significant benefit for s390
hardware compression (up to 60% better performance compared to the
PAGE_SIZE buffer) and should not bring much overhead in terms of memory
consumption due to order 2 allocations.

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
---
 fs/btrfs/compression.c |   2 +-
 fs/btrfs/zlib.c        | 109 ++++++++++++++++++++++++++---------------
 2 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b05b361e2062..f789b356fd8b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1158,7 +1158,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	/* copy bytes from the working buffer into the pages */
 	while (working_bytes > 0) {
 		bytes = min_t(unsigned long, bvec.bv_len,
-				PAGE_SIZE - buf_offset);
+				PAGE_SIZE - (buf_offset % PAGE_SIZE));
 		bytes = min(bytes, working_bytes);
 
 		kaddr = kmap_atomic(bvec.bv_page);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index df1aace5df50..c5b4654e1242 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -20,6 +20,8 @@
 #include <linux/refcount.h>
 #include "compression.h"
 
+#define ZLIB_BUF_SIZE    (4 * PAGE_SIZE)
+
 struct workspace {
 	z_stream strm;
 	char *buf;
@@ -76,7 +78,7 @@ static struct list_head *zlib_alloc_workspace(unsigned int level)
 			zlib_inflate_workspacesize());
 	workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
 	workspace->level = level;
-	workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	workspace->buf = kmalloc(ZLIB_BUF_SIZE, GFP_KERNEL);
 	if (!workspace->strm.workspace || !workspace->buf)
 		goto fail;
 
@@ -97,6 +99,7 @@ static int zlib_compress_pages(struct list_head *ws,
 			       unsigned long *total_out)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	int i;
 	int ret;
 	char *data_in;
 	char *cpage_out;
@@ -104,6 +107,7 @@ static int zlib_compress_pages(struct list_head *ws,
 	struct page *in_page = NULL;
 	struct page *out_page = NULL;
 	unsigned long bytes_left;
+	unsigned long in_buf_pages;
 	unsigned long len = *total_out;
 	unsigned long nr_dest_pages = *out_pages;
 	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
@@ -121,9 +125,6 @@ static int zlib_compress_pages(struct list_head *ws,
 	workspace->strm.total_in = 0;
 	workspace->strm.total_out = 0;
 
-	in_page = find_get_page(mapping, start >> PAGE_SHIFT);
-	data_in = kmap(in_page);
-
 	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
 	if (out_page == NULL) {
 		ret = -ENOMEM;
@@ -133,12 +134,34 @@ static int zlib_compress_pages(struct list_head *ws,
 	pages[0] = out_page;
 	nr_pages = 1;
 
-	workspace->strm.next_in = data_in;
+	workspace->strm.next_in = workspace->buf;
+	workspace->strm.avail_in = 0;
 	workspace->strm.next_out = cpage_out;
 	workspace->strm.avail_out = PAGE_SIZE;
-	workspace->strm.avail_in = min(len, PAGE_SIZE);
 
 	while (workspace->strm.total_in < len) {
+		/* get next set of pages and copy their contents to
+		 * the input buffer for the following deflate call
+		 */
+		if (workspace->strm.avail_in == 0) {
+			bytes_left = len - workspace->strm.total_in;
+			in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+					   ZLIB_BUF_SIZE / PAGE_SIZE);
+			for (i = 0; i < in_buf_pages; i++) {
+				in_page = find_get_page(mapping,
+							start >> PAGE_SHIFT);
+				data_in = kmap(in_page);
+				memcpy(workspace->buf + i*PAGE_SIZE, data_in,
+				       PAGE_SIZE);
+				kunmap(in_page);
+				put_page(in_page);
+				start += PAGE_SIZE;
+			}
+			workspace->strm.avail_in = min(bytes_left,
+						       ZLIB_BUF_SIZE);
+			workspace->strm.next_in = workspace->buf;
+		}
+
 		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
 		if (ret != Z_OK) {
 			pr_debug("BTRFS: deflate in loop returned %d\n",
@@ -149,12 +172,13 @@ static int zlib_compress_pages(struct list_head *ws,
 		}
 
 		/* we're making it bigger, give up */
-		if (workspace->strm.total_in > 8192 &&
+		if (workspace->strm.total_in > ZLIB_BUF_SIZE &&
 		    workspace->strm.total_in <
 		    workspace->strm.total_out) {
 			ret = -E2BIG;
 			goto out;
 		}
+
 		/* we need another page for writing out.  Test this
 		 * before the total_in so we will pull in a new page for
 		 * the stream end if required
@@ -180,33 +204,42 @@ static int zlib_compress_pages(struct list_head *ws,
 		/* we're all done */
 		if (workspace->strm.total_in >= len)
 			break;
-
-		/* we've read in a full page, get a new one */
-		if (workspace->strm.avail_in == 0) {
-			if (workspace->strm.total_out > max_out)
-				break;
-
-			bytes_left = len - workspace->strm.total_in;
-			kunmap(in_page);
-			put_page(in_page);
-
-			start += PAGE_SIZE;
-			in_page = find_get_page(mapping,
-						start >> PAGE_SHIFT);
-			data_in = kmap(in_page);
-			workspace->strm.avail_in = min(bytes_left,
-							   PAGE_SIZE);
-			workspace->strm.next_in = data_in;
-		}
+		if (workspace->strm.total_out > max_out)
+			break;
 	}
 	workspace->strm.avail_in = 0;
-	ret = zlib_deflate(&workspace->strm, Z_FINISH);
-	zlib_deflateEnd(&workspace->strm);
-
-	if (ret != Z_STREAM_END) {
-		ret = -EIO;
-		goto out;
+	/* call deflate with Z_FINISH flush parameter providing more output
+	 * space but no more input data, until it returns with Z_STREAM_END
+	 */
+	while (ret != Z_STREAM_END) {
+		ret = zlib_deflate(&workspace->strm, Z_FINISH);
+		if (ret == Z_STREAM_END)
+			break;
+		if (ret != Z_OK && ret != Z_BUF_ERROR) {
+			zlib_deflateEnd(&workspace->strm);
+			ret = -EIO;
+			goto out;
+		} else if (workspace->strm.avail_out == 0) {
+			/* get another page for the stream end */
+			kunmap(out_page);
+			if (nr_pages == nr_dest_pages) {
+				out_page = NULL;
+				ret = -E2BIG;
+				goto out;
+			}
+			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			if (out_page == NULL) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			cpage_out = kmap(out_page);
+			pages[nr_pages] = out_page;
+			nr_pages++;
+			workspace->strm.avail_out = PAGE_SIZE;
+			workspace->strm.next_out = cpage_out;
+		}
 	}
+	zlib_deflateEnd(&workspace->strm);
 
 	if (workspace->strm.total_out >= workspace->strm.total_in) {
 		ret = -E2BIG;
@@ -221,10 +254,6 @@ static int zlib_compress_pages(struct list_head *ws,
 	if (out_page)
 		kunmap(out_page);
 
-	if (in_page) {
-		kunmap(in_page);
-		put_page(in_page);
-	}
 	return ret;
 }
 
@@ -250,7 +279,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 
 	workspace->strm.total_out = 0;
 	workspace->strm.next_out = workspace->buf;
-	workspace->strm.avail_out = PAGE_SIZE;
+	workspace->strm.avail_out = ZLIB_BUF_SIZE;
 
 	/* If it's deflate, and it's got no preset dictionary, then
 	   we can tell zlib to skip the adler32 check. */
@@ -289,7 +318,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 		}
 
 		workspace->strm.next_out = workspace->buf;
-		workspace->strm.avail_out = PAGE_SIZE;
+		workspace->strm.avail_out = ZLIB_BUF_SIZE;
 
 		if (workspace->strm.avail_in == 0) {
 			unsigned long tmp;
@@ -340,7 +369,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
 	workspace->strm.total_in = 0;
 
 	workspace->strm.next_out = workspace->buf;
-	workspace->strm.avail_out = PAGE_SIZE;
+	workspace->strm.avail_out = ZLIB_BUF_SIZE;
 	workspace->strm.total_out = 0;
 	/* If it's deflate, and it's got no preset dictionary, then
 	   we can tell zlib to skip the adler32 check. */
@@ -384,7 +413,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
 			buf_offset = 0;
 
 		bytes = min(PAGE_SIZE - pg_offset,
-			    PAGE_SIZE - buf_offset);
+			    PAGE_SIZE - (buf_offset % PAGE_SIZE));
 		bytes = min(bytes, bytes_left);
 
 		kaddr = kmap_atomic(dest_page);
@@ -395,7 +424,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
 		bytes_left -= bytes;
 next:
 		workspace->strm.next_out = workspace->buf;
-		workspace->strm.avail_out = PAGE_SIZE;
+		workspace->strm.avail_out = ZLIB_BUF_SIZE;
 	}
 
 	if (ret != Z_STREAM_END && bytes_left != 0)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-26 14:41 ` [PATCH 5/5] btrfs: Increase buffer size for zlib functions Mikhail Zaslonko
@ 2019-11-26 15:52   ` Josef Bacik
  2019-11-27 12:14     ` David Sterba
  2019-11-27 13:42     ` Zaslonko Mikhail
  0 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2019-11-26 15:52 UTC (permalink / raw)
  To: Mikhail Zaslonko
  Cc: Andrew Morton, Chris Mason, Josef Bacik, David Sterba,
	Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> hardware compression is rather limited in terms of performance. Increasing
> the buffer size to 4 pages would bring significant benefit for s390
> hardware compression (up to 60% better performance compared to the
> PAGE_SIZE buffer) and should not bring much overhead in terms of memory
> consumption due to order 2 allocations.
> 
> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>

We may have to make these allocations under memory pressure in the IO context,
order 2 allocations here is going to be not awesome.  If you really want it then
you need to at least be able to fall back to single page if you fail to get the
allocation.  Thanks,

josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-26 15:52   ` Josef Bacik
@ 2019-11-27 12:14     ` David Sterba
  2019-11-27 13:54       ` Josef Bacik
  2019-12-09 16:39       ` Zaslonko Mikhail
  2019-11-27 13:42     ` Zaslonko Mikhail
  1 sibling, 2 replies; 13+ messages in thread
From: David Sterba @ 2019-11-27 12:14 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Mikhail Zaslonko, Andrew Morton, Chris Mason, David Sterba,
	Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

On Tue, Nov 26, 2019 at 10:52:49AM -0500, Josef Bacik wrote:
> On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
> > Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> > hardware compression is rather limited in terms of performance. Increasing
> > the buffer size to 4 pages would bring significant benefit for s390
> > hardware compression (up to 60% better performance compared to the
> > PAGE_SIZE buffer) and should not bring much overhead in terms of memory
> > consumption due to order 2 allocations.
> > 
> > Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> 
> We may have to make these allocations under memory pressure in the IO context,
> order 2 allocations here is going to be not awesome.  If you really want it then
> you need to at least be able to fall back to single page if you fail to get the
> allocation.  Thanks,

The allocation is only for the workspace and it does not happen on the
IO path for each call. There's the pool and if

btrfs_get_workspace
  alloc_workspace

fails, then there's fallback path to wait for an existing workspace to
be free.

The order 2 allocation can put more pressure on the allocator though so
it's possible to have effects in some corner cases, but not in general.
I don't think the single page fallback code is needed.

And of course evaluation of the effects of the larger zlib buffer should
be done, it could improve compression but probably at the cost of cpu
time. Also decompression of blocks created on new code (4 pages) must
work on the old code (1 page).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-26 15:52   ` Josef Bacik
  2019-11-27 12:14     ` David Sterba
@ 2019-11-27 13:42     ` Zaslonko Mikhail
  2019-11-27 13:50       ` Josef Bacik
  2019-11-27 13:54       ` David Sterba
  1 sibling, 2 replies; 13+ messages in thread
From: Zaslonko Mikhail @ 2019-11-27 13:42 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Andrew Morton, Chris Mason, David Sterba, Richard Purdie,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	linux-kernel

Hello,

On 26.11.2019 16:52, Josef Bacik wrote:
> On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
>> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
>> hardware compression is rather limited in terms of performance. Increasing
>> the buffer size to 4 pages would bring significant benefit for s390
>> hardware compression (up to 60% better performance compared to the
>> PAGE_SIZE buffer) and should not bring much overhead in terms of memory
>> consumption due to order 2 allocations.
>>
>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> 
> We may have to make these allocations under memory pressure in the IO context,
> order 2 allocations here is going to be not awesome.  If you really want it then
> you need to at least be able to fall back to single page if you fail to get the
> allocation.  Thanks,
> 

As far as I understand GFP_KERNEL allocations would never fail for the order <= 
PAGE_ALLOC_COSTLY_ORDER. How else can the memory pressure condition be identified
here?

> josef
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-27 13:42     ` Zaslonko Mikhail
@ 2019-11-27 13:50       ` Josef Bacik
  2019-11-27 13:54       ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2019-11-27 13:50 UTC (permalink / raw)
  To: Zaslonko Mikhail
  Cc: Josef Bacik, Andrew Morton, Chris Mason, David Sterba,
	Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

On Wed, Nov 27, 2019 at 02:42:20PM +0100, Zaslonko Mikhail wrote:
> Hello,
> 
> On 26.11.2019 16:52, Josef Bacik wrote:
> > On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
> >> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> >> hardware compression is rather limited in terms of performance. Increasing
> >> the buffer size to 4 pages would bring significant benefit for s390
> >> hardware compression (up to 60% better performance compared to the
> >> PAGE_SIZE buffer) and should not bring much overhead in terms of memory
> >> consumption due to order 2 allocations.
> >>
> >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> > 
> > We may have to make these allocations under memory pressure in the IO context,
> > order 2 allocations here is going to be not awesome.  If you really want it then
> > you need to at least be able to fall back to single page if you fail to get the
> > allocation.  Thanks,
> > 
> 
> As far as I understand GFP_KERNEL allocations would never fail for the order <= 
> PAGE_ALLOC_COSTLY_ORDER. How else can the memory pressure condition be identified
> here?
> 

Except these can be done under NOFS, and just because GFP_KERNEL probably won't
fail doesn't mean it won't cause problems accross the system at alloc time.
Half of our rebase time at Facebook is spent finding all the fun ways people
have abused memory allocations not thinking about how the system behaves under
memory pressure.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-27 12:14     ` David Sterba
@ 2019-11-27 13:54       ` Josef Bacik
  2019-12-09 16:39       ` Zaslonko Mikhail
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2019-11-27 13:54 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, Mikhail Zaslonko, Andrew Morton, Chris Mason,
	David Sterba, Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

On Wed, Nov 27, 2019 at 01:14:23PM +0100, David Sterba wrote:
> On Tue, Nov 26, 2019 at 10:52:49AM -0500, Josef Bacik wrote:
> > On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
> > > Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> > > hardware compression is rather limited in terms of performance. Increasing
> > > the buffer size to 4 pages would bring significant benefit for s390
> > > hardware compression (up to 60% better performance compared to the
> > > PAGE_SIZE buffer) and should not bring much overhead in terms of memory
> > > consumption due to order 2 allocations.
> > > 
> > > Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> > 
> > We may have to make these allocations under memory pressure in the IO context,
> > order 2 allocations here is going to be not awesome.  If you really want it then
> > you need to at least be able to fall back to single page if you fail to get the
> > allocation.  Thanks,
> 
> The allocation is only for the workspace and it does not happen on the
> IO path for each call. There's the pool and if
> 
> btrfs_get_workspace
>   alloc_workspace
> 
> fails, then there's fallback path to wait for an existing workspace to
> be free.

Only if we are maxed out, otherwise it tries to allocate.  If it can happen it
will happen, and I'll be the guy swearing in the middle of the night trying to
deal with this making boxes fall over in production.  I'm ok if we pre-allocate
them first and only do 1 page on-demand, but having it always this way will bite
us in the ass in production.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-27 13:42     ` Zaslonko Mikhail
  2019-11-27 13:50       ` Josef Bacik
@ 2019-11-27 13:54       ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-11-27 13:54 UTC (permalink / raw)
  To: Zaslonko Mikhail
  Cc: Josef Bacik, Andrew Morton, Chris Mason, David Sterba,
	Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

On Wed, Nov 27, 2019 at 02:42:20PM +0100, Zaslonko Mikhail wrote:
> Hello,
> 
> On 26.11.2019 16:52, Josef Bacik wrote:
> > On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
> >> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> >> hardware compression is rather limited in terms of performance. Increasing
> >> the buffer size to 4 pages would bring significant benefit for s390
> >> hardware compression (up to 60% better performance compared to the
> >> PAGE_SIZE buffer) and should not bring much overhead in terms of memory
> >> consumption due to order 2 allocations.
> >>
> >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> > 
> > We may have to make these allocations under memory pressure in the IO context,
> > order 2 allocations here is going to be not awesome.  If you really want it then
> > you need to at least be able to fall back to single page if you fail to get the
> > allocation.  Thanks,
> 
> As far as I understand GFP_KERNEL allocations would never fail for the order <= 
> PAGE_ALLOC_COSTLY_ORDER.

There's no guaranteed no-fail semantics for GFP flags (obviously besides
__GFP_NOFAIL), GFP_KERNEL can fail and GFP_NOFS is unlikely to fail for
order below costly allocations. This depends on the allocator internals
and has never been an API-level guarantee AFAIK. There's ongoing to work
to relax the allocator constraints and allow to fail in more cases
(like for GFP_NOFS).

> How else can the memory pressure condition be identified
> here?

All data write paths must consider what happens under memory pressure,
because the reason to write the data could be started by an allocation
that can get free memory by writing dirty data. So it's kind of implied
here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] btrfs: Increase buffer size for zlib functions
  2019-11-27 12:14     ` David Sterba
  2019-11-27 13:54       ` Josef Bacik
@ 2019-12-09 16:39       ` Zaslonko Mikhail
  1 sibling, 0 replies; 13+ messages in thread
From: Zaslonko Mikhail @ 2019-12-09 16:39 UTC (permalink / raw)
  To: dsterba
  Cc: Josef Bacik, Andrew Morton, Chris Mason, David Sterba,
	Richard Purdie, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, linux-kernel

Hi David,

On 27.11.2019 13:14, David Sterba wrote:
> On Tue, Nov 26, 2019 at 10:52:49AM -0500, Josef Bacik wrote:
>> On Tue, Nov 26, 2019 at 03:41:30PM +0100, Mikhail Zaslonko wrote:
>>> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
>>> hardware compression is rather limited in terms of performance. Increasing
>>> the buffer size to 4 pages would bring significant benefit for s390
>>> hardware compression (up to 60% better performance compared to the
>>> PAGE_SIZE buffer) and should not bring much overhead in terms of memory
>>> consumption due to order 2 allocations.
>>>
>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>>
>> We may have to make these allocations under memory pressure in the IO context,
>> order 2 allocations here is going to be not awesome.  If you really want it then
>> you need to at least be able to fall back to single page if you fail to get the
>> allocation.  Thanks,
> 
> The allocation is only for the workspace and it does not happen on the
> IO path for each call. There's the pool and if
> 
> btrfs_get_workspace
>   alloc_workspace
> 
> fails, then there's fallback path to wait for an existing workspace to
> be free.
> 
> The order 2 allocation can put more pressure on the allocator though so
> it's possible to have effects in some corner cases, but not in general.
> I don't think the single page fallback code is needed.
> 
> And of course evaluation of the effects of the larger zlib buffer should
> be done, it could improve compression but probably at the cost of cpu
> time. Also decompression of blocks created on new code (4 pages) must
> work on the old code (1 page).
Regarding 'improve compression but probably at the cost of cpu' ... what would be 
the proper way to evaluate this effect?
As for backward compatibility, I do not see side effects of using larger buffers. 
Data in the compressed state might differ indeed, but it will sill conform to zlib
standard and thus can be decompressed.

BTW, I have sent around V2 of the patch set. I would appreciate if you take a look 
as well.
> 
Thanks,
Mikhail

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-12-09 16:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-26 14:41 [PATCH 0/5] S390 hardware compression support for kernel zlib Mikhail Zaslonko
2019-11-26 14:41 ` [PATCH 1/5] lib/zlib: Add s390 hardware support for kernel zlib_deflate Mikhail Zaslonko
2019-11-26 14:41 ` [PATCH 2/5] s390/boot: Rename HEAP_SIZE due to name collision Mikhail Zaslonko
2019-11-26 14:41 ` [PATCH 3/5] lib/zlib: Add s390 hardware support for kernel zlib_inflate Mikhail Zaslonko
2019-11-26 14:41 ` [PATCH 4/5] s390/boot: Add dfltcc= kernel command line parameter Mikhail Zaslonko
2019-11-26 14:41 ` [PATCH 5/5] btrfs: Increase buffer size for zlib functions Mikhail Zaslonko
2019-11-26 15:52   ` Josef Bacik
2019-11-27 12:14     ` David Sterba
2019-11-27 13:54       ` Josef Bacik
2019-12-09 16:39       ` Zaslonko Mikhail
2019-11-27 13:42     ` Zaslonko Mikhail
2019-11-27 13:50       ` Josef Bacik
2019-11-27 13:54       ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).