All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat
@ 2021-11-17 20:14 Nick Terrell
  2021-11-17 20:14 ` [PATCH v2 1/3] lib: zstd: Fix unused variable warning Nick Terrell
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Nick Terrell @ 2021-11-17 20:14 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

I'll be sending these patches to Linus through my tree. Just submitting them
for comment before I do so. I'm somewhat unsure of the correct workflow for
patches I submit myself, so let me know if there is something different I
should do.

This patch set contains 3 commits:
1. Fixes a minor unused variable warning reported by Kernel test robot [0].
2. Improves the reported code bloat (-88KB / 374KB) [1] by outlining
   some functions that are unlikely to be used in performance sensitive
   workloads.
3. Fixes the reported excess stack usage on parisc [2] by removing -O3
   from zstd's compilation flags. -O3 triggered bugs in the hppa-linux-gnu
   gcc-8 compiler. -O2 performance is acceptable: neutral compression,
   about -1% decompression speed. We also reduce code bloat
   (-105KB / 374KB).

After this commit our code bloat is cut from 374KB to 105KB with gcc-11.
If we wanted to cut the remaining 105KB we'd likely have to trade
signicant performance, so I want to say that this is enough for now.

We should be able to get further gains without sacrificing speed, but
that will take some significant optimization effort, and isn't suitable
for a quick fix. I've opened an upstream issue [3] to track the code size,
and try to avoid future regressions, and improve it in the long term.

[0] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
[1] https://lkml.org/lkml/2021/11/15/710
[2] https://lkml.org/lkml/2021/11/14/189
[3] https://github.com/facebook/zstd/issues/2867

Signed-off-by: Nick Terrell <terrelln@fb.com>

v1 -> v2:
* (1/3) Update commit message & add a comment.

Nick Terrell (3):
  lib: zstd: Fix unused variable warning
  lib: zstd: Don't inline functions in zstd_opt.c
  lib: zstd: Don't add -O3 to cflags

 lib/zstd/Makefile                            |  2 --
 lib/zstd/common/compiler.h                   |  7 +++++++
 lib/zstd/compress/zstd_compress_superblock.c |  2 ++
 lib/zstd/compress/zstd_opt.c                 | 14 +++++++++++++-
 4 files changed, 22 insertions(+), 3 deletions(-)

--
2.33.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] lib: zstd: Fix unused variable warning
  2021-11-17 20:14 [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
@ 2021-11-17 20:14 ` Nick Terrell
  2021-11-18  7:50   ` Geert Uytterhoeven
  2021-11-17 20:14 ` [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
  2021-11-17 20:14 ` [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
  2 siblings, 1 reply; 9+ messages in thread
From: Nick Terrell @ 2021-11-17 20:14 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds, kernel test robot

From: Nick Terrell <terrelln@fb.com>

The variable `litLengthSum` is only used by an `assert()`, so when
asserts are disabled the compiler doesn't see any usage and warns.

This issue is already fixed upstream by PR #2838 [0]. It was reported
by the Kernel test robot in [1].

Another approach would be to change zstd's disabled `assert()`
definition to use the argument in a disabled branch, instead of
ignoring the argument. I've avoided this approach because there are
some small changes necessary to get zstd to build, and I would
want to thoroughly re-test for performance, since that is slightly
changing the code in every function in zstd. It seems like a
trivial change, but some functions are pretty sensitive to small
changes. However, I think it is a valid approach that I would
like to see upstream take, so I've opened Issue #2868 to attempt
this upstream.

[0] https://github.com/facebook/zstd/pull/2838
[1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
[2] https://github.com/facebook/zstd/issues/2868

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/compress/zstd_compress_superblock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c
index ee03e0aedb03..b0610b255653 100644
--- a/lib/zstd/compress/zstd_compress_superblock.c
+++ b/lib/zstd/compress/zstd_compress_superblock.c
@@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
     const seqDef* sp = sstart;
     size_t matchLengthSum = 0;
     size_t litLengthSum = 0;
+    /* Only used by assert(), suppress unused variable warnings in production. */
+    (void)litLengthSum;
     while (send-sp > 0) {
         ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
         litLengthSum += seqLen.litLength;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c
  2021-11-17 20:14 [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
  2021-11-17 20:14 ` [PATCH v2 1/3] lib: zstd: Fix unused variable warning Nick Terrell
@ 2021-11-17 20:14 ` Nick Terrell
  2021-11-18  7:52   ` Geert Uytterhoeven
  2021-11-17 20:14 ` [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
  2 siblings, 1 reply; 9+ messages in thread
From: Nick Terrell @ 2021-11-17 20:14 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.

This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.

Bloat-o-meter output on x86-64:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
Function                                     old     new   delta
ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
ZSTD_insertBt1                                 -    1731   +1731
ZSTD_storeSeq                                  -     693    +693
ZSTD_BtGetAllMatches                           -     255    +255
ZSTD_updateRep                                 -     128    +128
ZSTD_updateTree                               96      99      +3
ZSTD_insertAndFindFirstIndexHash3             81       -     -81
ZSTD_setBasePrices.constprop                  98       -     -98
ZSTD_litLengthPrice.constprop                138       -    -138
ZSTD_count                                   362     181    -181
ZSTD_count_2segments                        1407     938    -469
ZSTD_insertBt1.constprop                    2689       -   -2689
ZSTD_compressBlock_btultra2                19990     423  -19567
ZSTD_compressBlock_btultra                 19633      15  -19618
ZSTD_initStats_ultra                       19825       -  -19825
ZSTD_compressBlock_btopt                   20374      12  -20362
ZSTD_compressBlock_btopt_extDict           29984      12  -29972
ZSTD_compressBlock_btultra_extDict         30718      15  -30703
ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
Total: Before=6611828, After=6418562, chg -2.92%
```

[0] https://lkml.org/lkml/2021/11/14/189
[1] https://github.com/facebook/zstd/issues/2862

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/common/compiler.h   |  7 +++++++
 lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index a1a051e4bce6..f5a9c70a228a 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -16,6 +16,7 @@
 *********************************************************/
 /* force inlining */
 
+#if !defined(ZSTD_NO_INLINE)
 #if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L   /* C99 */
 #  define INLINE_KEYWORD inline
 #else
@@ -24,6 +25,12 @@
 
 #define FORCE_INLINE_ATTR __attribute__((always_inline))
 
+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif
 
 /*
   On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC).
diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c
index 04337050fe9a..09483f518dc3 100644
--- a/lib/zstd/compress/zstd_opt.c
+++ b/lib/zstd/compress/zstd_opt.c
@@ -8,6 +8,18 @@
  * You may select, at your option, one of the above-listed licenses.
  */
 
+/*
+ * Disable inlining for the optimal parser for the kernel build.
+ * It is unlikely to be used in the kernel, and where it is used
+ * latency shouldn't matter because it is very slow to begin with.
+ * We prefer a ~180KB binary size win over faster optimal parsing.
+ *
+ * TODO(https://github.com/facebook/zstd/issues/2862):
+ * Improve the code size of the optimal parser in general, so we
+ * don't need this hack for the kernel build.
+ */
+#define ZSTD_NO_INLINE 1
+
 #include "zstd_compress_internal.h"
 #include "hist.h"
 #include "zstd_opt.h"
@@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
              */
             U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
             ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
-        } 
+        }
         ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
     }
     ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags
  2021-11-17 20:14 [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
  2021-11-17 20:14 ` [PATCH v2 1/3] lib: zstd: Fix unused variable warning Nick Terrell
  2021-11-17 20:14 ` [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
@ 2021-11-17 20:14 ` Nick Terrell
  2021-11-18  7:56   ` Geert Uytterhoeven
  2 siblings, 1 reply; 9+ messages in thread
From: Nick Terrell @ 2021-11-17 20:14 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.

I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.

This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:

ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
ZSTD_compressBlock_fast: 2216 -> 40
ZSTD_compressBlock_fast_dictMatchState: 1848 ->  64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable: 3252 -> 0
ZSTD_compressBlock_doubleFast: 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
ZSTD_copmressBlock_lazy2: 2420 -> 72

Additionally, this improves the reported code bloat [1]. With gcc-11
bloat-o-meter shows an 80KB code size improvement:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```

Compared to before the zstd-1.4.10 update we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```

[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile
index 65218ec5b8f2..fc45339fc3a3 100644
--- a/lib/zstd/Makefile
+++ b/lib/zstd/Makefile
@@ -11,8 +11,6 @@
 obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
 obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o
 
-ccflags-y += -O3
-
 zstd_compress-y := \
 		zstd_compress_module.o \
 		common/debug.o \
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] lib: zstd: Fix unused variable warning
  2021-11-17 20:14 ` [PATCH v2 1/3] lib: zstd: Fix unused variable warning Nick Terrell
@ 2021-11-18  7:50   ` Geert Uytterhoeven
  2021-11-18  8:21     ` Nick Terrell
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2021-11-18  7:50 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, Parisc List,
	Helge Deller, Linus Torvalds, kernel test robot

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote:
> The variable `litLengthSum` is only used by an `assert()`, so when
> asserts are disabled the compiler doesn't see any usage and warns.
>
> This issue is already fixed upstream by PR #2838 [0]. It was reported
> by the Kernel test robot in [1].
>
> Another approach would be to change zstd's disabled `assert()`
> definition to use the argument in a disabled branch, instead of
> ignoring the argument. I've avoided this approach because there are
> some small changes necessary to get zstd to build, and I would
> want to thoroughly re-test for performance, since that is slightly
> changing the code in every function in zstd. It seems like a
> trivial change, but some functions are pretty sensitive to small
> changes. However, I think it is a valid approach that I would
> like to see upstream take, so I've opened Issue #2868 to attempt
> this upstream.
>
> [0] https://github.com/facebook/zstd/pull/2838
> [1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
> [2] https://github.com/facebook/zstd/issues/2868
>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Nick Terrell <terrelln@fb.com>

Thanks for your patch!

> --- a/lib/zstd/compress/zstd_compress_superblock.c
> +++ b/lib/zstd/compress/zstd_compress_superblock.c
> @@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
>      const seqDef* sp = sstart;
>      size_t matchLengthSum = 0;
>      size_t litLengthSum = 0;
> +    /* Only used by assert(), suppress unused variable warnings in production. */
> +    (void)litLengthSum;

The Linux way-to do this is to add __maybe_unused.
But perhaps you don't want to introduce that in the upstream codebase.

>      while (send-sp > 0) {
>          ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
>          litLengthSum += seqLen.litLength;

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c
  2021-11-17 20:14 ` [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
@ 2021-11-18  7:52   ` Geert Uytterhoeven
  2021-11-18  8:22     ` Nick Terrell
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2021-11-18  7:52 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, Parisc List,
	Helge Deller, Linus Torvalds

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote:
> `zstd_opt.c` contains the match finder for the highest compression
> levels. These levels are already very slow, and are unlikely to be used
> in the kernel. If they are used, they shouldn't be used in latency
> sensitive workloads, so slowing them down shouldn't be a big deal.
>
> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
> I've also opened an issue upstream [1] so that we can properly tackle
> the code size issue in `zstd_opt.c` for all users, and can hopefully
> remove this hack in the next zstd version we import.
>
> Bloat-o-meter output on x86-64:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
> Function                                     old     new   delta
> ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
> ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
> ZSTD_insertBt1                                 -    1731   +1731
> ZSTD_storeSeq                                  -     693    +693
> ZSTD_BtGetAllMatches                           -     255    +255
> ZSTD_updateRep                                 -     128    +128
> ZSTD_updateTree                               96      99      +3
> ZSTD_insertAndFindFirstIndexHash3             81       -     -81
> ZSTD_setBasePrices.constprop                  98       -     -98
> ZSTD_litLengthPrice.constprop                138       -    -138
> ZSTD_count                                   362     181    -181
> ZSTD_count_2segments                        1407     938    -469
> ZSTD_insertBt1.constprop                    2689       -   -2689
> ZSTD_compressBlock_btultra2                19990     423  -19567
> ZSTD_compressBlock_btultra                 19633      15  -19618
> ZSTD_initStats_ultra                       19825       -  -19825
> ZSTD_compressBlock_btopt                   20374      12  -20362
> ZSTD_compressBlock_btopt_extDict           29984      12  -29972
> ZSTD_compressBlock_btultra_extDict         30718      15  -30703
> ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
> ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
> Total: Before=6611828, After=6418562, chg -2.92%
> ```
>
> [0] https://lkml.org/lkml/2021/11/14/189
> [1] https://github.com/facebook/zstd/issues/2862
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Nick Terrell <terrelln@fb.com>

Thanks for your patch!

Impact on lib/zstd/zstd_compress.ko for atari_defconfig:

    add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822)

Nice!

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

> --- a/lib/zstd/compress/zstd_opt.c
> +++ b/lib/zstd/compress/zstd_opt.c
> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
>               */
>              U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
>              ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
> -        }
> +        }
>          ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
>      }
>      ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);

This change is unrelated. With that removed:
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags
  2021-11-17 20:14 ` [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
@ 2021-11-18  7:56   ` Geert Uytterhoeven
  0 siblings, 0 replies; 9+ messages in thread
From: Geert Uytterhoeven @ 2021-11-18  7:56 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, Parisc List,
	Helge Deller, Linus Torvalds

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote:
> From: Nick Terrell <terrelln@fb.com>
>
> After the update to zstd-1.4.10 passing -O3 is no longer necessary to
> get good performance from zstd. Using the default optimization level -O2
> is sufficient to get good performance.
>
> I've measured no significant change to compression speed, and a ~1%
> decompression speed loss, which is acceptable.
>
> This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
> gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
> stacks that are ~3KB. With -O2 these same functions generate stacks in
> the < 100B, completely fixing the problem. Function size deltas are
> listed below:
>
> ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
> ZSTD_compressBlock_fast: 2216 -> 40
> ZSTD_compressBlock_fast_dictMatchState: 1848 ->  64
> ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
> ZSTD_fillDoubleHashTable: 3252 -> 0
> ZSTD_compressBlock_doubleFast: 5856 -> 36
> ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
> ZSTD_copmressBlock_lazy2: 2420 -> 72
>
> Additionally, this improves the reported code bloat [1]. With gcc-11
> bloat-o-meter shows an 80KB code size improvement:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
> Total: Before=6418562, After=6336372, chg -1.28%
> ```
>
> Compared to before the zstd-1.4.10 update we see a total code size
> regression of 105KB, down from 374KB at v5.16-rc1:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
> Total: Before=6228850, After=6336372, chg +1.73%
> ```
>
> [0] https://lkml.org/lkml/2021/11/15/710
> [1] https://lkml.org/lkml/2021/11/14/189
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Nick Terrell <terrelln@fb.com>

Impact on vmlinux for atari_defconfig:

    add/remove: 22/3 grow/shrink: 7/91 up/down: 3246/-35548 (-32302)

Impact on lib/zstd/zstd_compress.ko for atari_defconfig:

    add/remove: 63/5 grow/shrink: 23/197 up/down: 13410/-168604 (-155194)

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] lib: zstd: Fix unused variable warning
  2021-11-18  7:50   ` Geert Uytterhoeven
@ 2021-11-18  8:21     ` Nick Terrell
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Terrell @ 2021-11-18  8:21 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Nick Terrell, Linux Kernel Mailing List, Parisc List,
	Helge Deller, Linus Torvalds, kernel test robot



> On Nov 17, 2021, at 11:50 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> 
> Hi Nick,
> 
> On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote:
>> The variable `litLengthSum` is only used by an `assert()`, so when
>> asserts are disabled the compiler doesn't see any usage and warns.
>> 
>> This issue is already fixed upstream by PR #2838 [0]. It was reported
>> by the Kernel test robot in [1].
>> 
>> Another approach would be to change zstd's disabled `assert()`
>> definition to use the argument in a disabled branch, instead of
>> ignoring the argument. I've avoided this approach because there are
>> some small changes necessary to get zstd to build, and I would
>> want to thoroughly re-test for performance, since that is slightly
>> changing the code in every function in zstd. It seems like a
>> trivial change, but some functions are pretty sensitive to small
>> changes. However, I think it is a valid approach that I would
>> like to see upstream take, so I've opened Issue #2868 to attempt
>> this upstream.
>> 
>> [0] https://github.com/facebook/zstd/pull/2838
>> [1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
>> [2] https://github.com/facebook/zstd/issues/2868
>> 
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Nick Terrell <terrelln@fb.com>
> 
> Thanks for your patch!
> 
>> --- a/lib/zstd/compress/zstd_compress_superblock.c
>> +++ b/lib/zstd/compress/zstd_compress_superblock.c
>> @@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
>>     const seqDef* sp = sstart;
>>     size_t matchLengthSum = 0;
>>     size_t litLengthSum = 0;
>> +    /* Only used by assert(), suppress unused variable warnings in production. */
>> +    (void)litLengthSum;
> 
> The Linux way-to do this is to add __maybe_unused.
> But perhaps you don't want to introduce that in the upstream codebase.

Upstream zstd can't use __maybe_unused (or its own version) because
not every compiler supports it. So compilers that don't support it may emit
unused variable warnings. We like using attributes (like fallthrough) when
applicable, and when there is a portable fallback. In this case, there isn’t
really a way to write __maybe_unused that works portably.

One of the design goals of lib/zstd/ in the kernel is to not maintain any
long-term patches on top of upstream. That is so we can automatically
import upstream into the kernel, so it is easy to keep up to date. We
can accept short-term fixes, but all patches in lib/zstd/ in the kernel
need to be ported upstream before the next import.

That does incur non-linux style in lib/zstd/. However, we mitigate that
by providing a linux-style wrapper API in <linux/zstd.h>, so that the
callers of zstd don’t get “infected” with zstd’s upstream style.

Best,
Nick Terrell

>>     while (send-sp > 0) {
>>         ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
>>         litLengthSum += seqLen.litLength;
> 
> Gr{oetje,eeting}s,
> 
>                        Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                -- Linus Torvalds


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c
  2021-11-18  7:52   ` Geert Uytterhoeven
@ 2021-11-18  8:22     ` Nick Terrell
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Terrell @ 2021-11-18  8:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Nick Terrell, Linux Kernel Mailing List, Parisc List,
	Helge Deller, Linus Torvalds



> On Nov 17, 2021, at 11:52 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> 
> Hi Nick,
> 
> On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote:
>> `zstd_opt.c` contains the match finder for the highest compression
>> levels. These levels are already very slow, and are unlikely to be used
>> in the kernel. If they are used, they shouldn't be used in latency
>> sensitive workloads, so slowing them down shouldn't be a big deal.
>> 
>> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
>> I've also opened an issue upstream [1] so that we can properly tackle
>> the code size issue in `zstd_opt.c` for all users, and can hopefully
>> remove this hack in the next zstd version we import.
>> 
>> Bloat-o-meter output on x86-64:
>> 
>> ```
>>> ../scripts/bloat-o-meter vmlinux.old vmlinux
>> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
>> Function                                     old     new   delta
>> ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
>> ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
>> ZSTD_insertBt1                                 -    1731   +1731
>> ZSTD_storeSeq                                  -     693    +693
>> ZSTD_BtGetAllMatches                           -     255    +255
>> ZSTD_updateRep                                 -     128    +128
>> ZSTD_updateTree                               96      99      +3
>> ZSTD_insertAndFindFirstIndexHash3             81       -     -81
>> ZSTD_setBasePrices.constprop                  98       -     -98
>> ZSTD_litLengthPrice.constprop                138       -    -138
>> ZSTD_count                                   362     181    -181
>> ZSTD_count_2segments                        1407     938    -469
>> ZSTD_insertBt1.constprop                    2689       -   -2689
>> ZSTD_compressBlock_btultra2                19990     423  -19567
>> ZSTD_compressBlock_btultra                 19633      15  -19618
>> ZSTD_initStats_ultra                       19825       -  -19825
>> ZSTD_compressBlock_btopt                   20374      12  -20362
>> ZSTD_compressBlock_btopt_extDict           29984      12  -29972
>> ZSTD_compressBlock_btultra_extDict         30718      15  -30703
>> ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
>> ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
>> Total: Before=6611828, After=6418562, chg -2.92%
>> ```
>> 
>> [0] https://lkml.org/lkml/2021/11/14/189
>> [1] https://github.com/facebook/zstd/issues/2862
>> 
>> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
>> Signed-off-by: Nick Terrell <terrelln@fb.com>
> 
> Thanks for your patch!
> 
> Impact on lib/zstd/zstd_compress.ko for atari_defconfig:
> 
>    add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822)
> 
> Nice!
> 
> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
> 
>> --- a/lib/zstd/compress/zstd_opt.c
>> +++ b/lib/zstd/compress/zstd_opt.c
>> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
>>              */
>>             U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
>>             ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
>> -        }
>> +        }
>>         ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
>>     }
>>     ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
> 
> This change is unrelated. With that removed:
> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Thanks for the review! I will remove this change before submitting
the PR. Unless you have a strong preference, I won’t put up another
version of the patch set with just this change.

Best,
Nick Terrell

> Gr{oetje,eeting}s,
> 
>                        Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                -- Linus Torvalds


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-11-18  8:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 20:14 [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
2021-11-17 20:14 ` [PATCH v2 1/3] lib: zstd: Fix unused variable warning Nick Terrell
2021-11-18  7:50   ` Geert Uytterhoeven
2021-11-18  8:21     ` Nick Terrell
2021-11-17 20:14 ` [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
2021-11-18  7:52   ` Geert Uytterhoeven
2021-11-18  8:22     ` Nick Terrell
2021-11-17 20:14 ` [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
2021-11-18  7:56   ` Geert Uytterhoeven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.