linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix stack usage on parisc & improve code size bloat
@ 2021-11-17  1:49 Nick Terrell
  2021-11-17  1:49 ` [PATCH 1/3] lib: zstd: Fix unused variable warning Nick Terrell
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nick Terrell @ 2021-11-17  1:49 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

I'll be sending these patches to Linus through my tree. Just submitting them
for comment before I do so. I'm somewhat unsure of the correct workflow for
patches I submit myself, so let me know if there is something different I
should do.

This patch set contains 3 commits:
1. Fixes a minor unused variable warning reported by Kernel test robot [0].
2. Improves the reported code bloat (-88KB / 374KB) [1] by outlining
   some functions that are unlikely to be used in performance sensitive
   workloads.
3. Fixes the reported excess stack usage on parisc [2] by removing -O3
   from zstd's compilation flags. -O3 triggered bugs in the hppa-linux-gnu
   gcc-8 compiler. -O2 performance is acceptable: neutral compression,
   about -1% decompression speed. We also reduce code bloat
   (-105KB / 374KB).

After this commit our code bloat is cut from 374KB to 105KB with gcc-11.
If we wanted to cut the remaining 105KB we'd likely have to trade
signicant performance, so I want to say that this is enough for now.

We should be able to get further gains without sacrificing speed, but
that will take some significant optimization effort, and isn't suitable
for a quick fix. I've opened an upstream issue [3] to track the code size,
and try to avoid future regressions, and improve it in the long term.

[0] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
[1] https://lkml.org/lkml/2021/11/15/710
[2] https://lkml.org/lkml/2021/11/14/189
[3] https://github.com/facebook/zstd/issues/2867

Signed-off-by: Nick Terrell <terrelln@fb.com>

Nick Terrell (3):
  lib: zstd: Fix unused variable warning
  lib: zstd: Don't inline functions in zstd_opt.c
  lib: zstd: Don't add -O3 to cflags

 lib/zstd/Makefile                            |  2 --
 lib/zstd/common/compiler.h                   |  7 +++++++
 lib/zstd/compress/zstd_compress_superblock.c |  1 +
 lib/zstd/compress/zstd_opt.c                 | 14 +++++++++++++-
 4 files changed, 21 insertions(+), 3 deletions(-)

--
2.33.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] lib: zstd: Fix unused variable warning
  2021-11-17  1:49 [PATCH 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
@ 2021-11-17  1:49 ` Nick Terrell
  2021-11-17 16:45   ` Linus Torvalds
  2021-11-17  1:49 ` [PATCH 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
  2021-11-17  1:49 ` [PATCH 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
  2 siblings, 1 reply; 6+ messages in thread
From: Nick Terrell @ 2021-11-17  1:49 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds, kernel test robot

From: Nick Terrell <terrelln@fb.com>

Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
robot in [1].

[0] https://github.com/facebook/zstd/pull/2838
[1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/compress/zstd_compress_superblock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c
index ee03e0aedb03..a6a8e9a2aa0e 100644
--- a/lib/zstd/compress/zstd_compress_superblock.c
+++ b/lib/zstd/compress/zstd_compress_superblock.c
@@ -411,6 +411,7 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
     const seqDef* sp = sstart;
     size_t matchLengthSum = 0;
     size_t litLengthSum = 0;
+    (void)litLengthSum;
     while (send-sp > 0) {
         ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
         litLengthSum += seqLen.litLength;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] lib: zstd: Don't inline functions in zstd_opt.c
  2021-11-17  1:49 [PATCH 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
  2021-11-17  1:49 ` [PATCH 1/3] lib: zstd: Fix unused variable warning Nick Terrell
@ 2021-11-17  1:49 ` Nick Terrell
  2021-11-17  1:49 ` [PATCH 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell
  2 siblings, 0 replies; 6+ messages in thread
From: Nick Terrell @ 2021-11-17  1:49 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.

This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.

Bloat-o-meter output on x86-64:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
Function                                     old     new   delta
ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
ZSTD_insertBt1                                 -    1731   +1731
ZSTD_storeSeq                                  -     693    +693
ZSTD_BtGetAllMatches                           -     255    +255
ZSTD_updateRep                                 -     128    +128
ZSTD_updateTree                               96      99      +3
ZSTD_insertAndFindFirstIndexHash3             81       -     -81
ZSTD_setBasePrices.constprop                  98       -     -98
ZSTD_litLengthPrice.constprop                138       -    -138
ZSTD_count                                   362     181    -181
ZSTD_count_2segments                        1407     938    -469
ZSTD_insertBt1.constprop                    2689       -   -2689
ZSTD_compressBlock_btultra2                19990     423  -19567
ZSTD_compressBlock_btultra                 19633      15  -19618
ZSTD_initStats_ultra                       19825       -  -19825
ZSTD_compressBlock_btopt                   20374      12  -20362
ZSTD_compressBlock_btopt_extDict           29984      12  -29972
ZSTD_compressBlock_btultra_extDict         30718      15  -30703
ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
Total: Before=6611828, After=6418562, chg -2.92%
```

[0] https://lkml.org/lkml/2021/11/14/189
[1] https://github.com/facebook/zstd/issues/2862

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/common/compiler.h   |  7 +++++++
 lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index a1a051e4bce6..f5a9c70a228a 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -16,6 +16,7 @@
 *********************************************************/
 /* force inlining */
 
+#if !defined(ZSTD_NO_INLINE)
 #if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L   /* C99 */
 #  define INLINE_KEYWORD inline
 #else
@@ -24,6 +25,12 @@
 
 #define FORCE_INLINE_ATTR __attribute__((always_inline))
 
+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif
 
 /*
   On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC).
diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c
index 04337050fe9a..09483f518dc3 100644
--- a/lib/zstd/compress/zstd_opt.c
+++ b/lib/zstd/compress/zstd_opt.c
@@ -8,6 +8,18 @@
  * You may select, at your option, one of the above-listed licenses.
  */
 
+/*
+ * Disable inlining for the optimal parser for the kernel build.
+ * It is unlikely to be used in the kernel, and where it is used
+ * latency shouldn't matter because it is very slow to begin with.
+ * We prefer a ~180KB binary size win over faster optimal parsing.
+ *
+ * TODO(https://github.com/facebook/zstd/issues/2862):
+ * Improve the code size of the optimal parser in general, so we
+ * don't need this hack for the kernel build.
+ */
+#define ZSTD_NO_INLINE 1
+
 #include "zstd_compress_internal.h"
 #include "hist.h"
 #include "zstd_opt.h"
@@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
              */
             U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
             ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
-        } 
+        }
         ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
     }
     ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] lib: zstd: Don't add -O3 to cflags
  2021-11-17  1:49 [PATCH 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
  2021-11-17  1:49 ` [PATCH 1/3] lib: zstd: Fix unused variable warning Nick Terrell
  2021-11-17  1:49 ` [PATCH 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
@ 2021-11-17  1:49 ` Nick Terrell
  2 siblings, 0 replies; 6+ messages in thread
From: Nick Terrell @ 2021-11-17  1:49 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, linux-parisc, Helge Deller, Geert Uytterhoeven,
	Linus Torvalds

From: Nick Terrell <terrelln@fb.com>

After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.

I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.

This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:

ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
ZSTD_compressBlock_fast: 2216 -> 40
ZSTD_compressBlock_fast_dictMatchState: 1848 ->  64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable: 3252 -> 0
ZSTD_compressBlock_doubleFast: 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
ZSTD_copmressBlock_lazy2: 2420 -> 72

Additionally, this improves the reported code bloat [1]. With gcc-11
bloat-o-meter shows an 80KB code size improvement:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```

Compared to before the zstd-1.4.10 update we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```

[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile
index 65218ec5b8f2..fc45339fc3a3 100644
--- a/lib/zstd/Makefile
+++ b/lib/zstd/Makefile
@@ -11,8 +11,6 @@
 obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
 obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o
 
-ccflags-y += -O3
-
 zstd_compress-y := \
 		zstd_compress_module.o \
 		common/debug.o \
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] lib: zstd: Fix unused variable warning
  2021-11-17  1:49 ` [PATCH 1/3] lib: zstd: Fix unused variable warning Nick Terrell
@ 2021-11-17 16:45   ` Linus Torvalds
  2021-11-17 17:02     ` Nick Terrell
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2021-11-17 16:45 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, linux-parisc,
	Helge Deller, Geert Uytterhoeven, kernel test robot

On Tue, Nov 16, 2021 at 5:43 PM Nick Terrell <nickrterrell@gmail.com> wrote:
>
> From: Nick Terrell <terrelln@fb.com>
>
> Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
> robot in [1].

Ugh. Mind having a better commit message?

This just tells you that it's a backport. It doesn't actually talk
about what it fixes.

Yes, the summary line says "Fix unused variable warning", but it
doesn't talk about why that variable is unused and why it's not
removed entirely.

And it's not obvious in the diff either, because the context isn't
sufficiently large.

So a comment along the lines of "the variable is only used by an
'assert()', so when asserts are disabled the compiler sees no use at
all" or similar would be appreciated.

Of course, the alternative would be to make the backup definition of
'assert()' actually evaluate the argument. That's not what the
standard assert() is supposed to do, but whatever, and the zstd use
doesn't care.

So using

    #define assert(condition) ((void)(condition))

instead would also fix the warning at least in kernel use (but not
necessarily outside the kernel - the standard C 'assert.h' is just
evil).

                  Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] lib: zstd: Fix unused variable warning
  2021-11-17 16:45   ` Linus Torvalds
@ 2021-11-17 17:02     ` Nick Terrell
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Terrell @ 2021-11-17 17:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Terrell, Linux Kernel Mailing List, linux-parisc,
	Helge Deller, Geert Uytterhoeven, kernel test robot



> On Nov 17, 2021, at 8:45 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Tue, Nov 16, 2021 at 5:43 PM Nick Terrell <nickrterrell@gmail.com> wrote:
>> 
>> From: Nick Terrell <terrelln@fb.com>
>> 
>> Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
>> robot in [1].
> 
> Ugh. Mind having a better commit message?
> 
> This just tells you that it's a backport. It doesn't actually talk
> about what it fixes.
>
> Yes, the summary line says "Fix unused variable warning", but it
> doesn't talk about why that variable is unused and why it's not
> removed entirely.
> 
> And it's not obvious in the diff either, because the context isn't
> sufficiently large.
> 
> So a comment along the lines of "the variable is only used by an
> 'assert()', so when asserts are disabled the compiler sees no use at
> all" or similar would be appreciated.

Yeah of course, thanks for the review! I’ll put up a fix shortly.

> Of course, the alternative would be to make the backup definition of
> 'assert()' actually evaluate the argument. That's not what the
> standard assert() is supposed to do, but whatever, and the zstd use
> doesn't care.
> 
> So using
> 
>    #define assert(condition) ((void)(condition))
> 
> instead would also fix the warning at least in kernel use (but not
> necessarily outside the kernel - the standard C 'assert.h' is just
> evil).

I totally agree that the standard C assert is not ideal, however I’d
be hesitant to do that in a quick fix. Just because zstd assumes
that asserts are removed from production builds, so there are at
least a few places where it makes potentially expensive function
calls.

I may be able to try something like:

  #define assert(condition) do { if (0) { (void)(condition) } } while (0)

But, the code in the asserts hasn’t yet been tested to build. So we
may run into build issues, like the presence of division, or
dependency on libc.

I’ll take a stab at it, but if it ends up requiring too large of a change,
I will tend to continue with the current approach.

Best,
Nick Terrell

>                  Linus


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-17 17:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17  1:49 [PATCH 0/3] Fix stack usage on parisc & improve code size bloat Nick Terrell
2021-11-17  1:49 ` [PATCH 1/3] lib: zstd: Fix unused variable warning Nick Terrell
2021-11-17 16:45   ` Linus Torvalds
2021-11-17 17:02     ` Nick Terrell
2021-11-17  1:49 ` [PATCH 2/3] lib: zstd: Don't inline functions in zstd_opt.c Nick Terrell
2021-11-17  1:49 ` [PATCH 3/3] lib: zstd: Don't add -O3 to cflags Nick Terrell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).