linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Heads up: gcc miscompiling initramfs zlib decompression code at -O3
@ 2021-04-30 20:46 Vineet Gupta
  2021-04-30 22:06 ` Linus Torvalds
  2021-05-03 17:41 ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Vineet Gupta @ 2021-04-30 20:46 UTC (permalink / raw)
  To: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Linus Torvalds,
	Jann Horn
  Cc: lkml, arcml, linux-arch

Hi,

I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3 
causing wrong codegen.

  Config needs to have initramfs + gzip compressed.

	CONFIG_HAVE_KERNEL_GZIP=y
	CONFIG_KERNEL_GZIP=y
	CONFIG_DECOMPRESS_GZIP=y
	CONFIG_INITRAMFS_COMPRESSION_GZIP=y

  lib/zlib_inflate/inffast.c

     if (dist > 2) {
	unsigned short *sfrom;

	sfrom = (unsigned short *)(from);
	loops = len >> 1;
	do
	    *sout++ = *sfrom++;
              ^^^^^^    ^^^^^^^^
	while (--loops);
	out = (unsigned char *)sout;
	from = (unsigned char *)sfrom;
     }
     ...

The gist of issue is that despite use of unsigned short pointers, gcc is 
generating wider load/stores (8-byte LDD/STD on arcv2 and 16-byte on 
aarch64) causing extraneous bytes to copied into inflated gzip binaries
manifesting later as corrupted fragments in the binaries.

I've opened a gcc bug at:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

The workaround is to build lib/zlib_inflate/inffast.c with -O2, although 
I reckon not many arches build with -O3 as default. I'll be proposing an 
ARC only patch to build this file with -O2, unless people think it needs 
to be generalized.

Also problem originally seen on 5.6 kernel, although I confirm it shows 
on latest mainline as well.

Unraveling this pretty fun, gory details for those interested at:

 
https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/372


Thx,
-Vineet

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-04-30 20:46 Heads up: gcc miscompiling initramfs zlib decompression code at -O3 Vineet Gupta
@ 2021-04-30 22:06 ` Linus Torvalds
  2021-04-30 22:44   ` Vineet Gupta
  2021-05-01 21:03   ` Oleksandr Natalenko
  2021-05-03 17:41 ` Linus Torvalds
  1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2021-04-30 22:06 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Jann Horn, lkml,
	arcml, linux-arch

On Fri, Apr 30, 2021 at 1:46 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>
> I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3
> causing wrong codegen.

I'd be more than happy to just disable CC_OPTIMIZE_FOR_PERFORMANCE_O3 entirely.

The advantages are very questionable - with a lot of the optimizations
at O3 being about loops, something which the kernel to a close
approximation doesn't have.

Most kernel loops are "count on one hand" iterations, and loop
optimizations generally just make things worse.

And we've had problems with -O3 before, because not only are the
optimizations a bit esoteric, they are often relatively untested. If
you look around at various projects (outside the kernel), -O2 is
generally the "default".

And that's entirely ignoring the gcc history - where -O3 has often
been very buggy indeed. It's gotten much better, but I just don't see
the upside of using -O3.

In fact, it looks like we already have that

        depends on ARC

for -O3, exactly because nobody really wants to use this.

So this bug seems to be entirely ARC-specific, in that only ARC can
use -O3 for the kernel already.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-04-30 22:06 ` Linus Torvalds
@ 2021-04-30 22:44   ` Vineet Gupta
  2021-04-30 22:52     ` Linus Torvalds
  2021-05-01 21:03   ` Oleksandr Natalenko
  1 sibling, 1 reply; 8+ messages in thread
From: Vineet Gupta @ 2021-04-30 22:44 UTC (permalink / raw)
  To: Linus Torvalds, Vineet Gupta
  Cc: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Jann Horn, lkml,
	arcml, linux-arch

On 4/30/21 3:06 PM, Linus Torvalds wrote:
> On Fri, Apr 30, 2021 at 1:46 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>
>> I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3
>> causing wrong codegen.
> 
> I'd be more than happy to just disable CC_OPTIMIZE_FOR_PERFORMANCE_O3 entirely.
> 
> The advantages are very questionable - with a lot of the optimizations
> at O3 being about loops, something which the kernel to a close
> approximation doesn't have.
> 
> Most kernel loops are "count on one hand" iterations, and loop
> optimizations generally just make things worse.
> 
> And we've had problems with -O3 before, because not only are the
> optimizations a bit esoteric, they are often relatively untested. If
> you look around at various projects (outside the kernel), -O2 is
> generally the "default".

I agree that -O2 is default, but we've had -O3 default for ARC kernel 
forever, since last decade seriously. The reason I turned it on back 
then was upside of 10% performance improvement on select LMBench numbers 
on hardware at the time which for a rookie kernel hacker was yay momemt. 
I can revisit this and see if that is still true.

> And that's entirely ignoring the gcc history - where -O3 has often
> been very buggy indeed. It's gotten much better, but I just don't see
> the upside of using -O3.
> 
> In fact, it looks like we already have that
> 
>          depends on ARC
> 
> for -O3, exactly because nobody really wants to use this.

Either that or that people are not brave enough ;-) Perhaps gcc folks 
would like me to retain this as a testing ground if nothing else.

> So this bug seems to be entirely ARC-specific, in that only ARC can
> use -O3 for the kernel already.

kid in me complaining "that's not fair !"

-Vineet

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-04-30 22:44   ` Vineet Gupta
@ 2021-04-30 22:52     ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2021-04-30 22:52 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Jann Horn, lkml,
	arcml, linux-arch

On Fri, Apr 30, 2021 at 3:44 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>
> I agree that -O2 is default, but we've had -O3 default for ARC kernel
> forever, since last decade seriously. The reason I turned it on back
> then was upside of 10% performance improvement on select LMBench numbers
> on hardware at the time which for a rookie kernel hacker was yay momemt.
> I can revisit this and see if that is still true.

It would be interesting if you can actually show 10% improvement, and
also pinpoint things in the profile.

I (long long long ago) actually used -O6 for kernel builds as a "give
me everything you have" (I don't think gcc has actually ever done
anything more than O3, but whatever).

In fact, you can find

   Q: What Does gcc -O6 Do?

in some kernel FAQ's from those historical times.

We eventually gave up on it, because it just generated bigger and
slower code, and people got very tired of all the compiler bugs.

               Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-04-30 22:06 ` Linus Torvalds
  2021-04-30 22:44   ` Vineet Gupta
@ 2021-05-01 21:03   ` Oleksandr Natalenko
  1 sibling, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2021-05-01 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vineet Gupta, Arnd Bergmann, Peter Zijlstra, Masahiro Yamada,
	Jann Horn, lkml, arcml, linux-arch

Hello.

On Fri, Apr 30, 2021 at 03:06:30PM -0700, Linus Torvalds wrote:
> On Fri, Apr 30, 2021 at 1:46 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
> >
> > I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3
> > causing wrong codegen.
> 
> I'd be more than happy to just disable CC_OPTIMIZE_FOR_PERFORMANCE_O3 entirely.

FWIW, we used to find real bugs using -O3 in the past [1].

> The advantages are very questionable - with a lot of the optimizations
> at O3 being about loops, something which the kernel to a close
> approximation doesn't have.
> 
> Most kernel loops are "count on one hand" iterations, and loop
> optimizations generally just make things worse.
> 
> And we've had problems with -O3 before, because not only are the
> optimizations a bit esoteric, they are often relatively untested. If
> you look around at various projects (outside the kernel), -O2 is
> generally the "default".
> 
> And that's entirely ignoring the gcc history - where -O3 has often
> been very buggy indeed. It's gotten much better, but I just don't see
> the upside of using -O3.
> 
> In fact, it looks like we already have that
> 
>         depends on ARC
> 
> for -O3, exactly because nobody really wants to use this.
> 
> So this bug seems to be entirely ARC-specific, in that only ARC can
> use -O3 for the kernel already.
> 
>              Linus

[1] https://lore.kernel.org/lkml/673b885183fb64f1cbb3ed2387524077@natalenko.name/

-- 
  Oleksandr Natalenko (post-factum)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-04-30 20:46 Heads up: gcc miscompiling initramfs zlib decompression code at -O3 Vineet Gupta
  2021-04-30 22:06 ` Linus Torvalds
@ 2021-05-03 17:41 ` Linus Torvalds
  2021-05-03 19:18   ` Vineet Gupta
  2021-05-05 17:44   ` Heiko Carstens
  1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2021-05-03 17:41 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Jann Horn, lkml,
	arcml, linux-arch

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

On Fri, Apr 30, 2021 at 1:46 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>
> I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3
> causing wrong codegen.

So it does seem to be a gcc bug or at least mis-feature where gcc gets
the aliasing decision wrong when vectorizing the code.

That said, the fact that gcc even tries to vectorize the code shows
how pointless it was for us to (years ago) try to make it use 16-bit
accesses in the first place.

So can you try this patch that basically reverts some of those
kernel-specific changes to zlib's inffast.c - and does so by just
upgrading it to a newer version from a more modern zlib (which in this
case still means "from 2017", but that's the most recent version there
is).

This is a fairly quick hack, and I really tried to keep it to just
inffast.c and inftrees.c with a few minimal fixups to inflate.c
itself.

Most of the changes are for naming (which seems to be at least partly
for C++ namespace reasons, ie "this" -> "here"), but there's some
cleanup wrt maximum table sizes etc too.

NOTE! I have not tested this patch in any other way than "it compiles
with allmodconfig". The diff looks reasonable to me, but that's all
I'm really going to say.

Does this avoid the gcc -O3 problem for you?

It would be lovely if somebody also took a look at some of the other
zlib code, like inflate.c itself. But some of that code has rather
subtle changes for things like s390 hardware support, and we have
thihngs like our fallthrough rules etc, so it gets a bit hairier.

Which is why I did just this fairly minimal part.

Note that the commit message has a "Not-yet-signed-off-by:" from me.
That's purely because I wanted it to be obvious that this needs a lot
more testing - I'm not comfy with this patch unless somebody takes it
upon themselves to actually test it under different loads (and
different architectures).

                  Linus

[-- Attachment #2: 0001-Update-zlib-inffast-code-to-more-modern-zlib.patch --]
[-- Type: text/x-patch, Size: 35111 bytes --]

From 3ea219a14bd6f36dfed5e461067349ab86a02935 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 3 May 2021 10:27:34 -0700
Subject: [PATCH] Update zlib inffast code to more modern zlib

This updates part of our zlib inflate code to the most current zlib
sources as of zlib 1.2.11 (released by Mark Adler back in 2017).

This is only a partial update to a more modern zlib base, mainly to get
the current 'inffast' and 'inftrees' code.  In the process, this
effectively reverts some of our local kernel optimizations that have
caused problems.

In particular, see

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

which is (I argue) a gcc bug or at least mis-feature, but it does show
how the 16-bit access optimizations that do not exist in upstream zlib
have caused problems.

The effectively reverted commits are

  51ea3f6a4571 ("inflate_fast: sout is already a short so ptr arith was off by one.")
  e69eae65523b ("zlib: make new optimized inflate endian independent")
  6846ee5ca68d ("zlib: Fix build of powerpc boot wrapper")
  ac4c2a3bbe5d ("zlib: optimize inffast when copying direct from output")

that do not get carried forward by this update to a more recent version
of intffast.c.

Note that I've tried to keep the changes minimal wrt the original zlib
sources, but there are some changes:

 - Removal of some zlib-internals:
    - no more ASMINF
    - ZLIB_INTERNAL
    - no Tracevv() tracing calls

 - Removal of portability crud for ancient compilers:
    - "FAR" pointer markings
    - turned "z_const" into just "const"
    - convert function prototypes to "modern" ANSI C rather than K&R syntax

 - removed unused "gz_headerp" from inflate_state

Also note that this does change inflate.c minimally: in the years since
we syncronized with upstream zlib, the '->write' member of 'struct
inflate_state' has been renamed to '->wnext'.  But this does *not*
update all of inflate.c to a newer version, because we do have alot of
local changes for static workspace allocations, and even special s390
hardware support.

This will need lots of testing.

Not-yet-signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/zlib_inflate/inffast.c  | 162 ++++++++++++++++--------------------
 lib/zlib_inflate/inflate.c  |  25 +++---
 lib/zlib_inflate/inflate.h  |  66 ++++++++-------
 lib/zlib_inflate/inftrees.c | 107 +++++++++---------------
 lib/zlib_inflate/inftrees.h |  31 +++----
 5 files changed, 180 insertions(+), 211 deletions(-)

diff --git a/lib/zlib_inflate/inffast.c b/lib/zlib_inflate/inffast.c
index ed1f3df27260..e3d6fb10d198 100644
--- a/lib/zlib_inflate/inffast.c
+++ b/lib/zlib_inflate/inffast.c
@@ -1,5 +1,5 @@
 /* inffast.c -- fast decoding
- * Copyright (C) 1995-2004 Mark Adler
+ * Copyright (C) 1995-2017 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
  */
 
@@ -8,24 +8,16 @@
 #include "inflate.h"
 #include "inffast.h"
 
-#ifndef ASMINF
-
-union uu {
-	unsigned short us;
-	unsigned char b[2];
-};
-
-/* Endian independed version */
-static inline unsigned short
-get_unaligned16(const unsigned short *p)
-{
-	union uu  mm;
-	unsigned char *b = (unsigned char *)p;
-
-	mm.b[0] = b[0];
-	mm.b[1] = b[1];
-	return mm.us;
-}
+/*
+ * Imported from zlib 1.2.11 with minimal changes:
+ *  - removed ASMINF
+ *  - removed ZLIB_INTERNAL
+ *  - removed "FAR" pointer markings
+ *  - removed Tracevv() tracing calls
+ *  - turned "z_const" into just "const"
+ *  - removed "gz_headerp" from inflate_state
+ *  - converted to "modern" ANSI C rather than K&R syntax
+ */
 
 /*
    Decode literal, length, and distance codes and write out the resulting
@@ -62,13 +54,13 @@ get_unaligned16(const unsigned short *p)
       requires strm->avail_out >= 258 for each loop to avoid checking for
       output space.
 
-    - @start:	inflate()'s starting value for strm->avail_out
+    - @start:  inflate()'s starting value for strm->avail_out
  */
 void inflate_fast(z_streamp strm, unsigned start)
 {
     struct inflate_state *state;
     const unsigned char *in;    /* local strm->next_in */
-    const unsigned char *last;  /* while in < last, enough input available */
+    const unsigned char *last;  /* have enough input while in < last */
     unsigned char *out;         /* local strm->next_out */
     unsigned char *beg;         /* inflate()'s initial strm->next_out */
     unsigned char *end;         /* while out < end, enough space available */
@@ -77,7 +69,7 @@ void inflate_fast(z_streamp strm, unsigned start)
 #endif
     unsigned wsize;             /* window size or zero if not using window */
     unsigned whave;             /* valid bytes in the window */
-    unsigned write;             /* window write index */
+    unsigned wnext;             /* window write index */
     unsigned char *window;      /* allocated sliding window, if wsize != 0 */
     unsigned long hold;         /* local strm->hold */
     unsigned bits;              /* local strm->bits */
@@ -85,7 +77,7 @@ void inflate_fast(z_streamp strm, unsigned start)
     code const *dcode;          /* local strm->distcode */
     unsigned lmask;             /* mask for first level of length codes */
     unsigned dmask;             /* mask for first level of distance codes */
-    code this;                  /* retrieved table entry */
+    code here;                  /* retrieved table entry */
     unsigned op;                /* code bits, operation, extra bits, or */
                                 /*  window position, window bytes to copy */
     unsigned len;               /* match length, unused bytes */
@@ -104,7 +96,7 @@ void inflate_fast(z_streamp strm, unsigned start)
 #endif
     wsize = state->wsize;
     whave = state->whave;
-    write = state->write;
+    wnext = state->wnext;
     window = state->window;
     hold = state->hold;
     bits = state->bits;
@@ -122,17 +114,17 @@ void inflate_fast(z_streamp strm, unsigned start)
             hold += (unsigned long)(*in++) << bits;
             bits += 8;
         }
-        this = lcode[hold & lmask];
+        here = lcode[hold & lmask];
       dolen:
-        op = (unsigned)(this.bits);
+        op = (unsigned)(here.bits);
         hold >>= op;
         bits -= op;
-        op = (unsigned)(this.op);
+        op = (unsigned)(here.op);
         if (op == 0) {                          /* literal */
-            *out++ = (unsigned char)(this.val);
+            *out++ = (unsigned char)(here.val);
         }
         else if (op & 16) {                     /* length base */
-            len = (unsigned)(this.val);
+            len = (unsigned)(here.val);
             op &= 15;                           /* number of extra bits */
             if (op) {
                 if (bits < op) {
@@ -149,14 +141,14 @@ void inflate_fast(z_streamp strm, unsigned start)
                 hold += (unsigned long)(*in++) << bits;
                 bits += 8;
             }
-            this = dcode[hold & dmask];
+            here = dcode[hold & dmask];
           dodist:
-            op = (unsigned)(this.bits);
+            op = (unsigned)(here.bits);
             hold >>= op;
             bits -= op;
-            op = (unsigned)(this.op);
+            op = (unsigned)(here.op);
             if (op & 16) {                      /* distance base */
-                dist = (unsigned)(this.val);
+                dist = (unsigned)(here.val);
                 op &= 15;                       /* number of extra bits */
                 if (bits < op) {
                     hold += (unsigned long)(*in++) << bits;
@@ -180,12 +172,34 @@ void inflate_fast(z_streamp strm, unsigned start)
                 if (dist > op) {                /* see if copy from window */
                     op = dist - op;             /* distance back in window */
                     if (op > whave) {
-                        strm->msg = (char *)"invalid distance too far back";
-                        state->mode = BAD;
-                        break;
+                        if (state->sane) {
+                            strm->msg =
+                                (char *)"invalid distance too far back";
+                            state->mode = BAD;
+                            break;
+                        }
+#ifdef INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR
+                        if (len <= op - whave) {
+                            do {
+                                *out++ = 0;
+                            } while (--len);
+                            continue;
+                        }
+                        len -= op - whave;
+                        do {
+                            *out++ = 0;
+                        } while (--op > whave);
+                        if (op == 0) {
+                            from = out - dist;
+                            do {
+                                *out++ = *from++;
+                            } while (--len);
+                            continue;
+                        }
+#endif
                     }
                     from = window;
-                    if (write == 0) {           /* very common case */
+                    if (wnext == 0) {           /* very common case */
                         from += wsize - op;
                         if (op < len) {         /* some from window */
                             len -= op;
@@ -195,17 +209,17 @@ void inflate_fast(z_streamp strm, unsigned start)
                             from = out - dist;  /* rest from output */
                         }
                     }
-                    else if (write < op) {      /* wrap around window */
-                        from += wsize + write - op;
-                        op -= write;
+                    else if (wnext < op) {      /* wrap around window */
+                        from += wsize + wnext - op;
+                        op -= wnext;
                         if (op < len) {         /* some from end of window */
                             len -= op;
                             do {
                                 *out++ = *from++;
                             } while (--op);
                             from = window;
-                            if (write < len) {  /* some from start of window */
-                                op = write;
+                            if (wnext < len) {  /* some from start of window */
+                                op = wnext;
                                 len -= op;
                                 do {
                                     *out++ = *from++;
@@ -215,7 +229,7 @@ void inflate_fast(z_streamp strm, unsigned start)
                         }
                     }
                     else {                      /* contiguous in window */
-                        from += write - op;
+                        from += wnext - op;
                         if (op < len) {         /* some from window */
                             len -= op;
                             do {
@@ -237,54 +251,22 @@ void inflate_fast(z_streamp strm, unsigned start)
                     }
                 }
                 else {
-		    unsigned short *sout;
-		    unsigned long loops;
-
                     from = out - dist;          /* copy direct from output */
-		    /* minimum length is three */
-		    /* Align out addr */
-		    if (!((long)(out - 1) & 1)) {
-			*out++ = *from++;
-			len--;
-		    }
-		    sout = (unsigned short *)(out);
-		    if (dist > 2) {
-			unsigned short *sfrom;
-
-			sfrom = (unsigned short *)(from);
-			loops = len >> 1;
-			do
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-			    *sout++ = *sfrom++;
-#else
-			    *sout++ = get_unaligned16(sfrom++);
-#endif
-			while (--loops);
-			out = (unsigned char *)sout;
-			from = (unsigned char *)sfrom;
-		    } else { /* dist == 1 or dist == 2 */
-			unsigned short pat16;
-
-			pat16 = *(sout-1);
-			if (dist == 1) {
-				union uu mm;
-				/* copy one char pattern to both bytes */
-				mm.us = pat16;
-				mm.b[0] = mm.b[1];
-				pat16 = mm.us;
-			}
-			loops = len >> 1;
-			do
-			    *sout++ = pat16;
-			while (--loops);
-			out = (unsigned char *)sout;
-		    }
-		    if (len & 1)
-			*out++ = *from++;
+                    do {                        /* minimum length is three */
+                        *out++ = *from++;
+                        *out++ = *from++;
+                        *out++ = *from++;
+                        len -= 3;
+                    } while (len > 2);
+                    if (len) {
+                        *out++ = *from++;
+                        if (len > 1)
+                            *out++ = *from++;
+                    }
                 }
             }
             else if ((op & 64) == 0) {          /* 2nd level distance code */
-                this = dcode[this.val + (hold & ((1U << op) - 1))];
+                here = dcode[here.val + (hold & ((1U << op) - 1))];
                 goto dodist;
             }
             else {
@@ -294,7 +276,7 @@ void inflate_fast(z_streamp strm, unsigned start)
             }
         }
         else if ((op & 64) == 0) {              /* 2nd level length code */
-            this = lcode[this.val + (hold & ((1U << op) - 1))];
+            here = lcode[here.val + (hold & ((1U << op) - 1))];
             goto dolen;
         }
         else if (op & 32) {                     /* end-of-block */
@@ -329,7 +311,7 @@ void inflate_fast(z_streamp strm, unsigned start)
    inflate_fast() speedups that turned out slower (on a PowerPC G3 750CXe):
    - Using bit fields for code structure
    - Different op definition to avoid & for extra bits (do & for table bits)
-   - Three separate decoding do-loops for direct, window, and write == 0
+   - Three separate decoding do-loops for direct, window, and wnext == 0
    - Special case for distance > 1 copies to do overlapped load and store copy
    - Explicit branch predictions (based on measured branch probabilities)
    - Deferring match copy and interspersed it with decoding subsequent codes
@@ -338,5 +320,3 @@ void inflate_fast(z_streamp strm, unsigned start)
    - Larger unrolled copy loops (three is about right)
    - Moving len -= 3 statement into middle of loop
  */
-
-#endif /* !ASMINF */
diff --git a/lib/zlib_inflate/inflate.c b/lib/zlib_inflate/inflate.c
index ee39b5eb71f7..c6d198960e0f 100644
--- a/lib/zlib_inflate/inflate.c
+++ b/lib/zlib_inflate/inflate.c
@@ -25,6 +25,11 @@
 #define INFLATE_NEED_CHECKSUM(strm) 1
 #endif
 
+/* Reverse the bytes in a 32-bit value */
+#define REVERSE(q) \
+    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
+     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
+
 int zlib_inflate_workspacesize(void)
 {
     return sizeof(struct inflate_workspace);
@@ -49,7 +54,7 @@ int zlib_inflateReset(z_streamp strm)
 
     /* Initialise Window */
     state->wsize = 1U << state->wbits;
-    state->write = 0;
+    state->wnext = 0;
     state->whave = 0;
 
     INFLATE_RESET_HOOK(strm);
@@ -128,22 +133,22 @@ static void zlib_updatewindow(z_streamp strm, unsigned out)
     copy = out - strm->avail_out;
     if (copy >= state->wsize) {
         memcpy(state->window, strm->next_out - state->wsize, state->wsize);
-        state->write = 0;
+        state->wnext = 0;
         state->whave = state->wsize;
     }
     else {
-        dist = state->wsize - state->write;
+        dist = state->wsize - state->wnext;
         if (dist > copy) dist = copy;
-        memcpy(state->window + state->write, strm->next_out - copy, dist);
+        memcpy(state->window + state->wnext, strm->next_out - copy, dist);
         copy -= dist;
         if (copy) {
             memcpy(state->window, strm->next_out - copy, copy);
-            state->write = copy;
+            state->wnext = copy;
             state->whave = state->wsize;
         }
         else {
-            state->write += dist;
-            if (state->write == state->wsize) state->write = 0;
+            state->wnext += dist;
+            if (state->wnext == state->wsize) state->wnext = 0;
             if (state->whave < state->wsize) state->whave += dist;
         }
     }
@@ -675,12 +680,12 @@ int zlib_inflate(z_streamp strm, int flush)
             copy = out - left;
             if (state->offset > copy) {         /* copy from window */
                 copy = state->offset - copy;
-                if (copy > state->write) {
-                    copy -= state->write;
+                if (copy > state->wnext) {
+                    copy -= state->wnext;
                     from = state->window + (state->wsize - copy);
                 }
                 else
-                    from = state->window + (state->write - copy);
+                    from = state->window + (state->wnext - copy);
                 if (copy > state->length) copy = state->length;
             }
             else {                              /* copy from output */
diff --git a/lib/zlib_inflate/inflate.h b/lib/zlib_inflate/inflate.h
index f79337ddf98c..6ece8efd879b 100644
--- a/lib/zlib_inflate/inflate.h
+++ b/lib/zlib_inflate/inflate.h
@@ -1,8 +1,5 @@
-#ifndef INFLATE_H
-#define INFLATE_H
-
 /* inflate.h -- internal inflate state definition
- * Copyright (C) 1995-2004 Mark Adler
+ * Copyright (C) 1995-2016 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
  */
 
@@ -11,11 +8,17 @@
    subject to change. Applications should only use zlib.h.
  */
 
-#include "inftrees.h"
+/* define NO_GZIP when compiling if you want to disable gzip header and
+   trailer decoding by inflate().  NO_GZIP would be used to avoid linking in
+   the crc code when it is not needed.  For shared libraries, gzip decoding
+   should be left enabled. */
+#ifndef NO_GZIP
+#  define GUNZIP
+#endif
 
 /* Possible inflate modes between inflate() calls */
 typedef enum {
-    HEAD,       /* i: waiting for magic header */
+    HEAD = 16180,   /* i: waiting for magic header */
     FLAGS,      /* i: waiting for method and flags (gzip) */
     TIME,       /* i: waiting for modification time (gzip) */
     OS,         /* i: waiting for extra flags and operating system (gzip) */
@@ -29,11 +32,13 @@ typedef enum {
         TYPE,       /* i: waiting for type bits, including last-flag bit */
         TYPEDO,     /* i: same, but skip check to exit inflate on new block */
         STORED,     /* i: waiting for stored size (length and complement) */
+        COPY_,      /* i/o: same as COPY below, but only first time in */
         COPY,       /* i/o: waiting for input or output to copy stored block */
         TABLE,      /* i: waiting for dynamic block table lengths */
         LENLENS,    /* i: waiting for code length code lengths */
         CODELENS,   /* i: waiting for length/lit and distance code lengths */
-            LEN,        /* i: waiting for length/lit code */
+            LEN_,       /* i: same as LEN below, but only first time in */
+            LEN,        /* i: waiting for length/lit/eob code */
             LENEXT,     /* i: waiting for length extra bits */
             DIST,       /* i: waiting for distance code */
             DISTEXT,    /* i: waiting for distance extra bits */
@@ -50,19 +55,21 @@ typedef enum {
 /*
     State transitions between above modes -
 
-    (most modes can go to the BAD or MEM mode -- not shown for clarity)
+    (most modes can go to BAD or MEM on error -- not shown for clarity)
 
     Process header:
-        HEAD -> (gzip) or (zlib)
-        (gzip) -> FLAGS -> TIME -> OS -> EXLEN -> EXTRA -> NAME
-        NAME -> COMMENT -> HCRC -> TYPE
+        HEAD -> (gzip) or (zlib) or (raw)
+        (gzip) -> FLAGS -> TIME -> OS -> EXLEN -> EXTRA -> NAME -> COMMENT ->
+                  HCRC -> TYPE
         (zlib) -> DICTID or TYPE
         DICTID -> DICT -> TYPE
+        (raw) -> TYPEDO
     Read deflate blocks:
-            TYPE -> STORED or TABLE or LEN or CHECK
-            STORED -> COPY -> TYPE
-            TABLE -> LENLENS -> CODELENS -> LEN
-    Read deflate codes:
+            TYPE -> TYPEDO -> STORED or TABLE or LEN_ or CHECK
+            STORED -> COPY_ -> COPY -> TYPE
+            TABLE -> LENLENS -> CODELENS -> LEN_
+            LEN_ -> LEN
+    Read deflate codes in fixed or dynamic block:
                 LEN -> LENEXT or LIT or TYPE
                 LENEXT -> DIST -> DISTEXT -> MATCH -> LEN
                 LIT -> LEN
@@ -70,23 +77,26 @@ typedef enum {
         CHECK -> LENGTH -> DONE
  */
 
-/* state maintained between inflate() calls.  Approximately 7K bytes. */
+/* State maintained between inflate() calls -- approximately 7K bytes, not
+   including the allocated sliding window, which is up to 32K bytes. */
 struct inflate_state {
+    z_streamp strm;             /* pointer back to this zlib stream */
     inflate_mode mode;          /* current inflate mode */
     int last;                   /* true if processing last block */
-    int wrap;                   /* bit 0 true for zlib, bit 1 true for gzip */
+    int wrap;                   /* bit 0 true for zlib, bit 1 true for gzip,
+                                   bit 2 true to validate check value */
     int havedict;               /* true if dictionary provided */
     int flags;                  /* gzip header method and flags (0 if zlib) */
     unsigned dmax;              /* zlib header max distance (INFLATE_STRICT) */
     unsigned long check;        /* protected copy of check value */
     unsigned long total;        /* protected copy of output count */
- /*   gz_headerp head; */           /* where to save gzip header information */
+    /* gz_headerp head; */      /* where to save gzip header information */
         /* sliding window */
     unsigned wbits;             /* log base 2 of requested window size */
     unsigned wsize;             /* window size or zero if not using window */
     unsigned whave;             /* valid bytes in the window */
-    unsigned write;             /* window write index */
-    unsigned char *window;  /* allocated sliding window, if needed */
+    unsigned wnext;             /* window write index */
+    unsigned char *window;      /* allocated sliding window, if needed */
         /* bit accumulator */
     unsigned long hold;         /* input bit accumulator */
     unsigned bits;              /* number of bits in "in" */
@@ -96,8 +106,8 @@ struct inflate_state {
         /* for table and code decoding */
     unsigned extra;             /* extra bits needed */
         /* fixed and dynamic code tables */
-    code const *lencode;    /* starting table for length/literal codes */
-    code const *distcode;   /* starting table for distance codes */
+    code const *lencode;        /* starting table for length/literal codes */
+    code const *distcode;       /* starting table for distance codes */
     unsigned lenbits;           /* index bits for lencode */
     unsigned distbits;          /* index bits for distcode */
         /* dynamic table building */
@@ -105,15 +115,11 @@ struct inflate_state {
     unsigned nlen;              /* number of length code lengths */
     unsigned ndist;             /* number of distance code lengths */
     unsigned have;              /* number of code lengths in lens[] */
-    code *next;             /* next available space in codes[] */
+    code *next;                 /* next available space in codes[] */
     unsigned short lens[320];   /* temporary storage for code lengths */
     unsigned short work[288];   /* work area for code table building */
     code codes[ENOUGH];         /* space for code tables */
+    int sane;                   /* if false, allow invalid distance too far */
+    int back;                   /* bits back of last unprocessed length/lit */
+    unsigned was;               /* initial length of match */
 };
-
-/* Reverse the bytes in a 32-bit value */
-#define REVERSE(q) \
-    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
-     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
-
-#endif
diff --git a/lib/zlib_inflate/inftrees.c b/lib/zlib_inflate/inftrees.c
index 028943052926..a3c64216b5c7 100644
--- a/lib/zlib_inflate/inftrees.c
+++ b/lib/zlib_inflate/inftrees.c
@@ -1,5 +1,5 @@
 /* inftrees.c -- generate Huffman trees for efficient decoding
- * Copyright (C) 1995-2005 Mark Adler
+ * Copyright (C) 1995-2017 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
  */
 
@@ -36,11 +36,11 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
     unsigned fill;              /* index for replicating entries */
     unsigned low;               /* low bits for current root entry */
     unsigned mask;              /* mask for low root bits */
-    code this;                  /* table entry for duplication */
-    code *next;             /* next available space in table */
+    code here;                  /* table entry for duplication */
+    code *next;                 /* next available space in table */
     const unsigned short *base;     /* base value table to use */
     const unsigned short *extra;    /* extra bits table to use */
-    int end;                    /* use base and extra for symbol > end */
+    unsigned match;             /* use base and extra for symbol >= match */
     unsigned short count[MAXBITS+1];    /* number of codes of each length */
     unsigned short offs[MAXBITS+1];     /* offsets in table for each length */
     static const unsigned short lbase[31] = { /* Length codes 257..285 base */
@@ -48,7 +48,7 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
         35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0};
     static const unsigned short lext[31] = { /* Length codes 257..285 extra */
         16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
-        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 201, 196};
+        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 77, 202};
     static const unsigned short dbase[32] = { /* Distance codes 0..29 base */
         1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
         257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
@@ -101,15 +101,15 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
         if (count[max] != 0) break;
     if (root > max) root = max;
     if (max == 0) {                     /* no symbols to code at all */
-        this.op = (unsigned char)64;    /* invalid code marker */
-        this.bits = (unsigned char)1;
-        this.val = (unsigned short)0;
-        *(*table)++ = this;             /* make a table to force an error */
-        *(*table)++ = this;
+        here.op = (unsigned char)64;    /* invalid code marker */
+        here.bits = (unsigned char)1;
+        here.val = (unsigned short)0;
+        *(*table)++ = here;             /* make a table to force an error */
+        *(*table)++ = here;
         *bits = 1;
         return 0;     /* no symbols, but wait for decoding to report error */
     }
-    for (min = 1; min < MAXBITS; min++)
+    for (min = 1; min < max; min++)
         if (count[min] != 0) break;
     if (root < min) root = min;
 
@@ -152,11 +152,10 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
        entered in the tables.
 
        used keeps track of how many table entries have been allocated from the
-       provided *table space.  It is checked when a LENS table is being made
-       against the space in *table, ENOUGH, minus the maximum space needed by
-       the worst case distance code, MAXD.  This should never happen, but the
-       sufficiency of ENOUGH has not been proven exhaustively, hence the check.
-       This assumes that when type == LENS, bits == 9.
+       provided *table space.  It is checked for LENS and DIST tables against
+       the constants ENOUGH_LENS and ENOUGH_DISTS to guard against changes in
+       the initial root table size constants.  See the comments in inftrees.h
+       for more information.
 
        sym increments through all symbols, and the loop terminates when
        all codes of length max, i.e. all codes, have been processed.  This
@@ -168,19 +167,17 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
     switch (type) {
     case CODES:
         base = extra = work;    /* dummy value--not used */
-        end = 19;
+        match = 20;
         break;
     case LENS:
         base = lbase;
-        base -= 257;
         extra = lext;
-        extra -= 257;
-        end = 256;
+        match = 257;
         break;
-    default:            /* DISTS */
+    default:    /* DISTS */
         base = dbase;
         extra = dext;
-        end = -1;
+        match = 0;
     }
 
     /* initialize state for loop */
@@ -195,24 +192,25 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
     mask = used - 1;            /* mask for comparing low */
 
     /* check available table space */
-    if (type == LENS && used >= ENOUGH - MAXD)
+    if ((type == LENS && used > ENOUGH_LENS) ||
+        (type == DISTS && used > ENOUGH_DISTS))
         return 1;
 
     /* process all codes and make table entries */
     for (;;) {
         /* create table entry */
-        this.bits = (unsigned char)(len - drop);
-        if ((int)(work[sym]) < end) {
-            this.op = (unsigned char)0;
-            this.val = work[sym];
+        here.bits = (unsigned char)(len - drop);
+        if (work[sym] + 1U < match) {
+            here.op = (unsigned char)0;
+            here.val = work[sym];
         }
-        else if ((int)(work[sym]) > end) {
-            this.op = (unsigned char)(extra[work[sym]]);
-            this.val = base[work[sym]];
+        else if (work[sym] >= match) {
+            here.op = (unsigned char)(extra[work[sym] - match]);
+            here.val = base[work[sym] - match];
         }
         else {
-            this.op = (unsigned char)(32 + 64);         /* end of block */
-            this.val = 0;
+            here.op = (unsigned char)(32 + 64);         /* end of block */
+            here.val = 0;
         }
 
         /* replicate for those indices with low len bits equal to huff */
@@ -221,7 +219,7 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
         min = fill;                 /* save offset to next table */
         do {
             fill -= incr;
-            next[(huff >> drop) + fill] = this;
+            next[(huff >> drop) + fill] = here;
         } while (fill != 0);
 
         /* backwards increment the len-bit code huff */
@@ -263,7 +261,8 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
 
             /* check for enough space */
             used += 1U << curr;
-            if (type == LENS && used >= ENOUGH - MAXD)
+            if ((type == LENS && used > ENOUGH_LENS) ||
+                (type == DISTS && used > ENOUGH_DISTS))
                 return 1;
 
             /* point entry in root table to sub-table */
@@ -274,38 +273,14 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes,
         }
     }
 
-    /*
-       Fill in rest of table for incomplete codes.  This loop is similar to the
-       loop above in incrementing huff for table indices.  It is assumed that
-       len is equal to curr + drop, so there is no loop needed to increment
-       through high index bits.  When the current sub-table is filled, the loop
-       drops back to the root table to fill in any remaining entries there.
-     */
-    this.op = (unsigned char)64;                /* invalid code marker */
-    this.bits = (unsigned char)(len - drop);
-    this.val = (unsigned short)0;
-    while (huff != 0) {
-        /* when done with sub-table, drop back to root table */
-        if (drop != 0 && (huff & mask) != low) {
-            drop = 0;
-            len = root;
-            next = *table;
-            this.bits = (unsigned char)len;
-        }
-
-        /* put invalid code marker in table */
-        next[huff >> drop] = this;
-
-        /* backwards increment the len-bit code huff */
-        incr = 1U << (len - 1);
-        while (huff & incr)
-            incr >>= 1;
-        if (incr != 0) {
-            huff &= incr - 1;
-            huff += incr;
-        }
-        else
-            huff = 0;
+    /* fill in remaining table entry if code is incomplete (guaranteed to have
+       at most one remaining entry, since if the code is incomplete, the
+       maximum code length that was allowed to get this far is one bit) */
+    if (huff != 0) {
+        here.op = (unsigned char)64;            /* invalid code marker */
+        here.bits = (unsigned char)(len - drop);
+        here.val = (unsigned short)0;
+        next[huff] = here;
     }
 
     /* set return parameters */
diff --git a/lib/zlib_inflate/inftrees.h b/lib/zlib_inflate/inftrees.h
index b70b4731ac7a..fe4307fcfbe3 100644
--- a/lib/zlib_inflate/inftrees.h
+++ b/lib/zlib_inflate/inftrees.h
@@ -1,8 +1,5 @@
-#ifndef INFTREES_H
-#define INFTREES_H
-
 /* inftrees.h -- header to use inftrees.c
- * Copyright (C) 1995-2005 Mark Adler
+ * Copyright (C) 1995-2005, 2010 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
  */
 
@@ -38,22 +35,28 @@ typedef struct {
     01000000 - invalid code
  */
 
-/* Maximum size of dynamic tree.  The maximum found in a long but non-
-   exhaustive search was 1444 code structures (852 for length/literals
-   and 592 for distances, the latter actually the result of an
-   exhaustive search).  The true maximum is not known, but the value
-   below is more than safe. */
-#define ENOUGH 2048
-#define MAXD 592
+/* Maximum size of the dynamic table.  The maximum number of code structures is
+   1444, which is the sum of 852 for literal/length codes and 592 for distance
+   codes.  These values were found by exhaustive searches using the program
+   examples/enough.c found in the zlib distribtution.  The arguments to that
+   program are the number of symbols, the initial root table size, and the
+   maximum bit length of a code.  "enough 286 9 15" for literal/length codes
+   returns returns 852, and "enough 30 6 15" for distance codes returns 592.
+   The initial root table size (9 or 6) is found in the fifth argument of the
+   inflate_table() calls in inflate.c and infback.c.  If the root table size is
+   changed, then these maximum sizes would be need to be recalculated and
+   updated. */
+#define ENOUGH_LENS 852
+#define ENOUGH_DISTS 592
+#define ENOUGH (ENOUGH_LENS+ENOUGH_DISTS)
 
-/* Type of code to build for inftable() */
+/* Type of code to build for inflate_table() */
 typedef enum {
     CODES,
     LENS,
     DISTS
 } codetype;
 
-extern int zlib_inflate_table (codetype type, unsigned short *lens,
+int zlib_inflate_table (codetype type, unsigned short *lens,
                              unsigned codes, code **table,
                              unsigned *bits, unsigned short *work);
-#endif
-- 
2.31.1.365.ga2a05a39c5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-05-03 17:41 ` Linus Torvalds
@ 2021-05-03 19:18   ` Vineet Gupta
  2021-05-05 17:44   ` Heiko Carstens
  1 sibling, 0 replies; 8+ messages in thread
From: Vineet Gupta @ 2021-05-03 19:18 UTC (permalink / raw)
  To: Linus Torvalds, Vineet Gupta
  Cc: Arnd Bergmann, Peter Zijlstra, Masahiro Yamada, Jann Horn, lkml,
	arcml, linux-arch

On 5/3/21 10:41 AM, Linus Torvalds wrote:
> On Fri, Apr 30, 2021 at 1:46 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>> I've hit a mainline gcc 10.2 (also gcc 9.3) bug which triggers at -O3
>> causing wrong codegen.
> So it does seem to be a gcc bug or at least mis-feature where gcc gets
> the aliasing decision wrong when vectorizing the code.
>
> That said, the fact that gcc even tries to vectorize the code shows
> how pointless it was for us to (years ago) try to make it use 16-bit
> accesses in the first place.
>
> So can you try this patch that basically reverts some of those
> kernel-specific changes to zlib's inffast.c - and does so by just
> upgrading it to a newer version from a more modern zlib (which in this
> case still means "from 2017", but that's the most recent version there
> is).
>
> This is a fairly quick hack, and I really tried to keep it to just
> inffast.c and inftrees.c with a few minimal fixups to inflate.c
> itself.
>
> Most of the changes are for naming (which seems to be at least partly
> for C++ namespace reasons, ie "this" -> "here"), but there's some
> cleanup wrt maximum table sizes etc too.
>
> NOTE! I have not tested this patch in any other way than "it compiles
> with allmodconfig". The diff looks reasonable to me, but that's all
> I'm really going to say.
>
> Does this avoid the gcc -O3 problem for you?

Yes it does. I built the following:

2021-05-03 b93bedcf8fa4 Update zlib inffast code to more modern zlib
2021-02-26 ea680985468f ARC: fix CONFIG_HARDENED_USERCOPY
2021-04-23 f7118f8ada1b ARC: entry: fix off-by-one error in syscall 
number validation
2021-04-21 1cb7eefda7ed ARC: kgdb: add 'fallthrough' to prevent a warning
2021-03-22 163630b2d95b arc: Fix typos/spellos
2021-04-11 d434405aaab7 Linux 5.12-rc7

And it seems to be DTRT

| Linux version 5.12.0-rc7-00005-gb93bedcf8fa4 
(vineetg@vineetg-Latitude-7400) (arc-linux-gcc.br_real (Buildroot 
2021.02-6-g5e29ba7bf732) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #678 
PREEMPT Mon May 3 11:29:32 PDT 2021
| Memory @ 80000000 [1024M]
| ...
|   with environment:
|     HOME=/
|     TERM=linux
| Starting syslogd: OK
| Starting klogd: OK
| Running sysctl: OK
| $
| $ zcat /proc/config.gz | egrep "(OPTIM|COMPRESSION_GZIP)"
| CONFIG_INITRAMFS_COMPRESSION_GZIP=y
| # CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
| CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y
|
| $ free -m
|               total        used        free      shared buff/cache   
available
| Mem:           1012           3         978          31 31         972
| Swap:
|

> It would be lovely if somebody also took a look at some of the other
> zlib code, like inflate.c itself. But some of that code has rather
> subtle changes for things like s390 hardware support, and we have
> thihngs like our fallthrough rules etc, so it gets a bit hairier.

I took a quick look, but there some to be subtle state machine changes 
in inflate.c which I'm not comfortable mucking around with.

> Which is why I did just this fairly minimal part.
>
> Note that the commit message has a "Not-yet-signed-off-by:" from me.
> That's purely because I wanted it to be obvious that this needs a lot
> more testing - I'm not comfy with this patch unless somebody takes it
> upon themselves to actually test it under different loads (and
> different architectures).

Maybe give it some time to bake in linux-next for a 5.14 inclusion ?
Thx again for jumping in and steering gcc folks to right conclusions.

-Vineet

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Heads up: gcc miscompiling initramfs zlib decompression code at -O3
  2021-05-03 17:41 ` Linus Torvalds
  2021-05-03 19:18   ` Vineet Gupta
@ 2021-05-05 17:44   ` Heiko Carstens
  1 sibling, 0 replies; 8+ messages in thread
From: Heiko Carstens @ 2021-05-05 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vineet Gupta, Arnd Bergmann, Peter Zijlstra, Masahiro Yamada,
	Jann Horn, lkml, arcml, linux-arch, Mikhail Zaslonko,
	Ilya Leoshkevich, Vasily Gorbik

On Mon, May 03, 2021 at 10:41:45AM -0700, Linus Torvalds wrote:
> It would be lovely if somebody also took a look at some of the other
> zlib code, like inflate.c itself. But some of that code has rather
> subtle changes for things like s390 hardware support, and we have
> thihngs like our fallthrough rules etc, so it gets a bit hairier.
> 
> Which is why I did just this fairly minimal part.
> 
> Note that the commit message has a "Not-yet-signed-off-by:" from me.
> That's purely because I wanted it to be obvious that this needs a lot
> more testing - I'm not comfy with this patch unless somebody takes it
> upon themselves to actually test it under different loads (and
> different architectures).

The patch below is required on top of your patch to make it compile
for s390 as well.
Tested with kernel image decompression, and also btrfs with file
compression; both software and hardware compression.
Everything seems to work.

diff --git a/lib/zlib_dfltcc/dfltcc_inflate.c b/lib/zlib_dfltcc/dfltcc_inflate.c
index fb60b5a6a1cb..3bb3e431b79f 100644
--- a/lib/zlib_dfltcc/dfltcc_inflate.c
+++ b/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: Zlib
 
+#include <linux/zlib.h>
+#include "../zlib_inflate/inftrees.h"
 #include "../zlib_inflate/inflate.h"
 #include "dfltcc_util.h"
 #include "dfltcc.h"
@@ -122,7 +124,7 @@ dfltcc_inflate_action dfltcc_inflate(
     param->cvt = CVT_ADLER32;
     param->sbb = state->bits;
     param->hl = state->whave; /* Software and hardware history formats match */
-    param->ho = (state->write - state->whave) & ((1 << HB_BITS) - 1);
+    param->ho = (state->wnext - state->whave) & ((1 << HB_BITS) - 1);
     if (param->hl)
         param->nt = 0; /* Honor history for the first block */
     param->cv = state->check;
@@ -137,7 +139,7 @@ dfltcc_inflate_action dfltcc_inflate(
     state->last = cc == DFLTCC_CC_OK;
     state->bits = param->sbb;
     state->whave = param->hl;
-    state->write = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
+    state->wnext = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
     state->check = param->cv;
     if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
         /* Report an error if stream is corrupted */
diff --git a/lib/zlib_inflate/inflate.h b/lib/zlib_inflate/inflate.h
index 6ece8efd879b..52314b8b28ea 100644
--- a/lib/zlib_inflate/inflate.h
+++ b/lib/zlib_inflate/inflate.h
@@ -1,3 +1,6 @@
+#ifndef __ZLIB_INFLATE_INFLATE_H
+#define __ZLIB_INFLATE_INFLATE_H
+
 /* inflate.h -- internal inflate state definition
  * Copyright (C) 1995-2016 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
@@ -123,3 +126,5 @@ struct inflate_state {
     int back;                   /* bits back of last unprocessed length/lit */
     unsigned was;               /* initial length of match */
 };
+
+#endif /* __ZLIB_INFLATE_INFLATE_H */
diff --git a/lib/zlib_inflate/inftrees.h b/lib/zlib_inflate/inftrees.h
index fe4307fcfbe3..39d5d90d1258 100644
--- a/lib/zlib_inflate/inftrees.h
+++ b/lib/zlib_inflate/inftrees.h
@@ -1,3 +1,6 @@
+#ifndef __ZLIB_INFLATE_INFTREES_H
+#define __ZLIB_INFLATE_INFTREES_H
+
 /* inftrees.h -- header to use inftrees.c
  * Copyright (C) 1995-2005, 2010 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
@@ -60,3 +63,5 @@ typedef enum {
 int zlib_inflate_table (codetype type, unsigned short *lens,
                              unsigned codes, code **table,
                              unsigned *bits, unsigned short *work);
+
+#endif /* __ZLIB_INFLATE_INFTREES_H */

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-05-05 17:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 20:46 Heads up: gcc miscompiling initramfs zlib decompression code at -O3 Vineet Gupta
2021-04-30 22:06 ` Linus Torvalds
2021-04-30 22:44   ` Vineet Gupta
2021-04-30 22:52     ` Linus Torvalds
2021-05-01 21:03   ` Oleksandr Natalenko
2021-05-03 17:41 ` Linus Torvalds
2021-05-03 19:18   ` Vineet Gupta
2021-05-05 17:44   ` Heiko Carstens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).