* Userspace regression in LTS and stable kernels @ 2019-02-13 17:57 Samuel Dionne-Riel 2019-02-13 18:00 ` Samuel Dionne-Riel 2019-02-13 23:36 ` Richard Weinberger 0 siblings, 2 replies; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-13 17:57 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, graham Hi, I am posting as a representative of the NixOS Linux distribution, about a userspace regression on 5.0-rc* which recently was backported to the 4.14.99, 4.19.21 and 4.20.8 current LTS and stable versions. The issue has been reported to the bug tracker, bug 202497, but seems to have gone unnoticed by the maintainers. The issue seems to break userspace for long-standing patterns in the NixOS distribution, with regards to use of the shebangs. Here is an example shebang causing an issue: #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl (The shebang was artificially wrapped spaces replaced by newlines) Another contributor tracked the regression it to commit 8099b047ecc431518b9bb6bdbba3549bbecdc343 in the 5.0-rc* tree. I bring no particular fix to the issue, but I believe it should at least be fast-tracked to a revert for the stable and LTS branches, and since 5.0 might drop soon, a solution worked on, or possibly a revert until one is figured out. Thanks, -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-13 17:57 Userspace regression in LTS and stable kernels Samuel Dionne-Riel @ 2019-02-13 18:00 ` Samuel Dionne-Riel 2019-02-13 23:36 ` Richard Weinberger 1 sibling, 0 replies; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-13 18:00 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, graham Sorry, had an issue when sending the e-mail and lost a bit of context at the end, here's the missing bit: --- [...] Note: I wish to be CC'ed for answers and comments as I am not subscribed to the mailing list, thanks. References: * NixOS bug 53672, https://github.com/NixOS/nixpkgs/issues/53672 * Bug 202497, https://bugzilla.kernel.org/show_bug.cgi?id=202497 Thanks, On 13/02/2019, Samuel Dionne-Riel <samuel@dionne-riel.com> wrote: > Hi, > > I am posting as a representative of the NixOS Linux distribution, > about a userspace regression on 5.0-rc* which recently was backported > to the 4.14.99, 4.19.21 and 4.20.8 current LTS and stable versions. > The issue has been reported to the bug tracker, bug 202497, but seems > to have gone unnoticed by the maintainers. > > The issue seems to break userspace for long-standing patterns in the > NixOS distribution, with regards to use of the shebangs. > > Here is an example shebang causing an issue: > > #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl > -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl > -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl > -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl > -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl > > (The shebang was artificially wrapped spaces replaced by newlines) > > Another contributor tracked the regression it to commit > 8099b047ecc431518b9bb6bdbba3549bbecdc343 in the 5.0-rc* tree. > > I bring no particular fix to the issue, but I believe it should at > least be fast-tracked to a revert for the stable and LTS branches, and > since 5.0 might drop soon, a solution worked on, or possibly a revert > until one is figured out. > > Thanks, > > -- > — Samuel Dionne-Riel > -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-13 17:57 Userspace regression in LTS and stable kernels Samuel Dionne-Riel 2019-02-13 18:00 ` Samuel Dionne-Riel @ 2019-02-13 23:36 ` Richard Weinberger 2019-02-14 0:41 ` Samuel Dionne-Riel ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Richard Weinberger @ 2019-02-13 23:36 UTC (permalink / raw) To: Samuel Dionne-Riel Cc: LKML, Linus Torvalds, graham, Oleg Nesterov, Kees Cook, mhocko, Andrew Morton [CC'in relevant folks] On Thu, Feb 14, 2019 at 12:19 AM Samuel Dionne-Riel <samuel@dionne-riel.com> wrote: > > Hi, > > I am posting as a representative of the NixOS Linux distribution, > about a userspace regression on 5.0-rc* which recently was backported > to the 4.14.99, 4.19.21 and 4.20.8 current LTS and stable versions. > The issue has been reported to the bug tracker, bug 202497, but seems > to have gone unnoticed by the maintainers. > > The issue seems to break userspace for long-standing patterns in the > NixOS distribution, with regards to use of the shebangs. > > Here is an example shebang causing an issue: > > #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl > -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl > -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl > -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl > -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl This this ever work correctly? It is longer than BINPRM_BUF_SIZE. > (The shebang was artificially wrapped spaces replaced by newlines) > > Another contributor tracked the regression it to commit > 8099b047ecc431518b9bb6bdbba3549bbecdc343 in the 5.0-rc* tree. > > I bring no particular fix to the issue, but I believe it should at > least be fast-tracked to a revert for the stable and LTS branches, and > since 5.0 might drop soon, a solution worked on, or possibly a revert > until one is figured out. Your shebang line exceeds BINPRM_BUF_SIZE. Before the said commit the kernel silently truncated the shebang line (and corrupted it), now it tells the user that the line is too long. Thanks, //richard ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-13 23:36 ` Richard Weinberger @ 2019-02-14 0:41 ` Samuel Dionne-Riel 2019-02-14 0:54 ` Kees Cook 2019-02-14 0:41 ` Kees Cook 2019-02-14 17:56 ` Linus Torvalds 2 siblings, 1 reply; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-14 0:41 UTC (permalink / raw) To: Richard Weinberger Cc: LKML, Linus Torvalds, graham, Oleg Nesterov, Kees Cook, mhocko, Andrew Morton Thanks for CC'ing relevant folks. On 13/02/2019, Richard Weinberger <richard.weinberger@gmail.com> wrote: > [CC'in relevant folks] > > On Thu, Feb 14, 2019 at 12:19 AM Samuel Dionne-Riel > <samuel@dionne-riel.com> wrote: >> >> Here is an example shebang causing an issue: >> >> #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl >> -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl >> -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl >> -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl >> -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl > > This this ever work correctly? It is longer than BINPRM_BUF_SIZE. > Yes, this is longer than BINPRM_BUF_SIZE. This worked when the interpreter knew to re-read the shebang, which among other interpreters, perl did. https://perl5.git.perl.org/perl.git/blob/04db542212fdad3a62f13afe741c99028f4bf799:/toke.c#l5524 >> (The shebang was artificially wrapped spaces replaced by newlines) >> >> Another contributor tracked the regression it to commit >> 8099b047ecc431518b9bb6bdbba3549bbecdc343 in the 5.0-rc* tree. >> >> I bring no particular fix to the issue, but I believe it should at >> least be fast-tracked to a revert for the stable and LTS branches, and >> since 5.0 might drop soon, a solution worked on, or possibly a revert >> until one is figured out. > > Your shebang line exceeds BINPRM_BUF_SIZE. > Before the said commit the kernel silently truncated the shebang line > (and corrupted it), > now it tells the user that the line is too long. > Yes, the shebang line exceeds BINPRM_BUF_SIZE. Before, the interpreter was still used (assuming it wasn't cut by the length), and the interpreter was free to re-read the shebang if desired. With the new behaviour, instead of executing the interpreter with a possibly truncated command line, the default script interpreter is used, meaning that the perl script is being interpreted by the wrong interpreter, giving copious amounts of irrelevant errors. This is not telling the user that the line is too long. This provably breaks the userland as under 4.14.98 the perl interpreter is used when running the script, and under 4.14.99 bash is used when running the script. For a given script with a shebang of #!(124*A)ZBBB: The behaviour as of before the regression: sh: ./foolish: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZ: bad interpreter: No such file or directory As expected, it cuts the shebang. The behaviour following the change is the script ends up being executed by the shell interpreter, as if there was no shebang, and +x. Quoted from `man 3 exec` > If the header of a file isn't recognized (the attempted execve(2) failed with the error ENOEXEC), these functions will execute the shell (/bin/sh) with the path of the file as its first argument. (If this attempt fails, no further searching is done.) The now returned ENOEXEC might be "more right", but changes the semantics of a truncated shebang. Here I am not debating the validity of using a truncated shebang, but showing that the userspace behaviour changed in an unexpected and breaking behaviour. Am I under a wrong assumption that the kernel shouldn't break userspace? -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 0:41 ` Samuel Dionne-Riel @ 2019-02-14 0:54 ` Kees Cook 2019-02-14 1:27 ` Samuel Dionne-Riel 0 siblings, 1 reply; 22+ messages in thread From: Kees Cook @ 2019-02-14 0:54 UTC (permalink / raw) To: Samuel Dionne-Riel Cc: Richard Weinberger, LKML, Linus Torvalds, Graham Christensen, Oleg Nesterov, Michal Hocko, Andrew Morton On Wed, Feb 13, 2019 at 4:41 PM Samuel Dionne-Riel <samuel@dionne-riel.com> wrote: > Before, the interpreter was still used (assuming it wasn't cut by the > length), and the interpreter was free to re-read the shebang if > desired. Oh awesome. Yeah, so, nevermind about the WARN_ONCE(). So, to address the "wrong binary" problem, how about we ENOEXEC only if no newline or spaces are found in the string? -- Kees Cook ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 0:54 ` Kees Cook @ 2019-02-14 1:27 ` Samuel Dionne-Riel 2019-02-14 1:35 ` Kees Cook 0 siblings, 1 reply; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-14 1:27 UTC (permalink / raw) To: Kees Cook Cc: Richard Weinberger, LKML, Linus Torvalds, Graham Christensen, Oleg Nesterov, Michal Hocko, Andrew Morton On 13/02/2019, Kees Cook <keescook@chromium.org> wrote: > On Wed, Feb 13, 2019 at 4:41 PM Samuel Dionne-Riel > <samuel@dionne-riel.com> wrote: >> Before, the interpreter was still used (assuming it wasn't cut by the >> length), and the interpreter was free to re-read the shebang if >> desired. > > So, to address the "wrong binary" problem, how about we ENOEXEC only > if no newline or spaces are found in the string? > If I understand right, you're asking whether it should return NOEXEC if, of the first 128 bytes of the shebang, there are no spaces, but a too long shebang? I wouldn't know for sure. The behaviour would change. Instead failing due to trying to execute a shortened path, it would fall back to the shell interpreter interpreting the file, which, due to the inclusion of a specific shebang, might be a wrong assumption still. Here I believe it's still in the "undefined behaviour" territory, but one where it fails early for the userspace. I don't have a strong opinion, but having a special case depending on whitespace or not (well, possibility of the interpreter being truncated or not) feels off. As an end-user, I would rather it truncates, and show the truncated interpreter it tried to use (behaviour before regression), rather than fail in a way where the libc will continue executing using another unexpected interpreter. Thinking in the principle of least astonishment, I would be less surprised to see a truncated path on exec() than seeing exec() use an unexpected interpreter. -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 1:27 ` Samuel Dionne-Riel @ 2019-02-14 1:35 ` Kees Cook 2019-02-14 3:16 ` Samuel Dionne-Riel 0 siblings, 1 reply; 22+ messages in thread From: Kees Cook @ 2019-02-14 1:35 UTC (permalink / raw) To: Samuel Dionne-Riel Cc: Richard Weinberger, LKML, Linus Torvalds, Graham Christensen, Oleg Nesterov, Michal Hocko, Andrew Morton On Wed, Feb 13, 2019 at 5:27 PM Samuel Dionne-Riel <samuel@dionne-riel.com> wrote: > If I understand right, you're asking whether it should return NOEXEC > if, of the first 128 bytes of the shebang, there are no spaces, but a > too long shebang? I wouldn't know for sure. The behaviour would > change. Instead failing due to trying to execute a shortened path, it > would fall back to the shell interpreter interpreting the file, which, > due to the inclusion of a specific shebang, might be a wrong > assumption still. Here I believe it's still in the "undefined > behaviour" territory, but one where it fails early for the userspace. The original problem that was trying to be fixed here was to disallow execution of a truncated interpreter path. It was assumed argument truncate was just as bad, but it's not, since the interpreter can (and does!) re-read the script to get the right arguments. So, I've sent a fix-up patch that should disallow the path truncation, but pass through the argument truncation as before. This passes all the tests I built: $ ls -l /AAA*/perl -rwxr-xr-x 1 root root 129 Feb 13 17:17 /AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl -rwxr-xr-x 1 root root 129 Feb 13 17:17 /AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl -rwxr-xr-x 1 root root 129 Feb 13 17:17 /AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl -rwxr-xr-x 1 root root 129 Feb 13 17:17 /AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl -rwxr-xr-x 1 root root 129 Feb 13 17:17 /AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl $ ./test.pl Arg # 0 : /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl Arg # 1 : -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-Fi Arg # 2 : ./test.pl $ ./AAAA.pl Error: no such file "I should fail to run huge interp\n" $ ./A128.pl Error: no such file "I should fail to run 128 byte buf interp\n" $ ./A127.pl Error: no such file "I should fail to run 127 byte buf interp\n" $ ./A126.pl Arg # 0 : '/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl' Arg # 1 : './A126.pl' $ ./A125space.pl Arg # 0 : '/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/perl' Arg # 1 : './A125space.pl' Are you able to test the patch and report back? Thanks again for bringing this to our attention! -- Kees Cook ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 1:35 ` Kees Cook @ 2019-02-14 3:16 ` Samuel Dionne-Riel 0 siblings, 0 replies; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-14 3:16 UTC (permalink / raw) To: Kees Cook Cc: Richard Weinberger, LKML, Linus Torvalds, Graham Christensen, Oleg Nesterov, Michal Hocko, Andrew Morton On 13/02/2019, Kees Cook <keescook@chromium.org> wrote: > The original problem that was trying to be fixed here was to disallow > execution of a truncated interpreter path. It was assumed argument > truncate was just as bad, but it's not, since the interpreter can (and > does!) re-read the script to get the right arguments. > > So, I've sent a fix-up patch that should disallow the path truncation, > but pass through the argument truncation as before. This passes all > the tests I built: > > [...] > > Are you able to test the patch and report back? The patch works as implemented. It also fixes the specific regression which affected NixOS. This was verified to work once applied to 4.14 in our testing infra. Confidence is high enough that I don't think I need to test other LTS/stable versions. Though, I have one minor doubt in mind. Looking at man 2 execve, ENOEXEC An executable is not in a recognized format, is for the wrong architecture, or has some other format error that means it cannot be executed. I can see "or some other format error" could be misapplied to mean ENOEXEC on failure to read the shebang, but I'm thinking it's kinda abusing the meaning behind the failure. The format was recognized, as a shebang, but it was impossible to use the shebang. If I were to misuse an error code, I would probably misuse ENAMETOOLONG. I'm still doubting ENOEXEC is safe to not cause issues since a truncated interpreter name (not shebang) will end up with a different behaviour than expected in the exec(3) userspace scenario, where with ENOEXEC the shell will be used instead of failing. Though, this is a different face to the same root regression reported here; our initial issue with the regression can be deemed fully resolved with the patch. In all cases, I think the man page will need an update to describe the new failure mode with truncated shebangs, and describe the non-failure mode when truncating arguments. > Thanks again for bringing this to our attention! Thanks for the quick turnaround! -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-13 23:36 ` Richard Weinberger 2019-02-14 0:41 ` Samuel Dionne-Riel @ 2019-02-14 0:41 ` Kees Cook 2019-02-14 17:56 ` Linus Torvalds 2 siblings, 0 replies; 22+ messages in thread From: Kees Cook @ 2019-02-14 0:41 UTC (permalink / raw) To: Richard Weinberger Cc: Samuel Dionne-Riel, LKML, Linus Torvalds, graham, Oleg Nesterov, Michal Hocko, Andrew Morton On Wed, Feb 13, 2019 at 3:36 PM Richard Weinberger <richard.weinberger@gmail.com> wrote: > > [CC'in relevant folks] > > On Thu, Feb 14, 2019 at 12:19 AM Samuel Dionne-Riel > <samuel@dionne-riel.com> wrote: > > > > Hi, > > > > I am posting as a representative of the NixOS Linux distribution, > > about a userspace regression on 5.0-rc* which recently was backported > > to the 4.14.99, 4.19.21 and 4.20.8 current LTS and stable versions. > > The issue has been reported to the bug tracker, bug 202497, but seems > > to have gone unnoticed by the maintainers. > > > > The issue seems to break userspace for long-standing patterns in the > > NixOS distribution, with regards to use of the shebangs. > > > > Here is an example shebang causing an issue: > > > > #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl > > -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl > > -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl > > -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl > > -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl > > This this ever work correctly? It is longer than BINPRM_BUF_SIZE. > > > (The shebang was artificially wrapped spaces replaced by newlines) > > > > Another contributor tracked the regression it to commit > > 8099b047ecc431518b9bb6bdbba3549bbecdc343 in the 5.0-rc* tree. > > > > I bring no particular fix to the issue, but I believe it should at > > least be fast-tracked to a revert for the stable and LTS branches, and > > since 5.0 might drop soon, a solution worked on, or possibly a revert > > until one is figured out. > > Your shebang line exceeds BINPRM_BUF_SIZE. > Before the said commit the kernel silently truncated the shebang line > (and corrupted it), > now it tells the user that the line is too long. Yeah, it looks like it just truncates: $ cat /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl #!/usr/bin/perl print "Arg # 0 : $0\n"; $counter = 1; foreach my $a (@ARGV) { print "Arg # $counter : $a\n"; $counter++; } $ cat test.pl #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl print "I am the script\n"; 4.20.7: $ ./test.pl Arg # 0 : /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl Arg # 1 : -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-Fi Arg # 2 : ./test.pl 4.20.8: $ ./test.pl Error: no such file "I am the script\n" (My shell seems to fall back to direct shell execution) Since this is breaking existing userspace, we should probably switch back to the truncation, but do a WARN_ONCE or something so there's a visible hint _somewhere_ about the (long standing) issue? What do you think Oleg? -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-13 23:36 ` Richard Weinberger 2019-02-14 0:41 ` Samuel Dionne-Riel 2019-02-14 0:41 ` Kees Cook @ 2019-02-14 17:56 ` Linus Torvalds 2019-02-14 20:20 ` Andrew Morton 2 siblings, 1 reply; 22+ messages in thread From: Linus Torvalds @ 2019-02-14 17:56 UTC (permalink / raw) To: Richard Weinberger Cc: Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko, Andrew Morton On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger <richard.weinberger@gmail.com> wrote: > > Your shebang line exceeds BINPRM_BUF_SIZE. > Before the said commit the kernel silently truncated the shebang line > (and corrupted it), > now it tells the user that the line is too long. It doesn't matter if it "corrupted" things by truncating it. All that matters is "it used to work, now it doesn't" Yes, maybe it never *should* have worked. And yes, it's sad that people apparently had cases that depended on this odd behavior, but there we are. I see that Kees has a patch to fix it up. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 17:56 ` Linus Torvalds @ 2019-02-14 20:20 ` Andrew Morton 2019-02-15 7:00 ` Greg Kroah-Hartman 0 siblings, 1 reply; 22+ messages in thread From: Andrew Morton @ 2019-02-14 20:20 UTC (permalink / raw) To: Linus Torvalds Cc: Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko, Greg Kroah-Hartman On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > <richard.weinberger@gmail.com> wrote: > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > Before the said commit the kernel silently truncated the shebang line > > (and corrupted it), > > now it tells the user that the line is too long. > > It doesn't matter if it "corrupted" things by truncating it. All that > matters is "it used to work, now it doesn't" > > Yes, maybe it never *should* have worked. And yes, it's sad that > people apparently had cases that depended on this odd behavior, but > there we are. > > I see that Kees has a patch to fix it up. > Greg, I think we have a problem here. 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang string") wasn't marked for backporting. And, presumably as a consequence, Kees's fix "exec: load_script: allow interpreter argument truncation" was not marked for backporting. 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. I don't know if Oleg considered backporting that patch. I certainly did (I always do), and I decided against doing so. Yet there it is. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-14 20:20 ` Andrew Morton @ 2019-02-15 7:00 ` Greg Kroah-Hartman 2019-02-15 7:13 ` Greg Kroah-Hartman 2019-02-15 9:10 ` Michal Hocko 0 siblings, 2 replies; 22+ messages in thread From: Greg Kroah-Hartman @ 2019-02-15 7:00 UTC (permalink / raw) To: Andrew Morton Cc: stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > <richard.weinberger@gmail.com> wrote: > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > Before the said commit the kernel silently truncated the shebang line > > > (and corrupted it), > > > now it tells the user that the line is too long. > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > matters is "it used to work, now it doesn't" > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > people apparently had cases that depended on this odd behavior, but > > there we are. > > > > I see that Kees has a patch to fix it up. > > > > Greg, I think we have a problem here. > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > string") wasn't marked for backporting. And, presumably as a > consequence, Kees's fix "exec: load_script: allow interpreter argument > truncation" was not marked for backporting. > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. It came in 5.0-rc1, so it fits the "in a Linus released kernel" requirement. If we are to wait until it shows up in a -final, that would be months too late for almost all of these types of patches that are picked up. > I don't know if Oleg considered backporting that patch. I certainly > did (I always do), and I decided against doing so. Yet there it is. This came in through Sasha's tools, which give people a week or so to say "hey, this isn't a stable patch!" and it seems everyone ignored that :( Where is Kees's fix? I'll be glad to queue it up, or just revert the above commit, which ever people think is easiest. thanks, greg k-h ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 7:00 ` Greg Kroah-Hartman @ 2019-02-15 7:13 ` Greg Kroah-Hartman 2019-02-15 9:10 ` Michal Hocko 1 sibling, 0 replies; 22+ messages in thread From: Greg Kroah-Hartman @ 2019-02-15 7:13 UTC (permalink / raw) To: Andrew Morton Cc: stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko On Fri, Feb 15, 2019 at 08:00:22AM +0100, Greg Kroah-Hartman wrote: > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > Before the said commit the kernel silently truncated the shebang line > > > > (and corrupted it), > > > > now it tells the user that the line is too long. > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > matters is "it used to work, now it doesn't" > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > people apparently had cases that depended on this odd behavior, but > > > there we are. > > > > > > I see that Kees has a patch to fix it up. > > > > > > > Greg, I think we have a problem here. > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > string") wasn't marked for backporting. And, presumably as a > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > truncation" was not marked for backporting. > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > requirement. If we are to wait until it shows up in a -final, that > would be months too late for almost all of these types of patches that > are picked up. > > > I don't know if Oleg considered backporting that patch. I certainly > > did (I always do), and I decided against doing so. Yet there it is. > > This came in through Sasha's tools, which give people a week or so to > say "hey, this isn't a stable patch!" and it seems everyone ignored that > :( > > Where is Kees's fix? I'll be glad to queue it up, or just revert the > above commit, which ever people think is easiest. Ah, I see the fix now, _after_ I just pushed out a bunch of stable releases. I'll go queue it up and push it out with just that fix in it now... thanks, greg k-h ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 7:00 ` Greg Kroah-Hartman 2019-02-15 7:13 ` Greg Kroah-Hartman @ 2019-02-15 9:10 ` Michal Hocko 2019-02-15 9:20 ` Greg Kroah-Hartman 1 sibling, 1 reply; 22+ messages in thread From: Michal Hocko @ 2019-02-15 9:10 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri 15-02-19 08:00:22, Greg KH wrote: > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > Before the said commit the kernel silently truncated the shebang line > > > > (and corrupted it), > > > > now it tells the user that the line is too long. > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > matters is "it used to work, now it doesn't" > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > people apparently had cases that depended on this odd behavior, but > > > there we are. > > > > > > I see that Kees has a patch to fix it up. > > > > > > > Greg, I think we have a problem here. > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > string") wasn't marked for backporting. And, presumably as a > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > truncation" was not marked for backporting. > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > requirement. If we are to wait until it shows up in a -final, that > would be months too late for almost all of these types of patches that > are picked up. rc1 is just a too early. Waiting few more rcs or even a final release for something that people do not see as an issue should be just fine. Consider this particular patch and tell me why it had to be rushed in the first place. The original code was broken for _years_ but I do not remember anybody would be complaining. > > I don't know if Oleg considered backporting that patch. I certainly > > did (I always do), and I decided against doing so. Yet there it is. > > This came in through Sasha's tools, which give people a week or so to > say "hey, this isn't a stable patch!" and it seems everyone ignored that > :( I thought we were through this already. Automagic autoselection of patches in the core kernel (or mmotm tree patches in particular) is too dangerous. We try hard to consider each and every patch for stable. Even if something slips through then it is much more preferred to ask for a stable backport in the respective email thread and wait for a conclusion before adding it. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 9:10 ` Michal Hocko @ 2019-02-15 9:20 ` Greg Kroah-Hartman 2019-02-15 9:42 ` Michal Hocko 0 siblings, 1 reply; 22+ messages in thread From: Greg Kroah-Hartman @ 2019-02-15 9:20 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: > On Fri 15-02-19 08:00:22, Greg KH wrote: > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > > Before the said commit the kernel silently truncated the shebang line > > > > > (and corrupted it), > > > > > now it tells the user that the line is too long. > > > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > > matters is "it used to work, now it doesn't" > > > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > > people apparently had cases that depended on this odd behavior, but > > > > there we are. > > > > > > > > I see that Kees has a patch to fix it up. > > > > > > > > > > Greg, I think we have a problem here. > > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > > string") wasn't marked for backporting. And, presumably as a > > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > > truncation" was not marked for backporting. > > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > > requirement. If we are to wait until it shows up in a -final, that > > would be months too late for almost all of these types of patches that > > are picked up. > > rc1 is just a too early. Waiting few more rcs or even a final release > for something that people do not see as an issue should be just fine. > Consider this particular patch and tell me why it had to be rushed in > the first place. The original code was broken for _years_ but I do not > remember anybody would be complaining. This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 came out on Jan 6. Over a month delay. > > > I don't know if Oleg considered backporting that patch. I certainly > > > did (I always do), and I decided against doing so. Yet there it is. > > > > This came in through Sasha's tools, which give people a week or so to > > say "hey, this isn't a stable patch!" and it seems everyone ignored that > > :( > > I thought we were through this already. Automagic autoselection of > patches in the core kernel (or mmotm tree patches in particular) is too > dangerous. We try hard to consider each and every patch for stable. Even > if something slips through then it is much more preferred to ask for a > stable backport in the respective email thread and wait for a conclusion > before adding it. We have a list of blacklisted files/subsystems for people that do not want this to happen to their area of the kernel. The patch seemed to make sense, and it passed all known tests that we currently have. Sometimes things will slip through like this, it happens. And really, a 3 day turn-around-time to resolve this is pretty good, don't you think? It also seems like we need another test to catch this problem from ever happening again :) thanks, greg k-h ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 9:20 ` Greg Kroah-Hartman @ 2019-02-15 9:42 ` Michal Hocko 2019-02-15 15:19 ` Sasha Levin 0 siblings, 1 reply; 22+ messages in thread From: Michal Hocko @ 2019-02-15 9:42 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri 15-02-19 10:20:13, Greg KH wrote: > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: > > On Fri 15-02-19 08:00:22, Greg KH wrote: > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > > > Before the said commit the kernel silently truncated the shebang line > > > > > > (and corrupted it), > > > > > > now it tells the user that the line is too long. > > > > > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > > > matters is "it used to work, now it doesn't" > > > > > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > > > people apparently had cases that depended on this odd behavior, but > > > > > there we are. > > > > > > > > > > I see that Kees has a patch to fix it up. > > > > > > > > > > > > > Greg, I think we have a problem here. > > > > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > > > string") wasn't marked for backporting. And, presumably as a > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > > > truncation" was not marked for backporting. > > > > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > > > requirement. If we are to wait until it shows up in a -final, that > > > would be months too late for almost all of these types of patches that > > > are picked up. > > > > rc1 is just a too early. Waiting few more rcs or even a final release > > for something that people do not see as an issue should be just fine. > > Consider this particular patch and tell me why it had to be rushed in > > the first place. The original code was broken for _years_ but I do not > > remember anybody would be complaining. > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 > came out on Jan 6. Over a month delay. Obviously not long enough. > > > > I don't know if Oleg considered backporting that patch. I certainly > > > > did (I always do), and I decided against doing so. Yet there it is. > > > > > > This came in through Sasha's tools, which give people a week or so to > > > say "hey, this isn't a stable patch!" and it seems everyone ignored that > > > :( > > > > I thought we were through this already. Automagic autoselection of > > patches in the core kernel (or mmotm tree patches in particular) is too > > dangerous. We try hard to consider each and every patch for stable. Even > > if something slips through then it is much more preferred to ask for a > > stable backport in the respective email thread and wait for a conclusion > > before adding it. > > We have a list of blacklisted files/subsystems for people that do not > want this to happen to their area of the kernel. The patch seemed to > make sense, and it passed all known tests that we currently have. Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But this is one of the area where things that make sense might still break because it is hard to assume what userspace depends on. > Sometimes things will slip through like this, it happens. And really, a > 3 day turn-around-time to resolve this is pretty good, don't you think? Yes, but that doesn't make any difference on the fact that this was not marked for stable and I still think this is not a stable material - at least not at this moment. > It also seems like we need another test to catch this problem from ever > happening again :) Agreed on this. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 9:42 ` Michal Hocko @ 2019-02-15 15:19 ` Sasha Levin 2019-02-15 15:52 ` Michal Hocko 0 siblings, 1 reply; 22+ messages in thread From: Sasha Levin @ 2019-02-15 15:19 UTC (permalink / raw) To: Michal Hocko Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote: >On Fri 15-02-19 10:20:13, Greg KH wrote: >> On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: >> > On Fri 15-02-19 08:00:22, Greg KH wrote: >> > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: >> > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: >> > > > >> > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger >> > > > > <richard.weinberger@gmail.com> wrote: >> > > > > > >> > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. >> > > > > > Before the said commit the kernel silently truncated the shebang line >> > > > > > (and corrupted it), >> > > > > > now it tells the user that the line is too long. >> > > > > >> > > > > It doesn't matter if it "corrupted" things by truncating it. All that >> > > > > matters is "it used to work, now it doesn't" >> > > > > >> > > > > Yes, maybe it never *should* have worked. And yes, it's sad that >> > > > > people apparently had cases that depended on this odd behavior, but >> > > > > there we are. >> > > > > >> > > > > I see that Kees has a patch to fix it up. >> > > > > >> > > > >> > > > Greg, I think we have a problem here. >> > > > >> > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang >> > > > string") wasn't marked for backporting. And, presumably as a >> > > > consequence, Kees's fix "exec: load_script: allow interpreter argument >> > > > truncation" was not marked for backporting. >> > > > >> > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet >> > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. >> > > >> > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" >> > > requirement. If we are to wait until it shows up in a -final, that >> > > would be months too late for almost all of these types of patches that >> > > are picked up. >> > >> > rc1 is just a too early. Waiting few more rcs or even a final release >> > for something that people do not see as an issue should be just fine. >> > Consider this particular patch and tell me why it had to be rushed in >> > the first place. The original code was broken for _years_ but I do not >> > remember anybody would be complaining. >> >> This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 >> came out on Jan 6. Over a month delay. > >Obviously not long enough. You're assuming that if we wouldn't have taken this patch to stable somehow someone else would notice this bug and fix it. What test do we have that would catch this? Which testsuite tests for long shebang lines? Where is the test added together with this patch that covers this and similar cases? The fact is that many patches are not tested until they get to stable, whether we add them the same week they went upstream or months later. This is a great case for this: I doubt anyone but NixOS does this crazy thing with shebang lines, so who else would discover the bug? If this is indeed a case of us jumping the gun and shipping stuff too early before all tests are complete, please point me to the test that we missed and I'll make sure that for any future kernel release it gets run before we ship a stable kernel. >> > > > I don't know if Oleg considered backporting that patch. I certainly >> > > > did (I always do), and I decided against doing so. Yet there it is. >> > > >> > > This came in through Sasha's tools, which give people a week or so to >> > > say "hey, this isn't a stable patch!" and it seems everyone ignored that >> > > :( >> > >> > I thought we were through this already. Automagic autoselection of >> > patches in the core kernel (or mmotm tree patches in particular) is too >> > dangerous. We try hard to consider each and every patch for stable. Even >> > if something slips through then it is much more preferred to ask for a >> > stable backport in the respective email thread and wait for a conclusion >> > before adding it. >> >> We have a list of blacklisted files/subsystems for people that do not >> want this to happen to their area of the kernel. The patch seemed to >> make sense, and it passed all known tests that we currently have. > >Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But >this is one of the area where things that make sense might still break >because it is hard to assume what userspace depends on. Great, so the solution is to just not take these things into stable at all? The solution should be to add tests to the patches that go in there to verify their correctness and that they don't regress in the future. If you're really concerned about subsystems being brittle the solution is to improve their testing rather push stuff in and hope nothing explodes. On one hand you Ack it saying it looks great to you and should be merged, but on the other hand you're saying that you don't really trust the patch? Really, if I wouldn't pick this patch now what do you think would have happened? It would just pop up in a few months as we roll our stable kernel forward. >> Sometimes things will slip through like this, it happens. And really, a >> 3 day turn-around-time to resolve this is pretty good, don't you think? > >Yes, but that doesn't make any difference on the fact that this was not >marked for stable and I still think this is not a stable material - at >least not at this moment. Hindsight is 20/20 :) If people were good at understanding the impact and implications their patch has on the kernel we would never introduce new bugs! I'll happily list a bunch more patches where folks didn't think they're stable material, but turned out to be important fixes. -- Thanks, Sasha ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 15:19 ` Sasha Levin @ 2019-02-15 15:52 ` Michal Hocko 2019-02-15 16:18 ` Samuel Dionne-Riel 2019-02-15 18:00 ` Sasha Levin 0 siblings, 2 replies; 22+ messages in thread From: Michal Hocko @ 2019-02-15 15:52 UTC (permalink / raw) To: Sasha Levin Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri 15-02-19 10:19:12, Sasha Levin wrote: > On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote: > > On Fri 15-02-19 10:20:13, Greg KH wrote: > > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: > > > > On Fri 15-02-19 08:00:22, Greg KH wrote: > > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > > > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > > > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > > > > > Before the said commit the kernel silently truncated the shebang line > > > > > > > > (and corrupted it), > > > > > > > > now it tells the user that the line is too long. > > > > > > > > > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > > > > > matters is "it used to work, now it doesn't" > > > > > > > > > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > > > > > people apparently had cases that depended on this odd behavior, but > > > > > > > there we are. > > > > > > > > > > > > > > I see that Kees has a patch to fix it up. > > > > > > > > > > > > > > > > > > > Greg, I think we have a problem here. > > > > > > > > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > > > > > string") wasn't marked for backporting. And, presumably as a > > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > > > > > truncation" was not marked for backporting. > > > > > > > > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > > > > > > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > > > > > requirement. If we are to wait until it shows up in a -final, that > > > > > would be months too late for almost all of these types of patches that > > > > > are picked up. > > > > > > > > rc1 is just a too early. Waiting few more rcs or even a final release > > > > for something that people do not see as an issue should be just fine. > > > > Consider this particular patch and tell me why it had to be rushed in > > > > the first place. The original code was broken for _years_ but I do not > > > > remember anybody would be complaining. > > > > > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 > > > came out on Jan 6. Over a month delay. > > > > Obviously not long enough. > > You're assuming that if we wouldn't have taken this patch to stable > somehow someone else would notice this bug and fix it. > > What test do we have that would catch this? Which testsuite tests for > long shebang lines? Where is the test added together with this patch > that covers this and similar cases? The test is the "users out there". Right now we do not have any specialized test case because we haven't even realized it might break something. The main difference between breaking on the bleeding edge vs. stable tree is that people running on bleeding edge are more likely to expect a breakage while stable users would most likely prefer to not be guinea pigs and have, well stable trees. [...] > > > We have a list of blacklisted files/subsystems for people that do not > > > want this to happen to their area of the kernel. The patch seemed to > > > make sense, and it passed all known tests that we currently have. > > > > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But > > this is one of the area where things that make sense might still break > > because it is hard to assume what userspace depends on. > > Great, so the solution is to just not take these things into stable at > all? No, but if the patch author and the maintainer have considered the stable tree and haven't found convincing arguments to mark for stable then it is likely that the patch doesn't need an urgent backporting. > The solution should be to add tests to the patches that go in there > to verify their correctness and that they don't regress in the future. > > If you're really concerned about subsystems being brittle the solution > is to improve their testing rather push stuff in and hope nothing > explodes. > > On one hand you Ack it saying it looks great to you and should be > merged, but on the other hand you're saying that you don't really trust > the patch? No. But I didn't consider it a stable material. You just do not really need all the patches in the stable, right? I have already said that this code is there for ages and fixing it is good to have for future but considering that nobody was really complaining then a backporting just adds a risk and as it turned out that risk was really not zero. > Really, if I wouldn't pick this patch now what do you think would have > happened? It would just pop up in a few months as we roll our stable > kernel forward. and that would be a different kernel version and people kinda expect bugs with newer versions. This is not the case with the stable update. But I guess we are just repeating the same discussion over and over. Our expectations about what the stable kernel should be differs a lot. I would like to see fewer but only important fixes while you would like to take as many fixes as possible. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 15:52 ` Michal Hocko @ 2019-02-15 16:18 ` Samuel Dionne-Riel 2019-02-15 18:02 ` Sasha Levin 2019-02-15 18:00 ` Sasha Levin 1 sibling, 1 reply; 22+ messages in thread From: Samuel Dionne-Riel @ 2019-02-15 16:18 UTC (permalink / raw) To: Michal Hocko Cc: Sasha Levin, Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, LKML, graham, Oleg Nesterov, Kees Cook On 15/02/2019, Michal Hocko <mhocko@kernel.org> wrote: > On Fri 15-02-19 10:19:12, Sasha Levin wrote: >> On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote: >> > On Fri 15-02-19 10:20:13, Greg KH wrote: >> > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: >> > > > On Fri 15-02-19 08:00:22, Greg KH wrote: >> > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: >> > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds >> > > > > > <torvalds@linux-foundation.org> wrote: >> > > > > > >> > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger >> > > > > > > <richard.weinberger@gmail.com> wrote: >> > > > > > > > >> > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. >> > > > > > > > Before the said commit the kernel silently truncated the >> > > > > > > > shebang line >> > > > > > > > (and corrupted it), >> > > > > > > > now it tells the user that the line is too long. >> > > > > > > >> > > > > > > It doesn't matter if it "corrupted" things by truncating it. >> > > > > > > All that >> > > > > > > matters is "it used to work, now it doesn't" >> > > > > > > >> > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad >> > > > > > > that >> > > > > > > people apparently had cases that depended on this odd >> > > > > > > behavior, but >> > > > > > > there we are. >> > > > > > > >> > > > > > > I see that Kees has a patch to fix it up. >> > > > > > > >> > > > > > >> > > > > > Greg, I think we have a problem here. >> > > > > > >> > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate >> > > > > > shebang >> > > > > > string") wasn't marked for backporting. And, presumably as a >> > > > > > consequence, Kees's fix "exec: load_script: allow interpreter >> > > > > > argument >> > > > > > truncation" was not marked for backporting. >> > > > > > >> > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released >> > > > > > kernel, yet >> > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. >> > > > > >> > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" >> > > > > requirement. If we are to wait until it shows up in a -final, >> > > > > that >> > > > > would be months too late for almost all of these types of patches >> > > > > that >> > > > > are picked up. >> > > > >> > > > rc1 is just a too early. Waiting few more rcs or even a final >> > > > release >> > > > for something that people do not see as an issue should be just >> > > > fine. >> > > > Consider this particular patch and tell me why it had to be rushed >> > > > in >> > > > the first place. The original code was broken for _years_ but I do >> > > > not >> > > > remember anybody would be complaining. >> > > >> > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 >> > > came out on Jan 6. Over a month delay. >> > >> > Obviously not long enough. >> >> You're assuming that if we wouldn't have taken this patch to stable >> somehow someone else would notice this bug and fix it. >> >> What test do we have that would catch this? Which testsuite tests for >> long shebang lines? Where is the test added together with this patch >> that covers this and similar cases? > > The test is the "users out there". Right now we do not have any > specialized test case because we haven't even realized it might break > something. The main difference between breaking on the bleeding edge vs. > stable tree is that people running on bleeding edge are more likely to > expect a breakage while stable users would most likely prefer to not be > guinea pigs and have, well stable trees. > [...] > >> > > We have a list of blacklisted files/subsystems for people that do not >> > > want this to happen to their area of the kernel. The patch seemed to >> > > make sense, and it passed all known tests that we currently have. >> > >> > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But >> > this is one of the area where things that make sense might still break >> > because it is hard to assume what userspace depends on. >> >> Great, so the solution is to just not take these things into stable at >> all? > > No, but if the patch author and the maintainer have considered the > stable tree and haven't found convincing arguments to mark for stable > then it is likely that the patch doesn't need an urgent backporting. > >> The solution should be to add tests to the patches that go in there >> to verify their correctness and that they don't regress in the future. >> >> If you're really concerned about subsystems being brittle the solution >> is to improve their testing rather push stuff in and hope nothing >> explodes. >> >> On one hand you Ack it saying it looks great to you and should be >> merged, but on the other hand you're saying that you don't really trust >> the patch? > > No. But I didn't consider it a stable material. You just do not really > need all the patches in the stable, right? I have already said that this > code is there for ages and fixing it is good to have for future but > considering that nobody was really complaining then a backporting just > adds a risk and as it turned out that risk was really not zero. > I'm sorry to interject here, but the issue was reported on the Kernel.org Bugzilla on February 2nd - https://bugzilla.kernel.org/show_bug.cgi?id=202497 In the interest of better communication, if the need arises again, how should bugs in the RC kernels be reported so they (1) are spotted by the right maintainers and (2) not backported even though they were reported as causing breaking changes? >> Really, if I wouldn't pick this patch now what do you think would have >> happened? It would just pop up in a few months as we roll our stable >> kernel forward. > > and that would be a different kernel version and people kinda expect > bugs with newer versions. This is not the case with the stable update. > > But I guess we are just repeating the same discussion over and over. Our > expectations about what the stable kernel should be differs a lot. I > would like to see fewer but only important fixes while you would like to > take as many fixes as possible. > -- > Michal Hocko > SUSE Labs > -- — Samuel Dionne-Riel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 16:18 ` Samuel Dionne-Riel @ 2019-02-15 18:02 ` Sasha Levin 0 siblings, 0 replies; 22+ messages in thread From: Sasha Levin @ 2019-02-15 18:02 UTC (permalink / raw) To: Samuel Dionne-Riel Cc: Michal Hocko, Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, LKML, graham, Oleg Nesterov, Kees Cook On Fri, Feb 15, 2019 at 11:18:30AM -0500, Samuel Dionne-Riel wrote: >I'm sorry to interject here, but the issue was reported on the >Kernel.org Bugzilla on February 2nd > > - https://bugzilla.kernel.org/show_bug.cgi?id=202497 > >In the interest of better communication, if the need arises again, how >should bugs in the RC kernels be reported so they (1) are spotted by >the right maintainers and (2) not backported even though they were >reported as causing breaking changes? Sadly our bugzilla is rarely used, even though the information in that particular bug report is perfect. Maybe pinging LKML and stable@vger.kernel.org would be enough, specially if you know that it's a stable commit that caused the regression. -- Thanks, Sasha ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 15:52 ` Michal Hocko 2019-02-15 16:18 ` Samuel Dionne-Riel @ 2019-02-15 18:00 ` Sasha Levin 2019-02-18 12:56 ` Michal Hocko 1 sibling, 1 reply; 22+ messages in thread From: Sasha Levin @ 2019-02-15 18:00 UTC (permalink / raw) To: Michal Hocko Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri, Feb 15, 2019 at 04:52:00PM +0100, Michal Hocko wrote: >On Fri 15-02-19 10:19:12, Sasha Levin wrote: >> On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote: >> > On Fri 15-02-19 10:20:13, Greg KH wrote: >> > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: >> > > > On Fri 15-02-19 08:00:22, Greg KH wrote: >> > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: >> > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: >> > > > > > >> > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger >> > > > > > > <richard.weinberger@gmail.com> wrote: >> > > > > > > > >> > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. >> > > > > > > > Before the said commit the kernel silently truncated the shebang line >> > > > > > > > (and corrupted it), >> > > > > > > > now it tells the user that the line is too long. >> > > > > > > >> > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that >> > > > > > > matters is "it used to work, now it doesn't" >> > > > > > > >> > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that >> > > > > > > people apparently had cases that depended on this odd behavior, but >> > > > > > > there we are. >> > > > > > > >> > > > > > > I see that Kees has a patch to fix it up. >> > > > > > > >> > > > > > >> > > > > > Greg, I think we have a problem here. >> > > > > > >> > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang >> > > > > > string") wasn't marked for backporting. And, presumably as a >> > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument >> > > > > > truncation" was not marked for backporting. >> > > > > > >> > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet >> > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. >> > > > > >> > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" >> > > > > requirement. If we are to wait until it shows up in a -final, that >> > > > > would be months too late for almost all of these types of patches that >> > > > > are picked up. >> > > > >> > > > rc1 is just a too early. Waiting few more rcs or even a final release >> > > > for something that people do not see as an issue should be just fine. >> > > > Consider this particular patch and tell me why it had to be rushed in >> > > > the first place. The original code was broken for _years_ but I do not >> > > > remember anybody would be complaining. >> > > >> > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 >> > > came out on Jan 6. Over a month delay. >> > >> > Obviously not long enough. >> >> You're assuming that if we wouldn't have taken this patch to stable >> somehow someone else would notice this bug and fix it. >> >> What test do we have that would catch this? Which testsuite tests for >> long shebang lines? Where is the test added together with this patch >> that covers this and similar cases? > >The test is the "users out there". Right now we do not have any >specialized test case because we haven't even realized it might break >something. The main difference between breaking on the bleeding edge vs. >stable tree is that people running on bleeding edge are more likely to >expect a breakage while stable users would most likely prefer to not be >guinea pigs and have, well stable trees. >[...] Exactly, and my argument here is that no one really tests Linus's tree. Sure, folks run -rc kernels and report bugs, but no one actually runs these kernels at larger scales. Most "users out there" wouldn't see this patch until it ends up in a stable kernel. >> > > We have a list of blacklisted files/subsystems for people that do not >> > > want this to happen to their area of the kernel. The patch seemed to >> > > make sense, and it passed all known tests that we currently have. >> > >> > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But >> > this is one of the area where things that make sense might still break >> > because it is hard to assume what userspace depends on. >> >> Great, so the solution is to just not take these things into stable at >> all? > >No, but if the patch author and the maintainer have considered the >stable tree and haven't found convincing arguments to mark for stable >then it is likely that the patch doesn't need an urgent backporting. Are you suggesting that waiting longer would somehow made this "safer"? This goes back to my argument above. >> The solution should be to add tests to the patches that go in there >> to verify their correctness and that they don't regress in the future. >> >> If you're really concerned about subsystems being brittle the solution >> is to improve their testing rather push stuff in and hope nothing >> explodes. >> >> On one hand you Ack it saying it looks great to you and should be >> merged, but on the other hand you're saying that you don't really trust >> the patch? > >No. But I didn't consider it a stable material. You just do not really >need all the patches in the stable, right? I have already said that this >code is there for ages and fixing it is good to have for future but >considering that nobody was really complaining then a backporting just >adds a risk and as it turned out that risk was really not zero. > >> Really, if I wouldn't pick this patch now what do you think would have >> happened? It would just pop up in a few months as we roll our stable >> kernel forward. > >and that would be a different kernel version and people kinda expect >bugs with newer versions. This is not the case with the stable update. > >But I guess we are just repeating the same discussion over and over. Our >expectations about what the stable kernel should be differs a lot. I >would like to see fewer but only important fixes while you would like to >take as many fixes as possible. Maybe to clarify here: I don't want to blindly take as much patches as I can. I want to take patches based on testing results: if something looks like a fix and it passes all our tests, there shouldn't be a reason not to take it. My view is that humans are terrible at writing and understanding code: if folks fully understood the impact of their patches we would never have bugs, right? Assuming we both agree here that we make mistakes and introduce bugs, why do you think that these very same people fully understand whether a patch should go in stable or not? The approach of manually deciding if a patch needs to go in stable is wrong and it doesn't scale. We need to beef up our testing story and make these decisions based off of that, and not our error-prone brains that introduced these bugs to begin with. Look at the outcome of this very issue: people sprung into action and fixed this bug quickly, but how many tests were added as a result of this? How do we know it's not going to regress again? -- Thanks, Sasha ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Userspace regression in LTS and stable kernels 2019-02-15 18:00 ` Sasha Levin @ 2019-02-18 12:56 ` Michal Hocko 0 siblings, 0 replies; 22+ messages in thread From: Michal Hocko @ 2019-02-18 12:56 UTC (permalink / raw) To: Sasha Levin Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook On Fri 15-02-19 13:00:26, Sasha Levin wrote: > On Fri, Feb 15, 2019 at 04:52:00PM +0100, Michal Hocko wrote: > > On Fri 15-02-19 10:19:12, Sasha Levin wrote: > > > On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote: > > > > On Fri 15-02-19 10:20:13, Greg KH wrote: > > > > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote: > > > > > > On Fri 15-02-19 08:00:22, Greg KH wrote: > > > > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote: > > > > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > > > > > > > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger > > > > > > > > > <richard.weinberger@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE. > > > > > > > > > > Before the said commit the kernel silently truncated the shebang line > > > > > > > > > > (and corrupted it), > > > > > > > > > > now it tells the user that the line is too long. > > > > > > > > > > > > > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that > > > > > > > > > matters is "it used to work, now it doesn't" > > > > > > > > > > > > > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that > > > > > > > > > people apparently had cases that depended on this odd behavior, but > > > > > > > > > there we are. > > > > > > > > > > > > > > > > > > I see that Kees has a patch to fix it up. > > > > > > > > > > > > > > > > > > > > > > > > > Greg, I think we have a problem here. > > > > > > > > > > > > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang > > > > > > > > string") wasn't marked for backporting. And, presumably as a > > > > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument > > > > > > > > truncation" was not marked for backporting. > > > > > > > > > > > > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet > > > > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x. > > > > > > > > > > > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel" > > > > > > > requirement. If we are to wait until it shows up in a -final, that > > > > > > > would be months too late for almost all of these types of patches that > > > > > > > are picked up. > > > > > > > > > > > > rc1 is just a too early. Waiting few more rcs or even a final release > > > > > > for something that people do not see as an issue should be just fine. > > > > > > Consider this particular patch and tell me why it had to be rushed in > > > > > > the first place. The original code was broken for _years_ but I do not > > > > > > remember anybody would be complaining. > > > > > > > > > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1 > > > > > came out on Jan 6. Over a month delay. > > > > > > > > Obviously not long enough. > > > > > > You're assuming that if we wouldn't have taken this patch to stable > > > somehow someone else would notice this bug and fix it. > > > > > > What test do we have that would catch this? Which testsuite tests for > > > long shebang lines? Where is the test added together with this patch > > > that covers this and similar cases? > > > > The test is the "users out there". Right now we do not have any > > specialized test case because we haven't even realized it might break > > something. The main difference between breaking on the bleeding edge vs. > > stable tree is that people running on bleeding edge are more likely to > > expect a breakage while stable users would most likely prefer to not be > > guinea pigs and have, well stable trees. > > [...] > > Exactly, and my argument here is that no one really tests Linus's tree. I would beg to disagree. The testing coverage is smaller of course because most people are running on a distribution/stable kernels. > Sure, folks run -rc kernels and report bugs, but no one actually runs > these kernels at larger scales. And this just screams that a (much) more time has to pass before fixes which are nice-to-have are passed to the stable tree - assuming they are not fixing something that users of the said stable tree are seeing the issue of course. > Most "users out there" wouldn't see this patch until it ends up in a > stable kernel. ...and this would be on a kernel version upgrade when some breakage is expected and tolerated more than on minor version stable update. [...] > > But I guess we are just repeating the same discussion over and over. Our > > expectations about what the stable kernel should be differs a lot. I > > would like to see fewer but only important fixes while you would like to > > take as many fixes as possible. > > Maybe to clarify here: I don't want to blindly take as much patches as I > can. I want to take patches based on testing results: if something looks > like a fix and it passes all our tests, there shouldn't be a reason not > to take it. There are many things we do not have any tests for. E.g. I wasn't even aware that Perl (and others) are dealing with an excessive shebang input by re-reading the input. There are always going to be corner cases like that. The underlying thing is that nobody seem to be complaining about the original issue addressed by Oleg. So why the heck should we push it to the stable tree and _risk_ a regression. > My view is that humans are terrible at writing and understanding code: > if folks fully understood the impact of their patches we would never > have bugs, right? Assuming we both agree here that we make mistakes and > introduce bugs, why do you think that these very same people fully > understand whether a patch should go in stable or not? I haven't really seen a script that would be more efficient in this evaluation. With a lack of the full test coverage I do not see this going to change anytime soon. > The approach of manually deciding if a patch needs to go in stable is > wrong and it doesn't scale. We need to beef up our testing story and > make these decisions based off of that, and not our error-prone brains > that introduced these bugs to begin with. > > Look at the outcome of this very issue: people sprung into action and > fixed this bug quickly, but how many tests were added as a result of > this? How do we know it's not going to regress again? Yes, the issue got identified and analyzed quickly. There was no questioning this part. It is the regression in stable that bothers me. You have exposed users of a tree, which is supposed to be stable, to a bug which was totally unnecessary because nobody cared for the parsing behavior for years. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2019-02-18 12:56 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-13 17:57 Userspace regression in LTS and stable kernels Samuel Dionne-Riel 2019-02-13 18:00 ` Samuel Dionne-Riel 2019-02-13 23:36 ` Richard Weinberger 2019-02-14 0:41 ` Samuel Dionne-Riel 2019-02-14 0:54 ` Kees Cook 2019-02-14 1:27 ` Samuel Dionne-Riel 2019-02-14 1:35 ` Kees Cook 2019-02-14 3:16 ` Samuel Dionne-Riel 2019-02-14 0:41 ` Kees Cook 2019-02-14 17:56 ` Linus Torvalds 2019-02-14 20:20 ` Andrew Morton 2019-02-15 7:00 ` Greg Kroah-Hartman 2019-02-15 7:13 ` Greg Kroah-Hartman 2019-02-15 9:10 ` Michal Hocko 2019-02-15 9:20 ` Greg Kroah-Hartman 2019-02-15 9:42 ` Michal Hocko 2019-02-15 15:19 ` Sasha Levin 2019-02-15 15:52 ` Michal Hocko 2019-02-15 16:18 ` Samuel Dionne-Riel 2019-02-15 18:02 ` Sasha Levin 2019-02-15 18:00 ` Sasha Levin 2019-02-18 12:56 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).