* Bisected stability regression in 6.6
@ 2023-11-11  6:31 matoro
  2023-11-11  7:02 ` Bagas Sanjaya
  2023-11-11 21:21 ` Helge Deller
  0 siblings, 2 replies; 12+ messages in thread
From: matoro @ 2023-11-11  6:31 UTC
To: linux-parisc, deller, Linux Kernel Mailing List, Sam James

Hi Helge, I have bisected a regression in 6.6 which is causing userspace
segfaults at a significantly increased rate in kernel 6.6. There seems to
be a pathological case triggered by the ninja build tool. The test case I
have been using is cmake with ninja backend to attempt to build the nghttp2
package. In 6.6, this segfaults, not at the same location every time, but
with enough reliability that I was able to use it as a bisection regression
case, including immediately after a reboot. In the kernel log, these show
up as "trap #15: Data TLB miss fault" messages. Now these messages can and
do show up in 6.5 causing segfaults, but never immediately after a reboot
and infrequently enough that the system is stable. With kernel 6.6 I am
completely unable to build nghttp2 under any circumstances.

I have bisected this down to the following commit:

$ git bisect good
3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
commit 3033cd4307681c60db6d08f398a64484b36e0b0f
Author: Helge Deller <deller@gmx.de>
Date:   Sat Aug 19 00:53:28 2023 +0200

    parisc: Use generic mmap top-down layout and brk randomization

    parisc uses a top-down layout by default that exactly fits the generic
    functions, so get rid of arch specific code and use the generic version
    by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.

    Note that on parisc the stack always grows up and a "unlimited stack"
    simply means that the value as defined in CONFIG_STACK_MAX_DEFAULT_SIZE_MB
    should be used. So RLIM_INFINITY is not an indicator to use the legacy
    memory layout.

    Signed-off-by: Helge Deller <deller@gmx.de>

 arch/parisc/Kconfig             | 17 +++++++++++++
 arch/parisc/kernel/process.c    | 14 -----------
 arch/parisc/kernel/sys_parisc.c | 54 +---------------------------------------
 mm/util.c                       |  5 +++-
 4 files changed, 22 insertions(+), 68 deletions(-)

I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe ("parisc: Add
nop instructions after TLB inserts") on top of 6.6, but it does NOT fix the
issue.

Let me know if there is anything I can answer on this. I can provide full
remote access with BMC if it would help.
* Re: Bisected stability regression in 6.6 2023-11-11 6:31 Bisected stability regression in 6.6 matoro @ 2023-11-11 7:02 ` Bagas Sanjaya 2023-11-22 9:07 ` Linux regression tracking #update (Thorsten Leemhuis) 2023-11-11 21:21 ` Helge Deller 1 sibling, 1 reply; 12+ messages in thread From: Bagas Sanjaya @ 2023-11-11 7:02 UTC (permalink / raw) To: matoro, Linux PA-RISC Mailing List, Helge Deller, Linux Kernel Mailing List, Sam James, Linux Memory Management List, Linux Regressions Cc: James E.J. Bottomley, Andrew Morton, Peter Zijlstra, Rafael J. Wysocki, Gautham R. Shenoy, Josh Poimboeuf, Thomas Gleixner, Jens Axboe, John David Anglin [-- Attachment #1: Type: text/plain, Size: 2560 bytes --] On Sat, Nov 11, 2023 at 01:31:01AM -0500, matoro wrote: > Hi Helge, I have bisected a regression in 6.6 which is causing userspace > segfaults at a significantly increased rate in kernel 6.6. There seems to > be a pathological case triggered by the ninja build tool. The test case I > have been using is cmake with ninja backend to attempt to build the nghttp2 > package. In 6.6, this segfaults, not at the same location every time, but > with enough reliability that I was able to use it as a bisection regression > case, including immediately after a reboot. In the kernel log, these show > up as "trap #15: Data TLB miss fault" messages. Now these messages can and > do show up in 6.5 causing segfaults, but never immediately after a reboot > and infrequently enough that the system is stable. With kernel 6.6 I am > completely unable to build nghttp2 under any circumstances. 
> > I have bisected this down to the following commit: > > $ git bisect good > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit > commit 3033cd4307681c60db6d08f398a64484b36e0b0f > Author: Helge Deller <deller@gmx.de> > Date: Sat Aug 19 00:53:28 2023 +0200 > > parisc: Use generic mmap top-down layout and brk randomization > > parisc uses a top-down layout by default that exactly fits the generic > functions, so get rid of arch specific code and use the generic version > by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. > > Note that on parisc the stack always grows up and a "unlimited stack" > simply means that the value as defined in > CONFIG_STACK_MAX_DEFAULT_SIZE_MB > should be used. So RLIM_INFINITY is not an indicator to use the legacy > memory layout. > > Signed-off-by: Helge Deller <deller@gmx.de> > > arch/parisc/Kconfig | 17 +++++++++++++ > arch/parisc/kernel/process.c | 14 ----------- > arch/parisc/kernel/sys_parisc.c | 54 > +---------------------------------------- > mm/util.c | 5 +++- > 4 files changed, 22 insertions(+), 68 deletions(-) > > I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe ("parisc: Add > nop instructions after TLB inserts") on top of 6.6, but it does NOT fix the > issue. > > Let me know if there is anything I can answer on this. I can provide full > remote access with BMC if it would help. Thanks for the regression report. I'm adding it to regzbot: #regzbot ^introduced: 3033cd4307681c -- An old man doll... just what I always wanted! - Clara [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bisected stability regression in 6.6
  2023-11-11  7:02 ` Bagas Sanjaya
@ 2023-11-22  9:07   ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 12+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-11-22  9:07 UTC
To: Bagas Sanjaya, Linux PA-RISC Mailing List, Linux Kernel Mailing List, Sam James, Linux Memory Management List, Linux Regressions

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 11.11.23 08:02, Bagas Sanjaya wrote:
> On Sat, Nov 11, 2023 at 01:31:01AM -0500, matoro wrote:
>> Hi Helge, I have bisected a regression in 6.6 which is causing userspace
>> segfaults at a significantly increased rate in kernel 6.6.
> #regzbot ^introduced: 3033cd4307681c

#regzbot fix: 5f74f820f6fc844b95f9
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
* Re: Bisected stability regression in 6.6 2023-11-11 6:31 Bisected stability regression in 6.6 matoro 2023-11-11 7:02 ` Bagas Sanjaya @ 2023-11-11 21:21 ` Helge Deller 2023-11-11 21:27 ` Sam James 2023-11-11 21:28 ` matoro 1 sibling, 2 replies; 12+ messages in thread From: Helge Deller @ 2023-11-11 21:21 UTC (permalink / raw) To: matoro, linux-parisc, Linux Kernel Mailing List, Sam James On 11/11/23 07:31, matoro wrote: > Hi Helge, I have bisected a regression in 6.6 which is causing > userspace segfaults at a significantly increased rate in kernel 6.6. > There seems to be a pathological case triggered by the ninja build > tool. The test case I have been using is cmake with ninja backend to > attempt to build the nghttp2 package. In 6.6, this segfaults, not at > the same location every time, but with enough reliability that I was > able to use it as a bisection regression case, including immediately > after a reboot. In the kernel log, these show up as "trap #15: Data > TLB miss fault" messages. Now these messages can and do show up in > 6.5 causing segfaults, but never immediately after a reboot and > infrequently enough that the system is stable. With kernel 6.6 I am > completely unable to build nghttp2 under any circumstances. > > I have bisected this down to the following commit: > > $ git bisect good > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit > commit 3033cd4307681c60db6d08f398a64484b36e0b0f > Author: Helge Deller <deller@gmx.de> > Date: Sat Aug 19 00:53:28 2023 +0200 > > parisc: Use generic mmap top-down layout and brk randomization > > parisc uses a top-down layout by default that exactly fits the generic > functions, so get rid of arch specific code and use the generic version > by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. > > Note that on parisc the stack always grows up and a "unlimited stack" > simply means that the value as defined in CONFIG_STACK_MAX_DEFAULT_SIZE_MB > should be used. 
So RLIM_INFINITY is not an indicator to use the legacy
> memory layout.
>
> Signed-off-by: Helge Deller <deller@gmx.de>
>
> arch/parisc/Kconfig | 17 +++++++++++++
> arch/parisc/kernel/process.c | 14 -----------
> arch/parisc/kernel/sys_parisc.c | 54 +---------------------------------------
> mm/util.c | 5 +++-
> 4 files changed, 22 insertions(+), 68 deletions(-)

Thanks for your report!
I think it's quite unlikely that this patch introduces such a bad regression.
I'd suspect some other bad commit, but I'll try to reproduce.

In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try
to reproduce with CONFIG_BPF_JIT disabled?
The JIT is quite new in v6.6 and I did face some crashes and disabling
it helped me so far.

> I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe
> ("parisc: Add nop instructions after TLB inserts") on top of 6.6, but
> it does NOT fix the issue.

Ok.

Helge
* Re: Bisected stability regression in 6.6 2023-11-11 21:21 ` Helge Deller @ 2023-11-11 21:27 ` Sam James 2023-11-11 23:33 ` matoro 2023-11-11 21:28 ` matoro 1 sibling, 1 reply; 12+ messages in thread From: Sam James @ 2023-11-11 21:27 UTC (permalink / raw) To: Helge Deller; +Cc: matoro, linux-parisc, Linux Kernel Mailing List, Sam James Helge Deller <deller@gmx.de> writes: > On 11/11/23 07:31, matoro wrote: >> Hi Helge, I have bisected a regression in 6.6 which is causing >> userspace segfaults at a significantly increased rate in kernel 6.6. >> There seems to be a pathological case triggered by the ninja build >> tool. The test case I have been using is cmake with ninja backend to >> attempt to build the nghttp2 package. In 6.6, this segfaults, not at >> the same location every time, but with enough reliability that I was >> able to use it as a bisection regression case, including immediately >> after a reboot. In the kernel log, these show up as "trap #15: Data >> TLB miss fault" messages. Now these messages can and do show up in >> 6.5 causing segfaults, but never immediately after a reboot and >> infrequently enough that the system is stable. With kernel 6.6 I am >> completely unable to build nghttp2 under any circumstances. >> >> I have bisected this down to the following commit: >> >> $ git bisect good >> 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit >> commit 3033cd4307681c60db6d08f398a64484b36e0b0f >> Author: Helge Deller <deller@gmx.de> >> Date: Sat Aug 19 00:53:28 2023 +0200 >> >> parisc: Use generic mmap top-down layout and brk randomization >> >> parisc uses a top-down layout by default that exactly fits the generic >> functions, so get rid of arch specific code and use the generic version >> by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. >> >> Note that on parisc the stack always grows up and a "unlimited stack" >> simply means that the value as defined in CONFIG_STACK_MAX_DEFAULT_SIZE_MB >> should be used. 
So RLIM_INFINITY is not an indicator to use the legacy >> memory layout. >> >> Signed-off-by: Helge Deller <deller@gmx.de> >> >> arch/parisc/Kconfig | 17 +++++++++++++ >> arch/parisc/kernel/process.c | 14 ----------- >> arch/parisc/kernel/sys_parisc.c | 54 +---------------------------------------- >> mm/util.c | 5 +++- >> 4 files changed, 22 insertions(+), 68 deletions(-) > > Thanks for your report! > I think it's quite unlikely that this patch introduces such a bad regression. > I'd suspect some other bad commmit, but I'll try to reproduce. matoro, does a revert apply cleanly? Does it help? > > In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try > to reproduce with CONFIG_BPF_JIT disabled? > The JIT is quite new in v6.6 and I did face some crashes and disabling > it helped me so far. > >> I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe >> ("parisc: Add nop instructions after TLB inserts") on top of 6.6, but >> it does NOT fix the issue. > > Ok. > > Helge ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bisected stability regression in 6.6 2023-11-11 21:27 ` Sam James @ 2023-11-11 23:33 ` matoro 2023-11-12 1:22 ` Dr. David Alan Gilbert 2023-11-12 20:22 ` Helge Deller 0 siblings, 2 replies; 12+ messages in thread From: matoro @ 2023-11-11 23:33 UTC (permalink / raw) To: Sam James; +Cc: Helge Deller, linux-parisc, Linux Kernel Mailing List On 2023-11-11 16:27, Sam James wrote: > Helge Deller <deller@gmx.de> writes: > >> On 11/11/23 07:31, matoro wrote: >>> Hi Helge, I have bisected a regression in 6.6 which is causing >>> userspace segfaults at a significantly increased rate in kernel 6.6. >>> There seems to be a pathological case triggered by the ninja build >>> tool. The test case I have been using is cmake with ninja backend to >>> attempt to build the nghttp2 package. In 6.6, this segfaults, not at >>> the same location every time, but with enough reliability that I was >>> able to use it as a bisection regression case, including immediately >>> after a reboot. In the kernel log, these show up as "trap #15: Data >>> TLB miss fault" messages. Now these messages can and do show up in >>> 6.5 causing segfaults, but never immediately after a reboot and >>> infrequently enough that the system is stable. With kernel 6.6 I am >>> completely unable to build nghttp2 under any circumstances. >>> >>> I have bisected this down to the following commit: >>> >>> $ git bisect good >>> 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit >>> commit 3033cd4307681c60db6d08f398a64484b36e0b0f >>> Author: Helge Deller <deller@gmx.de> >>> Date: Sat Aug 19 00:53:28 2023 +0200 >>> >>> parisc: Use generic mmap top-down layout and brk randomization >>> >>> parisc uses a top-down layout by default that exactly fits the >>> generic >>> functions, so get rid of arch specific code and use the generic >>> version >>> by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. 
>>> >>> Note that on parisc the stack always grows up and a "unlimited stack" >>> simply means that the value as defined in >>> CONFIG_STACK_MAX_DEFAULT_SIZE_MB >>> should be used. So RLIM_INFINITY is not an indicator to use the >>> legacy >>> memory layout. >>> >>> Signed-off-by: Helge Deller <deller@gmx.de> >>> >>> arch/parisc/Kconfig | 17 +++++++++++++ >>> arch/parisc/kernel/process.c | 14 ----------- >>> arch/parisc/kernel/sys_parisc.c | 54 >>> +---------------------------------------- >>> mm/util.c | 5 +++- >>> 4 files changed, 22 insertions(+), 68 deletions(-) >> >> Thanks for your report! >> I think it's quite unlikely that this patch introduces such a bad >> regression. >> I'd suspect some other bad commmit, but I'll try to reproduce. > > matoro, does a revert apply cleanly? Does it help? Yes, I just tested this and it cleanly reverts on linux-6.6.y and the revert does fix the issue. >> >> In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try >> to reproduce with CONFIG_BPF_JIT disabled? >> The JIT is quite new in v6.6 and I did face some crashes and disabling >> it helped me so far. >> >>> I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe >>> ("parisc: Add nop instructions after TLB inserts") on top of 6.6, but >>> it does NOT fix the issue. >> >> Ok. >> >> Helge ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bisected stability regression in 6.6 2023-11-11 23:33 ` matoro @ 2023-11-12 1:22 ` Dr. David Alan Gilbert 2023-11-12 8:03 ` Helge Deller 2023-11-12 20:22 ` Helge Deller 1 sibling, 1 reply; 12+ messages in thread From: Dr. David Alan Gilbert @ 2023-11-12 1:22 UTC (permalink / raw) To: matoro, HelgeDeller, deller Cc: Sam James, linux-parisc, Linux Kernel Mailing List * matoro (matoro_mailinglist_kernel@matoro.tk) wrote: > On 2023-11-11 16:27, Sam James wrote: > > Helge Deller <deller@gmx.de> writes: > > > > > On 11/11/23 07:31, matoro wrote: > > > > Hi Helge, I have bisected a regression in 6.6 which is causing > > > > userspace segfaults at a significantly increased rate in kernel 6.6. > > > > There seems to be a pathological case triggered by the ninja build > > > > tool. The test case I have been using is cmake with ninja backend to > > > > attempt to build the nghttp2 package. In 6.6, this segfaults, not at > > > > the same location every time, but with enough reliability that I was > > > > able to use it as a bisection regression case, including immediately > > > > after a reboot. In the kernel log, these show up as "trap #15: Data > > > > TLB miss fault" messages. Now these messages can and do show up in > > > > 6.5 causing segfaults, but never immediately after a reboot and > > > > infrequently enough that the system is stable. With kernel 6.6 I am > > > > completely unable to build nghttp2 under any circumstances. 
> > > > > > > > I have bisected this down to the following commit: > > > > > > > > $ git bisect good > > > > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit > > > > commit 3033cd4307681c60db6d08f398a64484b36e0b0f > > > > Author: Helge Deller <deller@gmx.de> > > > > Date: Sat Aug 19 00:53:28 2023 +0200 > > > > > > > > parisc: Use generic mmap top-down layout and brk randomization > > > > > > > > parisc uses a top-down layout by default that exactly fits > > > > the generic > > > > functions, so get rid of arch specific code and use the > > > > generic version > > > > by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. > > > > > > > > Note that on parisc the stack always grows up and a "unlimited stack" > > > > simply means that the value as defined in > > > > CONFIG_STACK_MAX_DEFAULT_SIZE_MB > > > > should be used. So RLIM_INFINITY is not an indicator to use > > > > the legacy > > > > memory layout. > > > > > > > > Signed-off-by: Helge Deller <deller@gmx.de> > > > > > > > > arch/parisc/Kconfig | 17 +++++++++++++ > > > > arch/parisc/kernel/process.c | 14 ----------- > > > > arch/parisc/kernel/sys_parisc.c | 54 > > > > +---------------------------------------- > > > > mm/util.c | 5 +++- > > > > 4 files changed, 22 insertions(+), 68 deletions(-) > > > > > > Thanks for your report! > > > I think it's quite unlikely that this patch introduces such a bad > > > regression. > > > I'd suspect some other bad commmit, but I'll try to reproduce. > > > > matoro, does a revert apply cleanly? Does it help? > > Yes, I just tested this and it cleanly reverts on linux-6.6.y and the revert > does fix the issue. 
Helge:
  In that patch is:

diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4c..8810206444977 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -396,7 +396,10 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
        if (current->personality & ADDR_COMPAT_LAYOUT)
                return 1;

-       if (rlim_stack->rlim_cur == RLIM_INFINITY)
+       /* On parisc the stack always grows up - so a unlimited stack should
+        * not be an indicator to use the legacy memory layout. */
+       if (rlim_stack->rlim_cur == RLIM_INFINITY &&
+               !IS_ENABLED(CONFIG_STACK_GROWSUP))
                return 1;

        return sysctl_legacy_va_layout;

is that:
  '!IS_ENABLED(CONFIG_STACK_GROWSUP))'

the right way around?

That feels inverted to me; non-parisc don't have that config
set, so !IS_ENABLED... is true, so they return 1 instead of checking
the flag?

Dave

> > >
> > > In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try
> > > to reproduce with CONFIG_BPF_JIT disabled?
> > > The JIT is quite new in v6.6 and I did face some crashes and disabling
> > > it helped me so far.
> > >
> > > > I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe
> > > > ("parisc: Add nop instructions after TLB inserts") on top of 6.6, but
> > > > it does NOT fix the issue.
> > >
> > > Ok.
> > >
> > > Helge
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
* Re: Bisected stability regression in 6.6 2023-11-12 1:22 ` Dr. David Alan Gilbert @ 2023-11-12 8:03 ` Helge Deller 2023-11-12 12:07 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 12+ messages in thread From: Helge Deller @ 2023-11-12 8:03 UTC (permalink / raw) To: Dr. David Alan Gilbert, matoro, HelgeDeller Cc: Sam James, linux-parisc, Linux Kernel Mailing List On 11/12/23 02:22, Dr. David Alan Gilbert wrote: > * matoro (matoro_mailinglist_kernel@matoro.tk) wrote: >> On 2023-11-11 16:27, Sam James wrote: >>> Helge Deller <deller@gmx.de> writes: >>> >>>> On 11/11/23 07:31, matoro wrote: >>>>> Hi Helge, I have bisected a regression in 6.6 which is causing >>>>> userspace segfaults at a significantly increased rate in kernel 6.6. >>>>> There seems to be a pathological case triggered by the ninja build >>>>> tool. The test case I have been using is cmake with ninja backend to >>>>> attempt to build the nghttp2 package. In 6.6, this segfaults, not at >>>>> the same location every time, but with enough reliability that I was >>>>> able to use it as a bisection regression case, including immediately >>>>> after a reboot. In the kernel log, these show up as "trap #15: Data >>>>> TLB miss fault" messages. Now these messages can and do show up in >>>>> 6.5 causing segfaults, but never immediately after a reboot and >>>>> infrequently enough that the system is stable. With kernel 6.6 I am >>>>> completely unable to build nghttp2 under any circumstances. 
>>>>> >>>>> I have bisected this down to the following commit: >>>>> >>>>> $ git bisect good >>>>> 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit >>>>> commit 3033cd4307681c60db6d08f398a64484b36e0b0f >>>>> Author: Helge Deller <deller@gmx.de> >>>>> Date: Sat Aug 19 00:53:28 2023 +0200 >>>>> >>>>> parisc: Use generic mmap top-down layout and brk randomization >>>>> >>>>> parisc uses a top-down layout by default that exactly fits >>>>> the generic >>>>> functions, so get rid of arch specific code and use the >>>>> generic version >>>>> by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. >>>>> >>>>> Note that on parisc the stack always grows up and a "unlimited stack" >>>>> simply means that the value as defined in >>>>> CONFIG_STACK_MAX_DEFAULT_SIZE_MB >>>>> should be used. So RLIM_INFINITY is not an indicator to use >>>>> the legacy >>>>> memory layout. >>>>> >>>>> Signed-off-by: Helge Deller <deller@gmx.de> >>>>> >>>>> arch/parisc/Kconfig | 17 +++++++++++++ >>>>> arch/parisc/kernel/process.c | 14 ----------- >>>>> arch/parisc/kernel/sys_parisc.c | 54 >>>>> +---------------------------------------- >>>>> mm/util.c | 5 +++- >>>>> 4 files changed, 22 insertions(+), 68 deletions(-) >>>> >>>> Thanks for your report! >>>> I think it's quite unlikely that this patch introduces such a bad >>>> regression. >>>> I'd suspect some other bad commmit, but I'll try to reproduce. >>> >>> matoro, does a revert apply cleanly? Does it help? >> >> Yes, I just tested this and it cleanly reverts on linux-6.6.y and the revert >> does fix the issue. 
>
> Helge:
>   In that patch is:
>
> diff --git a/mm/util.c b/mm/util.c
> index dd12b9531ac4c..8810206444977 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -396,7 +396,10 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
>         if (current->personality & ADDR_COMPAT_LAYOUT)
>                 return 1;
>
> -       if (rlim_stack->rlim_cur == RLIM_INFINITY)
> +       /* On parisc the stack always grows up - so a unlimited stack should
> +        * not be an indicator to use the legacy memory layout. */
> +       if (rlim_stack->rlim_cur == RLIM_INFINITY &&
> +               !IS_ENABLED(CONFIG_STACK_GROWSUP))
>                 return 1;
>
>         return sysctl_legacy_va_layout;
>
> is that:
>   '!IS_ENABLED(CONFIG_STACK_GROWSUP))'
>
> the right way around?
>
> That feels inverted to me; non-parisc don't have that config
> set, so !IS_ENABLED... is true, so they return 1 instead of checking
> the flag?

Right. For non-parisc the behaviour didn't change with my patch, and this
is intended. If rlim_stack->rlim_cur == RLIM_INFINITY, non-parisc return 1 as before.

Note that matoro reported a regression specifically on the parisc platform.

This change:
-       if (rlim_stack->rlim_cur == RLIM_INFINITY)
+       if (rlim_stack->rlim_cur == RLIM_INFINITY &&
+               !IS_ENABLED(CONFIG_STACK_GROWSUP))
just changes the behaviour on parisc.
On parisc rlim_stack->rlim_cur == RLIM_INFINITY is always true, unless the user
changed the stack limit manually. If unchanged, mmap_is_legacy() should return
sysctl_legacy_va_layout, otherwise 1.

So, I think that part of the patch is OK.

Helge
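
[Editorial note for readers of the archive] The exchange above turns on how the extra !IS_ENABLED(CONFIG_STACK_GROWSUP) term changes the result of the generic mmap_is_legacy() in mm/util.c. The following is only a userspace sketch of the two versions shown in the quoted diff, not kernel code: the boolean parameters are hypothetical stand-ins for current->personality & ADDR_COMPAT_LAYOUT, rlim_stack->rlim_cur == RLIM_INFINITY, IS_ENABLED(CONFIG_STACK_GROWSUP) and sysctl_legacy_va_layout. Note also that, per the commit message, parisc used its own arch-specific code before the commit, so the "old" function here models only the generic helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic helper before the commit (the "-" line of the quoted diff).
 * All parameter names are hypothetical stand-ins, see the note above. */
static int mmap_is_legacy_old(bool compat_layout, bool stack_unlimited,
                              int sysctl_legacy)
{
    if (compat_layout)
        return 1;
    if (stack_unlimited)
        return 1;
    return sysctl_legacy;
}

/* Generic helper after the commit (the "+" lines of the quoted diff). */
static int mmap_is_legacy_new(bool compat_layout, bool stack_unlimited,
                              bool stack_growsup, int sysctl_legacy)
{
    if (compat_layout)
        return 1;
    /* An unlimited stack forces the legacy layout only when the
     * stack grows down, i.e. on everything except parisc. */
    if (stack_unlimited && !stack_growsup)
        return 1;
    return sysctl_legacy;
}

/* True when both versions agree for the given inputs. */
static bool same_result(bool compat_layout, bool stack_unlimited,
                        bool stack_growsup, int sysctl_legacy)
{
    return mmap_is_legacy_old(compat_layout, stack_unlimited, sysctl_legacy) ==
           mmap_is_legacy_new(compat_layout, stack_unlimited, stack_growsup,
                              sysctl_legacy);
}

/* Checks the two points made in the thread. */
static void self_check(void)
{
    /* Stack grows down: behaviour is unchanged for every input. */
    for (int c = 0; c <= 1; c++)
        for (int u = 0; u <= 1; u++)
            for (int s = 0; s <= 1; s++)
                assert(same_result(c, u, false, s));

    /* Stack grows up with an unlimited stack: the new version
     * falls through to the sysctl value instead of returning 1. */
    assert(mmap_is_legacy_new(false, true, true, 0) == 0);
    assert(mmap_is_legacy_new(false, true, true, 1) == 1);
}
```

Calling self_check() from a small main() exercises both cases. In this model, the stack-grows-down results are identical before and after, which matches Helge's answer to Dave; with stack_growsup set, an unlimited stack alone no longer selects the legacy layout.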
* Re: Bisected stability regression in 6.6 2023-11-12 8:03 ` Helge Deller @ 2023-11-12 12:07 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 12+ messages in thread From: Dr. David Alan Gilbert @ 2023-11-12 12:07 UTC (permalink / raw) To: Helge Deller; +Cc: matoro, Sam James, linux-parisc, Linux Kernel Mailing List * Helge Deller (deller@gmx.de) wrote: > On 11/12/23 02:22, Dr. David Alan Gilbert wrote: > > * matoro (matoro_mailinglist_kernel@matoro.tk) wrote: > > > On 2023-11-11 16:27, Sam James wrote: > > > > Helge Deller <deller@gmx.de> writes: > > > > > > > > > On 11/11/23 07:31, matoro wrote: > > > > > > Hi Helge, I have bisected a regression in 6.6 which is causing > > > > > > userspace segfaults at a significantly increased rate in kernel 6.6. > > > > > > There seems to be a pathological case triggered by the ninja build > > > > > > tool. The test case I have been using is cmake with ninja backend to > > > > > > attempt to build the nghttp2 package. In 6.6, this segfaults, not at > > > > > > the same location every time, but with enough reliability that I was > > > > > > able to use it as a bisection regression case, including immediately > > > > > > after a reboot. In the kernel log, these show up as "trap #15: Data > > > > > > TLB miss fault" messages. Now these messages can and do show up in > > > > > > 6.5 causing segfaults, but never immediately after a reboot and > > > > > > infrequently enough that the system is stable. With kernel 6.6 I am > > > > > > completely unable to build nghttp2 under any circumstances. 
> > > > > > > > > > > > I have bisected this down to the following commit: > > > > > > > > > > > > $ git bisect good > > > > > > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit > > > > > > commit 3033cd4307681c60db6d08f398a64484b36e0b0f > > > > > > Author: Helge Deller <deller@gmx.de> > > > > > > Date: Sat Aug 19 00:53:28 2023 +0200 > > > > > > > > > > > > parisc: Use generic mmap top-down layout and brk randomization > > > > > > > > > > > > parisc uses a top-down layout by default that exactly fits > > > > > > the generic > > > > > > functions, so get rid of arch specific code and use the > > > > > > generic version > > > > > > by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. > > > > > > > > > > > > Note that on parisc the stack always grows up and a "unlimited stack" > > > > > > simply means that the value as defined in > > > > > > CONFIG_STACK_MAX_DEFAULT_SIZE_MB > > > > > > should be used. So RLIM_INFINITY is not an indicator to use > > > > > > the legacy > > > > > > memory layout. > > > > > > > > > > > > Signed-off-by: Helge Deller <deller@gmx.de> > > > > > > > > > > > > arch/parisc/Kconfig | 17 +++++++++++++ > > > > > > arch/parisc/kernel/process.c | 14 ----------- > > > > > > arch/parisc/kernel/sys_parisc.c | 54 > > > > > > +---------------------------------------- > > > > > > mm/util.c | 5 +++- > > > > > > 4 files changed, 22 insertions(+), 68 deletions(-) > > > > > > > > > > Thanks for your report! > > > > > I think it's quite unlikely that this patch introduces such a bad > > > > > regression. > > > > > I'd suspect some other bad commmit, but I'll try to reproduce. > > > > > > > > matoro, does a revert apply cleanly? Does it help? > > > > > > Yes, I just tested this and it cleanly reverts on linux-6.6.y and the revert > > > does fix the issue. 
> > > > Helge: > > In that patch is: > > > > diff --git a/mm/util.c b/mm/util.c > > index dd12b9531ac4c..8810206444977 100644 > > --- a/mm/util.c > > +++ b/mm/util.c > > @@ -396,7 +396,10 @@ static int mmap_is_legacy(struct rlimit *rlim_stack) > > if (current->personality & ADDR_COMPAT_LAYOUT) > > return 1; > > > > - if (rlim_stack->rlim_cur == RLIM_INFINITY) > > + /* On parisc the stack always grows up - so a unlimited stack should > > + * not be an indicator to use the legacy memory layout. */ > > + if (rlim_stack->rlim_cur == RLIM_INFINITY && > > + !IS_ENABLED(CONFIG_STACK_GROWSUP)) > > return 1; > > > > return sysctl_legacy_va_layout; > > > > is that: > > '!IS_ENABLED(CONFIG_STACK_GROWSUP))' > > > > the right way around? > > > > That feels inverted to me; non-parisc don't have that config > > set, so !IS_ENABLED... is true, so they return 1 instead of checking > > the flag? > > Right. For non-parisc the behaviour didn't change with my patch, and this > is intended. If rlim_stack->rlim_cur == RLIM_INFINITY, non-parisc return 1 as before. > > Note that matoro reported a regression specifically on the parisc platform. Oh, that I missed. > This change: > - if (rlim_stack->rlim_cur == RLIM_INFINITY) > + if (rlim_stack->rlim_cur == RLIM_INFINITY && > + !IS_ENABLED(CONFIG_STACK_GROWSUP)) > just changes the behaviour on parisc. > On parisc rlim_stack->rlim_cur == RLIM_INFINITY" is always true, unless the user > changed the stack limit manually. If unchanged, mmap_is_legacy() should return > sysctl_legacy_va_layout, otherwise 1. > > So, I think that part of the patch is OK. OK, thanks for the clarification. Dave (P.S. and sorry screwing up one email in the header) > Helge -- -----Open up your eyes, open up your mind, open up your code ------- / Dr. David Alan Gilbert | Running GNU/Linux | Happy \ \ dave @ treblig.org | | In Hex / \ _________________________|_____ http://www.treblig.org |_______/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bisected stability regression in 6.6
  2023-11-11 23:33     ` matoro
  2023-11-12  1:22       ` Dr. David Alan Gilbert
@ 2023-11-12 20:22       ` Helge Deller
  2023-11-12 23:37         ` matoro
  1 sibling, 1 reply; 12+ messages in thread
From: Helge Deller @ 2023-11-12 20:22 UTC
  To: matoro; +Cc: Sam James, Helge Deller, linux-parisc, Linux Kernel Mailing List

* matoro <matoro_mailinglist_kernel@matoro.tk>:
> On 2023-11-11 16:27, Sam James wrote:
> > Helge Deller <deller@gmx.de> writes:
> >
> > > On 11/11/23 07:31, matoro wrote:
> > > > Hi Helge, I have bisected a regression in 6.6 which is causing
> > > > userspace segfaults at a significantly increased rate in kernel 6.6.
> > > > There seems to be a pathological case triggered by the ninja build
> > > > tool. The test case I have been using is cmake with ninja backend to
> > > > attempt to build the nghttp2 package. In 6.6, this segfaults, not at
> > > > the same location every time, but with enough reliability that I was
> > > > able to use it as a bisection regression case, including immediately
> > > > after a reboot. In the kernel log, these show up as "trap #15: Data
> > > > TLB miss fault" messages. Now these messages can and do show up in
> > > > 6.5 causing segfaults, but never immediately after a reboot and
> > > > infrequently enough that the system is stable. With kernel 6.6 I am
> > > > completely unable to build nghttp2 under any circumstances.
> > > >
> > > > I have bisected this down to the following commit:
> > > >
> > > > $ git bisect good
> > > > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
> > > > commit 3033cd4307681c60db6d08f398a64484b36e0b0f
> > > > Author: Helge Deller <deller@gmx.de>
> > > > Date:   Sat Aug 19 00:53:28 2023 +0200
> > > >
> > > >     parisc: Use generic mmap top-down layout and brk randomization
> > > >
> > > >     parisc uses a top-down layout by default that exactly fits the
> > > >     generic functions, so get rid of arch specific code and use the
> > > >     generic version by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
> > > >
> > > >     Note that on parisc the stack always grows up and a "unlimited stack"
> > > >     simply means that the value as defined in
> > > >     CONFIG_STACK_MAX_DEFAULT_SIZE_MB should be used. So RLIM_INFINITY
> > > >     is not an indicator to use the legacy memory layout.
> > > >
> > > >     Signed-off-by: Helge Deller <deller@gmx.de>
> > > >
> > > >  arch/parisc/Kconfig             | 17 +++++++++++++
> > > >  arch/parisc/kernel/process.c    | 14 -----------
> > > >  arch/parisc/kernel/sys_parisc.c | 54 +----------------------------------------
> > > >  mm/util.c                       |  5 +++-
> > > >  4 files changed, 22 insertions(+), 68 deletions(-)
> > >
> > > Thanks for your report!
> > > I think it's quite unlikely that this patch introduces such a bad
> > > regression.

I was wrong.
Indeed, by switching to the generic implementation with this patch
the calculation of mmap_base is wrong for parisc (because parisc
is the only architecture left where the stack grows upwards).

Could you please test the patch below. It fixed the crashes
when building nghttp2 for me.

Helge

---

From: Helge Deller <deller@gmx.de>
Subject: [PATCH] parisc: Adjust ARCH_MMAP_RND_BITS* to previous values

Matoro reported various userspace crashes in kernel 6.6 and bisected it to
commit 3033cd430768 ("parisc: Use generic mmap top-down layout and brk
randomization").

The problem is, that mmap_base is calculated wrongly for the
stack-grows-upwards case (as on parisc). On parisc, mmap_base is simply just
below the stack start.

Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 3033cd430768 ("parisc: Use generic mmap top-down layout and brk randomization")
Cc: <stable@vger.kernel.org> # v6.6+

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index a15ab147af2e..68cbe666510a 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -138,11 +138,11 @@ config ARCH_MMAP_RND_COMPAT_BITS_MIN
 	default 8
 
 config ARCH_MMAP_RND_BITS_MAX
-	default 24 if 64BIT
-	default 17
+	default 18 if 64BIT
+	default 13
 
 config ARCH_MMAP_RND_COMPAT_BITS_MAX
-	default 17
+	default 13
 
 # unless you want to implement ACPI on PA-RISC ... ;-)
 config PM
diff --git a/arch/parisc/include/asm/elf.h b/arch/parisc/include/asm/elf.h
index 140eaa97bf21..2d73d3c3cd37 100644
--- a/arch/parisc/include/asm/elf.h
+++ b/arch/parisc/include/asm/elf.h
@@ -349,15 +349,7 @@ struct pt_regs;	/* forward declaration... */
 
 #define ELF_HWCAP	0
 
-/* Masks for stack and mmap randomization */
-#define BRK_RND_MASK	(is_32bit_task() ? 0x07ffUL : 0x3ffffUL)
-#define MMAP_RND_MASK	(is_32bit_task() ? 0x1fffUL : 0x3ffffUL)
-#define STACK_RND_MASK	MMAP_RND_MASK
-
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *);
-#define arch_randomize_brk arch_randomize_brk
-
+#define STACK_RND_MASK	0x7ff	/* 8MB of VA */
 
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 struct linux_binprm;
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ff6cbdb6903b..ece4b3046515 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -47,6 +47,8 @@
 
 #ifndef __ASSEMBLY__
 
+struct rlimit;
+unsigned long mmap_upper_limit(struct rlimit *rlim_stack);
 unsigned long calc_max_stack_size(unsigned long stack_max);
 
 /*
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index ab896eff7a1d..98af719d5f85 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -77,7 +77,7 @@ unsigned long calc_max_stack_size(unsigned long stack_max)
  * indicating that "current" should be used instead of a passed-in
  * value from the exec bprm as done with arch_pick_mmap_layout().
  */
-static unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
+unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
 {
 	unsigned long stack_base;
 
diff --git a/mm/util.c b/mm/util.c
index 8cbbfd3a3d59..0b7e715a71f2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -414,6 +414,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
 
 static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
 {
+#ifdef CONFIG_STACK_GROWSUP
+	/*
+	 * For an upwards growing stack the calculation is much simpler.
+	 * Memory for the maximum stack size is reserved at the top of the
+	 * task. mmap_base starts directly below the stack and grows
+	 * downwards.
+	 */
+	return PAGE_ALIGN(mmap_upper_limit(rlim_stack) - rnd);
+#else
 	unsigned long gap = rlim_stack->rlim_cur;
 	unsigned long pad = stack_guard_gap;
 
@@ -431,6 +440,7 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
 		gap = MAX_GAP;
 
 	return PAGE_ALIGN(STACK_TOP - gap - rnd);
+#endif
 }
 
 void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
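The difference between the two mmap_base() paths in the patch above can be illustrated with a small userspace sketch in C. This is a simplified model under stated assumptions, not the kernel implementation: the model_* names, the 4 KiB page size and all parameters are hypothetical, the guard-gap padding and overflow check of the real grows-down path are left out, and the caller supplies every address.

```c
/* Assumed page size for illustration only; the real value is per-arch. */
#define MODEL_PAGE_SIZE 4096UL

/* Round an address up to the next page boundary, as PAGE_ALIGN() does. */
unsigned long model_page_align(unsigned long addr)
{
	return (addr + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/*
 * Downward-growing stack (simplified): reserve a gap below the stack top,
 * clamped to [min_gap, max_gap], then subtract the random offset.
 */
unsigned long model_mmap_base_growsdown(unsigned long stack_top,
					unsigned long gap,
					unsigned long min_gap,
					unsigned long max_gap,
					unsigned long rnd)
{
	if (gap < min_gap)
		gap = min_gap;
	else if (gap > max_gap)
		gap = max_gap;

	return model_page_align(stack_top - gap - rnd);
}

/*
 * Upward-growing stack (the parisc case in the patch): the mmap area
 * starts directly below the upper limit and only the random offset
 * is subtracted. upper_limit stands in for mmap_upper_limit().
 */
unsigned long model_mmap_base_growsup(unsigned long upper_limit,
				      unsigned long rnd)
{
	return model_page_align(upper_limit - rnd);
}
```

For example, model_mmap_base_growsup(0x40000000UL, 0x1234UL) gives 0x3ffff000, since 0x40000000 - 0x1234 = 0x3fffedcc is rounded up to the next 4 KiB boundary. The grows-up variant needs no gap clamping because, as the patch comment says, memory for the maximum stack size is already reserved at the top of the task.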
* Re: Bisected stability regression in 6.6
  2023-11-12 20:22       ` Helge Deller
@ 2023-11-12 23:37         ` matoro
  0 siblings, 0 replies; 12+ messages in thread
From: matoro @ 2023-11-12 23:37 UTC
  To: Helge Deller; +Cc: Sam James, linux-parisc, Linux Kernel Mailing List

On 2023-11-12 15:22, Helge Deller wrote:
> * matoro <matoro_mailinglist_kernel@matoro.tk>:
>> On 2023-11-11 16:27, Sam James wrote:
>> > Helge Deller <deller@gmx.de> writes:
>> >
>> > > On 11/11/23 07:31, matoro wrote:
>> > > > Hi Helge, I have bisected a regression in 6.6 which is causing
>> > > > userspace segfaults at a significantly increased rate in kernel 6.6.
>> > > > There seems to be a pathological case triggered by the ninja build
>> > > > tool. The test case I have been using is cmake with ninja backend to
>> > > > attempt to build the nghttp2 package. In 6.6, this segfaults, not at
>> > > > the same location every time, but with enough reliability that I was
>> > > > able to use it as a bisection regression case, including immediately
>> > > > after a reboot. In the kernel log, these show up as "trap #15: Data
>> > > > TLB miss fault" messages. Now these messages can and do show up in
>> > > > 6.5 causing segfaults, but never immediately after a reboot and
>> > > > infrequently enough that the system is stable. With kernel 6.6 I am
>> > > > completely unable to build nghttp2 under any circumstances.
>> > > >
>> > > > I have bisected this down to the following commit:
>> > > >
>> > > > $ git bisect good
>> > > > 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
>> > > > commit 3033cd4307681c60db6d08f398a64484b36e0b0f
>> > > > Author: Helge Deller <deller@gmx.de>
>> > > > Date:   Sat Aug 19 00:53:28 2023 +0200
>> > > >
>> > > >     parisc: Use generic mmap top-down layout and brk randomization
>> > > >
>> > > >     parisc uses a top-down layout by default that exactly fits the
>> > > >     generic functions, so get rid of arch specific code and use the
>> > > >     generic version by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
>> > > >
>> > > >     Note that on parisc the stack always grows up and a "unlimited stack"
>> > > >     simply means that the value as defined in
>> > > >     CONFIG_STACK_MAX_DEFAULT_SIZE_MB should be used. So RLIM_INFINITY
>> > > >     is not an indicator to use the legacy memory layout.
>> > > >
>> > > >     Signed-off-by: Helge Deller <deller@gmx.de>
>> > > >
>> > > >  arch/parisc/Kconfig             | 17 +++++++++++++
>> > > >  arch/parisc/kernel/process.c    | 14 -----------
>> > > >  arch/parisc/kernel/sys_parisc.c | 54 +----------------------------------------
>> > > >  mm/util.c                       |  5 +++-
>> > > >  4 files changed, 22 insertions(+), 68 deletions(-)
>> > >
>> > > Thanks for your report!
>> > > I think it's quite unlikely that this patch introduces such a bad
>> > > regression.
>
> I was wrong.
> Indeed, by switching to the generic implementation with this patch
> the calculation of mmap_base is wrong for parisc (because parisc
> is the only architecture left where the stack grows upwards).
>
> Could you please test the patch below. It fixed the crashes
> when building nghttp2 for me.
>
> Helge
>
> ---
>
> From: Helge Deller <deller@gmx.de>
> Subject: [PATCH] parisc: Adjust ARCH_MMAP_RND_BITS* to previous values
>
> Matoro reported various userspace crashes in kernel 6.6 and bisected it to
> commit 3033cd430768 ("parisc: Use generic mmap top-down layout and brk
> randomization").
>
> The problem is, that mmap_base is calculated wrongly for the
> stack-grows-upwards case (as on parisc). On parisc, mmap_base is simply just
> below the stack start.
>
> Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
> Signed-off-by: Helge Deller <deller@gmx.de>
> Fixes: 3033cd430768 ("parisc: Use generic mmap top-down layout and brk randomization")
> Cc: <stable@vger.kernel.org> # v6.6+
>
> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index a15ab147af2e..68cbe666510a 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -138,11 +138,11 @@ config ARCH_MMAP_RND_COMPAT_BITS_MIN
>  	default 8
>
>  config ARCH_MMAP_RND_BITS_MAX
> -	default 24 if 64BIT
> -	default 17
> +	default 18 if 64BIT
> +	default 13
>
>  config ARCH_MMAP_RND_COMPAT_BITS_MAX
> -	default 17
> +	default 13
>
>  # unless you want to implement ACPI on PA-RISC ... ;-)
>  config PM
> diff --git a/arch/parisc/include/asm/elf.h b/arch/parisc/include/asm/elf.h
> index 140eaa97bf21..2d73d3c3cd37 100644
> --- a/arch/parisc/include/asm/elf.h
> +++ b/arch/parisc/include/asm/elf.h
> @@ -349,15 +349,7 @@ struct pt_regs;	/* forward declaration... */
>
>  #define ELF_HWCAP	0
>
> -/* Masks for stack and mmap randomization */
> -#define BRK_RND_MASK	(is_32bit_task() ? 0x07ffUL : 0x3ffffUL)
> -#define MMAP_RND_MASK	(is_32bit_task() ? 0x1fffUL : 0x3ffffUL)
> -#define STACK_RND_MASK	MMAP_RND_MASK
> -
> -struct mm_struct;
> -extern unsigned long arch_randomize_brk(struct mm_struct *);
> -#define arch_randomize_brk arch_randomize_brk
> -
> +#define STACK_RND_MASK	0x7ff	/* 8MB of VA */
>
>  #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
>  struct linux_binprm;
> diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
> index ff6cbdb6903b..ece4b3046515 100644
> --- a/arch/parisc/include/asm/processor.h
> +++ b/arch/parisc/include/asm/processor.h
> @@ -47,6 +47,8 @@
>
>  #ifndef __ASSEMBLY__
>
> +struct rlimit;
> +unsigned long mmap_upper_limit(struct rlimit *rlim_stack);
>  unsigned long calc_max_stack_size(unsigned long stack_max);
>
>  /*
> diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
> index ab896eff7a1d..98af719d5f85 100644
> --- a/arch/parisc/kernel/sys_parisc.c
> +++ b/arch/parisc/kernel/sys_parisc.c
> @@ -77,7 +77,7 @@ unsigned long calc_max_stack_size(unsigned long stack_max)
>   * indicating that "current" should be used instead of a passed-in
>   * value from the exec bprm as done with arch_pick_mmap_layout().
>   */
> -static unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
> +unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
>  {
>  	unsigned long stack_base;
>
> diff --git a/mm/util.c b/mm/util.c
> index 8cbbfd3a3d59..0b7e715a71f2 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -414,6 +414,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
>
>  static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
>  {
> +#ifdef CONFIG_STACK_GROWSUP
> +	/*
> +	 * For an upwards growing stack the calculation is much simpler.
> +	 * Memory for the maximum stack size is reserved at the top of the
> +	 * task. mmap_base starts directly below the stack and grows
> +	 * downwards.
> +	 */
> +	return PAGE_ALIGN(mmap_upper_limit(rlim_stack) - rnd);
> +#else
>  	unsigned long gap = rlim_stack->rlim_cur;
>  	unsigned long pad = stack_guard_gap;
>
> @@ -431,6 +440,7 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
>  		gap = MAX_GAP;
>
>  	return PAGE_ALIGN(STACK_TOP - gap - rnd);
> +#endif
>  }
>
>  void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)

Works here! Thanks Helge!!
* Re: Bisected stability regression in 6.6
  2023-11-11 21:21 ` Helge Deller
  2023-11-11 21:27   ` Sam James
@ 2023-11-11 21:28   ` matoro
  1 sibling, 0 replies; 12+ messages in thread
From: matoro @ 2023-11-11 21:28 UTC
  To: Helge Deller; +Cc: linux-parisc, Linux Kernel Mailing List, Sam James

On 2023-11-11 16:21, Helge Deller wrote:
> On 11/11/23 07:31, matoro wrote:
>> Hi Helge, I have bisected a regression in 6.6 which is causing
>> userspace segfaults at a significantly increased rate in kernel 6.6.
>> There seems to be a pathological case triggered by the ninja build
>> tool. The test case I have been using is cmake with ninja backend to
>> attempt to build the nghttp2 package. In 6.6, this segfaults, not at
>> the same location every time, but with enough reliability that I was
>> able to use it as a bisection regression case, including immediately
>> after a reboot. In the kernel log, these show up as "trap #15: Data
>> TLB miss fault" messages. Now these messages can and do show up in
>> 6.5 causing segfaults, but never immediately after a reboot and
>> infrequently enough that the system is stable. With kernel 6.6 I am
>> completely unable to build nghttp2 under any circumstances.
>>
>> I have bisected this down to the following commit:
>>
>> $ git bisect good
>> 3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
>> commit 3033cd4307681c60db6d08f398a64484b36e0b0f
>> Author: Helge Deller <deller@gmx.de>
>> Date:   Sat Aug 19 00:53:28 2023 +0200
>>
>>     parisc: Use generic mmap top-down layout and brk randomization
>>
>>     parisc uses a top-down layout by default that exactly fits the
>>     generic functions, so get rid of arch specific code and use the
>>     generic version by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
>>
>>     Note that on parisc the stack always grows up and a "unlimited stack"
>>     simply means that the value as defined in
>>     CONFIG_STACK_MAX_DEFAULT_SIZE_MB should be used. So RLIM_INFINITY
>>     is not an indicator to use the legacy memory layout.
>>
>>     Signed-off-by: Helge Deller <deller@gmx.de>
>>
>>  arch/parisc/Kconfig             | 17 +++++++++++++
>>  arch/parisc/kernel/process.c    | 14 -----------
>>  arch/parisc/kernel/sys_parisc.c | 54 +----------------------------------------
>>  mm/util.c                       |  5 +++-
>>  4 files changed, 22 insertions(+), 68 deletions(-)
>
> Thanks for your report!
> I think it's quite unlikely that this patch introduces such a bad
> regression.
> I'd suspect some other bad commit, but I'll try to reproduce.
>
> In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try
> to reproduce with CONFIG_BPF_JIT disabled?
> The JIT is quite new in v6.6 and I did face some crashes and disabling
> it helped me so far.
>
>> I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe
>> ("parisc: Add nop instructions after TLB inserts") on top of 6.6, but
>> it does NOT fix the issue.
>
> Ok.
>
> Helge

Nope, I use "make olddefconfig" when upgrading and it appears to be
default-disabled.

$ grep -i "config_bpf_jit" /usr/src/linux-6.6.0-gentoo/.config
# CONFIG_BPF_JIT is not set
Thread overview: 12+ messages
2023-11-11  6:31 Bisected stability regression in 6.6 matoro
2023-11-11  7:02 ` Bagas Sanjaya
2023-11-22  9:07   ` Linux regression tracking #update (Thorsten Leemhuis)
2023-11-11 21:21 ` Helge Deller
2023-11-11 21:27   ` Sam James
2023-11-11 23:33     ` matoro
2023-11-12  1:22       ` Dr. David Alan Gilbert
2023-11-12  8:03         ` Helge Deller
2023-11-12 12:07           ` Dr. David Alan Gilbert
2023-11-12 20:22       ` Helge Deller
2023-11-12 23:37         ` matoro
2023-11-11 21:28   ` matoro