From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Robinson
Subject: Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Date: Fri, 17 Aug 2018 15:32:22 +0100
Message-ID:
References: <7ff516fd-1d01-4d7a-1d5d-b58932c0c69d@gmail.com>
 <20180816203515.GA7688@torres.zugschlus.de>
 <20180816225844.GW30658@n2100.armlinux.org.uk>
 <1c2218cb-63bf-1528-6156-8ce93f46169c@iogearbox.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Russell King - ARM Linux , Marc Haber , linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org, labbott@redhat.com, Eric Dumazet
To: Daniel Borkmann
Return-path:
Received: from mail-wm0-f68.google.com ([74.125.82.68]:50508 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725816AbeHQRf7 (ORCPT ); Fri, 17 Aug 2018 13:35:59 -0400
Received: by mail-wm0-f68.google.com with SMTP id s12-v6so7810459wmc.0 for ; Fri, 17 Aug 2018 07:32:23 -0700 (PDT)
In-Reply-To: <1c2218cb-63bf-1528-6156-8ce93f46169c@iogearbox.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Aug 17, 2018 at 1:40 PM, Daniel Borkmann wrote:
> On 08/17/2018 02:25 PM, Peter Robinson wrote:
>> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>> wrote:
>>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>>> So with that and the other fix there was no improvement, with those
>>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>>> have any effect with the JIT disabled though.
>>>>
>>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>>> also confirm that disabling BPF JIT makes the Banana Pi work again.,
>>>
>>> I'm afraid that the information in the crash dumps is insufficient
>>> to be able to work very much out about these crashes.
>>>
>>> We need a recipe (kernel configuration and what userspace is doing)
>>> so that it's possible to recreate the crash, or we need responses
>>> to requests for information - I requested the disassembly of
>>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>>> in early July. Without this, as I say, I don't see how this problem
>>> can be progressed.
>>
>> I can provide a kernel config [1] but I've not had enough time to sit
>> down and get the rest of the stuff and debug it due to a combination
>> of travel and other priorities.
>
> Did you get a chance to try latest kernel from Linus' tree [1] from last
> few days to see whether the issue is still persistent? There have been
> a number of improvements, bit strange why e.g. Russell didn't run into
> it while others have, hmm. Perhaps due to EABI vs non EABI.

I haven't had a chance to try anything from the 4.19 merge window yet;
I'm traveling this week, so it was on the list to try next week.

> [1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
>>> If the problem is at boot, one way to set the sysctl would be to
>>> hack the kernel and explicitly initialise the sysctl to '2', or
>>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>>> and then "exec /sbin/init" from that shell. (Remember there's no
>>> job control in that shell, so ^z, ^c, etc do not work.)
>>
>> It starts to happen in the early kernel boot long before we get to any
>> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
>> AllWinner H3 based devices at least).
>>
>> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config
>
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I will try and get someone to test that today, thanks

> Thanks,
> Daniel
>
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index f6a62ae..c864f6b 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
>  #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
>
>  /* Get the offset of eBPF REGISTERs stored on scratch space. */
> -#define STACK_VAR(off) (STACK_SIZE - off)
> +#define STACK_VAR(off) (STACK_SIZE - off - 4)
>
>  #if __LINUX_ARM_ARCH__ < 7
>