* PROBLEM: Kernel Oops in UDP stack
@ 2018-07-31 15:06 Marcel Hellwig
  2018-07-31 15:59 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-07-31 15:06 UTC (permalink / raw)
  To: 'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

Dear all,

We are facing a problem with the UDP stack in our embedded device based on an LPC3250.

We first discovered the bug in the 2.6.39.2 kernel provided by lpc[0].
We tried several newer kernel versions up to 3.4.113, and the error still occurs.
Newer versions of the kernel have not been tested (yet).

We have a simple program that listens on a multicast address and uses select to query the socket.
We read the data, validate it and process it further (put it into some shared memory via shm_open).
The bandwidth of the traffic is approximately 100Mbit/s.

We tried to debug the error with gdb and printfs, but all the pointers in the relevant section looked sane, and printing the values did not trigger a panic.
The bug occurs after approximately 15 minutes under high network load, but cannot be triggered reliably.

We found two relevant threads [1][2], but the first one did not help, and the second one looks promising but has no answer.

Because this bug affects a lot of our products, we want to develop an intermediate patch, but we need some help to locate the error.
In the long term we want to migrate to a newer kernel, but at the moment we need this fix for our customers.

Can anybody help us locate the error so that we can develop a patch for this problem?
Perhaps this is a known issue and a solution is already available.



Further information:
https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff

/proc/version:
Linux version 3.4.113.7 (buildroot@buildroot) (gcc version 4.9.4 (Buildroot 2018.02.1) ) #1 PREEMPT Mon Apr 9 23:40:00 CEST 2018

Kernel oops:
[ 1125.090000] Unable to handle kernel paging request at virtual address c14fe63a
[ 1125.100000] pgd = c14d8000
[ 1125.100000] [c14fe63a] *pgd=8140041e(bad)
[ 1125.100000] Internal error: Oops: 1 [#1] PREEMPT ARM
[ 1125.100000] Modules linked in:
[ 1125.100000] CPU: 0    Not tainted  (3.4.113.7 #1)
[ 1125.100000] PC is at udp_recvmsg+0x284/0x33c
[ 1125.100000] LR is at 0x0
[ 1125.100000] pc : [<c0228adc>]    lr : [<00000000>]    psr: a0000013
[ 1125.100000] sp : c1e67d10  ip : 00000000  fp : 0000004a
[ 1125.100000] r10: c1e67d34  r9 : 0000004a  r8 : 00000000
[ 1125.100000] r7 : 000005c0  r6 : c1e10220  r5 : c1e67f7c  r4 : c14f4640
[ 1125.100000] r3 : c14fe62e  r2 : c1e67ec0  r1 : 00000008  r0 : c1e67ec8
[ 1125.100000] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1125.100000] Control: 0005317f  Table: 814d8000  DAC: 00000015
[ 1125.100000] Process trdp.release (pid: 132, stack limit = 0xc1e66270)
[ 1125.100000] Stack: (0xc1e67d10 to 0xc1e68000)
[ 1125.100000] 7d00:                                     c1e67d34 00004348 00000001 0000004a
[ 1125.100000] 7d20: 00000000 c1e67ec0 c1e10220 00000000 00000000 00000000 c022aea4 c1e67f7c
[ 1125.100000] 7d40: 00000000 00000000 00000000 000005c0 00000000 c1e67f7c c1e67ec0 c02306e0
[ 1125.100000] 7d60: 00000000 00000000 c1e67d74 00000000 c1e10220 00000000 c1e67d90 00000000
[ 1125.100000] 7d80: c3483000 c01d2c38 00000000 00000001 00000001 00000000 00000000 000005c0
[ 1125.100000] 7da0: c3483000 c005c5d4 00000000 c1e67f7c 00000000 c03482b8 c1e66000 c1e67ee0
[ 1125.100000] 7dc0: c1e67e4c 0001424f c0345ba0 c035fca0 c035fca8 c005c6a8 00000000 00000001
[ 1125.100000] 7de0: ffffffff 00000000 00000000 00000000 00000000 00000000 c3887700 c0022608
[ 1125.100000] 7e00: 00000000 00000000 c01e59b4 00000001 c1e67d90 00000000 00000000 beca3b78
[ 1125.100000] 7e20: 00000004 00000000 00000004 c00b05b0 c1e67e48 c1e67e4c c1e67e50 00000001
[ 1125.100000] 7e40: c1e67f7c c3483000 beca35bc c1e67e80 c1e67f7c c1e67e80 c01d2b90 c3483000
[ 1125.100000] 7e60: beca35bc 00000000 c1e67e80 c01d3fac beca35d8 00000008 beca35c0 beca359c
[ 1125.100000] 7e80: b6ab34aa 00000576 00000001 c0022128 c025ce20 c1e49a80 00000009 0001424e
[ 1125.100000] 7ea0: 40008000 c1e66008 0000001d 00000000 c1e67f14 c3887700 c1e66008 0000001d
[ 1125.100000] 7ec0: b0040002 1714010a 00000000 00000000 c00189a8 00000013 f4008000 c000dbf4
[ 1125.100000] 7ee0: 81e68000 c1e49180 00000000 00000000 c1e1e000 c003c2f8 c1e1e000 00000002
[ 1125.100000] 7f00: c036092c c0346700 00000000 00000000 ffffffff 00000000 ffffffff 00000000
[ 1125.100000] 7f20: c1e67f78 c1e66000 c1e67f78 beca3cc0 00000001 beca3cc0 00000008 beca35bc
[ 1125.100000] 7f40: 00000000 c3483000 beca35bc 00000000 00000129 c000e188 c1e66000 00000000
[ 1125.100000] 7f60: beca37e0 c01d4fbc 00000000 beca3860 beca3938 00000000 fffffff7 c1e67ec0
[ 1125.100000] 7f80: 00000000 c1e67e80 00000001 beca359c 00000020 00000000 beca359c b6ab3460
[ 1125.100000] 7fa0: 00000000 c000dfe0 beca359c b6ab3460 00000004 beca35bc 00000000 00000020
[ 1125.100000] 7fc0: beca359c b6ab3460 00000000 00000129 beca37f4 00000000 00056178 beca37e0
[ 1125.100000] 7fe0: 00000004 beca33d0 0001a904 b6f6d8dc 60000010 00000004 83ffe831 83ffec31
[ 1125.100000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.100000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.100000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.100000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.100000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.100000] Code: e1d330b0 e3a01008 e1c230b2 e5943080 (e593300c)
[ 1125.430000] ---[ end trace f0b7642b14562089 ]---
[ 1125.440000] ------------[ cut here ]------------
[ 1125.450000] WARNING: at net/ipv4/af_inet.c:153 inet_sock_destruct+0x188/0x1a8()
[ 1125.460000] Modules linked in:
[ 1125.460000] [<c0013ac8>] (unwind_backtrace+0x0/0xec) from [<c001c588>] (warn_slowpath_common+0x4c/0x64)
[ 1125.470000] [<c001c588>] (warn_slowpath_common+0x4c/0x64) from [<c001c63c>] (warn_slowpath_null+0x1c/0x24)
[ 1125.480000] [<c001c63c>] (warn_slowpath_null+0x1c/0x24) from [<c0230888>] (inet_sock_destruct+0x188/0x1a8)
[ 1125.490000] [<c0230888>] (inet_sock_destruct+0x188/0x1a8) from [<c01d75f4>] (__sk_free+0x18/0x154)
[ 1125.500000] [<c01d75f4>] (__sk_free+0x18/0x154) from [<c0230aa0>] (inet_release+0x44/0x70)
[ 1125.510000] [<c0230aa0>] (inet_release+0x44/0x70) from [<c01d3714>] (sock_release+0x20/0xc8)
[ 1125.510000] [<c01d3714>] (sock_release+0x20/0xc8) from [<c01d37d0>] (sock_close+0x14/0x2c)
[ 1125.520000] [<c01d37d0>] (sock_close+0x14/0x2c) from [<c00a0044>] (fput+0xb4/0x27c)
[ 1125.530000] [<c00a0044>] (fput+0xb4/0x27c) from [<c009d64c>] (filp_close+0x64/0x88)
[ 1125.540000] [<c009d64c>] (filp_close+0x64/0x88) from [<c001fb28>] (put_files_struct+0x80/0xe0)
[ 1125.550000] [<c001fb28>] (put_files_struct+0x80/0xe0) from [<c0020388>] (do_exit+0x4c8/0x748)
[ 1125.560000] [<c0020388>] (do_exit+0x4c8/0x748) from [<c0011894>] (die+0x214/0x240)
[ 1125.560000] [<c0011894>] (die+0x214/0x240) from [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74)
[ 1125.570000] [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74) from [<c0015188>] (do_bad_area+0x88/0x8c)
[ 1125.580000] [<c0015188>] (do_bad_area+0x88/0x8c) from [<c00173dc>] (do_alignment+0xf0/0x938)
[ 1125.590000] [<c00173dc>] (do_alignment+0xf0/0x938) from [<c000862c>] (do_DataAbort+0x34/0x98)
[ 1125.600000] [<c000862c>] (do_DataAbort+0x34/0x98) from [<c000db98>] (__dabt_svc+0x38/0x60)
[ 1125.610000] Exception stack(0xc1e67cc8 to 0xc1e67d10)
[ 1125.610000] 7cc0:                   c1e67ec8 00000008 c1e67ec0 c14fe62e c14f4640 c1e67f7c
[ 1125.620000] 7ce0: c1e10220 000005c0 00000000 0000004a c1e67d34 0000004a 00000000 c1e67d10
[ 1125.630000] 7d00: 00000000 c0228adc a0000013 ffffffff
[ 1125.640000] [<c000db98>] (__dabt_svc+0x38/0x60) from [<c0228adc>] (udp_recvmsg+0x284/0x33c)
[ 1125.650000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.650000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.660000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.670000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.680000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.690000] ---[ end trace f0b7642b1456208a ]---
[ 1125.700000] ------------[ cut here ]------------
[ 1125.700000] WARNING: at net/ipv4/af_inet.c:156 inet_sock_destruct+0x158/0x1a8()
[ 1125.710000] Modules linked in:
[ 1125.710000] [<c0013ac8>] (unwind_backtrace+0x0/0xec) from [<c001c588>] (warn_slowpath_common+0x4c/0x64)
[ 1125.720000] [<c001c588>] (warn_slowpath_common+0x4c/0x64) from [<c001c63c>] (warn_slowpath_null+0x1c/0x24)
[ 1125.730000] [<c001c63c>] (warn_slowpath_null+0x1c/0x24) from [<c0230858>] (inet_sock_destruct+0x158/0x1a8)
[ 1125.740000] [<c0230858>] (inet_sock_destruct+0x158/0x1a8) from [<c01d75f4>] (__sk_free+0x18/0x154)
[ 1125.750000] [<c01d75f4>] (__sk_free+0x18/0x154) from [<c0230aa0>] (inet_release+0x44/0x70)
[ 1125.760000] [<c0230aa0>] (inet_release+0x44/0x70) from [<c01d3714>] (sock_release+0x20/0xc8)
[ 1125.770000] [<c01d3714>] (sock_release+0x20/0xc8) from [<c01d37d0>] (sock_close+0x14/0x2c)
[ 1125.780000] [<c01d37d0>] (sock_close+0x14/0x2c) from [<c00a0044>] (fput+0xb4/0x27c)
[ 1125.780000] [<c00a0044>] (fput+0xb4/0x27c) from [<c009d64c>] (filp_close+0x64/0x88)
[ 1125.790000] [<c009d64c>] (filp_close+0x64/0x88) from [<c001fb28>] (put_files_struct+0x80/0xe0)
[ 1125.800000] [<c001fb28>] (put_files_struct+0x80/0xe0) from [<c0020388>] (do_exit+0x4c8/0x748)
[ 1125.810000] [<c0020388>] (do_exit+0x4c8/0x748) from [<c0011894>] (die+0x214/0x240)
[ 1125.820000] [<c0011894>] (die+0x214/0x240) from [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74)
[ 1125.830000] [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74) from [<c0015188>] (do_bad_area+0x88/0x8c)
[ 1125.840000] [<c0015188>] (do_bad_area+0x88/0x8c) from [<c00173dc>] (do_alignment+0xf0/0x938)
[ 1125.850000] [<c00173dc>] (do_alignment+0xf0/0x938) from [<c000862c>] (do_DataAbort+0x34/0x98)
[ 1125.850000] [<c000862c>] (do_DataAbort+0x34/0x98) from [<c000db98>] (__dabt_svc+0x38/0x60)
[ 1125.860000] Exception stack(0xc1e67cc8 to 0xc1e67d10)
[ 1125.870000] 7cc0:                   c1e67ec8 00000008 c1e67ec0 c14fe62e c14f4640 c1e67f7c
[ 1125.880000] 7ce0: c1e10220 000005c0 00000000 0000004a c1e67d34 0000004a 00000000 c1e67d10
[ 1125.880000] 7d00: 00000000 c0228adc a0000013 ffffffff
[ 1125.890000] [<c000db98>] (__dabt_svc+0x38/0x60) from [<c0228adc>] (udp_recvmsg+0x284/0x33c)
[ 1125.900000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.910000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.920000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.930000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.940000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.940000] ---[ end trace f0b7642b1456208b ]--- 


[0]: http://git.lpclinux.com/?p=linux-2.6.39.2-lpc.git;a=summary
[1]: http://lists.openwall.net/netdev/2009/03/09/28
[2]: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/176757.html


With kind regards

Marcel Hellwig
B.Sc. Computer Science
Developer

m-u-t GmbH
Am Marienhof 2
22880 Wedel
Germany

Phone:	+49 4103 9308 - 474
Fax:  	+49 4103 9308 - 99
mailto:mhellwig@mut-group.com

http://www.mut-group.com

Managing Director: Fabian Peters
Commercial Register No. (Amtsgericht Pinneberg): HRB 10304 PI
VAT No.: DE228275390
WEEE Reg. No.: DE 72271808



* Re: PROBLEM: Kernel Oops in UDP stack
  2018-07-31 15:06 PROBLEM: Kernel Oops in UDP stack Marcel Hellwig
@ 2018-07-31 15:59 ` Eric Dumazet
  2018-08-01  5:55   ` AW: " Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2018-07-31 15:59 UTC (permalink / raw)
  To: Marcel Hellwig, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 07/31/2018 08:06 AM, Marcel Hellwig wrote:
> Dear all,

> [ 1125.100000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
> [ 1125.100000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
> [ 1125.100000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
> [ 1125.100000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
> [ 1125.100000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)

Any idea how you could get file:line information?

(like: udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2502)



* AW: PROBLEM: Kernel Oops in UDP stack
  2018-07-31 15:59 ` Eric Dumazet
@ 2018-08-01  5:55   ` Marcel Hellwig
  2018-08-01 10:20     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-01  5:55 UTC (permalink / raw)
  To: 'Eric Dumazet', 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

On Tue, Jul 31, 2018 at 15:36:05PM +0000 Andrew Lunn wrote:

> Is this mainline 3.4.113, or LPC version?
Mainline. AFAIK there is no newer version of the LPC kernel, and the LPC drivers have been upstream since 3.4 (hence the 3.4.113 kernel version we tried).

> How much work is involved in testing a newer kernel? You are not going to get too much help from the community with such an old kernel. If you can reproduce it with a modern day kernel, then people are more likely to help.

We haven't tried any newer version because, AFAIK, a DTS is mandatory since 3.5. We haven't had the time to look into it, although it looks pretty straightforward.

>> Kernel oops:
>> [ 1125.090000] Unable to handle kernel paging request at virtual address c14fe63a
>> [ 1125.100000] pgd = c14d8000
>> [ 1125.100000] [c14fe63a] *pgd=8140041e(bad)
>> [ 1125.100000] Internal error: Oops: 1 [#1] PREEMPT ARM
>> [ 1125.100000] Modules linked in:
>> [ 1125.100000] CPU: 0    Not tainted  (3.4.113.7 #1)
>> [ 1125.100000] PC is at udp_recvmsg+0x284/0x33c
>> [ 1125.100000] LR is at 0x0
> LR == 0 is suspicious. It should contain the return address, inet_recvmsg+0x38/0x4c. That is assuming the calling convention is the same for this old kernel as for today's kernels on ARM.

I will do a little debugging to find out why LR is 0 here. Maybe that's the clue.

> Could you produce net/ipv4/udp.lst for this exact kernel build?

Sure: https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst

> Any idea how you could get file:line information ?
> ( like : udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2502 )

[<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
[<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc): include/linux/file.h:25
[<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc): net/socket.c:751
[<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80): net/socket.c:2193
[<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c): include/linux/file.h:25 (from arch/arm/kernel/entry-common.S:34)

https://elixir.bootlin.com/linux/v3.4.113/source :)


Many thanks for the answers. I hope I was able to answer your questions.


With kind regards

Marcel Hellwig

* Re: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01  5:55   ` AW: " Marcel Hellwig
@ 2018-08-01 10:20     ` Eric Dumazet
  2018-08-01 10:35       ` AW: " Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2018-08-01 10:20 UTC (permalink / raw)
  To: Marcel Hellwig, 'Eric Dumazet',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 07/31/2018 10:55 PM, Marcel Hellwig wrote:
> 
> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234


              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;

Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) is not 32-bit aligned.

Can you produce the disassembly of the trapping instruction?

(Is it a read at address c14fe63a, or a write?)

Thanks.



* AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 10:20     ` Eric Dumazet
@ 2018-08-01 10:35       ` Marcel Hellwig
  2018-08-01 10:44         ` Paolo Abeni
  2018-08-02  9:17         ` David Laight
  0 siblings, 2 replies; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-01 10:35 UTC (permalink / raw)
  To: 'Eric Dumazet', 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

>> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
>
>              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>
>Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
>
>Can you produce the disassembly of the trapping instruction ?

https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895

		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
c0228adc:	e593300c 	ldr	r3, [r3, #12]
c0228ae0: 	e5823004	str	r3, [r2, #4]
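
Combining this listing with the register dump from the oops (r3 = c14fe62e, r2 = c1e67ec0) allows the read-or-write question to be checked by hand. The following is a plain userspace sketch written only to redo that address arithmetic; the function names are made up for illustration and nothing here is kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offset of addr from the previous 32-bit boundary (0 means aligned). */
unsigned int misalignment32(uint32_t addr)
{
	return addr & 3u;
}

/* Print the two effective addresses used by the statement at udp.c:1234. */
void show_fault_arithmetic(void)
{
	const uint32_t r3 = 0xc14fe62e;		/* ip_hdr(skb), from the register dump */
	const uint32_t r2 = 0xc1e67ec0;		/* sin, from the register dump */
	const uint32_t load_addr = r3 + 12;	/* ldr r3, [r3, #12]: iph->saddr */
	const uint32_t store_addr = r2 + 4;	/* str r3, [r2, #4]: sin->sin_addr */

	printf("load  %08x misaligned by %u\n",
	       (unsigned int)load_addr, misalignment32(load_addr));
	printf("store %08x misaligned by %u\n",
	       (unsigned int)store_addr, misalignment32(store_addr));
}
```

The load address works out to c14fe63a, which is the faulting address reported in the oops and sits 2 bytes off a 32-bit boundary, while the store address c1e67ec4 is aligned. That suggests the trap is the read of ip_hdr(skb)->saddr, not the write to sin.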


With kind regards

Marcel Hellwig

* Re: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 10:35       ` AW: " Marcel Hellwig
@ 2018-08-01 10:44         ` Paolo Abeni
  2018-08-01 10:49           ` Eric Dumazet
  2018-08-02  9:17         ` David Laight
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2018-08-01 10:44 UTC (permalink / raw)
  To: Marcel Hellwig, 'Eric Dumazet',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

On Wed, 2018-08-01 at 10:35 +0000, Marcel Hellwig wrote:
> > > [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
> > 
> >              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
> > 
> > Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
> > 
> > Can you produce the disassembly of the trapping instruction ?
> 
> https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895
> 
> 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
> c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
> c0228adc:	e593300c 	ldr	r3, [r3, #12]
> c0228ae0: 	e5823004	str	r3, [r2, #4]

I *think* pskb_trim_rcsum() in __udp4_lib_rcv() can copy the ipv4
header to an unaligned address, for cloned skbs. If I understood
correctly the relevant socket is a mcast one, so cloned skbs can land
there.

Paolo



* Re: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 10:44         ` Paolo Abeni
@ 2018-08-01 10:49           ` Eric Dumazet
  2018-08-01 11:25             ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2018-08-01 10:49 UTC (permalink / raw)
  To: Paolo Abeni, Marcel Hellwig, 'Eric Dumazet',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 08/01/2018 03:44 AM, Paolo Abeni wrote:
> On Wed, 2018-08-01 at 10:35 +0000, Marcel Hellwig wrote:
>>>> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
>>>
>>>              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>>>
>>> Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
>>>
>>> Can you produce the disassembly of the trapping instruction ?
>>
>> https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895
>>
>> 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>> c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
>> c0228adc:	e593300c 	ldr	r3, [r3, #12]
>> c0228ae0: 	e5823004	str	r3, [r2, #4]
> 
> I *think* pskb_trim_rcsum() in __udp4_lib_rcv() can copy the ipv4
> header to an unaligned address, for cloned skbs. If I understood
> correctly the relevant socket is a mcast one, so cloned skbs can land
> there.
> 

kmalloc() should return an aligned pointer.

pskb_expand_head() should allocate an aligned skb->head.

So pskb_expand_head() should keep whatever offset was provided in the source skb

(the driver called skb_reserve() or a similar function).
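
To make the argument concrete, here is a tiny userspace model, not kernel code; the struct and function names are invented for illustration. It assumes a 4-byte aligned head, a 14-byte Ethernet header, and a driver that reserves 2 bytes up front (as NET_IP_ALIGN is commonly used for). In this model the alignment of the IP header depends only on the offset from head, so a reallocation keeps it aligned exactly when the added headroom is a multiple of 4.

```c
#include <stdint.h>

#define ETH_HLEN_MODEL 14u	/* Ethernet header length */

/* Minimal stand-in for the sk_buff fields that matter here. */
struct skb_model {
	uint32_t head;		/* start of the buffer, assumed 4-byte aligned */
	uint32_t reserve;	/* bytes skipped before the frame (skb_reserve) */
};

/* Address at which the IP header lands in this model. */
uint32_t ip_hdr_addr(const struct skb_model *skb)
{
	return skb->head + skb->reserve + ETH_HLEN_MODEL;
}

/* Model a head reallocation that adds nhead bytes of headroom:
 * the new head is aligned again and the data offset grows by nhead. */
struct skb_model expand_head(struct skb_model skb, uint32_t new_head,
			     uint32_t nhead)
{
	skb.head = new_head;
	skb.reserve += nhead;
	return skb;
}

/* Non-zero if the IP header is on a 32-bit boundary. */
int ip_hdr_is_aligned(const struct skb_model *skb)
{
	return (ip_hdr_addr(skb) & 3u) == 0;
}
```

With reserve = 2 the header sits at offset 16 and stays aligned after adding 16 bytes of headroom, but adding only 2 bytes moves it to offset 18, which is 2 bytes off a 32-bit boundary.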



* Re: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 10:49           ` Eric Dumazet
@ 2018-08-01 11:25             ` Eric Dumazet
  2018-08-01 11:31               ` AW: " Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2018-08-01 11:25 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni, Marcel Hellwig,
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 08/01/2018 03:49 AM, Eric Dumazet wrote:
> 
> 
> On 08/01/2018 03:44 AM, Paolo Abeni wrote:
>> On Wed, 2018-08-01 at 10:35 +0000, Marcel Hellwig wrote:
>>>>> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
>>>>
>>>>              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>>>>
>>>> Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
>>>>
>>>> Can you produce the disassembly of the trapping instruction ?
>>>
>>> https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895
>>>
>>> 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>>> c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
>>> c0228adc:	e593300c 	ldr	r3, [r3, #12]
>>> c0228ae0: 	e5823004	str	r3, [r2, #4]
>>
>> I *think* pskb_trim_rcsum() in __udp4_lib_rcv() can copy the ipv4
>> header to an unaligned address, for cloned skbs. If I understood
>> correctly the relevant socket is a mcast one, so cloned skbs can land
>> there.
>>
> 
> kmalloc() should return aligned pointer.
> 
> pskb_expand_head() should allocate aligned skb->head
> 
> So pskb_expand_head() should keep whatever offset was provided in the source skb 
> 
> ( Driver called skb_reserve() or similar function)
> 

I suspect the following patch may need to be backported. Marcel, please give it a try.

Another way to spot the problem would be to add a check in pskb_expand_head()

commit 5e2afba4ecd7931ea06e6fa116ab28e6943dbd42
Author: Paul Guo <ggang@tilera.com>
Date:   Mon Nov 14 19:00:54 2011 +0800

    netfilter: possible unaligned packet header in ip_route_me_harder
    
    This patch tries to fix the following issue in netfilter:
    In ip_route_me_harder(), we invoke pskb_expand_head() that
    rellocates new header with additional head room which can break
    the alignment of the original packet header.
    
    In one of my NAT test case, the NIC port for internal hosts is
    configured with vlan and the port for external hosts is with
    general configuration. If we ping an external "unknown" hosts from an
    internal host, an icmp packet will be sent. We find that in
    icmp_send()->...->ip_route_me_harder()->pskb_expand_head(), hh_len=18
    and current headroom (skb_headroom(skb)) of the packet is 16. After
    calling pskb_expand_head() the packet header becomes to be unaligned
    and then our system (arch/tile) panics immediately.
    
    Signed-off-by: Paul Guo <ggang@tilera.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 9899619ab9b8db0f9d8d02c8005c0e6bb01fda94..4f47e064e262c2f24e7cb13eacfcebff0fad86a3 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -64,7 +64,8 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type)
        /* Change in oif may mean change in hh_len. */
        hh_len = skb_dst(skb)->dev->hard_header_len;
        if (skb_headroom(skb) < hh_len &&
-           pskb_expand_head(skb, hh_len - skb_headroom(skb), 0, GFP_ATOMIC))
+           pskb_expand_head(skb, HH_DATA_ALIGN(hh_len - skb_headroom(skb)),
+                               0, GFP_ATOMIC))
                return -1;
 
        return 0;
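
The effect of the one-line change can be checked with the numbers from the commit message (hh_len = 18, current headroom = 16). The sketch below reproduces HH_DATA_MOD and HH_DATA_ALIGN as they are believed to be defined in include/linux/netdevice.h; treat those two definitions as an assumption and verify them against your tree. The two helper functions are hypothetical names used only for this illustration.

```c
/* Assumed to match include/linux/netdevice.h; verify against your tree. */
#define HH_DATA_MOD	16
#define HH_DATA_ALIGN(__len) \
	(((__len) + (HH_DATA_MOD - 1)) & ~(HH_DATA_MOD - 1))

/* Extra headroom requested before the patch: exactly the shortfall. */
unsigned int nhead_before(unsigned int hh_len, unsigned int headroom)
{
	return hh_len - headroom;
}

/* Extra headroom requested after the patch: shortfall rounded up to 16. */
unsigned int nhead_after(unsigned int hh_len, unsigned int headroom)
{
	return HH_DATA_ALIGN(hh_len - headroom);
}
```

For hh_len = 18 and headroom = 16, the old code asks for 2 extra bytes, which shifts the packet header off its 4-byte alignment, while the patched code asks for 16 bytes and keeps the alignment intact.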



* AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 11:25             ` Eric Dumazet
@ 2018-08-01 11:31               ` Marcel Hellwig
  2018-08-01 13:27                 ` Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-01 11:31 UTC (permalink / raw)
  To: 'Eric Dumazet',
	Paolo Abeni, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

> I suspect the following patch my need to be backported, please Marcel git it a try.
>

The patch is already in 3.4.113, which still has the problem (that version is from October 2016, so not that ancient, but old 😊).

>Another way to spot the problem would be to add a check in pskb_expand_head()
>

I'll try that. More suggestions are welcome.

Thanks for your engagement!


With kind regards

Marcel Hellwig

* AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 11:31               ` AW: " Marcel Hellwig
@ 2018-08-01 13:27                 ` Marcel Hellwig
  2018-08-02 11:02                   ` Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-01 13:27 UTC (permalink / raw)
  To: 'Eric Dumazet',
	Paolo Abeni, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

>Another way to spot the problem would be to add a check in 
>pskb_expand_head()
>

Many thanks for this advice. Indeed, backporting pskb_expand_head() (and build_skb(), because it didn't exist) from version 3.5.7 to 3.4.113 fixes (?) the issue for now. We want to test 3 or 4 machines overnight and see whether any problem occurs.

Let's see if we can do the same for our 2.6.39.2 kernel (I guess that will be pure pain 😃 ).

Many thanks so far for your support! I will report tomorrow (CEST) if we succeeded.


With kind regards

Marcel Hellwig

* RE: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 10:35       ` AW: " Marcel Hellwig
  2018-08-01 10:44         ` Paolo Abeni
@ 2018-08-02  9:17         ` David Laight
  2018-08-02 13:13           ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: David Laight @ 2018-08-02  9:17 UTC (permalink / raw)
  To: 'Marcel Hellwig', 'Eric Dumazet',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

From: Marcel Hellwig
> Sent: 01 August 2018 11:36
> >> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
> >
> >              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
> >
> >Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
> >
> >Can you produce the disassembly of the trapping instruction ?
> 
> https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895
> 
> 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
> c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
> c0228adc:	e593300c 	ldr	r3, [r3, #12]
> c0228ae0: 	e5823004	str	r3, [r2, #4]

There are actually 2 faults; it is difficult to quickly sort out the merged tracebacks.
You are also running a rather old kernel: Linux version 3.4.113.

It may well be that whichever ethernet driver generated the misaligned frame
has since been fixed.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-01 13:27                 ` Marcel Hellwig
@ 2018-08-02 11:02                   ` Marcel Hellwig
  2018-08-02 13:05                     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-02 11:02 UTC (permalink / raw)
  To: 'Eric Dumazet',
	Paolo Abeni, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

Hi everyone,

I did three things that evening:
* Created a small dts file so I could boot a 3.5.7 kernel.
* Ported pskb_expand_head() from 3.5.7 to 3.4.113.
* Ported the whole skbuff.{c,h} from 3.5.7 to 3.4.113.

Sadly, the second and third variants panicked as well, but interestingly the first one (3.5.7 with a simple dts file) did not. Now the question is: did something change outside of skbuff (maybe in udp.{c,h}?), or is it because the dts kernel uses a different approach to talking to the MAC of the LPC3250?

I may try a more recent kernel with a dts file (maybe a 4.x.x one), but I think it will be impossible to backport a patch from there to our 2.6 kernel :(

If anybody has more ideas about why the dts kernel does not panic, they are very welcome.

Mit freundlichen Grüßen / With kind regards

Marcel Hellwig
B. Sc. Informatik
Entwickler

m-u-t GmbH
Am Marienhof 2
22880 Wedel
Germany

Phone:	+49 4103 9308 - 474
Fax:  	+49 4103 9308 - 99
mhellwig@mut-group.com

www.mut-group.com

Geschäftsführer (Managing Director): Fabian Peters
Amtsgericht Pinneberg (Commercial Register No.): HRB 10304 PI
USt-IdNr. (VAT-No.): DE228275390
WEEE-Reg-Nr.: DE 72271808



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02 11:02                   ` Marcel Hellwig
@ 2018-08-02 13:05                     ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2018-08-02 13:05 UTC (permalink / raw)
  To: Marcel Hellwig, 'Eric Dumazet',
	Paolo Abeni, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 08/02/2018 04:02 AM, Marcel Hellwig wrote:
> Hi everyone,
> 
> I did three things that evening.
> * I created a small dts file, so I could boot a 3.5.7 kernel.
> * Ported the pskb_expand_head from 3.5.7 to 3.4.113
> * Ported the whole skbuf.{c,h} file from 3.5.7 to 3.4.113

That seems bold :0)

Sorry, we only backport one fix at a time.

You can not simply copy whole files from one version to another.

> 
> Sadly enough the second and third one panicked as well, but interestingly not the first one (3.5.7 with a simple dts file). Now the question is: Is something changed outside of the skbuf (maybe in udp.{c,h}?) or is it because the dts uses a different approach of talking to the MAC of the lpc3250?
> 
> I may try a more recent kernel with a dts file (maybe a 4.x.x one), but I think that will be impossible to backport a patch for our 2.6 kernel :(
> 
> If anybody has any more ideas about why the dts kernel does not panic, you're very welcome.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02  9:17         ` David Laight
@ 2018-08-02 13:13           ` Eric Dumazet
  2018-08-02 13:18             ` David Laight
  2018-08-02 13:57             ` AW: " Marcel Hellwig
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2018-08-02 13:13 UTC (permalink / raw)
  To: David Laight, 'Marcel Hellwig', 'Eric Dumazet',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 08/02/2018 02:17 AM, David Laight wrote:
> From: Marcel Hellwig
>> Sent: 01 August 2018 11:36
>>>> [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c): net/ipv4/udp.c:1234
>>>
>>>              sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>>>
>>> Unaligned access trap (virtual address c14fe63a), so either sin or ip_hdr(skb) are not on a 32bit alignment
>>>
>>> Can you produce the disassembly of the trapping instruction ?
>>
>> https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff#file-net_uipv4_udp-lst-L1892-L1895
>>
>> 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
>> c0228ad8:	e5943080 	ldr	r3, [r4, #128]	; 0x80
>> c0228adc:	e593300c 	ldr	r3, [r3, #12]
>> c0228ae0: 	e5823004	str	r3, [r2, #4]
> 
> There are actually 2 faults, difficult to quickly sort out the merged tracebacks.
> You are also running a rather old kernel: Linux version 3.4.113.
> 
> It may well be that whichever ethernet driver generated the misaligned frame
> has since been fixed.

A driver problem producing misaligned frames would have faulted earlier in the IP stack,
well before we perform the copy to user space in udp_recvmsg().



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02 13:13           ` Eric Dumazet
@ 2018-08-02 13:18             ` David Laight
  2018-08-02 13:57             ` AW: " Marcel Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: David Laight @ 2018-08-02 13:18 UTC (permalink / raw)
  To: 'Eric Dumazet', 'Marcel Hellwig',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

From: Eric Dumazet
> Sent: 02 August 2018 14:13
> 
> A misalign frame driver problem would have faulted earlier in IP stack,
> much before we perform the copy to user space in udp_recvmsg()

And my mailer failed to thread all the later responses :-(

	David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02 13:13           ` Eric Dumazet
  2018-08-02 13:18             ` David Laight
@ 2018-08-02 13:57             ` Marcel Hellwig
  2018-08-02 15:07               ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-02 13:57 UTC (permalink / raw)
  To: 'Eric Dumazet',
	David Laight, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

>> There are actually 2 faults, difficult to quickly sort out the merged tracebacks.
>> You are also running a rather old kernel: Linux version 3.4.113.
>> 
>> It may well be that whichever ethernet driver generated the misaligned 
>> frame has since been fixed.
>
>A misalign frame driver problem would have faulted earlier in IP stack, much before we perform the copy to user space in udp_recvmsg()
>

JFYI: we are talking about the lpc_eth driver[0] (#57c10b6), which is not the newest version, but none of the newer commits appears to fix a major problem (at least the commit messages are not screaming: WARNING, UNALIGNED MEMORY!). Is there a diagram or document showing how an IP packet travels through the code, from the MAC/PHY driver to udp_recvmsg? It's not that obvious to me, but maybe it is something I can work with.


[0]: https://elixir.bootlin.com/linux/v3.4.113/source/drivers/net/ethernet/nxp/lpc_eth.c

Regards,
Marcel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02 13:57             ` AW: " Marcel Hellwig
@ 2018-08-02 15:07               ` Eric Dumazet
  2018-08-03  8:24                 ` AW: " Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2018-08-02 15:07 UTC (permalink / raw)
  To: Marcel Hellwig, 'Eric Dumazet',
	David Laight, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik



On 08/02/2018 06:57 AM, Marcel Hellwig wrote:
>>> There are actually 2 faults, difficult to quickly sort out the merged tracebacks.
>>> You are also running a rather old kernel: Linux version 3.4.113.
>>>
>>> It may well be that whichever ethernet driver generated the misaligned 
>>> frame has since been fixed.
>>
>> A misalign frame driver problem would have faulted earlier in IP stack, much before we perform the copy to user space in udp_recvmsg()
>>
> 
> JFYI: we are talking about the lpc_eth driver[0] #57c10b6 , which is not the newest, but all newer did not fix a major problem (at least the commit messages are not screaming: WARNING, UNALIGNED MEMORY!). Is there a diagram/document how a ip packet travels down the code? From the MAC/phy driver to udp_recvmsg? It's not that obvious for me, but maybe it is something I can work with.
> 
> 
> [0]: https://elixir.bootlin.com/linux/v3.4.113/source/drivers/net/ethernet/nxp/lpc_eth.c
> 
> Regards,
> Marcel
> 


Well, this driver does not reserve NET_IP_ALIGN bytes of headroom, meaning the IP header is not 4-byte aligned.

No idea why misalignments are okay in the IP layer, but not in UDP.

You could try to patch it to use netdev_alloc_skb_ip_align() instead of dev_alloc_skb().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-02 15:07               ` Eric Dumazet
@ 2018-08-03  8:24                 ` Marcel Hellwig
  2018-08-07 13:42                   ` Marcel Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-03  8:24 UTC (permalink / raw)
  To: 'Eric Dumazet',
	David Laight, 'davem@davemloft.net',
	'kuznet@ms2.inr.ac.ru', 'yoshfuji@linux-ipv6.org',
	'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

>
>Well, this driver does not use NET_IP_ALIGN reservation, meaning IP header is not 4-byte aligned.
>
>No idea why mis-alignments are okay in IP layer, but not in UDP
>
>You could try to patch it to use netdev_alloc_skb_ip_align() instead of dev_alloc_skb()

Looks very promising! The CPU has been running for 3 hours with no panic so far. Let's hope it survives the weekend! *fingers crossed*
The strange thing I don't understand is why the 3.5.7 kernel does not crash (nor 4.17.12; yes, I managed to run a recent kernel on our machine! \o/). It does not use netdev_alloc_skb_ip_align either[0][1], but one thing I noticed is that there is a difference in /proc/iomem:

3.4.113
# cat /proc/iomem
08000000-0801ffff : lpc-eth.0
31060000-31060fff : lpc-eth.0
40088000-4008801f : serial
[...]

3.5.7
# cat /proc/iomem
20084000-20084fff : /ahb/apb/ssp@20084000
2008c000-2008cfff : /ahb/apb/ssp@2008c000
31000000-31000fff : /ahb/dma@31000000
40088000-4008801f : serial
[...]

As you can see, lpc-eth.0 (or 31060000.ethernet, as it's called nowadays) is not listed, so my guess is that it is served by DMA?

Might that be the reason why it does not crash (note: 3.5.7 runs via a device tree file)?

So, just a thought: if I disabled DMA (somehow), would the error still occur in today's kernel?


[0]: https://elixir.bootlin.com/linux/v3.5.7/source/drivers/net/ethernet/nxp/lpc_eth.c#L1003
[1]: https://elixir.bootlin.com/linux/v4.17.12/source/drivers/net/ethernet/nxp/lpc_eth.c#L957


Mit freundlichen Grüßen / With kind regards 

Marcel Hellwig

^ permalink raw reply	[flat|nested] 19+ messages in thread

* AW: AW: AW: PROBLEM: Kernel Oops in UDP stack
  2018-08-03  8:24                 ` AW: " Marcel Hellwig
@ 2018-08-07 13:42                   ` Marcel Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Marcel Hellwig @ 2018-08-07 13:42 UTC (permalink / raw)
  To: 'Eric Dumazet', 'David Laight',
	'davem@davemloft.net', 'kuznet@ms2.inr.ac.ru',
	'yoshfuji@linux-ipv6.org', 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	Matthias Wystrik

>>
>> Well, this driver does not use NET_IP_ALIGN reservation, meaning IP header is not 4-byte aligned.
>>
>> No idea why mis-alignments are okay in IP layer, but not in UDP
>>
>> You could try to patch it to use netdev_alloc_skb_ip_align() instead of dev_alloc_skb()
> 
> Looks very promising! CPU runs for 3 hours and no panic so far. Let's hope it survives the weekend! *fingers crossed* The strange thing I don't understand is, why does the 3.5.7 kernel does not crash (or 4.17.12, yes I managed to run a recent kernel on our machine! \o/ )? It does not use netdev_alloc_skb_ip_align either[0][1], but one thing, that I noticed it, that there is a difference in /proc/iomem
> 
> 3.4.113
> # cat /proc/iomem
> 08000000-0801ffff : lpc-eth.0
> 31060000-31060fff : lpc-eth.0
> 40088000-4008801f : serial
> [...]
> 
> 3.5.7
> # cat /proc/iomem
> 20084000-20084fff : /ahb/apb/ssp@20084000
> 2008c000-2008cfff : /ahb/apb/ssp@2008c000
> 31000000-31000fff : /ahb/dma@31000000
> 40088000-4008801f : serial
> [...]
> 
> As you notice lpc-eth.0 (or 31060000.ethernet as it's called nowadays) is not listed, so my guess it is served by DMA?
> 
> May that the reason why it does not crash (Notice: 3.5.7 runs via device tree file)?
> 
> So just my thought: If I would disable dma (somehow) would the error still occur in today's kernel?

I think that it is implicitly aligned today (I think!), which would explain why the oops does not occur anymore.

Our machines have now survived 4 days, so we are very confident that this solved the problem for us!

Thank you very much, you all helped us a lot! 

Mit freundlichen Grüßen / With kind regards 

Marcel Hellwig

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2018-07-31 15:06 PROBLEM: Kernel Oops in UDP stack Marcel Hellwig
2018-07-31 15:59 ` Eric Dumazet
2018-08-01  5:55   ` AW: " Marcel Hellwig
2018-08-01 10:20     ` Eric Dumazet
2018-08-01 10:35       ` AW: " Marcel Hellwig
2018-08-01 10:44         ` Paolo Abeni
2018-08-01 10:49           ` Eric Dumazet
2018-08-01 11:25             ` Eric Dumazet
2018-08-01 11:31               ` AW: " Marcel Hellwig
2018-08-01 13:27                 ` Marcel Hellwig
2018-08-02 11:02                   ` Marcel Hellwig
2018-08-02 13:05                     ` Eric Dumazet
2018-08-02  9:17         ` David Laight
2018-08-02 13:13           ` Eric Dumazet
2018-08-02 13:18             ` David Laight
2018-08-02 13:57             ` AW: " Marcel Hellwig
2018-08-02 15:07               ` Eric Dumazet
2018-08-03  8:24                 ` AW: " Marcel Hellwig
2018-08-07 13:42                   ` Marcel Hellwig
