* PROBLEM: kernel crash when unbinding igb device with 4.19.94
@ 2020-01-15 17:10 Norbert Lange
  2020-01-15 17:24 ` Norbert Lange
  2020-01-15 17:27 ` Florian Fainelli
  0 siblings, 2 replies; 3+ messages in thread
From: Norbert Lange @ 2020-01-15 17:10 UTC (permalink / raw)
  To: Richard Cochran, netdev

Hello,

The commit "ptp: fix the race between the release of ptp_clock and
cdev" (#0393b8720128) introduced a bad regression, at least in the 4.19
branch.

I have an Intel I210 card in the system (actually four of them, if that's
relevant). The system is a custom buildroot, so I don't have all the tools to
gather the usual information, but given that reverting the commit fixed the
issue, I think it's narrowed down enough.
Unbinding the driver from one device always triggers a crash.

I use the Xenomai I-pipe patch on top; if required, I could try with a
vanilla Linux kernel (that would cost me some time).
I ran various versions from 4.14 up to 4.19.89, and 4.19.89 with the above
patch reverted, none of which had this issue.

To reproduce:
> ethpci="0000:01:00.0"
> echo "$ethpci" > /sys/bus/pci/devices/$ethpci/driver/unbind
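
The same steps can be wrapped in a small helper that first checks that a driver is actually bound. This is only a sketch: the function name `unbind_pci_device` and the `SYSFS_ROOT` override are hypothetical, the latter added purely so the helper can be dry-run against a fake directory tree instead of the real /sys.

```shell
#!/bin/sh
# Sketch only: unbind a PCI device from its driver via sysfs.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the helper
# can be exercised against a fake tree without touching real hardware.
unbind_pci_device() {
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    drv="$root/bus/pci/devices/$addr/driver"

    # The "driver" entry only exists while a driver is bound to the device.
    if [ ! -e "$drv/unbind" ]; then
        echo "no driver bound to $addr" >&2
        return 1
    fi

    # Writing the PCI address to "unbind" asks the driver core to detach it.
    printf '%s\n' "$addr" > "$drv/unbind"
}
```

On the affected system this would be called as root with `unbind_pci_device "0000:01:00.0"`, which is expected to trigger the crash below.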

Kernel:
> Linux version 4.19.94-cip18-xeno10-static (gcc version 9.2.0) #1 SMP Wed Jan 15 17:38:48 CET 2020

Cpuinfo:
> Intel(R) Atom(TM) Processor E3940 @ 1.60GHz

Modules (almost all are statically linked):
> plusb 16384 0 - Live 0xffffffffc0099000
> usbnet 45056 1 plusb, Live 0xffffffffc0087000
> mii 16384 1 usbnet, Live 0xffffffffc0080000

Lspci:
> 03:00.0 Class 0200: 8086:1539
> 00:1c.0 Class 0805: 8086:5acc
> 00:1f.0 Class 0601: 8086:5ae8
> 00:13.2 Class 0604: 8086:5ada
> 02:00.0 Class 0200: 8086:1539
> 00:13.0 Class 0604: 8086:5ad8
> 01:00.0 Class 0200: 8086:1539
> 00:1b.0 Class 0805: 8086:5aca
> 00:0f.0 Class 0780: 8086:5a9a
> 00:00.0 Class 0600: 8086:5af0
> 00:12.0 Class 0106: 8086:5ae3
> 00:1f.1 Class 0c05: 8086:5ad4
> 00:15.0 Class 0c03: 8086:5aa8
> 00:13.1 Class 0604: 8086:5ad9
> 04:00.0 Class 0200: 8086:1533
> 00:02.0 Class 0300: 8086:5a85
> 00:14.0 Class 0604: 8086:5ad6

Network card (igb driver):
Intel i210 (8086:1539)

Crashlog:
[  199.590152] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  199.597995] PGD 179717067 P4D 179717067 PUD 17896b067 PMD 0
[  199.603670] Oops: 0000 [#1] SMP NOPTI
[  199.607344] CPU: 2 PID: 764 Comm: zsh Not tainted 4.19.94-cip18-xeno10-static #1
[  199.614745] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.20 08/05/2019
[  199.624059] I-pipe domain: Linux
[  199.627300] RIP: 0010:strlen+0x0/0x20
[  199.630972] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 e0 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  199.649742] RSP: 0018:ffffad3ec06ffb20 EFLAGS: 00010246
[  199.654975] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  199.662118] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  199.669258] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9756f996f018
[  199.676402] R10: 0000000000000000 R11: ffff9756fb006c00 R12: ffff9756f9916788
[  199.683543] R13: 0000000000000000 R14: ffff9756fa81e190 R15: ffff9756f8b66f20
[  199.690683] FS:  0000000000535558(0000) GS:ffff9756fbb00000(0000) knlGS:0000000000000000
[  199.698780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  199.704535] CR2: 0000000000000000 CR3: 000000017aaa0000 CR4: 00000000003406e0
[  199.711674] Call Trace:
[  199.714136]  kernfs_name_hash+0x12/0x80
[  199.717983]  kernfs_find_ns+0x35/0xd0
[  199.721654]  kernfs_remove_by_name_ns+0x32/0x90
[  199.726194]  remove_files.isra.0+0x30/0x70
[  199.730301]  sysfs_remove_group+0x3d/0x80
[  199.734321]  sysfs_remove_groups+0x29/0x40
[  199.738428]  device_remove_attrs+0x42/0x80
[  199.742534]  device_del+0x14f/0x360
[  199.746036]  cdev_device_del+0x15/0x30
[  199.749797]  posix_clock_unregister+0x21/0x50
[  199.754165]  ptp_clock_unregister+0x6e/0x80
[  199.758359]  igb_ptp_stop+0x1f/0x50
[  199.761861]  igb_remove+0x37/0x110
[  199.765272]  pci_device_remove+0x28/0x60
[  199.769202]  device_release_driver_internal+0x162/0x220
[  199.774437]  unbind_store+0xb1/0x170
[  199.778024]  kernfs_fop_write+0x10b/0x190
[  199.782042]  do_iter_write+0x140/0x180
[  199.785801]  vfs_writev+0xa6/0xf0
[  199.789127]  ? __alloc_fd+0x3d/0x140
[  199.792711]  ? f_dupfd+0x66/0x79
[  199.795949]  do_writev+0x5f/0x100
[  199.799273]  do_syscall_64+0x78/0x3d0
[  199.802944]  ? __do_page_fault+0x206/0x400
[  199.807049]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  199.812106] RIP: 0033:0x4cc34c
[  199.815172] Code: ed 01 48 29 d0 49 83 c5 10 49 8b 55 08 48 63 dd 48 29 c2 49 01 45 00 49 89 55 08 49 63 7f 78 4c 89 e0 4c 89 ee 48 89 da 0f 05 <48> 89 c7 e8 cc 4e ff ff 49 39 c6 75 b7 49 8b 47 58 49 8b 57 60 48
[  199.833943] RSP: 002b:00007ffe32e417a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000014
[  199.841521] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00000000004cc34c
[  199.848663] RDX: 0000000000000002 RSI: 00007ffe32e417b0 RDI: 0000000000000001
[  199.855805] RBP: 0000000000000002 R08: 0000000000523040 R09: 0000000000000000
[  199.862949] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
[  199.870091] R13: 00007ffe32e417b0 R14: 000000000000000d R15: 0000000000523040
[  199.877237] Modules linked in: plusb usbnet mii
[  199.881783] CR2: 0000000000000000
[  199.885115] ---[ end trace 218fd81d1aa77ca4 ]---
[  199.889741] RIP: 0010:strlen+0x0/0x20
[  199.893413] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 e0 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  199.912189] RSP: 0018:ffffad3ec06ffb20 EFLAGS: 00010246
[  199.917424] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  199.924568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  199.931711] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9756f996f018
[  199.938855] R10: 0000000000000000 R11: ffff9756fb006c00 R12: ffff9756f9916788
[  199.946000] R13: 0000000000000000 R14: ffff9756fa81e190 R15: ffff9756f8b66f20
[  199.953142] FS:  0000000000535558(0000) GS:ffff9756fbb00000(0000) knlGS:0000000000000000
[  199.961237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  199.966990] CR2: 0000000000000000 CR3: 000000017aaa0000 CR4: 00000000003406e0

[-- Attachment #2: config-4.19.94-cip18-xeno10-static.gz --]
[-- Type: application/gzip, Size: 22755 bytes --]


* Re: PROBLEM: kernel crash when unbinding igb device with 4.19.94
  2020-01-15 17:10 PROBLEM: kernel crash when unbinding igb device with 4.19.94 Norbert Lange
@ 2020-01-15 17:24 ` Norbert Lange
  2020-01-15 17:27 ` Florian Fainelli
  1 sibling, 0 replies; 3+ messages in thread
From: Norbert Lange @ 2020-01-15 17:24 UTC (permalink / raw)
  To: Richard Cochran, netdev

Small correction:
I ran various versions from 4.14 up to 4.19.89, and *4.19.94* with the above
patch reverted, none of which had this issue.


* Re: PROBLEM: kernel crash when unbinding igb device with 4.19.94
  2020-01-15 17:10 PROBLEM: kernel crash when unbinding igb device with 4.19.94 Norbert Lange
  2020-01-15 17:24 ` Norbert Lange
@ 2020-01-15 17:27 ` Florian Fainelli
  1 sibling, 0 replies; 3+ messages in thread
From: Florian Fainelli @ 2020-01-15 17:27 UTC (permalink / raw)
  To: Norbert Lange, Richard Cochran, netdev

On 1/15/20 9:10 AM, Norbert Lange wrote:
> Hello,
> 
> The commit "ptp: fix the race between the release of ptp_clock and
> cdev" (#0393b8720128) introduced a bad regression, at least in the 4.19
> branch.
> 
> I have an Intel I210 card in the system (actually four of them, if that's
> relevant). The system is a custom buildroot, so I don't have all the tools to
> gather the usual information, but given that reverting the commit fixed the
> issue, I think it's narrowed down enough.
> Unbinding the driver from one device always triggers a crash.
> 
> I use the Xenomai I-pipe patch on top; if required, I could try with a
> vanilla Linux kernel (that would cost me some time).
> I ran various versions from 4.14 up to 4.19.89, and 4.19.89 with the above
> patch reverted, none of which had this issue.

This patch should fix the problem:

https://lore.kernel.org/netdev/20200113130009.2938-1-vdronov@redhat.com/

and should soon reach stable kernels if it has not already.
-- 
Florian

