* PCI/MSI: kernel NULL pointer dereference
From: Daniel Gomez @ 2022-09-08 10:41 UTC (permalink / raw)
To: linux-pci
Hi,
I have the following error whenever I remove the fglrx module from the
latest 6.0-rc4.
Logs:
/mnt/raid0/krops/workspace/sources/fglrx-module/module/firegl_public.c:1674
KCL_SetPageCache_Array
<6>[fglrx] IRQ 37 Disabled
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 254 Comm: rmmod Tainted: G W O
6.0.0-rc4-qtec-standard #2
Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 09/27/2013
RIP: 0010:mutex_lock+0x2a/0x40
Code: 0f 1f 44 00 00 53 be 1b 01 00 00 48 89 fb 48 c7 c7 08 81 3d 82 e8 46 2c 52
ff e8 01 d7 ff ff 31 c0 65 48 8b 14 25 00 ad 01 00 <f0> 48 0f b1 13 74 06 48 89
df 5b eb b9 5b c3 0f 1f 80 00 00 00 00
RSP: 0018:ffffc90000b07dd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
RDX: ffff888116aabb00 RSI: 000000000000011b RDI: ffffffff823d8108
RBP: ffff8881148d20d0 R08: 0000000000000000 R09: ffffffffa053537b
R10: ffff888149365cc0 R11: ffffea00052c5048 R12: 0000000000000000
R13: ffff88813877c000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6f90b3cb80(0000) GS:ffff88815b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000011e4c8000 CR4: 00000000000006e0
Call Trace:
<TASK>
pci_disable_msi+0x34/0xe0
irqmgr_wrap_shutdown+0x165/0x190 [fglrx]
? firegl_takedown+0x841/0x950 [fglrx]
? kobject_put+0xa6/0x220
? cleanup_device+0x299/0x2a0 [fglrx]
? pci_unregister_driver+0x42/0xa0
? firegl_cleanup_device_heads+0x65/0xa0 [fglrx]
? firegl_cleanup_module+0x84/0x11c [fglrx]
? __x64_sys_delete_module+0x11b/0x210
? get_vtime_delta+0xe/0x40
? vtime_user_exit+0x1c/0x60
? __ct_user_exit+0x68/0xb0
? do_syscall_64+0x3c/0x80
? entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Modules linked in: amdgpu fglrx(O-) ath9k ath9k_common mfd_core gpu_sched
drm_buddy drm_ttm_helper ath9k_hw ttm drm_display_helper drm_kms_helper ath
sp5100_tco syscopyarea sysfillrect sysimgblt fb_sys_fops video drm backlight
ipv6
CR2: 0000000000000010
---[ end trace 0000000000000000 ]---
RIP: 0010:mutex_lock+0x2a/0x40
Code: 0f 1f 44 00 00 53 be 1b 01 00 00 48 89 fb 48 c7 c7 08 81 3d 82 e8 46 2c 52
ff e8 01 d7 ff ff 31 c0 65 48 8b 14 25 00 ad 01 00 <f0> 48 0f b1 13 74 06 48 89
df 5b eb b9 5b c3 0f 1f 80 00 00 00 00
RSP: 0018:ffffc90000b07dd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
RDX: ffff888116aabb00 RSI: 000000000000011b RDI: ffffffff823d8108
RBP: ffff8881148d20d0 R08: 0000000000000000 R09: ffffffffa053537b
R10: ffff888149365cc0 R11: ffffea00052c5048 R12: 0000000000000000
R13: ffff88813877c000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6f90b3cb80(0000) GS:ffff88815b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000011e4c8000 CR4: 00000000000006e0
Steps:
insmod fglrx.ko
clinfo
MatrixMultiplication
rmmod fglrx.ko
I know this is an out-of-tree driver from AMD, but we still need it for
some products because it provides the OpenCL stack.
Note: The open-source upstream radeon driver does not support OpenCL.
Using git bisect, I found that the issue is introduced by commit [1].
Unfortunately, I cannot revert it for testing: if I do, the system hangs
on boot because of another commit [2].
I understand that the driver might have issues of its own, but shouldn't the
kernel prevent this crash in the pci_disable_msi() function? Is this a mutex
problem triggered by the fglrx driver?
Does anyone have suggestions on how we can or should proceed with this?
Thanks in advance,
Daniel
[1] Commit 93296cd1325d1d9afede60202d8833011c9001f2:
93296cd1325d 2021-12-15 PCI/MSI: Allocate MSI device data on first use
[2] Commit ffd84485e6beb9cad3e5a133d88201b995298c33:
ffd84485e6be 2021-12-10 PCI/MSI: Let the irq code handle sysfs groups
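One reading of the oops above, offered as a hedged sketch rather than a diagnosis: the faulting address (CR2) is 0x10 and RBX holds the same value inside mutex_lock(), which is the pattern produced when code takes the address of a lock member inside a structure whose base pointer is NULL. The userspace model below shows only that arithmetic. The struct names and the layout (two pointer-sized fields ahead of the lock) are assumptions made for illustration, not copied from the kernel's MSI code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel mutex; only its position matters here. */
struct model_mutex {
    long owner;
};

/*
 * Illustrative model of per-device MSI data: two pointer-sized fields
 * followed by the lock. This layout is an assumption for the example.
 */
struct model_msi_data {
    unsigned long properties;
    void *platform_data;
    struct model_mutex mutex;
};

/* Address that &data->mutex evaluates to when data itself is NULL. */
uintptr_t lock_address_for_null_data(void)
{
    return (uintptr_t)offsetof(struct model_msi_data, mutex);
}

/* Print the modelled address so it can be compared with CR2 in an oops. */
void print_lock_address(void)
{
    printf("lock address with NULL data: 0x%zx\n",
           (size_t)lock_address_for_null_data());
}
```

On a typical 64-bit Linux build the two leading fields occupy 16 bytes, so the modelled lock address comes out as 0x10. Whether the real structure was NULL at this point, and why, would still need to be confirmed against the actual kernel source and the driver's call sequence.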
* Re: PCI/MSI: kernel NULL pointer dereference
From: Bjorn Helgaas @ 2022-09-08 12:12 UTC (permalink / raw)
To: Daniel Gomez
Cc: linux-pci, Thomas Gleixner, Alex Deucher, Christian König
On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> Hi,
>
> I have the following error whenever I remove the fglrx module from the
> latest 6.0-rc4.
You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
that references that. If you can reproduce this with an in-tree
driver, we can certainly fix it, but it's harder for an out-of-tree
driver.
I cc'd some AMD graphics folks in case they have a pointer for where
to get fglrx support.
* RE: PCI/MSI: kernel NULL pointer dereference
From: Deucher, Alexander @ 2022-09-08 13:44 UTC (permalink / raw)
To: Bjorn Helgaas, Daniel Gomez; +Cc: linux-pci, Thomas Gleixner, Koenig, Christian
[Public]
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, September 8, 2022 8:13 AM
> To: Daniel Gomez <daniel@qtec.com>
> Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: PCI/MSI: kernel NULL pointer dereference
>
> On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > Hi,
> >
> > I have the following error whenever I remove the fglrx module from the
> > latest 6.0-rc4.
>
> You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> that references that. If you can reproduce this with an in-tree driver, we can
> certainly fix it, but it's harder for an out-of-tree driver.
>
> I cc'd some AMD graphics folks in case they have a pointer for where to get
> fglrx support.
I can ask around, but I don't think we've actively worked on fglrx since we switched to the open source amdgpu driver 5-6 years ago. What hardware is this?
Alex
* Re: PCI/MSI: kernel NULL pointer dereference
From: Daniel Gomez @ 2022-09-08 14:30 UTC (permalink / raw)
To: Deucher, Alexander
Cc: Bjorn Helgaas, linux-pci, Thomas Gleixner, Koenig, Christian
On Thu, 8 Sept 2022 at 15:44, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Thursday, September 8, 2022 8:13 AM
> > To: Daniel Gomez <daniel@qtec.com>
> > Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> > Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> > <Christian.Koenig@amd.com>
> > Subject: Re: PCI/MSI: kernel NULL pointer dereference
> >
> > On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > > Hi,
> > >
> > > I have the following error whenever I remove the fglrx module from the
> > > latest 6.0-rc4.
> >
> > You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> > that references that. If you can reproduce this with an in-tree driver, we can
> > certainly fix it, but it's harder for an out-of-tree driver.
I understand, thanks. I tried with radeon, but it cannot be removed with
rmmod once it's loaded.
> >
> > I cc'd some AMD graphics folks in case they have a pointer for where to get
> > fglrx support.
>
> I can ask around, but I don't think we've actively worked on fglrx since we switched to the open source amdgpu driver 5-6 years ago. What hardware is this?
It's an AMD G-T56N (bobcat) with an AMD ATI Radeon HD 6320.
* Re: PCI/MSI: kernel NULL pointer dereference
From: Bjorn Helgaas @ 2022-09-14 16:19 UTC (permalink / raw)
To: Daniel Gomez
Cc: Deucher, Alexander, linux-pci, Thomas Gleixner, Koenig, Christian
On Thu, Sep 08, 2022 at 04:30:28PM +0200, Daniel Gomez wrote:
> On Thu, 8 Sept 2022 at 15:44, Deucher, Alexander
> <Alexander.Deucher@amd.com> wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: Thursday, September 8, 2022 8:13 AM
> > > To: Daniel Gomez <daniel@qtec.com>
> > > Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> > > <Christian.Koenig@amd.com>
> > > Subject: Re: PCI/MSI: kernel NULL pointer dereference
> > >
> > > On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > > > Hi,
> > > >
> > > > I have the following error whenever I remove the fglrx module from the
> > > > latest 6.0-rc4.
> > >
> > > You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> > > that references that. If you can reproduce this with an in-tree driver, we can
> > > certainly fix it, but it's harder for an out-of-tree driver.
> I understand, thanks. I tried with radeon, but it cannot be removed with
> rmmod once it's loaded.
A brute-force way to debug this would be to add logging to entry
points in msi.c so you can see what MSI-related interfaces fglrx uses
and in what order. Then you may be able to make a trivial module that
does something similar that we could use as a test case so we could
duplicate the failure and verify a fix.
Bjorn
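The logging idea above can be sketched in userspace as follows. This is only a model: the function names are hypothetical stand-ins, not real MSI entry points, and in the kernel the record step would be a log line (for example via pr_info()) added by hand at each entry point in msi.c. The point is to capture the order of calls so that a small test module could later replay it.

```c
#include <stdio.h>
#include <string.h>

#define TRACE_MAX 16

/* Names of the entry points seen so far, in call order. */
static const char *trace_log[TRACE_MAX];
static int trace_len;

/* Record one entry point and echo it, as a kernel log line would. */
void trace_record(const char *name)
{
    if (trace_len < TRACE_MAX)
        trace_log[trace_len++] = name;
    fprintf(stderr, "msi-trace: %s\n", name);
}

/* Placed at the top of each function to be traced. */
#define TRACE_ENTRY() trace_record(__func__)

/* Hypothetical stand-ins for traced entry points; they only log. */
void model_enable_msi(void)    { TRACE_ENTRY(); }
void model_disable_msi(void)   { TRACE_ENTRY(); }
void model_driver_unbind(void) { TRACE_ENTRY(); }

/*
 * Return 1 if `first` was logged before the first occurrence of
 * `second`, and 0 otherwise (including when `second` never appears).
 */
int trace_called_before(const char *first, const char *second)
{
    int seen_first = 0;

    for (int i = 0; i < trace_len; i++) {
        if (strcmp(trace_log[i], first) == 0)
            seen_first = 1;
        else if (strcmp(trace_log[i], second) == 0)
            return seen_first;
    }
    return 0;
}
```

A caller would invoke the stand-ins in some order and then query trace_called_before() to check a suspected ordering. Any particular sequence used with this sketch is illustrative and says nothing about what fglrx actually does.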