* PCI/MSI: kernel NULL pointer dereference
@ 2022-09-08 10:41 Daniel Gomez
  2022-09-08 12:12 ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Gomez @ 2022-09-08 10:41 UTC
To: linux-pci

Hi,

I have the following error whenever I remove the fglrx module from the
latest 6.0-rc4.

Logs:
/mnt/raid0/krops/workspace/sources/fglrx-module/module/firegl_public.c:1674
KCL_SetPageCache_Array
<6>[fglrx] IRQ 37 Disabled
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 254 Comm: rmmod Tainted: G W O 6.0.0-rc4-qtec-standard #2
Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 09/27/2013
RIP: 0010:mutex_lock+0x2a/0x40
Code: 0f 1f 44 00 00 53 be 1b 01 00 00 48 89 fb 48 c7 c7 08 81 3d 82 e8 46 2c 52 ff e8 01 d7 ff ff 31 c0 65 48 8b 14 25 00 ad 01 00 <f0> 48 0f b1 13 74 06 48 89 df 5b eb b9 5b c3 0f 1f 80 00 00 00 00
RSP: 0018:ffffc90000b07dd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
RDX: ffff888116aabb00 RSI: 000000000000011b RDI: ffffffff823d8108
RBP: ffff8881148d20d0 R08: 0000000000000000 R09: ffffffffa053537b
R10: ffff888149365cc0 R11: ffffea00052c5048 R12: 0000000000000000
R13: ffff88813877c000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6f90b3cb80(0000) GS:ffff88815b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000011e4c8000 CR4: 00000000000006e0
Call Trace:
 <TASK>
 pci_disable_msi+0x34/0xe0
 irqmgr_wrap_shutdown+0x165/0x190 [fglrx]
 ? firegl_takedown+0x841/0x950 [fglrx]
 ? kobject_put+0xa6/0x220
 ? cleanup_device+0x299/0x2a0 [fglrx]
 ? pci_unregister_driver+0x42/0xa0
 ? firegl_cleanup_device_heads+0x65/0xa0 [fglrx]
 ? firegl_cleanup_module+0x84/0x11c [fglrx]
 ? __x64_sys_delete_module+0x11b/0x210
 ? get_vtime_delta+0xe/0x40
 ? vtime_user_exit+0x1c/0x60
 ? __ct_user_exit+0x68/0xb0
 ? do_syscall_64+0x3c/0x80
 ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>
Modules linked in: amdgpu fglrx(O-) ath9k ath9k_common mfd_core gpu_sched drm_buddy drm_ttm_helper ath9k_hw ttm drm_display_helper drm_kms_helper ath sp5100_tco syscopyarea sysfillrect sysimgblt fb_sys_fops video drm backlight ipv6
CR2: 0000000000000010
---[ end trace 0000000000000000 ]---
RIP: 0010:mutex_lock+0x2a/0x40
Code: 0f 1f 44 00 00 53 be 1b 01 00 00 48 89 fb 48 c7 c7 08 81 3d 82 e8 46 2c 52 ff e8 01 d7 ff ff 31 c0 65 48 8b 14 25 00 ad 01 00 <f0> 48 0f b1 13 74 06 48 89 df 5b eb b9 5b c3 0f 1f 80 00 00 00 00
RSP: 0018:ffffc90000b07dd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
RDX: ffff888116aabb00 RSI: 000000000000011b RDI: ffffffff823d8108
RBP: ffff8881148d20d0 R08: 0000000000000000 R09: ffffffffa053537b
R10: ffff888149365cc0 R11: ffffea00052c5048 R12: 0000000000000000
R13: ffff88813877c000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6f90b3cb80(0000) GS:ffff88815b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000011e4c8000 CR4: 00000000000006e0

Steps:
insmod fglrx.ko
clinfo
MatrixMultiplication
rmmod fglrx.ko

I know this is an out-of-tree driver from AMD, but we still need it for
some products because of its OpenCL stack support.

Note: The open-source upstream radeon does not support OpenCL.

Using git-bisect, I found that the issue is triggered by commit [1].
Unfortunately, I cannot revert it for testing: if I do, the system hangs
on boot because of another commit [2].

I understand the driver might have some issues, but shouldn't the kernel
prevent this crash in the pci_disable_msi function? Do we have a mutex
problem here, triggered by the fglrx driver?
Does anyone have any suggestions on how we can/should proceed with this?

Thanks in advance,
Daniel

[1] Commit 93296cd1325d1d9afede60202d8833011c9001f2:
93296cd1325d 2021-12-15 PCI/MSI: Allocate MSI device data on first use
[2] Commit ffd84485e6beb9cad3e5a133d88201b995298c33:
ffd84485e6be 2021-12-10 PCI/MSI: Let the irq code handle sysfs groups
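
Editorial note on reading the oops above: the error code 0x0002 and the
fault address 0x10 can be decoded mechanically. The sketch below is an
illustration only and is not part of the original report. The bit
meanings are those of the x86 page-fault error code. The member offset
of 0x10 is an assumption chosen to match the fault address (the thread
does not say which structure holds the mutex), and the function names
are hypothetical.

```python
# Editorial sketch: decode the values printed in the oops above.
# Function names are hypothetical; only the x86 #PF bit layout is standard.


def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page-fault error code into its low flag bits."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation.
        "page": "protection violation" if code & 0x1 else "not-present page",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if code & 0x2 else "read",
        # Bit 2: 0 = supervisor (kernel) mode, 1 = user mode.
        "mode": "user" if code & 0x4 else "supervisor",
        # Bit 4: 1 = the fault was an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


def implied_base_pointer(fault_addr: int, member_offset: int) -> int:
    """Return the base pointer that would place a member at fault_addr."""
    return fault_addr - member_offset


if __name__ == "__main__":
    # "Oops: 0002" and "#PF: error_code(0x0002)" from the report.
    print(decode_pf_error_code(0x0002))
    # CR2 (the fault address) is taken from the report. The 0x10 member
    # offset is an ASSUMPTION made for illustration, not a fact from the thread.
    print(hex(implied_base_pointer(0x10, 0x10)))
```

Under that assumption, the dump is consistent with mutex_lock() being
handed the address of a mutex embedded in a structure whose base pointer
was NULL, which would explain why RBX and CR2 both read 0x10. This is a
reading of the register dump, not a confirmed diagnosis.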

* Re: PCI/MSI: kernel NULL pointer dereference
  2022-09-08 10:41 PCI/MSI: kernel NULL pointer dereference Daniel Gomez
@ 2022-09-08 12:12 ` Bjorn Helgaas
  2022-09-08 13:44   ` Deucher, Alexander
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2022-09-08 12:12 UTC
To: Daniel Gomez
Cc: linux-pci, Thomas Gleixner, Alex Deucher, Christian König

On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> Hi,
>
> I have the following error whenever I remove the fglrx module from the
> latest 6.0-rc4.

You bisected to 93296cd1325d; I don't see a commit with a "Fixes:" that
references that. If you can reproduce this with an in-tree driver, we
can certainly fix it, but it's harder for an out-of-tree driver.

I cc'd some AMD graphics folks in case they have a pointer for where to
get fglrx support.

> [...]

* RE: PCI/MSI: kernel NULL pointer dereference
  2022-09-08 12:12 ` Bjorn Helgaas
@ 2022-09-08 13:44   ` Deucher, Alexander
  2022-09-08 14:30     ` Daniel Gomez
  0 siblings, 1 reply; 5+ messages in thread
From: Deucher, Alexander @ 2022-09-08 13:44 UTC
To: Bjorn Helgaas, Daniel Gomez
Cc: linux-pci, Thomas Gleixner, Koenig, Christian

[Public]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, September 8, 2022 8:13 AM
> To: Daniel Gomez <daniel@qtec.com>
> Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: PCI/MSI: kernel NULL pointer dereference
>
> On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > Hi,
> >
> > I have the following error whenever I remove the fglrx module from the
> > latest 6.0-rc4.
>
> You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> that references that. If you can reproduce this with an in-tree driver, we can
> certainly fix it, but it's harder for an out-of-tree driver.
>
> I cc'd some AMD graphics folks in case they have a pointer for where to get
> fglrx support.

I can ask around, but I don't think we've actively worked on fglrx since
we switched to the open source amdgpu driver 5-6 years ago. What
hardware is this?

Alex

> > [...]

* Re: PCI/MSI: kernel NULL pointer dereference
  2022-09-08 13:44   ` Deucher, Alexander
@ 2022-09-08 14:30     ` Daniel Gomez
  2022-09-14 16:19       ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Gomez @ 2022-09-08 14:30 UTC
To: Deucher, Alexander
Cc: Bjorn Helgaas, linux-pci, Thomas Gleixner, Koenig, Christian

On Thu, 8 Sept 2022 at 15:44, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Thursday, September 8, 2022 8:13 AM
> > To: Daniel Gomez <daniel@qtec.com>
> > Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> > Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> > <Christian.Koenig@amd.com>
> > Subject: Re: PCI/MSI: kernel NULL pointer dereference
> >
> > On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > > Hi,
> > >
> > > I have the following error whenever I remove the fglrx module from the
> > > latest 6.0-rc4.
> >
> > You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> > that references that. If you can reproduce this with an in-tree driver, we can
> > certainly fix it, but it's harder for an out-of-tree driver.

I understand, thanks. I tried with radeon, but rmmod is not possible
once it's loaded.

> > I cc'd some AMD graphics folks in case they have a pointer for where to get
> > fglrx support.
>
> I can ask around, but I don't think we've actively worked on fglrx since we switched to the open source amdgpu driver 5-6 years ago. What hardware is this?

It's an AMD G-T56N (Bobcat) with an AMD ATI Radeon HD 6320.

> Alex
>
> > > [...]

* Re: PCI/MSI: kernel NULL pointer dereference
  2022-09-08 14:30     ` Daniel Gomez
@ 2022-09-14 16:19       ` Bjorn Helgaas
  0 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2022-09-14 16:19 UTC
To: Daniel Gomez
Cc: Deucher, Alexander, linux-pci, Thomas Gleixner, Koenig, Christian

On Thu, Sep 08, 2022 at 04:30:28PM +0200, Daniel Gomez wrote:
> On Thu, 8 Sept 2022 at 15:44, Deucher, Alexander
> <Alexander.Deucher@amd.com> wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: Thursday, September 8, 2022 8:13 AM
> > > To: Daniel Gomez <daniel@qtec.com>
> > > Cc: linux-pci@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>;
> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> > > <Christian.Koenig@amd.com>
> > > Subject: Re: PCI/MSI: kernel NULL pointer dereference
> > >
> > > On Thu, Sep 08, 2022 at 12:41:00PM +0200, Daniel Gomez wrote:
> > > > Hi,
> > > >
> > > > I have the following error whenever I remove the fglrx module from the
> > > > latest 6.0-rc4.
> > >
> > > You bisected to 93296cd1325d; I don't see a commit with a "Fixes:"
> > > that references that. If you can reproduce this with an in-tree driver, we can
> > > certainly fix it, but it's harder for an out-of-tree driver.

> I understand, thanks. I tried with radeon, but rmmod is not possible
> once it's loaded.

A brute-force way to debug this would be to add logging to entry points
in msi.c so you can see what MSI-related interfaces fglrx uses and in
what order. Then you may be able to make a trivial module that does
something similar that we could use as a test case so we could duplicate
the failure and verify a fix.

Bjorn
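
Editorial note: once logging has been added as suggested above, the
captured lines can be reduced to an ordered call sequence before writing
a test module. The sketch below is one possible way to do that and is
not part of the thread. The "msi-trace:" prefix, the field layout, and
the sample lines (including the device address) are all invented for
illustration; they stand in for whatever pr_info() format you add
yourself. The script makes no claim about what fglrx actually calls.

```python
import re

# ASSUMED log format, produced by logging you would add yourself:
#   msi-trace: <function> dev=<pci address>
LINE_RE = re.compile(r"msi-trace: (?P<func>\w+) dev=(?P<dev>\S+)")

# Existing PCI MSI entry points, grouped by whether they set up or tear down.
ENABLE = {"pci_enable_msi", "pci_enable_msix_range", "pci_alloc_irq_vectors"}
DISABLE = {"pci_disable_msi", "pci_disable_msix", "pci_free_irq_vectors"}


def call_sequence(log_text):
    """Return (device, function) pairs in the order they were logged."""
    return [(m["dev"], m["func"]) for m in LINE_RE.finditer(log_text)]


def unmatched_disables(sequence):
    """Return teardown calls that were not preceded by a setup call."""
    enabled = set()
    suspicious = []
    for dev, func in sequence:
        if func in ENABLE:
            enabled.add(dev)
        elif func in DISABLE:
            if dev not in enabled:
                suspicious.append((dev, func))
            enabled.discard(dev)
    return suspicious


if __name__ == "__main__":
    # Invented sample input, NOT taken from a real fglrx run.
    sample = (
        "[   12.1] msi-trace: pci_enable_msi dev=0000:00:01.0\n"
        "[   80.4] msi-trace: pci_disable_msi dev=0000:00:01.0\n"
        "[   80.5] msi-trace: pci_disable_msi dev=0000:00:01.0\n"
    )
    seq = call_sequence(sample)
    for dev, func in seq:
        print(dev, func)
    print("unmatched:", unmatched_disables(seq))
```

A sequence extracted this way gives the order of calls that a trivial
in-tree-style test module would need to replay.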

end of thread, other threads:[~2022-09-14 16:19 UTC | newest]

Thread overview: 5+ messages
2022-09-08 10:41 PCI/MSI: kernel NULL pointer dereference Daniel Gomez
2022-09-08 12:12 ` Bjorn Helgaas
2022-09-08 13:44   ` Deucher, Alexander
2022-09-08 14:30     ` Daniel Gomez
2022-09-14 16:19       ` Bjorn Helgaas