linux-kernel.vger.kernel.org archive mirror
* [PATCH] drm/amdkfd: Fix an illegal memory access
@ 2023-02-21 11:35 qu.huang
  2023-02-21 16:26 ` Felix Kuehling
From: qu.huang @ 2023-02-21 11:35 UTC
  To: Felix.Kuehling, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, daniel
  Cc: amd-gfx, dri-devel, linux-kernel, qu.huang

From: Qu Huang <qu.huang@linux.dev>

In kfd_wait_on_events(), the kfd_event_waiter array is allocated by
alloc_event_waiters(), but the event field of each waiter is not
initialized. When copy_from_user() fails in kfd_wait_on_events(), the
error path frees the previously allocated waiters. free_waiters() reads
the event field of every waiter and, if it is non-NULL, removes the
waiter from that event's wait queue. For the waiters that
init_event_waiter() never reached, event still holds garbage, so this
results in an illegal memory access and a system crash. Here is the
crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:ffffaa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0000000000000282 RCX: 00000000002c0000
localhost kernel: RDX: ffff9e855eeacb80 RSI: 000000000000279c RDI: ffffe7088f6a21d0
localhost kernel: RBP: ffffe7088f6a21d0 R08: 00000000002c0000 R09: ffffaa53c362be64
localhost kernel: R10: ffffaa53c362bbd8 R11: 0000000000000001 R12: 0000000000000002
localhost kernel: R13: ffff9e7ead15d600 R14: 0000000000000000 R15: ffff9e7ead15d698
localhost kernel: FS:  0000152a3d111700(0000) GS:ffff9e855ee80000(0000) knlGS:0000000000000000
localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
localhost kernel: CR2: 0000152938000010 CR3: 000000044d7a4000 CR4: 00000000003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Signed-off-by: Qu Huang <qu.huang@linux.dev>
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d..e5faaad 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
 		init_wait(&event_waiters[i].wait);
 		event_waiters[i].activated = false;
+		event_waiters[i].event = NULL;
 	}

 	return event_waiters;
--
1.8.3.1
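To make the failure mode concrete, here is a user-space C sketch of the error path described above. It is not the amdkfd code: the types sketch_event and sketch_waiter and all sketch_* functions are hypothetical stand-ins, malloc() plays the role of the kernel's non-zeroing allocation, a counter replaces remove_wait_queue(), and a simulated failure replaces copy_from_user(). With event explicitly set to NULL, as the patch does, the cleanup only touches waiters that were actually bound to an event.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for struct kfd_event and
 * struct kfd_event_waiter, reduced to the fields involved in the bug. */
struct sketch_event {
	int removals;			/* counts remove_wait_queue()-style calls */
};

struct sketch_waiter {
	struct sketch_event *event;	/* bound by init_event_waiter() */
	bool activated;
};

/* Allocation in the style of the v1 patch: malloc() returns uninitialized
 * memory, so every field the cleanup path reads is set explicitly. */
static struct sketch_waiter *sketch_alloc_waiters(uint32_t num_events)
{
	struct sketch_waiter *w;
	uint32_t i;

	/* Reject a num_events * size overflow, as kernel array allocators do. */
	if (num_events > SIZE_MAX / sizeof(*w))
		return NULL;
	w = malloc(num_events * sizeof(*w));
	if (!w)
		return NULL;
	for (i = 0; i < num_events; i++) {
		w[i].activated = false;
		w[i].event = NULL;	/* the store the patch adds */
	}
	return w;
}

/* Same shape as free_waiters(): only waiters with a non-NULL event are
 * unhooked. Returns how many waiters were unhooked. */
static uint32_t sketch_free_waiters(uint32_t num_events, struct sketch_waiter *w)
{
	uint32_t i, unhooked = 0;

	for (i = 0; i < num_events; i++) {
		if (w[i].event) {
			w[i].event->removals++;
			unhooked++;
		}
	}
	free(w);
	return unhooked;
}

/* Simulates kfd_wait_on_events() failing part-way: the first `bound`
 * waiters are attached to ev, then a copy_from_user()-style failure jumps
 * straight to cleanup. Returns the number of waiters cleanup unhooked. */
static uint32_t sketch_wait_fails_after(uint32_t num_events, uint32_t bound,
					struct sketch_event *ev)
{
	struct sketch_waiter *w = sketch_alloc_waiters(num_events);
	uint32_t i;

	if (!w)
		return 0;
	for (i = 0; i < bound && i < num_events; i++)
		w[i].event = ev;	/* what init_event_waiter() would do */
	/* Simulated copy_from_user() failure: go to the error path. */
	return sketch_free_waiters(num_events, w);
}
```

For example, failing after two of four waiters are bound leaves the last two with event == NULL, so the cleanup unhooks exactly two. Without the NULL store, those two entries would hold whatever malloc() returned, which is the situation the crash log shows.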


* Re: [PATCH] drm/amdkfd: Fix an illegal memory access
  2023-02-21 11:35 [PATCH] drm/amdkfd: Fix an illegal memory access qu.huang
@ 2023-02-21 16:26 ` Felix Kuehling
  2023-02-21 19:17   ` Christophe JAILLET
From: Felix Kuehling @ 2023-02-21 16:26 UTC
  To: qu.huang, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, daniel
  Cc: amd-gfx, dri-devel, linux-kernel


On 2023-02-21 06:35, qu.huang@linux.dev wrote:
> From: Qu Huang <qu.huang@linux.dev>
>
> [...]
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 729d26d..e5faaad 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>   	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>   		init_wait(&event_waiters[i].wait);
>   		event_waiters[i].activated = false;
> +		event_waiters[i].event = NULL;

Thank you for catching this. We're often lazy about initializing things 
to NULL or 0 because most of our data structures are allocated with 
kzalloc or similar. I'm not sure why we're not doing this here. If we 
allocated event_waiters with kcalloc, we could also remove the 
initialization of activated. I think that would be the cleaner and safer 
solution.

Regards,
   Felix


>   	}
>
>   	return event_waiters;
> --
> 1.8.3.1
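Felix's kcalloc() suggestion can be sketched the same way in user space, with calloc() standing in for kcalloc(). The sketch_waiter struct and its wait_ready flag are hypothetical: wait_ready only marks where init_wait() would still be called, because a valid wait queue entry needs more than zeroed memory. Zero-filled memory gives event == NULL and activated == false on the platforms Linux supports, so neither field needs an explicit store.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct kfd_event_waiter.
 * wait_ready marks the spot where the kernel would call init_wait(). */
struct sketch_waiter {
	void *event;
	bool activated;
	bool wait_ready;
};

/* calloc() returns zero-filled memory, like kcalloc(), so event and
 * activated start out as NULL and false without any per-field stores. */
static struct sketch_waiter *sketch_alloc_waiters_zeroed(uint32_t num_events)
{
	struct sketch_waiter *w = calloc(num_events, sizeof(*w));
	uint32_t i;

	if (!w)
		return NULL;
	/* Only the wait entry still needs explicit setup. */
	for (i = 0; i < num_events; i++)
		w[i].wait_ready = true;	/* init_wait() equivalent */
	return w;
}
```

The design benefit is that any field added to the struct later also starts out zeroed, so the error path cannot trip over a member someone forgot to initialize.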


* Re: [PATCH] drm/amdkfd: Fix an illegal memory access
  2023-02-21 16:26 ` Felix Kuehling
@ 2023-02-21 19:17   ` Christophe JAILLET
  2023-02-22  3:16     ` Qu Huang
From: Christophe JAILLET @ 2023-02-21 19:17 UTC
  To: qu.huang
  Cc: Felix.Kuehling, Xinhui.Pan, airlied, alexander.deucher, amd-gfx,
	christian.koenig, daniel, dri-devel, linux-kernel

On 21/02/2023 17:26, Felix Kuehling wrote:
> 
> On 2023-02-21 06:35, qu.huang@linux.dev wrote:
>> From: Qu Huang <qu.huang@linux.dev>
>>
>> [...]
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index 729d26d..e5faaad 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>>       for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>>           init_wait(&event_waiters[i].wait);
>>           event_waiters[i].activated = false;
>> +        event_waiters[i].event = NULL;
> 
> Thank you for catching this. We're often lazy about initializing things 
> to NULL or 0 because most of our data structures are allocated with 
> kzalloc or similar. I'm not sure why we're not doing this here. If we 
> allocated event_waiters with kcalloc, we could also remove the 
> initialization of activated. I think that would be the cleaner and safer 
> solution.

Hi,

I think the '(event_waiters) &&' test in the 'for' loop can also be
removed: 'event_waiters' is already checked for NULL a few lines above.


Just my 2c.

CJ

> 
> Regards,
>    Felix
> 
> 
>>       }
>>
>>       return event_waiters;
>> -- 
>> 1.8.3.1
> 



* Re: [PATCH] drm/amdkfd: Fix an illegal memory access
  2023-02-21 19:17   ` Christophe JAILLET
@ 2023-02-22  3:16     ` Qu Huang
From: Qu Huang @ 2023-02-22  3:16 UTC
  To: Christophe JAILLET
  Cc: Felix.Kuehling, Xinhui.Pan, airlied, alexander.deucher, amd-gfx,
	christian.koenig, daniel, dri-devel, linux-kernel

On 2023/2/22 3:17, Christophe JAILLET wrote:
> On 21/02/2023 17:26, Felix Kuehling wrote:
>>
>> On 2023-02-21 06:35, qu.huang@linux.dev wrote:
>>> From: Qu Huang <qu.huang@linux.dev>
>>>
>>> [...]
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> index 729d26d..e5faaad 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>>>       for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>>>           init_wait(&event_waiters[i].wait);
>>>           event_waiters[i].activated = false;
>>> +        event_waiters[i].event = NULL;
>>
>> Thank you for catching this. We're often lazy about initializing things to NULL or 0 because most of our data structures are allocated with kzalloc or similar. I'm not sure why we're not doing this here. If we allocated event_waiters with kcalloc, we could also remove the initialization of activated. I think that would be the cleaner and safer solution.
>
> Hi,
>
> I think that the '(event_waiters) &&' in the 'for' can also be removed.
> 'event_waiters' is already NULL tested a few lines above
>
>
> Just my 2c.
>
> CJ
>

Thanks to Felix and CJ for the suggestions. I have submitted patch v2; please review it:

https://lore.kernel.org/all/ea5b997309825b21e406f9bad2ce8779@linux.dev/

Regards,

Qu


>>
>> Regards,
>>    Felix
>>
>>
>>>       }
>>>
>>>       return event_waiters;
>>> -- 
>>> 1.8.3.1
>>
>


Thread overview: 4+ messages
2023-02-21 11:35 [PATCH] drm/amdkfd: Fix an illegal memory access qu.huang
2023-02-21 16:26 ` Felix Kuehling
2023-02-21 19:17   ` Christophe JAILLET
2023-02-22  3:16     ` Qu Huang
