All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] drm/amdkfd: new event wait support
@ 2023-06-06 16:24 James Zhu
  2023-06-06 16:24 ` [PATCH v2 1/3] drm/amdkfd: add event age tracking James Zhu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: James Zhu @ 2023-06-06 16:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.kuehling, jamesz

In kernel amdgpu driver, kfd_wait_on_events is used to support user space signal event wait
function. For multiple threads waiting on same event scenery, race condition could occur
since some threads after checking signal condition, before calling kfd_wait_on_events, the 
event interrupt could be fired and wake up other thread which are sleeping on this event.
Then those threads could fall into sleep without waking up again. Adding event age tracking
in both kernel and user mode, will help avoiding this race condition.

The changes for The user space ROCT-Thunk-Interface/ROCR-Runtime are listed below for
review togehter with kernel mode changes.

ROCT-Thunk-Interface:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/efdbf6cfbc026bd68ac3c35d00dacf84370eb81e
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/1820ae0a2db85b6f584611dc0cde1a00e7c22915

ROCR-Runtime:
https://github.com/RadeonOpenCompute/ROCR-Runtime/compare/master...zhums:ROCR-Runtime:new_event_wait_review
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/e1f5bdb88eb882ac798aeca2c00ea3fbb2dba459
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/7d26afd14107b5c2a754c1a3f415d89f3aabb503

-v2: remove unnecessay link

James Zhu (3):
  drm/amdkfd: add event age tracking
  drm/amdkfd: add event_age tracking when receiving interrupt
  drm/amdkfd: don't sleep when event age unmatch

 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_events.h |  1 +
 include/uapi/linux/kfd_ioctl.h          | 13 +++++++++++--
 3 files changed, 33 insertions(+), 2 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] drm/amdkfd: add event age tracking
  2023-06-06 16:24 [PATCH v2 0/3] drm/amdkfd: new event wait support James Zhu
@ 2023-06-06 16:24 ` James Zhu
  2023-06-07 17:17   ` Felix Kuehling
  2023-06-06 16:24 ` [PATCH v2 2/3] drm/amdkfd: add event_age tracking when receiving interrupt James Zhu
  2023-06-06 16:24 ` [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch James Zhu
  2 siblings, 1 reply; 8+ messages in thread
From: James Zhu @ 2023-06-06 16:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.kuehling, jamesz

Add event age tracking

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1781e7669982..eeb2fdcbdcb7 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -39,9 +39,10 @@
  * - 1.11 - Add unified memory for ctx save/restore area
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
+ * - 1.14 - Update kfd_event_data
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 13
+#define KFD_IOCTL_MINOR_VERSION 14
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
@@ -320,12 +321,20 @@ struct kfd_hsa_hw_exception_data {
 	__u32 gpu_id;
 };
 
+/* hsa signal event data */
+struct kfd_hsa_signal_event_data {
+	__u64 last_event_age;	/* to and from KFD */
+};
+
 /* Event data */
 struct kfd_event_data {
 	union {
+		/* From KFD */
 		struct kfd_hsa_memory_exception_data memory_exception_data;
 		struct kfd_hsa_hw_exception_data hw_exception_data;
-	};				/* From KFD */
+		/* To and From KFD */
+		struct kfd_hsa_signal_event_data signal_event_data;
+	};
 	__u64 kfd_event_data_ext;	/* pointer to an extension structure
 					   for future exception types */
 	__u32 event_id;		/* to KFD */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/3] drm/amdkfd: add event_age tracking when receiving interrupt
  2023-06-06 16:24 [PATCH v2 0/3] drm/amdkfd: new event wait support James Zhu
  2023-06-06 16:24 ` [PATCH v2 1/3] drm/amdkfd: add event age tracking James Zhu
@ 2023-06-06 16:24 ` James Zhu
  2023-06-06 16:24 ` [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch James Zhu
  2 siblings, 0 replies; 8+ messages in thread
From: James Zhu @ 2023-06-06 16:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.kuehling, jamesz

Add event_age tracking when receiving interrupt.

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 6 ++++++
 drivers/gpu/drm/amd/amdkfd/kfd_events.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 7ff5c4e1b7e2..c7689181cc22 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -431,6 +431,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 	if (!ret) {
 		*event_id = ev->event_id;
 		*event_trigger_data = ev->event_id;
+		ev->event_age = 1;
 	} else {
 		kfree(ev);
 	}
@@ -629,6 +630,11 @@ static void set_event(struct kfd_event *ev)
 	 * updating the wait queues in kfd_wait_on_events.
 	 */
 	ev->signaled = !ev->auto_reset || !waitqueue_active(&ev->wq);
+	if (!(++ev->event_age)) {
+		/* Never wrap back to reserved/default event age 0/1 */
+		ev->event_age = 2;
+		WARN_ONCE(1, "event_age wrap back!");
+	}
 
 	list_for_each_entry(waiter, &ev->wq.head, wait.entry)
 		WRITE_ONCE(waiter->activated, true);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
index 1c62c8dd6460..52ccfd397c2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
@@ -53,6 +53,7 @@ struct signal_page;
 
 struct kfd_event {
 	u32 event_id;
+	u64 event_age;
 
 	bool signaled;
 	bool auto_reset;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch
  2023-06-06 16:24 [PATCH v2 0/3] drm/amdkfd: new event wait support James Zhu
  2023-06-06 16:24 ` [PATCH v2 1/3] drm/amdkfd: add event age tracking James Zhu
  2023-06-06 16:24 ` [PATCH v2 2/3] drm/amdkfd: add event_age tracking when receiving interrupt James Zhu
@ 2023-06-06 16:24 ` James Zhu
  2023-06-07 17:27   ` Felix Kuehling
  2 siblings, 1 reply; 8+ messages in thread
From: James Zhu @ 2023-06-06 16:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.kuehling, jamesz

Don't sleep when event age unmatch, and update last_event_age.
It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index c7689181cc22..f4ceb5be78ed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -952,6 +952,21 @@ int kfd_wait_on_events(struct kfd_process *p,
 					event_data.event_id);
 		if (ret)
 			goto out_unlock;
+
+		/* last_event_age = 0 reserved for backward compatible */
+		if (event_data.signal_event_data.last_event_age &&
+			event_waiters[i].event->event_age !=
+				event_data.signal_event_data.last_event_age) {
+			event_data.signal_event_data.last_event_age =
+				event_waiters[i].event->event_age;
+			WRITE_ONCE(event_waiters[i].activated, true);
+
+			if (copy_to_user(&events[i], &event_data,
+				sizeof(struct kfd_event_data))) {
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+		}
 	}
 
 	/* Check condition once. */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/3] drm/amdkfd: add event age tracking
  2023-06-06 16:24 ` [PATCH v2 1/3] drm/amdkfd: add event age tracking James Zhu
@ 2023-06-07 17:17   ` Felix Kuehling
  2023-06-07 17:20     ` James Zhu
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2023-06-07 17:17 UTC (permalink / raw)
  To: James Zhu, amd-gfx; +Cc: jamesz

On 2023-06-06 12:24, James Zhu wrote:
> Add event age tracking
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>
> ---
>   include/uapi/linux/kfd_ioctl.h | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 1781e7669982..eeb2fdcbdcb7 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -39,9 +39,10 @@
>    * - 1.11 - Add unified memory for ctx save/restore area
>    * - 1.12 - Add DMA buf export ioctl
>    * - 1.13 - Add debugger API
> + * - 1.14 - Update kfd_event_data
>    */
>   #define KFD_IOCTL_MAJOR_VERSION 1
> -#define KFD_IOCTL_MINOR_VERSION 13
> +#define KFD_IOCTL_MINOR_VERSION 14

Bumping the version number should be done in the last patch in the 
series, once the feature is fully enabled.

Regards,
   Felix


>   
>   struct kfd_ioctl_get_version_args {
>   	__u32 major_version;	/* from KFD */
> @@ -320,12 +321,20 @@ struct kfd_hsa_hw_exception_data {
>   	__u32 gpu_id;
>   };
>   
> +/* hsa signal event data */
> +struct kfd_hsa_signal_event_data {
> +	__u64 last_event_age;	/* to and from KFD */
> +};
> +
>   /* Event data */
>   struct kfd_event_data {
>   	union {
> +		/* From KFD */
>   		struct kfd_hsa_memory_exception_data memory_exception_data;
>   		struct kfd_hsa_hw_exception_data hw_exception_data;
> -	};				/* From KFD */
> +		/* To and From KFD */
> +		struct kfd_hsa_signal_event_data signal_event_data;
> +	};
>   	__u64 kfd_event_data_ext;	/* pointer to an extension structure
>   					   for future exception types */
>   	__u32 event_id;		/* to KFD */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/3] drm/amdkfd: add event age tracking
  2023-06-07 17:17   ` Felix Kuehling
@ 2023-06-07 17:20     ` James Zhu
  0 siblings, 0 replies; 8+ messages in thread
From: James Zhu @ 2023-06-07 17:20 UTC (permalink / raw)
  To: Felix Kuehling, James Zhu, amd-gfx


On 2023-06-07 13:17, Felix Kuehling wrote:
> On 2023-06-06 12:24, James Zhu wrote:
>> Add event age tracking
>>
>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>> ---
>>   include/uapi/linux/kfd_ioctl.h | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/kfd_ioctl.h 
>> b/include/uapi/linux/kfd_ioctl.h
>> index 1781e7669982..eeb2fdcbdcb7 100644
>> --- a/include/uapi/linux/kfd_ioctl.h
>> +++ b/include/uapi/linux/kfd_ioctl.h
>> @@ -39,9 +39,10 @@
>>    * - 1.11 - Add unified memory for ctx save/restore area
>>    * - 1.12 - Add DMA buf export ioctl
>>    * - 1.13 - Add debugger API
>> + * - 1.14 - Update kfd_event_data
>>    */
>>   #define KFD_IOCTL_MAJOR_VERSION 1
>> -#define KFD_IOCTL_MINOR_VERSION 13
>> +#define KFD_IOCTL_MINOR_VERSION 14
>
> Bumping the version number should be done in the last patch in the 
> series, once the feature is fully enabled.
[JZ] Noted, Thanks!
>
> Regards,
>   Felix
>
>
>>     struct kfd_ioctl_get_version_args {
>>       __u32 major_version;    /* from KFD */
>> @@ -320,12 +321,20 @@ struct kfd_hsa_hw_exception_data {
>>       __u32 gpu_id;
>>   };
>>   +/* hsa signal event data */
>> +struct kfd_hsa_signal_event_data {
>> +    __u64 last_event_age;    /* to and from KFD */
>> +};
>> +
>>   /* Event data */
>>   struct kfd_event_data {
>>       union {
>> +        /* From KFD */
>>           struct kfd_hsa_memory_exception_data memory_exception_data;
>>           struct kfd_hsa_hw_exception_data hw_exception_data;
>> -    };                /* From KFD */
>> +        /* To and From KFD */
>> +        struct kfd_hsa_signal_event_data signal_event_data;
>> +    };
>>       __u64 kfd_event_data_ext;    /* pointer to an extension structure
>>                          for future exception types */
>>       __u32 event_id;        /* to KFD */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch
  2023-06-06 16:24 ` [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch James Zhu
@ 2023-06-07 17:27   ` Felix Kuehling
  2023-06-07 17:49     ` James Zhu
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2023-06-07 17:27 UTC (permalink / raw)
  To: James Zhu, amd-gfx; +Cc: jamesz

On 2023-06-06 12:24, James Zhu wrote:
> Don't sleep when event age unmatch, and update last_event_age.
> It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index c7689181cc22..f4ceb5be78ed 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -952,6 +952,21 @@ int kfd_wait_on_events(struct kfd_process *p,
>   					event_data.event_id);
>   		if (ret)
>   			goto out_unlock;
> +
> +		/* last_event_age = 0 reserved for backward compatible */
> +		if (event_data.signal_event_data.last_event_age &&
> +			event_waiters[i].event->event_age !=
> +				event_data.signal_event_data.last_event_age) {
> +			event_data.signal_event_data.last_event_age =
> +				event_waiters[i].event->event_age;

The event_age is updated in set_event under the event->spin_lock. You 
need to take that lock for this check here as well.

I think the easiest way to do this would be to move the check into 
init_event_waiter. That way you can initialize the waiter as activated 
if the event age is not up to date.


> +			WRITE_ONCE(event_waiters[i].activated, true);
> +
> +			if (copy_to_user(&events[i], &event_data,
> +				sizeof(struct kfd_event_data))) {
> +				ret = -EFAULT;
> +				goto out_unlock;
> +			}
> +		}

I think we also need to update the event age in event data after an 
event has signaled. You should probably move updating and copying of the 
event age to user mode into copy_signaled_event_data. That way it would 
handle all the cases.

Regards,
   Felix


>   	}
>   
>   	/* Check condition once. */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch
  2023-06-07 17:27   ` Felix Kuehling
@ 2023-06-07 17:49     ` James Zhu
  0 siblings, 0 replies; 8+ messages in thread
From: James Zhu @ 2023-06-07 17:49 UTC (permalink / raw)
  To: Felix Kuehling, James Zhu, amd-gfx


On 2023-06-07 13:27, Felix Kuehling wrote:
> On 2023-06-06 12:24, James Zhu wrote:
>> Don't sleep when event age unmatch, and update last_event_age.
>> It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>>
>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index c7689181cc22..f4ceb5be78ed 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -952,6 +952,21 @@ int kfd_wait_on_events(struct kfd_process *p,
>>                       event_data.event_id);
>>           if (ret)
>>               goto out_unlock;
>> +
>> +        /* last_event_age = 0 reserved for backward compatible */
>> +        if (event_data.signal_event_data.last_event_age &&
>> +            event_waiters[i].event->event_age !=
>> +                event_data.signal_event_data.last_event_age) {
>> +            event_data.signal_event_data.last_event_age =
>> +                event_waiters[i].event->event_age;
>
> The event_age is updated in set_event under the event->spin_lock. You 
> need to take that lock for this check here as well.
>
> I think the easiest way to do this would be to move the check into 
> init_event_waiter. That way you can initialize the waiter as activated 
> if the event age is not up to date.
[JZ] Sure
>
>
>> + WRITE_ONCE(event_waiters[i].activated, true);
>> +
>> +            if (copy_to_user(&events[i], &event_data,
>> +                sizeof(struct kfd_event_data))) {
>> +                ret = -EFAULT;
>> +                goto out_unlock;
>> +            }
>> +        }
>
> I think we also need to update the event age in event data after an 
> event has signaled. You should probably move updating and copying of 
> the event age to user mode into copy_signaled_event_data. That way it 
> would handle all the cases.
[JZ] Sure
>
> Regards,
>   Felix
>
>
>>       }
>>         /* Check condition once. */

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-07 17:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-06 16:24 [PATCH v2 0/3] drm/amdkfd: new event wait support James Zhu
2023-06-06 16:24 ` [PATCH v2 1/3] drm/amdkfd: add event age tracking James Zhu
2023-06-07 17:17   ` Felix Kuehling
2023-06-07 17:20     ` James Zhu
2023-06-06 16:24 ` [PATCH v2 2/3] drm/amdkfd: add event_age tracking when receiving interrupt James Zhu
2023-06-06 16:24 ` [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch James Zhu
2023-06-07 17:27   ` Felix Kuehling
2023-06-07 17:49     ` James Zhu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.