* [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
@ 2023-02-28 10:17 Maarten Lankhorst
  2023-03-01 16:36 ` Lucas De Marchi
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Maarten Lankhorst @ 2023-02-28 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Maarten Lankhorst

xe_guc_ct_fast_path() is called from an irq context, and cannot lock
the mutex used by xe_device_mem_access_ongoing().

Fortunately it is easy to fix, and the atomic guarantees are good enough
to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.

As far as I can tell, the runtime ref in device access should be
killable, but I don't dare to do it yet.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
 drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
 drivers/gpu/drm/xe/xe_device_types.h |  4 +---
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 4eb6786b11f0..ab179b1e24c1 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		goto err;
 
-	mutex_init(&xe->mem_access.lock);
 	return xe;
 
 err_put:
@@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
 void xe_device_mem_access_get(struct xe_device *xe)
 {
 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
+	int ref = atomic_inc_return(&xe->mem_access.ref);
 
-	mutex_lock(&xe->mem_access.lock);
-	if (xe->mem_access.ref++ == 0)
+	if (ref == 1)
 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
-	mutex_unlock(&xe->mem_access.lock);
 
 	/* The usage counter increased if device was immediately resumed */
 	if (resumed)
 		xe_pm_runtime_put(xe);
 
-	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
+	XE_WARN_ON(ref == S32_MAX);
 }
 
 void xe_device_mem_access_put(struct xe_device *xe)
 {
-	mutex_lock(&xe->mem_access.lock);
-	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
+	bool hold = xe->mem_access.hold_rpm;
+	int ref = atomic_dec_return(&xe->mem_access.ref);
+
+	if (!ref && hold)
 		xe_pm_runtime_put(xe);
-	mutex_unlock(&xe->mem_access.lock);
 
-	XE_WARN_ON(xe->mem_access.ref < 0);
+	XE_WARN_ON(ref < 0);
 }
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 263620953c3b..96b4f3d7969e 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
 void xe_device_mem_access_get(struct xe_device *xe);
 void xe_device_mem_access_put(struct xe_device *xe);
 
-static inline void xe_device_assert_mem_access(struct xe_device *xe)
+static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
 {
-	XE_WARN_ON(!xe->mem_access.ref);
+	return atomic_read(&xe->mem_access.ref);
 }
 
-static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
+static inline void xe_device_assert_mem_access(struct xe_device *xe)
 {
-	bool ret;
-
-	mutex_lock(&xe->mem_access.lock);
-	ret = xe->mem_access.ref;
-	mutex_unlock(&xe->mem_access.lock);
-
-	return ret;
+	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
 }
 
 static inline bool xe_device_in_fault_mode(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9743987fc883..0b8c4ee0ad48 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -230,10 +230,8 @@ struct xe_device {
 	 * triggering additional actions when they occur.
 	 */
 	struct {
-		/** @lock: protect the ref count */
-		struct mutex lock;
 		/** @ref: ref count of memory accesses */
-		s32 ref;
+		atomic_t ref;
 		/** @hold_rpm: need to put rpm ref back at the end */
 		bool hold_rpm;
 	} mem_access;
-- 
2.34.1



* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-02-28 10:17 [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing Maarten Lankhorst
@ 2023-03-01 16:36 ` Lucas De Marchi
  2023-03-01 23:14   ` Rodrigo Vivi
  2023-03-17 23:40 ` Matthew Brost
  2023-03-21 14:39 ` Matthew Brost
  2 siblings, 1 reply; 10+ messages in thread
From: Lucas De Marchi @ 2023-03-01 16:36 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-xe

On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
>xe_guc_ct_fast_path() is called from an irq context, and cannot lock
>the mutex used by xe_device_mem_access_ongoing().
>
>Fortunately it is easy to fix, and the atomic guarantees are good enough
>to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
>
>As far as I can tell, the runtime ref in device access should be
>killable, but I don't dare to do it yet.

I don't follow this last paragraph. Could you point it out in the code?

>
>Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>---
> drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
> drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
> drivers/gpu/drm/xe/xe_device_types.h |  4 +---
> 3 files changed, 13 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index 4eb6786b11f0..ab179b1e24c1 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> 	if (err)
> 		goto err;
>
>-	mutex_init(&xe->mem_access.lock);
> 	return xe;
>
> err_put:
>@@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> void xe_device_mem_access_get(struct xe_device *xe)
> {
> 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>+	int ref = atomic_inc_return(&xe->mem_access.ref);


+Matt Brost

Any reason for not using kref?

Lucas De Marchi

>
>-	mutex_lock(&xe->mem_access.lock);
>-	if (xe->mem_access.ref++ == 0)
>+	if (ref == 1)
> 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>-	mutex_unlock(&xe->mem_access.lock);
>
> 	/* The usage counter increased if device was immediately resumed */
> 	if (resumed)
> 		xe_pm_runtime_put(xe);
>
>-	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>+	XE_WARN_ON(ref == S32_MAX);
> }
>
> void xe_device_mem_access_put(struct xe_device *xe)
> {
>-	mutex_lock(&xe->mem_access.lock);
>-	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
>+	bool hold = xe->mem_access.hold_rpm;
>+	int ref = atomic_dec_return(&xe->mem_access.ref);
>+
>+	if (!ref && hold)
> 		xe_pm_runtime_put(xe);
>-	mutex_unlock(&xe->mem_access.lock);
>
>-	XE_WARN_ON(xe->mem_access.ref < 0);
>+	XE_WARN_ON(ref < 0);
> }
>diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>index 263620953c3b..96b4f3d7969e 100644
>--- a/drivers/gpu/drm/xe/xe_device.h
>+++ b/drivers/gpu/drm/xe/xe_device.h
>@@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
> void xe_device_mem_access_get(struct xe_device *xe);
> void xe_device_mem_access_put(struct xe_device *xe);
>
>-static inline void xe_device_assert_mem_access(struct xe_device *xe)
>+static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> {
>-	XE_WARN_ON(!xe->mem_access.ref);
>+	return atomic_read(&xe->mem_access.ref);
> }
>
>-static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>+static inline void xe_device_assert_mem_access(struct xe_device *xe)
> {
>-	bool ret;
>-
>-	mutex_lock(&xe->mem_access.lock);
>-	ret = xe->mem_access.ref;
>-	mutex_unlock(&xe->mem_access.lock);
>-
>-	return ret;
>+	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> }
>
> static inline bool xe_device_in_fault_mode(struct xe_device *xe)
>diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>index 9743987fc883..0b8c4ee0ad48 100644
>--- a/drivers/gpu/drm/xe/xe_device_types.h
>+++ b/drivers/gpu/drm/xe/xe_device_types.h
>@@ -230,10 +230,8 @@ struct xe_device {
> 	 * triggering additional actions when they occur.
> 	 */
> 	struct {
>-		/** @lock: protect the ref count */
>-		struct mutex lock;
> 		/** @ref: ref count of memory accesses */
>-		s32 ref;
>+		atomic_t ref;
> 		/** @hold_rpm: need to put rpm ref back at the end */
> 		bool hold_rpm;
> 	} mem_access;
>-- 
>2.34.1
>


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-01 16:36 ` Lucas De Marchi
@ 2023-03-01 23:14   ` Rodrigo Vivi
  2023-03-02 11:26     ` Maarten Lankhorst
  0 siblings, 1 reply; 10+ messages in thread
From: Rodrigo Vivi @ 2023-03-01 23:14 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe

On Wed, Mar 01, 2023 at 08:36:29AM -0800, Lucas De Marchi wrote:
> On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
> > xe_guc_ct_fast_path() is called from an irq context, and cannot lock
> > the mutex used by xe_device_mem_access_ongoing().
> > 
> > Fortunately it is easy to fix, and the atomic guarantees are good enough
> > to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
> > 
> > As far as I can tell, the runtime ref in device access should be
> > killable, but I don't dare to do it yet.
> 
> I don't follow this last paragraph. Could you point it out in the code?

I also didn't understand this... if we remove that, we will end up
doing memory accesses with the device asleep...

> 
> > 
> > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
> > drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
> > drivers/gpu/drm/xe/xe_device_types.h |  4 +---
> > 3 files changed, 13 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 4eb6786b11f0..ab179b1e24c1 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> > 	if (err)
> > 		goto err;
> > 
> > -	mutex_init(&xe->mem_access.lock);
> > 	return xe;
> > 
> > err_put:
> > @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> > void xe_device_mem_access_get(struct xe_device *xe)
> > {
> > 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> > +	int ref = atomic_inc_return(&xe->mem_access.ref);
> 
> 
> +Matt Brost
> 
> Any reason for not using kref?

hmmm... my bad actually...

I did consider the kref, but I can't remember why I didn't use it.
I was recently asking myself the same question.

> 
> Lucas De Marchi
> 
> > 
> > -	mutex_lock(&xe->mem_access.lock);
> > -	if (xe->mem_access.ref++ == 0)
> > +	if (ref == 1)
> > 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);

hmmm... I'm afraid this can be tricky without locks...

if we have 3 simultaneous threads calling this:
get
get
put
get

and they happened in this order but the resume hasn't finished yet
on the first one, then you will:
1. end up taking the runtime pm twice.
2. the second will pass over thinking the gpu is already awake, but it
might still be asleep.
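
To make the window concrete, here is a much-simplified userspace replay
of one bad interleaving (plain C11; the rpm_* helpers and the dev_active
flag are invented stand-ins for the runtime-pm calls, and the
resume_if_suspended() step from the patch is left out, so this only
gestures at the race rather than modelling an in-flight resume):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int ref;		/* stand-in for xe->mem_access.ref     */
static bool hold_rpm;		/* the unprotected flag from the patch */
static int rpm_usage;		/* stand-in for the rpm usage counter  */
static bool dev_active = true;

static bool rpm_get_if_active(void)
{
	if (!dev_active)
		return false;		/* device already heading to sleep */
	rpm_usage++;
	return true;
}

static void rpm_put(void)
{
	if (--rpm_usage == 0)
		dev_active = false;	/* autosuspend kicks in */
}

int main(void)
{
	/* thread A: mem_access_get(), ref 0 -> 1, takes the rpm ref */
	if (atomic_fetch_add(&ref, 1) + 1 == 1)
		hold_rpm = rpm_get_if_active();

	/* thread A: mem_access_put(), ref 1 -> 0, drops the rpm ref */
	bool hold = hold_rpm;
	if (atomic_fetch_sub(&ref, 1) - 1 == 0 && hold)
		rpm_put();

	/* thread B: another 0 -> 1 crossing, but get_if_active() now
	 * fails -- B proceeds with no rpm ref on a suspending device */
	if (atomic_fetch_add(&ref, 1) + 1 == 1)
		hold_rpm = rpm_get_if_active();

	printf("ref=%d hold_rpm=%d dev_active=%d\n",
	       atomic_load(&ref), hold_rpm, dev_active);
	/* prints: ref=1 hold_rpm=0 dev_active=0 */
	return 0;
}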

> > -	mutex_unlock(&xe->mem_access.lock);
> > 
> > 	/* The usage counter increased if device was immediately resumed */
> > 	if (resumed)
> > 		xe_pm_runtime_put(xe);
> > 
> > -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
> > +	XE_WARN_ON(ref == S32_MAX);
> > }
> > 
> > void xe_device_mem_access_put(struct xe_device *xe)
> > {
> > -	mutex_lock(&xe->mem_access.lock);
> > -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
> > +	bool hold = xe->mem_access.hold_rpm;
> > +	int ref = atomic_dec_return(&xe->mem_access.ref);
> > +
> > +	if (!ref && hold)
> > 		xe_pm_runtime_put(xe);
> > -	mutex_unlock(&xe->mem_access.lock);
> > 
> > -	XE_WARN_ON(xe->mem_access.ref < 0);
> > +	XE_WARN_ON(ref < 0);
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index 263620953c3b..96b4f3d7969e 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
> > void xe_device_mem_access_get(struct xe_device *xe);
> > void xe_device_mem_access_put(struct xe_device *xe);
> > 
> > -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > {
> > -	XE_WARN_ON(!xe->mem_access.ref);
> > +	return atomic_read(&xe->mem_access.ref);
> > }
> > 
> > -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > +static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > {
> > -	bool ret;
> > -
> > -	mutex_lock(&xe->mem_access.lock);
> > -	ret = xe->mem_access.ref;
> > -	mutex_unlock(&xe->mem_access.lock);
> > -
> > -	return ret;
> > +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > }
> > 
> > static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 9743987fc883..0b8c4ee0ad48 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -230,10 +230,8 @@ struct xe_device {
> > 	 * triggering additional actions when they occur.
> > 	 */
> > 	struct {
> > -		/** @lock: protect the ref count */
> > -		struct mutex lock;
> > 		/** @ref: ref count of memory accesses */
> > -		s32 ref;
> > +		atomic_t ref;
> > 		/** @hold_rpm: need to put rpm ref back at the end */
> > 		bool hold_rpm;
> > 	} mem_access;
> > -- 
> > 2.34.1
> > 


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-01 23:14   ` Rodrigo Vivi
@ 2023-03-02 11:26     ` Maarten Lankhorst
  2023-03-02 22:33       ` Rodrigo Vivi
  0 siblings, 1 reply; 10+ messages in thread
From: Maarten Lankhorst @ 2023-03-02 11:26 UTC (permalink / raw)
  To: Rodrigo Vivi, Lucas De Marchi; +Cc: intel-xe


On 2023-03-02 00:14, Rodrigo Vivi wrote:
> On Wed, Mar 01, 2023 at 08:36:29AM -0800, Lucas De Marchi wrote:
>> On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
>>> xe_guc_ct_fast_path() is called from an irq context, and cannot lock
>>> the mutex used by xe_device_mem_access_ongoing().
>>>
>>> Fortunately it is easy to fix, and the atomic guarantees are good enough
>>> to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
>>>
>>> As far as I can tell, the runtime ref in device access should be
>>> killable, but I don't dare to do it yet.
>> I don't follow this last paragraph. Could you point it out in the code?
> I also didn't understand this... if we remove that, we will end up
> doing memory accesses with the device asleep...
I may be understanding the code wrong, but without error checking by the
callers and changing the prototype to return int, is there any way this
is guaranteed to work regardless?
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
>>> drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
>>> drivers/gpu/drm/xe/xe_device_types.h |  4 +---
>>> 3 files changed, 13 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index 4eb6786b11f0..ab179b1e24c1 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>>> 	if (err)
>>> 		goto err;
>>>
>>> -	mutex_init(&xe->mem_access.lock);
>>> 	return xe;
>>>
>>> err_put:
>>> @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>>> void xe_device_mem_access_get(struct xe_device *xe)
>>> {
>>> 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>>> +	int ref = atomic_inc_return(&xe->mem_access.ref);
>>
>> +Matt Brost
>>
>> Any reason for not using kref?
> hmmm... my bad actually...
>
> I did consider the kref, but I can't remember why I didn't use it.
> I was recently asking myself the same question.

I looked at it, I don't think you can kref from 0 to 1 by design.

xe_device_mem_access_get() is usually called with force wake held 
explicitly or implicitly, so we shouldn't need the runtime pm ref there.
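
For what it's worth, the 0 -> 1 restriction comes from the refcount_t
underneath kref: refcount_inc() treats an increment from 0 as a probable
use-after-free and warns rather than resurrecting the object. A rough
userspace mock of that zero check (not the kernel's actual refcount
code) shows why a counter that legally bounces through zero doesn't fit:

#include <stdatomic.h>
#include <stdio.h>

static void mock_refcount_inc(atomic_int *r)
{
	/* the kernel warns here instead of allowing resurrection */
	if (atomic_fetch_add(r, 1) == 0)
		fprintf(stderr, "WARN: refcount addition on 0; use-after-free?\n");
}

int main(void)
{
	atomic_int ref = 0;

	mock_refcount_inc(&ref);	/* the very first mem_access_get() trips this */
	return 0;
}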

>> Lucas De Marchi
>>
>>> -	mutex_lock(&xe->mem_access.lock);
>>> -	if (xe->mem_access.ref++ == 0)
>>> +	if (ref == 1)
>>> 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> hmmm... I'm afraid this can be tricky without locks...
>
> if we have 3 simultaneous threads calling this.
> get
> get
> put
> get
>
> and they happened in this order but the resume didn't finished yet
> on the first one, then you will:
> 1. end up the runtime pm twice.
> 2. the second will pass over thinking the gpu is already awake, but it might
> be still asleep.

>>> -	mutex_unlock(&xe->mem_access.lock);
>>>
>>> 	/* The usage counter increased if device was immediately resumed */
>>> 	if (resumed)
>>> 		xe_pm_runtime_put(xe);
>>>
>>> -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>>> +	XE_WARN_ON(ref == S32_MAX);
>>> }
>>>
>>> void xe_device_mem_access_put(struct xe_device *xe)
>>> {
>>> -	mutex_lock(&xe->mem_access.lock);
>>> -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
>>> +	bool hold = xe->mem_access.hold_rpm;
>>> +	int ref = atomic_dec_return(&xe->mem_access.ref);
>>> +
>>> +	if (!ref && hold)
>>> 		xe_pm_runtime_put(xe);
>>> -	mutex_unlock(&xe->mem_access.lock);
>>>
>>> -	XE_WARN_ON(xe->mem_access.ref < 0);
>>> +	XE_WARN_ON(ref < 0);
>>> }
>>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>>> index 263620953c3b..96b4f3d7969e 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.h
>>> +++ b/drivers/gpu/drm/xe/xe_device.h
>>> @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
>>> void xe_device_mem_access_get(struct xe_device *xe);
>>> void xe_device_mem_access_put(struct xe_device *xe);
>>>
>>> -static inline void xe_device_assert_mem_access(struct xe_device *xe)
>>> +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>>> {
>>> -	XE_WARN_ON(!xe->mem_access.ref);
>>> +	return atomic_read(&xe->mem_access.ref);
>>> }
>>>
>>> -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>>> +static inline void xe_device_assert_mem_access(struct xe_device *xe)
>>> {
>>> -	bool ret;
>>> -
>>> -	mutex_lock(&xe->mem_access.lock);
>>> -	ret = xe->mem_access.ref;
>>> -	mutex_unlock(&xe->mem_access.lock);
>>> -
>>> -	return ret;
>>> +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
>>> }
>>>
>>> static inline bool xe_device_in_fault_mode(struct xe_device *xe)
>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>>> index 9743987fc883..0b8c4ee0ad48 100644
>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>> @@ -230,10 +230,8 @@ struct xe_device {
>>> 	 * triggering additional actions when they occur.
>>> 	 */
>>> 	struct {
>>> -		/** @lock: protect the ref count */
>>> -		struct mutex lock;
>>> 		/** @ref: ref count of memory accesses */
>>> -		s32 ref;
>>> +		atomic_t ref;
>>> 		/** @hold_rpm: need to put rpm ref back at the end */
>>> 		bool hold_rpm;
>>> 	} mem_access;
>>> -- 
>>> 2.34.1
>>>


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-02 11:26     ` Maarten Lankhorst
@ 2023-03-02 22:33       ` Rodrigo Vivi
  2023-03-10 21:05         ` Lucas De Marchi
  0 siblings, 1 reply; 10+ messages in thread
From: Rodrigo Vivi @ 2023-03-02 22:33 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: Lucas De Marchi, intel-xe

On Thu, Mar 02, 2023 at 12:26:18PM +0100, Maarten Lankhorst wrote:
> 
> On 2023-03-02 00:14, Rodrigo Vivi wrote:
> > On Wed, Mar 01, 2023 at 08:36:29AM -0800, Lucas De Marchi wrote:
> > > On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
> > > > xe_guc_ct_fast_path() is called from an irq context, and cannot lock
> > > > the mutex used by xe_device_mem_access_ongoing().
> > > > 
> > > > Fortunately it is easy to fix, and the atomic guarantees are good enough
> > > > to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
> > > > 
> > > > As far as I can tell, the runtime ref in device access should be
> > > > killable, but I don't dare to do it yet.
> > > I don't follow this last paragraph. Could you point it out in the code?
> > I also didn't understand this... if we remove that, we will end up
> > doing memory accesses with the device asleep...
> I may be understanding the code wrong, but without error checking by the
> callers and changing the prototype to return int, is there any way this
> is guaranteed to work regardless?
> > > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
> > > > drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
> > > > drivers/gpu/drm/xe/xe_device_types.h |  4 +---
> > > > 3 files changed, 13 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > index 4eb6786b11f0..ab179b1e24c1 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> > > > 	if (err)
> > > > 		goto err;
> > > > 
> > > > -	mutex_init(&xe->mem_access.lock);
> > > > 	return xe;
> > > > 
> > > > err_put:
> > > > @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> > > > void xe_device_mem_access_get(struct xe_device *xe)
> > > > {
> > > > 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> > > > +	int ref = atomic_inc_return(&xe->mem_access.ref);
> > > 
> > > +Matt Brost
> > > 
> > > Any reason for not using kref?
> > hmmm... my bad actually...
> > 
> > I did consider the kref, but I can't remember why I didn't use it.
> > I was recently asking myself the same question.
> 
> I looked at it, I don't think you can kref from 0 to 1 by design.
> 
> xe_device_mem_access_get() is usually called with force wake held explicitly
> or implicitly, so we shouldn't need the runtime pm ref there.

But this is one of the main problems in i915 right now and why we can't
enable D3Cold. If there is any possibility of a memory transaction, we
need to first wake up the device and only then proceed with the memory
accesses. Otherwise the bridges might be sleeping and returning 0xffffffff.

So, the first one to take the very first mem_access ref will also ensure
that the device is awake before proceeding. This works now without any
error handling by the caller.

Another way to do this would probably involve a queue for every
memory access, or going the way i915 did, where there's a list...

I wonder if making this a spin_lock isn't enough to fix the current issue...

But well, Thomas had some bad feelings about the whole idea of the
memory access handling anyway and was in favor of the i915 approach...
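
A minimal sketch of that spin_lock variant (userspace pthread stand-ins,
not the driver's code): holding the lock across both the 0 -> 1 test and
the rpm acquisition makes the counter transition and the hold_rpm update
one atomic step, which closes both races above. The constraint is that
nothing inside the critical section may sleep, so only the non-blocking
get_if_active()-style call fits there:

#include <pthread.h>
#include <stdbool.h>

static pthread_spinlock_t lock;
static int ref;
static bool hold_rpm;

static bool rpm_get_if_active(void) { return true; }	/* trivial stub */
static void rpm_put(void) { }				/* trivial stub */

static void mem_access_get(void)
{
	pthread_spin_lock(&lock);
	if (ref++ == 0)
		hold_rpm = rpm_get_if_active();
	pthread_spin_unlock(&lock);
}

static void mem_access_put(void)
{
	pthread_spin_lock(&lock);
	if (--ref == 0 && hold_rpm)
		rpm_put();
	pthread_spin_unlock(&lock);
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	mem_access_get();
	mem_access_put();
	pthread_spin_destroy(&lock);
	return 0;
}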

> 
> > > Lucas De Marchi
> > > 
> > > > -	mutex_lock(&xe->mem_access.lock);
> > > > -	if (xe->mem_access.ref++ == 0)
> > > > +	if (ref == 1)
> > > > 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> > hmmm... I'm afraid this can be tricky without locks...
> > 
> > if we have 3 simultaneous threads calling this:
> > get
> > get
> > put
> > get
> > 
> > and they happened in this order but the resume hasn't finished yet
> > on the first one, then you will:
> > 1. end up taking the runtime pm twice.
> > 2. the second will pass over thinking the gpu is already awake, but it
> > might still be asleep.
> 
> 
> 
> 
> 
> 
> 
> > > > -	mutex_unlock(&xe->mem_access.lock);
> > > > 
> > > > 	/* The usage counter increased if device was immediately resumed */
> > > > 	if (resumed)
> > > > 		xe_pm_runtime_put(xe);
> > > > 
> > > > -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
> > > > +	XE_WARN_ON(ref == S32_MAX);
> > > > }
> > > > 
> > > > void xe_device_mem_access_put(struct xe_device *xe)
> > > > {
> > > > -	mutex_lock(&xe->mem_access.lock);
> > > > -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
> > > > +	bool hold = xe->mem_access.hold_rpm;
> > > > +	int ref = atomic_dec_return(&xe->mem_access.ref);
> > > > +
> > > > +	if (!ref && hold)
> > > > 		xe_pm_runtime_put(xe);
> > > > -	mutex_unlock(&xe->mem_access.lock);
> > > > 
> > > > -	XE_WARN_ON(xe->mem_access.ref < 0);
> > > > +	XE_WARN_ON(ref < 0);
> > > > }
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > > > index 263620953c3b..96b4f3d7969e 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > > @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
> > > > void xe_device_mem_access_get(struct xe_device *xe);
> > > > void xe_device_mem_access_put(struct xe_device *xe);
> > > > 
> > > > -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > > > +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > > > {
> > > > -	XE_WARN_ON(!xe->mem_access.ref);
> > > > +	return atomic_read(&xe->mem_access.ref);
> > > > }
> > > > 
> > > > -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > > > +static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > > > {
> > > > -	bool ret;
> > > > -
> > > > -	mutex_lock(&xe->mem_access.lock);
> > > > -	ret = xe->mem_access.ref;
> > > > -	mutex_unlock(&xe->mem_access.lock);
> > > > -
> > > > -	return ret;
> > > > +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > > > }
> > > > 
> > > > static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > > index 9743987fc883..0b8c4ee0ad48 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > @@ -230,10 +230,8 @@ struct xe_device {
> > > > 	 * triggering additional actions when they occur.
> > > > 	 */
> > > > 	struct {
> > > > -		/** @lock: protect the ref count */
> > > > -		struct mutex lock;
> > > > 		/** @ref: ref count of memory accesses */
> > > > -		s32 ref;
> > > > +		atomic_t ref;
> > > > 		/** @hold_rpm: need to put rpm ref back at the end */
> > > > 		bool hold_rpm;
> > > > 	} mem_access;
> > > > -- 
> > > > 2.34.1
> > > > 


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-02 22:33       ` Rodrigo Vivi
@ 2023-03-10 21:05         ` Lucas De Marchi
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas De Marchi @ 2023-03-10 21:05 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Maarten Lankhorst, intel-xe

On Thu, Mar 02, 2023 at 05:33:55PM -0500, Rodrigo Vivi wrote:
>On Thu, Mar 02, 2023 at 12:26:18PM +0100, Maarten Lankhorst wrote:
>>
>> On 2023-03-02 00:14, Rodrigo Vivi wrote:
>> > On Wed, Mar 01, 2023 at 08:36:29AM -0800, Lucas De Marchi wrote:
>> > > On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
>> > > > xe_guc_ct_fast_path() is called from an irq context, and cannot lock
>> > > > the mutex used by xe_device_mem_access_ongoing().
>> > > >
>> > > > Fortunately it is easy to fix, and the atomic guarantees are good enough
>> > > > to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
>> > > >
>> > > > As far as I can tell, the runtime ref in device access should be
>> > > > killable, but I don't dare to do it yet.
>> > > I don't follow this last paragraph. Could you point it out in the code?
>> > I also didn't understand this... if we remove that, we will end up
>> > doing memory accesses with the device asleep...
>> I may be understanding the code wrong, but without error checking by the
>> callers and changing the prototype to return int, is there any way this
>> is guaranteed to work regardless?
>> > > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> > > > ---
>> > > > drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
>> > > > drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
>> > > > drivers/gpu/drm/xe/xe_device_types.h |  4 +---
>> > > > 3 files changed, 13 insertions(+), 22 deletions(-)
>> > > >
>> > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> > > > index 4eb6786b11f0..ab179b1e24c1 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_device.c
>> > > > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > > > @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>> > > > 	if (err)
>> > > > 		goto err;
>> > > >
>> > > > -	mutex_init(&xe->mem_access.lock);
>> > > > 	return xe;
>> > > >
>> > > > err_put:
>> > > > @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>> > > > void xe_device_mem_access_get(struct xe_device *xe)
>> > > > {
>> > > > 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>> > > > +	int ref = atomic_inc_return(&xe->mem_access.ref);
>> > >
>> > > +Matt Brost
>> > >
>> > > Any reason for not using kref?
>> > hmmm... my bad actually...
>> >
>> > I did consider the kref, but I can't remember why I didn't use it.
>> > I was recently asking myself the same question.
>>
>> I looked at it, I don't think you can kref from 0 to 1 by design.
>>
>> xe_device_mem_access_get() is usually called with force wake held explicitly
>> or implicitly, so we shouldn't need the runtime pm ref there.
>
>But this is one of the main problems in i915 right now and why we can't
>enable D3Cold. If there is any possibility of a memory transaction, we
>need to first wake up the device and only then proceed with the memory
>accesses. Otherwise the bridges might be sleeping and returning 0xffffffff.
>
>So, the first one to take the very first mem_access ref will also ensure
>that the device is awake before proceeding. This works now without any
>error handling by the caller.
>
>Another way to do this would probably involve a queue for every
>memory access, or going the way i915 did, where there's a list...
>
>I wonder if making this a spin_lock isn't enough to fix the current issue...

From what I can see it would be ok. However, how is this reproduced?
The only way I can see a lockdep complaint is on a system with a PVC and
another card using the ast module. In that system we have the following
splat:

1678477803.564451 ======================================================
1678477803.564641 WARNING: possible circular locking dependency detected
1678477803.564797 6.1.0-xe-demarchi+ #34 Not tainted
1678477803.564986 ------------------------------------------------------
1678477803.565220 kworker/28:1/316 is trying to acquire lock:
1678477803.565429 ffff8881ccf8ea48 (lock#7){+.+.}-{3:3}, at: xe_device_mem_access_get+0x2f/0xc0 [xe]
1678477803.565585 
                   but task is already holding lock:
1678477803.565762 ffff8881ccf83ed8 (&ct->lock){+.+.}-{3:3}, at: xe_guc_ct_send+0x2c/0x80 [xe]
1678477803.565921 
                   which lock already depends on the new lock.
1678477803.566149 
                   the existing dependency chain (in reverse order) is:
1678477803.566347 
                   -> #2 (&ct->lock){+.+.}-{3:3}:
1678477803.566505        lock_acquire+0x169/0x410
1678477804.447764        xe_guc_ct_init+0x19b/0x260 [xe]
1678477804.448706        xe_guc_init+0x88/0x570 [xe]
1678477804.449251        xe_uc_init+0x4c/0xc0 [xe]
1678477804.450514        xe_gt_init+0x1be/0x490 [xe]
1678477804.451031        xe_device_probe+0x2d8/0x350 [xe]
1678477804.451724        xe_pci_probe+0x770/0x8d0 [xe]
1678477804.452298        pci_device_probe+0xf9/0x200
1678477804.452974        really_probe+0x13b/0x530
1678477804.453751        __driver_probe_device+0xca/0x210
1678477804.454309        driver_probe_device+0x4a/0xf0
1678477804.455055        __driver_attach+0x135/0x290
1678477804.455744        bus_for_each_dev+0xf9/0x160
1678477804.456304        bus_add_driver+0x29c/0x2f0
1678477804.459887        driver_register+0x10d/0x1a0
1678477804.460599        do_one_initcall+0xce/0x3f0
1678477804.461137        do_init_module+0xe4/0x320
1678477804.465037        load_module+0x331e/0x3560
1678477804.465838        __do_sys_finit_module+0x10d/0x1b0
1678477804.466735        do_syscall_64+0x37/0x90
1678477804.467223        entry_SYSCALL_64_after_hwframe+0x63/0xcd
1678477804.468172 
                   -> #1 (fs_reclaim){+.+.}-{0:0}:
1678477804.468719        lock_acquire+0x169/0x410
1678477804.469372        fs_reclaim_acquire+0xca/0x100
1678477804.470024        __kmem_cache_alloc_node+0x4a/0x3b0
1678477804.470682        kmalloc_trace+0x26/0x60
1678477804.471246        _drm_do_get_edid+0xd3/0x400 [drm]
1678477804.471953        drm_get_edid+0x8d/0x100 [drm]
1678477804.472642        ast_vga_connector_helper_get_modes+0x55/0xb0 [ast]
1678477804.473231        drm_helper_probe_single_connector_modes+0x34b/0x850 [drm_kms_helper]
1678477804.473854        drm_client_modeset_probe+0x31a/0x1390 [drm]
1678477804.474494        __drm_fb_helper_initial_config_and_unlock+0xce/0x980 [drm_kms_helper]
1678477804.474998        drm_fbdev_client_hotplug+0x142/0x1a0 [drm_kms_helper]
1678477804.475577        drm_fbdev_generic_setup+0xbe/0x1e0 [drm_kms_helper]
1678477804.475987        ast_pci_probe+0xe9/0x100 [ast]
1678477804.476588        pci_device_probe+0xf9/0x200
1678477804.477076        really_probe+0x13b/0x530
1678477804.477485        __driver_probe_device+0xca/0x210
1678477804.545950        driver_probe_device+0x4a/0xf0
1678477804.546166        __driver_attach+0x135/0x290
1678477804.546320        bus_for_each_dev+0xf9/0x160
1678477804.546570        bus_add_driver+0x29c/0x2f0
1678477804.546717        driver_register+0x10d/0x1a0
1678477804.546868        do_one_initcall+0xce/0x3f0
1678477804.547007        do_init_module+0xe4/0x320
1678477804.547155        load_module+0x331e/0x3560
1678477804.547305        __do_sys_finit_module+0x10d/0x1b0
1678477804.547490        do_syscall_64+0x37/0x90
1678477804.547636        entry_SYSCALL_64_after_hwframe+0x63/0xcd
1678477804.547777 
                   -> #0 (lock#7){+.+.}-{3:3}:
1678477804.547932        check_prev_add+0xfc/0x1190
1678477804.548114        __lock_acquire+0x1e0a/0x2710
1678477804.548296        lock_acquire+0x169/0x410
1678477804.548484        __mutex_lock+0x164/0xe10
1678477804.548633        xe_device_mem_access_get+0x2f/0xc0 [xe]
1678477804.548787        guc_ct_send_locked+0x224/0xbb0 [xe]
1678477804.548935        xe_guc_ct_send+0x43/0x80 [xe]
1678477804.549112        __register_engine+0x13b/0x180 [xe]
1678477804.549267        guc_engine_run_job+0xf1c/0x1080 [xe]
1678477804.549471        drm_sched_main+0x3fc/0x9e0 [gpu_sched]
1678477804.549619        process_one_work+0x54a/0x9a0
1678477804.549830        worker_thread+0x8f/0x630
1678477804.549984        kthread+0x17b/0x1b0
1678477804.550135        ret_from_fork+0x1f/0x30


Note that #1 is actually coming from the probe of the ast driver.
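
For readers less used to lockdep, the report has the generic three-lock
shape sketched below (illustrative only, not xe code). Each path is fine
on its own, but together the recorded orderings form the cycle
A -> B -> C -> A, so lockdep complains as soon as it has seen all three
edges; roughly mapping to the chain above, the three classes are
&ct->lock, the unnamed lock#7 guarding mem_access, and fs_reclaim, with
one edge contributed by the ast probe:

#include <pthread.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t C = PTHREAD_MUTEX_INITIALIZER;

static void path1(void)		/* records ordering A -> B */
{
	pthread_mutex_lock(&A); pthread_mutex_lock(&B);
	pthread_mutex_unlock(&B); pthread_mutex_unlock(&A);
}

static void path2(void)		/* records ordering B -> C */
{
	pthread_mutex_lock(&B); pthread_mutex_lock(&C);
	pthread_mutex_unlock(&C); pthread_mutex_unlock(&B);
}

static void path3(void)		/* records C -> A: closes the cycle */
{
	pthread_mutex_lock(&C); pthread_mutex_lock(&A);
	pthread_mutex_unlock(&A); pthread_mutex_unlock(&C);
}

int main(void)
{
	path1();
	path2();
	path3();	/* in-kernel, lockdep would splat here */
	return 0;
}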

Lucas De Marchi

>
>But well, Thomas had some bad feelings about the whole idea of the
>memory access handling anyway and was in favor of the i915 approach...
>
>>
>> > > Lucas De Marchi
>> > >
>> > > > -	mutex_lock(&xe->mem_access.lock);
>> > > > -	if (xe->mem_access.ref++ == 0)
>> > > > +	if (ref == 1)
>> > > > 		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>> > hmmm... I'm afraid this can be tricky without locks...
>> >
>> > if we have 3 simultaneous threads calling this:
>> > get
>> > get
>> > put
>> > get
>> >
>> > and they happened in this order but the resume hasn't finished yet
>> > on the first one, then you will:
>> > 1. end up taking the runtime pm twice.
>> > 2. the second will pass over thinking the gpu is already awake, but it
>> > might still be asleep.
>>
>> > > > -	mutex_unlock(&xe->mem_access.lock);
>> > > >
>> > > > 	/* The usage counter increased if device was immediately resumed */
>> > > > 	if (resumed)
>> > > > 		xe_pm_runtime_put(xe);
>> > > >
>> > > > -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>> > > > +	XE_WARN_ON(ref == S32_MAX);
>> > > > }
>> > > >
>> > > > void xe_device_mem_access_put(struct xe_device *xe)
>> > > > {
>> > > > -	mutex_lock(&xe->mem_access.lock);
>> > > > -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
>> > > > +	bool hold = xe->mem_access.hold_rpm;
>> > > > +	int ref = atomic_dec_return(&xe->mem_access.ref);
>> > > > +
>> > > > +	if (!ref && hold)
>> > > > 		xe_pm_runtime_put(xe);
>> > > > -	mutex_unlock(&xe->mem_access.lock);
>> > > >
>> > > > -	XE_WARN_ON(xe->mem_access.ref < 0);
>> > > > +	XE_WARN_ON(ref < 0);
>> > > > }
>> > > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> > > > index 263620953c3b..96b4f3d7969e 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_device.h
>> > > > +++ b/drivers/gpu/drm/xe/xe_device.h
>> > > > @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
>> > > > void xe_device_mem_access_get(struct xe_device *xe);
>> > > > void xe_device_mem_access_put(struct xe_device *xe);
>> > > >
>> > > > -static inline void xe_device_assert_mem_access(struct xe_device *xe)
>> > > > +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>> > > > {
>> > > > -	XE_WARN_ON(!xe->mem_access.ref);
>> > > > +	return atomic_read(&xe->mem_access.ref);
>> > > > }
>> > > >
>> > > > -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>> > > > +static inline void xe_device_assert_mem_access(struct xe_device *xe)
>> > > > {
>> > > > -	bool ret;
>> > > > -
>> > > > -	mutex_lock(&xe->mem_access.lock);
>> > > > -	ret = xe->mem_access.ref;
>> > > > -	mutex_unlock(&xe->mem_access.lock);
>> > > > -
>> > > > -	return ret;
>> > > > +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
>> > > > }
>> > > >
>> > > > static inline bool xe_device_in_fault_mode(struct xe_device *xe)
>> > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> > > > index 9743987fc883..0b8c4ee0ad48 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
>> > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> > > > @@ -230,10 +230,8 @@ struct xe_device {
>> > > > 	 * triggering additional actions when they occur.
>> > > > 	 */
>> > > > 	struct {
>> > > > -		/** @lock: protect the ref count */
>> > > > -		struct mutex lock;
>> > > > 		/** @ref: ref count of memory accesses */
>> > > > -		s32 ref;
>> > > > +		atomic_t ref;
>> > > > 		/** @hold_rpm: need to put rpm ref back at the end */
>> > > > 		bool hold_rpm;
>> > > > 	} mem_access;
>> > > > --
>> > > > 2.34.1
>> > > >


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-02-28 10:17 [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing Maarten Lankhorst
  2023-03-01 16:36 ` Lucas De Marchi
@ 2023-03-17 23:40 ` Matthew Brost
  2023-03-21 14:39 ` Matthew Brost
  2 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2023-03-17 23:40 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-xe

On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
> xe_guc_ct_fast_path() is called from an irq context, and cannot lock
> the mutex used by xe_device_mem_access_ongoing().
> 
> Fortunately it is easy to fix, and the atomic guarantees are good enough
> to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
> 
> As far as I can tell, the runtime ref in device access should be
> killable, but I don't dare to do it yet.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

I think this is a better fix:
https://patchwork.freedesktop.org/series/115301/

Matt

> ---
>  drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
>  drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
>  drivers/gpu/drm/xe/xe_device_types.h |  4 +---
>  3 files changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 4eb6786b11f0..ab179b1e24c1 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>  	if (err)
>  		goto err;
>  
> -	mutex_init(&xe->mem_access.lock);
>  	return xe;
>  
>  err_put:
> @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  void xe_device_mem_access_get(struct xe_device *xe)
>  {
>  	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> +	int ref = atomic_inc_return(&xe->mem_access.ref);
>  
> -	mutex_lock(&xe->mem_access.lock);
> -	if (xe->mem_access.ref++ == 0)
> +	if (ref == 1)
>  		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> -	mutex_unlock(&xe->mem_access.lock);
>  
>  	/* The usage counter increased if device was immediately resumed */
>  	if (resumed)
>  		xe_pm_runtime_put(xe);
>  
> -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
> +	XE_WARN_ON(ref == S32_MAX);
>  }
>  
>  void xe_device_mem_access_put(struct xe_device *xe)
>  {
> -	mutex_lock(&xe->mem_access.lock);
> -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
> +	bool hold = xe->mem_access.hold_rpm;
> +	int ref = atomic_dec_return(&xe->mem_access.ref);
> +
> +	if (!ref && hold)
>  		xe_pm_runtime_put(xe);
> -	mutex_unlock(&xe->mem_access.lock);
>  
> -	XE_WARN_ON(xe->mem_access.ref < 0);
> +	XE_WARN_ON(ref < 0);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 263620953c3b..96b4f3d7969e 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
>  void xe_device_mem_access_get(struct xe_device *xe);
>  void xe_device_mem_access_put(struct xe_device *xe);
>  
> -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>  {
> -	XE_WARN_ON(!xe->mem_access.ref);
> +	return atomic_read(&xe->mem_access.ref);
>  }
>  
> -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> +static inline void xe_device_assert_mem_access(struct xe_device *xe)
>  {
> -	bool ret;
> -
> -	mutex_lock(&xe->mem_access.lock);
> -	ret = xe->mem_access.ref;
> -	mutex_unlock(&xe->mem_access.lock);
> -
> -	return ret;
> +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
>  }
>  
>  static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 9743987fc883..0b8c4ee0ad48 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -230,10 +230,8 @@ struct xe_device {
>  	 * triggering additional actions when they occur.
>  	 */
>  	struct {
> -		/** @lock: protect the ref count */
> -		struct mutex lock;
>  		/** @ref: ref count of memory accesses */
> -		s32 ref;
> +		atomic_t ref;
>  		/** @hold_rpm: need to put rpm ref back at the end */
>  		bool hold_rpm;
>  	} mem_access;
> -- 
> 2.34.1
> 


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-02-28 10:17 [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing Maarten Lankhorst
  2023-03-01 16:36 ` Lucas De Marchi
  2023-03-17 23:40 ` Matthew Brost
@ 2023-03-21 14:39 ` Matthew Brost
  2023-03-21 21:16   ` Rodrigo Vivi
  2 siblings, 1 reply; 10+ messages in thread
From: Matthew Brost @ 2023-03-21 14:39 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-xe

On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
> xe_guc_ct_fast_path() is called from an irq context, and cannot lock
> the mutex used by xe_device_mem_access_ongoing().
> 
> Fortunately it is easy to fix, and the atomic guarantees are good enough
> to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
> 
> As far as I can tell, the runtime ref in device access should be
> killable, but I don't dare to do it yet.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
>  drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
>  drivers/gpu/drm/xe/xe_device_types.h |  4 +---
>  3 files changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 4eb6786b11f0..ab179b1e24c1 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>  	if (err)
>  		goto err;
>  
> -	mutex_init(&xe->mem_access.lock);
>  	return xe;
>  
>  err_put:
> @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  void xe_device_mem_access_get(struct xe_device *xe)
>  {
>  	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> +	int ref = atomic_inc_return(&xe->mem_access.ref);
>  
> -	mutex_lock(&xe->mem_access.lock);
> -	if (xe->mem_access.ref++ == 0)
> +	if (ref == 1)
>  		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> -	mutex_unlock(&xe->mem_access.lock);
>  
>  	/* The usage counter increased if device was immediately resumed */
>  	if (resumed)
>  		xe_pm_runtime_put(xe);
>  
> -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
> +	XE_WARN_ON(ref == S32_MAX);
>  }
>  
>  void xe_device_mem_access_put(struct xe_device *xe)
>  {
> -	mutex_lock(&xe->mem_access.lock);
> -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
> +	bool hold = xe->mem_access.hold_rpm;
> +	int ref = atomic_dec_return(&xe->mem_access.ref);
> +
> +	if (!ref && hold)
>  		xe_pm_runtime_put(xe);
> -	mutex_unlock(&xe->mem_access.lock);
>  
> -	XE_WARN_ON(xe->mem_access.ref < 0);
> +	XE_WARN_ON(ref < 0);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 263620953c3b..96b4f3d7969e 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
>  void xe_device_mem_access_get(struct xe_device *xe);
>  void xe_device_mem_access_put(struct xe_device *xe);
>  
> -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>  {
> -	XE_WARN_ON(!xe->mem_access.ref);
> +	return atomic_read(&xe->mem_access.ref);
>  }
>  
> -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> +static inline void xe_device_assert_mem_access(struct xe_device *xe)
>  {
> -	bool ret;
> -
> -	mutex_lock(&xe->mem_access.lock);
> -	ret = xe->mem_access.ref;
> -	mutex_unlock(&xe->mem_access.lock);
> -
> -	return ret;
> +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
>  }
>  
>  static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 9743987fc883..0b8c4ee0ad48 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -230,10 +230,8 @@ struct xe_device {
>  	 * triggering additional actions when they occur.
>  	 */
>  	struct {
> -		/** @lock: protect the ref count */
> -		struct mutex lock;
>  		/** @ref: ref count of memory accesses */
> -		s32 ref;
> +		atomic_t ref;
>  		/** @hold_rpm: need to put rpm ref back at the end */
>  		bool hold_rpm;
>  	} mem_access;
> -- 
> 2.34.1
> 


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-21 14:39 ` Matthew Brost
@ 2023-03-21 21:16   ` Rodrigo Vivi
  2023-03-22 12:44     ` Maarten Lankhorst
  0 siblings, 1 reply; 10+ messages in thread
From: Rodrigo Vivi @ 2023-03-21 21:16 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Tue, Mar 21, 2023 at 02:39:04PM +0000, Matthew Brost wrote:
> On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
> > xe_guc_ct_fast_path() is called from an irq context, and cannot lock
> > the mutex used by xe_device_mem_access_ongoing().
> > 
> > Fortunately it is easy to fix, and the atomic guarantees are good enough
> > to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
> > 
> > As far as I can tell, the runtime ref in device access should be
> > killable, but I don't dare to do it yet.
> > 
> > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>


Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
so we can unblock PVC.

but we might re-think the mem_access in general and consider
some serialization like the 'foo_' example in Documentation/power/runtime_pm.txt
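
For context, here is a paraphrased sketch of that 'foo_' pattern
(invented names, from memory, not the document's verbatim code): a
driver-private spinlock serializes a pending-request count with an
is_suspended flag, the asynchronous pm_runtime_get() is safe in atomic
context, and requests arriving while the device is suspended are queued
and drained by the runtime-resume callback:

#include <linux/device.h>
#include <linux/list.h>
#include <linux/pm_runtime.h>
#include <linux/spinlock.h>

struct foo_req {
	struct list_head node;
	/* ... request payload ... */
};

struct foo_priv {
	spinlock_t lock;
	int num_pending;
	bool is_suspended;
	struct list_head queue;	/* drained by the runtime_resume callback */
	struct device *dev;
};

static void foo_process(struct foo_priv *foo, struct foo_req *req)
{
	/* touch the hardware; the device is guaranteed awake here */
}

static void foo_submit(struct foo_priv *foo, struct foo_req *req)
{
	unsigned long flags;

	spin_lock_irqsave(&foo->lock, flags);
	if (foo->num_pending++ == 0)
		pm_runtime_get(foo->dev);	/* async; may not have resumed yet */
	if (foo->is_suspended)
		list_add_tail(&req->node, &foo->queue);
	else
		foo_process(foo, req);
	spin_unlock_irqrestore(&foo->lock, flags);
}

The complementary put side would decrement num_pending and call
pm_runtime_put() on the last drop, all under the same lock.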

> 
> > ---
> >  drivers/gpu/drm/xe/xe_device.c       | 17 ++++++++---------
> >  drivers/gpu/drm/xe/xe_device.h       | 14 ++++----------
> >  drivers/gpu/drm/xe/xe_device_types.h |  4 +---
> >  3 files changed, 13 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 4eb6786b11f0..ab179b1e24c1 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -237,7 +237,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> >  	if (err)
> >  		goto err;
> >  
> > -	mutex_init(&xe->mem_access.lock);
> >  	return xe;
> >  
> >  err_put:
> > @@ -424,25 +423,25 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> >  void xe_device_mem_access_get(struct xe_device *xe)
> >  {
> >  	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> > +	int ref = atomic_inc_return(&xe->mem_access.ref);
> >  
> > -	mutex_lock(&xe->mem_access.lock);
> > -	if (xe->mem_access.ref++ == 0)
> > +	if (ref == 1)
> >  		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> > -	mutex_unlock(&xe->mem_access.lock);
> >  
> >  	/* The usage counter increased if device was immediately resumed */
> >  	if (resumed)
> >  		xe_pm_runtime_put(xe);
> >  
> > -	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
> > +	XE_WARN_ON(ref == S32_MAX);
> >  }
> >  
> >  void xe_device_mem_access_put(struct xe_device *xe)
> >  {
> > -	mutex_lock(&xe->mem_access.lock);
> > -	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
> > +	bool hold = xe->mem_access.hold_rpm;
> > +	int ref = atomic_dec_return(&xe->mem_access.ref);
> > +
> > +	if (!ref && hold)
> >  		xe_pm_runtime_put(xe);
> > -	mutex_unlock(&xe->mem_access.lock);
> >  
> > -	XE_WARN_ON(xe->mem_access.ref < 0);
> > +	XE_WARN_ON(ref < 0);
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index 263620953c3b..96b4f3d7969e 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -90,20 +90,14 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
> >  void xe_device_mem_access_get(struct xe_device *xe);
> >  void xe_device_mem_access_put(struct xe_device *xe);
> >  
> > -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > +static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> >  {
> > -	XE_WARN_ON(!xe->mem_access.ref);
> > +	return atomic_read(&xe->mem_access.ref);
> >  }
> >  
> > -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > +static inline void xe_device_assert_mem_access(struct xe_device *xe)
> >  {
> > -	bool ret;
> > -
> > -	mutex_lock(&xe->mem_access.lock);
> > -	ret = xe->mem_access.ref;
> > -	mutex_unlock(&xe->mem_access.lock);
> > -
> > -	return ret;
> > +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> >  }
> >  
> >  static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 9743987fc883..0b8c4ee0ad48 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -230,10 +230,8 @@ struct xe_device {
> >  	 * triggering additional actions when they occur.
> >  	 */
> >  	struct {
> > -		/** @lock: protect the ref count */
> > -		struct mutex lock;
> >  		/** @ref: ref count of memory accesses */
> > -		s32 ref;
> > +		atomic_t ref;
> >  		/** @hold_rpm: need to put rpm ref back at the end */
> >  		bool hold_rpm;
> >  	} mem_access;
> > -- 
> > 2.34.1
> > 


* Re: [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing
  2023-03-21 21:16   ` Rodrigo Vivi
@ 2023-03-22 12:44     ` Maarten Lankhorst
  0 siblings, 0 replies; 10+ messages in thread
From: Maarten Lankhorst @ 2023-03-22 12:44 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost; +Cc: intel-xe


On 2023-03-21 22:16, Rodrigo Vivi wrote:
> On Tue, Mar 21, 2023 at 02:39:04PM +0000, Matthew Brost wrote:
>> On Tue, Feb 28, 2023 at 11:17:30AM +0100, Maarten Lankhorst wrote:
>>> xe_guc_ct_fast_path() is called from an irq context, and cannot lock
>>> the mutex used by xe_device_mem_access_ongoing().
>>>
>>> Fortunately it is easy to fix, and the atomic guarantees are good enough
>>> to ensure xe->mem_access.hold_rpm is set before the last ref is dropped.
>>>
>>> As far as I can tell, the runtime ref in device access should be
>>> killable, but I don't dare to do it yet.
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>
> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> so we can unblock PVC.
>
> but we might re-think the mem_access in general and consider
> some serialization like the 'foo_' example in Documentation/power/runtime_pm.txt

Thanks, pushed.

We will probably need to do something like that in the longer term.

~Maarten



Thread overview: 10+ messages
2023-02-28 10:17 [Intel-xe] [PATCH] drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing Maarten Lankhorst
2023-03-01 16:36 ` Lucas De Marchi
2023-03-01 23:14   ` Rodrigo Vivi
2023-03-02 11:26     ` Maarten Lankhorst
2023-03-02 22:33       ` Rodrigo Vivi
2023-03-10 21:05         ` Lucas De Marchi
2023-03-17 23:40 ` Matthew Brost
2023-03-21 14:39 ` Matthew Brost
2023-03-21 21:16   ` Rodrigo Vivi
2023-03-22 12:44     ` Maarten Lankhorst
