From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@kernel.org>
Cc: intel-xe@lists.freedesktop.org, Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] [PATCH 2/3] drm/xe: fix xe_device_mem_access_get() race
Date: Mon, 15 May 2023 11:22:22 +0200	[thread overview]
Message-ID: <0bbe5a45-e908-bbf0-a553-d9f09a9ab4f4@linux.intel.com>
In-Reply-To: <09216a54-1a54-9300-2bc4-57e330da37bd@intel.com>


On 5/15/23 11:04, Matthew Auld wrote:
> On 15/05/2023 09:44, Thomas Hellström wrote:
>> Hi, Matthew,
>>
>> On 5/12/23 17:56, Matthew Auld wrote:
>>> On 12/05/2023 15:32, Rodrigo Vivi wrote:
>>>> On Thu, May 11, 2023 at 12:43:36PM +0100, Matthew Auld wrote:
>>>>> On 05/05/2023 17:44, Rodrigo Vivi wrote:
>>>>>> On Fri, May 05, 2023 at 04:38:53PM +0100, Matthew Auld wrote:
>>>>>>> It looks like there is at least one race here, given that the
>>>>>>> pm_runtime_suspended() check looks to return false if we are in the
>>>>>>> process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED).
>>>>>>> We later also do xe_pm_runtime_get_if_active(), but since the device
>>>>>>> is suspending or has now suspended, this doesn't do anything either.
>>>>>>> Following from this we can potentially return from
>>>>>>> xe_device_mem_access_get() with the device suspended or about to be,
>>>>>>> leading to broken behaviour.
>>>>>>>
>>>>>>> Attempt to fix this by always grabbing the runtime ref when our
>>>>>>> internal ref transitions from 0 -> 1, and then wrapping the whole
>>>>>>> thing with a lock to ensure callers are serialized.
>>>>>>>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
>>>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/xe/xe_device.c       | 11 +++--------
>>>>>>>    drivers/gpu/drm/xe/xe_device_types.h |  5 ++++-
>>>>>>>    drivers/gpu/drm/xe/xe_pm.c           |  9 ++-------
>>>>>>>    drivers/gpu/drm/xe/xe_pm.h           |  2 +-
>>>>>>>    4 files changed, 10 insertions(+), 17 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>>>>>> index 01c497bcf9a5..0a18b41a0e1a 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>>>>>> @@ -406,17 +406,12 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>>>>>>>    void xe_device_mem_access_get(struct xe_device *xe)
>>>>>>>    {
>>>>>>> -    bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>>>>>>> -
>>>>>>>        mutex_lock(&xe->mem_access.lock);
>>>>>>> -    if (xe->mem_access.ref++ == 0)
>>>>>>> -        xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>>>>>>> +    if (xe->mem_access.ref == 0)
>>>>>>> +        xe->mem_access.hold_rpm = xe_pm_runtime_resume_and_get(xe);
>>>>>>> +    xe->mem_access.ref++;
>>>>>>
>>>>>> my memory is good enough to tell me that I have tried this before...
>>>>>> but not good enough to remember the issues that I ran into with this
>>>>>> approach :(
>>>>>
>>>>> Ok, it seems one big issue is around xe_pm_runtime_resume() et al.
>>>>> The lock fixes all the races, but xe_pm_runtime_resume() seems to call
>>>>> xe_device_mem_access_{get,put}() in loads of places AFAICT, and that
>>>>> is ofc going to deadlock if we introduce a lock, since we are inside
>>>>> the callback. But I think even without that lock it will still
>>>>> deadlock, since the runtime_pm code will see that we are PM_RESUMING
>>>>> and wait for itself. I'm guessing that explains why we had the
>>>>> conditional pm_runtime_suspended() and if_active(), since that
>>>>> prevents triggering the runtime_pm from our callbacks (we will either
>>>>> be PM_SUSPENDING or PM_RESUMING),
>>>>
>>>> Yes, that was the goal.
>>>>
>>>>> but then we are ofc left with all the nasty races.
>>>>
>>>> :(
>>>>
>>>>> Any ideas? It seems like the
>>>>> resume/suspend callbacks should fundamentally never be calling
>>>>> xe_device_mem_access_{get,put}()?
>>>>
>>>> We probably need something like the pseudo code in the end of
>>>> Documentation/power/runtime_pm.rst
>>>
>>> IIUC that looks like a template for where you have lots of incoming 
>>> IO requests, and you want to process them in an async manner. But to 
>>> process the request you need to also keep the device awake and 
>>> ensure it has resumed, so here it makes sense to use the RPM_ASYNC 
>>> runtime_get. And then in the ->resume() callback it then has a 
>>> natural place to at some later point process the outstanding IO 
>>> requests.
>>>
>>> But for xe_device_mem_access_get() I don't think it can be async, 
>>> since we need to ensure the ->resume() callback has already 
>>> completed before returning to the caller. From the pov of 
>>> xe_device_mem_access_get() it doesn't know if it's being called from 
>>> ->resume() or some other normal path and yet it seems like it 
>>> somehow needs to differentiate them. I feel like either the 
>>> ->resume() should never have been allowed to call it in the first 
>>> place, or there needs to be some token that always gets passed around.
>>>
>>> But maybe I'm misunderstanding something here, or at least I'm not 
>>> currently seeing the connection with the pseudo code? Can you share 
>>> more?
>>
>> Some fly-by ideas here:
>>
>> With the variant we briefly discussed before, where we only lock during
>> the 0 -> 1 and 1 -> 0 transitions using atomic_inc_not_zero() and
>> atomic_add_unless(), the get() problem goes away, but any recursive
>> get() and put() during put()->suspend will then cause similar problems?
>
> Yeah, I was thinking something like that, but AFAICT the runtime_pm 
> code will see that it is already in a PM_RESUMING state and wait for 
> itself to exit from that state, which also deadlocks.

Ah, ok, yes, so everything except the recursive calls from resume() needs
to wait for PM_RESUMING for things to work correctly?
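
Just to spell out the recursion you mean, roughly (pseudo call chain only,
going by your description, not actual code):

  xe_device_mem_access_get()               /* some random caller */
    -> mutex_lock(&xe->mem_access.lock)
    -> xe_pm_runtime_resume_and_get()      /* rpm core starts the resume */
         -> xe_pm_runtime_resume()         /* our resume callback */
              -> xe_device_mem_access_get()
                   -> mutex_lock()         /* deadlock: we already hold it */
                   -> xe_pm_runtime_resume_and_get()
                                           /* or, without the lock, rpm sees
                                              RPM_RESUMING and waits for the
                                              resume we are inside of */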

>
>>
>> While a bit hacky, that could probably be solved by having an
>>
>> xe_device::suspending_task set and cleared by the suspending task and 
>> then use
>>
>> if (xe->suspending_task == current)
>
> I didn't think of that tbh. Does seem hacky, but I think it should work,
> and is perhaps fine for now?
>
Yes, I guess it has to be, at least until we come up with something better.
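
Something along these lines is roughly what I had in mind, as an untested
sketch only, with a struct task_struct *suspending_task added to
struct xe_device (names and exact placement of course up for discussion):

void xe_device_mem_access_get(struct xe_device *xe)
{
	/*
	 * Recursive calls from our own runtime suspend/resume callbacks
	 * must skip the rpm handling below, otherwise we end up waiting
	 * on ourselves as discussed above.
	 */
	if (READ_ONCE(xe->suspending_task) == current)
		return;

	mutex_lock(&xe->mem_access.lock);
	if (xe->mem_access.ref == 0)
		xe->mem_access.hold_rpm = xe_pm_runtime_resume_and_get(xe);
	xe->mem_access.ref++;
	mutex_unlock(&xe->mem_access.lock);
}

int xe_pm_runtime_suspend(struct xe_device *xe)
{
	WRITE_ONCE(xe->suspending_task, current);
	/*
	 * ... the existing suspend sequence goes here; any recursion into
	 * the mem_access get()/put() above now skips the rpm handling ...
	 */
	WRITE_ONCE(xe->suspending_task, NULL);

	return 0; /* error handling of the real sequence elided in this sketch */
}

xe_device_mem_access_put() and presumably also the resume callback would
want the same treatment.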

/Thomas


>>
>> to skip pm calls during suspend. Assuming of course that suspend()
>> tasks for a single device are never run concurrently.
>
> Yes, AFAICT only one resume or suspend callback can be running for a 
> device, until we exit from PM_SUSPENDING or PM_RESUMING.
>
>>
>> /Thomas
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>        mutex_unlock(&xe->mem_access.lock);
>>>>>>> -    /* The usage counter increased if device was immediately resumed */
>>>>>>> -    if (resumed)
>>>>>>> -        xe_pm_runtime_put(xe);
>>>>>>> -
>>>>>>>        XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>>>>>>>    }
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>> index 59462933f67a..9e37189d5745 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>> @@ -256,7 +256,10 @@ struct xe_device {
>>>>>>>         * triggering additional actions when they occur.
>>>>>>>         */
>>>>>>>        struct {
>>>>>>> -        /** @lock: protect the ref count */
>>>>>>> +        /**
>>>>>>> +         * @lock: Serialize xe_device_mem_access users,
>>>>>>> +         * and protect the below internal state, like @ref.
>>>>>>> +         */
>>>>>>>            struct mutex lock;
>>>>>>>            /** @ref: ref count of memory accesses */
>>>>>>>            s32 ref;
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>>>>>> index b7b57f10ba25..b2ffa001e6f7 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>>>>>> @@ -210,14 +210,9 @@ int xe_pm_runtime_put(struct xe_device *xe)
>>>>>>>        return pm_runtime_put_autosuspend(xe->drm.dev);
>>>>>>>    }
>>>>>>> -/* Return true if resume operation happened and usage count was increased */
>>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
>>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
>>>>>>>    {
>>>>>>> -    /* In case we are suspended we need to immediately wake up */
>>>>>>> -    if (pm_runtime_suspended(xe->drm.dev))
>>>>>>> -        return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>>> -
>>>>>>> -    return false;
>>>>>>> +    return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>>
>>>>>> now with the similar name it feels strange that our return value is
>>>>>> not aligned with theirs. Although I prefer ours...
>>>>>>
>>>>>> Anyway, the code is right... if you are testing and it is working
>>>>>> well, let's move forward with this.
>>>>>>
>>>>>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>> (for the series)
>>>>>>
>>>>>> but let's get an Ack from Maarten since he was kept as author on
>>>>>> patch 3, and it is modified from his merged one.
>>>>>>
>>>>>>>    }
>>>>>>>    int xe_pm_runtime_get_if_active(struct xe_device *xe)
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>>>>>>> index 6a885585f653..1b4c15b5e71a 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.h
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>>>>>>> @@ -19,7 +19,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>>>>>>>    int xe_pm_runtime_resume(struct xe_device *xe);
>>>>>>>    int xe_pm_runtime_get(struct xe_device *xe);
>>>>>>>    int xe_pm_runtime_put(struct xe_device *xe);
>>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>>>>>>>    int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>>>>>>    #endif
>>>>>>> -- 
>>>>>>> 2.40.0
>>>>>>>


Thread overview: 15+ messages
2023-05-05 15:38 [Intel-xe] [PATCH 1/3] Revert "drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing" Matthew Auld
2023-05-05 15:38 ` [Intel-xe] [PATCH 2/3] drm/xe: fix xe_device_mem_access_get() race Matthew Auld
2023-05-05 16:44   ` Rodrigo Vivi
2023-05-11 11:43     ` Matthew Auld
2023-05-12 14:32       ` Rodrigo Vivi
2023-05-12 15:56         ` Matthew Auld
2023-05-15  8:44           ` Thomas Hellström
2023-05-15  9:04             ` Matthew Auld
2023-05-15  9:22               ` Thomas Hellström [this message]
2023-05-15 10:15                 ` Matthew Auld
2023-05-05 15:38 ` [Intel-xe] [PATCH 3/3] drm/xe: Use atomic for mem_access.ref Matthew Auld
2023-05-05 15:56 ` [Intel-xe] ✓ CI.Patch_applied: success for series starting with [1/3] Revert "drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing" Patchwork
2023-05-05 15:57 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-05-05 16:01 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-05-05 16:19 ` [Intel-xe] ○ CI.BAT: info " Patchwork
