From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <09216a54-1a54-9300-2bc4-57e330da37bd@intel.com>
Date: Mon, 15 May 2023 10:04:10 +0100
From: Matthew Auld
To: Thomas Hellström, Rodrigo Vivi
Cc: intel-xe@lists.freedesktop.org, Rodrigo Vivi
Subject: Re: [Intel-xe] [PATCH 2/3] drm/xe: fix xe_device_mem_access_get() race
In-Reply-To: <6b7301da-c664-fe89-f4ed-64a390a37d9e@linux.intel.com>
References: <20230505153854.343944-1-matthew.auld@intel.com>
 <20230505153854.343944-2-matthew.auld@intel.com>
 <0a7f7d48-4423-1187-63d2-dfa6b85beba9@intel.com>
 <5a00911c-626c-738d-e542-014923fa532d@intel.com>
 <6b7301da-c664-fe89-f4ed-64a390a37d9e@linux.intel.com>
List-Id: Intel Xe graphics driver

On 15/05/2023 09:44, Thomas Hellström wrote:
> Hi, Matthew,
>
> On 5/12/23 17:56, Matthew Auld wrote:
>> On 12/05/2023 15:32, Rodrigo Vivi wrote:
>>> On Thu, May 11, 2023 at 12:43:36PM +0100, Matthew Auld wrote:
>>>> On 05/05/2023 17:44, Rodrigo Vivi wrote:
>>>>> On Fri, May 05, 2023 at 04:38:53PM +0100, Matthew Auld wrote:
>>>>>> It looks like there is at least one race here, given that the
>>>>>> pm_runtime_suspended() check looks to return false if we are in the
>>>>>> process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED).
>>>>>> We later also do xe_pm_runtime_get_if_active(), but since the device
>>>>>> is suspending or has now suspended, this doesn't do anything either.
>>>>>> Following from this we can potentially return from
>>>>>> xe_device_mem_access_get() with the device suspended or about to be,
>>>>>> leading to broken behaviour.
>>>>>>
>>>>>> Attempt to fix this by always grabbing the runtime ref when our
>>>>>> internal ref transitions from 0 -> 1, and then wrap the whole thing
>>>>>> with a lock to ensure callers are serialized.
>>>>>>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
>>>>>> Signed-off-by: Matthew Auld
>>>>>> Cc: Rodrigo Vivi
>>>>>> Cc: Thomas Hellström
>>>>>> Cc: Matthew Brost
>>>>>> ---
>>>>>>    drivers/gpu/drm/xe/xe_device.c       | 11 +++--------
>>>>>>    drivers/gpu/drm/xe/xe_device_types.h |  5 ++++-
>>>>>>    drivers/gpu/drm/xe/xe_pm.c           |  9 ++-------
>>>>>>    drivers/gpu/drm/xe/xe_pm.h           |  2 +-
>>>>>>    4 files changed, 10 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device.c
>>>>>> b/drivers/gpu/drm/xe/xe_device.c
>>>>>> index 01c497bcf9a5..0a18b41a0e1a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>>>>> @@ -406,17 +406,12 @@ u32 xe_device_ccs_bytes(struct xe_device
>>>>>> *xe, u64 size)
>>>>>>    void xe_device_mem_access_get(struct xe_device *xe)
>>>>>>    {
>>>>>> -    bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>>>>>> -
>>>>>>        mutex_lock(&xe->mem_access.lock);
>>>>>> -    if (xe->mem_access.ref++ == 0)
>>>>>> -        xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>>>>>> +    if (xe->mem_access.ref == 0)
>>>>>> +        xe->mem_access.hold_rpm = xe_pm_runtime_resume_and_get(xe);
>>>>>> +    xe->mem_access.ref++;
>>>>>
>>>>> my memory is good to the point that I have tried this before...
>>>>> but not good enough to remember the issues that I got with this
>>>>> approach :(
>>>>
>>>> Ok, it seems one big issue is around xe_pm_runtime_resume() et al.
>>>> The lock fixes all the races, but xe_pm_runtime_resume() seems to
>>>> call xe_device_mem_access_{get,put}() in loads of places AFAICT, but
>>>> that is ofc going to deadlock if we introduce a lock, since we are
>>>> inside the callback. But I think even without that lock it will still
>>>> deadlock, since the runtime_pm code will see that we are PM_RESUMING
>>>> and wait for itself. I'm guessing that explains why we had the
>>>> conditional pm_runtime_suspended() and if_active(), since that
>>>> prevents triggering the runtime_pm from our callbacks (we will either
>>>> be PM_SUSPENDING or PM_RESUMING),
>>>
>>> Yes, that was the goal.
>>>
>>>> but then we are ofc left with all the nasty races.
>>>
>>> :(
>>>
>>>> Any ideas? It seems like the resume/suspend callbacks should
>>>> fundamentally never be calling xe_device_mem_access_{get,put}()?
>>>
>>> We probably need something like the pseudo code in the end of
>>> Documentation/power/runtime_pm.rst
>>
>> IIUC that looks like a template for where you have lots of incoming IO
>> requests, and you want to process them in an async manner. But to
>> process a request you need to also keep the device awake and ensure
>> it has resumed, so there it makes sense to use the RPM_ASYNC
>> runtime_get. And then the ->resume() callback has a natural place to
>> process the outstanding IO requests at some later point.
>>
>> But for xe_device_mem_access_get() I don't think it can be async,
>> since we need to ensure the ->resume() callback has already completed
>> before returning to the caller. From the pov of
>> xe_device_mem_access_get() it doesn't know if it's being called from
>> ->resume() or some other normal path, and yet it seems like it somehow
>> needs to differentiate them. I feel like either ->resume() should
>> never have been allowed to call it in the first place, or there needs
>> to be some token that always gets passed around.
>> But maybe I'm misunderstanding something here, or at least I'm not
>> currently seeing the connection with the pseudo code? Can you share
>> more?
>
> Some fly-by ideas here:
>
> With the variant we briefly discussed before, to only lock during 0-1
> and 1-0 transitions, using atomic_inc_not_zero() and
> atomic_add_unless(), then the get() problem goes out of the way, but
> any recursive get() and put() during put()->suspend will then cause
> similar problems?

Yeah, I was thinking something like that, but AFAICT the runtime_pm
code will see that it is already in a PM_RESUMING state and wait for
itself to exit from that state, which also deadlocks.

>
> While a bit hacky, that could probably be solved by having an
> xe_device::suspending_task, set and cleared by the suspending task,
> and then use
>
> if (xe->suspending_task == current)

I didn't think of that tbh. It does seem hacky, but I think it should
work, and is perhaps fine for now?

> to skip pm calls during suspend. Assuming of course that the suspend()
> tasks for a single device are never run concurrently.

Yes, AFAICT only one resume or suspend callback can be running for a
device, until we exit from PM_SUSPENDING or PM_RESUMING.

>
> /Thomas
>
>>
>>>
>>>>
>>>>>
>>>>>>        mutex_unlock(&xe->mem_access.lock);
>>>>>>
>>>>>> -    /* The usage counter increased if device was immediately
>>>>>> resumed */
>>>>>> -    if (resumed)
>>>>>> -        xe_pm_runtime_put(xe);
>>>>>> -
>>>>>>        XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>>>>>>    }
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> index 59462933f67a..9e37189d5745 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> @@ -256,7 +256,10 @@ struct xe_device {
>>>>>>         * triggering additional actions when they occur.
>>>>>>         */
>>>>>>        struct {
>>>>>> -        /** @lock: protect the ref count */
>>>>>> +        /**
>>>>>> +         * @lock: Serialize xe_device_mem_access users,
>>>>>> +         * and protect the below internal state, like @ref.
>>>>>> +         */
>>>>>>            struct mutex lock;
>>>>>>            /** @ref: ref count of memory accesses */
>>>>>>            s32 ref;
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>>>>> index b7b57f10ba25..b2ffa001e6f7 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>>>>> @@ -210,14 +210,9 @@ int xe_pm_runtime_put(struct xe_device *xe)
>>>>>>        return pm_runtime_put_autosuspend(xe->drm.dev);
>>>>>>    }
>>>>>>
>>>>>> -/* Return true if resume operation happened and usage count was
>>>>>> increased */
>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
>>>>>>    {
>>>>>> -    /* In case we are suspended we need to immediately wake up */
>>>>>> -    if (pm_runtime_suspended(xe->drm.dev))
>>>>>> -        return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>> -
>>>>>> -    return false;
>>>>>> +    return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>
>>>>> now with the similar name I feel strange that we are not aligned
>>>>> with their return. Although I prefer our one...
>>>>>
>>>>> Anyway, the code is right... if you are testing and it is working
>>>>> well let's move with this.
>>>>>
>>>>> Reviewed-by: Rodrigo Vivi
>>>>> (for the series)
>>>>>
>>>>> but let's get an Ack from Maarten since he was kept as author on
>>>>> patch 3 and it is modified from his merged one.
>>>>>
>>>>>>    }
>>>>>>
>>>>>>    int xe_pm_runtime_get_if_active(struct xe_device *xe)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>>>>>> index 6a885585f653..1b4c15b5e71a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>>>>>> @@ -19,7 +19,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>>>>>>    int xe_pm_runtime_resume(struct xe_device *xe);
>>>>>>    int xe_pm_runtime_get(struct xe_device *xe);
>>>>>>    int xe_pm_runtime_put(struct xe_device *xe);
>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>>>>>>    int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>>>>>    #endif
>>>>>> --
>>>>>> 2.40.0
>>>>>>