Subject: Re: [Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset
From: Tvrtko Ursulin
To: Umesh Nerlige Ramappa
Cc: intel-gfx@lists.freedesktop.org
Date: Tue, 23 Nov 2021 09:15:10 +0000
Message-ID: <20493aa0-c5b3-c8c2-38f2-c989e0ee2cb0@linux.intel.com>
In-Reply-To: <20211122233933.GA1371@unerlige-ril-10.165.21.208>

On 22/11/2021 23:39, Umesh Nerlige Ramappa wrote:
> On Mon, Nov 22, 2021 at 03:44:29PM +0000, Tvrtko Ursulin wrote:
>>
>> On 11/11/2021 16:48, Umesh Nerlige Ramappa wrote:
>>> On Thu, Nov 11, 2021 at 02:37:43PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 04/11/2021 22:04, Umesh Nerlige Ramappa wrote:
>>>>> On Thu, Nov 04, 2021 at 05:37:37PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 03/11/2021 22:47, Umesh Nerlige Ramappa wrote:
>>>>>>> Since the PMU callback runs in irq context, it synchronizes with gt
>>>>>>> reset using the reset count. We could run into a case where the PMU
>>>>>>> callback could read the reset count before it is updated. This has
>>>>>>> the potential of corrupting the busyness stats.
>>>>>>>
>>>>>>> In addition to the reset count, check if the reset bit is set before
>>>>>>> capturing busyness.
>>>>>>>
>>>>>>> In addition, save the previous stats only if we intend to update
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Umesh Nerlige Ramappa
>>>>>>>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 ++++++++----
>>>>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> index 5cc49c0b3889..d83ade77ca07 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> @@ -1183,6 +1183,7 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>      u64 total, gt_stamp_saved;
>>>>>>>      unsigned long flags;
>>>>>>>      u32 reset_count;
>>>>>>> +    bool in_reset;
>>>>>>>
>>>>>>>      spin_lock_irqsave(&guc->timestamp.lock, flags);
>>>>>>>
>>>>>>> @@ -1191,7 +1192,9 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>       * engine busyness from GuC, so we just use the driver stored
>>>>>>>       * copy of busyness. Synchronize with gt reset using reset_count.
>>>>>>>       */
>>>>>>> -    reset_count = i915_reset_count(gpu_error);
>>>>>>> +    rcu_read_lock();
>>>>>>> +    in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);
>>>>>>> +    rcu_read_unlock();
>>>>>>
>>>>>> I don't really understand the point of rcu_read_lock over test_bit
>>>>>> but I guess you copied it from the trylock loop.
>>>>>
>>>>> Yes, I don't see other parts of the code using the lock though. I can
>>>>> drop it.
>>>>>
>>>>>>
>>>>>>>      *now = ktime_get();
>>>>>>>
>>>>>>> @@ -1201,9 +1204,10 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>       * start_gt_clk is derived from GuC state. To get a consistent
>>>>>>>       * view of activity, we query the GuC state only if gt is awake.
>>>>>>>       */
>>>>>>> -    stats_saved = *stats;
>>>>>>> -    gt_stamp_saved = guc->timestamp.gt_stamp;
>>>>>>> -    if (intel_gt_pm_get_if_awake(gt)) {
>>>>>>> +    if (intel_gt_pm_get_if_awake(gt) && !in_reset) {
>>>>>>
>>>>>> What is the point of looking at the old value of in_reset here?
>>>>>> Gut feeling says if there is a race this does not fix it.
>>>>>>
>>>>>> I did not figure out from the commit message what "could read
>>>>>> the reset count before it is updated" means. I thought the point of
>>>>>> reading the reset count twice was that you are sure there was no
>>>>>> reset while in here, in which case it is safe to update the software
>>>>>> copy. I don't easily see what test_bit does on top.
>>>>>
>>>>> This is what I see in the reset flow
>>>>> ---------------
>>>>>
>>>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>>> R3) reset prepare
>>>>> R4) do the HW reset
>>>>>
>>>>> The reset count is updated only once above, and that's before the
>>>>> actual HW reset happens.
>>>>>
>>>>> PMU callback flow before this patch
>>>>> ---------------
>>>>>
>>>>> P1) read reset count
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not, use updated stats.
>>>>>
>>>>> I am concerned that the PMU flow could run after step (R2). Then we
>>>>> wrongly conclude that the count stayed the same and no HW reset
>>>>> happened.
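For reference, the P1-P4 flow maps onto the pre-patch guc_engine_busyness() roughly as follows (a simplified sketch with the ktime handling elided, not the literal code; guc_update_engine_gt_clks() stands in for the full stats update):

    spin_lock_irqsave(&guc->timestamp.lock, flags);

    reset_count = i915_reset_count(gpu_error);        /* P1: first read */

    stats_saved = *stats;                             /* snapshot, in case */
    gt_stamp_saved = guc->timestamp.gt_stamp;         /* a reset races in */
    if (intel_gt_pm_get_if_awake(gt)) {
            guc_update_engine_gt_clks(engine);        /* P2: update stats */
            intel_gt_pm_put_async(gt);
            if (i915_reset_count(gpu_error) != reset_count) { /* P3 */
                    *stats = stats_saved;             /* P4: count changed, */
                    guc->timestamp.gt_stamp = gt_stamp_saved; /* roll back */
            }
    }

    spin_unlock_irqrestore(&guc->timestamp.lock, flags);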
>>>
>>> Here is the problematic sequence: Threads R and P.
>>> ------------
>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>     P1) read reset count
>>>     P2) update stats
>>>     P3) read reset count
>>>     P4) if reset count changed, use old stats. if not, use updated stats.
>>> R3) reset prepare
>>> R4) do the HW reset
>>>
>>> Do you agree that this is racy? In thread P we don't know if the
>>> reset flag was set or not when we captured the reset count in P1.
>>>
>>>>>
>>>>> PMU callback flow with this patch
>>>>> ---------------
>>>>> This would rely on the reset_count only if a reset is not in progress.
>>>>>
>>>>> P0) test_bit for I915_RESET_BACKOFF
>>>>> P1) read reset count if not in reset. if in reset, use old stats
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not, use updated stats.
>>>>>
>>>>> Now that I think about it more, I do see one sequence that still
>>>>> needs fixing though - P0, R1, R2, P1 - P4. For that, I think I need
>>>>> to re-read the BACKOFF bit after reading the reset_count for the
>>>>> first time.
>>>>>
>>>>> Modified PMU callback sequence would be:
>>>>> ----------
>>>>>
>>>>> M0) test_bit for I915_RESET_BACKOFF
>>>>> M1) read reset count if not in reset. if in reset, use old stats
>>>>> M1.1) test_bit for I915_RESET_BACKOFF. if set, use old stats. if
>>>>> not, use reset_count to synchronize
>>>>> M2) update stats
>>>>> M3) read reset count
>>>>> M4) if reset count changed, use old stats. if not, use updated stats.
>>>>
>>>> You did not end up implementing this flow? Have you later changed
>>>> your mind whether it is required or not? Or maybe I am not looking
>>>> at the latest patch.
>>>>
>>>> Is the below the latest?
>>>>
>>>> """
>>>> v2:
>>>> - The 2 reset counts captured in the PMU callback can end up being the
>>>>   same if they were captured right after the count is incremented in the
>>>>   reset flow. This can lead to a bad busyness state. Ensure that reset
>>>>   is not in progress when the initial reset count is captured.
>>>> """
>>>
>>> Yes, v2 is the latest (maybe CI results re-ordered the patches).
>>> Instead of sampling the BACKOFF flag before and after the reset count
>>> (as in the modified sequence), I just sample it after. The order is
>>> critical - first sample the reset count and then the reset flag.
>>>
>>>> Is the key now that you rely on the ordering of atomic_inc and set_bit
>>>> in the reset path?
>>>
>>> Yes
>>>
>>>> Frankly I still don't understand why you can get away
>>>> with using a stale in_reset in v2. If you acknowledge it can change
>>>> between sampling and checking, then what is the point in having it
>>>> at all? You still solely rely on the reset count in that case, no?
>>>
>>> Correct, but now I know for sure that the first sample of reset_count
>>> was captured when the reset flag was not set (since I am relying on the
>>> order of sampling).
>>>
>>> About solely using the reset_count, I have listed the problematic
>>> sequence above to highlight what the issue is.
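In code, the v2 ordering Umesh describes would look something like this in the callback (again a simplified sketch, not the final patch):

    reset_count = i915_reset_count(gpu_error);   /* sample the count... */

    /*
     * ...and only then the flag. The reset path sets I915_RESET_BACKOFF
     * (R1) before incrementing reset_count (R2), so if the flag reads
     * clear here the count above was captured outside a reset window,
     * and re-checking the count after the stats update is sufficient.
     */
    in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);

    if (intel_gt_pm_get_if_awake(gt) && !in_reset) {
            /* update stats, then re-read and compare reset_count */
    }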
>> """ >> >> Why it matter if reset flag was set or not? Lets see how things are >> after this patch: >> >> After this patch it ends like this: >> >>     P1) Read and store reset bit >> R1) test_and_set_bit(I915_RESET_BACKOFF, >->reset.flags) >>     P2) If reset bit was not set: >>           P2.1) read reset count >> R2) atomic_inc(>->i915->gpu_error.reset_count) >>           P2.2) update stats >>           P2.3) read reset count >>           P2.4) if reset count changed, use old stats. if not use >> updated stats. >> R3) reset prepare >> R4) do the HW reset >> >> So the reset bit got set between P1 and P2. How is that then not the >> same as not looking at the reset bit at all? > > But the new sequence in this patch is this: Oops I was looking at v1. Okay I can't come up with any further races, acked. Regards, Tvrtko > >     P0) read reset count >     P1) Read and store reset bit > R1) test_and_set_bit(I915_RESET_BACKOFF, >->reset.flags) >     P2) If reset bit was not set: > R2) atomic_inc(>->i915->gpu_error.reset_count) >        P2.2) update stats >        P2.3) read reset count >        P2.4) if reset count changed, use old stats. if not use updated > stats. > R3) reset prepare > R4) do the HW reset > > P2.1 moved to P0 when compared to the sequence you shared above. > > Thanks, > Umesh > >> >> Regards, >> >> Tvrtko