From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF1D6C433EF for ; Mon, 8 Nov 2021 21:11:15 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7704861994 for ; Mon, 8 Nov 2021 21:11:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7704861994 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9015E6E442; Mon, 8 Nov 2021 21:11:12 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0C6046E303; Mon, 8 Nov 2021 21:11:11 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10162"; a="255999727" X-IronPort-AV: E=Sophos;i="5.87,218,1631602800"; d="scan'208";a="255999727" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2021 13:11:02 -0800 X-IronPort-AV: E=Sophos;i="5.87,218,1631602800"; d="scan'208";a="451627240" Received: from orsosgc001.jf.intel.com ([10.165.21.154]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2021 13:11:02 -0800 From: Umesh Nerlige Ramappa To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset Date: Mon, 8 Nov 2021 13:10:57 -0800 Message-Id: <20211108211057.68783-1-umesh.nerlige.ramappa@intel.com> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Brost , Tvrtko Ursulin , john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Since the PMU callback runs in irq context, it synchronizes with gt reset using the reset count. We could run into a case where the PMU callback could read the reset count before it is updated. This has a potential of corrupting the busyness stats. In addition to the reset count, check if the reset bit is set before capturing busyness. In addition save the previous stats only if you intend to update them. v2: - The 2 reset counts captured in the PMU callback can end up being the same if they were captured right after the count is incremented in the reset flow. This can lead to a bad busyness state. Ensure that reset is not in progress when the initial reset count is captured. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 5cc49c0b3889..0dfc6032cd6b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1183,15 +1183,20 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now) u64 total, gt_stamp_saved; unsigned long flags; u32 reset_count; + bool in_reset; spin_lock_irqsave(&guc->timestamp.lock, flags); /* - * If a reset happened, we risk reading partially updated - * engine busyness from GuC, so we just use the driver stored - * copy of busyness. Synchronize with gt reset using reset_count. + * If a reset happened, we risk reading partially updated engine + * busyness from GuC, so we just use the driver stored copy of busyness. + * Synchronize with gt reset using reset_count and the + * I915_RESET_BACKOFF flag. Note that reset flow updates the reset_count + * after I915_RESET_BACKOFF flag, so ensure that the reset_count is + * usable by checking the flag afterwards. */ reset_count = i915_reset_count(gpu_error); + in_reset = test_bit(I915_RESET_BACKOFF, >->reset.flags); *now = ktime_get(); @@ -1201,9 +1206,9 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now) * start_gt_clk is derived from GuC state. To get a consistent * view of activity, we query the GuC state only if gt is awake. */ - stats_saved = *stats; - gt_stamp_saved = guc->timestamp.gt_stamp; - if (intel_gt_pm_get_if_awake(gt)) { + if (intel_gt_pm_get_if_awake(gt) && !in_reset) { + stats_saved = *stats; + gt_stamp_saved = guc->timestamp.gt_stamp; guc_update_engine_gt_clks(engine); guc_update_pm_timestamp(guc, engine, now); intel_gt_pm_put_async(gt); -- 2.20.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1517C433F5 for ; Mon, 8 Nov 2021 21:11:13 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7FAA7611C0 for ; Mon, 8 Nov 2021 21:11:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7FAA7611C0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F20C16E8A4; Mon, 8 Nov 2021 21:11:12 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0C6046E303; Mon, 8 Nov 2021 21:11:11 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10162"; a="255999727" X-IronPort-AV: E=Sophos;i="5.87,218,1631602800"; d="scan'208";a="255999727" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2021 13:11:02 -0800 X-IronPort-AV: E=Sophos;i="5.87,218,1631602800"; d="scan'208";a="451627240" Received: from orsosgc001.jf.intel.com ([10.165.21.154]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2021 13:11:02 -0800 From: Umesh Nerlige Ramappa To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Mon, 8 Nov 2021 13:10:57 -0800 Message-Id: <20211108211057.68783-1-umesh.nerlige.ramappa@intel.com> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since the PMU callback runs in irq context, it synchronizes with gt reset using the reset count. We could run into a case where the PMU callback could read the reset count before it is updated. This has a potential of corrupting the busyness stats. In addition to the reset count, check if the reset bit is set before capturing busyness. In addition save the previous stats only if you intend to update them. v2: - The 2 reset counts captured in the PMU callback can end up being the same if they were captured right after the count is incremented in the reset flow. This can lead to a bad busyness state. Ensure that reset is not in progress when the initial reset count is captured. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 5cc49c0b3889..0dfc6032cd6b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1183,15 +1183,20 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now) u64 total, gt_stamp_saved; unsigned long flags; u32 reset_count; + bool in_reset; spin_lock_irqsave(&guc->timestamp.lock, flags); /* - * If a reset happened, we risk reading partially updated - * engine busyness from GuC, so we just use the driver stored - * copy of busyness. Synchronize with gt reset using reset_count. + * If a reset happened, we risk reading partially updated engine + * busyness from GuC, so we just use the driver stored copy of busyness. + * Synchronize with gt reset using reset_count and the + * I915_RESET_BACKOFF flag. Note that reset flow updates the reset_count + * after I915_RESET_BACKOFF flag, so ensure that the reset_count is + * usable by checking the flag afterwards. */ reset_count = i915_reset_count(gpu_error); + in_reset = test_bit(I915_RESET_BACKOFF, >->reset.flags); *now = ktime_get(); @@ -1201,9 +1206,9 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now) * start_gt_clk is derived from GuC state. To get a consistent * view of activity, we query the GuC state only if gt is awake. */ - stats_saved = *stats; - gt_stamp_saved = guc->timestamp.gt_stamp; - if (intel_gt_pm_get_if_awake(gt)) { + if (intel_gt_pm_get_if_awake(gt) && !in_reset) { + stats_saved = *stats; + gt_stamp_saved = guc->timestamp.gt_stamp; guc_update_engine_gt_clks(engine); guc_update_pm_timestamp(guc, engine, now); intel_gt_pm_put_async(gt); -- 2.20.1