All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tursulin@ursulin.net>, Intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
Date: Tue, 10 Apr 2018 10:57:07 +0100	[thread overview]
Message-ID: <152335422737.3167.17662821212205420318@mail.alporthouse.com> (raw)
In-Reply-To: <20180410092328.20645-1-tvrtko.ursulin@linux.intel.com>

Quoting Tvrtko Ursulin (2018-04-10 10:23:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> While thinking about sporadic failures of perf_pmu/rc6-runtime-pm* tests
> on some CI machines I have concluded that: a) the PMU readout of RC6 can
> race against runtime PM transitions, and b) there are other reasons than
> being runtime suspended which can cause intel_runtime_pm_get_if_in_use to
> fail.
> 
> Therefore when estimating RC6 the code needs to assert we are indeed in
> suspended state and if not the best we can do is return the last known RC6
> value.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> ---
> I was able to trigger state != RPM_SUSPENDED on the shards, but not yet
> the actual estimation overaccounting. As such this fix is based partially
> on speculation that it will fix the sporadic perf_pmu/rc6* failures.
> Nevertheless I think it is correct to add this check regardless.
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index bd7e695fc663..e92a9571db77 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -473,6 +473,30 @@ static u64 get_rc6(struct drm_i915_private *i915)
>                 spin_lock_irqsave(&i915->pmu.lock, flags);
>                 spin_lock(&kdev->power.lock);
>  
> +               /*
> +                * After the above branch intel_runtime_pm_get_if_in_use failed
> +                * to get the runtime PM reference we cannot assume we are in
> +                * runtime suspend since we can either: a) race with coming out
> +                * of it before we took the power.lock, or b) there are other
> +                * states than suspended which can bring us here.
> +                *
> +                * We need to double-check that we are indeed currently runtime
> +                * suspended and if not we cannot do better than report the last
> +                * known RC6 value.
> +                */
> +               if (kdev->power.runtime_status != RPM_SUSPENDED) {
> +                       spin_unlock(&kdev->power.lock);
> +
> +                       if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
> +                       else
> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;

If rpm awake, but having lost the race to read the regs, report the last
known value.

This is because we don't know if another thread is in the other branch,
and so we will have one updating the estimate while it being compared
against.

But I'm not understanding the failure -- why is the estimate bad? At the
very least we still ensure that it is monotonic? Is it just the jitter
you are worrying about? (If the estimate is bad here, isn't it always
bad?)

> +
> +                       spin_unlock_irqrestore(&i915->pmu.lock, flags);
> +
> +                       return val;
> +               }

I'd prefer moving the RPM_SUSPENDED code into an else branch to avoid
another unlock/early return here. (It just fits into 80cols, so no
excuses ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2018-04-10  9:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-10  9:23 [PATCH] drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6 Tvrtko Ursulin
2018-04-10  9:57 ` Chris Wilson [this message]
2018-04-10 10:22   ` Tvrtko Ursulin
2018-04-10 10:34     ` Chris Wilson
2018-04-10 10:54       ` Tvrtko Ursulin
2018-04-10 10:20 ` ✗ Fi.CI.BAT: warning for " Patchwork
2018-04-10 10:34 ` [PATCH v2] " Tvrtko Ursulin
2018-04-10 10:52   ` Chris Wilson
2018-04-10 11:27     ` [PATCH v3] " Tvrtko Ursulin
2018-04-10 13:07 ` ✓ Fi.CI.BAT: success for drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6 (rev3) Patchwork
2018-04-10 13:54 ` ✗ Fi.CI.IGT: warning " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=152335422737.3167.17662821212205420318@mail.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=tursulin@ursulin.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.