All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paulo Zanoni <przanoni@gmail.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Paulo Zanoni <przanoni@gmail.com>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Mika Kuoppala <mika.kuoppala@intel.com>,
	Paulo Zanoni <paulo.r.zanoni@intel.com>
Subject: Re: [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging
Date: Mon, 30 Mar 2015 16:15:08 -0300	[thread overview]
Message-ID: <CA+gsUGSFY4mC_KoFr0ga4847up8ZhYDD1s8Rde8WV8qVii9bFQ@mail.gmail.com> (raw)
In-Reply-To: <20150327161216.GF28765@nuc-i3427.alporthouse.com>

2015-03-27 13:12 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
> On Fri, Mar 27, 2015 at 12:34:05PM -0300, Paulo Zanoni wrote:
>> 2015-03-27 8:01 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
>> > Delay the expensive read on the FPGA_DBG register from once per mmio to
>> > once per forcewake section when we are doing the general wellbeing
>> > check rather than the targetted error detection. This almost reduces
>> > the overhead of the debug facility (for example when submitting execlists)
>> > to zero whilst keeping the debug checks around.
>> >
>> > v2: Enable one-shot mmio debugging from the interrupt check as well as a
>> >     safeguard to catch invalid display writes from outside the powerwell.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
>> >  1 file changed, 30 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> > index ab5cc94588e1..0e32bbbcada8 100644
>> > --- a/drivers/gpu/drm/i915/intel_uncore.c
>> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> > @@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
>> >  }
>> >
>> >  static void
>> > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
>> > +{
>> > +       static bool mmio_debug_once = true;
>> > +
>> > +       if (i915.mmio_debug || !mmio_debug_once)
>> > +               return;
>> > +
>> > +       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
>> > +               DRM_DEBUG("Unclaimed register detected, "
>> > +                         "enabling oneshot unclaimed register reporting. "
>> > +                         "Please use i915.mmio_debug=N for more information.\n");
>> > +               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>> > +               i915.mmio_debug = mmio_debug_once--;
>> > +       }
>> > +}
>> > +
>> > +static void
>> > +fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
>> > +{
>> > +       hsw_unclaimed_reg_detect(dev_priv);
>> > +       fw_domains_put(dev_priv, fw_domains);
>> > +}
>>
>> This means we won't check during the forcewake puts that are on the
>> register read/write macros. Is this intentional?
>
> Not really. But the check still catches any mistakes there even though
> we know they are by safe.
>
>> I tried checking the
>> FW code calls, and it seems to me that we're not really going to run
>> hsw_unclaimed_reg_detect very frequently anymore. I wonder if there's
>> the risk of staying a long time without running it. But maybe I'm just
>> wrong.
>
> It gets run after every set of register writes (where set is defined as
> activity on a single cpu within 1ms). It gets run before the powerwell
> is disabled. Look at the profiles, you will see that hsw detect is still
> called quite frequently. And by virtue it does not need to be run very
> often to catch issues anyway.
>
>> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
>> > +           dev_priv->uncore.funcs.force_wake_put == fw_domains_put)
>>
>> My fear here is that simple changes to the FW code by a future
>> programmer could unintentionally kill the unclaimed register detection
>> feature, and we probably wouldn't notice for a looong time. Why not
>> just omit this fw_domains_put check, since it is true for all
>> platforms where HAS_FPG_DBG_UNCLAIMED is also true? The side effect of
>> calling fw_domains_put() when we shouldn't is probably more noticeable
>> than having unclaimed register detection gone.
>
> Pardon?

My suggestion was to find a way to transform this "if" statement above
somehow into a check just for "if (has_fpga_dbg_unclaimed())", without
relying on whatever is assigned to funcs.force_wake_put, due to the
fear that a refactoring could accidentally kill the unclaimed register
detection. But it was just a suggestion.

>
>> > +               dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
>> > +
>> >         /* All future platforms are expected to require complex power gating */
>> >         WARN_ON(dev_priv->uncore.fw_domains == 0);
>> >  }
>> > @@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
>> >
>> >  void intel_uncore_check_errors(struct drm_device *dev)
>> >  {
>> > -       struct drm_i915_private *dev_priv = dev->dev_private;
>> > -
>> > -       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
>> > -           (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
>> > -               DRM_ERROR("Unclaimed register before interrupt\n");
>> > -               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>> > -       }
>> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev))
>> > +               hsw_unclaimed_reg_detect(to_i915(dev));
>>
>> This means we won't check for unclaimed registers during interrupts if
>> i915.mmio_debug is being used, which is probably not what we want.
>
> It's exactly what you want. It still does the debug check if you have
> mmio_debug unset. If you have mmio_debug set, it means you are debugging
> i915 register mmio, since that is all we can reliably debug.
>
>> One of the things that worries me a little is that now we'll be
>> running a mostly-display-related feature only when certain pieces of
>> non-display-related code run. Instead of doing the checks at the
>> forcewake puts, maybe we could tie the frequency of the checks to
>> something in the display code, or just do the check every X seconds. I
>> don't really know what would be ideal here, I'm just throwing the
>> ideas. I'm also not blocking this patch, just pointing things that
>> could maybe be improved.
>
> Sure, all you would need to do is add the check to every rpm_put() if you
> feel paranoid (it will be run before the powerwell is dropped by
> design).
>
>> Since it's impacting performance, perhaps we could even completely
>> kill unclaimed register detection from the normal use case, hiding it
>> behind i915.mmio_debug=1 (and maybe a kconfig option)? We would then
>> instruct QA and developers to always have the option enabled. Just
>> like QA needs to have lockdep enabled, we could ask it to have
>> mmio_debug enabled all the time too.
>
> Whilst I like the idea, having debug code running in the wild (at a
> frequency high enough to catch bugs, but low enough not to be noticed)
> is invaluable.

Some other ideas that could be worth discussing:

- Adding a check for the range of registers that are covered by
FPGA_DBG. I imagine most/all of your performance sensitive registers
are outside it, so maybe this change would be enough to fix the issues
you're seeing, potentially replacing this patch.

- It seems that most of the times we call I915_WRITE/READ, the
register argument is seen by the compiler as a constant (I checked
this with __builtin_constant_p()). Maybe we could try to exploit this
builtin to make a special-case that allows the compiler to optimize
away all our "if" statements we have on the register writing macros.
Of course, since not all our reg writes are constant, we'd still need
the older/slower version.

- Didn't we ever discuss replacing I915_WRITE with more specialized
macros that would be used just on specific register ranges? On gen8 I
can see, for example: a range requiring forcewake, a range requiring
fpga_dbg and a range requiring nothing. On gen9 I see even more. Would
the little performance gain justify the change?

It seems you're doing some optimizations, so maybe one of the ideas
could be interesting...

Thanks,
Paulo

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre



-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-03-30 19:15 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
2015-03-27 11:01 ` [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
2015-03-27 11:01 ` [PATCH 02/49] drm/i915: Agressive downclocking on Baytrail Chris Wilson
2015-04-02 11:21   ` Deepak S
2015-03-27 11:01 ` [PATCH 03/49] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
2015-03-27 11:01 ` [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked Chris Wilson
2015-03-27 16:42   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 05/49] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
2015-03-27 11:01 ` [PATCH 06/49] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
2015-03-27 11:01 ` [PATCH 07/49] drm/i915: Deminish contribution of wait-boosting from clients Chris Wilson
2015-03-27 11:01 ` [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
2015-04-02 11:09   ` Deepak S
2015-04-02 11:39     ` Chris Wilson
2015-03-27 11:01 ` [PATCH 09/49] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
2015-03-27 11:01 ` [PATCH 10/49] drm/i915: Tidy batch pool logic Chris Wilson
2015-03-27 11:59   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 11/49] drm/i915: Split the batch pool by engine Chris Wilson
2015-03-27 11:01 ` [PATCH 12/49] drm/i915: Free batch pool when idle Chris Wilson
2015-03-27 11:01 ` [PATCH 13/49] drm/i915: Split batch pool into size buckets Chris Wilson
2015-03-27 11:01 ` [PATCH 14/49] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
2015-03-27 11:01 ` [PATCH 15/49] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
2015-03-27 11:01 ` [PATCH 16/49] drm/i915: Optimistically spin for the request completion Chris Wilson
2015-03-27 11:42   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
2015-03-30 13:52   ` Tvrtko Ursulin
2015-03-30 14:09     ` Chris Wilson
2015-03-30 14:45       ` Tvrtko Ursulin
2015-03-30 15:07         ` Chris Wilson
2015-03-27 11:01 ` [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
2015-03-27 15:34   ` Paulo Zanoni
2015-03-27 16:12     ` Chris Wilson
2015-03-30 19:15       ` Paulo Zanoni [this message]
2015-03-27 11:01 ` [PATCH 19/49] drm/i915: Record ring->start address in error state Chris Wilson
2015-03-27 11:01 ` [PATCH 20/49] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
2015-03-27 11:01 ` [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
2015-03-27 14:19   ` Daniel Vetter
2015-03-27 14:25     ` Chris Wilson
2015-03-27 11:01 ` [PATCH 22/49] drm/i915: Map the execlists context regs once during pinning Chris Wilson
2015-03-27 11:01 ` [PATCH 23/49] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
2015-03-27 11:01 ` [PATCH 24/49] drm/i915: Tidy execlist submission Chris Wilson
2015-03-27 11:01 ` [PATCH 25/49] drm/i915: Move the execlists retirement to the right spot Chris Wilson
2015-03-27 11:01 ` [PATCH 26/49] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
2015-03-27 11:01 ` [PATCH 27/49] drm/i915: Use a separate slab for requests Chris Wilson
2015-03-27 14:20   ` Daniel Vetter
2015-03-27 14:27     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 28/49] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
2015-03-27 11:02 ` [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
2015-03-27 14:26   ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 30/49] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
2015-03-27 11:02 ` [PATCH 31/49] drm/i915: Reduce locking in execlist command submission Chris Wilson
2015-03-27 11:40   ` Tvrtko Ursulin
2015-03-27 11:47     ` Chris Wilson
2015-03-27 11:54       ` Tvrtko Ursulin
2015-03-27 14:15       ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 32/49] drm/i915: Reduce more " Chris Wilson
2015-03-27 11:02 ` [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
2015-03-27 14:13   ` Daniel Vetter
2015-03-27 14:14     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 34/49] drm/i915: Tidy " Chris Wilson
2015-03-27 11:02 ` [PATCH 35/49] drm/i915: Remove request retirement before each batch Chris Wilson
2015-03-27 11:02 ` [PATCH 36/49] drm/i915: Cache the GGTT offset for the execlists context Chris Wilson
2015-03-27 11:02 ` [PATCH 37/49] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
2015-03-27 14:28   ` Daniel Vetter
2015-03-30 12:02   ` Chris Wilson
2015-03-30 14:59     ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 39/49] drm/i915: Remove request->uniq Chris Wilson
2015-03-27 11:02 ` [PATCH 40/49] drm/i915: Cache the reset_counter for the request Chris Wilson
2015-03-27 11:02 ` [PATCH 41/49] drm/i915: Allocate context objects from stolen Chris Wilson
2015-03-27 11:02 ` [PATCH 42/49] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
2015-03-27 11:02 ` [PATCH 43/49] drm/i915: Do not zero initialise page tables Chris Wilson
2015-04-07 14:46   ` Mika Kuoppala
2015-04-07 15:00     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 44/49] drm/i915: The argument for postfix is redundant Chris Wilson
2015-03-27 11:02 ` [PATCH 45/49] drm/i915: Record the position of the start of the request Chris Wilson
2015-03-27 11:02 ` [PATCH 46/49] drm/i915: Cache the execlist ctx descriptor Chris Wilson
2015-03-27 11:02 ` [PATCH 47/49] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2015-03-27 11:02 ` [PATCH 48/49] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
2015-03-27 11:02 ` [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
2015-03-28  6:21   ` shuang.he

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+gsUGSFY4mC_KoFr0ga4847up8ZhYDD1s8Rde8WV8qVii9bFQ@mail.gmail.com \
    --to=przanoni@gmail.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel.vetter@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=mika.kuoppala@intel.com \
    --cc=paulo.r.zanoni@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.