All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Tvrtko Ursulin <tursulin@ursulin.net>,
	"Intel-gfx@lists.freedesktop.org"
	<Intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 06/10] drm/i915/pmu: Wire up engine busy stats to PMU
Date: Wed, 4 Oct 2017 17:52:30 +0000	[thread overview]
Message-ID: <3EDB40B547243546A1017B798F8C1AA89A5DD6DB@ORSMSX114.amr.corp.intel.com> (raw)
In-Reply-To: <6a56eed4-ec6f-29b3-4372-b1445263c44c@linux.intel.com>

Nice data, thank you!

-----Original Message-----
From: Tvrtko Ursulin [mailto:tvrtko.ursulin@linux.intel.com] 
Sent: Wednesday, October 4, 2017 10:35 AM
To: Tvrtko Ursulin <tursulin@ursulin.net>; Intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>; Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>
Subject: Re: [Intel-gfx] [PATCH 06/10] drm/i915/pmu: Wire up engine busy stats to PMU


On 29/09/2017 13:34, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We can use engine busy stats instead of the sampling timer for better 
> accuracy.
> 
> By doing this we replace the stohastic sampling with busyness metric 
> derived directly from engine activity. This is context switch 
> interrupt driven, so as accurate as we can get from software tracking.

To quantify the precision benefit with a graph:

https://people.freedesktop.org/~tursulin/sampling-vs-busy-stats.png

This represents the render engine busyness on neverball menu screen looping a few times. PMU sampling interval is 100ms. It was generated from a special build with both PMU counters running in parallel so the data is absolutely aligned.

Anomaly of busyness going over 100% is caused by me not bothering to use the actual timestamps got from PMU, but using fixed interval (perf stat -I 100). But I don't think it matters for the rough comparison and looking at the noise and error in the sampling version.

> As a secondary benefit, we can also not run the sampling timer in 
> cases only busyness metric is enabled.

For this one data is not showing anything significant. I've seen the i915_sample function use <0.1% of CPU time but that's it. Probably with the original version which used MMIO it was much worse.

Regards,

Tvrtko

> v2: Rebase.
> v3:
>   * Rebase, comments.
>   * Leave engine busyness controls out of workers.
> v4: Checkpatch cleanup.
> v5: Added comment to pmu_needs_timer change.
> v6:
>   * Rebase.
>   * Fix style of some comments. (Chris Wilson)
> v7: Rebase and commit message update. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c         | 37 +++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  5 +++++
>   2 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
> b/drivers/gpu/drm/i915/i915_pmu.c index f341c904c159..93c0e7ec7d75 
> 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -90,6 +90,11 @@ static unsigned int event_enabled_bit(struct perf_event *event)
>   	return config_enabled_bit(event->attr.config);
>   }
>   
> +static bool supports_busy_stats(void) {
> +	return i915_modparams.enable_execlists; }
> +
>   static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
>   {
>   	u64 enable;
> @@ -115,6 +120,12 @@ static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
>   	 */
>   	if (!gpu_active)
>   		enable &= ~ENGINE_SAMPLE_MASK;
> +	/*
> +	 * Also there is software busyness tracking available we do not
> +	 * need the timer for I915_SAMPLE_BUSY counter.
> +	 */
> +	else if (supports_busy_stats())
> +		enable &= ~BIT(I915_SAMPLE_BUSY);
>   
>   	/*
>   	 * If some bits remain it means we need the sampling timer running.
> @@ -362,6 +373,9 @@ static u64 __i915_pmu_event_read(struct perf_event 
> *event)
>   
>   		if (WARN_ON_ONCE(!engine)) {
>   			/* Do nothing */
> +		} else if (sample == I915_SAMPLE_BUSY &&
> +			   engine->pmu.busy_stats) {
> +			val = ktime_to_ns(intel_engine_get_busy_time(engine));
>   		} else {
>   			val = engine->pmu.sample[sample].cur;
>   		}
> @@ -398,6 +412,12 @@ static void i915_pmu_event_read(struct perf_event *event)
>   	local64_add(new - prev, &event->count);
>   }
>   
> +static bool engine_needs_busy_stats(struct intel_engine_cs *engine) {
> +	return supports_busy_stats() &&
> +	       (engine->pmu.enable & BIT(I915_SAMPLE_BUSY)); }
> +
>   static void i915_pmu_enable(struct perf_event *event)
>   {
>   	struct drm_i915_private *i915 =
> @@ -437,7 +457,14 @@ static void i915_pmu_enable(struct perf_event 
> *event)
>   
>   		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
>   		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
> -		engine->pmu.enable_count[sample]++;
> +		if (engine->pmu.enable_count[sample]++ == 0) {
> +			if (engine_needs_busy_stats(engine) &&
> +			    !engine->pmu.busy_stats) {
> +				engine->pmu.busy_stats =
> +					intel_enable_engine_stats(engine) == 0;
> +				WARN_ON_ONCE(!engine->pmu.busy_stats);
> +			}
> +		}
>   	}
>   
>   	/*
> @@ -473,8 +500,14 @@ static void i915_pmu_disable(struct perf_event *event)
>   		 * Decrement the reference count and clear the enabled
>   		 * bitmask when the last listener on an event goes away.
>   		 */
> -		if (--engine->pmu.enable_count[sample] == 0)
> +		if (--engine->pmu.enable_count[sample] == 0) {
>   			engine->pmu.enable &= ~BIT(sample);
> +			if (!engine_needs_busy_stats(engine) &&
> +			    engine->pmu.busy_stats) {
> +				engine->pmu.busy_stats = false;
> +				intel_disable_engine_stats(engine);
> +			}
> +		}
>   	}
>   
>   	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS); diff --git 
> a/drivers/gpu/drm/i915/intel_ringbuffer.h 
> b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index ee738c7694e5..0f8ccb243407 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -351,6 +351,11 @@ struct intel_engine_cs {
>   		 * Our internal timer stores the current counters in this field.
>   		 */
>   		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
> +		/**
> +		 * @busy_stats: Has enablement of engine stats tracking been
> +		 * 		requested.
> +		 */
> +		bool busy_stats;
>   	} pmu;
>   
>   	/*
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-10-04 17:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-29 12:34 [PATCH v6 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 01/10] drm/i915: Extract intel_get_cagf Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 02/10] drm/i915/pmu: Expose a PMU interface for perf queries Tvrtko Ursulin
2017-09-29 12:46   ` Chris Wilson
2017-10-05 13:04     ` [PATCH v13 " Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 03/10] drm/i915/pmu: Suspend sampling when GPU is idle Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 04/10] drm/i915: Wrap context schedule notification Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 05/10] drm/i915: Engine busy time tracking Tvrtko Ursulin
2017-09-29 12:47   ` Chris Wilson
2017-09-29 12:34 ` [PATCH 06/10] drm/i915/pmu: Wire up engine busy stats to PMU Tvrtko Ursulin
2017-10-04 17:35   ` Tvrtko Ursulin
2017-10-04 17:52     ` Rogozhkin, Dmitry V [this message]
2017-09-29 12:34 ` [PATCH 07/10] drm/i915: Gate engine stats collection with a static key Tvrtko Ursulin
2017-10-03 10:17   ` Chris Wilson
2017-10-04 17:38     ` Tvrtko Ursulin
2017-10-04 17:49       ` Chris Wilson
2017-10-05  7:07         ` Tvrtko Ursulin
2017-10-05  9:04           ` Chris Wilson
2017-10-05 13:05     ` [PATCH v10 " Tvrtko Ursulin
2017-09-29 12:34 ` [PATCH 08/10] drm/i915/pmu: Add interrupt count metric Tvrtko Ursulin
2017-09-29 12:53   ` Chris Wilson
2017-09-29 12:34 ` [PATCH 09/10] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
2017-09-29 12:35 ` [PATCH 10/10] drm/i915/pmu: Add RC6 residency metrics Tvrtko Ursulin
2017-09-29 12:54   ` Chris Wilson
2017-09-29 13:29 ` ✗ Fi.CI.BAT: failure for i915 PMU and engine busy stats (rev14) Patchwork
2017-10-06  8:23 ` ✗ Fi.CI.BAT: failure for i915 PMU and engine busy stats (rev16) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3EDB40B547243546A1017B798F8C1AA89A5DD6DB@ORSMSX114.amr.corp.intel.com \
    --to=dmitry.v.rogozhkin@intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=tursulin@ursulin.net \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.