* [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
@ 2020-03-16 13:36 Ankit Navik
2020-03-16 13:36 ` [Intel-gfx] [PATCH v7 1/3] drm/i915: Get active pending request for given context Ankit Navik
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Ankit Navik @ 2020-03-16 13:36 UTC (permalink / raw)
To: intel-gfx; +Cc: ankit.p.navik
drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
This patch set improves GPU power consumption on Linux kernel based OSes such
as Chromium OS and Ubuntu. The power savings are as follows.
Power savings on GLK-GT1 Bobba platform running on Chrome OS.
------------------------|----------------------|
 App / KPI              | % Power Benefit (mW) |
------------------------|----------------------|
 Hangout Call (20 min)  | 1.8%                 |
 Youtube 4K VPB         | 14.13%               |
 WebGL Aquarium         | 13.76%               |
 Unity3D                | 6.78%                |
------------------------|----------------------|
 Chrome PLT             | Battery life improves|
                        | by ~45 minutes       |
------------------------|----------------------|
Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
------------------------|----------------------|
 App / KPI              | % Power Benefit (mW) |
                        |----------|-----------|
                        | Android  | Ubuntu    |
------------------------|----------|-----------|
 3D Mark (Ice Storm)    | 2.30%    | N.A.      |
 TRex On screen         | 2.49%    | 2.97%     |
 Manhattan On screen    | 3.11%    | 4.90%     |
 Carchase On screen     | N.A.     | 5.06%     |
 AnTuTu 6.1.4           | 3.42%    | N.A.      |
 SynMark2               | N.A.     | 1.7%      |
------------------------|----------|-----------|
We have also observed that GPU core residency improves by 1.035%.
Technical Insights of the patch:
The current GPU configuration code for i915 does not allow us to change the
EU/Slice/Sub-slice configuration dynamically; it is set only once, when the
context is created.
While a particular graphics application is running, if we examine the command
requests from user space, we observe that the command density is not
consistent. This means there is scope to change the graphics configuration
dynamically even while a context is actively running. This patch series
proposes a solution that determines the pending load for all active contexts
at a given time and, based on that, dynamically configures the graphics
hardware for each context.
The feature can be enabled via sysfs. We examine the pending commands for a
context in the queue; essentially, we intercept them before they are executed
by the GPU and update the context with the required number of EUs. The
threshold is based on empirical data chosen to achieve the best performance
at the least power, and the EU counts are roughly categorized per platform.
We compare the number of pending commands against that threshold and set the
number of EUs accordingly in the updated context. If the GPU is able to keep
up with the CPU, there are typically no pending commands and the EU
configuration remains unchanged. If there are more pending commands, we
reprogram the context with a higher number of EUs.
Ankit Navik (3):
drm/i915: Get active pending request for given context
drm/i915: set optimum eu/slice/sub-slice configuration based on load
type
drm/i915: Predictive governor to control slice/subslice/eu
drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 ++
drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 37 +++++++++++
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +
drivers/gpu/drm/i915/gt/intel_context_sseu.c | 2 +
drivers/gpu/drm/i915/gt/intel_context_types.h | 2 +
drivers/gpu/drm/i915/gt/intel_lrc.c | 79 ++++++++++++++++++++++-
drivers/gpu/drm/i915/i915_drv.h | 5 ++
drivers/gpu/drm/i915/i915_sysfs.c | 32 +++++++++
drivers/gpu/drm/i915/intel_device_info.c | 55 +++++++++++++++-
9 files changed, 214 insertions(+), 4 deletions(-)
--
2.7.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [Intel-gfx] [PATCH v7 1/3] drm/i915: Get active pending request for given context
2020-03-16 13:36 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU Ankit Navik
@ 2020-03-16 13:36 ` Ankit Navik
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Ankit Navik @ 2020-03-16 13:36 UTC (permalink / raw)
To: intel-gfx; +Cc: ankit.p.navik
This patch gives us the count of active pending requests which are yet
to be submitted to the GPU.
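As a rough userspace model (using C11 atomics rather than the kernel's
atomic_t API), the counter lifecycle this patch adds looks like the sketch
below: the execbuffer path increments the per-context count when a request is
queued, and the execlists dequeue path decrements it with an underflow guard.
The struct and function names here are illustrative, not driver symbols:

```c
#include <stdatomic.h>

/* Minimal stand-in for the per-context request counter. */
struct ctx_model {
	atomic_int req_cnt;
};

/* execbuffer path: a request was queued for this context. */
static void queue_request(struct ctx_model *ctx)
{
	atomic_fetch_add(&ctx->req_cnt, 1);
}

/* dequeue path: a request was submitted to the GPU.  The guard
 * keeps the counter from underflowing if submits outpace queues. */
static void submit_request(struct ctx_model *ctx)
{
	if (atomic_load(&ctx->req_cnt) > 0)
		atomic_fetch_sub(&ctx->req_cnt, 1);
}
```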
V2:
* Change 64-bit to atomic for request count. (Tvrtko Ursulin)
V3:
* Remove mutex for request count.
* Rebase.
* Fixes hitting underflow for predictive request. (Tvrtko Ursulin)
V4:
* Rebase.
V5:
* Rebase.
V6:
* Rebase.
V7:
* Rebase.
* Add GEM_BUG_ON for req_cnt.
Cc: Vipin Anand <vipin.anand@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 1 +
drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 5 +++++
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
drivers/gpu/drm/i915/gt/intel_lrc.c | 9 +++++++++
4 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 026999b34abd..d0ff999429ff 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
}
trace_i915_context_create(ctx);
+ atomic_set(&ctx->req_cnt, 0);
return ctx;
}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 28760bd03265..a9ba13f8865e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -171,6 +171,11 @@ struct i915_gem_context {
*/
struct radix_tree_root handles_vma;
+ /** req_cnt: tracks the pending commands, based on which we decide to
+ * go for low/medium/high load configuration of the GPU.
+ */
+ atomic_t req_cnt;
+
/**
* @name: arbitrary name, used for user debug
*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3f4f28e9468..f90c968f95cd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2565,6 +2565,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (batch->private)
intel_engine_pool_mark_active(batch->private, eb.request);
+ atomic_inc(&eb.gem_context->req_cnt);
+
trace_i915_request_queue(eb.request, eb.batch_flags);
err = eb_submit(&eb, batch);
err_request:
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 112531b29f59..ccfebebb0071 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2143,6 +2143,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
}
if (__i915_request_submit(rq)) {
+ struct i915_gem_context *ctx;
+
if (!merge) {
*port = execlists_schedule_in(last, port - execlists->pending);
port++;
@@ -2158,6 +2160,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
submit = true;
last = rq;
+
+ ctx = rcu_dereference_protected(
+ rq->context->gem_context, true);
+
+ GEM_BUG_ON(atomic_read(&ctx->req_cnt) < 0);
+ if (atomic_read(&ctx->req_cnt) > 0)
+ atomic_dec(&ctx->req_cnt);
}
}
--
2.7.4
* [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
2020-03-16 13:36 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU Ankit Navik
2020-03-16 13:36 ` [Intel-gfx] [PATCH v7 1/3] drm/i915: Get active pending request for given context Ankit Navik
@ 2020-03-16 13:37 ` Ankit Navik
2020-03-17 3:42 ` kbuild test robot
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 3/3] drm/i915: Predictive governor to control slice/subslice/eu Ankit Navik
2020-03-16 21:53 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Dynamic EU configuration of Slice/Sub-slice/EU (rev7) Patchwork
3 siblings, 1 reply; 14+ messages in thread
From: Ankit Navik @ 2020-03-16 13:37 UTC (permalink / raw)
To: intel-gfx; +Cc: ankit.p.navik
This patch selects the optimum eu/slice/sub-slice configuration based on the
type of load (low, medium, high) given as input.
Based on our readings and experiments, we have a predefined set of optimum
configurations for each platform (CHT, KBL).
The optimum configuration is selected from the pre-defined optimum
configuration table (opt_config).
It also introduces a flag, update_render_config, which can be set by any
governor.
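The selection amounts to indexing a per-platform table with the classified
load type. Below is a minimal standalone sketch using the GLK GT1 values from
this patch; the struct and enum names mirror the patch, but this is a model,
not driver code:

```c
/* Load types as defined by the series; LOAD_TYPE_LAST sizes the tables. */
enum gem_load_type {
	LOAD_TYPE_LOW,
	LOAD_TYPE_MEDIUM,
	LOAD_TYPE_HIGH,
	LOAD_TYPE_LAST
};

struct i915_sseu_optimum_config {
	unsigned char slice;
	unsigned char subslice;
	unsigned char eu;
};

/* GLK GT1 slice/subslice/EU values, as posted in this patch. */
static const struct i915_sseu_optimum_config glk_gt1_config[LOAD_TYPE_LAST] = {
	{1, 2, 2},	/* Low */
	{1, 2, 3},	/* Medium */
	{1, 2, 6}	/* High */
};

/* The governor simply indexes the platform table with the load type. */
static const struct i915_sseu_optimum_config *
pick_config(enum gem_load_type load)
{
	return &glk_gt1_config[load];
}
```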
v2:
* Move static optimum_config to device init time.
* Rename function to appropriate name, fix data types and patch ordering.
* Rename prev_load_type to pending_load_type. (Tvrtko Ursulin)
v3:
* Add safe guard check in i915_gem_context_set_load_type.
* Rename struct from optimum_config to i915_sseu_optimum_config to
avoid namespace clashes.
* Reduces memcpy for space efficient.
* Rebase.
* Improved commit message. (Tvrtko Ursulin)
v4:
* Move optimum config table to file scope. (Tvrtko Ursulin)
v5:
* Adds optimal table of slice/sub-slice/EU for Gen 9 GT1.
* Rebase.
v6:
* Rebase.
* Fix warnings.
v7:
* Fix return conditions.
* Remove i915_gem_context_set_load_type and move logic to
__execlists_update_reg_state. (Tvrtko Ursulin)
Cc: Vipin Anand <vipin.anand@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +
drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 32 +++++++++++
drivers/gpu/drm/i915/gt/intel_context_sseu.c | 2 +
drivers/gpu/drm/i915/gt/intel_context_types.h | 2 +
drivers/gpu/drm/i915/gt/intel_lrc.c | 70 ++++++++++++++++++++++-
drivers/gpu/drm/i915/i915_drv.h | 5 ++
drivers/gpu/drm/i915/intel_device_info.c | 55 +++++++++++++++++-
7 files changed, 165 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d0ff999429ff..3aad45b0ba5a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -880,6 +880,9 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
trace_i915_context_create(ctx);
atomic_set(&ctx->req_cnt, 0);
+ ctx->slice_cnt = hweight8(RUNTIME_INFO(i915)->sseu.slice_mask);
+ ctx->subslice_cnt = hweight8(RUNTIME_INFO(i915)->sseu.subslice_mask[0]);
+ ctx->eu_cnt = RUNTIME_INFO(i915)->sseu.eu_per_subslice;
return ctx;
}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index a9ba13f8865e..1af1acd73794 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -46,6 +46,19 @@ struct i915_gem_engines_iter {
const struct i915_gem_engines *engines;
};
+enum gem_load_type {
+ LOAD_TYPE_LOW,
+ LOAD_TYPE_MEDIUM,
+ LOAD_TYPE_HIGH,
+ LOAD_TYPE_LAST
+};
+
+struct i915_sseu_optimum_config {
+ u8 slice;
+ u8 subslice;
+ u8 eu;
+};
+
/**
* struct i915_gem_context - client state
*
@@ -155,6 +168,25 @@ struct i915_gem_context {
*/
atomic_t active_count;
+ /** slice_cnt: used to set the # of slices to be enabled. */
+ u8 slice_cnt;
+
+ /** subslice_cnt: used to set the # of subslices to be enabled. */
+ u8 subslice_cnt;
+
+ /** eu_cnt: used to set the # of eu to be enabled. */
+ u8 eu_cnt;
+
+ /** load_type: The designated load_type (high/medium/low) for a given
+ * number of pending commands in the command queue.
+ */
+ enum gem_load_type load_type;
+
+ /** pending_load_type: The earlier load type that the GPU was configured
+ * for (high/medium/low).
+ */
+ enum gem_load_type pending_load_type;
+
/**
* @hang_timestamp: The last time(s) this context caused a GPU hang
*/
diff --git a/drivers/gpu/drm/i915/gt/intel_context_sseu.c b/drivers/gpu/drm/i915/gt/intel_context_sseu.c
index 57a30956c922..4f51bfb9690c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_context_sseu.c
@@ -84,6 +84,8 @@ intel_context_reconfigure_sseu(struct intel_context *ce,
if (ret)
return ret;
+ ce->user_sseu = true;
+
/* Nothing to do if unmodified. */
if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
goto unlock;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0f3b68b95c56..fd5811110026 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -93,6 +93,8 @@ struct intel_context {
const struct intel_context_ops *ops;
+ bool user_sseu;
+
/** sseu: Control eu/slice partitioning */
struct intel_sseu sseu;
};
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index ccfebebb0071..7c5f05886278 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -177,6 +177,14 @@
/* Typical size of the average request (2 pipecontrols and a MI_BB) */
#define EXECLISTS_REQUEST_SIZE 64 /* bytes */
+/*
+ * Anything above the threshold is considered HIGH load, anything below
+ * it is LOW load, and equal to it is MEDIUM load.
+ *
+ * The threshold value is three active pending requests.
+ */
+#define PENDING_THRESHOLD_MEDIUM 3
+
struct virtual_engine {
struct intel_engine_cs base;
struct intel_context context;
@@ -3002,6 +3010,36 @@ static void execlists_context_unpin(struct intel_context *ce)
i915_gem_object_unpin_map(ce->state->obj);
}
+static u32
+get_context_rpcs_config(struct i915_gem_context *ctx)
+{
+ u32 rpcs = 0;
+ struct drm_i915_private *dev_priv = ctx->i915;
+
+ if (INTEL_GEN(dev_priv) < 8)
+ return 0;
+
+ if (RUNTIME_INFO(dev_priv)->sseu.has_slice_pg) {
+ rpcs |= GEN8_RPCS_S_CNT_ENABLE;
+ rpcs |= ctx->slice_cnt << GEN8_RPCS_S_CNT_SHIFT;
+ rpcs |= GEN8_RPCS_ENABLE;
+ }
+
+ if (RUNTIME_INFO(dev_priv)->sseu.has_subslice_pg) {
+ rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
+ rpcs |= ctx->subslice_cnt << GEN8_RPCS_SS_CNT_SHIFT;
+ rpcs |= GEN8_RPCS_ENABLE;
+ }
+
+ if (RUNTIME_INFO(dev_priv)->sseu.has_eu_pg) {
+ rpcs |= ctx->eu_cnt << GEN8_RPCS_EU_MIN_SHIFT;
+ rpcs |= ctx->eu_cnt << GEN8_RPCS_EU_MAX_SHIFT;
+ rpcs |= GEN8_RPCS_ENABLE;
+ }
+
+ return rpcs;
+}
+
static void
__execlists_update_reg_state(const struct intel_context *ce,
const struct intel_engine_cs *engine,
@@ -3009,6 +3047,10 @@ __execlists_update_reg_state(const struct intel_context *ce,
{
struct intel_ring *ring = ce->ring;
u32 *regs = ce->lrc_reg_state;
+ const struct i915_sseu_optimum_config *cfg;
+ struct i915_gem_context *ctx;
+ enum gem_load_type load_type;
+ u32 req_pending;
GEM_BUG_ON(!intel_ring_offset_valid(ring, head));
GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
@@ -3018,10 +3060,31 @@ __execlists_update_reg_state(const struct intel_context *ce,
regs[CTX_RING_TAIL] = ring->tail;
regs[CTX_RING_CTL] = RING_CTL_SIZE(ring->size) | RING_VALID;
+ GEM_BUG_ON(ce->engine->class != RENDER_CLASS);
+ ctx = rcu_dereference_protected(ce->gem_context, true);
+
+ req_pending = atomic_read(&ctx->req_cnt);
+
+ if (req_pending > PENDING_THRESHOLD_MEDIUM)
+ load_type = LOAD_TYPE_HIGH;
+ else if (req_pending == PENDING_THRESHOLD_MEDIUM)
+ load_type = LOAD_TYPE_MEDIUM;
+ else
+ load_type = LOAD_TYPE_LOW;
+
+ cfg = &ctx->i915->opt_config[load_type];
+
/* RPCS */
if (engine->class == RENDER_CLASS) {
- regs[CTX_R_PWR_CLK_STATE] =
- intel_sseu_make_rpcs(engine->i915, &ce->sseu);
+
+ if (!ctx || !ctx->i915->predictive_load_enable
+ || ce->user_sseu) {
+ regs[CTX_R_PWR_CLK_STATE] =
+ intel_sseu_make_rpcs(engine->i915, &ce->sseu);
+ } else {
+ regs[CTX_R_PWR_CLK_STATE] =
+ get_context_rpcs_config(ce->gem_context);
+ }
i915_oa_init_reg_state(ce, engine);
}
@@ -3046,6 +3109,9 @@ __execlists_context_pin(struct intel_context *ce,
ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
__execlists_update_reg_state(ce, engine, ce->ring->tail);
+ if (ce->gem_context->load_type != ce->gem_context->pending_load_type)
+ ce->gem_context->load_type = ce->gem_context->pending_load_type;
+
return 0;
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1f5b9a584f71..304d95aa4974 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -926,6 +926,11 @@ struct drm_i915_private {
/* protects panel power sequencer state */
struct mutex pps_mutex;
+ /* optimal slice/subslice/EU configuration state */
+ struct i915_sseu_optimum_config *opt_config;
+
+ bool predictive_load_enable;
+
unsigned int fsb_freq, mem_freq, is_ddr3;
unsigned int skl_preferred_vco_freq;
unsigned int max_cdclk_freq;
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index d7fe12734db8..53d966a9097e 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -899,6 +899,34 @@ void intel_device_info_subplatform_init(struct drm_i915_private *i915)
RUNTIME_INFO(i915)->platform_mask[pi] |= mask;
}
+/* static table of slice/subslice/EU for Cherryview */
+static const struct i915_sseu_optimum_config chv_config[LOAD_TYPE_LAST] = {
+ {1, 1, 4}, /* Low */
+ {1, 1, 6}, /* Medium */
+ {1, 2, 6} /* High */
+};
+
+/* static table of slice/subslice/EU for GLK GT1 */
+static const struct i915_sseu_optimum_config glk_gt1_config[LOAD_TYPE_LAST] = {
+ {1, 2, 2}, /* Low */
+ {1, 2, 3}, /* Medium */
+ {1, 2, 6} /* High */
+};
+
+/* static table of slice/subslice/EU for KBL GT2 */
+static const struct i915_sseu_optimum_config kbl_gt2_config[LOAD_TYPE_LAST] = {
+ {1, 3, 2}, /* Low */
+ {1, 3, 4}, /* Medium */
+ {1, 3, 8} /* High */
+};
+
+/* static table of slice/subslice/EU for KBL GT3 */
+static const struct i915_sseu_optimum_config kbl_gt3_config[LOAD_TYPE_LAST] = {
+ {2, 3, 4}, /* Low */
+ {2, 3, 6}, /* Medium */
+ {2, 3, 8} /* High */
+};
+
/**
* intel_device_info_runtime_init - initialize runtime info
* @dev_priv: the i915 device
@@ -1027,12 +1055,35 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
/* Initialize slice/subslice/EU info */
if (IS_HASWELL(dev_priv))
hsw_sseu_info_init(dev_priv);
- else if (IS_CHERRYVIEW(dev_priv))
+ else if (IS_CHERRYVIEW(dev_priv)) {
cherryview_sseu_info_init(dev_priv);
+ BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);
+ dev_priv->opt_config = chv_config;
+ }
else if (IS_BROADWELL(dev_priv))
bdw_sseu_info_init(dev_priv);
- else if (IS_GEN(dev_priv, 9))
+ else if (IS_GEN(dev_priv, 9)) {
gen9_sseu_info_init(dev_priv);
+
+ switch (info->gt) {
+ default: /* fall through */
+ case 1:
+ BUILD_BUG_ON(ARRAY_SIZE(glk_gt1_config) !=
+ LOAD_TYPE_LAST);
+ dev_priv->opt_config = glk_gt1_config;
+ break;
+ case 2:
+ BUILD_BUG_ON(ARRAY_SIZE(kbl_gt2_config) !=
+ LOAD_TYPE_LAST);
+ dev_priv->opt_config = kbl_gt2_config;
+ break;
+ case 3:
+ BUILD_BUG_ON(ARRAY_SIZE(kbl_gt3_config) !=
+ LOAD_TYPE_LAST);
+ dev_priv->opt_config = kbl_gt3_config;
+ break;
+ }
+ }
else if (IS_GEN(dev_priv, 10))
gen10_sseu_info_init(dev_priv);
else if (IS_GEN(dev_priv, 11))
--
2.7.4
* [Intel-gfx] [PATCH v7 3/3] drm/i915: Predictive governor to control slice/subslice/eu
2020-03-16 13:36 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU Ankit Navik
2020-03-16 13:36 ` [Intel-gfx] [PATCH v7 1/3] drm/i915: Get active pending request for given context Ankit Navik
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2020-03-16 13:37 ` Ankit Navik
2020-03-16 21:53 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Dynamic EU configuration of Slice/Sub-slice/EU (rev7) Patchwork
3 siblings, 0 replies; 14+ messages in thread
From: Ankit Navik @ 2020-03-16 13:37 UTC (permalink / raw)
To: intel-gfx; +Cc: ankit.p.navik
Load classification is used by the predictive governor to control
eu/slice/subslice based on the workload.
A sysfs entry is provided to enable/disable the feature.
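The store path of the sysfs entry boils down to parsing the buffer and
accepting only 0 or 1 before touching the enable flag. A standalone userspace
sketch of that validation follows; parse_deu_enable() is a hypothetical
helper for illustration, not a driver symbol:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Model of the deu_enable store validation: parse the user buffer,
 * reject anything but 0 or 1, and only update the flag on success.
 */
static int parse_deu_enable(const char *buf, int *enable)
{
	char *end;
	long val = strtol(buf, &end, 0);

	if (end == buf || (val != 0 && val != 1))
		return -EINVAL;

	*enable = (int)val;
	return 0;
}
```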
V2:
* Fix code style.
* Move predictive_load_timer into a drm_i915_private
structure.
* Make generic function to set optimum config. (Tvrtko Ursulin)
V3:
* Rebase.
* Fix race condition for predictive load set.
* Add slack to start hrtimer for more power efficient. (Tvrtko Ursulin)
V4:
* Fix data type and initialization of mutex to protect predictive load
state.
* Move predictive timer init to i915_gem_init_early. (Tvrtko Ursulin)
* Move debugfs to kernel parameter.
V5:
* Rebase.
* Remove mutex for pred_timer
V6:
* Rebase.
* Fix warnings.
V7:
* Drop timer and move logic to __execlists_update_reg_state. (Tvrtko Ursulin)
* Remove kernel boot param and make it to sysfs entry. (Jani Nikula)
Cc: Vipin Anand <vipin.anand@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
drivers/gpu/drm/i915/i915_sysfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 45d32ef42787..5d76e4992c8d 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -433,12 +433,43 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
return ret ?: count;
}
+static ssize_t deu_enable_show(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+ struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
+
+ return snprintf(buf, PAGE_SIZE, "%u\n", i915->predictive_load_enable);
+}
+
+static ssize_t deu_enable_store(struct device *kdev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
+ ssize_t ret;
+ u32 val;
+
+ ret = kstrtou32(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ /* Reject anything but 0 and 1 */
+ if (val != 0 && val != 1)
+ return -EINVAL;
+
+ i915->predictive_load_enable = val;
+
+ return count;
+}
+
static DEVICE_ATTR_RO(gt_act_freq_mhz);
static DEVICE_ATTR_RO(gt_cur_freq_mhz);
static DEVICE_ATTR_RW(gt_boost_freq_mhz);
static DEVICE_ATTR_RW(gt_max_freq_mhz);
static DEVICE_ATTR_RW(gt_min_freq_mhz);
+static DEVICE_ATTR_RW(deu_enable);
+
static DEVICE_ATTR_RO(vlv_rpe_freq_mhz);
static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr, char *buf);
@@ -474,6 +505,7 @@ static const struct attribute * const gen6_attrs[] = {
&dev_attr_gt_RP0_freq_mhz.attr,
&dev_attr_gt_RP1_freq_mhz.attr,
&dev_attr_gt_RPn_freq_mhz.attr,
+ &dev_attr_deu_enable.attr,
NULL,
};
--
2.7.4
* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Dynamic EU configuration of Slice/Sub-slice/EU (rev7)
2020-03-16 13:36 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU Ankit Navik
` (2 preceding siblings ...)
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 3/3] drm/i915: Predictive governor to control slice/subslice/eu Ankit Navik
@ 2020-03-16 21:53 ` Patchwork
3 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2020-03-16 21:53 UTC (permalink / raw)
To: Ankit Navik; +Cc: intel-gfx
== Series Details ==
Series: Dynamic EU configuration of Slice/Sub-slice/EU (rev7)
URL : https://patchwork.freedesktop.org/series/69980/
State : failure
== Summary ==
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
DESCEND objtool
CHK include/generated/compile.h
CC [M] drivers/gpu/drm/i915/intel_device_info.o
drivers/gpu/drm/i915/intel_device_info.c: In function ‘intel_device_info_runtime_init’:
drivers/gpu/drm/i915/intel_device_info.c:1061:24: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
dev_priv->opt_config = chv_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1073:25: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
dev_priv->opt_config = glk_gt1_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1078:25: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
dev_priv->opt_config = kbl_gt2_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1083:25: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
dev_priv->opt_config = kbl_gt3_config;
^
cc1: all warnings being treated as errors
scripts/Makefile.build:267: recipe for target 'drivers/gpu/drm/i915/intel_device_info.o' failed
make[4]: *** [drivers/gpu/drm/i915/intel_device_info.o] Error 1
scripts/Makefile.build:505: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:505: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:505: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1683: recipe for target 'drivers' failed
make: *** [drivers] Error 2
* Re: [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2020-03-17 3:42 ` kbuild test robot
0 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-03-17 3:42 UTC (permalink / raw)
To: Ankit Navik; +Cc: intel-gfx, kbuild-all
Hi Ankit,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-tip/drm-tip next-20200316]
[cannot apply to v5.6-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Ankit-Navik/Dynamic-EU-configuration-of-Slice-Sub-slice-EU/20200317-070836
base: git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/gpu/drm/i915/intel_device_info.c: In function 'intel_device_info_runtime_init':
>> drivers/gpu/drm/i915/intel_device_info.c:1061:24: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = chv_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1073:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = glk_gt1_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1078:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = kbl_gt2_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1083:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = kbl_gt3_config;
^
vim +/const +1061 drivers/gpu/drm/i915/intel_device_info.c
929
930 /**
931 * intel_device_info_runtime_init - initialize runtime info
932 * @dev_priv: the i915 device
933 *
934 * Determine various intel_device_info fields at runtime.
935 *
936 * Use it when either:
937 * - it's judged too laborious to fill n static structures with the limit
938 * when a simple if statement does the job,
939 * - run-time checks (eg read fuse/strap registers) are needed.
940 *
941 * This function needs to be called:
942 * - after the MMIO has been setup as we are reading registers,
943 * - after the PCH has been detected,
944 * - before the first usage of the fields it can tweak.
945 */
946 void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
947 {
948 struct intel_device_info *info = mkwrite_device_info(dev_priv);
949 struct intel_runtime_info *runtime = RUNTIME_INFO(dev_priv);
950 enum pipe pipe;
951
952 if (INTEL_GEN(dev_priv) >= 10) {
953 for_each_pipe(dev_priv, pipe)
954 runtime->num_scalers[pipe] = 2;
955 } else if (IS_GEN(dev_priv, 9)) {
956 runtime->num_scalers[PIPE_A] = 2;
957 runtime->num_scalers[PIPE_B] = 2;
958 runtime->num_scalers[PIPE_C] = 1;
959 }
960
961 BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);
962
963 if (INTEL_GEN(dev_priv) >= 11)
964 for_each_pipe(dev_priv, pipe)
965 runtime->num_sprites[pipe] = 6;
966 else if (IS_GEN(dev_priv, 10) || IS_GEMINILAKE(dev_priv))
967 for_each_pipe(dev_priv, pipe)
968 runtime->num_sprites[pipe] = 3;
969 else if (IS_BROXTON(dev_priv)) {
970 /*
971 * Skylake and Broxton currently don't expose the topmost plane as its
972 * use is exclusive with the legacy cursor and we only want to expose
973 * one of those, not both. Until we can safely expose the topmost plane
974 * as a DRM_PLANE_TYPE_CURSOR with all the features exposed/supported,
975 * we don't expose the topmost plane at all to prevent ABI breakage
976 * down the line.
977 */
978
979 runtime->num_sprites[PIPE_A] = 2;
980 runtime->num_sprites[PIPE_B] = 2;
981 runtime->num_sprites[PIPE_C] = 1;
982 } else if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
983 for_each_pipe(dev_priv, pipe)
984 runtime->num_sprites[pipe] = 2;
985 } else if (INTEL_GEN(dev_priv) >= 5 || IS_G4X(dev_priv)) {
986 for_each_pipe(dev_priv, pipe)
987 runtime->num_sprites[pipe] = 1;
988 }
989
990 if (HAS_DISPLAY(dev_priv) && IS_GEN_RANGE(dev_priv, 7, 8) &&
991 HAS_PCH_SPLIT(dev_priv)) {
992 u32 fuse_strap = I915_READ(FUSE_STRAP);
993 u32 sfuse_strap = I915_READ(SFUSE_STRAP);
994
995 /*
996 * SFUSE_STRAP is supposed to have a bit signalling the display
997 * is fused off. Unfortunately it seems that, at least in
998 * certain cases, fused off display means that PCH display
999 * reads don't land anywhere. In that case, we read 0s.
1000 *
1001 * On CPT/PPT, we can detect this case as SFUSE_STRAP_FUSE_LOCK
1002 * should be set when taking over after the firmware.
1003 */
1004 if (fuse_strap & ILK_INTERNAL_DISPLAY_DISABLE ||
1005 sfuse_strap & SFUSE_STRAP_DISPLAY_DISABLED ||
1006 (HAS_PCH_CPT(dev_priv) &&
1007 !(sfuse_strap & SFUSE_STRAP_FUSE_LOCK))) {
1008 drm_info(&dev_priv->drm,
1009 "Display fused off, disabling\n");
1010 info->pipe_mask = 0;
1011 } else if (fuse_strap & IVB_PIPE_C_DISABLE) {
1012 drm_info(&dev_priv->drm, "PipeC fused off\n");
1013 info->pipe_mask &= ~BIT(PIPE_C);
1014 }
1015 } else if (HAS_DISPLAY(dev_priv) && INTEL_GEN(dev_priv) >= 9) {
1016 u32 dfsm = I915_READ(SKL_DFSM);
1017 u8 enabled_mask = info->pipe_mask;
1018
1019 if (dfsm & SKL_DFSM_PIPE_A_DISABLE)
1020 enabled_mask &= ~BIT(PIPE_A);
1021 if (dfsm & SKL_DFSM_PIPE_B_DISABLE)
1022 enabled_mask &= ~BIT(PIPE_B);
1023 if (dfsm & SKL_DFSM_PIPE_C_DISABLE)
1024 enabled_mask &= ~BIT(PIPE_C);
1025 if (INTEL_GEN(dev_priv) >= 12 &&
1026 (dfsm & TGL_DFSM_PIPE_D_DISABLE))
1027 enabled_mask &= ~BIT(PIPE_D);
1028
1029 /*
1030 * At least one pipe should be enabled and if there are
1031 * disabled pipes, they should be the last ones, with no holes
1032 * in the mask.
1033 */
1034 if (enabled_mask == 0 || !is_power_of_2(enabled_mask + 1))
1035 drm_err(&dev_priv->drm,
1036 "invalid pipe fuse configuration: enabled_mask=0x%x\n",
1037 enabled_mask);
1038 else
1039 info->pipe_mask = enabled_mask;
1040
1041 if (dfsm & SKL_DFSM_DISPLAY_HDCP_DISABLE)
1042 info->display.has_hdcp = 0;
1043
1044 if (dfsm & SKL_DFSM_DISPLAY_PM_DISABLE)
1045 info->display.has_fbc = 0;
1046
1047 if (INTEL_GEN(dev_priv) >= 11 && (dfsm & ICL_DFSM_DMC_DISABLE))
1048 info->display.has_csr = 0;
1049
1050 if (INTEL_GEN(dev_priv) >= 10 &&
1051 (dfsm & CNL_DFSM_DISPLAY_DSC_DISABLE))
1052 info->display.has_dsc = 0;
1053 }
1054
1055 /* Initialize slice/subslice/EU info */
1056 if (IS_HASWELL(dev_priv))
1057 hsw_sseu_info_init(dev_priv);
1058 else if (IS_CHERRYVIEW(dev_priv)) {
1059 cherryview_sseu_info_init(dev_priv);
1060 BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);
> 1061 dev_priv->opt_config = chv_config;
1062 }
1063 else if (IS_BROADWELL(dev_priv))
1064 bdw_sseu_info_init(dev_priv);
1065 else if (IS_GEN(dev_priv, 9)) {
1066 gen9_sseu_info_init(dev_priv);
1067
1068 switch (info->gt) {
1069 default: /* fall through */
1070 case 1:
1071 BUILD_BUG_ON(ARRAY_SIZE(glk_gt1_config) !=
1072 LOAD_TYPE_LAST);
1073 dev_priv->opt_config = glk_gt1_config;
1074 break;
1075 case 2:
1076 BUILD_BUG_ON(ARRAY_SIZE(kbl_gt2_config) !=
1077 LOAD_TYPE_LAST);
1078 dev_priv->opt_config = kbl_gt2_config;
1079 break;
1080 case 3:
1081 BUILD_BUG_ON(ARRAY_SIZE(kbl_gt3_config) !=
1082 LOAD_TYPE_LAST);
1083 dev_priv->opt_config = kbl_gt3_config;
1084 break;
1085 }
1086 }
1087 else if (IS_GEN(dev_priv, 10))
1088 gen10_sseu_info_init(dev_priv);
1089 else if (IS_GEN(dev_priv, 11))
1090 gen11_sseu_info_init(dev_priv);
1091 else if (INTEL_GEN(dev_priv) >= 12)
1092 gen12_sseu_info_init(dev_priv);
1093
1094 if (IS_GEN(dev_priv, 6) && intel_vtd_active()) {
1095 drm_info(&dev_priv->drm,
1096 "Disabling ppGTT for VT-d support\n");
1097 info->ppgtt_type = INTEL_PPGTT_NONE;
1098 }
1099
1100 runtime->rawclk_freq = intel_read_rawclk(dev_priv);
1101 drm_dbg(&dev_priv->drm, "rawclk rate: %d kHz\n", runtime->rawclk_freq);
1102
1103 /* Initialize command stream timestamp frequency */
1104 runtime->cs_timestamp_frequency_khz =
1105 read_timestamp_frequency(dev_priv);
1106 if (runtime->cs_timestamp_frequency_khz) {
1107 runtime->cs_timestamp_period_ns =
1108 div_u64(1e6, runtime->cs_timestamp_frequency_khz);
1109 drm_dbg(&dev_priv->drm,
1110 "CS timestamp wraparound in %lldms\n",
1111 div_u64(mul_u32_u32(runtime->cs_timestamp_period_ns,
1112 S32_MAX),
1113 USEC_PER_SEC));
1114 }
1115 }
1116
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71525 bytes --]
[-- Attachment #3: Type: text/plain, Size: 160 bytes --]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
@ 2020-03-17 3:42 ` kbuild test robot
0 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-03-17 3:42 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 9747 bytes --]
Hi Ankit,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-tip/drm-tip next-20200316]
[cannot apply to v5.6-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Ankit-Navik/Dynamic-EU-configuration-of-Slice-Sub-slice-EU/20200317-070836
base: git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/gpu/drm/i915/intel_device_info.c: In function 'intel_device_info_runtime_init':
>> drivers/gpu/drm/i915/intel_device_info.c:1061:24: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = chv_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1073:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = glk_gt1_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1078:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = kbl_gt2_config;
^
drivers/gpu/drm/i915/intel_device_info.c:1083:25: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
dev_priv->opt_config = kbl_gt3_config;
^
vim +/const +1061 drivers/gpu/drm/i915/intel_device_info.c
929
930 /**
931 * intel_device_info_runtime_init - initialize runtime info
932 * @dev_priv: the i915 device
933 *
934 * Determine various intel_device_info fields at runtime.
935 *
936 * Use it when either:
937 * - it's judged too laborious to fill n static structures with the limit
938 * when a simple if statement does the job,
939 * - run-time checks (eg read fuse/strap registers) are needed.
940 *
941 * This function needs to be called:
942 * - after the MMIO has been setup as we are reading registers,
943 * - after the PCH has been detected,
944 * - before the first usage of the fields it can tweak.
945 */
946 void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
947 {
948 struct intel_device_info *info = mkwrite_device_info(dev_priv);
949 struct intel_runtime_info *runtime = RUNTIME_INFO(dev_priv);
950 enum pipe pipe;
951
952 if (INTEL_GEN(dev_priv) >= 10) {
953 for_each_pipe(dev_priv, pipe)
954 runtime->num_scalers[pipe] = 2;
955 } else if (IS_GEN(dev_priv, 9)) {
956 runtime->num_scalers[PIPE_A] = 2;
957 runtime->num_scalers[PIPE_B] = 2;
958 runtime->num_scalers[PIPE_C] = 1;
959 }
960
961 BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);
962
963 if (INTEL_GEN(dev_priv) >= 11)
964 for_each_pipe(dev_priv, pipe)
965 runtime->num_sprites[pipe] = 6;
966 else if (IS_GEN(dev_priv, 10) || IS_GEMINILAKE(dev_priv))
967 for_each_pipe(dev_priv, pipe)
968 runtime->num_sprites[pipe] = 3;
969 else if (IS_BROXTON(dev_priv)) {
970 /*
971 * Skylake and Broxton currently don't expose the topmost plane as its
972 * use is exclusive with the legacy cursor and we only want to expose
973 * one of those, not both. Until we can safely expose the topmost plane
974 * as a DRM_PLANE_TYPE_CURSOR with all the features exposed/supported,
975 * we don't expose the topmost plane at all to prevent ABI breakage
976 * down the line.
977 */
978
979 runtime->num_sprites[PIPE_A] = 2;
980 runtime->num_sprites[PIPE_B] = 2;
981 runtime->num_sprites[PIPE_C] = 1;
982 } else if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
983 for_each_pipe(dev_priv, pipe)
984 runtime->num_sprites[pipe] = 2;
985 } else if (INTEL_GEN(dev_priv) >= 5 || IS_G4X(dev_priv)) {
986 for_each_pipe(dev_priv, pipe)
987 runtime->num_sprites[pipe] = 1;
988 }
989
990 if (HAS_DISPLAY(dev_priv) && IS_GEN_RANGE(dev_priv, 7, 8) &&
991 HAS_PCH_SPLIT(dev_priv)) {
992 u32 fuse_strap = I915_READ(FUSE_STRAP);
993 u32 sfuse_strap = I915_READ(SFUSE_STRAP);
994
995 /*
996 * SFUSE_STRAP is supposed to have a bit signalling the display
997 * is fused off. Unfortunately it seems that, at least in
998 * certain cases, fused off display means that PCH display
999 * reads don't land anywhere. In that case, we read 0s.
1000 *
1001 * On CPT/PPT, we can detect this case as SFUSE_STRAP_FUSE_LOCK
1002 * should be set when taking over after the firmware.
1003 */
1004 if (fuse_strap & ILK_INTERNAL_DISPLAY_DISABLE ||
1005 sfuse_strap & SFUSE_STRAP_DISPLAY_DISABLED ||
1006 (HAS_PCH_CPT(dev_priv) &&
1007 !(sfuse_strap & SFUSE_STRAP_FUSE_LOCK))) {
1008 drm_info(&dev_priv->drm,
1009 "Display fused off, disabling\n");
1010 info->pipe_mask = 0;
1011 } else if (fuse_strap & IVB_PIPE_C_DISABLE) {
1012 drm_info(&dev_priv->drm, "PipeC fused off\n");
1013 info->pipe_mask &= ~BIT(PIPE_C);
1014 }
1015 } else if (HAS_DISPLAY(dev_priv) && INTEL_GEN(dev_priv) >= 9) {
1016 u32 dfsm = I915_READ(SKL_DFSM);
1017 u8 enabled_mask = info->pipe_mask;
1018
1019 if (dfsm & SKL_DFSM_PIPE_A_DISABLE)
1020 enabled_mask &= ~BIT(PIPE_A);
1021 if (dfsm & SKL_DFSM_PIPE_B_DISABLE)
1022 enabled_mask &= ~BIT(PIPE_B);
1023 if (dfsm & SKL_DFSM_PIPE_C_DISABLE)
1024 enabled_mask &= ~BIT(PIPE_C);
1025 if (INTEL_GEN(dev_priv) >= 12 &&
1026 (dfsm & TGL_DFSM_PIPE_D_DISABLE))
1027 enabled_mask &= ~BIT(PIPE_D);
1028
1029 /*
1030 * At least one pipe should be enabled and if there are
1031 * disabled pipes, they should be the last ones, with no holes
1032 * in the mask.
1033 */
1034 if (enabled_mask == 0 || !is_power_of_2(enabled_mask + 1))
1035 drm_err(&dev_priv->drm,
1036 "invalid pipe fuse configuration: enabled_mask=0x%x\n",
1037 enabled_mask);
1038 else
1039 info->pipe_mask = enabled_mask;
1040
1041 if (dfsm & SKL_DFSM_DISPLAY_HDCP_DISABLE)
1042 info->display.has_hdcp = 0;
1043
1044 if (dfsm & SKL_DFSM_DISPLAY_PM_DISABLE)
1045 info->display.has_fbc = 0;
1046
1047 if (INTEL_GEN(dev_priv) >= 11 && (dfsm & ICL_DFSM_DMC_DISABLE))
1048 info->display.has_csr = 0;
1049
1050 if (INTEL_GEN(dev_priv) >= 10 &&
1051 (dfsm & CNL_DFSM_DISPLAY_DSC_DISABLE))
1052 info->display.has_dsc = 0;
1053 }
1054
1055 /* Initialize slice/subslice/EU info */
1056 if (IS_HASWELL(dev_priv))
1057 hsw_sseu_info_init(dev_priv);
1058 else if (IS_CHERRYVIEW(dev_priv)) {
1059 cherryview_sseu_info_init(dev_priv);
1060 BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);
> 1061 dev_priv->opt_config = chv_config;
1062 }
1063 else if (IS_BROADWELL(dev_priv))
1064 bdw_sseu_info_init(dev_priv);
1065 else if (IS_GEN(dev_priv, 9)) {
1066 gen9_sseu_info_init(dev_priv);
1067
1068 switch (info->gt) {
1069 default: /* fall through */
1070 case 1:
1071 BUILD_BUG_ON(ARRAY_SIZE(glk_gt1_config) !=
1072 LOAD_TYPE_LAST);
1073 dev_priv->opt_config = glk_gt1_config;
1074 break;
1075 case 2:
1076 BUILD_BUG_ON(ARRAY_SIZE(kbl_gt2_config) !=
1077 LOAD_TYPE_LAST);
1078 dev_priv->opt_config = kbl_gt2_config;
1079 break;
1080 case 3:
1081 BUILD_BUG_ON(ARRAY_SIZE(kbl_gt3_config) !=
1082 LOAD_TYPE_LAST);
1083 dev_priv->opt_config = kbl_gt3_config;
1084 break;
1085 }
1086 }
1087 else if (IS_GEN(dev_priv, 10))
1088 gen10_sseu_info_init(dev_priv);
1089 else if (IS_GEN(dev_priv, 11))
1090 gen11_sseu_info_init(dev_priv);
1091 else if (INTEL_GEN(dev_priv) >= 12)
1092 gen12_sseu_info_init(dev_priv);
1093
1094 if (IS_GEN(dev_priv, 6) && intel_vtd_active()) {
1095 drm_info(&dev_priv->drm,
1096 "Disabling ppGTT for VT-d support\n");
1097 info->ppgtt_type = INTEL_PPGTT_NONE;
1098 }
1099
1100 runtime->rawclk_freq = intel_read_rawclk(dev_priv);
1101 drm_dbg(&dev_priv->drm, "rawclk rate: %d kHz\n", runtime->rawclk_freq);
1102
1103 /* Initialize command stream timestamp frequency */
1104 runtime->cs_timestamp_frequency_khz =
1105 read_timestamp_frequency(dev_priv);
1106 if (runtime->cs_timestamp_frequency_khz) {
1107 runtime->cs_timestamp_period_ns =
1108 div_u64(1e6, runtime->cs_timestamp_frequency_khz);
1109 drm_dbg(&dev_priv->drm,
1110 "CS timestamp wraparound in %lldms\n",
1111 div_u64(mul_u32_u32(runtime->cs_timestamp_period_ns,
1112 S32_MAX),
1113 USEC_PER_SEC));
1114 }
1115 }
1116
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-15 18:08 ` Francisco Jerez
@ 2020-03-15 22:30 ` Lionel Landwerlin
0 siblings, 0 replies; 14+ messages in thread
From: Lionel Landwerlin @ 2020-03-15 22:30 UTC (permalink / raw)
To: Francisco Jerez, srinivasan.s, intel-gfx, chris, tvrtko.ursulin
On 15/03/2020 20:08, Francisco Jerez wrote:
> Lionel Landwerlin <lionel.g.landwerlin@intel.com> writes:
>
>> On 15/03/2020 02:12, Francisco Jerez wrote:
>>> srinivasan.s@intel.com writes:
>>>
>>>> From: Srinivasan S <srinivasan.s@intel.com>
>>>>
>>>> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>>>>
>>>> This patch set improves GPU power consumption on Linux kernel-based OSes such as
>>>> Chromium OS, Ubuntu, etc. Following are the power savings.
>>>>
>>>> Power savings on GLK-GT1 Bobba platform running on Chrome OS.
>>>> -----------------------------------------------|
>>>> App /KPI | % Power Benefit (mW) |
>>>> ------------------------|----------------------|
>>>> Hangout Call- 20 minute | 1.8% |
>>>> Youtube 4K VPB | 14.13% |
>>>> WebGL Aquarium | 13.76% |
>>>> Unity3D | 6.78% |
>>>> | |
>>>> ------------------------|----------------------|
>>>> Chrome PLT | BatteryLife Improves |
>>>> | by ~45 minute |
>>>> -----------------------------------------------|
>>>>
>>>> Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
>>>> -----------------------------------------------|
>>>> App /KPI | % Power Benefit (mW) |
>>>> |----------------------|
>>>> | Android | Ubuntu |
>>>> ------------------------|----------|-----------|
>>>> 3D Mark (Ice storm) | 2.30% | N.A. |
>>>> TRex On screen | 2.49% | 2.97% |
>>>> Manhattan On screen | 3.11% | 4.90% |
>>>> Carchase On Screen | N.A. | 5.06% |
>>>> AnTuTu 6.1.4 | 3.42% | N.A. |
>>>> SynMark2 | N.A. | 1.7% |
>>>> -----------------------------------------------|
>>>>
>>> Did you get any performance (e.g. FPS) measurements from those
>>> test-cases? There is quite some potential for this feature to constrain
>>> the GPU throughput inadvertently, which could lead to an apparent
>>> reduction in power usage not accompanied by an improvement in energy
>>> efficiency -- In fact AFAIUI there is some potential for this feature to
>>> *decrease* the energy efficiency of the system if the GPU would have
>>> been able to keep all EUs busy at a lower frequency, but the parallelism
>>> constraint forces it to run at a higher frequency above RPe in order to
>>> achieve the same throughput, because due to the convexity of the power
>>> curve of the EU we have:
>>>
>>> P(k * f) > k * P(f)
>>>
>>> Where 'k' is the ratio between the EU parallelism without and with SSEU
>>> control, and f > RPe is the original GPU frequency without SSEU control.
>>>
>>> In scenarios like that we *might* seem to be using less power with SSEU
>>> control if the workload is running longer, but it would end up using
>>> more energy overall by the time it completes, so it would be good to
>>> have some performance-per-watt numbers to make sure that's not
>>> happening.
>>>
>>>> We have also observed GPU core residencies improve by 1.035%.
>>>>
>>>> Technical Insights of the patch:
>>>> Current GPU configuration code for i915 does not allow us to change
>>>> EU/Slice/Sub-slice configuration dynamically. It's done only once, when the
>>>> context is created.
>>>>
>>>> While particular graphics application is running, if we examine the command
>>>> requests from user space, we observe that command density is not consistent.
>>>> It means there is scope to change the graphics configuration dynamically even
>>>> while context is running actively. This patch series proposes the solution to
>>>> find the active pending load for all active contexts at a given time and based on
>>>> that, dynamically perform graphics configuration for each context.
>>>>
>>>> We use a hr (high resolution) timer with i915 driver in kernel to get a
>>>> callback every few milliseconds (this timer value can be configured through
>>>> debugfs, default is '0' indicating timer is in disabled state i.e. original
>>>> system without any intervention). In the timer callback, we examine pending
>>>> commands for a context in the queue, essentially, we intercept them before
>>>> they are executed by GPU and we update context with required number of EUs.
>>>>
>>> Given that the EU configuration update is synchronous with command
>>> submission, do you really need a timer? It sounds like it would be less
>>> CPU overhead to adjust the EU count on demand whenever the counter
>>> reaches or drops below the threshold instead of polling some CPU-side
>>> data structure.
>>>
>>>> Two questions: how did we arrive at the right timer value, and what's the right
>>>> number of EUs? For the former, empirical data to achieve the best performance
>>>> at the least power was considered. For the latter, we roughly categorized the number
>>>> of EUs logically based on platform. Now we compare number of pending commands
>>>> with a particular threshold and then set number of EUs accordingly with update
>>>> context. That threshold is also based on experiments & findings. If GPU is able
>>>> to catch up with CPU, typically there are no pending commands, the EU config
>>>> would remain unchanged there. In case there are more pending commands we
>>>> reprogram context with higher number of EUs. Please note, here we are changing
>>>> EUs even while context is running by examining pending commands every 'x'
>>>> milliseconds.
>>>>
>>> I have doubts that the number of requests pending execution is a
>>> particularly reliable indicator of the optimal number of EUs the
>>> workload needs enabled, for starters because the execlists submission
>>> code seems to be able to merge multiple requests into the same port, so
>>> there might seem to be zero pending commands even if the GPU has a
>>> backlog of several seconds or minutes worth of work.
>>>
>>> But even if you were using an accurate measure of the GPU load, would
>>> that really be a good indicator of whether the GPU would run more
>>> efficiently with more or less EUs enabled? I can think of many
>>> scenarios where a short-lived GPU request would consume less energy and
>>> complete faster while running with all EUs enabled (e.g. if it actually
>>> has enough parallelism to take advantage of all EUs in the system).
>>> Conversely I can think of some scenarios where a long-running GPU
>>> request would benefit from SSEU control (e.g. a poorly parallelizable
>>> but heavy 3D geometry pipeline or GPGPU workload). The former seems
>>> more worrying than the latter since it could lead to performance or
>>> energy efficiency regressions.
>>>
>>> IOW it seems to me that the optimal number of EUs enabled is more of a
>>> function of the internal parallelism constraints of each request rather
>>> than of the overall GPU load. You should be able to get some
>>> understanding of that by e.g. calculating the number of threads loaded
>>> on the average based on the EU SPM counters, but unfortunately the ones
>>> you'd need are only available on TGL+ IIRC. On earlier platforms you
>>> should be able to achieve the same thing by sampling some FLEXEU
>>> counters, but you'd likely have to mess with the mux configuration which
>>> would interfere with OA sampling -- However it sounds like this feature
>>> may have to be disabled anytime OA is active anyway so that may not be a
>>> problem after all?
>>
>> FLEXEU has to be configured on all contexts but does not need the mux
>> configuration.
>>
> They have a sort of mux controlled through the EU_PERF_CNT_CTL*
> registers that have to be set up correctly for each counter to count the
> right event, which would certainly interfere with userspace using OA to
> gather EU metrics.
Maybe we're not talking about the same mux then :)
>
>> I think this feature would have to be shut off every time you end up using
>> OA from userspace though.
>>
> Yeah, that's probably necessary one way or another.
>
>> -Lionel
>>
>>
>>> Regards,
>>> Francisco.
>>>
>>>> Srinivasan S (3):
>>>> drm/i915: Get active pending request for given context
>>>> drm/i915: set optimum eu/slice/sub-slice configuration based on load
>>>> type
>>>> drm/i915: Predictive governor to control slice/subslice/eu
>>>>
>>>> drivers/gpu/drm/i915/Makefile | 1 +
>>>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
>>>> drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
>>>> drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
>>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
>>>> drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
>>>> drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
>>>> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
>>>> drivers/gpu/drm/i915/i915_drv.h | 6 ++
>>>> drivers/gpu/drm/i915/i915_gem.c | 4 +
>>>> drivers/gpu/drm/i915/i915_params.c | 4 +
>>>> drivers/gpu/drm/i915/i915_params.h | 1 +
>>>> drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
>>>> 13 files changed, 325 insertions(+), 5 deletions(-)
>>>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
>>>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> Intel-gfx mailing list
>>>> Intel-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-15 16:56 ` Lionel Landwerlin
@ 2020-03-15 18:08 ` Francisco Jerez
2020-03-15 22:30 ` Lionel Landwerlin
0 siblings, 1 reply; 14+ messages in thread
From: Francisco Jerez @ 2020-03-15 18:08 UTC (permalink / raw)
To: Lionel Landwerlin, srinivasan.s, intel-gfx, chris, tvrtko.ursulin
[-- Attachment #1.1.1: Type: text/plain, Size: 9642 bytes --]
Lionel Landwerlin <lionel.g.landwerlin@intel.com> writes:
> On 15/03/2020 02:12, Francisco Jerez wrote:
>> srinivasan.s@intel.com writes:
>>
>>> From: Srinivasan S <srinivasan.s@intel.com>
>>>
>>> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>>>
>>> This patch set improves GPU power consumption on Linux kernel-based OSes such as
>>> Chromium OS, Ubuntu, etc. Following are the power savings.
>>>
>>> Power savings on GLK-GT1 Bobba platform running on Chrome OS.
>>> -----------------------------------------------|
>>> App /KPI | % Power Benefit (mW) |
>>> ------------------------|----------------------|
>>> Hangout Call- 20 minute | 1.8% |
>>> Youtube 4K VPB | 14.13% |
>>> WebGL Aquarium | 13.76% |
>>> Unity3D | 6.78% |
>>> | |
>>> ------------------------|----------------------|
>>> Chrome PLT | BatteryLife Improves |
>>> | by ~45 minute |
>>> -----------------------------------------------|
>>>
>>> Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
>>> -----------------------------------------------|
>>> App /KPI | % Power Benefit (mW) |
>>> |----------------------|
>>> | Android | Ubuntu |
>>> ------------------------|----------|-----------|
>>> 3D Mark (Ice storm) | 2.30% | N.A. |
>>> TRex On screen | 2.49% | 2.97% |
>>> Manhattan On screen | 3.11% | 4.90% |
>>> Carchase On Screen | N.A. | 5.06% |
>>> AnTuTu 6.1.4 | 3.42% | N.A. |
>>> SynMark2 | N.A. | 1.7% |
>>> -----------------------------------------------|
>>>
>> Did you get any performance (e.g. FPS) measurements from those
>> test-cases? There is quite some potential for this feature to constrain
>> the GPU throughput inadvertently, which could lead to an apparent
>> reduction in power usage not accompanied by an improvement in energy
>> efficiency -- In fact AFAIUI there is some potential for this feature to
>> *decrease* the energy efficiency of the system if the GPU would have
>> been able to keep all EUs busy at a lower frequency, but the parallelism
>> constraint forces it to run at a higher frequency above RPe in order to
>> achieve the same throughput, because due to the convexity of the power
>> curve of the EU we have:
>>
>> P(k * f) > k * P(f)
>>
>> Where 'k' is the ratio between the EU parallelism without and with SSEU
>> control, and f > RPe is the original GPU frequency without SSEU control.
>>
>> In scenarios like that we *might* seem to be using less power with SSEU
>> control if the workload is running longer, but it would end up using
>> more energy overall by the time it completes, so it would be good to
>> have some performance-per-watt numbers to make sure that's not
>> happening.
>>
>>> We have also observed GPU core residencies improve by 1.035%.
>>>
>>> Technical Insights of the patch:
>>> Current GPU configuration code for i915 does not allow us to change
>>> EU/Slice/Sub-slice configuration dynamically. It's done only once, when the
>>> context is created.
>>>
>>> While particular graphics application is running, if we examine the command
>>> requests from user space, we observe that command density is not consistent.
>>> It means there is scope to change the graphics configuration dynamically even
>>> while context is running actively. This patch series proposes the solution to
>>> find the active pending load for all active contexts at a given time and based on
>>> that, dynamically perform graphics configuration for each context.
>>>
>>> We use a hr (high resolution) timer with i915 driver in kernel to get a
>>> callback every few milliseconds (this timer value can be configured through
>>> debugfs, default is '0' indicating timer is in disabled state i.e. original
>>> system without any intervention). In the timer callback, we examine pending
>>> commands for a context in the queue, essentially, we intercept them before
>>> they are executed by GPU and we update context with required number of EUs.
>>>
>> Given that the EU configuration update is synchronous with command
>> submission, do you really need a timer? It sounds like it would be less
>> CPU overhead to adjust the EU count on demand whenever the counter
>> reaches or drops below the threshold instead of polling some CPU-side
>> data structure.
>>
>>> Two questions: how did we arrive at the right timer value, and what's the right
>>> number of EUs? For the former, empirical data to achieve the best performance
>>> at the least power was considered. For the latter, we roughly categorized the number
>>> of EUs logically based on platform. Now we compare number of pending commands
>>> with a particular threshold and then set number of EUs accordingly with update
>>> context. That threshold is also based on experiments & findings. If GPU is able
>>> to catch up with CPU, typically there are no pending commands, the EU config
>>> would remain unchanged there. In case there are more pending commands we
>>> reprogram context with higher number of EUs. Please note, here we are changing
>>> EUs even while context is running by examining pending commands every 'x'
>>> milliseconds.
>>>
>> I have doubts that the number of requests pending execution is a
>> particularly reliable indicator of the optimal number of EUs the
>> workload needs enabled, for starters because the execlists submission
>> code seems to be able to merge multiple requests into the same port, so
>> there might seem to be zero pending commands even if the GPU has a
>> backlog of several seconds or minutes worth of work.
>>
>> But even if you were using an accurate measure of the GPU load, would
>> that really be a good indicator of whether the GPU would run more
>> efficiently with more or less EUs enabled? I can think of many
>> scenarios where a short-lived GPU request would consume less energy and
>> complete faster while running with all EUs enabled (e.g. if it actually
>> has enough parallelism to take advantage of all EUs in the system).
>> Conversely I can think of some scenarios where a long-running GPU
>> request would benefit from SSEU control (e.g. a poorly parallelizable
>> but heavy 3D geometry pipeline or GPGPU workload). The former seems
>> more worrying than the latter since it could lead to performance or
>> energy efficiency regressions.
>>
>> IOW it seems to me that the optimal number of EUs enabled is more of a
>> function of the internal parallelism constraints of each request rather
>> than of the overall GPU load. You should be able to get some
>> understanding of that by e.g. calculating the number of threads loaded
>> on the average based on the EU SPM counters, but unfortunately the ones
>> you'd need are only available on TGL+ IIRC. On earlier platforms you
>> should be able to achieve the same thing by sampling some FLEXEU
>> counters, but you'd likely have to mess with the mux configuration which
>> would interfere with OA sampling -- However it sounds like this feature
>> may have to be disabled anytime OA is active anyway so that may not be a
>> problem after all?
>
>
> FLEXEU has to be configured on all contexts but does not need the mux
> configuration.
>
They have a sort of mux controlled through the EU_PERF_CNT_CTL*
registers that have to be set up correctly for each counter to count the
right event, which would certainly interfere with userspace using OA to
gather EU metrics.
> I think this feature would have to be shut off every time you end up using
> OA from userspace though.
>
Yeah, that's probably necessary one way or another.
>
> -Lionel
>
>
>>
>> Regards,
>> Francisco.
>>
>>> Srinivasan S (3):
>>> drm/i915: Get active pending request for given context
>>> drm/i915: set optimum eu/slice/sub-slice configuration based on load
>>> type
>>> drm/i915: Predictive governor to control slice/subslice/eu
>>>
>>> drivers/gpu/drm/i915/Makefile | 1 +
>>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
>>> drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
>>> drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
>>> drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
>>> drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
>>> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
>>> drivers/gpu/drm/i915/i915_drv.h | 6 ++
>>> drivers/gpu/drm/i915/i915_gem.c | 4 +
>>> drivers/gpu/drm/i915/i915_params.c | 4 +
>>> drivers/gpu/drm/i915/i915_params.h | 1 +
>>> drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
>>> 13 files changed, 325 insertions(+), 5 deletions(-)
>>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
>>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>>>
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-15 0:12 ` Francisco Jerez
@ 2020-03-15 16:56 ` Lionel Landwerlin
2020-03-15 18:08 ` Francisco Jerez
0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2020-03-15 16:56 UTC (permalink / raw)
To: Francisco Jerez, srinivasan.s, intel-gfx, chris, tvrtko.ursulin
[-- Attachment #1.1: Type: text/plain, Size: 8924 bytes --]
On 15/03/2020 02:12, Francisco Jerez wrote:
> srinivasan.s@intel.com writes:
>
>> From: Srinivasan S <srinivasan.s@intel.com>
>>
>> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>>
>> This patch set improves GPU power consumption on Linux kernel based OS such as
>> Chromium OS, Ubuntu, etc. Following are the power savings.
>>
>> Power savings on GLK-GT1 Bobba platform running on Chrome OS.
>> -----------------------------------------------|
>> App /KPI | % Power Benefit (mW) |
>> ------------------------|----------------------|
>> Hangout Call- 20 minute | 1.8% |
>> Youtube 4K VPB | 14.13% |
>> WebGL Aquarium | 13.76% |
>> Unity3D | 6.78% |
>> | |
>> ------------------------|----------------------|
>> Chrome PLT | BatteryLife Improves |
>> | by ~45 minute |
>> -----------------------------------------------|
>>
>> Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
>> -----------------------------------------------|
>> App /KPI | % Power Benefit (mW) |
>> |----------------------|
>> | Android | Ubuntu |
>> ------------------------|----------|-----------|
>> 3D Mark (Ice storm) | 2.30% | N.A. |
>> TRex On screen | 2.49% | 2.97% |
>> Manhattan On screen | 3.11% | 4.90% |
>> Carchase On Screen | N.A. | 5.06% |
>> AnTuTu 6.1.4 | 3.42% | N.A. |
>> SynMark2 | N.A. | 1.7% |
>> -----------------------------------------------|
>>
> Did you get any performance (e.g. FPS) measurements from those
> test-cases? There is quite some potential for this feature to constrain
> the GPU throughput inadvertently, which could lead to an apparent
> reduction in power usage not accompanied by an improvement in energy
> efficiency -- In fact AFAIUI there is some potential for this feature to
> *decrease* the energy efficiency of the system if the GPU would have
> been able to keep all EUs busy at a lower frequency, but the parallelism
> constraint forces it to run at a higher frequency above RPe in order to
> achieve the same throughput, because due to the convexity of the power
> curve of the EU we have:
>
> P(k * f) > k * P(f)
>
> Where 'k' is the ratio between the EU parallelism without and with SSEU
> control, and f > RPe is the original GPU frequency without SSEU control.
>
> In scenarios like that we *might* seem to be using less power with SSEU
> control if the workload is running longer, but it would end up using
> more energy overall by the time it completes, so it would be good to
> have some performance-per-watt numbers to make sure that's not
> happening.
>
>> We have also observed that GPU core residency improves by 1.035%.
>>
>> Technical Insights of the patch:
>> Current GPU configuration code for i915 does not allow us to change
>> EU/Slice/Sub-slice configuration dynamically. It's done only once while context
>> is created.
>>
>> While particular graphics application is running, if we examine the command
>> requests from user space, we observe that command density is not consistent.
>> It means there is scope to change the graphics configuration dynamically even
>> while context is running actively. This patch series proposes the solution to
>> find the active pending load for all active contexts at a given time and based on
>> that, dynamically perform graphics configuration for each context.
>>
>> We use a hr (high resolution) timer with i915 driver in kernel to get a
>> callback every few milliseconds (this timer value can be configured through
>> debugfs, default is '0' indicating timer is in disabled state i.e. original
>> system without any intervention). In the timer callback, we examine pending
>> commands for a context in the queue, essentially, we intercept them before
>> they are executed by GPU and we update context with required number of EUs.
>>
> Given that the EU configuration update is synchronous with command
> submission, do you really need a timer? It sounds like it would be less
> CPU overhead to adjust the EU count on demand whenever the counter
> reaches or drops below the threshold instead of polling some CPU-side
> data structure.
>
>> Two questions, how did we arrive at right timer value? and what's the right
>> number of EUs? For the former, empirical data to achieve best performance
>> in least power was considered. For the latter, we roughly categorized number
>> of EUs logically based on platform. Now we compare number of pending commands
>> with a particular threshold and then set number of EUs accordingly with update
>> context. That threshold is also based on experiments & findings. If GPU is able
>> to catch up with CPU, typically there are no pending commands, the EU config
>> would remain unchanged there. In case there are more pending commands we
>> reprogram context with higher number of EUs. Please note, here we are changing
>> EUs even while context is running by examining pending commands every 'x'
>> milliseconds.
>>
> I have doubts that the number of requests pending execution is a
> particularly reliable indicator of the optimal number of EUs the
> workload needs enabled, for starters because the execlists submission
> code seems to be able to merge multiple requests into the same port, so
> there might seem to be zero pending commands even if the GPU has a
> backlog of several seconds or minutes worth of work.
>
> But even if you were using an accurate measure of the GPU load, would
> that really be a good indicator of whether the GPU would run more
> efficiently with more or less EUs enabled? I can think of many
> scenarios where a short-lived GPU request would consume less energy and
> complete faster while running with all EUs enabled (e.g. if it actually
> has enough parallelism to take advantage of all EUs in the system).
> Conversely I can think of some scenarios where a long-running GPU
> request would benefit from SSEU control (e.g. a poorly parallelizable
> but heavy 3D geometry pipeline or GPGPU workload). The former seems
> more worrying than the latter since it could lead to performance or
> energy efficiency regressions.
>
> IOW it seems to me that the optimal number of EUs enabled is more of a
> function of the internal parallelism constraints of each request rather
> than of the overall GPU load. You should be able to get some
> understanding of that by e.g. calculating the number of threads loaded
> on the average based on the EU SPM counters, but unfortunately the ones
> you'd need are only available on TGL+ IIRC. On earlier platforms you
> should be able to achieve the same thing by sampling some FLEXEU
> counters, but you'd likely have to mess with the mux configuration which
> would interfere with OA sampling -- However it sounds like this feature
> may have to be disabled anytime OA is active anyway so that may not be a
> problem after all?
FLEXEU has to be configured on all contexts but does not need the mux
configuration.
I think this feature would have to be shut off every time you end up using
OA from userspace though.
-Lionel
>
> Regards,
> Francisco.
>
>> Srinivasan S (3):
>> drm/i915: Get active pending request for given context
>> drm/i915: set optimum eu/slice/sub-slice configuration based on load
>> type
>> drm/i915: Predictive governor to control slice/subslice/eu
>>
>> drivers/gpu/drm/i915/Makefile | 1 +
>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
>> drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
>> drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
>> drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
>> drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
>> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
>> drivers/gpu/drm/i915/i915_drv.h | 6 ++
>> drivers/gpu/drm/i915/i915_gem.c | 4 +
>> drivers/gpu/drm/i915/i915_params.c | 4 +
>> drivers/gpu/drm/i915/i915_params.h | 1 +
>> drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
>> 13 files changed, 325 insertions(+), 5 deletions(-)
>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
>> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>>
>> --
>> 2.7.4
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-13 11:12 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU srinivasan.s
2020-03-13 17:18 ` Tvrtko Ursulin
@ 2020-03-15 0:12 ` Francisco Jerez
2020-03-15 16:56 ` Lionel Landwerlin
1 sibling, 1 reply; 14+ messages in thread
From: Francisco Jerez @ 2020-03-15 0:12 UTC (permalink / raw)
To: srinivasan.s, intel-gfx, chris, tvrtko.ursulin
srinivasan.s@intel.com writes:
> From: Srinivasan S <srinivasan.s@intel.com>
>
> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>
> This patch set improves GPU power consumption on Linux kernel based OS such as
> Chromium OS, Ubuntu, etc. Following are the power savings.
>
> Power savings on GLK-GT1 Bobba platform running on Chrome OS.
> -----------------------------------------------|
> App /KPI | % Power Benefit (mW) |
> ------------------------|----------------------|
> Hangout Call- 20 minute | 1.8% |
> Youtube 4K VPB | 14.13% |
> WebGL Aquarium | 13.76% |
> Unity3D | 6.78% |
> | |
> ------------------------|----------------------|
> Chrome PLT | BatteryLife Improves |
> | by ~45 minute |
> -----------------------------------------------|
>
> Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
> -----------------------------------------------|
> App /KPI | % Power Benefit (mW) |
> |----------------------|
> | Android | Ubuntu |
> ------------------------|----------|-----------|
> 3D Mark (Ice storm) | 2.30% | N.A. |
> TRex On screen | 2.49% | 2.97% |
> Manhattan On screen | 3.11% | 4.90% |
> Carchase On Screen | N.A. | 5.06% |
> AnTuTu 6.1.4 | 3.42% | N.A. |
> SynMark2 | N.A. | 1.7% |
> -----------------------------------------------|
>
Did you get any performance (e.g. FPS) measurements from those
test-cases? There is quite some potential for this feature to constrain
the GPU throughput inadvertently, which could lead to an apparent
reduction in power usage not accompanied by an improvement in energy
efficiency -- In fact AFAIUI there is some potential for this feature to
*decrease* the energy efficiency of the system if the GPU would have
been able to keep all EUs busy at a lower frequency, but the parallelism
constraint forces it to run at a higher frequency above RPe in order to
achieve the same throughput, because due to the convexity of the power
curve of the EU we have:
P(k * f) > k * P(f)
Where 'k' is the ratio between the EU parallelism without and with SSEU
control, and f > RPe is the original GPU frequency without SSEU control.
In scenarios like that we *might* seem to be using less power with SSEU
control if the workload is running longer, but it would end up using
more energy overall by the time it completes, so it would be good to
have some performance-per-watt numbers to make sure that's not
happening.
> We have also observed that GPU core residency improves by 1.035%.
>
> Technical Insights of the patch:
> Current GPU configuration code for i915 does not allow us to change
> EU/Slice/Sub-slice configuration dynamically. It's done only once while context
> is created.
>
> While particular graphics application is running, if we examine the command
> requests from user space, we observe that command density is not consistent.
> It means there is scope to change the graphics configuration dynamically even
> while context is running actively. This patch series proposes the solution to
> find the active pending load for all active contexts at a given time and based on
> that, dynamically perform graphics configuration for each context.
>
> We use a hr (high resolution) timer with i915 driver in kernel to get a
> callback every few milliseconds (this timer value can be configured through
> debugfs, default is '0' indicating timer is in disabled state i.e. original
> system without any intervention). In the timer callback, we examine pending
> commands for a context in the queue, essentially, we intercept them before
> they are executed by GPU and we update context with required number of EUs.
>
Given that the EU configuration update is synchronous with command
submission, do you really need a timer? It sounds like it would be less
CPU overhead to adjust the EU count on demand whenever the counter
reaches or drops below the threshold instead of polling some CPU-side
data structure.
> Two questions, how did we arrive at right timer value? and what's the right
> number of EUs? For the former, empirical data to achieve best performance
> in least power was considered. For the latter, we roughly categorized number
> of EUs logically based on platform. Now we compare number of pending commands
> with a particular threshold and then set number of EUs accordingly with update
> context. That threshold is also based on experiments & findings. If GPU is able
> to catch up with CPU, typically there are no pending commands, the EU config
> would remain unchanged there. In case there are more pending commands we
> reprogram context with higher number of EUs. Please note, here we are changing
> EUs even while context is running by examining pending commands every 'x'
> milliseconds.
>
I have doubts that the number of requests pending execution is a
particularly reliable indicator of the optimal number of EUs the
workload needs enabled, for starters because the execlists submission
code seems to be able to merge multiple requests into the same port, so
there might seem to be zero pending commands even if the GPU has a
backlog of several seconds or minutes worth of work.
But even if you were using an accurate measure of the GPU load, would
that really be a good indicator of whether the GPU would run more
efficiently with more or less EUs enabled? I can think of many
scenarios where a short-lived GPU request would consume less energy and
complete faster while running with all EUs enabled (e.g. if it actually
has enough parallelism to take advantage of all EUs in the system).
Conversely I can think of some scenarios where a long-running GPU
request would benefit from SSEU control (e.g. a poorly parallelizable
but heavy 3D geometry pipeline or GPGPU workload). The former seems
more worrying than the latter since it could lead to performance or
energy efficiency regressions.
IOW it seems to me that the optimal number of EUs enabled is more of a
function of the internal parallelism constraints of each request rather
than of the overall GPU load. You should be able to get some
understanding of that by e.g. calculating the number of threads loaded
on the average based on the EU SPM counters, but unfortunately the ones
you'd need are only available on TGL+ IIRC. On earlier platforms you
should be able to achieve the same thing by sampling some FLEXEU
counters, but you'd likely have to mess with the mux configuration which
would interfere with OA sampling -- However it sounds like this feature
may have to be disabled anytime OA is active anyway so that may not be a
problem after all?
Regards,
Francisco.
> Srinivasan S (3):
> drm/i915: Get active pending request for given context
> drm/i915: set optimum eu/slice/sub-slice configuration based on load
> type
> drm/i915: Predictive governor to control slice/subslice/eu
>
> drivers/gpu/drm/i915/Makefile | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
> drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
> drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
> drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
> drivers/gpu/drm/i915/i915_drv.h | 6 ++
> drivers/gpu/drm/i915/i915_gem.c | 4 +
> drivers/gpu/drm/i915/i915_params.c | 4 +
> drivers/gpu/drm/i915/i915_params.h | 1 +
> drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
> 13 files changed, 325 insertions(+), 5 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-13 17:18 ` Tvrtko Ursulin
@ 2020-03-13 17:32 ` S, Srinivasan
0 siblings, 0 replies; 14+ messages in thread
From: S, Srinivasan @ 2020-03-13 17:32 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx, chris, Francisco Jerez
> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: Friday, March 13, 2020 10:48 PM
> To: S, Srinivasan <srinivasan.s@intel.com>; intel-gfx@lists.freedesktop.org;
> chris@chris-wilson.co.uk; Francisco Jerez <currojerez@riseup.net>
> Subject: Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-
> slice/EU
>
>
> Hi,
>
> On 13/03/2020 11:12, srinivasan.s@intel.com wrote:
> > From: Srinivasan S <srinivasan.s@intel.com>
> >
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within
> kernel
> >
> > This patch set improves GPU power consumption on Linux kernel based OS
> such as
> > Chromium OS, Ubuntu, etc. Following are the power savings.
> >
> > Power savings on GLK-GT1 Bobba platform running on Chrome OS.
> > -----------------------------------------------|
> > App /KPI | % Power Benefit (mW) |
> > ------------------------|----------------------|
> > Hangout Call- 20 minute | 1.8% |
> > Youtube 4K VPB | 14.13% |
> > WebGL Aquarium | 13.76% |
> > Unity3D | 6.78% |
> > | |
> > ------------------------|----------------------|
> > Chrome PLT | BatteryLife Improves |
> > | by ~45 minute |
> > -----------------------------------------------|
> >
> > Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
> > -----------------------------------------------|
> > App /KPI | % Power Benefit (mW) |
> > |----------------------|
> > | Android | Ubuntu |
> > ------------------------|----------|-----------|
> > 3D Mark (Ice storm) | 2.30% | N.A. |
> > TRex On screen | 2.49% | 2.97% |
> > Manhattan On screen | 3.11% | 4.90% |
> > Carchase On Screen | N.A. | 5.06% |
> > AnTuTu 6.1.4 | 3.42% | N.A. |
> > SynMark2 | N.A. | 1.7% |
> > -----------------------------------------------|
>
> Have a look at the result Francisco obtained on Icelake with a different
> approach: https://patchwork.freedesktop.org/series/74540/
>
> Not all benchmarks overlap but if you are set up to easily test his
> patches it may be for a mutual benefit.
[S, Srinivasan] Thanks! Could we reuse only his gfx-related patches (below) to focus on our gfx power-saving benefits as a first step, or does the entire patch series need to be considered? Is there any dependency of GPU power on the CPU?
ie., https://patchwork.freedesktop.org/patch/357098/?series=74540&rev=2
https://patchwork.freedesktop.org/patch/357103/?series=74540&rev=2
>
> Regards,
>
> Tvrtko
>
> > We have also observed that GPU core residency improves by 1.035%.
> >
> > Technical Insights of the patch:
> > Current GPU configuration code for i915 does not allow us to change
> > EU/Slice/Sub-slice configuration dynamically. It's done only once while context
> > is created.
> >
> > While particular graphics application is running, if we examine the command
> > requests from user space, we observe that command density is not consistent.
> > It means there is scope to change the graphics configuration dynamically even
> > while context is running actively. This patch series proposes the solution to
> > find the active pending load for all active contexts at a given time and based on
> > that, dynamically perform graphics configuration for each context.
> >
> > We use a hr (high resolution) timer with i915 driver in kernel to get a
> > callback every few milliseconds (this timer value can be configured through
> > debugfs, default is '0' indicating timer is in disabled state i.e. original
> > system without any intervention). In the timer callback, we examine pending
> > commands for a context in the queue, essentially, we intercept them before
> > they are executed by GPU and we update context with required number of
> EUs.
> >
> > Two questions, how did we arrive at right timer value? and what's the right
> > number of EUs? For the former, empirical data to achieve best performance
> > in least power was considered. For the latter, we roughly categorized
> number
> > of EUs logically based on platform. Now we compare number of pending
> commands
> > with a particular threshold and then set number of EUs accordingly with
> update
> > context. That threshold is also based on experiments & findings. If GPU is able
> > to catch up with CPU, typically there are no pending commands, the EU config
> > would remain unchanged there. In case there are more pending commands we
> > reprogram context with higher number of EUs. Please note, here we are
> changing
> > EUs even while context is running by examining pending commands every 'x'
> > milliseconds.
> >
> > Srinivasan S (3):
> > drm/i915: Get active pending request for given context
> > drm/i915: set optimum eu/slice/sub-slice configuration based on load
> > type
> > drm/i915: Predictive governor to control slice/subslice/eu
> >
> > drivers/gpu/drm/i915/Makefile | 1 +
> > drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
> > drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
> > drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
> > drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
> > drivers/gpu/drm/i915/gt/intel_deu.c | 104
> ++++++++++++++++++++++
> > drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
> > drivers/gpu/drm/i915/i915_drv.h | 6 ++
> > drivers/gpu/drm/i915/i915_gem.c | 4 +
> > drivers/gpu/drm/i915/i915_params.c | 4 +
> > drivers/gpu/drm/i915/i915_params.h | 1 +
> > drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
> > 13 files changed, 325 insertions(+), 5 deletions(-)
> > create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
> > create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
2020-03-13 11:12 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU srinivasan.s
@ 2020-03-13 17:18 ` Tvrtko Ursulin
2020-03-13 17:32 ` S, Srinivasan
2020-03-15 0:12 ` Francisco Jerez
1 sibling, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2020-03-13 17:18 UTC (permalink / raw)
To: srinivasan.s, intel-gfx, chris, Francisco Jerez
Hi,
On 13/03/2020 11:12, srinivasan.s@intel.com wrote:
> From: Srinivasan S <srinivasan.s@intel.com>
>
> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>
> This patch set improves GPU power consumption on Linux kernel based OS such as
> Chromium OS, Ubuntu, etc. Following are the power savings.
>
> Power savings on GLK-GT1 Bobba platform running on Chrome OS.
> -----------------------------------------------|
> App /KPI | % Power Benefit (mW) |
> ------------------------|----------------------|
> Hangout Call- 20 minute | 1.8% |
> Youtube 4K VPB | 14.13% |
> WebGL Aquarium | 13.76% |
> Unity3D | 6.78% |
> | |
> ------------------------|----------------------|
> Chrome PLT | BatteryLife Improves |
> | by ~45 minute |
> -----------------------------------------------|
>
> Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
> -----------------------------------------------|
> App /KPI | % Power Benefit (mW) |
> |----------------------|
> | Android | Ubuntu |
> ------------------------|----------|-----------|
> 3D Mark (Ice storm) | 2.30% | N.A. |
> TRex On screen | 2.49% | 2.97% |
> Manhattan On screen | 3.11% | 4.90% |
> Carchase On Screen | N.A. | 5.06% |
> AnTuTu 6.1.4 | 3.42% | N.A. |
> SynMark2 | N.A. | 1.7% |
> -----------------------------------------------|
Have a look at the result Francisco obtained on Icelake with a different
approach: https://patchwork.freedesktop.org/series/74540/
Not all benchmarks overlap but if you are set up to easily test his
patches it may be for a mutual benefit.
Regards,
Tvrtko
> We have also observed that GPU core residency improves by 1.035%.
>
> Technical Insights of the patch:
> Current GPU configuration code for i915 does not allow us to change
> EU/Slice/Sub-slice configuration dynamically. It's done only once while context
> is created.
>
> While particular graphics application is running, if we examine the command
> requests from user space, we observe that command density is not consistent.
> It means there is scope to change the graphics configuration dynamically even
> while context is running actively. This patch series proposes the solution to
> find the active pending load for all active contexts at a given time and based on
> that, dynamically perform graphics configuration for each context.
>
> We use a hr (high resolution) timer with i915 driver in kernel to get a
> callback every few milliseconds (this timer value can be configured through
> debugfs, default is '0' indicating timer is in disabled state i.e. original
> system without any intervention). In the timer callback, we examine pending
> commands for a context in the queue, essentially, we intercept them before
> they are executed by GPU and we update context with required number of EUs.
>
> Two questions, how did we arrive at right timer value? and what's the right
> number of EUs? For the former, empirical data to achieve best performance
> in least power was considered. For the latter, we roughly categorized number
> of EUs logically based on platform. Now we compare number of pending commands
> with a particular threshold and then set number of EUs accordingly with update
> context. That threshold is also based on experiments & findings. If GPU is able
> to catch up with CPU, typically there are no pending commands, the EU config
> would remain unchanged there. In case there are more pending commands we
> reprogram context with higher number of EUs. Please note, here we are changing
> EUs even while context is running by examining pending commands every 'x'
> milliseconds.
>
> Srinivasan S (3):
> drm/i915: Get active pending request for given context
> drm/i915: set optimum eu/slice/sub-slice configuration based on load
> type
> drm/i915: Predictive governor to control slice/subslice/eu
>
> drivers/gpu/drm/i915/Makefile | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
> drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
> drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
> drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
> drivers/gpu/drm/i915/i915_drv.h | 6 ++
> drivers/gpu/drm/i915/i915_gem.c | 4 +
> drivers/gpu/drm/i915/i915_params.c | 4 +
> drivers/gpu/drm/i915/i915_params.h | 1 +
> drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
> 13 files changed, 325 insertions(+), 5 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
> create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU
@ 2020-03-13 11:12 srinivasan.s
2020-03-13 17:18 ` Tvrtko Ursulin
2020-03-15 0:12 ` Francisco Jerez
0 siblings, 2 replies; 14+ messages in thread
From: srinivasan.s @ 2020-03-13 11:12 UTC (permalink / raw)
To: intel-gfx, chris, tvrtko.ursulin
From: Srinivasan S <srinivasan.s@intel.com>
drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
This patch set improves GPU power consumption on Linux kernel based OSes such as
Chromium OS and Ubuntu. The power savings are as follows.
Power savings on GLK-GT1 Bobba platform running on Chrome OS.
-----------------------------------------------|
App /KPI | % Power Benefit (mW) |
------------------------|----------------------|
Hangout Call- 20 minute | 1.8% |
Youtube 4K VPB | 14.13% |
WebGL Aquarium | 13.76% |
Unity3D | 6.78% |
| |
------------------------|----------------------|
Chrome PLT | BatteryLife Improves |
| by ~45 minute |
-----------------------------------------------|
Power savings on KBL-GT3 running on Android and Ubuntu (Linux).
-----------------------------------------------|
App /KPI | % Power Benefit (mW) |
|----------------------|
| Android | Ubuntu |
------------------------|----------|-----------|
3D Mark (Ice storm) | 2.30% | N.A. |
TRex On screen | 2.49% | 2.97% |
Manhattan On screen | 3.11% | 4.90% |
Carchase On Screen | N.A. | 5.06% |
AnTuTu 6.1.4 | 3.42% | N.A. |
SynMark2 | N.A. | 1.7% |
-----------------------------------------------|
We have also observed that GPU core residency improves by 1.035%.
Technical Insights of the patch:
The current GPU configuration code for i915 does not allow us to change the
EU/Slice/Sub-slice configuration dynamically. It is done only once, while the
context is created.
While a particular graphics application is running, if we examine the command
requests from user space, we observe that the command density is not consistent.
This means there is scope to change the graphics configuration dynamically even
while the context is actively running. This patch series proposes a solution:
find the pending load for all active contexts at a given time and, based on
that, dynamically adjust the graphics configuration of each context.
We use a high-resolution (hr) timer in the i915 driver to get a callback every
few milliseconds (this timer value can be configured through debugfs; the
default is '0', indicating the timer is disabled, i.e. the original system
without any intervention). In the timer callback, we examine the pending
commands for a context in the queue; essentially, we intercept them before
they are executed by the GPU and update the context with the required number of EUs.
Two questions arise: how did we arrive at the right timer value, and what is
the right number of EUs? For the former, empirical data for achieving the best
performance at the least power was considered. For the latter, we roughly
categorized the number of EUs logically based on the platform. We compare the
number of pending commands with a particular threshold and then set the number
of EUs accordingly via a context update. That threshold is also based on
experiments and findings. If the GPU is able to keep up with the CPU, there
are typically no pending commands and the EU configuration remains unchanged.
If there are more pending commands, we reprogram the context with a higher
number of EUs. Note that we change EUs even while the context is running, by
examining pending commands every 'x' milliseconds.
Srinivasan S (3):
drm/i915: Get active pending request for given context
drm/i915: set optimum eu/slice/sub-slice configuration based on load
type
drm/i915: Predictive governor to control slice/subslice/eu
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_context.c | 20 +++++
drivers/gpu/drm/i915/gem/i915_gem_context.h | 2 +
drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 38 ++++++++
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 +
drivers/gpu/drm/i915/gt/intel_deu.c | 104 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++++++
drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++-
drivers/gpu/drm/i915/i915_drv.h | 6 ++
drivers/gpu/drm/i915/i915_gem.c | 4 +
drivers/gpu/drm/i915/i915_params.c | 4 +
drivers/gpu/drm/i915/i915_params.h | 1 +
drivers/gpu/drm/i915/intel_device_info.c | 74 ++++++++++++++-
13 files changed, 325 insertions(+), 5 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
--
2.7.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
2020-03-16 13:36 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU Ankit Navik
2020-03-16 13:36 ` [Intel-gfx] [PATCH v7 1/3] drm/i915: Get active pending request for given context Ankit Navik
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 2/3] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
2020-03-17 3:42 ` kbuild test robot
2020-03-17 3:42 ` kbuild test robot
2020-03-16 13:37 ` [Intel-gfx] [PATCH v7 3/3] drm/i915: Predictive governor to control slice/subslice/eu Ankit Navik
2020-03-16 21:53 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Dynamic EU configuration of Slice/Sub-slice/EU (rev7) Patchwork
-- strict thread matches above, loose matches on Subject: below --
2020-03-13 11:12 [Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU srinivasan.s
2020-03-13 17:18 ` Tvrtko Ursulin
2020-03-13 17:32 ` S, Srinivasan
2020-03-15 0:12 ` Francisco Jerez
2020-03-15 16:56 ` Lionel Landwerlin
2020-03-15 18:08 ` Francisco Jerez
2020-03-15 22:30 ` Lionel Landwerlin