* [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
@ 2018-12-11 10:14 Ankit Navik
  2018-12-11 10:14 ` [PATCH v3 1/4] drm/i915: Get active pending request for given context Ankit Navik
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Ankit Navik @ 2018-12-11 10:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel

Current GPU configuration code for i915 does not allow us to change the
EU/Slice/Sub-slice configuration dynamically. It is done only once, while
the context is created.

If we examine the command requests coming from user space while a graphics
application is running, we observe that the command density is not
consistent. This means there is scope to change the graphics configuration
dynamically, even while a context is actively running. This patch series
proposes a solution that determines the pending load for all active contexts
at a given time and, based on that, dynamically performs the graphics
configuration for each context.

We use a high resolution (hr) timer in the i915 kernel driver to get a
callback every few milliseconds (this timer value can be configured through
debugfs; the default is '0', indicating the timer is disabled, i.e. the
original system without any intervention). In the timer callback we examine
the pending commands for a context in the queue; essentially, we intercept
them before they are executed by the GPU and update the context with the
required number of EUs.

Two questions: how did we arrive at the right timer value, and what is the
right number of EUs? For the former, we considered empirical data to achieve
the best performance at the least power. For the latter, we roughly
categorized the number of EUs logically per platform. We then compare the
number of pending commands against a particular threshold and set the number
of EUs accordingly via a context update; that threshold is also based on
experiments and findings. If the GPU is able to keep up with the CPU there
are typically no pending commands, and the EU configuration remains
unchanged. If there are more pending commands, we reprogram the context with
a higher number of EUs. Please note that we are changing EUs even while the
context is running, by examining the pending commands every 'x' milliseconds.
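
As a concrete example of the decision, using the threshold and per-platform
tables introduced later in this series: a context found with five pending
requests at callback time exceeds the medium threshold of three, is
classified as high load, and on KBL GT3 is moved to the 2 slice /
3 sub-slice / 8 EU table entry at its next submission. A context with zero
pending requests is deliberately left untouched, to avoid oscillating
between the low and high states.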

With this solution in place, on KBL-GT3 + Android we saw the following PnP
(power and performance) benefits; the power numbers mentioned here are
system power.

App / KPI              | % Power |
                       | Benefit |
                       |  (mW)   |
-----------------------|---------|
3D Mark (Ice storm)    | 2.30%   |
TRex On screen         | 2.49%   |
TRex Off screen        | 1.32%   |
Manhattan On screen    | 3.11%   |
Manhattan Off screen   | 0.89%   |
AnTuTu 6.1.4           | 3.42%   |
SynMark2               | 1.70%   |

Note - for KBL (GEN9) we cannot control at the sub-slice level; this was
always a constraint. We always controlled the number of EUs rather than
sub-slices/slices. We have also observed that GPU core residency improves
by 1.03%.
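
The governor is controlled through the i915_predictive_load_ctl debugfs
file added by the last patch. For example (path assuming the first DRM
device, card 0):

  echo 10 > /sys/kernel/debug/dri/0/i915_predictive_load_ctl

samples the pending requests every 10 ms; writing 0 disables the timer
again.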

Praveen Diwakar (4):
  drm/i915: Get active pending request for given context
  drm/i915: Update render power clock state configuration for given
    context
  drm/i915: set optimum eu/slice/sub-slice configuration based on load
    type
  drm/i915: Predictive governor to control eu/slice/subslice

 drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c          |  4 ++
 drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
 drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
 drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
 drivers/gpu/drm/i915/i915_request.c      |  2 +
 drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
 8 files changed, 226 insertions(+), 4 deletions(-)

-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v3 1/4] drm/i915: Get active pending request for given context
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
@ 2018-12-11 10:14 ` Ankit Navik
  2018-12-11 11:58   ` Tvrtko Ursulin
  2018-12-11 10:14 ` [PATCH v3 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Ankit Navik @ 2018-12-11 10:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch gives us the count of active pending requests which are yet
to be submitted to the GPU.

V2:
 * Change 64-bit to atomic for request count. (Tvrtko Ursulin)

V3:
 * Remove mutex for request count.
 * Rebase.
 * Fix underflow of the predictive request count. (Tvrtko Ursulin)

Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
Cc: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 1 +
 drivers/gpu/drm/i915/i915_gem_context.h | 5 +++++
 drivers/gpu/drm/i915/i915_request.c     | 2 ++
 drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b10770c..0bcbe32 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 	}
 
 	trace_i915_context_create(ctx);
+	atomic_set(&ctx->req_cnt, 0);
 
 	return ctx;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index b116e49..e824b15 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -194,6 +194,11 @@ struct i915_gem_context {
 	 * context close.
 	 */
 	struct list_head handles_list;
+
+	/** req_cnt: tracks the pending commands, based on which we decide to
+	 * go for low/medium/high load configuration of the GPU.
+	 */
+	atomic_t req_cnt;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c2c93c..b90795a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1113,6 +1113,8 @@ void i915_request_add(struct i915_request *request)
 	}
 	request->emitted_jiffies = jiffies;
 
+	atomic_inc(&request->gem_context->req_cnt);
+
 	/*
 	 * Let the backend know a new request has arrived that may need
 	 * to adjust the existing execution schedule due to a high priority
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1744792..d33f5ac 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1127,6 +1127,8 @@ static void execlists_submit_request(struct i915_request *request)
 	submit_queue(engine, rq_prio(request));
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	atomic_dec(&request->gem_context->req_cnt);
 }
 
 static struct i915_request *sched_to_request(struct i915_sched_node *node)
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v3 2/4] drm/i915: Update render power clock state configuration for given context
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
  2018-12-11 10:14 ` [PATCH v3 1/4] drm/i915: Get active pending request for given context Ankit Navik
@ 2018-12-11 10:14 ` Ankit Navik
  2018-12-11 12:36   ` Tvrtko Ursulin
  2018-12-11 10:14 ` [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Ankit Navik @ 2018-12-11 10:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch updates the render power clock state (RPCS) register at runtime
based on a flag which can be set by any governor that computes load and
wants to update the RPCS register. Subsequent patches add a timer based
governor which computes the pending load/requests.

V2:
 * Reuse make_rpcs to get rpcs config. (Tvrtko Ursulin)

V3:
 * Rebase.

Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
Cc: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
 drivers/gpu/drm/i915/i915_gem_context.h |  9 +++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 12 +++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0bcbe32..d040858 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -388,6 +388,10 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 
 	trace_i915_context_create(ctx);
 	atomic_set(&ctx->req_cnt, 0);
+	ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
+	ctx->subslice_cnt = hweight8(
+			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
+	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
 
 	return ctx;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index e824b15..e000530 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -199,6 +199,15 @@ struct i915_gem_context {
 	 * go for low/medium/high load configuration of the GPU.
 	 */
 	atomic_t req_cnt;
+
+	/** slice_cnt: used to set the # of slices to be enabled. */
+	u8 slice_cnt;
+
+	/** subslice_cnt: used to set the # of subslices to be enabled. */
+	u8 subslice_cnt;
+
+	/** eu_cnt: used to set the # of eu to be enabled. */
+	u8 eu_cnt;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d33f5ac..a17f676 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -171,6 +171,7 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
+static u32 make_rpcs(struct drm_i915_private *dev_priv);
 
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
@@ -417,12 +418,21 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
 
 static u64 execlists_update_context(struct i915_request *rq)
 {
+	u32 rpcs_config = 0;
 	struct intel_context *ce = rq->hw_context;
+	u32 *reg_state = ce->lrc_reg_state;
+	struct i915_gem_context *ctx = rq->gem_context;
 	struct i915_hw_ppgtt *ppgtt =
 		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
-	u32 *reg_state = ce->lrc_reg_state;
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
+	/* FIXME: To avoid stale rpcs config, move it to context_pin */
+	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
+		rpcs_config = make_rpcs(ctx->i915);
+		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
+		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+				rpcs_config);
+	}
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
 	 * registers and point the unallocated PDPs to scratch page.
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
  2018-12-11 10:14 ` [PATCH v3 1/4] drm/i915: Get active pending request for given context Ankit Navik
  2018-12-11 10:14 ` [PATCH v3 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
@ 2018-12-11 10:14 ` Ankit Navik
  2018-12-11 12:47   ` Tvrtko Ursulin
  2018-12-11 10:14 ` [PATCH v3 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Ankit Navik @ 2018-12-11 10:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch selects the optimum eu/slice/sub-slice configuration based on the
type of load (low, medium, high) given as input. Based on our readings and
experiments we have a predefined set of optimum configurations for each
platform (CHT, KBL). i915_gem_context_set_load_type selects the optimum
configuration from the pre-defined optimum configuration table (opt_config).

It also introduces the flag update_render_config, which can be set by any
governor.

v2:
 * Move static optimum_config to device init time.
 * Rename function to appropriate name, fix data types and patch ordering.
 * Rename prev_load_type to pending_load_type. (Tvrtko Ursulin)

v3:
 * Add safe guard check in i915_gem_context_set_load_type.
 * Rename struct from optimum_config to i915_sseu_optimum_config to
   avoid namespace clashes.
 * Reduce memcpy for space efficiency.
 * Rebase.
 * Improved commit message. (Tvrtko Ursulin)

Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
Cc: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  3 ++
 drivers/gpu/drm/i915/i915_gem_context.c  | 18 ++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h  | 25 +++++++++++++++++
 drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.c         |  4 ++-
 5 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4aca534..4b9a8c5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1681,6 +1681,9 @@ struct drm_i915_private {
 	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
 	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
 
+	/* optimal slice/subslice/EU configration state */
+	struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST];
+
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int skl_preferred_vco_freq;
 	unsigned int max_cdclk_freq;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d040858..c0ced72 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -392,10 +392,28 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 	ctx->subslice_cnt = hweight8(
 			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
 	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
+	ctx->load_type = 0;
+	ctx->pending_load_type = 0;
 
 	return ctx;
 }
 
+
+void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
+		enum gem_load_type type)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+
+	if (GEM_WARN_ON(type > LOAD_TYPE_LAST))
+		return;
+
+	/* Call opt_config to get correct configuration for eu,slice,subslice */
+	ctx->slice_cnt = dev_priv->opt_config[type].slice;
+	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
+	ctx->eu_cnt = dev_priv->opt_config[type].eu;
+	ctx->pending_load_type = type;
+}
+
 /**
  * i915_gem_context_create_gvt - create a GVT GEM context
  * @dev: drm device *
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index e000530..a0db13c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -53,6 +53,19 @@ struct intel_context_ops {
 	void (*destroy)(struct intel_context *ce);
 };
 
+enum gem_load_type {
+	LOAD_TYPE_LOW,
+	LOAD_TYPE_MEDIUM,
+	LOAD_TYPE_HIGH,
+	LOAD_TYPE_LAST
+};
+
+struct i915_sseu_optimum_config {
+	u8 slice;
+	u8 subslice;
+	u8 eu;
+};
+
 /**
  * struct i915_gem_context - client state
  *
@@ -208,6 +221,16 @@ struct i915_gem_context {
 
 	/** eu_cnt: used to set the # of eu to be enabled. */
 	u8 eu_cnt;
+
+	/** load_type: The designated load_type (high/medium/low) for a given
+	 * number of pending commands in the command queue.
+	 */
+	enum gem_load_type load_type;
+
+	/** pending_load_type: The earlier load type that the GPU was configured
+	 * for (high/medium/low).
+	 */
+	enum gem_load_type pending_load_type;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
@@ -336,6 +359,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
+void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
+		enum gem_load_type type);
 
 struct i915_gem_context *
 i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio);
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 0ef0c64..33c6310 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -741,6 +741,28 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
 		container_of(info, struct drm_i915_private, info);
 	enum pipe pipe;
 
+	struct i915_sseu_optimum_config *opt_config = NULL;
+	/* static table of slice/subslice/EU for Cherryview */
+	struct i915_sseu_optimum_config chv_config[LOAD_TYPE_LAST] = {
+		{1, 1, 4},	/* Low */
+		{1, 1, 6},	/* Medium */
+		{1, 2, 6}	/* High */
+	};
+
+	/* static table of slice/subslice/EU for KBL GT2 */
+	struct i915_sseu_optimum_config kbl_gt2_config[LOAD_TYPE_LAST] = {
+		{1, 3, 2},	/* Low */
+		{1, 3, 4},	/* Medium */
+		{1, 3, 8}	/* High */
+	};
+
+	/* static table of slice/subslice/EU for KBL GT3 */
+	struct i915_sseu_optimum_config kbl_gt3_config[LOAD_TYPE_LAST] = {
+		{2, 3, 4},	/* Low */
+		{2, 3, 6},	/* Medium */
+		{2, 3, 8}	/* High */
+	};
+
 	if (INTEL_GEN(dev_priv) >= 10) {
 		for_each_pipe(dev_priv, pipe)
 			info->num_scalers[pipe] = 2;
@@ -840,17 +862,38 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
 	/* Initialize slice/subslice/EU info */
 	if (IS_HASWELL(dev_priv))
 		haswell_sseu_info_init(dev_priv);
-	else if (IS_CHERRYVIEW(dev_priv))
+	else if (IS_CHERRYVIEW(dev_priv)) {
 		cherryview_sseu_info_init(dev_priv);
+		opt_config = chv_config;
+	}
 	else if (IS_BROADWELL(dev_priv))
 		broadwell_sseu_info_init(dev_priv);
-	else if (INTEL_GEN(dev_priv) == 9)
+	else if (INTEL_GEN(dev_priv) == 9) {
 		gen9_sseu_info_init(dev_priv);
+
+		if (IS_KABYLAKE(dev_priv)) {
+			switch (info->gt) {
+			default:
+				MISSING_CASE(info->gt);
+				/* fall through */
+			case 2:
+				opt_config = kbl_gt2_config;
+			break;
+			case 3:
+				opt_config = kbl_gt3_config;
+			break;
+			}
+		}
+	}
 	else if (INTEL_GEN(dev_priv) == 10)
 		gen10_sseu_info_init(dev_priv);
 	else if (INTEL_GEN(dev_priv) >= 11)
 		gen11_sseu_info_init(dev_priv);
 
+	if (opt_config)
+		memcpy(dev_priv->opt_config, opt_config, LOAD_TYPE_LAST *
+			sizeof(struct i915_sseu_optimum_config));
+
 	/* Initialize command stream timestamp frequency */
 	info->cs_timestamp_frequency_khz = read_timestamp_frequency(dev_priv);
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a17f676..7fb9cd2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -427,11 +427,13 @@ static u64 execlists_update_context(struct i915_request *rq)
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
 	/* FIXME: To avoid stale rpcs config, move it to context_pin */
-	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
+	if (ctx->pid && ctx->name && (rq->engine->id == RCS) &&
+			(ctx->load_type != ctx->pending_load_type)) {
 		rpcs_config = make_rpcs(ctx->i915);
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
 				rpcs_config);
+		ctx->load_type = ctx->pending_load_type;
 	}
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v3 4/4] drm/i915: Predictive governor to control eu/slice/subslice
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (2 preceding siblings ...)
  2018-12-11 10:14 ` [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2018-12-11 10:14 ` Ankit Navik
  2018-12-11 13:00   ` Tvrtko Ursulin
  2018-12-11 11:12 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Ankit Navik @ 2018-12-11 10:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

From: Praveen Diwakar <praveen.diwakar@intel.com>

A high resolution timer is used by the predictive governor to control
eu/slice/subslice based on workload.

Debugfs is provided to enable/disable/update the timer configuration.

V2:
 * Fix code style.
 * Move predictive_load_timer into a drm_i915_private
   structure.
 * Make generic function to set optimum config. (Tvrtko Ursulin)

V3:
 * Rebase.
 * Fix race condition for predictive load set.
 * Add slack when starting the hrtimer for better power efficiency.
   (Tvrtko Ursulin)

Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Cc: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 90 ++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c     |  4 ++
 drivers/gpu/drm/i915/i915_drv.h     |  6 +++
 3 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f9ce35d..861f3c1 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4740,6 +4740,92 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_drrs_status", i915_drrs_status, 0},
 	{"i915_rps_boost_info", i915_rps_boost_info, 0},
 };
+
+#define PENDING_REQ_0	0	/* No active request pending */
+
+/*
+ * Anything above threshold is considered as HIGH load, less is considered
+ * as LOW load and equal is considered as MEDIUM load.
+ *
+ * The threshold value of three active requests pending.
+ */
+#define PENDING_THRESHOLD_MEDIUM 3
+
+#define SLACK_TIMER_NSEC 1000000 /* Timer range in nano second */
+
+enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *dev_priv =
+			container_of(hrtimer, typeof(*dev_priv), pred_timer);
+	struct i915_gem_context *ctx;
+	enum gem_load_type load_type;
+	unsigned long req_pending;
+
+	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
+
+		/* Check for in-valid context */
+		if (!ctx->name)
+			continue;
+
+		req_pending = atomic_read(&ctx->req_cnt);
+
+		/*
+		 * Transitioning to low state whenever pending request is zero
+		 * would cause vacillation between low and high state.
+		 */
+		if (req_pending == PENDING_REQ_0)
+			continue;
+
+		if (req_pending > PENDING_THRESHOLD_MEDIUM)
+			load_type = LOAD_TYPE_HIGH;
+		else if (req_pending == PENDING_THRESHOLD_MEDIUM)
+			load_type = LOAD_TYPE_MEDIUM;
+		else
+			load_type = LOAD_TYPE_LOW;
+
+		i915_gem_context_set_load_type(ctx, load_type);
+	}
+
+	hrtimer_forward_now(hrtimer,
+			ms_to_ktime(dev_priv->predictive_load_enable));
+
+	return HRTIMER_RESTART;
+}
+
+static int i915_predictive_load_get(void *data, u64 *val)
+{
+	struct drm_i915_private *dev_priv = data;
+
+	*val = dev_priv->predictive_load_enable;
+	return 0;
+}
+
+static int i915_predictive_load_set(void *data, u64 val)
+{
+	struct drm_i915_private *dev_priv = data;
+
+	mutex_lock(&dev_priv->pred_mutex);
+
+	dev_priv->predictive_load_enable = val;
+
+	if (dev_priv->predictive_load_enable) {
+		if (!hrtimer_active(&dev_priv->pred_timer))
+			hrtimer_start_range_ns(&dev_priv->pred_timer,
+			ms_to_ktime(dev_priv->predictive_load_enable),
+			SLACK_TIMER_NSEC,
+			HRTIMER_MODE_REL_PINNED);
+	} else {
+		hrtimer_cancel(&dev_priv->pred_timer);
+	}
+
+	mutex_unlock(&dev_priv->pred_mutex);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_predictive_load_ctl,
+			i915_predictive_load_get, i915_predictive_load_set,
+			"%llu\n");
+
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
 static const struct i915_debugfs_files {
@@ -4769,7 +4855,9 @@ static const struct i915_debugfs_files {
 	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
 	{"i915_ipc_status", &i915_ipc_status_fops},
 	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
-	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
+	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops},
+	/* FIXME: When feature will become real, move to sysfs */
+	{"i915_predictive_load_ctl", &i915_predictive_load_ctl}
 };
 
 int i915_debugfs_register(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f8cfd16..79f4df5 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1397,6 +1397,10 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (ret < 0)
 		goto out_cleanup_hw;
 
+	/* Timer initialization for predictive load */
+	hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	dev_priv->pred_timer.function = predictive_load_cb;
+
 	i915_driver_register(dev_priv);
 
 	intel_runtime_pm_enable(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4b9a8c5..a78fdbc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1684,6 +1684,11 @@ struct drm_i915_private {
 	/* optimal slice/subslice/EU configration state */
 	struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST];
 
+	/* protects predictive load state */
+	struct mutex pred_mutex;
+	struct hrtimer pred_timer;
+	int predictive_load_enable;
+
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int skl_preferred_vco_freq;
 	unsigned int max_cdclk_freq;
@@ -2730,6 +2735,7 @@ extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
 extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
 extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
 extern void i915_update_gfx_val(struct drm_i915_private *dev_priv);
+extern enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer);
 int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
 
 int intel_engines_init_mmio(struct drm_i915_private *dev_priv);
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (3 preceding siblings ...)
  2018-12-11 10:14 ` [PATCH v3 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
@ 2018-12-11 11:12 ` Patchwork
  2018-12-11 13:02   ` Tvrtko Ursulin
  2018-12-11 11:48 ` [PATCH v3 0/4] " Tvrtko Ursulin
  2018-12-14 10:27 ` Joonas Lahtinen
  6 siblings, 1 reply; 18+ messages in thread
From: Patchwork @ 2018-12-11 11:12 UTC (permalink / raw)
  To: Navik, Ankit P; +Cc: intel-gfx

== Series Details ==

Series: Dynamic EU configuration of Slice/Subslice/EU.
URL   : https://patchwork.freedesktop.org/series/53876/
State : failure

== Summary ==

Applying: drm/i915: Get active pending request for given context
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_context.c
M	drivers/gpu/drm/i915/i915_gem_context.h
M	drivers/gpu/drm/i915/i915_request.c
M	drivers/gpu/drm/i915/intel_lrc.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/intel_lrc.c
Auto-merging drivers/gpu/drm/i915/i915_request.c
Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
Applying: drm/i915: Update render power clock state configuration for given context
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_context.c
M	drivers/gpu/drm/i915/i915_gem_context.h
M	drivers/gpu/drm/i915/intel_lrc.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/intel_lrc.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/intel_lrc.c
Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
error: Failed to merge in the changes.
Patch failed at 0002 drm/i915: Update render power clock state configuration for given context
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (4 preceding siblings ...)
  2018-12-11 11:12 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU Patchwork
@ 2018-12-11 11:48 ` Tvrtko Ursulin
  2018-12-12  7:43   ` Navik, Ankit P
  2018-12-14 10:27 ` Joonas Lahtinen
  6 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 11:48 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 11/12/2018 10:14, Ankit Navik wrote:
> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
> 
> Current GPU configuration code for i915 does not allow us to change the
> EU/Slice/Sub-slice configuration dynamically. It is done only once, while
> the context is created.
> 
> If we examine the command requests coming from user space while a graphics
> application is running, we observe that the command density is not
> consistent. This means there is scope to change the graphics configuration
> dynamically, even while a context is actively running. This patch series
> proposes a solution that determines the pending load for all active contexts
> at a given time and, based on that, dynamically performs the graphics
> configuration for each context.
> 
> We use a high resolution (hr) timer in the i915 kernel driver to get a
> callback every few milliseconds (this timer value can be configured through
> debugfs; the default is '0', indicating the timer is disabled, i.e. the
> original system without any intervention). In the timer callback we examine
> the pending commands for a context in the queue; essentially, we intercept
> them before they are executed by the GPU and update the context with the
> required number of EUs.
> 
> Two questions: how did we arrive at the right timer value, and what is the
> right number of EUs? For the former, we considered empirical data to achieve
> the best performance at the least power. For the latter, we roughly
> categorized the number of EUs logically per platform. We then compare the
> number of pending commands against a particular threshold and set the number
> of EUs accordingly via a context update; that threshold is also based on
> experiments and findings. If the GPU is able to keep up with the CPU there
> are typically no pending commands, and the EU configuration remains
> unchanged. If there are more pending commands, we reprogram the context with
> a higher number of EUs. Please note that we are changing EUs even while the
> context is running, by examining the pending commands every 'x' milliseconds.
> 
> With this solution in place, on KBL-GT3 + Android we saw the following PnP
> (power and performance) benefits; the power numbers mentioned here are
> system power.
> 
> App / KPI              | % Power |
>                        | Benefit |
>                        |  (mW)   |
> -----------------------|---------|
> 3D Mark (Ice storm)    | 2.30%   |
> TRex On screen         | 2.49%   |
> TRex Off screen        | 1.32%   |
> Manhattan On screen    | 3.11%   |
> Manhattan Off screen   | 0.89%   |
> AnTuTu 6.1.4           | 3.42%   |
> SynMark2               | 1.70%   |

Is this the aggregated SynMark2 result, like all sub-tests averaged or 
something?

I suggest you do want to list much more detail here, all individual 
sub-tests, different platforms, etc. The change you are proposing is 
quite big and the amount of research that you must demonstrate for 
people to take this seriously has to be equally exhaustive.

Regards,

Tvrtko

> 
> Note - for KBL (GEN9) we cannot control at the sub-slice level; this was
> always a constraint. We always controlled the number of EUs rather than
> sub-slices/slices. We have also observed that GPU core residency improves
> by 1.03%.
> 
> Praveen Diwakar (4):
>    drm/i915: Get active pending request for given context
>    drm/i915: Update render power clock state configuration for given
>      context
>    drm/i915: set optimum eu/slice/sub-slice configuration based on load
>      type
>    drm/i915: Predictive governor to control eu/slice/subslice
> 
>   drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_drv.c          |  4 ++
>   drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
>   drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
>   drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c      |  2 +
>   drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
>   8 files changed, 226 insertions(+), 4 deletions(-)
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 1/4] drm/i915: Get active pending request for given context
  2018-12-11 10:14 ` [PATCH v3 1/4] drm/i915: Get active pending request for given context Ankit Navik
@ 2018-12-11 11:58   ` Tvrtko Ursulin
  0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 11:58 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 11/12/2018 10:14, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> This patch gives us the count of active pending requests which are yet
> to be submitted to the GPU.
> 
> V2:
>   * Change 64-bit to atomic for request count. (Tvrtko Ursulin)
> 
> V3:
>   * Remove mutex for request count.
>   * Rebase.
>   * Fix underflow of the predictive request count. (Tvrtko Ursulin)
> 
> Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
> Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

No, I did not tag this with r-b and you are not allowed to do this!!

> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 1 +
>   drivers/gpu/drm/i915/i915_gem_context.h | 5 +++++
>   drivers/gpu/drm/i915/i915_request.c     | 2 ++
>   drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
>   4 files changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b10770c..0bcbe32 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   	}
>   
>   	trace_i915_context_create(ctx);
> +	atomic_set(&ctx->req_cnt, 0);
>   
>   	return ctx;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index b116e49..e824b15 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -194,6 +194,11 @@ struct i915_gem_context {
>   	 * context close.
>   	 */
>   	struct list_head handles_list;
> +
> +	/** req_cnt: tracks the pending commands, based on which we decide to
> +	 * go for low/medium/high load configuration of the GPU.
> +	 */
> +	atomic_t req_cnt;
>   };
>   
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93c..b90795a 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1113,6 +1113,8 @@ void i915_request_add(struct i915_request *request)
>   	}
>   	request->emitted_jiffies = jiffies;
>   
> +	atomic_inc(&request->gem_context->req_cnt);
> +
>   	/*
>   	 * Let the backend know a new request has arrived that may need
>   	 * to adjust the existing execution schedule due to a high priority
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1744792..d33f5ac 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1127,6 +1127,8 @@ static void execlists_submit_request(struct i915_request *request)
>   	submit_queue(engine, rq_prio(request));
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	atomic_dec(&request->gem_context->req_cnt);
>   }
>   
>   static struct i915_request *sched_to_request(struct i915_sched_node *node)
> 

With such placement of the accounting you are only considering requests 
which are not yet runnable (due to fences and implicit dependencies). If, 
on the contrary, everything is runnable and there is a lot of it waiting 
for the GPU to execute, this counter will show zero, and you'll decide to 
run in a reduced slice/EU configuration. There has to be some benchmark 
which shows the adverse effect of this; you just haven't found it yet, I 
guess.
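
For illustration only - an untested sketch of the concern, not a proposal:
counting the work the GPU still has to execute would mean accounting at the
point a request is submitted for execution and at the point it is retired,
along the lines of:

	/* Hypothetical placement, untested: count requests the GPU has
	 * yet to execute, rather than requests which are not yet runnable.
	 */
	static void execlists_submit_request(struct i915_request *request)
	{
		...
		/* request is now queued for the GPU */
		atomic_inc(&request->gem_context->req_cnt);
	}

	static void i915_request_retire(struct i915_request *request)
	{
		...
		/* request has been executed by the GPU */
		atomic_dec(&request->gem_context->req_cnt);
	}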

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 2/4] drm/i915: Update render power clock state configuration for given context
  2018-12-11 10:14 ` [PATCH v3 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
@ 2018-12-11 12:36   ` Tvrtko Ursulin
  2019-03-14  8:55     ` Ankit Navik
  0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 12:36 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 11/12/2018 10:14, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> This patch updates the render power clock state (RPCS) register at runtime
> based on a flag which can be set by any governor that computes load and
> wants to update the RPCS register. Subsequent patches add a timer based
> governor which computes the pending load/requests.
> 
> V2:
>   * Reuse make_rpcs to get rpcs config. (Tvrtko Ursulin)
> 
> V3:
>   * Rebase.
> 
> Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
> Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Again, I did not give an r-b for this!

> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
>   drivers/gpu/drm/i915/i915_gem_context.h |  9 +++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 12 +++++++++++-
>   3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 0bcbe32..d040858 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -388,6 +388,10 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   
>   	trace_i915_context_create(ctx);
>   	atomic_set(&ctx->req_cnt, 0);
> +	ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
> +	ctx->subslice_cnt = hweight8(
> +			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
> +	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
>   
>   	return ctx;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index e824b15..e000530 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -199,6 +199,15 @@ struct i915_gem_context {
>   	 * go for low/medium/high load configuration of the GPU.
>   	 */
>   	atomic_t req_cnt;
> +
> +	/** slice_cnt: used to set the # of slices to be enabled. */
> +	u8 slice_cnt;
> +
> +	/** subslice_cnt: used to set the # of subslices to be enabled. */
> +	u8 subslice_cnt;
> +
> +	/** eu_cnt: used to set the # of eu to be enabled. */
> +	u8 eu_cnt;
>   };
>   
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d33f5ac..a17f676 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -171,6 +171,7 @@ static void execlists_init_reg_state(u32 *reg_state,
>   				     struct i915_gem_context *ctx,
>   				     struct intel_engine_cs *engine,
>   				     struct intel_ring *ring);
> +static u32 make_rpcs(struct drm_i915_private *dev_priv);
>   
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
> @@ -417,12 +418,21 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
>   
>   static u64 execlists_update_context(struct i915_request *rq)
>   {
> +	u32 rpcs_config = 0;

Move to if block where it is used.

>   	struct intel_context *ce = rq->hw_context;
> +	u32 *reg_state = ce->lrc_reg_state;

No need to touch this.

> +	struct i915_gem_context *ctx = rq->gem_context;

Can go under the if block as well.

>   	struct i915_hw_ppgtt *ppgtt =
>   		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
> -	u32 *reg_state = ce->lrc_reg_state;
>   
>   	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
> +	/* FIXME: To avoid stale rpcs config, move it to context_pin */

This FIXME remains unresolved by the whole series, so that really cannot 
be like that.

I suggest you add https://patchwork.freedesktop.org/patch/261560/ to 
your series and rebase this on top.

Which would actually make this patch not do very much and you should 
probably squash it with the one which actually uses the added fields.

> +	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {

Why name and pid? You are using them as proxy for something.. but for 
what and why? The answer may hold a hint to the proper solution.

> +		rpcs_config = make_rpcs(ctx->i915);
> +		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> +		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +				rpcs_config);
> +	}
>   
>   	/* True 32b PPGTT with dynamic page allocation: update PDP
>   	 * registers and point the unallocated PDPs to scratch page.
> 
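
Purely illustrative (untested): with the declarations narrowed as suggested
above, and leaving aside the pid/name check questioned earlier, the function
could end up shaped like:

	static u64 execlists_update_context(struct i915_request *rq)
	{
		struct intel_context *ce = rq->hw_context;
		struct i915_hw_ppgtt *ppgtt =
			rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
		u32 *reg_state = ce->lrc_reg_state;

		reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);

		if (rq->engine->id == RCS) {
			struct i915_gem_context *ctx = rq->gem_context;
			u32 rpcs_config = make_rpcs(ctx->i915);

			reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
			CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
					GEN8_R_PWR_CLK_STATE, rpcs_config);
		}
		...
	}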

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  2018-12-11 10:14 ` [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2018-12-11 12:47   ` Tvrtko Ursulin
  2018-12-12  7:36     ` Navik, Ankit P
  0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 12:47 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 11/12/2018 10:14, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> This patch selects the optimum eu/slice/sub-slice configuration based on the
> type of load (low, medium, high) given as input. Based on our readings and
> experiments we have a predefined set of optimum configurations for each
> platform (CHT, KBL). i915_gem_context_set_load_type selects the optimum
> configuration from the pre-defined optimum configuration table (opt_config).
> 
> It also introduces the flag update_render_config, which can be set by any
> governor.
> 
> v2:
>   * Move static optimum_config to device init time.
>   * Rename function to appropriate name, fix data types and patch ordering.
>   * Rename prev_load_type to pending_load_type. (Tvrtko Ursulin)
> 
> v3:
>   * Add safe guard check in i915_gem_context_set_load_type.
>   * Rename struct from optimum_config to i915_sseu_optimum_config to
>     avoid namespace clashes.
>   * Reduce memcpy for space efficiency.
>   * Rebase.
>   * Improved commit message. (Tvrtko Ursulin)
> 
> Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
> Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Again for the record no, I did not r-b this.

> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |  3 ++
>   drivers/gpu/drm/i915/i915_gem_context.c  | 18 ++++++++++++
>   drivers/gpu/drm/i915/i915_gem_context.h  | 25 +++++++++++++++++
>   drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_lrc.c         |  4 ++-
>   5 files changed, 94 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4aca534..4b9a8c5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1681,6 +1681,9 @@ struct drm_i915_private {
>   	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
>   	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
>   
> +	/* optimal slice/subslice/EU configration state */
> +	struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST];

Make it a pointer to struct i915_sseu_optimum_config.

> +
>   	unsigned int fsb_freq, mem_freq, is_ddr3;
>   	unsigned int skl_preferred_vco_freq;
>   	unsigned int max_cdclk_freq;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index d040858..c0ced72 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -392,10 +392,28 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   	ctx->subslice_cnt = hweight8(
>   			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
>   	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
> +	ctx->load_type = 0;
> +	ctx->pending_load_type = 0;

Not needed because we zero the allocation and probably depend on 
untouched fields being zero elsewhere.

>   
>   	return ctx;
>   }
>   
> +
> +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> +		enum gem_load_type type)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +
> +	if (GEM_WARN_ON(type > LOAD_TYPE_LAST))
> +		return;
> +
> +	/* Call opt_config to get correct configuration for eu,slice,subslice */
> +	ctx->slice_cnt = dev_priv->opt_config[type].slice;
> +	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
> +	ctx->eu_cnt = dev_priv->opt_config[type].eu;
> +	ctx->pending_load_type = type;
> +}
> +
>   /**
>    * i915_gem_context_create_gvt - create a GVT GEM context
>    * @dev: drm device *
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index e000530..a0db13c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -53,6 +53,19 @@ struct intel_context_ops {
>   	void (*destroy)(struct intel_context *ce);
>   };
>   
> +enum gem_load_type {
> +	LOAD_TYPE_LOW,
> +	LOAD_TYPE_MEDIUM,
> +	LOAD_TYPE_HIGH,
> +	LOAD_TYPE_LAST
> +};
> +
> +struct i915_sseu_optimum_config {
> +	u8 slice;
> +	u8 subslice;
> +	u8 eu;
> +};
> +
>   /**
>    * struct i915_gem_context - client state
>    *
> @@ -208,6 +221,16 @@ struct i915_gem_context {
>   
>   	/** eu_cnt: used to set the # of eu to be enabled. */
>   	u8 eu_cnt;
> +
> +	/** load_type: The designated load_type (high/medium/low) for a given
> +	 * number of pending commands in the command queue.
> +	 */
> +	enum gem_load_type load_type;
> +
> +	/** pending_load_type: The earlier load type that the GPU was configured
> +	 * for (high/medium/low).
> +	 */
> +	enum gem_load_type pending_load_type;
>   };
>   
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> @@ -336,6 +359,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file_priv);
>   int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
>   				       struct drm_file *file);
> +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> +		enum gem_load_type type);
>   
>   struct i915_gem_context *
>   i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio);
> diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
> index 0ef0c64..33c6310 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.c
> +++ b/drivers/gpu/drm/i915/intel_device_info.c
> @@ -741,6 +741,28 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
>   		container_of(info, struct drm_i915_private, info);
>   	enum pipe pipe;
>   
> +	struct i915_sseu_optimum_config *opt_config = NULL;
> +	/* static table of slice/subslice/EU for Cherryview */
> +	struct i915_sseu_optimum_config chv_config[LOAD_TYPE_LAST] = {
> +		{1, 1, 4},	/* Low */
> +		{1, 1, 6},	/* Medium */
> +		{1, 2, 6}	/* High */
> +	};
> +
> +	/* static table of slice/subslice/EU for KBL GT2 */
> +	struct i915_sseu_optimum_config kbl_gt2_config[LOAD_TYPE_LAST] = {
> +		{1, 3, 2},	/* Low */
> +		{1, 3, 4},	/* Medium */
> +		{1, 3, 8}	/* High */
> +	};
> +
> +	/* static table of slice/subslice/EU for KBL GT3 */
> +	struct i915_sseu_optimum_config kbl_gt3_config[LOAD_TYPE_LAST] = {
> +		{2, 3, 4},	/* Low */
> +		{2, 3, 6},	/* Medium */
> +		{2, 3, 8}	/* High */
> +	};

Move these to file scope as static const.

> +
>   	if (INTEL_GEN(dev_priv) >= 10) {
>   		for_each_pipe(dev_priv, pipe)
>   			info->num_scalers[pipe] = 2;
> @@ -840,17 +862,38 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
>   	/* Initialize slice/subslice/EU info */
>   	if (IS_HASWELL(dev_priv))
>   		haswell_sseu_info_init(dev_priv);
> -	else if (IS_CHERRYVIEW(dev_priv))
> +	else if (IS_CHERRYVIEW(dev_priv)) {
>   		cherryview_sseu_info_init(dev_priv);
> +		opt_config = chv_config;

For my idea to work here and below add:

BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);

> +	}
>   	else if (IS_BROADWELL(dev_priv))
>   		broadwell_sseu_info_init(dev_priv);
> -	else if (INTEL_GEN(dev_priv) == 9)
> +	else if (INTEL_GEN(dev_priv) == 9) {
>   		gen9_sseu_info_init(dev_priv);
> +
> +		if (IS_KABYLAKE(dev_priv)) {
> +			switch (info->gt) {
> +			default:
> +				MISSING_CASE(info->gt);

This should probably be silent since absence of sseu tuning is not an 
error on other platforms.

> +				/* fall through */
> +			case 2:
> +				opt_config = kbl_gt2_config;
> +			break;
> +			case 3:
> +				opt_config = kbl_gt3_config;

Would Kabylake table work for other Gen9 platforms?

> +			break;
> +			}
> +		}
> +	}
>   	else if (INTEL_GEN(dev_priv) == 10)
>   		gen10_sseu_info_init(dev_priv);
>   	else if (INTEL_GEN(dev_priv) >= 11)
>   		gen11_sseu_info_init(dev_priv);
>   
> +	if (opt_config)
> +		memcpy(dev_priv->opt_config, opt_config, LOAD_TYPE_LAST *
> +			sizeof(struct i915_sseu_optimum_config));

And if you agree with my other suggestions then you can replace the copy with:

	dev_priv->opt_config = opt_config;

Or in fact, even do it directly on assignment sites above.
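
Putting those suggestions together, the overall shape would be roughly
(illustrative sketch only):

	/* file scope in intel_device_info.c */
	static const struct i915_sseu_optimum_config chv_config[] = {
		{ 1, 1, 4 },	/* Low */
		{ 1, 1, 6 },	/* Medium */
		{ 1, 2, 6 },	/* High */
	};

	/* in struct drm_i915_private */
	const struct i915_sseu_optimum_config *opt_config;

	/* in intel_device_info_runtime_init() */
	else if (IS_CHERRYVIEW(dev_priv)) {
		cherryview_sseu_info_init(dev_priv);
		BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);
		dev_priv->opt_config = chv_config;
	}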

> +
>   	/* Initialize command stream timestamp frequency */
>   	info->cs_timestamp_frequency_khz = read_timestamp_frequency(dev_priv);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a17f676..7fb9cd2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -427,11 +427,13 @@ static u64 execlists_update_context(struct i915_request *rq)
>   
>   	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
>   	/* FIXME: To avoid stale rpcs config, move it to context_pin */
> -	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
> +	if (ctx->pid && ctx->name && (rq->engine->id == RCS) &&
> +			(ctx->load_type != ctx->pending_load_type)) {
>   		rpcs_config = make_rpcs(ctx->i915);
>   		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>   				rpcs_config);
> +		ctx->load_type = ctx->pending_load_type;
>   	}
>   
>   	/* True 32b PPGTT with dynamic page allocation: update PDP
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 4/4] drm/i915: Predictive governor to control eu/slice/subslice
  2018-12-11 10:14 ` [PATCH v3 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
@ 2018-12-11 13:00   ` Tvrtko Ursulin
  0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 13:00 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 11/12/2018 10:14, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> A high resolution timer is used by the predictive governor to control
> eu/slice/subslice based on workload.
> 
> Debugfs is provided to enable/disable/update the timer configuration.
> 
> V2:
>   * Fix code style.
>   * Move predictive_load_timer into a drm_i915_private
>     structure.
>   * Make generic function to set optimum config. (Tvrtko Ursulin)
> 
> V3:
>   * Rebase.
>   * Fix race condition for predictive load set.
>   * Add slack when starting the hrtimer for better power efficiency.
>     (Tvrtko Ursulin)
> 
> Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

For the record - no, I did not r-b this. Please remove all false r-b tags.

> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c | 90 ++++++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_drv.c     |  4 ++
>   drivers/gpu/drm/i915/i915_drv.h     |  6 +++
>   3 files changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index f9ce35d..861f3c1 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4740,6 +4740,92 @@ static const struct drm_info_list i915_debugfs_list[] = {
>   	{"i915_drrs_status", i915_drrs_status, 0},
>   	{"i915_rps_boost_info", i915_rps_boost_info, 0},
>   };
> +
> +#define PENDING_REQ_0	0	/* No active request pending */

Still seems silly - what is the benefit of adding a define with zero in
its name, which evaluates to zero? Just remove it.
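
E.g. the zero check further down would then simply read (sketch):

	if (!req_pending)
		continue;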

> +
> +/*
> + * Anything above threshold is considered as HIGH load, less is considered
> + * as LOW load and equal is considered as MEDIUM load.
> + *
> + * The threshold value of three active requests pending.
> + */
> +#define PENDING_THRESHOLD_MEDIUM 3
> +
> +#define SLACK_TIMER_NSEC 1000000 /* Timer range in nano second */
> +
> +enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer)
> +{
> +	struct drm_i915_private *dev_priv =
> +			container_of(hrtimer, typeof(*dev_priv), pred_timer);
> +	struct i915_gem_context *ctx;
> +	enum gem_load_type load_type;
> +	unsigned long req_pending;

unsigned int is enough to hold atomic_t.
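
I.e. (sketch):

	unsigned int req_pending = atomic_read(&ctx->req_cnt);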

> +
> +	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> +
> +		/* Check for invalid context */
> +		if (!ctx->name)
> +			continue;

What kind of contexts are these invalid ones? :)

> +
> +		req_pending = atomic_read(&ctx->req_cnt);
> +
> +		/*
> +		 * Transitioning to low state whenever pending request is zero
> +		 * would cause vacillation between low and high state.
> +		 */
> +		if (req_pending == PENDING_REQ_0)
> +			continue;
> +
> +		if (req_pending > PENDING_THRESHOLD_MEDIUM)
> +			load_type = LOAD_TYPE_HIGH;
> +		else if (req_pending == PENDING_THRESHOLD_MEDIUM)
> +			load_type = LOAD_TYPE_MEDIUM;
> +		else
> +			load_type = LOAD_TYPE_LOW;
> +
> +		i915_gem_context_set_load_type(ctx, load_type);
> +	}
> +
> +	hrtimer_forward_now(hrtimer,
> +			ms_to_ktime(dev_priv->predictive_load_enable));
> +
> +	return HRTIMER_RESTART;
> +}
> +
> +static int i915_predictive_load_get(void *data, u64 *val)
> +{
> +	struct drm_i915_private *dev_priv = data;
> +
> +	*val = dev_priv->predictive_load_enable;
> +	return 0;
> +}
> +
> +static int i915_predictive_load_set(void *data, u64 val)
> +{
> +	struct drm_i915_private *dev_priv = data;
> +
> +	mutex_lock(&dev_priv->pred_mutex);
> +
> +	dev_priv->predictive_load_enable = val;
> +
> +	if (dev_priv->predictive_load_enable) {

The period change applies after the current timer expires, okay.
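
If you wanted the new period to take effect immediately instead, one
possible (untested) variant inside i915_predictive_load_set would be to
restart the timer unconditionally:

	hrtimer_cancel(&dev_priv->pred_timer);
	hrtimer_start_range_ns(&dev_priv->pred_timer, ms_to_ktime(val),
			       SLACK_TIMER_NSEC, HRTIMER_MODE_REL_PINNED);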

> +		if (!hrtimer_active(&dev_priv->pred_timer))
> +			hrtimer_start_range_ns(&dev_priv->pred_timer,
> +			ms_to_ktime(dev_priv->predictive_load_enable),
> +			SLACK_TIMER_NSEC,
> +			HRTIMER_MODE_REL_PINNED);
> +	} else {
> +		hrtimer_cancel(&dev_priv->pred_timer);
> +	}
> +
> +	mutex_unlock(&dev_priv->pred_mutex);
> +	return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(i915_predictive_load_ctl,
> +			i915_predictive_load_get, i915_predictive_load_set,
> +			"%llu\n");
> +
>   #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
>   
>   static const struct i915_debugfs_files {
> @@ -4769,7 +4855,9 @@ static const struct i915_debugfs_files {
>   	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
>   	{"i915_ipc_status", &i915_ipc_status_fops},
>   	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
> -	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
> +	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops},
> +	/* FIXME: When feature will become real, move to sysfs */
> +	{"i915_predictive_load_ctl", &i915_predictive_load_ctl}
>   };
>   
>   int i915_debugfs_register(struct drm_i915_private *dev_priv)
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f8cfd16..79f4df5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1397,6 +1397,10 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	if (ret < 0)
>   		goto out_cleanup_hw;
>   
> +	/* Timer initialization for predictive load */
> +	hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	dev_priv->pred_timer.function = predictive_load_cb;

mutex_init(&dev_priv->pred_mutex) is missing.

And please move this whole block to i915_gem_init_early.
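
Something along these lines in i915_gem_init_early (untested sketch):

	mutex_init(&dev_priv->pred_mutex);
	hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC,
		     HRTIMER_MODE_REL);
	dev_priv->pred_timer.function = predictive_load_cb;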

> +
>   	i915_driver_register(dev_priv);
>   
>   	intel_runtime_pm_enable(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4b9a8c5..a78fdbc 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1684,6 +1684,11 @@ struct drm_i915_private {
>   	/* optimal slice/subslice/EU configration state */
>   	struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST];
>   
> +	/* protects predictive load state */
> +	struct mutex pred_mutex;
> +	struct hrtimer pred_timer;
> +	int predictive_load_enable;
> +
>   	unsigned int fsb_freq, mem_freq, is_ddr3;
>   	unsigned int skl_preferred_vco_freq;
>   	unsigned int max_cdclk_freq;
> @@ -2730,6 +2735,7 @@ extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
>   extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
>   extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
>   extern void i915_update_gfx_val(struct drm_i915_private *dev_priv);
> +extern enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer);
>   int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
>   
>   int intel_engines_init_mmio(struct drm_i915_private *dev_priv);
> 

Looking smaller and better, right?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-11 11:12 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU Patchwork
@ 2018-12-11 13:02   ` Tvrtko Ursulin
  0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-12-11 13:02 UTC (permalink / raw)
  To: intel-gfx, Patchwork, Navik, Ankit P


On 11/12/2018 11:12, Patchwork wrote:
> == Series Details ==
> 
> Series: Dynamic EU configuration of Slice/Subslice/EU.
> URL   : https://patchwork.freedesktop.org/series/53876/
> State : failure
> 
> == Summary ==
> 
> Applying: drm/i915: Get active pending request for given context
> Using index info to reconstruct a base tree...
> M	drivers/gpu/drm/i915/i915_gem_context.c
> M	drivers/gpu/drm/i915/i915_gem_context.h
> M	drivers/gpu/drm/i915/i915_request.c
> M	drivers/gpu/drm/i915/intel_lrc.c
> Falling back to patching base and 3-way merge...
> Auto-merging drivers/gpu/drm/i915/intel_lrc.c
> Auto-merging drivers/gpu/drm/i915/i915_request.c
> Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
> Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
> Applying: drm/i915: Update render power clock state configuration for given context
> Using index info to reconstruct a base tree...
> M	drivers/gpu/drm/i915/i915_gem_context.c
> M	drivers/gpu/drm/i915/i915_gem_context.h
> M	drivers/gpu/drm/i915/intel_lrc.c
> Falling back to patching base and 3-way merge...
> Auto-merging drivers/gpu/drm/i915/intel_lrc.c
> CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/intel_lrc.c
> Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
> Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
> error: Failed to merge in the changes.
> Patch failed at 0002 drm/i915: Update render power clock state configuration for given context
> Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

Always rebase before sending if you are not sure, since drm-tip can be
pretty dynamic.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  2018-12-11 12:47   ` Tvrtko Ursulin
@ 2018-12-12  7:36     ` Navik, Ankit P
  0 siblings, 0 replies; 18+ messages in thread
From: Navik, Ankit P @ 2018-12-12  7:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Hi Tvrtko, 

> On Tue, Dec 11, 2018 at 6:17 PM Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> 
> On 11/12/2018 10:14, Ankit Navik wrote:
> > From: Praveen Diwakar <praveen.diwakar@intel.com>
> >
> > This patch will select optimum eu/slice/sub-slice configuration based
> > on type of load (low, medium, high) as input.
> > Based on our readings and experiments we have predefined set of
> > optimum configuration for each platform(CHT, KBL).
> > i915_gem_context_set_load_type will select optimum configuration from
> > pre-defined optimum configuration table(opt_config).
> >
> > It also introduce flag update_render_config which can set by any governor.
> >
> > v2:
> >   * Move static optimum_config to device init time.
> >   * Rename function to appropriate name, fix data types and patch ordering.
> >   * Rename prev_load_type to pending_load_type. (Tvrtko Ursulin)
> >
> > v3:
> >   * Add safe guard check in i915_gem_context_set_load_type.
> >   * Rename struct from optimum_config to i915_sseu_optimum_config to
> >     avoid namespace clashes.
> >   * Reduce memcpy for space efficiency.
> >   * Rebase.
> >   * Improved commit message. (Tvrtko Ursulin)
> >
> > Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
> > Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> 
> Again for the record no, I did not r-b this.

Sorry, I wasn't aware that the r-b tag has to be added by the reviewer. I will take care of this in the next patch sets.
> 
> > Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> > Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> > Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h          |  3 ++
> >   drivers/gpu/drm/i915/i915_gem_context.c  | 18 ++++++++++++
> >   drivers/gpu/drm/i915/i915_gem_context.h  | 25 +++++++++++++++++
> >   drivers/gpu/drm/i915/intel_device_info.c | 47
> ++++++++++++++++++++++++++++++--
> >   drivers/gpu/drm/i915/intel_lrc.c         |  4 ++-
> >   5 files changed, 94 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index 4aca534..4b9a8c5 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1681,6 +1681,9 @@ struct drm_i915_private {
> >   	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /*
> assume 965 */
> >   	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
> >
> > +	/* optimal slice/subslice/EU configration state */
> > +	struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST];
> 
> Make it a pointer to struct i915_sseu_optimum_config.

Will incorporate this and other reviews in next patch.
> 
> > +
> >   	unsigned int fsb_freq, mem_freq, is_ddr3;
> >   	unsigned int skl_preferred_vco_freq;
> >   	unsigned int max_cdclk_freq;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index d040858..c0ced72 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -392,10 +392,28 @@ i915_gem_create_context(struct drm_i915_private
> *dev_priv,
> >   	ctx->subslice_cnt = hweight8(
> >   			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
> >   	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
> > +	ctx->load_type = 0;
> > +	ctx->pending_load_type = 0;
> 
> Not needed because we zero the allocation and probably depend on untouched
> fields being zero elsewhere.
> 
> >
> >   	return ctx;
> >   }
> >
> > +
> > +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> > +		enum gem_load_type type)
> > +{
> > +	struct drm_i915_private *dev_priv = ctx->i915;
> > +
> > +	if (GEM_WARN_ON(type > LOAD_TYPE_LAST))
> > +		return;
> > +
> > +	/* Call opt_config to get correct configuration for eu,slice,subslice */
> > +	ctx->slice_cnt = dev_priv->opt_config[type].slice;
> > +	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
> > +	ctx->eu_cnt = dev_priv->opt_config[type].eu;
> > +	ctx->pending_load_type = type;
> > +}
> > +
> >   /**
> >    * i915_gem_context_create_gvt - create a GVT GEM context
> >    * @dev: drm device *
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h
> > b/drivers/gpu/drm/i915/i915_gem_context.h
> > index e000530..a0db13c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> > @@ -53,6 +53,19 @@ struct intel_context_ops {
> >   	void (*destroy)(struct intel_context *ce);
> >   };
> >
> > +enum gem_load_type {
> > +	LOAD_TYPE_LOW,
> > +	LOAD_TYPE_MEDIUM,
> > +	LOAD_TYPE_HIGH,
> > +	LOAD_TYPE_LAST
> > +};
> > +
> > +struct i915_sseu_optimum_config {
> > +	u8 slice;
> > +	u8 subslice;
> > +	u8 eu;
> > +};
> > +
> >   /**
> >    * struct i915_gem_context - client state
> >    *
> > @@ -208,6 +221,16 @@ struct i915_gem_context {
> >
> >   	/** eu_cnt: used to set the # of eu to be enabled. */
> >   	u8 eu_cnt;
> > +
> > +	/** load_type: The designated load_type (high/medium/low) for a given
> > +	 * number of pending commands in the command queue.
> > +	 */
> > +	enum gem_load_type load_type;
> > +
> > +	/** pending_load_type: The earlier load type that the GPU was
> configured
> > +	 * for (high/medium/low).
> > +	 */
> > +	enum gem_load_type pending_load_type;
> >   };
> >
> >   static inline bool i915_gem_context_is_closed(const struct
> > i915_gem_context *ctx) @@ -336,6 +359,8 @@ int
> i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >   				    struct drm_file *file_priv);
> >   int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
> >   				       struct drm_file *file);
> > +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> > +		enum gem_load_type type);
> >
> >   struct i915_gem_context *
> >   i915_gem_context_create_kernel(struct drm_i915_private *i915, int
> > prio); diff --git a/drivers/gpu/drm/i915/intel_device_info.c
> > b/drivers/gpu/drm/i915/intel_device_info.c
> > index 0ef0c64..33c6310 100644
> > --- a/drivers/gpu/drm/i915/intel_device_info.c
> > +++ b/drivers/gpu/drm/i915/intel_device_info.c
> > @@ -741,6 +741,28 @@ void intel_device_info_runtime_init(struct
> intel_device_info *info)
> >   		container_of(info, struct drm_i915_private, info);
> >   	enum pipe pipe;
> >
> > +	struct i915_sseu_optimum_config *opt_config = NULL;
> > +	/* static table of slice/subslice/EU for Cherryview */
> > +	struct i915_sseu_optimum_config chv_config[LOAD_TYPE_LAST] = {
> > +		{1, 1, 4},	/* Low */
> > +		{1, 1, 6},	/* Medium */
> > +		{1, 2, 6}	/* High */
> > +	};
> > +
> > +	/* static table of slice/subslice/EU for KBL GT2 */
> > +	struct i915_sseu_optimum_config kbl_gt2_config[LOAD_TYPE_LAST] = {
> > +		{1, 3, 2},	/* Low */
> > +		{1, 3, 4},	/* Medium */
> > +		{1, 3, 8}	/* High */
> > +	};
> > +
> > +	/* static table of slice/subslice/EU for KBL GT3 */
> > +	struct i915_sseu_optimum_config kbl_gt3_config[LOAD_TYPE_LAST] = {
> > +		{2, 3, 4},	/* Low */
> > +		{2, 3, 6},	/* Medium */
> > +		{2, 3, 8}	/* High */
> > +	};
> 
> Move these to file scope as static const.
> 
> > +
> >   	if (INTEL_GEN(dev_priv) >= 10) {
> >   		for_each_pipe(dev_priv, pipe)
> >   			info->num_scalers[pipe] = 2;
> > @@ -840,17 +862,38 @@ void intel_device_info_runtime_init(struct
> intel_device_info *info)
> >   	/* Initialize slice/subslice/EU info */
> >   	if (IS_HASWELL(dev_priv))
> >   		haswell_sseu_info_init(dev_priv);
> > -	else if (IS_CHERRYVIEW(dev_priv))
> > +	else if (IS_CHERRYVIEW(dev_priv)) {
> >   		cherryview_sseu_info_init(dev_priv);
> > +		opt_config = chv_config;
> 
> For my idea to work here and below add:
> 
> BUILD_BUG_ON(ARRAY_SIZE(chv_config) != LOAD_TYPE_LAST);
> 
> > +	}
> >   	else if (IS_BROADWELL(dev_priv))
> >   		broadwell_sseu_info_init(dev_priv);
> > -	else if (INTEL_GEN(dev_priv) == 9)
> > +	else if (INTEL_GEN(dev_priv) == 9) {
> >   		gen9_sseu_info_init(dev_priv);
> > +
> > +		if (IS_KABYLAKE(dev_priv)) {
> > +			switch (info->gt) {
> > +			default:
> > +				MISSING_CASE(info->gt);
> 
> This should probably be silent since absence of sseu tuning is not an error on
> other platforms.
> 
> > +				/* fall through */
> > +			case 2:
> > +				opt_config = kbl_gt2_config;
> > +			break;
> > +			case 3:
> > +				opt_config = kbl_gt3_config;
> 
> Would Kabylake table work for other Gen9 platforms?

The table is derived from heuristic data (collected on a Gen9 GT3 platform).
It should ideally work for other platforms that have 48 EUs
(2 slices, 3 sub-slices, 8 EUs per sub-slice), but I haven't verified it on
other Gen9 configurations. I will check and update further in the next patch set.

Thank you for your valuable feedback!

Regards,  Ankit
> 
> > +			break;
> > +			}
> > +		}
> > +	}
> >   	else if (INTEL_GEN(dev_priv) == 10)
> >   		gen10_sseu_info_init(dev_priv);
> >   	else if (INTEL_GEN(dev_priv) >= 11)
> >   		gen11_sseu_info_init(dev_priv);
> >
> > +	if (opt_config)
> > +		memcpy(dev_priv->opt_config, opt_config, LOAD_TYPE_LAST *
> > +			sizeof(struct i915_sseu_optimum_config));
> 
> And if you agree with my other suggestions then you can replace a copy with:
> 
> 	dev_priv->opt_config = opt_config;
> 
> Or in fact, even do it directly on assignment sites above.
> 
> > +
> >   	/* Initialize command stream timestamp frequency */
> >   	info->cs_timestamp_frequency_khz =
> read_timestamp_frequency(dev_priv);
> >   }
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index a17f676..7fb9cd2 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -427,11 +427,13 @@ static u64 execlists_update_context(struct
> > i915_request *rq)
> >
> >   	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
> >   	/* FIXME: To avoid stale rpcs config, move it to context_pin */
> > -	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
> > +	if (ctx->pid && ctx->name && (rq->engine->id == RCS) &&
> > +			(ctx->load_type != ctx->pending_load_type)) {
> >   		rpcs_config = make_rpcs(ctx->i915);
> >   		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> >   		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
> GEN8_R_PWR_CLK_STATE,
> >   				rpcs_config);
> > +		ctx->load_type = ctx->pending_load_type;
> >   	}
> >
> >   	/* True 32b PPGTT with dynamic page allocation: update PDP
> >
> 
> Regards,
> 
> Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-11 11:48 ` [PATCH v3 0/4] " Tvrtko Ursulin
@ 2018-12-12  7:43   ` Navik, Ankit P
  0 siblings, 0 replies; 18+ messages in thread
From: Navik, Ankit P @ 2018-12-12  7:43 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Hi Tvrtko, 

On Tue, Dec 11, 2018 at 5:18 PM Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> 
> On 11/12/2018 10:14, Ankit Navik wrote:
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > Current GPU configuration code for i915 does not allow us to change
> > EU/Slice/Sub-slice configuration dynamically. Its done only once while
> > context is created.
> >
> > While particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is not
> consistent.
> > It means there is scope to change the graphics configuration
> > dynamically even while context is running actively. This patch series
> > proposes the solution to find the active pending load for all active
> > context at given time and based on that, dynamically perform graphics
> configuration for each context.
> >
> > We use a hr (high resolution) timer with i915 driver in kernel to get
> > a callback every few milliseconds (this timer value can be configured
> > through debugfs, default is '0' indicating timer is in disabled state
> > i.e. original system without any intervention).In the timer callback,
> > we examine pending commands for a context in the queue, essentially,
> > we intercept them before they are executed by GPU and we update context
> with required number of EUs.
> >
> > Two questions, how did we arrive at right timer value? and what's the
> > right number of EUs? For the prior one, empirical data to achieve best
> > performance in least power was considered. For the later one, we
> > roughly categorized number of EUs logically based on platform. Now we
> > compare number of pending commands with a particular threshold and
> > then set number of EUs accordingly with update context. That threshold
> > is also based on experiments & findings. If GPU is able to catch up
> > with CPU, typically there are no pending commands, the EU config would
> > remain unchanged there. In case there are more pending commands we
> > reprogram context with higher number of EUs. Please note, here we are
> changing EUs even while context is running by examining pending commands
> every 'x'
> > milliseconds.
> >
> > With this solution in place, on KBL-GT3 + Android we saw following pnp
> > benefits, power numbers mentioned here are system power.
> >
> > App /KPI               | % Power |
> >                         | Benefit |
> >                         |  (mW)   |
> > ---------------------------------|
> > 3D Mark (Ice storm)    | 2.30%   |
> > TRex On screen         | 2.49%   |
> > TRex Off screen        | 1.32%   |
> > ManhattanOn screen     | 3.11%   |
> > Manhattan Off screen   | 0.89%   |
> > AnTuTu  6.1.4          | 3.42%   |
> > SynMark2               | 1.70%   |
> 
> Is this the aggregated SynMark2 result, like all sub-tests averaged or something?

Yes, it is the averaged result covering all the test cases.
> 
> I suggest you do want to list much more detail here, all individual sub-tests,
> different platforms, etc. The change you are proposing is quite big and the
> amount of research that you must demonstrate for people to take this seriously
> has to be equally exhaustive.

I will verify and add more details covering various platforms and sub-tests.

Regards, Ankit 
> 
> Regards,
> 
> Tvrtko
> 
> >
> > Note - For KBL (GEN9) we cannot control at sub-slice level, it was
> > always  a constraint.
> > We always controlled number of EUs rather than sub-slices/slices.
> > We have also observed GPU core residencies improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >    drm/i915: Get active pending request for given context
> >    drm/i915: Update render power clock state configuration for given
> >      context
> >    drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >      type
> >    drm/i915: Predictive governor to control eu/slice/subslice
> >
> >   drivers/gpu/drm/i915/i915_debugfs.c      | 90
> +++++++++++++++++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >   drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
> >   drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
> >   drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c      |  2 +
> >   drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
> >   8 files changed, 226 insertions(+), 4 deletions(-)
> >
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-11 10:14 [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (5 preceding siblings ...)
  2018-12-11 11:48 ` [PATCH v3 0/4] " Tvrtko Ursulin
@ 2018-12-14 10:27 ` Joonas Lahtinen
  2018-12-14 12:09   ` Navik, Ankit P
  6 siblings, 1 reply; 18+ messages in thread
From: Joonas Lahtinen @ 2018-12-14 10:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: ankit.p.navik

Quoting Ankit Navik (2018-12-11 12:14:17)
> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
> 
> Current GPU configuration code for i915 does not allow us to change
> EU/Slice/Sub-slice configuration dynamically. Its done only once while context
> is created.
> 
> While particular graphics application is running, if we examine the command
> requests from user space, we observe that command density is not consistent.
> It means there is scope to change the graphics configuration dynamically even
> while context is running actively. This patch series proposes the solution to
> find the active pending load for all active context at given time and based on
> that, dynamically perform graphics configuration for each context.
> 
> We use a hr (high resolution) timer with i915 driver in kernel to get a
> callback every few milliseconds (this timer value can be configured through
> debugfs, default is '0' indicating timer is in disabled state i.e. original
> system without any intervention).In the timer callback, we examine pending
> commands for a context in the queue, essentially, we intercept them before
> they are executed by GPU and we update context with required number of EUs.
> 
> Two questions, how did we arrive at right timer value? and what's the right
> number of EUs? For the prior one, empirical data to achieve best performance
> in least power was considered. For the later one, we roughly categorized number 
> of EUs logically based on platform. Now we compare number of pending commands
> with a particular threshold and then set number of EUs accordingly with update
> context. That threshold is also based on experiments & findings. If GPU is able
> to catch up with CPU, typically there are no pending commands, the EU config
> would remain unchanged there. In case there are more pending commands we
> reprogram context with higher number of EUs. Please note, here we are changing
> EUs even while context is running by examining pending commands every 'x'
> milliseconds.

On the overall strategy. This will be unsuitable to be merged as a
debugfs interface. So is the idea to evolve into a sysfs interface? As
this seems to require tuning for each specific workload, I don't think
that would scale too well if you consider a desktop distro?

Also, there's the patch series to enable/disable subslices with VME
hardware (the other dynamic slice shutdown/SSEU series) depending on the
type of load being run. Certain workloads would hang the system if
they're executed with full subslice configuration. In that light, it
would make more sense if the applications would be the ones reporting
their optimal running configuration.

> 
> With this solution in place, on KBL-GT3 + Android we saw following pnp
> benefits, power numbers mentioned here are system power.
> 
> App /KPI               | % Power |
>                        | Benefit |
>                        |  (mW)   |
> ---------------------------------|
> 3D Mark (Ice storm)    | 2.30%   |
> TRex On screen         | 2.49%   |
> TRex Off screen        | 1.32%   |
> ManhattanOn screen     | 3.11%   |
> Manhattan Off screen   | 0.89%   |
> AnTuTu  6.1.4          | 3.42%   |
> SynMark2               | 1.70%   |

Just to verify, these numbers are true while there's no negative effect
on the benchmark scores?

Regards, Joonas

> Note - For KBL (GEN9) we cannot control at sub-slice level, it was always  a
> constraint.
> We always controlled number of EUs rather than sub-slices/slices.
> We have also observed GPU core residencies improves by 1.03%.
> 
> Praveen Diwakar (4):
>   drm/i915: Get active pending request for given context
>   drm/i915: Update render power clock state configuration for given
>     context
>   drm/i915: set optimum eu/slice/sub-slice configuration based on load
>     type
>   drm/i915: Predictive governor to control eu/slice/subslice
> 
>  drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
>  drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
>  drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
>  drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
>  drivers/gpu/drm/i915/i915_request.c      |  2 +
>  drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
>  8 files changed, 226 insertions(+), 4 deletions(-)
> 
> -- 
> 2.7.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-12-14 10:27 ` Joonas Lahtinen
@ 2018-12-14 12:09   ` Navik, Ankit P
  0 siblings, 0 replies; 18+ messages in thread
From: Navik, Ankit P @ 2018-12-14 12:09 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

Hi Joonas, 

On Fri, Dec 14, 2018 at 3:57 PM Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> 
> Quoting Ankit Navik (2018-12-11 12:14:17)
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > Current GPU configuration code for i915 does not allow us to change
> > EU/Slice/Sub-slice configuration dynamically. Its done only once while
> > context is created.
> >
> > While particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is not
> consistent.
> > It means there is scope to change the graphics configuration
> > dynamically even while context is running actively. This patch series
> > proposes the solution to find the active pending load for all active
> > context at given time and based on that, dynamically perform graphics
> configuration for each context.
> >
> > We use a hr (high resolution) timer with i915 driver in kernel to get
> > a callback every few milliseconds (this timer value can be configured
> > through debugfs, default is '0' indicating timer is in disabled state
> > i.e. original system without any intervention).In the timer callback,
> > we examine pending commands for a context in the queue, essentially,
> > we intercept them before they are executed by GPU and we update context
> with required number of EUs.
> >
> > Two questions, how did we arrive at right timer value? and what's the
> > right number of EUs? For the prior one, empirical data to achieve best
> > performance in least power was considered. For the later one, we
> > roughly categorized number of EUs logically based on platform. Now we
> > compare number of pending commands with a particular threshold and
> > then set number of EUs accordingly with update context. That threshold
> > is also based on experiments & findings. If GPU is able to catch up
> > with CPU, typically there are no pending commands, the EU config would
> > remain unchanged there. In case there are more pending commands we
> > reprogram context with higher number of EUs. Please note, here we are
> changing EUs even while context is running by examining pending commands
> every 'x'
> > milliseconds.
> 
> On the overall strategy. This will be unsuitable to be merged as a debugfs
> interface. So is the idea to evolve into a sysfs interface? As this seems to require
> tuning for each specific workload, I don't think that would scale too well if you
> consider a desktop distro?

We initially started with a debugfs interface; I have added a comment to
move the functionality to a sysfs interface. Yes, I will consider the desktop
distro case and share detailed results.
> 
> Also, there's the patch series to enable/disable subslices with VME hardware
> (the other dynamic slice shutdown/SSEU series) depending on the type of load
> being run. Certain workloads would hang the system if they're executed with full
> subslice configuration. In that light, it would make more sense if the applications
> would be the ones reporting their optimal running configuration.

I think that series exposes RPCS for Gen 11 only, for the VME use case.
I have tested the patch on KBL (Gen 9) and will consider other Gen 9 platforms as well.

> 
> >
> > With this solution in place, on KBL-GT3 + Android we saw following pnp
> > benefits, power numbers mentioned here are system power.
> >
> > App /KPI               | % Power |
> >                        | Benefit |
> >                        |  (mW)   |
> > ---------------------------------|
> > 3D Mark (Ice storm)    | 2.30%   |
> > TRex On screen         | 2.49%   |
> > TRex Off screen        | 1.32%   |
> > ManhattanOn screen     | 3.11%   |
> > Manhattan Off screen   | 0.89%   |
> > AnTuTu  6.1.4          | 3.42%   |
> > SynMark2               | 1.70%   |
> 
> Just to verify, these numbers are true while there's no negative effect on the
> benchmark scores?

Yes, there is no impact on the benchmark scores.
Thank you, Joonas, for your valuable feedback.

Regards, Ankit

> 
> Regards, Joonas
> 
> > Note - For KBL (GEN9) we cannot control at sub-slice level, it was
> > always  a constraint.
> > We always controlled number of EUs rather than sub-slices/slices.
> > We have also observed GPU core residencies improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >     type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c      | 90
> +++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >  drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
> >  drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
> > drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
> >  drivers/gpu/drm/i915/i915_request.c      |  2 +
> >  drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
> >  8 files changed, 226 insertions(+), 4 deletions(-)
> >
> > --
> > 2.7.4
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH v3 2/4] drm/i915: Update render power clock state configuration for given context
  2018-12-11 12:36   ` Tvrtko Ursulin
@ 2019-03-14  8:55     ` Ankit Navik
  0 siblings, 0 replies; 18+ messages in thread
From: Ankit Navik @ 2019-03-14  8:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Ankit Navik, intel-gfx


Hi Tvrtko,

On Tue, Dec 11, 2018 at 6:06 PM Tvrtko Ursulin <
tvrtko.ursulin@linux.intel.com> wrote:

>
> On 11/12/2018 10:14, Ankit Navik wrote:
> > From: Praveen Diwakar <praveen.diwakar@intel.com>
> >
> > This patch will update power clock state register at runtime base on the
> > flag which can set by any governor which computes load and want to update
> > rpcs register.
> > subsequent patches will have a timer based governor which computes
> pending
> > load/request.
> >
> > V2:
> >   * Reuse make_rpcs to get rpcs config. (Tvrtko Ursulin)
> >
> > V3:
> >   * Rebase.
> >
> > Cc: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> > Cc: Kedar J Karanje <kedar.j.karanje@intel.com>
> > Cc: Yogesh Marathe <yogesh.marathe@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>
> Again, I did not give an r-b for this!
>
> > Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> > Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
> >   drivers/gpu/drm/i915/i915_gem_context.h |  9 +++++++++
> >   drivers/gpu/drm/i915/intel_lrc.c        | 12 +++++++++++-
> >   3 files changed, 24 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 0bcbe32..d040858 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -388,6 +388,10 @@ i915_gem_create_context(struct drm_i915_private
> *dev_priv,
> >
> >       trace_i915_context_create(ctx);
> >       atomic_set(&ctx->req_cnt, 0);
> > +     ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
> > +     ctx->subslice_cnt = hweight8(
> > +                     INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
> > +     ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
> >
> >       return ctx;
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h
> b/drivers/gpu/drm/i915/i915_gem_context.h
> > index e824b15..e000530 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> > @@ -199,6 +199,15 @@ struct i915_gem_context {
> >        * go for low/medium/high load configuration of the GPU.
> >        */
> >       atomic_t req_cnt;
> > +
> > +     /** slice_cnt: used to set the # of slices to be enabled. */
> > +     u8 slice_cnt;
> > +
> > +     /** subslice_cnt: used to set the # of subslices to be enabled. */
> > +     u8 subslice_cnt;
> > +
> > +     /** eu_cnt: used to set the # of eu to be enabled. */
> > +     u8 eu_cnt;
> >   };
> >
> >   static inline bool i915_gem_context_is_closed(const struct
> i915_gem_context *ctx)
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> > index d33f5ac..a17f676 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -171,6 +171,7 @@ static void execlists_init_reg_state(u32 *reg_state,
> >                                    struct i915_gem_context *ctx,
> >                                    struct intel_engine_cs *engine,
> >                                    struct intel_ring *ring);
> > +static u32 make_rpcs(struct drm_i915_private *dev_priv);
> >
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> > @@ -417,12 +418,21 @@ execlists_update_context_pdps(struct i915_hw_ppgtt
> *ppgtt, u32 *reg_state)
> >
> >   static u64 execlists_update_context(struct i915_request *rq)
> >   {
> > +     u32 rpcs_config = 0;
>
> Move to if block where it is used.
>
> >       struct intel_context *ce = rq->hw_context;
> > +     u32 *reg_state = ce->lrc_reg_state;
>
> No need to touch this.
>
> > +     struct i915_gem_context *ctx = rq->gem_context;
>
> Can go under the if block as well.
>
> >       struct i915_hw_ppgtt *ppgtt =
> >               rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
> > -     u32 *reg_state = ce->lrc_reg_state;
> >
> >       reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring,
> rq->tail);
> > +     /* FIXME: To avoid stale rpcs config, move it to context_pin */
>
> This FIXME remains unresolved by the whole series, so that really cannot
> be like that.
>
> I suggest you add https://patchwork.freedesktop.org/patch/261560/ to
> your series and rebase this on top.
>
> Which would actually make this patch not do very much and you should
> probably squash it with the one which actually uses the added fields.
>

I have submitted the v4 patch and reverted to the previous function,
i.e. get_context_rpcs_config, as make_rpcs has changed in mainline.
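
For reference, the reworked hunk in v4 looks roughly like this (sketch,
untested; assuming the helper keeps the get_context_rpcs_config name):

	if (ctx->pid && ctx->name && rq->engine->id == RCS &&
	    ctx->load_type != ctx->pending_load_type) {
		u32 rpcs_config = get_context_rpcs_config(ctx->i915);

		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
			rpcs_config);
		ctx->load_type = ctx->pending_load_type;
	}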

Regards, Ankit

>
> > +     if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
>
> Why name and pid? You are using them as proxy for something.. but for
> what and why? The answer may hold a hint to the proper solution.
>
> > +             rpcs_config = make_rpcs(ctx->i915);
> > +             reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> > +             CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
> GEN8_R_PWR_CLK_STATE,
> > +                             rpcs_config);
> > +     }
> >
> >       /* True 32b PPGTT with dynamic page allocation: update PDP
> >        * registers and point the unallocated PDPs to scratch page.
> >
>
> Regards,
>
> Tvrtko
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU.
  2018-09-21  9:13 [PATCH 0/4][RFC] " kedar.j.karanje
@ 2018-09-21 11:16 ` Patchwork
  0 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2018-09-21 11:16 UTC (permalink / raw)
  To: kedar.j.karanje; +Cc: intel-gfx

== Series Details ==

Series: Dynamic EU configuration of Slice/Subslice/EU.
URL   : https://patchwork.freedesktop.org/series/50006/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
drivers/gpu/drm/i915/i915_debugfs.c: In function ‘predictive_load_cb’:
drivers/gpu/drm/i915/i915_debugfs.c:4813:26: error: unused variable ‘engine’ [-Werror=unused-variable]
  struct intel_engine_cs *engine;
                          ^~~~~~
drivers/gpu/drm/i915/i915_debugfs.c:4812:23: error: unused variable ‘id’ [-Werror=unused-variable]
  enum intel_engine_id id;
                       ^~
drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_predictive_load_get’:
drivers/gpu/drm/i915/i915_debugfs.c:4848:27: error: unused variable ‘dev_priv’ [-Werror=unused-variable]
  struct drm_i915_private *dev_priv = data;
                           ^~~~~~~~
cc1: all warnings being treated as errors
scripts/Makefile.build:305: recipe for target 'drivers/gpu/drm/i915/i915_debugfs.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_debugfs.o] Error 1
scripts/Makefile.build:546: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:546: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:546: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1062: recipe for target 'drivers' failed
make: *** [drivers] Error 2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
