* [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU.
@ 2018-11-06  4:13 Ankit Navik
  2018-11-06  4:13 ` [PATCH v2 1/4] drm/i915: Get active pending request for given context Ankit Navik
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Ankit Navik @ 2018-11-06  4:13 UTC (permalink / raw)
  To: intel-gfx

drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel

Current GPU configuration code for i915 does not allow us to change the
EU/Slice/Sub-slice configuration dynamically. It is done only once, while the
context is created.

While a particular graphics application is running, if we examine the command
requests from user space, we observe that the command density is not consistent.
This means there is scope to change the graphics configuration dynamically even
while the context is actively running. This patch series proposes a solution
that finds the pending load for all active contexts at a given time and, based
on that, dynamically reconfigures the graphics hardware for each context.

We use an hr (high resolution) timer in the i915 kernel driver to get a
callback every few milliseconds (this timer value can be configured through
debugfs; the default is '0', indicating the timer is disabled, i.e. the
original system without any intervention). In the timer callback, we examine
the pending commands for a context in the queue; essentially, we intercept
them before they are executed by the GPU and update the context with the
required number of EUs.
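
For example, with the debugfs interface added in patch 4/4, the governor can
be enabled with a 10 ms sampling period, or disabled again (assuming the usual
/sys/kernel/debug/dri/0 location of the i915 debugfs entries):

  echo 10 > /sys/kernel/debug/dri/0/i915_predictive_load_ctl
  echo 0  > /sys/kernel/debug/dri/0/i915_predictive_load_ctl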

Two questions: how did we arrive at the right timer value, and what is the
right number of EUs? For the former, we considered empirical data to achieve
the best performance at the least power. For the latter, we roughly categorized
the number of EUs logically per platform. We then compare the number of pending
commands against a particular threshold and set the number of EUs accordingly
with a context update. That threshold is also based on experiments and
findings. If the GPU is able to keep up with the CPU, there are typically no
pending commands and the EU configuration remains unchanged. If there are more
pending commands, we reprogram the context with a higher number of EUs. Please
note that we change the EUs even while a context is running, by examining the
pending commands every 'x' milliseconds.
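
In condensed form, the per-context classification used in patch 4/4 (where the
threshold is three pending requests) is:

	if (req_pending > 3)
		ctx->load_type = LOAD_TYPE_HIGH;	/* reprogram with more EUs */
	else if (req_pending == 3)
		ctx->load_type = LOAD_TYPE_MEDIUM;
	else
		ctx->load_type = LOAD_TYPE_LOW;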

With this solution in place, on KBL-GT3 + Android we saw the following PnP
(power and performance) benefits; the power numbers mentioned here are system
power.

App /KPI               | % Power |
                       | Benefit |
---------------------------------|
3D Mark (Ice storm)    | 2.30%   |
TRex On screen         | 2.49%   |
TRex Off screen        | 1.32%   |
Manhattan On screen    | 3.11%   |
Manhattan Off screen   | 0.89%   |
AnTuTu 6.1.4           | 3.42%   |

Note - For KBL (GEN9) we cannot control at the sub-slice level; that was
always a constraint. We always controlled the number of EUs rather than
sub-slices/slices.

Praveen Diwakar (4):
  drm/i915: Get active pending request for given context
  drm/i915: Update render power clock state configuration for given
    context
  drm/i915: set optimum eu/slice/sub-slice configuration based on load
    type
  drm/i915: Predictive governor to control eu/slice/subslice

 drivers/gpu/drm/i915/i915_debugfs.c        | 88 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            | 10 ++++
 drivers/gpu/drm/i915/i915_gem_context.c    | 26 +++++++++
 drivers/gpu/drm/i915/i915_gem_context.h    | 45 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
 drivers/gpu/drm/i915/intel_device_info.c   | 44 ++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c           | 20 ++++++-
 8 files changed, 235 insertions(+), 4 deletions(-)

-- 
2.7.4


* [PATCH v2 1/4] drm/i915: Get active pending request for given context
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
@ 2018-11-06  4:13 ` Ankit Navik
  2018-11-06  9:44   ` Tvrtko Ursulin
  2018-11-06  4:13 ` [PATCH v2 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Ankit Navik @ 2018-11-06  4:13 UTC (permalink / raw)
  To: intel-gfx

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch gives us the active pending request count which is yet
to be submitted to the GPU

Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            | 1 +
 drivers/gpu/drm/i915/i915_drv.h            | 5 +++++
 drivers/gpu/drm/i915/i915_gem_context.c    | 1 +
 drivers/gpu/drm/i915/i915_gem_context.h    | 6 ++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++++
 drivers/gpu/drm/i915/intel_lrc.c           | 6 ++++++
 6 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f8cfd16..d37c46e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -903,6 +903,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
 	mutex_init(&dev_priv->av_mutex);
 	mutex_init(&dev_priv->wm.wm_mutex);
 	mutex_init(&dev_priv->pps_mutex);
+	mutex_init(&dev_priv->pred_mutex);
 
 	i915_memcpy_init_early(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4aca534..137ec33 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1609,6 +1609,11 @@ struct drm_i915_private {
 	 * controller on different i2c buses. */
 	struct mutex gmbus_mutex;
 
+	/** pred_mutex protects against concurrent usage of pending
+	 * request counter for multiple contexts
+	 */
+	struct mutex pred_mutex;
+
 	/**
 	 * Base address of the gmbus and gpio block.
 	 */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b10770c..0bcbe32 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 	}
 
 	trace_i915_context_create(ctx);
+	atomic_set(&ctx->req_cnt, 0);
 
 	return ctx;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index b116e49..04e3ff7 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -194,6 +194,12 @@ struct i915_gem_context {
 	 * context close.
 	 */
 	struct list_head handles_list;
+
+	/** req_cnt: tracks the pending commands, based on which we decide to
+	 * go for low/medium/high load configuration of the GPU, this is
+	 * controlled via a mutex
+	 */
+	atomic_t req_cnt;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3f0c612..8afa2a5 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2178,6 +2178,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_syncobj **fences)
 {
 	struct i915_execbuffer eb;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct dma_fence *in_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	int out_fence_fd = -1;
@@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 */
 	eb.request->batch = eb.batch;
 
+	mutex_lock(&dev_priv->pred_mutex);
+	atomic_inc(&eb.ctx->req_cnt);
+	mutex_unlock(&dev_priv->pred_mutex);
+
 	trace_i915_request_queue(eb.request, eb.batch_flags);
 	err = eb_submit(&eb);
 err_request:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1744792..bcbb66b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -728,6 +728,12 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			trace_i915_request_in(rq, port_index(port, execlists));
 			last = rq;
 			submit = true;
+
+			mutex_lock(&rq->i915->pred_mutex);
+			if (atomic_read(&rq->gem_context->req_cnt) > 0)
+				atomic_dec(&rq->gem_context->req_cnt);
+
+			mutex_unlock(&rq->i915->pred_mutex);
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-- 
2.7.4


* [PATCH v2 2/4] drm/i915: Update render power clock state configuration for given context
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
  2018-11-06  4:13 ` [PATCH v2 1/4] drm/i915: Get active pending request for given context Ankit Navik
@ 2018-11-06  4:13 ` Ankit Navik
  2018-11-06  4:13 ` [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ankit Navik @ 2018-11-06  4:13 UTC (permalink / raw)
  To: intel-gfx

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch updates the power clock state register at runtime based on a
flag which can be set by any governor that computes load and wants to update
the rpcs register.
Subsequent patches will have a timer-based governor which computes the pending
load/requests.

Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
 drivers/gpu/drm/i915/i915_gem_context.h |  9 +++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 12 +++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0bcbe32..d040858 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -388,6 +388,10 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 
 	trace_i915_context_create(ctx);
 	atomic_set(&ctx->req_cnt, 0);
+	ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
+	ctx->subslice_cnt = hweight8(
+			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
+	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
 
 	return ctx;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 04e3ff7..2b3bf09 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -200,6 +200,15 @@ struct i915_gem_context {
 	 * controlled via a mutex
 	 */
 	atomic_t req_cnt;
+
+	/** slice_cnt: used to set the # of slices to be enabled. */
+	u8 slice_cnt;
+
+	/** subslice_cnt: used to set the # of subslices to be enabled. */
+	u8 subslice_cnt;
+
+	/** eu_cnt: used to set the # of eu to be enabled. */
+	u8 eu_cnt;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bcbb66b..a8ab71a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -171,6 +171,7 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
+static u32 make_rpcs(struct drm_i915_private *dev_priv);
 
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
@@ -417,12 +418,21 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
 
 static u64 execlists_update_context(struct i915_request *rq)
 {
+	u32 rpcs_config = 0;
 	struct intel_context *ce = rq->hw_context;
+	u32 *reg_state = ce->lrc_reg_state;
+	struct i915_gem_context *ctx = rq->gem_context;
 	struct i915_hw_ppgtt *ppgtt =
 		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
-	u32 *reg_state = ce->lrc_reg_state;
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
+	/* FIXME: To avoid stale rpcs config, move it to context_pin */
+	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
+		rpcs_config = make_rpcs(ctx->i915);
+		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
+		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+				rpcs_config);
+	}
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
 	 * registers and point the unallocated PDPs to scratch page.
-- 
2.7.4


* [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
  2018-11-06  4:13 ` [PATCH v2 1/4] drm/i915: Get active pending request for given context Ankit Navik
  2018-11-06  4:13 ` [PATCH v2 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
@ 2018-11-06  4:13 ` Ankit Navik
  2018-11-06 10:34   ` Tvrtko Ursulin
  2018-11-06  4:13 ` [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Ankit Navik @ 2018-11-06  4:13 UTC (permalink / raw)
  To: intel-gfx

From: Praveen Diwakar <praveen.diwakar@intel.com>

This patch selects the optimum eu/slice/sub-slice configuration based on
the type of load (low, medium, high) given as input.
Based on our readings and experiments we have a predefined set of optimum
configurations for each platform (CHT, KBL).
i915_gem_context_set_load_type selects the optimum configuration from the
predefined optimum configuration table (opt_config).

It also introduces a flag, update_render_config, which can be set by any
governor.

Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c  | 21 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h  | 30 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_device_info.c | 44 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.c         |  4 ++-
 5 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 137ec33..f1b16d0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1686,6 +1686,8 @@ struct drm_i915_private {
 	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
 	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
 
+	struct optimum_config opt_config[LOAD_TYPE_MAX];
+
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int skl_preferred_vco_freq;
 	unsigned int max_cdclk_freq;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d040858..28ff868 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -388,14 +388,35 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 
 	trace_i915_context_create(ctx);
 	atomic_set(&ctx->req_cnt, 0);
+	ctx->update_render_config = 0;
 	ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
 	ctx->subslice_cnt = hweight8(
 			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
 	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
+	ctx->load_type = 0;
+	ctx->pending_load_type = 0;
 
 	return ctx;
 }
 
+
+void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
+		enum gem_load_type type)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+
+	/* Call opt_config to get correct configuration for eu,slice,subslice */
+	ctx->slice_cnt = dev_priv->opt_config[type].slice;
+	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
+	ctx->eu_cnt = dev_priv->opt_config[type].eu;
+
+	/* Enabling this to update the rpcs */
+	if (ctx->pending_load_type != type)
+		ctx->update_render_config = 1;
+
+	ctx->pending_load_type = type;
+}
+
 /**
  * i915_gem_context_create_gvt - create a GVT GEM context
  * @dev: drm device *
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 2b3bf09..9345aa3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -53,6 +53,19 @@ struct intel_context_ops {
 	void (*destroy)(struct intel_context *ce);
 };
 
+enum gem_load_type {
+	LOAD_TYPE_LOW,
+	LOAD_TYPE_MEDIUM,
+	LOAD_TYPE_HIGH,
+	LOAD_TYPE_MAX
+};
+
+struct optimum_config {
+	u8 slice;
+	u8 subslice;
+	u8 eu;
+};
+
 /**
  * struct i915_gem_context - client state
  *
@@ -209,6 +222,21 @@ struct i915_gem_context {
 
 	/** eu_cnt: used to set the # of eu to be enabled. */
 	u8 eu_cnt;
+
+	/** update_render_config: to track the updates to the render
+	 * configuration (S/SS/EU Configuration on the GPU)
+	 */
+	bool update_render_config;
+
+	/** load_type: The designated load_type (high/medium/low) for a given
+	 * number of pending commands in the command queue.
+	 */
+	enum gem_load_type load_type;
+
+	/** pending_load_type: The earlier load type that the GPU was configured
+	 * for (high/medium/low).
+	 */
+	enum gem_load_type pending_load_type;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
@@ -337,6 +365,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
+void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
+		enum gem_load_type type);
 
 struct i915_gem_context *
 i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio);
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 0ef0c64..4c6224a 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -741,6 +741,27 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
 		container_of(info, struct drm_i915_private, info);
 	enum pipe pipe;
 
+	/* static table of slice/subslice/EU for Cherrytrial */
+	struct optimum_config chv_config[LOAD_TYPE_MAX] = {
+		{1, 1, 4},	/* Low */
+		{1, 1, 6},	/* Medium */
+		{1, 2, 6}	/* High */
+	};
+
+	/* static table of slice/subslice/EU for KBL GT2 */
+	struct optimum_config kbl_gt2_config[LOAD_TYPE_MAX] = {
+		{1, 3, 2},	/* Low */
+		{1, 3, 4},	/* Medium */
+		{1, 3, 8}	/* High */
+	};
+
+	/* static table of slice/subslice/EU for KBL GT3 */
+	struct optimum_config kbl_gt3_config[LOAD_TYPE_MAX] = {
+		{2, 3, 4},	/* Low */
+		{2, 3, 6},	/* Medium */
+		{2, 3, 8}	/* High */
+	};
+
 	if (INTEL_GEN(dev_priv) >= 10) {
 		for_each_pipe(dev_priv, pipe)
 			info->num_scalers[pipe] = 2;
@@ -840,12 +861,31 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
 	/* Initialize slice/subslice/EU info */
 	if (IS_HASWELL(dev_priv))
 		haswell_sseu_info_init(dev_priv);
-	else if (IS_CHERRYVIEW(dev_priv))
+	else if (IS_CHERRYVIEW(dev_priv)) {
 		cherryview_sseu_info_init(dev_priv);
+		memcpy(dev_priv->opt_config, chv_config, sizeof(chv_config));
+	}
 	else if (IS_BROADWELL(dev_priv))
 		broadwell_sseu_info_init(dev_priv);
-	else if (INTEL_GEN(dev_priv) == 9)
+	else if (INTEL_GEN(dev_priv) == 9) {
 		gen9_sseu_info_init(dev_priv);
+
+		if (IS_KABYLAKE(dev_priv)) {
+			switch (info->gt) {
+			default:
+				MISSING_CASE(info->gt);
+				/* fall through */
+			case 2:
+				memcpy(dev_priv->opt_config, kbl_gt2_config,
+						sizeof(kbl_gt2_config));
+			break;
+			case 3:
+				memcpy(dev_priv->opt_config, kbl_gt3_config,
+						sizeof(kbl_gt3_config));
+			break;
+			}
+		}
+	}
 	else if (INTEL_GEN(dev_priv) == 10)
 		gen10_sseu_info_init(dev_priv);
 	else if (INTEL_GEN(dev_priv) >= 11)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a8ab71a..4d000eb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -427,11 +427,13 @@ static u64 execlists_update_context(struct i915_request *rq)
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
 	/* FIXME: To avoid stale rpcs config, move it to context_pin */
-	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
+	if (ctx->pid && ctx->name && (rq->engine->id == RCS) &&
+			ctx->update_render_config) {
 		rpcs_config = make_rpcs(ctx->i915);
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
 				rpcs_config);
+		ctx->update_render_config = 0;
 	}
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
-- 
2.7.4


* [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (2 preceding siblings ...)
  2018-11-06  4:13 ` [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2018-11-06  4:13 ` Ankit Navik
  2018-11-06 10:34   ` Tvrtko Ursulin
  2018-11-06  4:23 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU. (rev2) Patchwork
  2018-11-07 10:38 ` [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Tvrtko Ursulin
  5 siblings, 1 reply; 13+ messages in thread
From: Ankit Navik @ 2018-11-06  4:13 UTC (permalink / raw)
  To: intel-gfx

From: Praveen Diwakar <praveen.diwakar@intel.com>

High resoluton timer is used for predictive governor to control
eu/slice/subslice based on workloads.

Debugfs is provided to enable/disable/update timer configuration

Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 88 ++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.h     |  3 ++
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f9ce35d..0f368f6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4740,6 +4740,90 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_drrs_status", i915_drrs_status, 0},
 	{"i915_rps_boost_info", i915_rps_boost_info, 0},
 };
+
+#define PENDING_REQ_0	0	/* No active request pending */
+
+/*
+ * Anything above threshold is considered as HIGH load, less is considered
+ * as LOW load and equal is considered as MEDIAUM load.
+ *
+ * The threshold value of three active requests pending.
+ */
+#define PENDING_REQ_3	3
+
+static int predictive_load_enable;
+
+static enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *dev_priv =
+			container_of(hrtimer, typeof(*dev_priv), pred_timer);
+	struct i915_gem_context *ctx;
+	atomic_t req_pending;
+
+	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
+
+		if (!ctx->name)
+			continue;
+
+		mutex_lock(&dev_priv->pred_mutex);
+		atomic_set(&req_pending, atomic_read(&ctx->req_cnt));
+		mutex_unlock(&dev_priv->pred_mutex);
+
+		if (atomic_read(&req_pending) == PENDING_REQ_0)
+			continue;
+
+		if (atomic_read(&req_pending) > PENDING_REQ_3)
+			ctx->load_type = LOAD_TYPE_HIGH;
+		else if (atomic_read(&req_pending) == PENDING_REQ_3)
+			ctx->load_type = LOAD_TYPE_MEDIUM;
+		else
+			ctx->load_type = LOAD_TYPE_LOW;
+
+		i915_gem_context_set_load_type(ctx, ctx->load_type);
+	}
+
+	hrtimer_forward_now(hrtimer,
+			ms_to_ktime(predictive_load_enable));
+
+	return HRTIMER_RESTART;
+}
+
+static int
+i915_predictive_load_get(void *data, u64 *val)
+{
+	*val = predictive_load_enable;
+	return 0;
+}
+
+static int
+i915_predictive_load_set(void *data, u64 val)
+{
+	struct drm_i915_private *dev_priv = data;
+
+	predictive_load_enable = val;
+
+	if (predictive_load_enable) {
+		if (!dev_priv->predictive_load_timer_init) {
+			hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC,
+					HRTIMER_MODE_REL);
+			dev_priv->pred_timer.function = predictive_load_cb;
+			dev_priv->predictive_load_timer_init = 1;
+		}
+
+		hrtimer_start(&dev_priv->pred_timer,
+			ms_to_ktime(predictive_load_enable),
+			HRTIMER_MODE_REL_PINNED);
+	} else {
+		hrtimer_cancel(&dev_priv->pred_timer);
+	}
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_predictive_load_ctl,
+			i915_predictive_load_get, i915_predictive_load_set,
+			"%llu\n");
+
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
 static const struct i915_debugfs_files {
@@ -4769,7 +4853,9 @@ static const struct i915_debugfs_files {
 	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
 	{"i915_ipc_status", &i915_ipc_status_fops},
 	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
-	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
+	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops},
+	/* FIXME: When feature will become real, move to sysfs */
+	{"i915_predictive_load_ctl", &i915_predictive_load_ctl}
 };
 
 int i915_debugfs_register(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f1b16d0..72ddd63 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1686,7 +1686,10 @@ struct drm_i915_private {
 	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
 	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
 
+	/* optimal slice/subslice/EU configuration state */
 	struct optimum_config opt_config[LOAD_TYPE_MAX];
+	struct hrtimer pred_timer;
+	int predictive_load_timer_init;
 
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int skl_preferred_vco_freq;
-- 
2.7.4


* ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU. (rev2)
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (3 preceding siblings ...)
  2018-11-06  4:13 ` [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
@ 2018-11-06  4:23 ` Patchwork
  2018-11-07 10:38 ` [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Tvrtko Ursulin
  5 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-11-06  4:23 UTC (permalink / raw)
  To: Kedar J. Karanje; +Cc: intel-gfx

== Series Details ==

Series: Dynamic EU configuration of Slice/Subslice/EU. (rev2)
URL   : https://patchwork.freedesktop.org/series/50006/
State : failure

== Summary ==

Applying: drm/i915: Get active pending request for given context
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_drv.c
M	drivers/gpu/drm/i915/i915_drv.h
M	drivers/gpu/drm/i915/i915_gem_context.c
M	drivers/gpu/drm/i915/i915_gem_context.h
M	drivers/gpu/drm/i915/i915_gem_execbuffer.c
M	drivers/gpu/drm/i915/intel_lrc.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/intel_lrc.c
Auto-merging drivers/gpu/drm/i915/i915_gem_execbuffer.c
Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
Auto-merging drivers/gpu/drm/i915/i915_drv.h
Auto-merging drivers/gpu/drm/i915/i915_drv.c
Applying: drm/i915: Update render power clock state configuration for given context
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_context.c
M	drivers/gpu/drm/i915/i915_gem_context.h
M	drivers/gpu/drm/i915/intel_lrc.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/intel_lrc.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/intel_lrc.c
Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
Auto-merging drivers/gpu/drm/i915/i915_gem_context.c
error: Failed to merge in the changes.
Patch failed at 0002 drm/i915: Update render power clock state configuration for given context
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [PATCH v2 1/4] drm/i915: Get active pending request for given context
  2018-11-06  4:13 ` [PATCH v2 1/4] drm/i915: Get active pending request for given context Ankit Navik
@ 2018-11-06  9:44   ` Tvrtko Ursulin
  2018-12-11 10:48     ` Navik, Ankit P
  2019-03-14  8:51     ` Ankit Navik
  0 siblings, 2 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2018-11-06  9:44 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 06/11/2018 04:13, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> This patch gives us the active pending request count which is yet
> to be submitted to the GPU
> 
> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
> Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c            | 1 +
>   drivers/gpu/drm/i915/i915_drv.h            | 5 +++++
>   drivers/gpu/drm/i915/i915_gem_context.c    | 1 +
>   drivers/gpu/drm/i915/i915_gem_context.h    | 6 ++++++
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++++
>   drivers/gpu/drm/i915/intel_lrc.c           | 6 ++++++
>   6 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f8cfd16..d37c46e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -903,6 +903,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
>   	mutex_init(&dev_priv->av_mutex);
>   	mutex_init(&dev_priv->wm.wm_mutex);
>   	mutex_init(&dev_priv->pps_mutex);
> +	mutex_init(&dev_priv->pred_mutex);
>   
>   	i915_memcpy_init_early(dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4aca534..137ec33 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1609,6 +1609,11 @@ struct drm_i915_private {
>   	 * controller on different i2c buses. */
>   	struct mutex gmbus_mutex;
>   
> +	/** pred_mutex protects against concurrent usage of pending
> +	 * request counter for multiple contexts
> +	 */
> +	struct mutex pred_mutex;
> +
>   	/**
>   	 * Base address of the gmbus and gpio block.
>   	 */
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b10770c..0bcbe32 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   	}
>   
>   	trace_i915_context_create(ctx);
> +	atomic_set(&ctx->req_cnt, 0);
>   
>   	return ctx;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index b116e49..04e3ff7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -194,6 +194,12 @@ struct i915_gem_context {
>   	 * context close.
>   	 */
>   	struct list_head handles_list;
> +
> +	/** req_cnt: tracks the pending commands, based on which we decide to
> +	 * go for low/medium/high load configuration of the GPU, this is
> +	 * controlled via a mutex
> +	 */
> +	atomic_t req_cnt;
>   };
>   
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3f0c612..8afa2a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2178,6 +2178,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   		       struct drm_syncobj **fences)
>   {
>   	struct i915_execbuffer eb;
> +	struct drm_i915_private *dev_priv = to_i915(dev);
>   	struct dma_fence *in_fence = NULL;
>   	struct sync_file *out_fence = NULL;
>   	int out_fence_fd = -1;
> @@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	 */
>   	eb.request->batch = eb.batch;
>   
> +	mutex_lock(&dev_priv->pred_mutex);
> +	atomic_inc(&eb.ctx->req_cnt);

Point of going to atomic_t was to remove need for the mutex.
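
For illustration (untested), the bare atomic op should then be enough on its
own:

	/* no pred_mutex needed around an atomic increment */
	atomic_inc(&eb.ctx->req_cnt);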

> +	mutex_unlock(&dev_priv->pred_mutex);
> +
>   	trace_i915_request_queue(eb.request, eb.batch_flags);
>   	err = eb_submit(&eb);
>   err_request:
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1744792..bcbb66b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -728,6 +728,12 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			trace_i915_request_in(rq, port_index(port, execlists));
>   			last = rq;
>   			submit = true;
> +
> +			mutex_lock(&rq->i915->pred_mutex);
> +			if (atomic_read(&rq->gem_context->req_cnt) > 0)
> +				atomic_dec(&rq->gem_context->req_cnt);

Hitting underflow is a hint that the accounting does not work as expected. I
really think you need to fix it by gathering some ideas from the patches
I've pointed at in the previous round.
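
As an untested sketch, a race-free lower-bounded decrement would also drop
both the mutex and the read/decrement gap in one go:

	/* only decrements while the counter is still positive */
	atomic_dec_if_positive(&rq->gem_context->req_cnt);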

And there is also GuC to think about.

Regards,

Tvrtko

> +
> +			mutex_unlock(&rq->i915->pred_mutex);
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> 

* Re: [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  2018-11-06  4:13 ` [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
@ 2018-11-06 10:34   ` Tvrtko Ursulin
  0 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2018-11-06 10:34 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 06/11/2018 04:13, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> This patch selects the optimum eu/slice/sub-slice configuration based on
> the type of load (low, medium, high) given as input.
> Based on our readings and experiments we have a predefined set of optimum
> configurations for each platform (CHT, KBL).
> i915_gem_context_set_load_type selects the optimum configuration from the
> predefined optimum configuration table (opt_config).
> 
> It also introduces a flag, update_render_config, which can be set by any
> governor.

Patch changelog is missing here and in the first patch.

> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
> Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |  2 ++
>   drivers/gpu/drm/i915/i915_gem_context.c  | 21 +++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_context.h  | 30 ++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_device_info.c | 44 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_lrc.c         |  4 ++-
>   5 files changed, 98 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 137ec33..f1b16d0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1686,6 +1686,8 @@ struct drm_i915_private {
>   	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
>   	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
>   
> +	struct optimum_config opt_config[LOAD_TYPE_MAX];
> +
>   	unsigned int fsb_freq, mem_freq, is_ddr3;
>   	unsigned int skl_preferred_vco_freq;
>   	unsigned int max_cdclk_freq;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index d040858..28ff868 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -388,14 +388,35 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   
>   	trace_i915_context_create(ctx);
>   	atomic_set(&ctx->req_cnt, 0);
> +	ctx->update_render_config = 0;
>   	ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
>   	ctx->subslice_cnt = hweight8(
>   			INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
>   	ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
> +	ctx->load_type = 0;
> +	ctx->pending_load_type = 0;
>   
>   	return ctx;
>   }
>   
> +
> +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> +		enum gem_load_type type)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +

To be safe I suggest:

if (GEM_WARN_ON(type > LOAD_TYPE_MAX))
	return;

> +	/* Call opt_config to get correct configuration for eu,slice,subslice */
> +	ctx->slice_cnt = dev_priv->opt_config[type].slice;
> +	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
> +	ctx->eu_cnt = dev_priv->opt_config[type].eu;
> +
> +	/* Enabling this to update the rpcs */
> +	if (ctx->pending_load_type != type)
> +		ctx->update_render_config = 1;
> +
> +	ctx->pending_load_type = type;
> +}
> +
>   /**
>    * i915_gem_context_create_gvt - create a GVT GEM context
>    * @dev: drm device *
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 2b3bf09..9345aa3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -53,6 +53,19 @@ struct intel_context_ops {
>   	void (*destroy)(struct intel_context *ce);
>   };
>   
> +enum gem_load_type {
> +	LOAD_TYPE_LOW,
> +	LOAD_TYPE_MEDIUM,
> +	LOAD_TYPE_HIGH,
> +	LOAD_TYPE_MAX

Max is actually max plus one. Maybe instead have it as:

	LOAD_TYPE_LAST = LOAD_TYPE_HIGH,

> +};
> +
> +struct optimum_config {

Struct name needs to be more specific to avoid namespace clashes. 
Perhaps i915_sseu_optimum_config?

> +	u8 slice;
> +	u8 subslice;
> +	u8 eu;
> +};
> +
>   /**
>    * struct i915_gem_context - client state
>    *
> @@ -209,6 +222,21 @@ struct i915_gem_context {
>   
>   	/** eu_cnt: used to set the # of eu to be enabled. */
>   	u8 eu_cnt;
> +
> +	/** update_render_config: to track the updates to the render
> +	 * configuration (S/SS/EU Configuration on the GPU)
> +	 */
> +	bool update_render_config;
> +
> +	/** load_type: The designated load_type (high/medium/low) for a given
> +	 * number of pending commands in the command queue.
> +	 */
> +	enum gem_load_type load_type;

load_type looks unused in this patch.

> +
> +	/** pending_load_type: The earlier load type that the GPU was configured
> +	 * for (high/medium/low).
> +	 */
> +	enum gem_load_type pending_load_type;
>   };
>   
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> @@ -337,6 +365,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file_priv);
>   int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
>   				       struct drm_file *file);
> +void i915_gem_context_set_load_type(struct i915_gem_context *ctx,
> +		enum gem_load_type type);
>   
>   struct i915_gem_context *
>   i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio);
> diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
> index 0ef0c64..4c6224a 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.c
> +++ b/drivers/gpu/drm/i915/intel_device_info.c
> @@ -741,6 +741,27 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
>   		container_of(info, struct drm_i915_private, info);
>   	enum pipe pipe;
>   
> +	/* static table of slice/subslice/EU for Cherrytrial */

Cherryview? At least a typo in 'trial'.

> +	struct optimum_config chv_config[LOAD_TYPE_MAX] = {
> +		{1, 1, 4},	/* Low */
> +		{1, 1, 6},	/* Medium */
> +		{1, 2, 6}	/* High */
> +	};
> +
> +	/* static table of slice/subslice/EU for KBL GT2 */
> +	struct optimum_config kbl_gt2_config[LOAD_TYPE_MAX] = {
> +		{1, 3, 2},	/* Low */
> +		{1, 3, 4},	/* Medium */
> +		{1, 3, 8}	/* High */
> +	};
> +
> +	/* static table of slice/subslice/EU for KBL GT3 */
> +	struct optimum_config kbl_gt3_config[LOAD_TYPE_MAX] = {
> +		{2, 3, 4},	/* Low */
> +		{2, 3, 6},	/* Medium */
> +		{2, 3, 8}	/* High */
> +	};
> +
>   	if (INTEL_GEN(dev_priv) >= 10) {
>   		for_each_pipe(dev_priv, pipe)
>   			info->num_scalers[pipe] = 2;
> @@ -840,12 +861,31 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
>   	/* Initialize slice/subslice/EU info */
>   	if (IS_HASWELL(dev_priv))
>   		haswell_sseu_info_init(dev_priv);
> -	else if (IS_CHERRYVIEW(dev_priv))
> +	else if (IS_CHERRYVIEW(dev_priv)) {
>   		cherryview_sseu_info_init(dev_priv);
> +		memcpy(dev_priv->opt_config, chv_config, sizeof(chv_config));
> +	}
>   	else if (IS_BROADWELL(dev_priv))
>   		broadwell_sseu_info_init(dev_priv);
> -	else if (INTEL_GEN(dev_priv) == 9)
> +	else if (INTEL_GEN(dev_priv) == 9) {
>   		gen9_sseu_info_init(dev_priv);
> +
> +		if (IS_KABYLAKE(dev_priv)) {
> +			switch (info->gt) {
> +			default:
> +				MISSING_CASE(info->gt);
> +				/* fall through */
> +			case 2:
> +				memcpy(dev_priv->opt_config, kbl_gt2_config,
> +						sizeof(kbl_gt2_config));
> +			break;
> +			case 3:
> +				memcpy(dev_priv->opt_config, kbl_gt3_config,
> +						sizeof(kbl_gt3_config));

As an alternative to many memcpy you could have:

	opt_config = NULL;
	...
	opt_config = kbl_gt2_config;
	...
	if (opt_config)
		memcpy(dev_priv->opt_config, opt_config, ...);

Or you could actually just store the pointer to rodata table in 
dev_priv, if you moved the tables to global scope as static const.

Both approaches are increasingly space efficient I think.
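
A rough sketch of the second idea (tables moved to file scope, with opt_config
turned into a pointer to const in dev_priv):

	static const struct optimum_config kbl_gt2_config[LOAD_TYPE_MAX] = {
		{1, 3, 2},	/* Low */
		{1, 3, 4},	/* Medium */
		{1, 3, 8}	/* High */
	};
	...
	/* in intel_device_info_runtime_init() */
	dev_priv->opt_config = kbl_gt2_config;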

> +			break;
> +			}
> +		}
> +	}
>   	else if (INTEL_GEN(dev_priv) == 10)
>   		gen10_sseu_info_init(dev_priv);
>   	else if (INTEL_GEN(dev_priv) >= 11)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a8ab71a..4d000eb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -427,11 +427,13 @@ static u64 execlists_update_context(struct i915_request *rq)
>   
>   	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
>   	/* FIXME: To avoid stale rpcs config, move it to context_pin */
> -	if (ctx->pid && ctx->name && (rq->engine->id == RCS)) {
> +	if (ctx->pid && ctx->name && (rq->engine->id == RCS) &&
> +			ctx->update_render_config) {

Could you do if ctx->load_type != ctx->pending_load_type here and lose 
the need for ctx->update_render_config?

>   		rpcs_config = make_rpcs(ctx->i915);
>   		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>   				rpcs_config);
> +		ctx->update_render_config = 0;

In that case would need to do ctx->load_type = ctx->pending_load_type here.
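
Roughly, combining both changes (untested):

	if (ctx->pid && ctx->name && rq->engine->id == RCS &&
	    ctx->load_type != ctx->pending_load_type) {
		rpcs_config = make_rpcs(ctx->i915);
		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
		CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
				rpcs_config);
		ctx->load_type = ctx->pending_load_type;
	}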

Regards,

Tvrtko

>   	}
>   
>   	/* True 32b PPGTT with dynamic page allocation: update PDP
> 

* Re: [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice
  2018-11-06  4:13 ` [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
@ 2018-11-06 10:34   ` Tvrtko Ursulin
  0 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2018-11-06 10:34 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 06/11/2018 04:13, Ankit Navik wrote:
> From: Praveen Diwakar <praveen.diwakar@intel.com>
> 
> High resoluton timer is used for predictive governor to control

resolution

> eu/slice/subslice based on workloads.
> 
> Debugfs is provided to enable/disable/update timer configuration

Changelog is missing.

> 
> Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
> Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
> Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c | 88 ++++++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_drv.h     |  3 ++
>   2 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index f9ce35d..0f368f6 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4740,6 +4740,90 @@ static const struct drm_info_list i915_debugfs_list[] = {
>   	{"i915_drrs_status", i915_drrs_status, 0},
>   	{"i915_rps_boost_info", i915_rps_boost_info, 0},
>   };
> +
> +#define PENDING_REQ_0	0	/* No active request pending */
> +
> +/*
> + * Anything above threshold is considered as HIGH load, less is considered
> + * as LOW load and equal is considered as MEDIAUM load.

MEDIUM.

> + *
> + * The threshold value of three active requests pending.
> + */
> +#define PENDING_REQ_3	3

Is there a better name for these than number suffixes? Like maybe 
PENDING_THRESHOLD_MEDIUM or something?

> +
> +static int predictive_load_enable;
> +
> +static enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer)
> +{
> +	struct drm_i915_private *dev_priv =
> +			container_of(hrtimer, typeof(*dev_priv), pred_timer);
> +	struct i915_gem_context *ctx;
> +	atomic_t req_pending;
> +
> +	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> +
> +		if (!ctx->name)
> +			continue;

Put a comment saying why are you doing this.

> +
> +		mutex_lock(&dev_priv->pred_mutex);

You can't sleep in the hrtimer callback. I think I said this last time 
as well. But you also don't need to for atomic_read.

> +		atomic_set(&req_pending, atomic_read(&ctx->req_cnt));

req_pending does not need to be atomic_t.
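
Both points together, as an untested sketch:

	/* plain read; no lock and no atomic_t local copy needed */
	unsigned int req_pending = atomic_read(&ctx->req_cnt);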

> +		mutex_unlock(&dev_priv->pred_mutex);
> +
> +		if (atomic_read(&req_pending) == PENDING_REQ_0)
> +			continue;

By design you don't want to transition to LOW state with zero pending 
requests? If so this needs a comment.

> +
> +		if (atomic_read(&req_pending) > PENDING_REQ_3)
> +			ctx->load_type = LOAD_TYPE_HIGH;
> +		else if (atomic_read(&req_pending) == PENDING_REQ_3)
> +			ctx->load_type = LOAD_TYPE_MEDIUM;
> +		else
> +			ctx->load_type = LOAD_TYPE_LOW;
> +
> +		i915_gem_context_set_load_type(ctx, ctx->load_type);

You seem to be using ctx->load_type just for temporary storage which is 
not right.

Also, I think the usual pattern would be to set the pending type in 
i915_gem_context_set_load_type, which is then consumed (turned into 
load_type) in lower level code.

> +	}
> +
> +	hrtimer_forward_now(hrtimer,
> +			ms_to_ktime(predictive_load_enable));
> +
> +	return HRTIMER_RESTART;
> +}
> +
> +static int
> +i915_predictive_load_get(void *data, u64 *val)
> +{
> +	*val = predictive_load_enable;
> +	return 0;
> +}
> +
> +static int
> +i915_predictive_load_set(void *data, u64 val)
> +{
> +	struct drm_i915_private *dev_priv = data;
> +
> +	predictive_load_enable = val;
> +
> +	if (predictive_load_enable) {

This is still unsafe wrt sysfs set race.

> +		if (!dev_priv->predictive_load_timer_init) {
> +			hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC,
> +					HRTIMER_MODE_REL);
> +			dev_priv->pred_timer.function = predictive_load_cb;
> +			dev_priv->predictive_load_timer_init = 1;

Timer init should be in dev_priv init code.
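
E.g. a sketch, next to the existing mutex_init() calls in
i915_driver_init_early() (assuming the callback is made visible there):

	hrtimer_init(&dev_priv->pred_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	dev_priv->pred_timer.function = predictive_load_cb;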

> +		}
> +
> +		hrtimer_start(&dev_priv->pred_timer,
> +			ms_to_ktime(predictive_load_enable),
> +			HRTIMER_MODE_REL_PINNED);

I think we talked about giving some slack to the timer.
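
For instance (untested), hrtimer_start_range_ns() accepts a slack window so
expiries can be coalesced with other wakeups:

	/* allow 1 ms of slack on each expiry */
	hrtimer_start_range_ns(&dev_priv->pred_timer,
			ms_to_ktime(predictive_load_enable),
			NSEC_PER_MSEC, HRTIMER_MODE_REL_PINNED);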

> +	} else {
> +		hrtimer_cancel(&dev_priv->pred_timer);
> +	}
> +
> +	return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(i915_predictive_load_ctl,
> +			i915_predictive_load_get, i915_predictive_load_set,
> +			"%llu\n");
> +
>   #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
>   
>   static const struct i915_debugfs_files {
> @@ -4769,7 +4853,9 @@ static const struct i915_debugfs_files {
>   	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
>   	{"i915_ipc_status", &i915_ipc_status_fops},
>   	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
> -	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
> +	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops},
> +	/* FIXME: When feature will become real, move to sysfs */
> +	{"i915_predictive_load_ctl", &i915_predictive_load_ctl}
>   };
>   
>   int i915_debugfs_register(struct drm_i915_private *dev_priv)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f1b16d0..72ddd63 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1686,7 +1686,10 @@ struct drm_i915_private {
>   	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
>   	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
>   
> +	/* optimal slice/subslice/EU configuration state */
>   	struct optimum_config opt_config[LOAD_TYPE_MAX];
> +	struct hrtimer pred_timer;
> +	int predictive_load_timer_init;
>   
>   	unsigned int fsb_freq, mem_freq, is_ddr3;
>   	unsigned int skl_preferred_vco_freq;
> 

Regards,

Tvrtko

* Re: [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
                   ` (4 preceding siblings ...)
  2018-11-06  4:23 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU. (rev2) Patchwork
@ 2018-11-07 10:38 ` Tvrtko Ursulin
  2018-12-11  9:58   ` Navik, Ankit P
  5 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2018-11-07 10:38 UTC (permalink / raw)
  To: Ankit Navik, intel-gfx


On 06/11/2018 04:13, Ankit Navik wrote:
> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
> 
> Current GPU configuration code for i915 does not allow us to change the
> EU/Slice/Sub-slice configuration dynamically. It is done only once, while the
> context is created.
> 
> While a particular graphics application is running, if we examine the command
> requests from user space, we observe that the command density is not consistent.
> This means there is scope to change the graphics configuration dynamically even
> while the context is actively running. This patch series proposes a solution
> that finds the pending load for all active contexts at a given time and, based
> on that, dynamically reconfigures the graphics hardware for each context.
> 
> We use an hr (high resolution) timer in the i915 kernel driver to get a
> callback every few milliseconds (this timer value can be configured through
> debugfs; the default is '0', indicating the timer is disabled, i.e. the
> original system without any intervention). In the timer callback, we examine
> the pending commands for a context in the queue; essentially, we intercept
> them before they are executed by the GPU and update the context with the
> required number of EUs.
> 
> Two questions: how did we arrive at the right timer value, and what is the
> right number of EUs? For the former, we considered empirical data to achieve
> the best performance at the least power. For the latter, we roughly categorized
> the number of EUs logically per platform. We then compare the number of pending
> commands against a particular threshold and set the number of EUs accordingly
> with a context update. That threshold is also based on experiments and
> findings. If the GPU is able to keep up with the CPU, there are typically no
> pending commands and the EU configuration remains unchanged. If there are more
> pending commands, we reprogram the context with a higher number of EUs. Please
> note that we change the EUs even while a context is running, by examining the
> pending commands every 'x' milliseconds.
> 
> With this solution in place, on KBL-GT3 + Android we saw the following PnP
> (power and performance) benefits; the power numbers mentioned here are system
> power.
> 
> App /KPI               | % Power |
>                        | Benefit |
> ---------------------------------|
> 3D Mark (Ice storm)    | 2.30%   |
> TRex On screen         | 2.49%   |
> TRex Off screen        | 1.32%   |
> Manhattan On screen    | 3.11%   |
> Manhattan Off screen   | 0.89%   |
> AnTuTu 6.1.4           | 3.42%   |

Were you able to find some benchmarks which regress? Maybe try Synmark2 
and more from gfxbench? Not all benchmarks there are equally important, 
and regressions on some are fine, but I think a fuller set would be 
interesting to see.

Regards,

Tvrtko

> 
> Note - For KBL (GEN9) we cannot control at the sub-slice level; that was
> always a constraint. We always controlled the number of EUs rather than
> sub-slices/slices.
> 
> Praveen Diwakar (4):
>    drm/i915: Get active pending request for given context
>    drm/i915: Update render power clock state configuration for given
>      context
>    drm/i915: set optimum eu/slice/sub-slice configuration based on load
>      type
>    drm/i915: Predictive governor to control eu/slice/subslice
> 
>   drivers/gpu/drm/i915/i915_debugfs.c        | 88 +++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_drv.c            |  1 +
>   drivers/gpu/drm/i915/i915_drv.h            | 10 ++++
>   drivers/gpu/drm/i915/i915_gem_context.c    | 26 +++++++++
>   drivers/gpu/drm/i915/i915_gem_context.h    | 45 +++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
>   drivers/gpu/drm/i915/intel_device_info.c   | 44 ++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.c           | 20 ++++++-
>   8 files changed, 235 insertions(+), 4 deletions(-)
> 

* Re: [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU.
  2018-11-07 10:38 ` [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Tvrtko Ursulin
@ 2018-12-11  9:58   ` Navik, Ankit P
  0 siblings, 0 replies; 13+ messages in thread
From: Navik, Ankit P @ 2018-12-11  9:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Hi Tvrtko,

> On Wed, Nov 7, 2018 at 4:08 PM Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> 
> On 06/11/2018 04:13, Ankit Navik wrote:
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > Current GPU configuration code for i915 does not allow us to change
> > EU/Slice/Sub-slice configuration dynamically. It's done only once, when
> > the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that the command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution that finds the active pending load
> > for all active contexts at a given time and, based on that, dynamically
> > performs graphics configuration for each context.
> >
> > We use an hr (high-resolution) timer in the i915 kernel driver to get
> > a callback every few milliseconds (the timer value can be configured
> > through debugfs; the default is '0', indicating the timer is disabled,
> > i.e. the original system without any intervention). In the timer
> > callback, we examine the pending commands for a context in the queue;
> > essentially, we intercept them before they are executed by the GPU and
> > update the context with the required number of EUs.
> >
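For reference, a generic sketch of how such a periodic callback can be wired
up with an hrtimer follows. This is an illustration only, not the code from
this series; all names are hypothetical, and the period stands in for the
debugfs knob mentioned above:

    #include <linux/hrtimer.h>
    #include <linux/ktime.h>

    static struct hrtimer pred_timer;
    static unsigned int pred_period_ms;	/* set via debugfs; 0 = disabled */

    static enum hrtimer_restart pred_timer_cb(struct hrtimer *t)
    {
    	/* Walk the active contexts, read their pending-request counters
    	 * and update each context's EU configuration (omitted here). */
    	hrtimer_forward_now(t, ms_to_ktime(pred_period_ms));
    	return HRTIMER_RESTART;
    }

    static void pred_timer_start(void)
    {
    	if (!pred_period_ms)
    		return;	/* default '0' leaves the governor disabled */
    	hrtimer_init(&pred_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    	pred_timer.function = pred_timer_cb;
    	hrtimer_start(&pred_timer, ms_to_ktime(pred_period_ms),
    		      HRTIMER_MODE_REL);
    }
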
> > Two questions arise: how did we arrive at the right timer value, and
> > what is the right number of EUs? For the former, we relied on empirical
> > data that gave the best performance at the least power. For the latter,
> > we roughly categorized the number of EUs logically, based on the
> > platform. We then compare the number of pending commands against a
> > threshold and set the number of EUs accordingly via a context update.
> > That threshold is also based on experiments and findings. If the GPU is
> > able to keep up with the CPU, there are typically no pending commands
> > and the EU configuration remains unchanged. If there are more pending
> > commands, we reprogram the context with a higher number of EUs. Note
> > that we change the EU count even while the context is running, by
> > examining pending commands every 'x' milliseconds.
> >
> > With this solution in place, on KBL-GT3 + Android we saw the following
> > PnP (power and performance) benefits; the power numbers mentioned here
> > are system power.
> >
> > App / KPI              | % Power Benefit |
> > -----------------------|-----------------|
> > 3D Mark (Ice Storm)    | 2.30%           |
> > TRex On screen         | 2.49%           |
> > TRex Off screen        | 1.32%           |
> > Manhattan On screen    | 3.11%           |
> > Manhattan Off screen   | 0.89%           |
> > AnTuTu 6.1.4           | 3.42%           |
> 
> Were you able to find some benchmarks which regress? Maybe try Synmark2
> and more from gfxbench? Not all benchmarks there are equally important, and
> regressions on some are fine, but I think a fuller set would be interesting to see.

We have not seen much improvement in GFXBench Car Chase, but there was no
degradation in performance.

Regards,
Ankit
> 
> Regards,
> 
> Tvrtko
> 
> >
> > Note: on KBL (GEN9) we cannot control the configuration at the
> > sub-slice level; that has always been a constraint. We therefore always
> > controlled the number of EUs rather than sub-slices/slices.
> >
> > Praveen Diwakar (4):
> >    drm/i915: Get active pending request for given context
> >    drm/i915: Update render power clock state configuration for given
> >      context
> >    drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >      type
> >    drm/i915: Predictive governor to control eu/slice/subslice
> >
> >   drivers/gpu/drm/i915/i915_debugfs.c        | 88
> +++++++++++++++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_drv.c            |  1 +
> >   drivers/gpu/drm/i915/i915_drv.h            | 10 ++++
> >   drivers/gpu/drm/i915/i915_gem_context.c    | 26 +++++++++
> >   drivers/gpu/drm/i915/i915_gem_context.h    | 45 +++++++++++++++
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
> >   drivers/gpu/drm/i915/intel_device_info.c   | 44 ++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.c           | 20 ++++++-
> >   8 files changed, 235 insertions(+), 4 deletions(-)
> >

* Re: [PATCH v2 1/4] drm/i915: Get active pending request for given context
  2018-11-06  9:44   ` Tvrtko Ursulin
@ 2018-12-11 10:48     ` Navik, Ankit P
  2019-03-14  8:51     ` Ankit Navik
  1 sibling, 0 replies; 13+ messages in thread
From: Navik, Ankit P @ 2018-12-11 10:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Hi Tvrtko, 

> On Tue, Nov 6, 2018 at 3:14 PM Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> 
> On 06/11/2018 04:13, Ankit Navik wrote:
> > From: Praveen Diwakar <praveen.diwakar@intel.com>
> >
> > This patch gives us the count of active pending requests which are yet
> > to be submitted to the GPU.
> >
> > Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
> > Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
> > Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
> > Signed-off-by: Kedar J Karanje <kedar.j.karanje@intel.com>
> > Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
> > Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c            | 1 +
> >   drivers/gpu/drm/i915/i915_drv.h            | 5 +++++
> >   drivers/gpu/drm/i915/i915_gem_context.c    | 1 +
> >   drivers/gpu/drm/i915/i915_gem_context.h    | 6 ++++++
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++++
> >   drivers/gpu/drm/i915/intel_lrc.c           | 6 ++++++
> >   6 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c
> > b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16..d37c46e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -903,6 +903,7 @@ static int i915_driver_init_early(struct
> drm_i915_private *dev_priv,
> >   	mutex_init(&dev_priv->av_mutex);
> >   	mutex_init(&dev_priv->wm.wm_mutex);
> >   	mutex_init(&dev_priv->pps_mutex);
> > +	mutex_init(&dev_priv->pred_mutex);
> >
> >   	i915_memcpy_init_early(dev_priv);
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index 4aca534..137ec33 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1609,6 +1609,11 @@ struct drm_i915_private {
> >   	 * controller on different i2c buses. */
> >   	struct mutex gmbus_mutex;
> >
> > +	/** pred_mutex protects against concurrent usage of the pending
> > +	 * request counter for multiple contexts
> > +	 */
> > +	struct mutex pred_mutex;
> > +
> >   	/**
> >   	 * Base address of the gmbus and gpio block.
> >   	 */
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index b10770c..0bcbe32 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private
> *dev_priv,
> >   	}
> >
> >   	trace_i915_context_create(ctx);
> > +	atomic_set(&ctx->req_cnt, 0);
> >
> >   	return ctx;
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h
> > b/drivers/gpu/drm/i915/i915_gem_context.h
> > index b116e49..04e3ff7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> > @@ -194,6 +194,12 @@ struct i915_gem_context {
> >   	 * context close.
> >   	 */
> >   	struct list_head handles_list;
> > +
> > +	/** req_cnt: tracks the pending commands, based on which we decide to
> > +	 * go for low/medium/high load configuration of the GPU; this is
> > +	 * controlled via a mutex
> > +	 */
> > +	atomic_t req_cnt;
> >   };
> >
> >   static inline bool i915_gem_context_is_closed(const struct
> > i915_gem_context *ctx) diff --git
> > a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 3f0c612..8afa2a5 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -2178,6 +2178,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >   		       struct drm_syncobj **fences)
> >   {
> >   	struct i915_execbuffer eb;
> > +	struct drm_i915_private *dev_priv = to_i915(dev);
> >   	struct dma_fence *in_fence = NULL;
> >   	struct sync_file *out_fence = NULL;
> >   	int out_fence_fd = -1;
> > @@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >   	 */
> >   	eb.request->batch = eb.batch;
> >
> > +	mutex_lock(&dev_priv->pred_mutex);
> > +	atomic_inc(&eb.ctx->req_cnt);
> 
> Point of going to atomic_t was to remove need for the mutex.
> 
> > +	mutex_unlock(&dev_priv->pred_mutex);
> > +
> >   	trace_i915_request_queue(eb.request, eb.batch_flags);
> >   	err = eb_submit(&eb);
> >   err_request:
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 1744792..bcbb66b 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -728,6 +728,12 @@ static void execlists_dequeue(struct intel_engine_cs
> *engine)
> >   			trace_i915_request_in(rq, port_index(port, execlists));
> >   			last = rq;
> >   			submit = true;
> > +
> > +			mutex_lock(&rq->i915->pred_mutex);
> > +			if (atomic_read(&rq->gem_context->req_cnt) > 0)
> > +				atomic_dec(&rq->gem_context->req_cnt);
> 
> Hitting underflow is a hint that the accounting does not work as expected.
> I really think you need to fix it by gathering some ideas from the patches
> I've pointed at in the previous round.

Changes submitted in v3 patch.
https://patchwork.freedesktop.org/patch/267386/ 

Regards, 
Ankit
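
For reference, the direction the review points at, dropping the mutex and
making the decrement underflow-safe, could be sketched as follows
(illustrative fragments, not the actual v3 code):

    /* Submission path: the atomic op itself is the synchronization,
     * so no pred_mutex is needed around it. */
    atomic_inc(&eb.ctx->req_cnt);

    /* Dequeue path: atomic_dec_if_positive() only decrements when the
     * counter is above zero, closing the read-then-decrement race and
     * making underflow impossible. */
    atomic_dec_if_positive(&rq->gem_context->req_cnt);
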
> 
> And there is also GuC to think about.
> 
> Regards,
> 
> Tvrtko
> 
> > +
> > +			mutex_unlock(&rq->i915->pred_mutex);
> >   		}
> >
> >   		rb_erase_cached(&p->node, &execlists->queue);
> >

* Re: [PATCH v2 1/4] drm/i915: Get active pending request for given context
  2018-11-06  9:44   ` Tvrtko Ursulin
  2018-12-11 10:48     ` Navik, Ankit P
@ 2019-03-14  8:51     ` Ankit Navik
  1 sibling, 0 replies; 13+ messages in thread
From: Ankit Navik @ 2019-03-14  8:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Ankit Navik, intel-gfx


Hi Tvrtko,

On Tue, Nov 6, 2018 at 3:14 PM Tvrtko Ursulin <
tvrtko.ursulin@linux.intel.com> wrote:

>
> On 06/11/2018 04:13, Ankit Navik wrote:
> > [patch header and earlier hunks snipped; quoted in full earlier in
> > this thread]
> >
> > @@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >        */
> >       eb.request->batch = eb.batch;
> >
> > +     mutex_lock(&dev_priv->pred_mutex);
> > +     atomic_inc(&eb.ctx->req_cnt);
>
> Point of going to atomic_t was to remove need for the mutex.
>
> > +     mutex_unlock(&dev_priv->pred_mutex);
> > +
> >       trace_i915_request_queue(eb.request, eb.batch_flags);
> >       err = eb_submit(&eb);
> >   err_request:
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> > index 1744792..bcbb66b 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -728,6 +728,12 @@ static void execlists_dequeue(struct
> intel_engine_cs *engine)
> >                       trace_i915_request_in(rq, port_index(port,
> execlists));
> >                       last = rq;
> >                       submit = true;
> > +
> > +                     mutex_lock(&rq->i915->pred_mutex);
> > +                     if (atomic_read(&rq->gem_context->req_cnt) > 0)
> > +                             atomic_dec(&rq->gem_context->req_cnt);
>
> Hitting underflow is a hint that the accounting does not work as
> expected. I really think you need to fix it by gathering some ideas from
> the patches I've pointed at in the previous round.
>

I have submitted patch v4. I tried the approach you have suggested, but
didn't see much power benefit with it.

Regards, Ankit

>
> And there is also GuC to think about.
>
> Regards,
>
> Tvrtko
>
> > +
> > +                     mutex_unlock(&rq->i915->pred_mutex);
> >               }
> >
> >               rb_erase_cached(&p->node, &execlists->queue);
> >


end of thread, other threads:[~2019-03-14  8:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-06  4:13 [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Ankit Navik
2018-11-06  4:13 ` [PATCH v2 1/4] drm/i915: Get active pending request for given context Ankit Navik
2018-11-06  9:44   ` Tvrtko Ursulin
2018-12-11 10:48     ` Navik, Ankit P
2019-03-14  8:51     ` Ankit Navik
2018-11-06  4:13 ` [PATCH v2 2/4] drm/i915: Update render power clock state configuration " Ankit Navik
2018-11-06  4:13 ` [PATCH v2 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type Ankit Navik
2018-11-06 10:34   ` Tvrtko Ursulin
2018-11-06  4:13 ` [PATCH v2 4/4] drm/i915: Predictive governor to control eu/slice/subslice Ankit Navik
2018-11-06 10:34   ` Tvrtko Ursulin
2018-11-06  4:23 ` ✗ Fi.CI.BAT: failure for Dynamic EU configuration of Slice/Subslice/EU. (rev2) Patchwork
2018-11-07 10:38 ` [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU Tvrtko Ursulin
2018-12-11  9:58   ` Navik, Ankit P
