All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] New DDB Algo and WM fixes
@ 2016-10-13 10:58 Kumar, Mahesh
  2016-10-13 10:58 ` [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size Kumar, Mahesh
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

This series implements new DDB allocation algorithm to solve the cases,
where we have sufficient DDB available to enable multiple planes, But
due to the current algorithm not dividing it properly among planes, we
end-up failing the flip.
It also takes care of enabling same watermark level for each
plane, for efficient power saving.
This series also implements Transition Watermarks and Gen-9 related
arbitrated display bandwidth Workarounds.

There are two steps in current WM programming.

1. Calculate minimum number of blocks required  for a WM level to be
enabled. For 1440x2560 panel we need 41 blocks as minimum number of
blocks to enable WM0. This is the step which doesn't use vertical size.
It only depends on Pipe drain rate and plane horizontal size as per the
current Bspec algorithm.
So all the plane below have minimum  number of blocks required to enable
WM0 as 41
    Plane 1  - 1440x2560        -    Min blocks to enable WM0 = 41
    Plane 2  - 1440x2560        -    Min blocks to enable WM0 = 41
    Plane 3  - 1440x48          -    Min blocks to enable WM0 = 41
    Plane 4  - 1440x96          -    Min blocks to enable WM0 = 41

2. Number of blocks allotted by the driver
    Driver allocates  12 for Plane 3   &  16 for plane 4

    Total Dbuf Available = 508
    Dbuf Available after 32 blocks for cursor = 508 - (32)  = 476
    allocate minimum blocks for each plane 8 * 4 = 32
    remaining blocks = 476 - 32 = 444
    Relative Data Rate for Planes
       Plane 1  =  1440 * 2560 * 3  =  11059200
       Plane 2  =  1440 * 2560 * 3  =  11059200
       Plane 3  =  1440 * 48   * 3  =  207360
       Plane 4  =  1440 * 96   * 3  =  414720
       Total Relative BW            =  22740480

-   Allocate Buffer
    buffer allocation = (Plane relative data rate / total data rate)
		    * total remaming DDB + minimum plane DDB
     Plane 1  buffer allocation = (11059200 / 22740480) * 444 + 8 = 223
     Plane 2  buffer allocation = (11059200 / 22740480) * 444 + 8 = 223
     Plane 3  buffer allocation = (207360   / 22740480) * 444 + 8 = 12
     Plane 4  buffer allocation = (414720   / 22740480) * 444 + 8 = 16

In this case it forced driver to disable Plane 3 & 4. Driver need to use
more efficient way to allocate buffer that is optimum for power.

New Algorithm suggested by HW team is:

1. Calculate minimum buffer allocations for each plane and for each
    watermark level

2. Add minimum buffer allocations required for enabling WM7
    for all the planes

Level 0 =  41 + 41 + 41 + 41  = 164
Level 1 =  42 + 42 + 42 + 42  = 168
Level 2 =  42 + 42 + 42 + 42  = 168
Level 3 =  94 + 94 + 94 + 94 =  376
Level 4 =  94 + 94 + 94 + 94 =  376
Level 5 =  94 + 94 + 94 + 94 =  376
Level 6 =  94 + 94 + 94 + 94 =  376
Level 7 =  94 + 94 + 94 + 94 =  376

3. Check to see how many buffer allocation are left and enable
the best case. In this case since we have 476 blocks we can enable
WM0-7 on all 4 planes.
Let's say if we have only 200 block available then the best cases
allocation is to enable Level2 which requires 168 blocks

Changes since v1:
 - Rebased the series on top of Paulo's patches (under review)
    https://patchwork.freedesktop.org/series/12082/
 - Add changes to calculate WM in fixed point 16.16
 - Address review comments for Transition WM
Changes since v2:
 - Added Paulo's WM fixes series URL in cover letter
Changes since v3:
 - Address Review comments
 - Drop patch to set y-tile WA, as this is already there &
    as per HW team always setting the WA doesn't harm.
 - Drop Transition WM and patch to reduce function parameter patch
 - split patches

Kumar, Mahesh (3):
  drm/i915/skl+: use linetime latency instead of ddb size
  drm/i915: Decode system memory bandwidth
  drm/i915/gen9: WM memory bandwidth related workaround

Mahesh Kumar (5):
  drm/i915/skl: New ddb allocation algorithm
  drm/i915/skl+: reset y_plane ddb structure also during calculation
  drm/i915/skl: Add variables to check x_tile and y_tile
  drm/i915/skl+: change WM calc to fixed point 16.16
  drm/i915/bxt: Enable IPC support

 drivers/gpu/drm/i915/i915_drv.c  | 172 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h  |  23 +++
 drivers/gpu/drm/i915/i915_reg.h  |  39 +++++
 drivers/gpu/drm/i915/intel_drv.h |  12 ++
 drivers/gpu/drm/i915/intel_pm.c  | 351 ++++++++++++++++++++++++++++++---------
 5 files changed, 514 insertions(+), 83 deletions(-)

-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-10-31 18:03   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm Kumar, Mahesh
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

This patch make changes to use linetime latency instead of allocated
DDB size during plane watermark calculation in switch case, This is
required to implement new DDB allocation algorithm.

In New Algorithm DDB is allocated based on WM values, because of which
number of DDB blocks will not be available during WM calculation,
So this "linetime latency" is suggested by SV/HW team to use during
switch-case for WM blocks selection.

Changes since v1:
 - Rebase on top of Paulo's patch series
Changes since v2:
 - Fix if-else condition (pointed by Maarten)

Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7f1748a..098336d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3616,10 +3616,14 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
 		selected_result = max(method2, y_tile_minimum);
 	} else {
+		uint32_t linetime_us = 0;
+
+		linetime_us = DIV_ROUND_UP(width * 1000,
+				skl_pipe_pixel_rate(cstate));
 		if ((cpp * cstate->base.adjusted_mode.crtc_htotal / 512 < 1) &&
 		    (plane_bytes_per_line / 512 < 1))
 			selected_result = method2;
-		else if ((ddb_allocation / plane_blocks_per_line) >= 1)
+		else if (latency >= linetime_us)
 			selected_result = min(method1, method2);
 		else
 			selected_result = method1;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
  2016-10-13 10:58 ` [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-04 19:25   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 3/8] drm/i915: Decode system memory bandwidth Kumar, Mahesh
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

This patch implements new DDB allocation algorithm as per HW team
recommendation. This algo takecare of scenario where we allocate less DDB
for the planes with lower relative pixel rate, but they require more DDB
to work.
It also takes care of enabling same watermark level for each
plane, for efficient power saving.

Changes since v1:
 - Rebase on top of Paulo's patch series

Changes since v2:
 - Fix the for loop condition to enable WM

Changes since v3:
 - Fix crash in cursor i-g-t reported by Maarten
 - Rebase after addressing Paulo's comments
 - Few other ULT fixes

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 149 +++++++++++++++++++++-------------------
 1 file changed, 79 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 098336d..84ec6b1 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3344,6 +3344,7 @@ skl_ddb_min_alloc(const struct drm_plane_state *pstate,
 
 static int
 skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
+		      struct skl_pipe_wm *pipe_wm,
 		      struct skl_ddb_allocation *ddb /* out */)
 {
 	struct drm_atomic_state *state = cstate->base.state;
@@ -3359,8 +3360,12 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 	uint16_t *minimum = cstate->wm.skl.minimum_blocks;
 	uint16_t *y_minimum = cstate->wm.skl.minimum_y_blocks;
 	unsigned int total_data_rate;
+	uint16_t total_min_blocks = 0;
+	uint16_t total_level_ddb = 0;
 	int num_active;
-	int id, i;
+	int max_level, level;
+	int id, i, ret = 0;
+
 
 	if (WARN_ON(!state))
 		return 0;
@@ -3409,19 +3414,42 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 	}
 
 	for (i = 0; i < PLANE_CURSOR; i++) {
-		alloc_size -= minimum[i];
-		alloc_size -= y_minimum[i];
+		total_min_blocks += minimum[i];
+		total_min_blocks += y_minimum[i];
 	}
 
-	/*
-	 * 2. Distribute the remaining space in proportion to the amount of
-	 * data each plane needs to fetch from memory.
-	 *
-	 * FIXME: we may not allocate every single block here.
-	 */
+	for (level = ilk_wm_max_level(dev); level >= 0; level--) {
+		total_level_ddb = 0;
+		for (i = 0; i < PLANE_CURSOR; i++) {
+			/*
+			 * TODO: We should calculate watermark values for Y/UV
+			 * plane both in case of NV12 format and use both values
+			 * for ddb calculation, As NV12 is disabled as of now.
+			 * using only single plane value here.
+			 */
+			uint16_t min = minimum[i] + y_minimum[i];
+			uint16_t plane_level_ddb_wm =
+				max(pipe_wm->wm[level].plane_res_b[i], min);
+			total_level_ddb += plane_level_ddb_wm;
+		}
+
+		if (total_level_ddb <= alloc_size)
+			break;
+	}
+
+	if ((level < 0) || (total_min_blocks > alloc_size)) {
+		DRM_DEBUG_KMS("Requested display configuration exceeds system DDB limitations");
+		DRM_DEBUG_KMS("minimum required %d/%d\n", (level < 0) ?
+				total_level_ddb : total_min_blocks, alloc_size);
+		ret = -EINVAL;
+		goto exit;
+	}
+	max_level = level;
+	alloc_size -= total_level_ddb;
+
 	total_data_rate = skl_get_total_relative_data_rate(cstate);
 	if (total_data_rate == 0)
-		return 0;
+		goto exit;
 
 	start = alloc->start;
 	for_each_intel_plane_on_crtc(dev, intel_crtc, intel_plane) {
@@ -3436,7 +3464,8 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 		 * promote the expression to 64 bits to avoid overflowing, the
 		 * result is < available as data_rate / total_data_rate < 1
 		 */
-		plane_blocks = minimum[id];
+		plane_blocks = max(pipe_wm->wm[max_level].plane_res_b[id],
+					minimum[id]);
 		plane_blocks += div_u64((uint64_t)alloc_size * data_rate,
 					total_data_rate);
 
@@ -3444,12 +3473,13 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 		if (data_rate) {
 			ddb->plane[pipe][id].start = start;
 			ddb->plane[pipe][id].end = start + plane_blocks;
+			start += plane_blocks;
 		}
 
-		start += plane_blocks;
-
 		/*
 		 * allocation for y_plane part of planar format:
+		 * TODO: Once we start calculating watermark values for Y/UV
+		 * plane both consider it for initial allowed wm blocks.
 		 */
 		y_data_rate = cstate->wm.skl.plane_y_data_rate[id];
 
@@ -3460,12 +3490,29 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 		if (y_data_rate) {
 			ddb->y_plane[pipe][id].start = start;
 			ddb->y_plane[pipe][id].end = start + y_plane_blocks;
+			start += y_plane_blocks;
 		}
 
-		start += y_plane_blocks;
+		/*
+		 * Now enable all levels in WM structure which can be enabled
+		 * using current DDB allocation
+		 */
+		for (i = ilk_wm_max_level(dev); i >= 0; i--) {
+			if (i > max_level || pipe_wm->wm[i].plane_res_l[id] > 31
+					|| pipe_wm->wm[i].plane_res_b[id] == 0
+					|| pipe_wm->wm[i].plane_res_b[id] >=
+					skl_ddb_entry_size(&ddb->plane[pipe][id])) {
+				pipe_wm->wm[i].plane_en[id] = false;
+				pipe_wm->wm[i].plane_res_b[id] = 0;
+				pipe_wm->wm[i].plane_res_l[id] = 0;
+			} else {
+				pipe_wm->wm[i].plane_en[id] = true;
+			}
+		}
 	}
 
-	return 0;
+exit:
+	return ret;
 }
 
 static uint32_t skl_pipe_pixel_rate(const struct intel_crtc_state *config)
@@ -3536,11 +3583,9 @@ static uint32_t skl_adjusted_plane_pixel_rate(const struct intel_crtc_state *cst
 static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 				struct intel_crtc_state *cstate,
 				struct intel_plane_state *intel_pstate,
-				uint16_t ddb_allocation,
 				int level,
 				uint16_t *out_blocks, /* out */
-				uint8_t *out_lines, /* out */
-				bool *enabled /* out */)
+				uint8_t *out_lines /* out */)
 {
 	struct drm_plane_state *pstate = &intel_pstate->base;
 	struct drm_framebuffer *fb = pstate->fb;
@@ -3554,10 +3599,8 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	uint32_t plane_pixel_rate;
 	uint32_t y_tile_minimum, y_min_scanlines;
 
-	if (latency == 0 || !cstate->base.active || !intel_pstate->base.visible) {
-		*enabled = false;
+	if (latency == 0 || !cstate->base.active || !intel_pstate->base.visible)
 		return 0;
-	}
 
 	width = drm_rect_width(&intel_pstate->base.src) >> 16;
 	height = drm_rect_height(&intel_pstate->base.src) >> 16;
@@ -3642,47 +3685,22 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 		}
 	}
 
-	if (res_blocks >= ddb_allocation || res_lines > 31) {
-		*enabled = false;
-
-		/*
-		 * If there are no valid level 0 watermarks, then we can't
-		 * support this display configuration.
-		 */
-		if (level) {
-			return 0;
-		} else {
-			DRM_DEBUG_KMS("Requested display configuration exceeds system watermark limitations\n");
-			DRM_DEBUG_KMS("Plane %d.%d: blocks required = %u/%u, lines required = %u/31\n",
-				      to_intel_crtc(cstate->base.crtc)->pipe,
-				      skl_wm_plane_id(to_intel_plane(pstate->plane)),
-				      res_blocks, ddb_allocation, res_lines);
-
-			return -EINVAL;
-		}
-	}
-
 	*out_blocks = res_blocks;
 	*out_lines = res_lines;
-	*enabled = true;
 
 	return 0;
 }
 
 static int
 skl_compute_wm_level(const struct drm_i915_private *dev_priv,
-		     struct skl_ddb_allocation *ddb,
 		     struct intel_crtc_state *cstate,
 		     int level,
 		     struct skl_wm_level *result)
 {
 	struct drm_atomic_state *state = cstate->base.state;
-	struct intel_crtc *intel_crtc = to_intel_crtc(cstate->base.crtc);
 	struct drm_plane *plane;
 	struct intel_plane *intel_plane;
 	struct intel_plane_state *intel_pstate;
-	uint16_t ddb_blocks;
-	enum pipe pipe = intel_crtc->pipe;
 	int ret;
 
 	/*
@@ -3721,16 +3739,12 @@ skl_compute_wm_level(const struct drm_i915_private *dev_priv,
 
 		WARN_ON(!intel_pstate->base.fb);
 
-		ddb_blocks = skl_ddb_entry_size(&ddb->plane[pipe][i]);
-
 		ret = skl_compute_plane_wm(dev_priv,
 					   cstate,
 					   intel_pstate,
-					   ddb_blocks,
 					   level,
 					   &result->plane_res_b[i],
-					   &result->plane_res_l[i],
-					   &result->plane_en[i]);
+					   &result->plane_res_l[i]);
 		if (ret)
 			return ret;
 	}
@@ -3779,11 +3793,15 @@ static int skl_build_pipe_wm(struct intel_crtc_state *cstate,
 	int ret;
 
 	for (level = 0; level <= max_level; level++) {
-		ret = skl_compute_wm_level(dev_priv, ddb, cstate,
+		ret = skl_compute_wm_level(dev_priv, cstate,
 					   level, &pipe_wm->wm[level]);
 		if (ret)
 			return ret;
 	}
+	ret = skl_allocate_pipe_ddb(cstate, pipe_wm, ddb);
+	if (ret)
+		return ret;
+
 	pipe_wm->linetime = skl_compute_linetime_wm(cstate);
 
 	skl_compute_transition_wm(cstate, &pipe_wm->trans_wm);
@@ -4006,13 +4024,12 @@ skl_ddb_add_affected_planes(struct intel_crtc_state *cstate)
 }
 
 static int
-skl_compute_ddb(struct drm_atomic_state *state)
+skl_include_affected_pipes(struct drm_atomic_state *state)
 {
 	struct drm_device *dev = state->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
 	struct intel_crtc *intel_crtc;
-	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
 	uint32_t realloc_pipes = pipes_modified(state);
 	int ret;
 
@@ -4064,14 +4081,6 @@ skl_compute_ddb(struct drm_atomic_state *state)
 		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
 		if (IS_ERR(cstate))
 			return PTR_ERR(cstate);
-
-		ret = skl_allocate_pipe_ddb(cstate, ddb);
-		if (ret)
-			return ret;
-
-		ret = skl_ddb_add_affected_planes(cstate);
-		if (ret)
-			return ret;
 	}
 
 	return 0;
@@ -4122,19 +4131,15 @@ skl_compute_wm(struct drm_atomic_state *state)
 	/* Clear all dirty flags */
 	results->dirty_pipes = 0;
 
-	ret = skl_compute_ddb(state);
+	ret = skl_include_affected_pipes(state);
 	if (ret)
 		return ret;
 
 	/*
 	 * Calculate WM's for all pipes that are part of this transaction.
-	 * Note that the DDB allocation above may have added more CRTC's that
-	 * weren't otherwise being modified (and set bits in dirty_pipes) if
-	 * pipe allocations had to change.
-	 *
-	 * FIXME:  Now that we're doing this in the atomic check phase, we
-	 * should allow skl_update_pipe_wm() to return failure in cases where
-	 * no suitable watermark values can be found.
+	 * Note that affected pipe calculation above may have added more
+	 * CRTC's that weren't otherwise being modified (and set bits in
+	 * dirty_pipes) if pipe allocations had to change.
 	 */
 	for_each_crtc_in_state(state, crtc, cstate, i) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
@@ -4150,6 +4155,10 @@ skl_compute_wm(struct drm_atomic_state *state)
 		if (changed)
 			results->dirty_pipes |= drm_crtc_mask(crtc);
 
+		ret = skl_ddb_add_affected_planes(intel_cstate);
+		if (ret)
+			return ret;
+
 		if ((results->dirty_pipes & drm_crtc_mask(crtc)) == 0)
 			/* This pipe's WM's did not change */
 			continue;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/8] drm/i915: Decode system memory bandwidth
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
  2016-10-13 10:58 ` [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size Kumar, Mahesh
  2016-10-13 10:58 ` [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-03 19:06   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround Kumar, Mahesh
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

This patch adds support to decode system memory bandwidth
which will be used for arbitrated display memory percentage
calculation in GEN9 based system.

Changes from v1:
 - Address comments from Paulo
 - implement decode function for SKL/KBL also

Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 170 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h |  14 ++++
 drivers/gpu/drm/i915/i915_reg.h |  38 +++++++++
 3 files changed, 222 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 89d3222..b5f601c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -979,6 +979,170 @@ static void intel_sanitize_options(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("use GPU sempahores? %s\n", yesno(i915.semaphores));
 }
 
+static inline void skl_memdev_info_fill(struct memdev_info *info, uint32_t val)
+{
+	uint8_t channel_width, rank;
+
+	info->rank_valid = true;
+	channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
+		SKL_DRAM_CHANNEL_WIDTH_MASK;
+	if (channel_width == SKL_DRAM_WIDTH_1)
+		info->channel_width_bytes = 1;
+	else if (channel_width == SKL_DRAM_WIDTH_2)
+		info->channel_width_bytes = 2;
+	else
+		info->channel_width_bytes = 4;
+
+	rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
+	if (rank == SKL_DRAM_RANK_SINGLE)
+		info->rank = DRAM_RANK_SINGLE;
+	else
+		info->rank = DRAM_RANK_DUAL;
+}
+
+static int
+skl_get_memdev_info(struct drm_i915_private *dev_priv)
+{
+	uint32_t val = 0;
+	uint32_t mem_speed = 0;
+	struct memdev_info *memdev_info = &dev_priv->memdev_info;
+
+	val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
+	mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
+			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
+
+	if (mem_speed == 0)
+		return -EINVAL;
+
+	memdev_info->valid = true;
+	memdev_info->mem_speed_khz = mem_speed;
+	memdev_info->num_channels  = 0;
+
+	memdev_info->rank_valid = false;
+	val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
+
+	if (val != 0xFFFFFFFF) {
+		memdev_info->num_channels++;
+		skl_memdev_info_fill(memdev_info, val);
+	}
+
+	val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
+
+	if (val != 0xFFFFFFFF) {
+		memdev_info->num_channels++;
+		if (!memdev_info->rank_valid)
+			skl_memdev_info_fill(memdev_info, val);
+	}
+
+	if (memdev_info->num_channels == 0)
+		memdev_info->valid = false;
+	return 0;
+}
+
+static int
+bxt_get_memdev_info(struct drm_i915_private *dev_priv)
+{
+	struct memdev_info *memdev_info = &dev_priv->memdev_info;
+	uint32_t dram_channel;
+	uint32_t mem_speed, val;
+	uint8_t num_channel, dram_type;
+	int i;
+
+	val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
+	mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
+			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
+
+	if (mem_speed == 0)
+		return -EINVAL;
+
+	memdev_info->valid = true;
+	memdev_info->mem_speed_khz = mem_speed;
+	dram_type = (val >> BXT_DRAM_TYPE_SHIFT) & BXT_DRAM_TYPE_MASK;
+	dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) & BXT_DRAM_CHANNEL_MASK;
+	num_channel = hweight32(dram_channel);
+
+	/*
+	 * The lpddr3 and lpddr4 technologies can have 1-4 channels and the
+	 * channels are 32bits wide; while ddr3l technologies can have 1-2
+	 * channels and the channels are 64 bits wide. In case of single 64 bit
+	 * wide DDR3L dimm sets two channel-active bits in
+	 * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L 64bit dimm
+	 * will set all four channel-active bits in above register.
+	 * In order to get actual number of channels in DDR3L DRAM we need to
+	 * device total channel-active bits set by 2.
+	 */
+
+	switch (dram_type) {
+	case BXT_DRAM_TYPE_LPDDR3:
+	case BXT_DRAM_TYPE_LPDDR4:
+		memdev_info->channel_width_bytes = 4;
+		memdev_info->num_channels = num_channel;
+		break;
+	case BXT_DRAM_TYPE_DDR3L:
+		memdev_info->channel_width_bytes = 8;
+		memdev_info->num_channels = num_channel / 2;
+		break;
+	default:
+		DRM_DEBUG_KMS("Unknown DRAM type\n");
+		memdev_info->valid = false;
+		return -EINVAL;
+	}
+
+	/*
+	 * Now read each DUNIT8/9/10/11 to check the rank of each dimms.
+	 * all the dimms should have same rank as in first valid Dimm
+	 */
+	memdev_info->rank_valid = false;
+	for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
+		val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
+		if (val != 0xFFFFFFFF) {
+			uint8_t rank;
+
+			memdev_info->rank_valid = true;
+			rank = val & BXT_DRAM_RANK_MASK;
+			if (rank == BXT_DRAM_RANK_SINGLE)
+				memdev_info->rank = DRAM_RANK_SINGLE;
+			else if (rank == BXT_DRAM_RANK_DUAL)
+				memdev_info->rank = DRAM_RANK_DUAL;
+			else
+				memdev_info->rank = DRAM_RANK_NONE;
+
+			break;
+		}
+	}
+	return 0;
+}
+
+static void
+intel_get_memdev_info(struct drm_i915_private *dev_priv)
+{
+	struct memdev_info *memdev_info = &dev_priv->memdev_info;
+	int ret;
+
+	memdev_info->valid = false;
+	if (!IS_GEN9(dev_priv))
+		return;
+
+	if (IS_BROXTON(dev_priv))
+		ret = bxt_get_memdev_info(dev_priv);
+	else
+		ret = skl_get_memdev_info(dev_priv);
+	if (ret)
+		return;
+
+	if (memdev_info->valid) {
+		DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-%d channel-width-%d bytes\n",
+				memdev_info->mem_speed_khz,
+				memdev_info->num_channels,
+				memdev_info->channel_width_bytes);
+		if (memdev_info->rank_valid)
+			DRM_DEBUG_DRIVER("DRAM rank-%s\n",
+			(memdev_info->rank == DRAM_RANK_DUAL) ?
+						"dual" : "single");
+	}
+}
+
+
 /**
  * i915_driver_init_hw - setup state requiring device access
  * @dev_priv: device private
@@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
 			DRM_DEBUG_DRIVER("can't enable MSI");
 	}
 
+	/*
+	 * Fill the memdev structure to get the system raw bandwidth
+	 * This will be used by WM algorithm, to implement GEN9 based WA
+	 */
+	intel_get_memdev_info(dev_priv);
+
 	return 0;
 
 out_ggtt:
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a219a35..adbd9aa 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2057,6 +2057,20 @@ struct drm_i915_private {
 		bool distrust_bios_wm;
 	} wm;
 
+	struct memdev_info {
+		bool valid;
+		uint32_t mem_speed_khz;
+		uint8_t channel_width_bytes;
+		uint8_t num_channels;
+		bool rank_valid;
+		enum {
+			DRAM_RANK_NONE = 0,
+			DRAM_RANK_SINGLE,
+			DRAM_RANK_DUAL
+		} rank;
+	} memdev_info;
+
+
 	struct i915_runtime_pm pm;
 
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index acc767a..a9c467c 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7721,6 +7721,44 @@ enum {
 #define  DC_STATE_DEBUG_MASK_CORES	(1<<0)
 #define  DC_STATE_DEBUG_MASK_MEMORY_UP	(1<<1)
 
+#define _MMIO_MCHBAR_PIPE(x, a, b)	_MMIO(MCHBAR_MIRROR_BASE_SNB + _PIPE(x, a, b))
+#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x7114)
+#define BXT_REQ_DATA_MASK		(0x3F << 0)
+#define BXT_DRAM_TYPE_SHIFT		24
+#define BXT_DRAM_TYPE_MASK		0x7
+#define BXT_DRAM_CHANNEL_SHIFT		12
+#define BXT_DRAM_CHANNEL_MASK		0xF
+
+#define BXT_DRAM_TYPE_LPDDR3		0x1
+#define BXT_DRAM_TYPE_LPDDR4		0x2
+#define BXT_DRAM_TYPE_DDR3L		0x4
+/*
+ * BIOS programs this field of REQ_DATA [5:0] in integer
+ * multiple of 133330 KHz (133.33MHz)
+ */
+#define	SKL_MEMORY_FREQ_MULTIPLIER		0x208D2
+#define BXT_D_CR_DRP0_DUNIT8			0x1000
+#define BXT_D_CR_DRP0_DUNIT9			0x1200
+#define BXT_D_CR_DRP0_DUNIT_MAX			4
+#define BXT_D_CR_DRP0_DUNIT(x)	_MMIO_MCHBAR_PIPE(x, BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
+#define BXT_DRAM_RANK_MASK			0x3
+#define BXT_DRAM_RANK_SINGLE			0x1
+#define BXT_DRAM_RANK_DUAL			0x3
+
+#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5E04)
+#define SKL_REQ_DATA_MASK			(0xF << 0)
+#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x500C)
+#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5010)
+#define SKL_DRAM_CHANNEL_WIDTH_MASK		0x3
+#define SKL_DRAM_CHANNEL_WIDTH_SHIFT		8
+#define SKL_DRAM_WIDTH_1			0x0
+#define SKL_DRAM_WIDTH_2			0x1
+#define SKL_DRAM_WIDTH_4			0x2
+#define SKL_DRAM_RANK_MASK			0x1
+#define SKL_DRAM_RANK_SHIFT			10
+#define SKL_DRAM_RANK_SINGLE			0x0
+#define SKL_DRAM_RANK_DUAL			0x1
+
 /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using this register,
  * since on HSW we can't write to it using I915_WRITE. */
 #define D_COMP_HSW			_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5F0C)
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (2 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 3/8] drm/i915: Decode system memory bandwidth Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-04 17:09   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation Kumar, Mahesh
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

This patch implemnets Workariunds related to display arbitrated memory
bandwidth. These WA are applicabe for all gen-9 based platforms.

Changes since v1:
 - Rebase on top of Paulo's patch series
Changes since v2:
 - Rebase/rework after addressing Paulo's comments in previous patch

Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |   9 +++
 drivers/gpu/drm/i915/intel_drv.h |  11 +++
 drivers/gpu/drm/i915/intel_pm.c  | 146 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 166 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index adbd9aa..c169360 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1092,6 +1092,13 @@ enum intel_sbi_destination {
 	SBI_MPHY,
 };
 
+/* SKL+ Watermark arbitrated display bandwidth Workarounds */
+enum watermark_memory_wa {
+	WATERMARK_WA_NONE,
+	WATERMARK_WA_X_TILED,
+	WATERMARK_WA_Y_TILED,
+};
+
 #define QUIRK_PIPEA_FORCE (1<<0)
 #define QUIRK_LVDS_SSC_DISABLE (1<<1)
 #define QUIRK_INVERT_BRIGHTNESS (1<<2)
@@ -1644,6 +1651,8 @@ struct skl_ddb_allocation {
 
 struct skl_wm_values {
 	unsigned dirty_pipes;
+	/* any WaterMark memory workaround Required */
+	enum watermark_memory_wa mem_wa;
 	struct skl_ddb_allocation ddb;
 	uint32_t wm_linetime[I915_MAX_PIPES];
 	uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index f48e79a..2c1897b 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1813,6 +1813,17 @@ intel_atomic_get_crtc_state(struct drm_atomic_state *state,
 	return to_intel_crtc_state(crtc_state);
 }
 
+static inline struct intel_crtc_state *
+intel_atomic_get_existing_crtc_state(struct drm_atomic_state *state,
+				      struct intel_crtc *crtc)
+{
+	struct drm_crtc_state *crtc_state;
+
+	crtc_state = drm_atomic_get_existing_crtc_state(state, &crtc->base);
+
+	return to_intel_crtc_state(crtc_state);
+}
+
 static inline struct intel_plane_state *
 intel_atomic_get_existing_plane_state(struct drm_atomic_state *state,
 				      struct intel_plane *plane)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 84ec6b1..5b8f715 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3589,6 +3589,8 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 {
 	struct drm_plane_state *pstate = &intel_pstate->base;
 	struct drm_framebuffer *fb = pstate->fb;
+	struct intel_atomic_state *intel_state =
+			to_intel_atomic_state(cstate->base.state);
 	uint32_t latency = dev_priv->wm.skl_latency[level];
 	uint32_t method1, method2;
 	uint32_t plane_bytes_per_line, plane_blocks_per_line;
@@ -3598,10 +3600,17 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	uint32_t width = 0, height = 0;
 	uint32_t plane_pixel_rate;
 	uint32_t y_tile_minimum, y_min_scanlines;
+	enum watermark_memory_wa mem_wa;
 
 	if (latency == 0 || !cstate->base.active || !intel_pstate->base.visible)
 		return 0;
 
+	mem_wa = intel_state ? intel_state->wm_results.mem_wa : WATERMARK_WA_NONE;
+	if (mem_wa != WATERMARK_WA_NONE) {
+		if (fb->modifier[0] == I915_FORMAT_MOD_X_TILED)
+			latency += 15;
+	}
+
 	width = drm_rect_width(&intel_pstate->base.src) >> 16;
 	height = drm_rect_height(&intel_pstate->base.src) >> 16;
 
@@ -3634,6 +3643,9 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 		y_min_scanlines = 4;
 	}
 
+	if (mem_wa == WATERMARK_WA_Y_TILED)
+		y_min_scanlines *= 2;
+
 	plane_bytes_per_line = width * cpp;
 	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
 	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
@@ -4075,6 +4087,15 @@ skl_include_affected_pipes(struct drm_atomic_state *state)
 		intel_state->wm_results.dirty_pipes = ~0;
 	}
 
+	/*
+	 * If Watermark workaround is changed we need to recalculate
+	 * watermark values for all active pipes
+	 */
+	if (intel_state->wm_results.mem_wa != dev_priv->wm.skl_hw.mem_wa) {
+		realloc_pipes = ~0;
+		intel_state->wm_results.dirty_pipes = ~0;
+	}
+
 	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
 		struct intel_crtc_state *cstate;
 
@@ -4087,6 +4108,129 @@ skl_include_affected_pipes(struct drm_atomic_state *state)
 }
 
 static void
+skl_set_memory_bandwidth_wm_wa(struct drm_atomic_state *state)
+{
+	struct drm_device *dev = state->dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct intel_crtc *intel_crtc;
+	struct intel_plane_state *intel_pstate;
+	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
+	struct memdev_info *memdev_info = &dev_priv->memdev_info;
+	int num_active_plane, num_active_pipe;
+	uint32_t plane_bw, max_plane_bw, pipe_bw, max_pipe_bw;
+	uint32_t total_pipe_bw;
+	uint32_t system_bw = 0;
+	uint32_t rank;
+	int x_tile_per;
+	int display_bw_per;
+	bool y_tile_enabled = false;
+
+	if (!memdev_info->valid)
+		goto exit;
+
+	system_bw = memdev_info->mem_speed_khz * memdev_info->num_channels *
+				memdev_info->channel_width_bytes;
+
+	if (!system_bw)
+		goto exit;
+
+	max_pipe_bw = 0;
+	for_each_intel_crtc(dev, intel_crtc) {
+		struct intel_crtc_state *cstate;
+		struct intel_plane *plane;
+
+		/*
+		 * If CRTC is part of current atomic commit, get crtc state from
+		 * existing CRTC state. else take the cached CRTC state
+		 */
+		cstate = NULL;
+		if (state)
+			cstate = intel_atomic_get_existing_crtc_state(state,
+					intel_crtc);
+		if (!cstate)
+			cstate = to_intel_crtc_state(intel_crtc->base.state);
+
+		if (!cstate->base.active)
+			continue;
+
+		num_active_plane = 0;
+		max_plane_bw = 0;
+		for_each_intel_plane_mask(dev, plane, cstate->base.plane_mask) {
+			struct drm_framebuffer *fb = NULL;
+
+			intel_pstate = NULL;
+			if (state)
+				intel_pstate =
+				intel_atomic_get_existing_plane_state(state,
+									plane);
+			if (!intel_pstate)
+				intel_pstate =
+					to_intel_plane_state(plane->base.state);
+
+			WARN_ON(!intel_pstate->base.fb);
+
+			if (!intel_pstate->base.visible)
+				continue;
+
+			fb = intel_pstate->base.fb;
+			if (fb && (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
+				fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED))
+				y_tile_enabled = true;
+
+			plane_bw = skl_adjusted_plane_pixel_rate(cstate,
+								intel_pstate);
+			max_plane_bw = max(plane_bw, max_plane_bw);
+			num_active_plane++;
+		}
+		pipe_bw = max_plane_bw * num_active_plane;
+		max_pipe_bw = max(pipe_bw, max_pipe_bw);
+	}
+
+	if (intel_state->active_pipe_changes)
+		num_active_pipe = hweight32(intel_state->active_crtcs);
+	else
+		num_active_pipe = hweight32(dev_priv->active_crtcs);
+
+	total_pipe_bw = max_pipe_bw * num_active_pipe;
+
+	display_bw_per = DIV_ROUND_UP_ULL(total_pipe_bw * 100, system_bw * 1000);
+
+	/*
+	 * If there is any Ytile plane enabled and arbitrated display
+	 * bandwidth > 20% of raw system memory bandwidth
+	 * Enale Y-tile related WA
+	 *
+	 * If memory is dual channel single rank, Xtile limit = 35%, else Xtile
+	 * limit = 60%
+	 * If there is no Ytile plane enabled and
+	 * arbitrated display bandwidth > Xtile limit
+	 * Enable X-tile realated WA
+	 */
+	if (y_tile_enabled && (display_bw_per > 20))
+		intel_state->wm_results.mem_wa = WATERMARK_WA_Y_TILED;
+	else {
+
+		if (memdev_info->rank_valid)
+			rank = memdev_info->rank;
+		else
+			rank = DRAM_RANK_DUAL; /* Assume we are dual rank */
+
+		if ((rank == DRAM_RANK_SINGLE) &&
+					(memdev_info->num_channels == 2))
+			x_tile_per = 35;
+		else
+			x_tile_per = 60;
+
+		if (display_bw_per > x_tile_per)
+			intel_state->wm_results.mem_wa = WATERMARK_WA_X_TILED;
+	}
+	return;
+
+exit:
+	intel_state->wm_results.mem_wa = WATERMARK_WA_NONE;
+}
+
+static void
 skl_copy_wm_for_pipe(struct skl_wm_values *dst,
 		     struct skl_wm_values *src,
 		     enum pipe pipe)
@@ -4131,6 +4275,8 @@ skl_compute_wm(struct drm_atomic_state *state)
 	/* Clear all dirty flags */
 	results->dirty_pipes = 0;
 
+	skl_set_memory_bandwidth_wm_wa(state);
+
 	ret = skl_include_affected_pipes(state);
 	if (ret)
 		return ret;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (3 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-04 17:39   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile Kumar, Mahesh
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

Current code clears only plane ddb allocation if total ddb allocated to
pipe in zero. y_plane ddb still contains old value, clear that as well.

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5b8f715..a668204 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3381,6 +3381,7 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 	alloc_size = skl_ddb_entry_size(alloc);
 	if (alloc_size == 0) {
 		memset(ddb->plane[pipe], 0, sizeof(ddb->plane[pipe]));
+		memset(ddb->y_plane[pipe], 0, sizeof(ddb->y_plane[pipe]));
 		return 0;
 	}
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (4 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-04 17:45   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16 Kumar, Mahesh
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

This patch adds variable to check for X_tiled & y_tiled planes, instead
of always checking against framebuffer-modifiers.

Changes:
 - Created separate patch as per Paulo's comment
 - Added x_tiled variable as well

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a668204..0eaaadc 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3602,6 +3602,7 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	uint32_t plane_pixel_rate;
 	uint32_t y_tile_minimum, y_min_scanlines;
 	enum watermark_memory_wa mem_wa;
+	bool y_tiled = false, x_tiled = false;
 
 	if (latency == 0 || !cstate->base.active || !intel_pstate->base.visible)
 		return 0;
@@ -3621,6 +3622,12 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	cpp = drm_format_plane_cpp(fb->pixel_format, 0);
 	plane_pixel_rate = skl_adjusted_plane_pixel_rate(cstate, intel_pstate);
 
+	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
+	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED)
+		y_tiled = true;
+	else if (fb->modifier[0] == I915_FORMAT_MOD_X_TILED)
+		x_tiled = true;
+
 	if (intel_rotation_90_or_270(pstate->rotation)) {
 		int cpp = (fb->pixel_format == DRM_FORMAT_NV12) ?
 			drm_format_plane_cpp(fb->pixel_format, 1) :
@@ -3648,16 +3655,15 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 		y_min_scanlines *= 2;
 
 	plane_bytes_per_line = width * cpp;
-	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
-	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
+	if (y_tiled) {
 		plane_blocks_per_line =
 		      DIV_ROUND_UP(plane_bytes_per_line * y_min_scanlines, 512);
 		plane_blocks_per_line /= y_min_scanlines;
-	} else if (fb->modifier[0] == DRM_FORMAT_MOD_NONE) {
+	} else if (x_tiled) {
+		plane_blocks_per_line = DIV_ROUND_UP(plane_bytes_per_line, 512);
+	} else {
 		plane_blocks_per_line = DIV_ROUND_UP(plane_bytes_per_line, 512)
 					+ 1;
-	} else {
-		plane_blocks_per_line = DIV_ROUND_UP(plane_bytes_per_line, 512);
 	}
 
 	method1 = skl_wm_method1(plane_pixel_rate, cpp, latency);
@@ -3668,8 +3674,7 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 
 	y_tile_minimum = plane_blocks_per_line * y_min_scanlines;
 
-	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
-	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
+	if (y_tiled) {
 		selected_result = max(method2, y_tile_minimum);
 	} else {
 		uint32_t linetime_us = 0;
@@ -3689,8 +3694,7 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	res_lines = DIV_ROUND_UP(selected_result, plane_blocks_per_line);
 
 	if (level >= 1 && level <= 7) {
-		if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
-		    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
+		if (y_tiled) {
 			res_blocks += y_tile_minimum;
 			res_lines += y_min_scanlines;
 		} else {
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (5 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-11-04 19:03   ` Paulo Zanoni
  2016-10-13 10:58 ` [PATCH v4 8/8] drm/i915/bxt: Enable IPC support Kumar, Mahesh
  2016-10-13 13:50 ` ✗ Fi.CI.BAT: warning for New DDB Algo and WM fixes (rev4) Patchwork
  8 siblings, 1 reply; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

This patch changes Watermak calculation to fixed point calculation.
Problem with current calculation is during plane_blocks_per_line
calculation we divide intermediate blocks with min_scanlines and
takes floor of the result because of integer operation.
hence we end-up assigning less blocks than required. Which leads to
flickers.

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0eaaadc..4263212 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3527,16 +3527,19 @@ static uint32_t skl_pipe_pixel_rate(const struct intel_crtc_state *config)
  * for the read latency) and cpp should always be <= 8, so that
  * should allow pixel_rate up to ~2 GHz which seems sufficient since max
  * 2xcdclk is 1350 MHz and the pixel rate should never exceed that.
+ * Both Method1 & Method2 returns fixedpoint 16.16 output
 */
 static uint32_t skl_wm_method1(uint32_t pixel_rate, uint8_t cpp, uint32_t latency)
 {
-	uint32_t wm_intermediate_val, ret;
+	uint64_t wm_intermediate_val;
+	uint32_t ret;
 
 	if (latency == 0)
 		return UINT_MAX;
 
-	wm_intermediate_val = latency * pixel_rate * cpp / 512;
-	ret = DIV_ROUND_UP(wm_intermediate_val, 1000);
+	wm_intermediate_val = latency * pixel_rate * cpp;
+	wm_intermediate_val <<= 16;
+	ret = DIV_ROUND_UP_ULL(wm_intermediate_val, 1000 * 512);
 
 	return ret;
 }
@@ -3658,12 +3661,15 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	if (y_tiled) {
 		plane_blocks_per_line =
 		      DIV_ROUND_UP(plane_bytes_per_line * y_min_scanlines, 512);
-		plane_blocks_per_line /= y_min_scanlines;
+		plane_blocks_per_line = (plane_blocks_per_line << 16) /
+								y_min_scanlines;
 	} else if (x_tiled) {
 		plane_blocks_per_line = DIV_ROUND_UP(plane_bytes_per_line, 512);
+		plane_blocks_per_line <<= 16;
 	} else {
 		plane_blocks_per_line = DIV_ROUND_UP(plane_bytes_per_line, 512)
 					+ 1;
+		plane_blocks_per_line <<= 16;
 	}
 
 	method1 = skl_wm_method1(plane_pixel_rate, cpp, latency);
@@ -3690,7 +3696,7 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 			selected_result = method1;
 	}
 
-	res_blocks = selected_result + 1;
+	res_blocks = DIV_ROUND_UP(selected_result, 1 << 16) + 1;
 	res_lines = DIV_ROUND_UP(selected_result, plane_blocks_per_line);
 
 	if (level >= 1 && level <= 7) {
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 8/8] drm/i915/bxt: Enable IPC support
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (6 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16 Kumar, Mahesh
@ 2016-10-13 10:58 ` Kumar, Mahesh
  2016-10-13 11:19   ` Maarten Lankhorst
  2016-11-04 19:20   ` Paulo Zanoni
  2016-10-13 13:50 ` ✗ Fi.CI.BAT: warning for New DDB Algo and WM fixes (rev4) Patchwork
  8 siblings, 2 replies; 27+ messages in thread
From: Kumar, Mahesh @ 2016-10-13 10:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: paulo.r.zanoni

From: Mahesh Kumar <mahesh1.kumar@intel.com>

This patch adds IPC support for platforms. This patch enables IPC
only for BXT/KBL platform as for SKL recommendation is to keep is disabled.
IPC (Isochronous Priority Control) is the hardware feature, which
dynamically controles the memory read priority of Display.

When IPC is enabled, plane read requests are sent at high priority until
filling above the transition watermark, then the requests are sent at
lower priority until dropping below the level 0 watermark.
The lower priority requests allow other memory clients to have better
memory access. When IPC is disabled, all plane read requests are sent at
high priority.

Changes since V1:
 - Remove commandline parameter to disable ipc
 - Address Paulo's comments

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c  |  2 ++
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_drv.h |  1 +
 drivers/gpu/drm/i915/intel_pm.c  | 15 +++++++++++++++
 4 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b5f601c..58abbaa 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1415,6 +1415,8 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	intel_runtime_pm_enable(dev_priv);
 
+	intel_enable_ipc(dev_priv);
+
 	/* Everything is in place, we can now relax! */
 	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 		 driver.name, driver.major, driver.minor, driver.patchlevel,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a9c467c..c9ebf23 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6144,6 +6144,7 @@ enum {
 #define  DISP_FBC_WM_DIS		(1<<15)
 #define DISP_ARB_CTL2	_MMIO(0x45004)
 #define  DISP_DATA_PARTITION_5_6	(1<<6)
+#define  DISP_IPC_ENABLE		(1<<3)
 #define DBUF_CTL	_MMIO(0x45008)
 #define  DBUF_POWER_REQUEST		(1<<31)
 #define  DBUF_POWER_STATE		(1<<30)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 2c1897b..45b0fa4 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1766,6 +1766,7 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
 uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
 bool ilk_disable_lp_wm(struct drm_device *dev);
 int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
+void intel_enable_ipc(struct drm_i915_private *dev_priv);
 static inline int intel_enable_rc6(void)
 {
 	return i915.enable_rc6;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 4263212..543aa5d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4833,6 +4833,21 @@ void intel_update_watermarks(struct drm_crtc *crtc)
 		dev_priv->display.update_wm(crtc);
 }
 
+void intel_enable_ipc(struct drm_i915_private *dev_priv)
+{
+	u32 val;
+
+	/* enable IPC only for Broxton for now*/
+	if (!IS_BROXTON(dev_priv) || !IS_KABYLAKE(dev_priv))
+		return;
+
+	val = I915_READ(DISP_ARB_CTL2);
+
+	val |= DISP_IPC_ENABLE;
+
+	I915_WRITE(DISP_ARB_CTL2, val);
+}
+
 /*
  * Lock protecting IPS related data structures
  */
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 8/8] drm/i915/bxt: Enable IPC support
  2016-10-13 10:58 ` [PATCH v4 8/8] drm/i915/bxt: Enable IPC support Kumar, Mahesh
@ 2016-10-13 11:19   ` Maarten Lankhorst
  2016-10-13 13:01     ` Mahesh Kumar
  2016-11-04 19:20   ` Paulo Zanoni
  1 sibling, 1 reply; 27+ messages in thread
From: Maarten Lankhorst @ 2016-10-13 11:19 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx; +Cc: paulo.r.zanoni

Op 13-10-16 om 12:58 schreef Kumar, Mahesh:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
>
> This patch adds IPC support for platforms. This patch enables IPC
> only for BXT/KBL platform as for SKL recommendation is to keep is disabled.
> IPC (Isochronous Priority Control) is the hardware feature, which
> dynamically controles the memory read priority of Display.
>
> When IPC is enabled, plane read requests are sent at high priority until
> filling above the transition watermark, then the requests are sent at
> lower priority until dropping below the level 0 watermark.
> The lower priority requests allow other memory clients to have better
> memory access. When IPC is disabled, all plane read requests are sent at
> high priority.
>
> Changes since V1:
>  - Remove commandline parameter to disable ipc
>  - Address Paulo's comments
>
> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c  |  2 ++
>  drivers/gpu/drm/i915/i915_reg.h  |  1 +
>  drivers/gpu/drm/i915/intel_drv.h |  1 +
>  drivers/gpu/drm/i915/intel_pm.c  | 15 +++++++++++++++
>  4 files changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index b5f601c..58abbaa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1415,6 +1415,8 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	intel_runtime_pm_enable(dev_priv);
>  
> +	intel_enable_ipc(dev_priv);
> +
>  	/* Everything is in place, we can now relax! */
>  	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
>  		 driver.name, driver.major, driver.minor, driver.patchlevel,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index a9c467c..c9ebf23 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -6144,6 +6144,7 @@ enum {
>  #define  DISP_FBC_WM_DIS		(1<<15)
>  #define DISP_ARB_CTL2	_MMIO(0x45004)
>  #define  DISP_DATA_PARTITION_5_6	(1<<6)
> +#define  DISP_IPC_ENABLE		(1<<3)
>  #define DBUF_CTL	_MMIO(0x45008)
>  #define  DBUF_POWER_REQUEST		(1<<31)
>  #define  DBUF_POWER_STATE		(1<<30)
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 2c1897b..45b0fa4 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1766,6 +1766,7 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
>  uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
>  bool ilk_disable_lp_wm(struct drm_device *dev);
>  int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
> +void intel_enable_ipc(struct drm_i915_private *dev_priv);
>  static inline int intel_enable_rc6(void)
>  {
>  	return i915.enable_rc6;
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 4263212..543aa5d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4833,6 +4833,21 @@ void intel_update_watermarks(struct drm_crtc *crtc)
>  		dev_priv->display.update_wm(crtc);
>  }
>  
> +void intel_enable_ipc(struct drm_i915_private *dev_priv)
> +{
> +	u32 val;
> +
> +	/* enable IPC only for Broxton for now*/
> +	if (!IS_BROXTON(dev_priv) || !IS_KABYLAKE(dev_priv))
> +		return;
> +
This comment doesn't match the code.
Is it ok to enable IPC right away? Not when the driver is writing the first watermarks? (distrust_bios_wm)
And what about suspend/resume, should this flag be set again after resume?

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 8/8] drm/i915/bxt: Enable IPC support
  2016-10-13 11:19   ` Maarten Lankhorst
@ 2016-10-13 13:01     ` Mahesh Kumar
  2016-10-13 15:36       ` Daniel Vetter
  0 siblings, 1 reply; 27+ messages in thread
From: Mahesh Kumar @ 2016-10-13 13:01 UTC (permalink / raw)
  To: Maarten Lankhorst, intel-gfx; +Cc: paulo.r.zanoni

Hi,


On Thursday 13 October 2016 04:49 PM, Maarten Lankhorst wrote:
> Op 13-10-16 om 12:58 schreef Kumar, Mahesh:
>> From: Mahesh Kumar <mahesh1.kumar@intel.com>
>>
>> This patch adds IPC support for platforms. This patch enables IPC
>> only for BXT/KBL platform as for SKL recommendation is to keep is disabled.
>> IPC (Isochronous Priority Control) is the hardware feature, which
>> dynamically controles the memory read priority of Display.
>>
>> When IPC is enabled, plane read requests are sent at high priority until
>> filling above the transition watermark, then the requests are sent at
>> lower priority until dropping below the level 0 watermark.
>> The lower priority requests allow other memory clients to have better
>> memory access. When IPC is disabled, all plane read requests are sent at
>> high priority.
>>
>> Changes since V1:
>>   - Remove commandline parameter to disable ipc
>>   - Address Paulo's comments
>>
>> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.c  |  2 ++
>>   drivers/gpu/drm/i915/i915_reg.h  |  1 +
>>   drivers/gpu/drm/i915/intel_drv.h |  1 +
>>   drivers/gpu/drm/i915/intel_pm.c  | 15 +++++++++++++++
>>   4 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index b5f601c..58abbaa 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -1415,6 +1415,8 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
>>   
>>   	intel_runtime_pm_enable(dev_priv);
>>   
>> +	intel_enable_ipc(dev_priv);
>> +
>>   	/* Everything is in place, we can now relax! */
>>   	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
>>   		 driver.name, driver.major, driver.minor, driver.patchlevel,
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index a9c467c..c9ebf23 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -6144,6 +6144,7 @@ enum {
>>   #define  DISP_FBC_WM_DIS		(1<<15)
>>   #define DISP_ARB_CTL2	_MMIO(0x45004)
>>   #define  DISP_DATA_PARTITION_5_6	(1<<6)
>> +#define  DISP_IPC_ENABLE		(1<<3)
>>   #define DBUF_CTL	_MMIO(0x45008)
>>   #define  DBUF_POWER_REQUEST		(1<<31)
>>   #define  DBUF_POWER_STATE		(1<<30)
>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>> index 2c1897b..45b0fa4 100644
>> --- a/drivers/gpu/drm/i915/intel_drv.h
>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>> @@ -1766,6 +1766,7 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
>>   uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
>>   bool ilk_disable_lp_wm(struct drm_device *dev);
>>   int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
>> +void intel_enable_ipc(struct drm_i915_private *dev_priv);
>>   static inline int intel_enable_rc6(void)
>>   {
>>   	return i915.enable_rc6;
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 4263212..543aa5d 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -4833,6 +4833,21 @@ void intel_update_watermarks(struct drm_crtc *crtc)
>>   		dev_priv->display.update_wm(crtc);
>>   }
>>   
>> +void intel_enable_ipc(struct drm_i915_private *dev_priv)
>> +{
>> +	u32 val;
>> +
>> +	/* enable IPC only for Broxton for now*/
>> +	if (!IS_BROXTON(dev_priv) || !IS_KABYLAKE(dev_priv))
>> +		return;
>> +
> This comment doesn't match the code.
> Is it ok to enable IPC right away? Not when the driver is writing the first watermarks? (distrust_bios_wm)
> And what about suspend/resume, should this flag be set again after resume?
>
> ~Maarten
hmm.. comment should have been Broxton/Kabylake.
Yes we can enable IPC at any time. In future BIOS itself may enable IPC. 
(though I'm not sure about the behavior if WM programmed by BIOS are not 
correct)
We don't reset (save/restore) this during suspend/resume, it's onetime 
programming.

Regards,
-Mahesh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* ✗ Fi.CI.BAT: warning for New DDB Algo and WM fixes (rev4)
  2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
                   ` (7 preceding siblings ...)
  2016-10-13 10:58 ` [PATCH v4 8/8] drm/i915/bxt: Enable IPC support Kumar, Mahesh
@ 2016-10-13 13:50 ` Patchwork
  8 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2016-10-13 13:50 UTC (permalink / raw)
  To: Kumar, Mahesh; +Cc: intel-gfx

== Series Details ==

Series: New DDB Algo and WM fixes (rev4)
URL   : https://patchwork.freedesktop.org/series/12222/
State : warning

== Summary ==

Series 12222v4 New DDB Algo and WM fixes
https://patchwork.freedesktop.org/api/1.0/series/12222/revisions/4/mbox/

Test gem_exec_suspend:
        Subgroup basic-s3:
                pass       -> DMESG-WARN (fi-hsw-4770)
Test kms_busy:
        Subgroup basic-flip-default-b:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup basic-flip-default-c:
                dmesg-warn -> PASS       (fi-hsw-4770)
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup basic-busy-flip-before-cursor-varying-size:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup basic-flip-after-cursor-legacy:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup basic-flip-after-cursor-varying-size:
                dmesg-warn -> PASS       (fi-hsw-4770)
Test kms_pipe_crc_basic:
        Subgroup bad-nb-words-3:
                dmesg-warn -> PASS       (fi-ilk-650)
        Subgroup bad-pipe:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup hang-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup hang-read-crc-pipe-c:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-a:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-a-frame-sequence:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-b:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-b-frame-sequence:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-c:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup nonblocking-crc-pipe-c-frame-sequence:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-hsw-4770)
        Subgroup read-crc-pipe-b-frame-sequence:
                dmesg-warn -> PASS       (fi-hsw-4770)
                pass       -> DMESG-WARN (fi-ilk-650)
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-skl-6700k)
                dmesg-warn -> PASS       (fi-byt-j1900)
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-j1900)
        Subgroup suspend-read-crc-pipe-c:
                pass       -> DMESG-WARN (fi-hsw-4770)
Test pm_rpm:
        Subgroup basic-rte:
                pass       -> DMESG-WARN (fi-hsw-4770)
Test vgem_basic:
        Subgroup unload:
                skip       -> PASS       (fi-bdw-5557u)

fi-bdw-5557u     total:246  pass:231  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:246  pass:204  dwarn:0   dfail:0   fail:0   skip:42 
fi-bxt-t5700     total:246  pass:216  dwarn:0   dfail:0   fail:0   skip:30 
fi-byt-j1900     total:246  pass:213  dwarn:1   dfail:0   fail:1   skip:31 
fi-byt-n2820     total:246  pass:210  dwarn:0   dfail:0   fail:1   skip:35 
fi-hsw-4770      total:246  pass:221  dwarn:3   dfail:0   fail:0   skip:22 
fi-hsw-4770r     total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22 
fi-ilk-650       total:246  pass:183  dwarn:1   dfail:0   fail:2   skip:60 
fi-ivb-3520m     total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25 
fi-ivb-3770      total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25 
fi-kbl-7200u     total:246  pass:222  dwarn:0   dfail:0   fail:0   skip:24 
fi-skl-6260u     total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-skl-6700k     total:246  pass:221  dwarn:1   dfail:0   fail:0   skip:24 
fi-skl-6770hq    total:246  pass:230  dwarn:1   dfail:0   fail:1   skip:14 
fi-snb-2520m     total:246  pass:210  dwarn:0   dfail:0   fail:0   skip:36 
fi-snb-2600      total:246  pass:209  dwarn:0   dfail:0   fail:0   skip:37 

Results at /archive/results/CI_IGT_test/Patchwork_2703/

ab76ad934eba2e427d4b1db1313a9f30a009c075 drm-intel-nightly: 2016y-10m-13d-12h-30m-12s UTC integration manifest
5f867e4 drm/i915/skl+: use linetime latency instead of ddb size

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 8/8] drm/i915/bxt: Enable IPC support
  2016-10-13 13:01     ` Mahesh Kumar
@ 2016-10-13 15:36       ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2016-10-13 15:36 UTC (permalink / raw)
  To: Mahesh Kumar; +Cc: intel-gfx, paulo.r.zanoni

On Thu, Oct 13, 2016 at 06:31:37PM +0530, Mahesh Kumar wrote:
> Hi,
> 
> 
> On Thursday 13 October 2016 04:49 PM, Maarten Lankhorst wrote:
> > Op 13-10-16 om 12:58 schreef Kumar, Mahesh:
> > > From: Mahesh Kumar <mahesh1.kumar@intel.com>
> > > 
> > > This patch adds IPC support for platforms. This patch enables IPC
> > > only for BXT/KBL platform as for SKL recommendation is to keep is disabled.
> > > IPC (Isochronous Priority Control) is the hardware feature, which
> > > dynamically controles the memory read priority of Display.
> > > 
> > > When IPC is enabled, plane read requests are sent at high priority until
> > > filling above the transition watermark, then the requests are sent at
> > > lower priority until dropping below the level 0 watermark.
> > > The lower priority requests allow other memory clients to have better
> > > memory access. When IPC is disabled, all plane read requests are sent at
> > > high priority.
> > > 
> > > Changes since V1:
> > >   - Remove commandline parameter to disable ipc
> > >   - Address Paulo's comments
> > > 
> > > Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/i915_drv.c  |  2 ++
> > >   drivers/gpu/drm/i915/i915_reg.h  |  1 +
> > >   drivers/gpu/drm/i915/intel_drv.h |  1 +
> > >   drivers/gpu/drm/i915/intel_pm.c  | 15 +++++++++++++++
> > >   4 files changed, 19 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index b5f601c..58abbaa 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1415,6 +1415,8 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
> > >   	intel_runtime_pm_enable(dev_priv);
> > > +	intel_enable_ipc(dev_priv);
> > > +
> > >   	/* Everything is in place, we can now relax! */
> > >   	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
> > >   		 driver.name, driver.major, driver.minor, driver.patchlevel,
> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > index a9c467c..c9ebf23 100644
> > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > @@ -6144,6 +6144,7 @@ enum {
> > >   #define  DISP_FBC_WM_DIS		(1<<15)
> > >   #define DISP_ARB_CTL2	_MMIO(0x45004)
> > >   #define  DISP_DATA_PARTITION_5_6	(1<<6)
> > > +#define  DISP_IPC_ENABLE		(1<<3)
> > >   #define DBUF_CTL	_MMIO(0x45008)
> > >   #define  DBUF_POWER_REQUEST		(1<<31)
> > >   #define  DBUF_POWER_STATE		(1<<30)
> > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > index 2c1897b..45b0fa4 100644
> > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > @@ -1766,6 +1766,7 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> > >   uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
> > >   bool ilk_disable_lp_wm(struct drm_device *dev);
> > >   int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
> > > +void intel_enable_ipc(struct drm_i915_private *dev_priv);
> > >   static inline int intel_enable_rc6(void)
> > >   {
> > >   	return i915.enable_rc6;
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 4263212..543aa5d 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -4833,6 +4833,21 @@ void intel_update_watermarks(struct drm_crtc *crtc)
> > >   		dev_priv->display.update_wm(crtc);
> > >   }
> > > +void intel_enable_ipc(struct drm_i915_private *dev_priv)
> > > +{
> > > +	u32 val;
> > > +
> > > +	/* enable IPC only for Broxton for now*/
> > > +	if (!IS_BROXTON(dev_priv) || !IS_KABYLAKE(dev_priv))
> > > +		return;
> > > +
> > This comment doesn't match the code.
> > Is it ok to enable IPC right away? Not when the driver is writing the first watermarks? (distrust_bios_wm)
> > And what about suspend/resume, should this flag be set again after resume?
> > 
> > ~Maarten
> hmm.. comment should have been Broxton/Kabylake.
> Yes we can enable IPC at any time. In future BIOS itself may enable IPC.
> (though I'm not sure about the behavior if WM programmed by BIOS are not
> correct)
> We don't reset (save/restore) this during suspend/resume, it's onetime
> programming.

The comment is also not really useful, since it's obvious from the code
what's going on. If you add a comment, explain _why_ you're doing
something. The _what_ should be clear from the code flow, if not you need
to refactor.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size
  2016-10-13 10:58 ` [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size Kumar, Mahesh
@ 2016-10-31 18:03   ` Paulo Zanoni
  2016-11-09 14:58     ` Mahesh Kumar
  0 siblings, 1 reply; 27+ messages in thread
From: Paulo Zanoni @ 2016-10-31 18:03 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> This patch make changes to use linetime latency instead of allocated
> DDB size during plane watermark calculation in switch case, This is
> required to implement new DDB allocation algorithm.
> 
> In New Algorithm DDB is allocated based on WM values, because of
> which
> number of DDB blocks will not be available during WM calculation,
> So this "linetime latency" is suggested by SV/HW team to use during
> switch-case for WM blocks selection.
> 
> Changes since v1:
>  - Rebase on top of Paulo's patch series
> Changes since v2:
>  - Fix if-else condition (pointed by Maarten)
> 
> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 7f1748a..098336d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3616,10 +3616,14 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
>  		selected_result = max(method2, y_tile_minimum);
>  	} else {
> +		uint32_t linetime_us = 0;
> +
> +		linetime_us = DIV_ROUND_UP(width * 1000,
> +				skl_pipe_pixel_rate(cstate));

Can't we just call skl_compute_linetime_wm() here? I don't like having
two pieces of the code computing the same thing. My last round of bug
fixes included a fix for duplicated code that got out of sync after
spec changes.

>  		if ((cpp * cstate->base.adjusted_mode.crtc_htotal /
> 512 < 1) &&
>  		    (plane_bytes_per_line / 512 < 1))
>  			selected_result = method2;
> -		else if ((ddb_allocation / plane_blocks_per_line) >=
> 1)
> +		else if (latency >= linetime_us)

Still doesn't match the spec. The "ddb_allocation /
planes_block_per_line" is still necessary.

>  			selected_result = min(method1, method2);
>  		else
>  			selected_result = method1;
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/8] drm/i915: Decode system memory bandwidth
  2016-10-13 10:58 ` [PATCH v4 3/8] drm/i915: Decode system memory bandwidth Kumar, Mahesh
@ 2016-11-03 19:06   ` Paulo Zanoni
  2016-11-16 11:48     ` Mahesh Kumar
  0 siblings, 1 reply; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-03 19:06 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> This patch adds support to decode system memory bandwidth
> which will be used for arbitrated display memory percentage
> calculation in GEN9 based system.
> 
> Changes from v1:
>  - Address comments from Paulo
>  - implement decode function for SKL/KBL also

Again, my understanding of these registers is not good so I will ask
some questions that will help me properly understand and review the
code. I may also end up asking some questions that don't make sense.
Please correct me in case any of my assumptions is wrong.

Perhaps answers to my questions could become code comments since I
suppose most of the gfx team is not super familiar with the memory
regs, and we're going to have to maintain the code.


> 
> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 170
> ++++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h |  14 ++++
>  drivers/gpu/drm/i915/i915_reg.h |  38 +++++++++
>  3 files changed, 222 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> index 89d3222..b5f601c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct
> drm_i915_private *dev_priv)
>  	DRM_DEBUG_DRIVER("use GPU sempahores? %s\n",
> yesno(i915.semaphores));
>  }
>  
> +static inline void skl_memdev_info_fill(struct memdev_info *info,
> uint32_t val)
> +{
> +	uint8_t channel_width, rank;
> +
> +	info->rank_valid = true;
> +	channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
> +		SKL_DRAM_CHANNEL_WIDTH_MASK;

Why are we looking at the L bits instead of the S bits? What's the
difference between them?


> +	if (channel_width == SKL_DRAM_WIDTH_1)
> +		info->channel_width_bytes = 1;
> +	else if (channel_width == SKL_DRAM_WIDTH_2)
> +		info->channel_width_bytes = 2;
> +	else
> +		info->channel_width_bytes = 4;

This is not checking for the invalid value of 0x3. Perhaps a switch()
instead of an if() ladder would be better here.

> +
> +	rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
> +	if (rank == SKL_DRAM_RANK_SINGLE)
> +		info->rank = DRAM_RANK_SINGLE;
> +	else
> +		info->rank = DRAM_RANK_DUAL;
> +}
> +
> +static int
> +skl_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> +	uint32_t val = 0;
> +	uint32_t mem_speed = 0;
> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
> +
> +	val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
> +	mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);

But but but isn't it possible to have more than 2GHz in SKL? I think I
have a big SKL machine with more than 2GHz here... I'll check it later.
Also, don't we need to check FREQ_TYPE and ODD_RATIO?

Some digging suggests that maybe we need to figure out somehow if we're
using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ
register instead of BIOS_DATA).

I'd really need some clarification on how all of this works.

Also, it looks like there's no need for this to be a 64bit calculation
due to the mask limit.


> +
> +	if (mem_speed == 0)
> +		return -EINVAL;
> +
> +	memdev_info->valid = true;
> +	memdev_info->mem_speed_khz = mem_speed;
> +	memdev_info->num_channels  = 0;
> +
> +	memdev_info->rank_valid = false;
> +	val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
> +
> +	if (val != 0xFFFFFFFF) {
> +		memdev_info->num_channels++;
> +		skl_memdev_info_fill(memdev_info, val);
> +	}
> +
> +	val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
> +
> +	if (val != 0xFFFFFFFF) {
> +		memdev_info->num_channels++;
> +		if (!memdev_info->rank_valid)
> +			skl_memdev_info_fill(memdev_info, val);
> +	}

A simple iteration over the two register addresses would have saved
some code :).

Also, isn't it possible to have channels with different width/rank?
Shouldn't our code be sanity-checking this? Perhaps grab the width/rank
for each channel and compare. When the specs are unclear or don't
exist, it's probably better if our code is able to check/validate its
own assumptions.


> +
> +	if (memdev_info->num_channels == 0)
> +		memdev_info->valid = false;
> +	return 0;
> +}
> +
> +static int
> +bxt_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
> +	uint32_t dram_channel;
> +	uint32_t mem_speed, val;
> +	uint8_t num_channel, dram_type;
> +	int i;
> +
> +	val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
> +	mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);

AFAIU, there's no need for 64 bit division since the maximum value
is 8399790.


> +
> +	if (mem_speed == 0)
> +		return -EINVAL;
> +
> +	memdev_info->valid = true;
> +	memdev_info->mem_speed_khz = mem_speed;
> +	dram_type = (val >> BXT_DRAM_TYPE_SHIFT) &
> BXT_DRAM_TYPE_MASK;
> +	dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) &
> BXT_DRAM_CHANNEL_MASK;
> +	num_channel = hweight32(dram_channel);
> +
> +	/*
> +	 * The lpddr3 and lpddr4 technologies can have 1-4 channels
> and the
> +	 * channels are 32bits wide; while ddr3l technologies can
> have 1-2
> +	 * channels and the channels are 64 bits wide. In case of
> single 64 bit
> +	 * wide DDR3L dimm sets two channel-active bits in
> +	 * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L
> 64bit dimm
> +	 * will set all four channel-active bits in above register.
> +	 * In order to get actual number of channels in DDR3L DRAM
> we need to
> +	 * device total channel-active bits set by 2.
> +	 */
> +
> +	switch (dram_type) {
> +	case BXT_DRAM_TYPE_LPDDR3:
> +	case BXT_DRAM_TYPE_LPDDR4:
> +		memdev_info->channel_width_bytes = 4;
> +		memdev_info->num_channels = num_channel;
> +		break;
> +	case BXT_DRAM_TYPE_DDR3L:
> +		memdev_info->channel_width_bytes = 8;
> +		memdev_info->num_channels = num_channel / 2;
> +		break;
> +	default:
> +		DRM_DEBUG_KMS("Unknown DRAM type\n");
> +		memdev_info->valid = false;
> +		return -EINVAL;
> +	}

Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and
num_channels * 8 for DDR3L?

Or just multiply by 4 before we do the "num_channel / 2"... This would
avoid even having to find out the DRAM type (in case it's not needed
anywhere else).

> +
> +	/*
> +	 * Now read each DUNIT8/9/10/11 to check the rank of each
> dimms.
> +	 * all the dimms should have same rank as in first valid
> Dimm
> +	 */


I think we need to explain in the comment why we're iterating over the
4 dunits but just looking at the first one that's not 0xFFFFFFFF.

Isn't there a way to discover which dunits to look at based on the
number of channels and DRAM type?

Also, does the valid dunit amount reflect the number of channels like
in SKL?


> +	memdev_info->rank_valid = false;
> +	for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
> +		val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
> +		if (val != 0xFFFFFFFF) {
> +			uint8_t rank;
> +
> +			memdev_info->rank_valid = true;
> +			rank = val & BXT_DRAM_RANK_MASK;
> +			if (rank == BXT_DRAM_RANK_SINGLE)
> +				memdev_info->rank =
> DRAM_RANK_SINGLE;
> +			else if (rank == BXT_DRAM_RANK_DUAL)
> +				memdev_info->rank = DRAM_RANK_DUAL;
> +			else
> +				memdev_info->rank = DRAM_RANK_NONE;

Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid
back to false? Return non-zero?


> +
> +			break;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static void
> +intel_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
> +	int ret;
> +
> +	memdev_info->valid = false;
> +	if (!IS_GEN9(dev_priv))
> +		return;
> +
> +	if (IS_BROXTON(dev_priv))
> +		ret = bxt_get_memdev_info(dev_priv);
> +	else
> +		ret = skl_get_memdev_info(dev_priv);
> +	if (ret)
> +		return;

What I find a little hard to read is that we set memdev_info->valid
both here and in the functions we call. Due to the early returns it's
not super simple to see the possible value it may get, and things get
even more complicated due to the fact that we may return 0 from
skl_get_memdev_info() while still setting memdev_info->valid to false.

Things would be much simpler and easier to read if we only set
memdev_info->valid in this function, based on the return value of the
get_memdev_info() functions. And then the get_memdev_info() functions
would appropriately return zero/nonzero based on valid/invalid
configuration.


> +
> +	if (memdev_info->valid) {
> +		DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-
> %d channel-width-%d bytes\n",
> +				memdev_info->mem_speed_khz,
> +				memdev_info->num_channels,
> +				memdev_info->channel_width_bytes);

As I mentioned in my previous review, please use the appropriate
specifiers here instead of %d for everybody.

Also, we usually print "variable: value, other variable: value" instead
of "variable-value other-variable-value". Please use the established
convention. Reading "total-channels-1" is less clear than "total
channels: 1" and doesn't raise ambiguity concerns with negative values.


> +		if (memdev_info->rank_valid)
> +			DRM_DEBUG_DRIVER("DRAM rank-%s\n",
> +			(memdev_info->rank == DRAM_RANK_DUAL) ?
> +						"dual" : "single");

But there's also DRAM_RANK_NONE.


> +	}
> +}
> +
> +
>  /**
>   * i915_driver_init_hw - setup state requiring device access
>   * @dev_priv: device private
> @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct
> drm_i915_private *dev_priv)
>  			DRM_DEBUG_DRIVER("can't enable MSI");
>  	}
>  
> +	/*
> +	 * Fill the memdev structure to get the system raw bandwidth
> +	 * This will be used by WM algorithm, to implement GEN9
> based WA
> +	 */
> +	intel_get_memdev_info(dev_priv);
> +
>  	return 0;
>  
>  out_ggtt:
> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> index a219a35..adbd9aa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2057,6 +2057,20 @@ struct drm_i915_private {
>  		bool distrust_bios_wm;
>  	} wm;
>  
> +	struct memdev_info {
> +		bool valid;
> +		uint32_t mem_speed_khz;
> +		uint8_t channel_width_bytes;
> +		uint8_t num_channels;
> +		bool rank_valid;
> +		enum {
> +			DRAM_RANK_NONE = 0,
> +			DRAM_RANK_SINGLE,
> +			DRAM_RANK_DUAL
> +		} rank;

What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply
rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when
rank_valid=true?

Perhaps we could get rid of the rank_valid variable? Maybe rename
DRAM_RANK_NONE to DRAM_RANK_INVALID?


> +	} memdev_info;
> +
> +
>  	struct i915_runtime_pm pm;
>  
>  	/* Abstract the submission mechanism (legacy ringbuffer or
> execlists) away */
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index acc767a..a9c467c 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7721,6 +7721,44 @@ enum {
>  #define  DC_STATE_DEBUG_MASK_CORES	(1<<0)
>  #define  DC_STATE_DEBUG_MASK_MEMORY_UP	(1<<1)
>  
> +#define _MMIO_MCHBAR_PIPE(x, a, b)	_MMIO(MCHBAR_MIRROR_BASE_S
> NB + _PIPE(x, a, b))

Wait, what? I may have misunderstood something, but AFAICS the
registers that use this macro are not per-pipe, so using the _PIPE
macro makes things super confusing. In this case it should be fine to
just do something a little more hardcoded, like:

#define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit)
* 0x200)

	
> +#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_S
> NB + 0x7114)
> +#define BXT_REQ_DATA_MASK		(0x3F << 0)

Or just 0x3F to be consistent with the other definitions.


> +#define BXT_DRAM_TYPE_SHIFT		24
> +#define BXT_DRAM_TYPE_MASK		0x7

The spec you sent me lists these bits as reserved. Are you sure these
come from this register?


> +#define BXT_DRAM_CHANNEL_SHIFT		12
> +#define BXT_DRAM_CHANNEL_MASK		0xF

I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}.


> +
> +#define BXT_DRAM_TYPE_LPDDR3		0x1
> +#define BXT_DRAM_TYPE_LPDDR4		0x2
> +#define BXT_DRAM_TYPE_DDR3L		0x4
> +/*
> + * BIOS programs this field of REQ_DATA [5:0] in integer
> + * multiple of 133330 KHz (133.33MHz)
> + */
> +#define	SKL_MEMORY_FREQ_MULTIPLIER		0x208D2

There's a bad tab here that should be a space.

Also, it makes a looooot more sense to use the decimal value instead of
0x208D2.


> +#define BXT_D_CR_DRP0_DUNIT8			0x1000
> +#define BXT_D_CR_DRP0_DUNIT9			0x1200
> +#define BXT_D_CR_DRP0_DUNIT_MAX			4
> +#define BXT_D_CR_DRP0_DUNIT(x)	_MMIO_MCHBAR_PIPE(x,
> BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
> +#define BXT_DRAM_RANK_MASK			0x3
> +#define BXT_DRAM_RANK_SINGLE			0x1
> +#define BXT_DRAM_RANK_DUAL			0x3
> +
> +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR
> _BASE_SNB + 0x5E04)

Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't
SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big
difference? Also, that comment mentioning that we may sometimes get
zero is a little worrying: shouldn't we make sure we always read when
it's non-zero?

Thanks for the patch, and sorry for the huge amount of questions here:
learning while doing the review is sometimes tricky.

> +#define SKL_REQ_DATA_MASK			(0xF << 0)
> +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
> ROR_BASE_SNB + 0x500C)
> +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
> ROR_BASE_SNB + 0x5010)
> +#define SKL_DRAM_CHANNEL_WIDTH_MASK		0x3
> +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT		8
> +#define SKL_DRAM_WIDTH_1			0x0
> +#define SKL_DRAM_WIDTH_2			0x1
> +#define SKL_DRAM_WIDTH_4			0x2
> +#define SKL_DRAM_RANK_MASK			0x1
> +#define SKL_DRAM_RANK_SHIFT			10
> +#define SKL_DRAM_RANK_SINGLE			0x0
> +#define SKL_DRAM_RANK_DUAL			0x1
> +
>  /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
> this register,
>   * since on HSW we can't write to it using I915_WRITE. */
>  #define D_COMP_HSW			_MMIO(MCHBAR_MIRROR_BASE_S
> NB + 0x5F0C)
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround
  2016-10-13 10:58 ` [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround Kumar, Mahesh
@ 2016-11-04 17:09   ` Paulo Zanoni
  2016-11-04 17:22     ` Ville Syrjälä
  2016-11-10  5:54     ` Mahesh Kumar
  0 siblings, 2 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 17:09 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> This patch implemnets Workariunds related to display arbitrated
> memory
> bandwidth. These WA are applicabe for all gen-9 based platforms.
> 
> Changes since v1:
>  - Rebase on top of Paulo's patch series
> Changes since v2:
>  - Rebase/rework after addressing Paulo's comments in previous patch

A lot of this code has changed since then, so this will need a
significant rebase. In the meantime, I added skl_needs_memory_bw_wa()
and we're now applying the WA by default: we just won't apply the WA
when we're pretty sure we don't need to. This helps avoiding underruns
by default.

See more below.


> 
> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |   9 +++
>  drivers/gpu/drm/i915/intel_drv.h |  11 +++
>  drivers/gpu/drm/i915/intel_pm.c  | 146
> +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 166 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> index adbd9aa..c169360 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1092,6 +1092,13 @@ enum intel_sbi_destination {
>  	SBI_MPHY,
>  };
>  
> +/* SKL+ Watermark arbitrated display bandwidth Workarounds */
> +enum watermark_memory_wa {
> +	WATERMARK_WA_NONE,
> +	WATERMARK_WA_X_TILED,
> +	WATERMARK_WA_Y_TILED,
> +};
> +
>  #define QUIRK_PIPEA_FORCE (1<<0)
>  #define QUIRK_LVDS_SSC_DISABLE (1<<1)
>  #define QUIRK_INVERT_BRIGHTNESS (1<<2)
> @@ -1644,6 +1651,8 @@ struct skl_ddb_allocation {
>  
>  struct skl_wm_values {
>  	unsigned dirty_pipes;
> +	/* any WaterMark memory workaround Required */

We can remove this comment since it doesn't say anything the variable
name doesn't.

> +	enum watermark_memory_wa mem_wa;

Now that we have a proper variable in the state struct, it probably
makes sense to just kill skl_needs_memory_bw_wa() and read this
variable when we need to.


>  	struct skl_ddb_allocation ddb;
>  	uint32_t wm_linetime[I915_MAX_PIPES];
>  	uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
> diff --git a/drivers/gpu/drm/i915/intel_drv.h
> b/drivers/gpu/drm/i915/intel_drv.h
> index f48e79a..2c1897b 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1813,6 +1813,17 @@ intel_atomic_get_crtc_state(struct
> drm_atomic_state *state,
>  	return to_intel_crtc_state(crtc_state);
>  }
>  
> +static inline struct intel_crtc_state *
> +intel_atomic_get_existing_crtc_state(struct drm_atomic_state *state,
> +				      struct intel_crtc *crtc)
> +{
> +	struct drm_crtc_state *crtc_state;
> +
> +	crtc_state = drm_atomic_get_existing_crtc_state(state,
> &crtc->base);
> +
> +	return to_intel_crtc_state(crtc_state);

I really don't like the idea of calling to_intel_crtc_state() on a
potentially NULL pointer so the caller of this function will also check
for NULL. Even though it works today, I still think it's unsafe
practice. Please check crtc_state for NULL directly and then return
NULL.

Also, I think this function should be extracted to its own commit, and
we'd probably be able to find some callers in the existing i915 code.


> +}
> +
>  static inline struct intel_plane_state *
>  intel_atomic_get_existing_plane_state(struct drm_atomic_state
> *state,
>  				      struct intel_plane *plane)
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 84ec6b1..5b8f715 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3589,6 +3589,8 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  {
>  	struct drm_plane_state *pstate = &intel_pstate->base;
>  	struct drm_framebuffer *fb = pstate->fb;
> +	struct intel_atomic_state *intel_state =
> +			to_intel_atomic_state(cstate->base.state);
>  	uint32_t latency = dev_priv->wm.skl_latency[level];
>  	uint32_t method1, method2;
>  	uint32_t plane_bytes_per_line, plane_blocks_per_line;
> @@ -3598,10 +3600,17 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	uint32_t width = 0, height = 0;
>  	uint32_t plane_pixel_rate;
>  	uint32_t y_tile_minimum, y_min_scanlines;
> +	enum watermark_memory_wa mem_wa;
>  
>  	if (latency == 0 || !cstate->base.active || !intel_pstate-
> >base.visible)
>  		return 0;
>  
> +	mem_wa = intel_state ? intel_state->wm_results.mem_wa :
> WATERMARK_WA_NONE;
> +	if (mem_wa != WATERMARK_WA_NONE) {
> +		if (fb->modifier[0] == I915_FORMAT_MOD_X_TILED)
> +			latency += 15;
> +	}
> +
>  	width = drm_rect_width(&intel_pstate->base.src) >> 16;
>  	height = drm_rect_height(&intel_pstate->base.src) >> 16;
>  
> @@ -3634,6 +3643,9 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  		y_min_scanlines = 4;
>  	}
>  
> +	if (mem_wa == WATERMARK_WA_Y_TILED)
> +		y_min_scanlines *= 2;

The changes to this function will need to be rebased. But it's
interesting that my interpretation regarding this *= 2 was different.
After an email to the spec authors it seems your interpretation is the
right one...


> +
>  	plane_bytes_per_line = width * cpp;
>  	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
>  	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
> @@ -4075,6 +4087,15 @@ skl_include_affected_pipes(struct
> drm_atomic_state *state)
>  		intel_state->wm_results.dirty_pipes = ~0;
>  	}
>  
> +	/*
> +	 * If Watermark workaround is changed we need to recalculate
> +	 * watermark values for all active pipes
> +	 */
> +	if (intel_state->wm_results.mem_wa != dev_priv-
> >wm.skl_hw.mem_wa) {
> +		realloc_pipes = ~0;
> +		intel_state->wm_results.dirty_pipes = ~0;
> +	}
> +

Things have changed since this patch was written. I'd recommend moving
this to skl_set_memory_bandwidth_wa().


>  	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
>  		struct intel_crtc_state *cstate;
>  
> @@ -4087,6 +4108,129 @@ skl_include_affected_pipes(struct
> drm_atomic_state *state)
>  }
>  
>  static void
> +skl_set_memory_bandwidth_wm_wa(struct drm_atomic_state *state)

We're not really setting anything here: we're computing. Rename to
skl_compute_wm_memory_bw_wa?


> +{
> +	struct drm_device *dev = state->dev;
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct intel_crtc *intel_crtc;
> +	struct intel_plane_state *intel_pstate;
> +	struct intel_atomic_state *intel_state =
> to_intel_atomic_state(state);
> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
> +	int num_active_plane, num_active_pipe;

I like to use plural in the number-of variables, since we're counting
the number of active planeS, not the number of the active plane.


> +	uint32_t plane_bw, max_plane_bw, pipe_bw, max_pipe_bw;
> +	uint32_t total_pipe_bw;
> +	uint32_t system_bw = 0;
> +	uint32_t rank;
> +	int x_tile_per;
> +	int display_bw_per;

I read this and kept thinking "x tiles per what?", took me a while to
realize it's percentage. It's probably better to just use a long self-
documenting name here than to use an ambiguous abbreviation.

Also, I'd just go with uint32_t in the percentages too to avoid any
possible problems with the uin32_t calculations.

And perhaps some declarations could be moved to smaller inner scopes
below.


> +	bool y_tile_enabled = false;
> +

if (!platforms_that_require_the_wa) {
	wa = WATERMARK_WA_NONE;
	return;
}

> +	if (!memdev_info->valid)
> +		goto exit;

Our default behavior should be to apply the WA then in doubt, not
to avoid it, so the return value here and in the other error cases
should be WATERAMARK_WA_Y_TILED.

Also, you can avoid the gotos by just setting mem_wa at the beginning
of the function, then you can just "return" later instead of goto.


> +
> +	system_bw = memdev_info->mem_speed_khz * memdev_info-
> >num_channels *
> +				memdev_info->channel_width_bytes;
> +
> +	if (!system_bw)
> +		goto exit;

This shouldn't be possible. Otherwise, the previous patch needs to be
fixed in a way to not allow system_bw to end up as zero. Please either
remove the check or change to "if (WARN_ON(!system_bw))".

> +
> +	max_pipe_bw = 0;
> +	for_each_intel_crtc(dev, intel_crtc) {
> +		struct intel_crtc_state *cstate;
> +		struct intel_plane *plane;
> +
> +		/*
> +		 * If CRTC is part of current atomic commit, get
> crtc state from
> +		 * existing CRTC state. else take the cached CRTC
> state
> +		 */
> +		cstate = NULL;
> +		if (state)

If state is NULL we'll segfault way before this point, so there's no
need for this check.


> +			cstate =
> intel_atomic_get_existing_crtc_state(state,
> +					intel_crtc);
> +		if (!cstate)
> +			cstate = to_intel_crtc_state(intel_crtc-
> >base.state);
> +
> +		if (!cstate->base.active)
> +			continue;
> +
> +		num_active_plane = 0;
> +		max_plane_bw = 0;
> +		for_each_intel_plane_mask(dev, plane, cstate-
> >base.plane_mask) {
> +			struct drm_framebuffer *fb = NULL;
> +
> +			intel_pstate = NULL;
> +			if (state)

Same here: no need to check for state.

> +				intel_pstate =
> +				intel_atomic_get_existing_plane_stat
> e(state,
> +									
> plane);
> +			if (!intel_pstate)
> +				intel_pstate =
> +					to_intel_plane_state(plane-
> >base.state);
> +
> +			WARN_ON(!intel_pstate->base.fb);

if (WARN_ON(!intel_pstate->base.fb))
	return;

Then we can just forget about checking for fb again below.

> +
> +			if (!intel_pstate->base.visible)
> +				continue;

Don't we also need to exclude the cursor here?


> +
> +			fb = intel_pstate->base.fb;
> +			if (fb && (fb->modifier[0] ==
> I915_FORMAT_MOD_Y_TILED ||
> +				fb->modifier[0] ==
> I915_FORMAT_MOD_Yf_TILED))
> +				y_tile_enabled = true;
> +
> +			plane_bw =
> skl_adjusted_plane_pixel_rate(cstate,
> +								inte
> l_pstate);
> +			max_plane_bw = max(plane_bw, max_plane_bw);
> +			num_active_plane++;
> +		}
> +		pipe_bw = max_plane_bw * num_active_plane;
> +		max_pipe_bw = max(pipe_bw, max_pipe_bw);
> +	}
> +
> +	if (intel_state->active_pipe_changes)
> +		num_active_pipe = hweight32(intel_state-
> >active_crtcs);
> +	else
> +		num_active_pipe = hweight32(dev_priv->active_crtcs);

Why don't we just trust intel_state->active_crtcs even when
active_pipe_changes is false?


> +
> +	total_pipe_bw = max_pipe_bw * num_active_pipe;
> +
> +	display_bw_per = DIV_ROUND_UP_ULL(total_pipe_bw * 100,
> system_bw * 1000);

Why ULL?

Also, if everything is KHz as it's supposed to be, the *100 and *1000
don't make sense.


> +
> +	/*
> +	 * If there is any Ytile plane enabled and arbitrated
> display
> +	 * bandwidth > 20% of raw system memory bandwidth
> +	 * Enale Y-tile related WA
> +	 *
> +	 * If memory is dual channel single rank, Xtile limit = 35%,
> else Xtile
> +	 * limit = 60%
> +	 * If there is no Ytile plane enabled and
> +	 * arbitrated display bandwidth > Xtile limit
> +	 * Enable X-tile realated WA
> +	 */
> +	if (y_tile_enabled && (display_bw_per > 20))
> +		intel_state->wm_results.mem_wa =
> WATERMARK_WA_Y_TILED;
> +	else {
> +
> +		if (memdev_info->rank_valid)
> +			rank = memdev_info->rank;
> +		else
> +			rank = DRAM_RANK_DUAL; /* Assume we are dual
> rank */

When in doubt, apply the most restrictive workaround to avoid the
possibility of underruns. So here it's safer to assume
DRAM_RANK_SINGLE.


> +
> +		if ((rank == DRAM_RANK_SINGLE) &&
> +					(memdev_info->num_channels
> == 2))
> +			x_tile_per = 35;
> +		else
> +			x_tile_per = 60;
> +
> +		if (display_bw_per > x_tile_per)
> +			intel_state->wm_results.mem_wa =
> WATERMARK_WA_X_TILED;
> +	}
> +	return;
> +
> +exit:
> +	intel_state->wm_results.mem_wa = WATERMARK_WA_NONE;
> +}
> +
> +static void
>  skl_copy_wm_for_pipe(struct skl_wm_values *dst,
>  		     struct skl_wm_values *src,
>  		     enum pipe pipe)
> @@ -4131,6 +4275,8 @@ skl_compute_wm(struct drm_atomic_state *state)
>  	/* Clear all dirty flags */
>  	results->dirty_pipes = 0;
>  
> +	skl_set_memory_bandwidth_wm_wa(state);
> +
>  	ret = skl_include_affected_pipes(state);
>  	if (ret)
>  		return ret;
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround
  2016-11-04 17:09   ` Paulo Zanoni
@ 2016-11-04 17:22     ` Ville Syrjälä
  2016-11-10  5:54     ` Mahesh Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Ville Syrjälä @ 2016-11-04 17:22 UTC (permalink / raw)
  To: Paulo Zanoni; +Cc: intel-gfx

On Fri, Nov 04, 2016 at 03:09:04PM -0200, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> > This patch implemnets Workariunds related to display arbitrated
> > memory
> > bandwidth. These WA are applicabe for all gen-9 based platforms.
> > 
> > Changes since v1:
> >  - Rebase on top of Paulo's patch series
> > Changes since v2:
> >  - Rebase/rework after addressing Paulo's comments in previous patch
> 
> A lot of this code has changed since then, so this will need a
> significant rebase. In the meantime, I added skl_needs_memory_bw_wa()
> and we're now applying the WA by default: we just won't apply the WA
> when we're pretty sure we don't need to. This helps avoiding underruns
> by default.
> 
> See more below.
> 
> 
> > 
> > Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h  |   9 +++
> >  drivers/gpu/drm/i915/intel_drv.h |  11 +++
> >  drivers/gpu/drm/i915/intel_pm.c  | 146
> > +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 166 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index adbd9aa..c169360 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1092,6 +1092,13 @@ enum intel_sbi_destination {
> >  	SBI_MPHY,
> >  };
> >  
> > +/* SKL+ Watermark arbitrated display bandwidth Workarounds */
> > +enum watermark_memory_wa {
> > +	WATERMARK_WA_NONE,
> > +	WATERMARK_WA_X_TILED,
> > +	WATERMARK_WA_Y_TILED,
> > +};
> > +
> >  #define QUIRK_PIPEA_FORCE (1<<0)
> >  #define QUIRK_LVDS_SSC_DISABLE (1<<1)
> >  #define QUIRK_INVERT_BRIGHTNESS (1<<2)
> > @@ -1644,6 +1651,8 @@ struct skl_ddb_allocation {
> >  
> >  struct skl_wm_values {
> >  	unsigned dirty_pipes;
> > +	/* any WaterMark memory workaround Required */
> 
> We can remove this comment since it doesn't say anything the variable
> name doesn't.
> 
> > +	enum watermark_memory_wa mem_wa;
> 
> Now that we have a proper variable in the state struct, it probably
> makes sense to just kill skl_needs_memory_bw_wa() and read this
> variable when we need to.
> 
> 
> >  	struct skl_ddb_allocation ddb;
> >  	uint32_t wm_linetime[I915_MAX_PIPES];
> >  	uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h
> > b/drivers/gpu/drm/i915/intel_drv.h
> > index f48e79a..2c1897b 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -1813,6 +1813,17 @@ intel_atomic_get_crtc_state(struct
> > drm_atomic_state *state,
> >  	return to_intel_crtc_state(crtc_state);
> >  }
> >  
> > +static inline struct intel_crtc_state *
> > +intel_atomic_get_existing_crtc_state(struct drm_atomic_state *state,
> > +				      struct intel_crtc *crtc)
> > +{
> > +	struct drm_crtc_state *crtc_state;
> > +
> > +	crtc_state = drm_atomic_get_existing_crtc_state(state,
> > &crtc->base);
> > +
> > +	return to_intel_crtc_state(crtc_state);
> 
> I really don't like the idea of calling to_intel_crtc_state() on a
> potentially NULL pointer so the caller of this function will also check
> for NULL. Even though it works today, I still think it's unsafe
> practice. Please check crtc_state for NULL directly and then return
> NULL.

I want to make this safe by making it a compile error if offsetof(base) != 0.
https://lists.freedesktop.org/archives/intel-gfx/2016-October/108175.html

But I think we want to go further than that patch by adding a bit more
type safety to things. I did play around with this stuff a bit more,
and I have something sitting on a branch, but I didn't quite figure out
what I want to do about const vs. non const yet.

> 
> Also, I think this function should be extracted to its own commit, and
> we'd probably be able to find some callers in the existing i915 code.

I have, on some branch again, _intel_ versions of the for_each_foo_in_state()
macros as well. I think those are going to allow a lot of ugly casting
stuff to disappear. But I think I'll hold off until Maarten's new
iterators go in before I try to send those out.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation
  2016-10-13 10:58 ` [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation Kumar, Mahesh
@ 2016-11-04 17:39   ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 17:39 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
> 
> Current code clears only plane ddb allocation if total ddb allocated
> to
> pipe in zero. y_plane ddb still contains old value, clear that as
> well.
> 
> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 5b8f715..a668204 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3381,6 +3381,7 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  	alloc_size = skl_ddb_entry_size(alloc);
>  	if (alloc_size == 0) {
>  		memset(ddb->plane[pipe], 0, sizeof(ddb-
> >plane[pipe]));
> +		memset(ddb->y_plane[pipe], 0, sizeof(ddb-
> >y_plane[pipe]));

With the latest code we can just remove both memset() calls.

>  		return 0;
>  	}
>  
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile
  2016-10-13 10:58 ` [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile Kumar, Mahesh
@ 2016-11-04 17:45   ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 17:45 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
> 
> This patch adds variable to check for X_tiled & y_tiled planes,
> instead
> of always checking against framebuffer-modifiers.
> 
> Changes:
>  - Created separate patch as per Paulo's comment
>  - Added x_tiled variable as well
> 
> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index a668204..0eaaadc 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3602,6 +3602,7 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	uint32_t plane_pixel_rate;
>  	uint32_t y_tile_minimum, y_min_scanlines;
>  	enum watermark_memory_wa mem_wa;
> +	bool y_tiled = false, x_tiled = false;
>  
>  	if (latency == 0 || !cstate->base.active || !intel_pstate-
> >base.visible)
>  		return 0;
> @@ -3621,6 +3622,12 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	cpp = drm_format_plane_cpp(fb->pixel_format, 0);
>  	plane_pixel_rate = skl_adjusted_plane_pixel_rate(cstate,
> intel_pstate);
>  
> +	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
> +	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED)
> +		y_tiled = true;
> +	else if (fb->modifier[0] == I915_FORMAT_MOD_X_TILED)
> +		x_tiled = true;
> +

Or you could go with the simpler:

y_tiled = fb->modifier[0] == Y_TILED || fb->modifier[0] == Yf_TILED;
x_tiled = fb->modifier[0] == X_TILED;

And this would allow you to even remove the initialization to false
above, and would allow the compiler to complain in case we try to use
uninitialized values.

But that's just an optional bikeshed.

Anyway, I like the patch but it needs a rebase. It's better to just
include this patch in the beginning of the series so we can merge it
more easily, independently of the others.

>  	if (intel_rotation_90_or_270(pstate->rotation)) {
>  		int cpp = (fb->pixel_format == DRM_FORMAT_NV12) ?
>  			drm_format_plane_cpp(fb->pixel_format, 1) :
> @@ -3648,16 +3655,15 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  		y_min_scanlines *= 2;
>  
>  	plane_bytes_per_line = width * cpp;
> -	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
> -	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
> +	if (y_tiled) {
>  		plane_blocks_per_line =
>  		      DIV_ROUND_UP(plane_bytes_per_line *
> y_min_scanlines, 512);
>  		plane_blocks_per_line /= y_min_scanlines;
> -	} else if (fb->modifier[0] == DRM_FORMAT_MOD_NONE) {
> +	} else if (x_tiled) {
> +		plane_blocks_per_line =
> DIV_ROUND_UP(plane_bytes_per_line, 512);
> +	} else {
>  		plane_blocks_per_line =
> DIV_ROUND_UP(plane_bytes_per_line, 512)
>  					+ 1;
> -	} else {
> -		plane_blocks_per_line =
> DIV_ROUND_UP(plane_bytes_per_line, 512);
>  	}
>  
>  	method1 = skl_wm_method1(plane_pixel_rate, cpp, latency);
> @@ -3668,8 +3674,7 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  
>  	y_tile_minimum = plane_blocks_per_line * y_min_scanlines;
>  
> -	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
> -	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
> +	if (y_tiled) {
>  		selected_result = max(method2, y_tile_minimum);
>  	} else {
>  		uint32_t linetime_us = 0;
> @@ -3689,8 +3694,7 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	res_lines = DIV_ROUND_UP(selected_result,
> plane_blocks_per_line);
>  
>  	if (level >= 1 && level <= 7) {
> -		if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
> -		    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
> +		if (y_tiled) {
>  			res_blocks += y_tile_minimum;
>  			res_lines += y_min_scanlines;
>  		} else {
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16
  2016-10-13 10:58 ` [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16 Kumar, Mahesh
@ 2016-11-04 19:03   ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 19:03 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
> 
> This patch changes Watermak calculation to fixed point calculation.
> Problem with current calculation is during plane_blocks_per_line
> calculation we divide intermediate blocks with min_scanlines and
> takes floor of the result because of integer operation.
> hence we end-up assigning less blocks than required. Which leads to
> flickers.
> 

There are still variables that got auto-converted to 16.16 and need to
be adjusted because later they are mixed with non-16.16 in non-safe
ways. The fact that's it's hard to identify these things really worries
me.


> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 0eaaadc..4263212 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3527,16 +3527,19 @@ static uint32_t skl_pipe_pixel_rate(const
> struct intel_crtc_state *config)
>   * for the read latency) and cpp should always be <= 8, so that
>   * should allow pixel_rate up to ~2 GHz which seems sufficient since
> max
>   * 2xcdclk is 1350 MHz and the pixel rate should never exceed that.
> + * Both Method1 & Method2 returns fixedpoint 16.16 output
>  */
>  static uint32_t skl_wm_method1(uint32_t pixel_rate, uint8_t cpp,
> uint32_t latency)
>  {
> -	uint32_t wm_intermediate_val, ret;
> +	uint64_t wm_intermediate_val;
> +	uint32_t ret;
>  
>  	if (latency == 0)
>  		return UINT_MAX;
>  
> -	wm_intermediate_val = latency * pixel_rate * cpp / 512;
> -	ret = DIV_ROUND_UP(wm_intermediate_val, 1000);
> +	wm_intermediate_val = latency * pixel_rate * cpp;
> +	wm_intermediate_val <<= 16;
> +	ret = DIV_ROUND_UP_ULL(wm_intermediate_val, 1000 * 512);
>  
>  	return ret;
>  }
> @@ -3658,12 +3661,15 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	if (y_tiled) {
>  		plane_blocks_per_line =
>  		      DIV_ROUND_UP(plane_bytes_per_line *
> y_min_scanlines, 512);
> -		plane_blocks_per_line /= y_min_scanlines;
> +		plane_blocks_per_line = (plane_blocks_per_line <<
> 16) /
> +								y_mi
> n_scanlines;
>  	} else if (x_tiled) {
>  		plane_blocks_per_line =
> DIV_ROUND_UP(plane_bytes_per_line, 512);
> +		plane_blocks_per_line <<= 16;
>  	} else {
>  		plane_blocks_per_line =
> DIV_ROUND_UP(plane_bytes_per_line, 512)
>  					+ 1;
> +		plane_blocks_per_line <<= 16;
>  	}
>  
>  	method1 = skl_wm_method1(plane_pixel_rate, cpp, latency);
> @@ -3690,7 +3696,7 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  			selected_result = method1;
>  	}
>  
> -	res_blocks = selected_result + 1;
> +	res_blocks = DIV_ROUND_UP(selected_result, 1 << 16) + 1;
>  	res_lines = DIV_ROUND_UP(selected_result,
> plane_blocks_per_line);
>  
>  	if (level >= 1 && level <= 7) {
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 8/8] drm/i915/bxt: Enable IPC support
  2016-10-13 10:58 ` [PATCH v4 8/8] drm/i915/bxt: Enable IPC support Kumar, Mahesh
  2016-10-13 11:19   ` Maarten Lankhorst
@ 2016-11-04 19:20   ` Paulo Zanoni
  1 sibling, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 19:20 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
> 
> This patch adds IPC support for platforms. This patch enables IPC
> only for BXT/KBL platform as for SKL recommendation is to keep is
> disabled.
> IPC (Isochronous Priority Control) is the hardware feature, which
> dynamically controles the memory read priority of Display.
> 
> When IPC is enabled, plane read requests are sent at high priority
> until
> filling above the transition watermark, then the requests are sent at
> lower priority until dropping below the level 0 watermark.
> The lower priority requests allow other memory clients to have better
> memory access. When IPC is disabled, all plane read requests are sent
> at
> high priority.
> 
> Changes since V1:
>  - Remove commandline parameter to disable ipc
>  - Address Paulo's comments
> 

In addition to what others said, we also need the linetime/2 WA if we
want to enable IPC.

Also, see below.

> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c  |  2 ++
>  drivers/gpu/drm/i915/i915_reg.h  |  1 +
>  drivers/gpu/drm/i915/intel_drv.h |  1 +
>  drivers/gpu/drm/i915/intel_pm.c  | 15 +++++++++++++++
>  4 files changed, 19 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> index b5f601c..58abbaa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1415,6 +1415,8 @@ int i915_driver_load(struct pci_dev *pdev,
> const struct pci_device_id *ent)
>  
>  	intel_runtime_pm_enable(dev_priv);
>  
> +	intel_enable_ipc(dev_priv);
> +
>  	/* Everything is in place, we can now relax! */
>  	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
>  		 driver.name, driver.major, driver.minor,
> driver.patchlevel,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index a9c467c..c9ebf23 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -6144,6 +6144,7 @@ enum {
>  #define  DISP_FBC_WM_DIS		(1<<15)
>  #define DISP_ARB_CTL2	_MMIO(0x45004)
>  #define  DISP_DATA_PARTITION_5_6	(1<<6)
> +#define  DISP_IPC_ENABLE		(1<<3)
>  #define DBUF_CTL	_MMIO(0x45008)
>  #define  DBUF_POWER_REQUEST		(1<<31)
>  #define  DBUF_POWER_STATE		(1<<30)
> diff --git a/drivers/gpu/drm/i915/intel_drv.h
> b/drivers/gpu/drm/i915/intel_drv.h
> index 2c1897b..45b0fa4 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1766,6 +1766,7 @@ void skl_write_plane_wm(struct intel_crtc
> *intel_crtc,
>  uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state
> *pipe_config);
>  bool ilk_disable_lp_wm(struct drm_device *dev);
>  int sanitize_rc6_option(struct drm_i915_private *dev_priv, int
> enable_rc6);
> +void intel_enable_ipc(struct drm_i915_private *dev_priv);
>  static inline int intel_enable_rc6(void)
>  {
>  	return i915.enable_rc6;
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 4263212..543aa5d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4833,6 +4833,21 @@ void intel_update_watermarks(struct drm_crtc
> *crtc)
>  		dev_priv->display.update_wm(crtc);
>  }
>  
> +void intel_enable_ipc(struct drm_i915_private *dev_priv)
> +{
> +	u32 val;
> +
> +	/* enable IPC only for Broxton for now*/
> +	if (!IS_BROXTON(dev_priv) || !IS_KABYLAKE(dev_priv))
> +		return;

This will always return...

> +
> +	val = I915_READ(DISP_ARB_CTL2);
> +
> +	val |= DISP_IPC_ENABLE;
> +
> +	I915_WRITE(DISP_ARB_CTL2, val);
> +}
> +
>  /*
>   * Lock protecting IPS related data structures
>   */
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm
  2016-10-13 10:58 ` [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm Kumar, Mahesh
@ 2016-11-04 19:25   ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-04 19:25 UTC (permalink / raw)
  To: Kumar, Mahesh, intel-gfx

Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> From: Mahesh Kumar <mahesh1.kumar@intel.com>
> 
> This patch implements new DDB allocation algorithm as per HW team
> recommendation. This algo takecare of scenario where we allocate less
> DDB
> for the planes with lower relative pixel rate, but they require more
> DDB
> to work.
> It also takes care of enabling same watermark level for each
> plane, for efficient power saving.
> 
> Changes since v1:
>  - Rebase on top of Paulo's patch series
> 
> Changes since v2:
>  - Fix the for loop condition to enable WM
> 
> Changes since v3:
>  - Fix crash in cursor i-g-t reported by Maarten
>  - Rebase after addressing Paulo's comments
>  - Few other ULT fixes
> 

This will require a huge rebase due to the things that were already
merged and those who are about to be merged. Also, this is a general
improvement while the other patches are bug fixes. Can you please move
this to the end of the series? I'd really like to get the other things
merged first, in case we decide to backport the fixes.


> Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 149 +++++++++++++++++++++---------
> ----------
>  1 file changed, 79 insertions(+), 70 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 098336d..84ec6b1 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3344,6 +3344,7 @@ skl_ddb_min_alloc(const struct drm_plane_state
> *pstate,
>  
>  static int
>  skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
> +		      struct skl_pipe_wm *pipe_wm,
>  		      struct skl_ddb_allocation *ddb /* out */)
>  {
>  	struct drm_atomic_state *state = cstate->base.state;
> @@ -3359,8 +3360,12 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  	uint16_t *minimum = cstate->wm.skl.minimum_blocks;
>  	uint16_t *y_minimum = cstate->wm.skl.minimum_y_blocks;
>  	unsigned int total_data_rate;
> +	uint16_t total_min_blocks = 0;
> +	uint16_t total_level_ddb = 0;
>  	int num_active;
> -	int id, i;
> +	int max_level, level;
> +	int id, i, ret = 0;
> +
>  
>  	if (WARN_ON(!state))
>  		return 0;
> @@ -3409,19 +3414,42 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  	}
>  
>  	for (i = 0; i < PLANE_CURSOR; i++) {
> -		alloc_size -= minimum[i];
> -		alloc_size -= y_minimum[i];
> +		total_min_blocks += minimum[i];
> +		total_min_blocks += y_minimum[i];
>  	}
>  
> -	/*
> -	 * 2. Distribute the remaining space in proportion to the
> amount of
> -	 * data each plane needs to fetch from memory.
> -	 *
> -	 * FIXME: we may not allocate every single block here.
> -	 */
> +	for (level = ilk_wm_max_level(dev); level >= 0; level--) {
> +		total_level_ddb = 0;
> +		for (i = 0; i < PLANE_CURSOR; i++) {
> +			/*
> +			 * TODO: We should calculate watermark
> values for Y/UV
> +			 * plane both in case of NV12 format and use
> both values
> +			 * for ddb calculation, As NV12 is disabled
> as of now.
> +			 * using only single plane value here.
> +			 */
> +			uint16_t min = minimum[i] + y_minimum[i];
> +			uint16_t plane_level_ddb_wm =
> +				max(pipe_wm-
> >wm[level].plane_res_b[i], min);
> +			total_level_ddb += plane_level_ddb_wm;
> +		}
> +
> +		if (total_level_ddb <= alloc_size)
> +			break;
> +	}
> +
> +	if ((level < 0) || (total_min_blocks > alloc_size)) {
> +		DRM_DEBUG_KMS("Requested display configuration
> exceeds system DDB limitations");
> +		DRM_DEBUG_KMS("minimum required %d/%d\n", (level <
> 0) ?
> +				total_level_ddb : total_min_blocks,
> alloc_size);
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +	max_level = level;
> +	alloc_size -= total_level_ddb;
> +
>  	total_data_rate = skl_get_total_relative_data_rate(cstate);
>  	if (total_data_rate == 0)
> -		return 0;
> +		goto exit;
>  
>  	start = alloc->start;
>  	for_each_intel_plane_on_crtc(dev, intel_crtc, intel_plane) {
> @@ -3436,7 +3464,8 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  		 * promote the expression to 64 bits to avoid
> overflowing, the
>  		 * result is < available as data_rate /
> total_data_rate < 1
>  		 */
> -		plane_blocks = minimum[id];
> +		plane_blocks = max(pipe_wm-
> >wm[max_level].plane_res_b[id],
> +					minimum[id]);
>  		plane_blocks += div_u64((uint64_t)alloc_size *
> data_rate,
>  					total_data_rate);
>  
> @@ -3444,12 +3473,13 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  		if (data_rate) {
>  			ddb->plane[pipe][id].start = start;
>  			ddb->plane[pipe][id].end = start +
> plane_blocks;
> +			start += plane_blocks;
>  		}
>  
> -		start += plane_blocks;
> -
>  		/*
>  		 * allocation for y_plane part of planar format:
> +		 * TODO: Once we start calculating watermark values
> for Y/UV
> +		 * plane both consider it for initial allowed wm
> blocks.
>  		 */
>  		y_data_rate = cstate->wm.skl.plane_y_data_rate[id];
>  
> @@ -3460,12 +3490,29 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> *cstate,
>  		if (y_data_rate) {
>  			ddb->y_plane[pipe][id].start = start;
>  			ddb->y_plane[pipe][id].end = start +
> y_plane_blocks;
> +			start += y_plane_blocks;
>  		}
>  
> -		start += y_plane_blocks;
> +		/*
> +		 * Now enable all levels in WM structure which can
> be enabled
> +		 * using current DDB allocation
> +		 */
> +		for (i = ilk_wm_max_level(dev); i >= 0; i--) {
> +			if (i > max_level || pipe_wm-
> >wm[i].plane_res_l[id] > 31
> +					|| pipe_wm-
> >wm[i].plane_res_b[id] == 0
> +					|| pipe_wm-
> >wm[i].plane_res_b[id] >=
> +					skl_ddb_entry_size(&ddb-
> >plane[pipe][id])) {
> +				pipe_wm->wm[i].plane_en[id] = false;
> +				pipe_wm->wm[i].plane_res_b[id] = 0;
> +				pipe_wm->wm[i].plane_res_l[id] = 0;
> +			} else {
> +				pipe_wm->wm[i].plane_en[id] = true;
> +			}
> +		}
>  	}
>  
> -	return 0;
> +exit:
> +	return ret;
>  }
>  
>  static uint32_t skl_pipe_pixel_rate(const struct intel_crtc_state
> *config)
> @@ -3536,11 +3583,9 @@ static uint32_t
> skl_adjusted_plane_pixel_rate(const struct intel_crtc_state *cst
>  static int skl_compute_plane_wm(const struct drm_i915_private
> *dev_priv,
>  				struct intel_crtc_state *cstate,
>  				struct intel_plane_state
> *intel_pstate,
> -				uint16_t ddb_allocation,
>  				int level,
>  				uint16_t *out_blocks, /* out */
> -				uint8_t *out_lines, /* out */
> -				bool *enabled /* out */)
> +				uint8_t *out_lines /* out */)
>  {
>  	struct drm_plane_state *pstate = &intel_pstate->base;
>  	struct drm_framebuffer *fb = pstate->fb;
> @@ -3554,10 +3599,8 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  	uint32_t plane_pixel_rate;
>  	uint32_t y_tile_minimum, y_min_scanlines;
>  
> -	if (latency == 0 || !cstate->base.active || !intel_pstate-
> >base.visible) {
> -		*enabled = false;
> +	if (latency == 0 || !cstate->base.active || !intel_pstate-
> >base.visible)
>  		return 0;
> -	}
>  
>  	width = drm_rect_width(&intel_pstate->base.src) >> 16;
>  	height = drm_rect_height(&intel_pstate->base.src) >> 16;
> @@ -3642,47 +3685,22 @@ static int skl_compute_plane_wm(const struct
> drm_i915_private *dev_priv,
>  		}
>  	}
>  
> -	if (res_blocks >= ddb_allocation || res_lines > 31) {
> -		*enabled = false;
> -
> -		/*
> -		 * If there are no valid level 0 watermarks, then we
> can't
> -		 * support this display configuration.
> -		 */
> -		if (level) {
> -			return 0;
> -		} else {
> -			DRM_DEBUG_KMS("Requested display
> configuration exceeds system watermark limitations\n");
> -			DRM_DEBUG_KMS("Plane %d.%d: blocks required
> = %u/%u, lines required = %u/31\n",
> -				      to_intel_crtc(cstate-
> >base.crtc)->pipe,
> -				      skl_wm_plane_id(to_intel_plane
> (pstate->plane)),
> -				      res_blocks, ddb_allocation,
> res_lines);
> -
> -			return -EINVAL;
> -		}
> -	}
> -
>  	*out_blocks = res_blocks;
>  	*out_lines = res_lines;
> -	*enabled = true;
>  
>  	return 0;
>  }
>  
>  static int
>  skl_compute_wm_level(const struct drm_i915_private *dev_priv,
> -		     struct skl_ddb_allocation *ddb,
>  		     struct intel_crtc_state *cstate,
>  		     int level,
>  		     struct skl_wm_level *result)
>  {
>  	struct drm_atomic_state *state = cstate->base.state;
> -	struct intel_crtc *intel_crtc = to_intel_crtc(cstate-
> >base.crtc);
>  	struct drm_plane *plane;
>  	struct intel_plane *intel_plane;
>  	struct intel_plane_state *intel_pstate;
> -	uint16_t ddb_blocks;
> -	enum pipe pipe = intel_crtc->pipe;
>  	int ret;
>  
>  	/*
> @@ -3721,16 +3739,12 @@ skl_compute_wm_level(const struct
> drm_i915_private *dev_priv,
>  
>  		WARN_ON(!intel_pstate->base.fb);
>  
> -		ddb_blocks = skl_ddb_entry_size(&ddb-
> >plane[pipe][i]);
> -
>  		ret = skl_compute_plane_wm(dev_priv,
>  					   cstate,
>  					   intel_pstate,
> -					   ddb_blocks,
>  					   level,
>  					   &result->plane_res_b[i],
> -					   &result->plane_res_l[i],
> -					   &result->plane_en[i]);
> +					   &result->plane_res_l[i]);
>  		if (ret)
>  			return ret;
>  	}
> @@ -3779,11 +3793,15 @@ static int skl_build_pipe_wm(struct
> intel_crtc_state *cstate,
>  	int ret;
>  
>  	for (level = 0; level <= max_level; level++) {
> -		ret = skl_compute_wm_level(dev_priv, ddb, cstate,
> +		ret = skl_compute_wm_level(dev_priv, cstate,
>  					   level, &pipe_wm-
> >wm[level]);
>  		if (ret)
>  			return ret;
>  	}
> +	ret = skl_allocate_pipe_ddb(cstate, pipe_wm, ddb);
> +	if (ret)
> +		return ret;
> +
>  	pipe_wm->linetime = skl_compute_linetime_wm(cstate);
>  
>  	skl_compute_transition_wm(cstate, &pipe_wm->trans_wm);
> @@ -4006,13 +4024,12 @@ skl_ddb_add_affected_planes(struct
> intel_crtc_state *cstate)
>  }
>  
>  static int
> -skl_compute_ddb(struct drm_atomic_state *state)
> +skl_include_affected_pipes(struct drm_atomic_state *state)
>  {
>  	struct drm_device *dev = state->dev;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct intel_atomic_state *intel_state =
> to_intel_atomic_state(state);
>  	struct intel_crtc *intel_crtc;
> -	struct skl_ddb_allocation *ddb = &intel_state-
> >wm_results.ddb;
>  	uint32_t realloc_pipes = pipes_modified(state);
>  	int ret;
>  
> @@ -4064,14 +4081,6 @@ skl_compute_ddb(struct drm_atomic_state
> *state)
>  		cstate = intel_atomic_get_crtc_state(state,
> intel_crtc);
>  		if (IS_ERR(cstate))
>  			return PTR_ERR(cstate);
> -
> -		ret = skl_allocate_pipe_ddb(cstate, ddb);
> -		if (ret)
> -			return ret;
> -
> -		ret = skl_ddb_add_affected_planes(cstate);
> -		if (ret)
> -			return ret;
>  	}
>  
>  	return 0;
> @@ -4122,19 +4131,15 @@ skl_compute_wm(struct drm_atomic_state
> *state)
>  	/* Clear all dirty flags */
>  	results->dirty_pipes = 0;
>  
> -	ret = skl_compute_ddb(state);
> +	ret = skl_include_affected_pipes(state);
>  	if (ret)
>  		return ret;
>  
>  	/*
>  	 * Calculate WM's for all pipes that are part of this
> transaction.
> -	 * Note that the DDB allocation above may have added more
> CRTC's that
> -	 * weren't otherwise being modified (and set bits in
> dirty_pipes) if
> -	 * pipe allocations had to change.
> -	 *
> -	 * FIXME:  Now that we're doing this in the atomic check
> phase, we
> -	 * should allow skl_update_pipe_wm() to return failure in
> cases where
> -	 * no suitable watermark values can be found.
> +	 * Note that affected pipe calculation above may have added
> more
> +	 * CRTC's that weren't otherwise being modified (and set
> bits in
> +	 * dirty_pipes) if pipe allocations had to change.
>  	 */
>  	for_each_crtc_in_state(state, crtc, cstate, i) {
>  		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> @@ -4150,6 +4155,10 @@ skl_compute_wm(struct drm_atomic_state *state)
>  		if (changed)
>  			results->dirty_pipes |= drm_crtc_mask(crtc);
>  
> +		ret = skl_ddb_add_affected_planes(intel_cstate);
> +		if (ret)
> +			return ret;
> +
>  		if ((results->dirty_pipes & drm_crtc_mask(crtc)) ==
> 0)
>  			/* This pipe's WM's did not change */
>  			continue;
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size
  2016-10-31 18:03   ` Paulo Zanoni
@ 2016-11-09 14:58     ` Mahesh Kumar
  2016-11-09 17:02       ` Paulo Zanoni
  0 siblings, 1 reply; 27+ messages in thread
From: Mahesh Kumar @ 2016-11-09 14:58 UTC (permalink / raw)
  To: Paulo Zanoni, intel-gfx

Hi,


On Monday 31 October 2016 11:33 PM, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
>> This patch make changes to use linetime latency instead of allocated
>> DDB size during plane watermark calculation in switch case, This is
>> required to implement new DDB allocation algorithm.
>>
>> In New Algorithm DDB is allocated based on WM values, because of
>> which
>> number of DDB blocks will not be available during WM calculation,
>> So this "linetime latency" is suggested by SV/HW team to use during
>> switch-case for WM blocks selection.
>>
>> Changes since v1:
>>   - Rebase on top of Paulo's patch series
>> Changes since v2:
>>   - Fix if-else condition (pointed by Maarten)
>>
>> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c
>> b/drivers/gpu/drm/i915/intel_pm.c
>> index 7f1748a..098336d 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -3616,10 +3616,14 @@ static int skl_compute_plane_wm(const struct
>> drm_i915_private *dev_priv,
>>   	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
>>   		selected_result = max(method2, y_tile_minimum);
>>   	} else {
>> +		uint32_t linetime_us = 0;
>> +
>> +		linetime_us = DIV_ROUND_UP(width * 1000,
>> +				skl_pipe_pixel_rate(cstate));
> Can't we just call skl_compute_linetime_wm() here? I don't like having
> two pieces of the code computing the same thing. My last round of bug
> fixes included a fix for duplicated code that got out of sync after
> spec changes.
These two have different values in skl_compute_linetime_wm we multiply 
by 8, not here.
>
>>   		if ((cpp * cstate->base.adjusted_mode.crtc_htotal /
>> 512 < 1) &&
>>   		    (plane_bytes_per_line / 512 < 1))
>>   			selected_result = method2;
>> -		else if ((ddb_allocation / plane_blocks_per_line) >=
>> 1)
>> +		else if (latency >= linetime_us)
> Still doesn't match the spec. The "ddb_allocation /
> planes_block_per_line" is still necessary.
After New algorithm patch, we will not have access to ddb_allocation, as 
allocation will happen later.
So we can't use ddb_allocation in our calculation, This check in Bspec 
is kept for the environment/OS where we don't allocate DDB dynamically.
hence those OS will have access to ddb_allocation during WM calculation 
phase.
thanks,

Regards,
-Mahesh
>
>>   			selected_result = min(method1, method2);
>>   		else
>>   			selected_result = method1;

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size
  2016-11-09 14:58     ` Mahesh Kumar
@ 2016-11-09 17:02       ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-09 17:02 UTC (permalink / raw)
  To: Mahesh Kumar, intel-gfx

Em Qua, 2016-11-09 às 20:28 +0530, Mahesh Kumar escreveu:
> Hi,
> 
> 
> On Monday 31 October 2016 11:33 PM, Paulo Zanoni wrote:
> > 
> > Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> > > 
> > > This patch make changes to use linetime latency instead of
> > > allocated
> > > DDB size during plane watermark calculation in switch case, This
> > > is
> > > required to implement new DDB allocation algorithm.
> > > 
> > > In New Algorithm DDB is allocated based on WM values, because of
> > > which
> > > number of DDB blocks will not be available during WM calculation,
> > > So this "linetime latency" is suggested by SV/HW team to use
> > > during
> > > switch-case for WM blocks selection.
> > > 
> > > Changes since v1:
> > >   - Rebase on top of Paulo's patch series
> > > Changes since v2:
> > >   - Fix if-else condition (pointed by Maarten)
> > > 
> > > Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
> > >   1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > > b/drivers/gpu/drm/i915/intel_pm.c
> > > index 7f1748a..098336d 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -3616,10 +3616,14 @@ static int skl_compute_plane_wm(const
> > > struct
> > > drm_i915_private *dev_priv,
> > >   	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
> > >   		selected_result = max(method2, y_tile_minimum);
> > >   	} else {
> > > +		uint32_t linetime_us = 0;
> > > +
> > > +		linetime_us = DIV_ROUND_UP(width * 1000,
> > > +				skl_pipe_pixel_rate(cstate));
> > Can't we just call skl_compute_linetime_wm() here? I don't like
> > having
> > two pieces of the code computing the same thing. My last round of
> > bug
> > fixes included a fix for duplicated code that got out of sync after
> > spec changes.
> These two have different values in skl_compute_linetime_wm we
> multiply 
> by 8, not here.

You can move the *8 multiplication to the caller that needs it.


> > 
> > 
> > > 
> > >   		if ((cpp * cstate-
> > > >base.adjusted_mode.crtc_htotal /
> > > 512 < 1) &&
> > >   		    (plane_bytes_per_line / 512 < 1))
> > >   			selected_result = method2;
> > > -		else if ((ddb_allocation /
> > > plane_blocks_per_line) >=
> > > 1)
> > > +		else if (latency >= linetime_us)
> > Still doesn't match the spec. The "ddb_allocation /
> > planes_block_per_line" is still necessary.
> After New algorithm patch, we will not have access to ddb_allocation,
> as 
> allocation will happen later.
> So we can't use ddb_allocation in our calculation, This check in
> Bspec 
> is kept for the environment/OS where we don't allocate DDB
> dynamically.
> hence those OS will have access to ddb_allocation during WM
> calculation 
> phase.
> thanks,

So the patch that implements the DDB allocation can remove the check
when/if it gets merged. The code with only this patch does not use the
new algorithm, so let's make everything works if we only have it
applied. Bisectability is really important.

> 
> Regards,
> -Mahesh
> > 
> > 
> > > 
> > >   			selected_result = min(method1,
> > > method2);
> > >   		else
> > >   			selected_result = method1;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround
  2016-11-04 17:09   ` Paulo Zanoni
  2016-11-04 17:22     ` Ville Syrjälä
@ 2016-11-10  5:54     ` Mahesh Kumar
  2016-11-16 12:36       ` Paulo Zanoni
  1 sibling, 1 reply; 27+ messages in thread
From: Mahesh Kumar @ 2016-11-10  5:54 UTC (permalink / raw)
  To: Paulo Zanoni, intel-gfx

Hi,


On Friday 04 November 2016 10:39 PM, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
>> This patch implemnets Workariunds related to display arbitrated
>> memory
>> bandwidth. These WA are applicabe for all gen-9 based platforms.
>>
>> Changes since v1:
>>   - Rebase on top of Paulo's patch series
>> Changes since v2:
>>   - Rebase/rework after addressing Paulo's comments in previous patch
> A lot of this code has changed since then, so this will need a
> significant rebase. In the meantime, I added skl_needs_memory_bw_wa()
> and we're now applying the WA by default: we just won't apply the WA
> when we're pretty sure we don't need to. This helps avoiding underruns
> by default.
>
> See more below.
make sense, Will change default WA to WA_T_TILED
>> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  |   9 +++
>>   drivers/gpu/drm/i915/intel_drv.h |  11 +++
>>   drivers/gpu/drm/i915/intel_pm.c  | 146
>> +++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 166 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index adbd9aa..c169360 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1092,6 +1092,13 @@ enum intel_sbi_destination {
>>   	SBI_MPHY,
>>   };
>>   
>> +/* SKL+ Watermark arbitrated display bandwidth Workarounds */
>> +enum watermark_memory_wa {
>> +	WATERMARK_WA_NONE,
>> +	WATERMARK_WA_X_TILED,
>> +	WATERMARK_WA_Y_TILED,
>> +};
>> +
>>   #define QUIRK_PIPEA_FORCE (1<<0)
>>   #define QUIRK_LVDS_SSC_DISABLE (1<<1)
>>   #define QUIRK_INVERT_BRIGHTNESS (1<<2)
>> @@ -1644,6 +1651,8 @@ struct skl_ddb_allocation {
>>   
>>   struct skl_wm_values {
>>   	unsigned dirty_pipes;
>> +	/* any WaterMark memory workaround Required */
> We can remove this comment since it doesn't say anything the variable
> name doesn't.
>
>> +	enum watermark_memory_wa mem_wa;
> Now that we have a proper variable in the state struct, it probably
> makes sense to just kill skl_needs_memory_bw_wa() and read this
> variable when we need to.
>
>
>>   	struct skl_ddb_allocation ddb;
>>   	uint32_t wm_linetime[I915_MAX_PIPES];
>>   	uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
>> diff --git a/drivers/gpu/drm/i915/intel_drv.h
>> b/drivers/gpu/drm/i915/intel_drv.h
>> index f48e79a..2c1897b 100644
>> --- a/drivers/gpu/drm/i915/intel_drv.h
>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>> @@ -1813,6 +1813,17 @@ intel_atomic_get_crtc_state(struct
>> drm_atomic_state *state,
>>   	return to_intel_crtc_state(crtc_state);
>>   }
>>   
>> +static inline struct intel_crtc_state *
>> +intel_atomic_get_existing_crtc_state(struct drm_atomic_state *state,
>> +				      struct intel_crtc *crtc)
>> +{
>> +	struct drm_crtc_state *crtc_state;
>> +
>> +	crtc_state = drm_atomic_get_existing_crtc_state(state,
>> &crtc->base);
>> +
>> +	return to_intel_crtc_state(crtc_state);
> I really don't like the idea of calling to_intel_crtc_state() on a
> potentially NULL pointer so the caller of this function will also check
> for NULL. Even though it works today, I still think it's unsafe
> practice. Please check crtc_state for NULL directly and then return
> NULL.
>
> Also, I think this function should be extracted to its own commit, and
> we'd probably be able to find some callers in the existing i915 code.
Will extract this function & include NULL check for crtc_state itself
>
>> +}
>> +
>>   static inline struct intel_plane_state *
>>   intel_atomic_get_existing_plane_state(struct drm_atomic_state
>> *state,
>>   				      struct intel_plane *plane)
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c
>> b/drivers/gpu/drm/i915/intel_pm.c
>> index 84ec6b1..5b8f715 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -3589,6 +3589,8 @@ static int skl_compute_plane_wm(const struct
>> drm_i915_private *dev_priv,
>>   {
>>   	struct drm_plane_state *pstate = &intel_pstate->base;
>>   	struct drm_framebuffer *fb = pstate->fb;
>> +	struct intel_atomic_state *intel_state =
>> +			to_intel_atomic_state(cstate->base.state);
>>   	uint32_t latency = dev_priv->wm.skl_latency[level];
>>   	uint32_t method1, method2;
>>   	uint32_t plane_bytes_per_line, plane_blocks_per_line;
>> @@ -3598,10 +3600,17 @@ static int skl_compute_plane_wm(const struct
>> drm_i915_private *dev_priv,
>>   	uint32_t width = 0, height = 0;
>>   	uint32_t plane_pixel_rate;
>>   	uint32_t y_tile_minimum, y_min_scanlines;
>> +	enum watermark_memory_wa mem_wa;
>>   
>>   	if (latency == 0 || !cstate->base.active || !intel_pstate-
>>> base.visible)
>>   		return 0;
>>   
>> +	mem_wa = intel_state ? intel_state->wm_results.mem_wa :
>> WATERMARK_WA_NONE;
>> +	if (mem_wa != WATERMARK_WA_NONE) {
>> +		if (fb->modifier[0] == I915_FORMAT_MOD_X_TILED)
>> +			latency += 15;
>> +	}
>> +
>>   	width = drm_rect_width(&intel_pstate->base.src) >> 16;
>>   	height = drm_rect_height(&intel_pstate->base.src) >> 16;
>>   
>> @@ -3634,6 +3643,9 @@ static int skl_compute_plane_wm(const struct
>> drm_i915_private *dev_priv,
>>   		y_min_scanlines = 4;
>>   	}
>>   
>> +	if (mem_wa == WATERMARK_WA_Y_TILED)
>> +		y_min_scanlines *= 2;
> The changes to this function will need to be rebased. But it's
> interesting that my interpretation regarding this *= 2 was different.
> After an email to the spec authors it seems your interpretation is the
> right one...
>
>
>> +
>>   	plane_bytes_per_line = width * cpp;
>>   	if (fb->modifier[0] == I915_FORMAT_MOD_Y_TILED ||
>>   	    fb->modifier[0] == I915_FORMAT_MOD_Yf_TILED) {
>> @@ -4075,6 +4087,15 @@ skl_include_affected_pipes(struct
>> drm_atomic_state *state)
>>   		intel_state->wm_results.dirty_pipes = ~0;
>>   	}
>>   
>> +	/*
>> +	 * If Watermark workaround is changed we need to recalculate
>> +	 * watermark values for all active pipes
>> +	 */
>> +	if (intel_state->wm_results.mem_wa != dev_priv-
>>> wm.skl_hw.mem_wa) {
>> +		realloc_pipes = ~0;
>> +		intel_state->wm_results.dirty_pipes = ~0;
>> +	}
>> +
> Things have changed since this patch was written. I'd recommend moving
> this to skl_set_memory_bandwidth_wa().
Working on re-basing the patch.
>
>>   	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
>>   		struct intel_crtc_state *cstate;
>>   
>> @@ -4087,6 +4108,129 @@ skl_include_affected_pipes(struct
>> drm_atomic_state *state)
>>   }
>>   
>>   static void
>> +skl_set_memory_bandwidth_wm_wa(struct drm_atomic_state *state)
> We're not really setting anything here: we're computing. Rename to
> skl_compute_wm_memory_bw_wa?
>
>
>> +{
>> +	struct drm_device *dev = state->dev;
>> +	struct drm_i915_private *dev_priv = to_i915(dev);
>> +	struct intel_crtc *intel_crtc;
>> +	struct intel_plane_state *intel_pstate;
>> +	struct intel_atomic_state *intel_state =
>> to_intel_atomic_state(state);
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +	int num_active_plane, num_active_pipe;
> I like to use plural in the number-of variables, since we're counting
> the number of active planeS, not the number of the active plane.
>
>
>> +	uint32_t plane_bw, max_plane_bw, pipe_bw, max_pipe_bw;
>> +	uint32_t total_pipe_bw;
>> +	uint32_t system_bw = 0;
>> +	uint32_t rank;
>> +	int x_tile_per;
>> +	int display_bw_per;
> I read this and kept thinking "x tiles per what?", took me a while to
> realize it's percentage. It's probably better to just use a long self-
> documenting name here than to use an ambiguous abbreviation.
>
> Also, I'd just go with uint32_t in the percentages too to avoid any
> possible problems with the uin32_t calculations.
>
> And perhaps some declarations could be moved to smaller inner scopes
> below.
>
>
>> +	bool y_tile_enabled = false;
>> +
> if (!platforms_that_require_the_wa) {
> 	wa = WATERMARK_WA_NONE;
> 	return;
> }
this function is not called by any GEN other than GEN9, will it be ok to 
add "if (!GEN9)" check?
>> +	if (!memdev_info->valid)
>> +		goto exit;
> Our default behavior should be to apply the WA then in doubt, not
> to avoid it, so the return value here and in the other error cases
> should be WATERAMARK_WA_Y_TILED.
>
> Also, you can avoid the gotos by just setting mem_wa at the beginning
> of the function, then you can just "return" later instead of goto.
>
>
>> +
>> +	system_bw = memdev_info->mem_speed_khz * memdev_info-
>>> num_channels *
>> +				memdev_info->channel_width_bytes;
>> +
>> +	if (!system_bw)
>> +		goto exit;
> This shouldn't be possible. Otherwise, the previous patch needs to be
> fixed in a way to not allow system_bw to end up as zero. Please either
> remove the check or change to "if (WARN_ON(!system_bw))".
yes, ideally this should not be possible, but what if because of any 
reason BIOS is not writing the register OR we don't have BIOS altogether?

If we add WARN_ON here, it'll flood the logs, we can add WARN_ON_ONCE, 
but we are anyway printing the warning in previous patch, if frequency 
or any other variable calculated is ZERO, IMO will add more prints in 
previous patch instead of adding here.
>
>> +
>> +	max_pipe_bw = 0;
>> +	for_each_intel_crtc(dev, intel_crtc) {
>> +		struct intel_crtc_state *cstate;
>> +		struct intel_plane *plane;
>> +
>> +		/*
>> +		 * If CRTC is part of current atomic commit, get
>> crtc state from
>> +		 * existing CRTC state. else take the cached CRTC
>> state
>> +		 */
>> +		cstate = NULL;
>> +		if (state)
> If state is NULL we'll segfault way before this point, so there's no
> need for this check.
>
>
>> +			cstate =
>> intel_atomic_get_existing_crtc_state(state,
>> +					intel_crtc);
>> +		if (!cstate)
>> +			cstate = to_intel_crtc_state(intel_crtc-
>>> base.state);
>> +
>> +		if (!cstate->base.active)
>> +			continue;
>> +
>> +		num_active_plane = 0;
>> +		max_plane_bw = 0;
>> +		for_each_intel_plane_mask(dev, plane, cstate-
>>> base.plane_mask) {
>> +			struct drm_framebuffer *fb = NULL;
>> +
>> +			intel_pstate = NULL;
>> +			if (state)
> Same here: no need to check for state.
>
>> +				intel_pstate =
>> +				intel_atomic_get_existing_plane_stat
>> e(state,
>> +									
>> plane);
>> +			if (!intel_pstate)
>> +				intel_pstate =
>> +					to_intel_plane_state(plane-
>>> base.state);
>> +
>> +			WARN_ON(!intel_pstate->base.fb);
> if (WARN_ON(!intel_pstate->base.fb))
> 	return;
>
> Then we can just forget about checking for fb again below.
>
>> +
>> +			if (!intel_pstate->base.visible)
>> +				continue;
> Don't we also need to exclude the cursor here?
yes, cursor should be excluded, will add that. thanks,
>
>
>> +
>> +			fb = intel_pstate->base.fb;
>> +			if (fb && (fb->modifier[0] ==
>> I915_FORMAT_MOD_Y_TILED ||
>> +				fb->modifier[0] ==
>> I915_FORMAT_MOD_Yf_TILED))
>> +				y_tile_enabled = true;
>> +
>> +			plane_bw =
>> skl_adjusted_plane_pixel_rate(cstate,
>> +								inte
>> l_pstate);
>> +			max_plane_bw = max(plane_bw, max_plane_bw);
>> +			num_active_plane++;
>> +		}
>> +		pipe_bw = max_plane_bw * num_active_plane;
>> +		max_pipe_bw = max(pipe_bw, max_pipe_bw);
>> +	}
>> +
>> +	if (intel_state->active_pipe_changes)
>> +		num_active_pipe = hweight32(intel_state-
>>> active_crtcs);
>> +	else
>> +		num_active_pipe = hweight32(dev_priv->active_crtcs);
> Why don't we just trust intel_state->active_crtcs even when
> active_pipe_changes is false?
I really didn't dig into that code & reused the code from different 
function. will check into this
>
>
>> +
>> +	total_pipe_bw = max_pipe_bw * num_active_pipe;
>> +
>> +	display_bw_per = DIV_ROUND_UP_ULL(total_pipe_bw * 100,
>> system_bw * 1000);
> Why ULL?
>
> Also, if everything is KHz as it's supposed to be, the *100 and *1000
> don't make sense.
In previous patch I converted the frequency to MHz... need to fix that, 
then *1000 not be required,
*100 is still required to get the percentage.
>
>
>> +
>> +	/*
>> +	 * If there is any Ytile plane enabled and arbitrated
>> display
>> +	 * bandwidth > 20% of raw system memory bandwidth
>> +	 * Enale Y-tile related WA
>> +	 *
>> +	 * If memory is dual channel single rank, Xtile limit = 35%,
>> else Xtile
>> +	 * limit = 60%
>> +	 * If there is no Ytile plane enabled and
>> +	 * arbitrated display bandwidth > Xtile limit
>> +	 * Enable X-tile realated WA
>> +	 */
>> +	if (y_tile_enabled && (display_bw_per > 20))
>> +		intel_state->wm_results.mem_wa =
>> WATERMARK_WA_Y_TILED;
>> +	else {
>> +
>> +		if (memdev_info->rank_valid)
>> +			rank = memdev_info->rank;
>> +		else
>> +			rank = DRAM_RANK_DUAL; /* Assume we are dual
>> rank */
> When in doubt, apply the most restrictive workaround to avoid the
> possibility of underruns. So here it's safer to assume
> DRAM_RANK_SINGLE.
OK...

thanks,

Regards,
-Mahesh
>
>> +
>> +		if ((rank == DRAM_RANK_SINGLE) &&
>> +					(memdev_info->num_channels
>> == 2))
>> +			x_tile_per = 35;
>> +		else
>> +			x_tile_per = 60;
>> +
>> +		if (display_bw_per > x_tile_per)
>> +			intel_state->wm_results.mem_wa =
>> WATERMARK_WA_X_TILED;
>> +	}
>> +	return;
>> +
>> +exit:
>> +	intel_state->wm_results.mem_wa = WATERMARK_WA_NONE;
>> +}
>> +
>> +static void
>>   skl_copy_wm_for_pipe(struct skl_wm_values *dst,
>>   		     struct skl_wm_values *src,
>>   		     enum pipe pipe)
>> @@ -4131,6 +4275,8 @@ skl_compute_wm(struct drm_atomic_state *state)
>>   	/* Clear all dirty flags */
>>   	results->dirty_pipes = 0;
>>   
>> +	skl_set_memory_bandwidth_wm_wa(state);
>> +
>>   	ret = skl_include_affected_pipes(state);
>>   	if (ret)
>>   		return ret;

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/8] drm/i915: Decode system memory bandwidth
  2016-11-03 19:06   ` Paulo Zanoni
@ 2016-11-16 11:48     ` Mahesh Kumar
  0 siblings, 0 replies; 27+ messages in thread
From: Mahesh Kumar @ 2016-11-16 11:48 UTC (permalink / raw)
  To: Paulo Zanoni, intel-gfx

Hi,


On Friday 04 November 2016 12:36 AM, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
>> This patch adds support to decode system memory bandwidth
>> which will be used for arbitrated display memory percentage
>> calculation in GEN9 based system.
>>
>> Changes from v1:
>>   - Address comments from Paulo
>>   - implement decode function for SKL/KBL also
> Again, my understanding of these registers is not good so I will ask
> some questions that will help me properly understand and review the
> code. I may also end up asking some questions that don't make sense.
> Please correct me in case any of my assumptions is wrong.
>
> Perhaps answers to my questions could become code comments since I
> suppose most of the gfx team is not super familiar with the memory
> regs, and we're going to have to maintain the code.
>
>
>> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.c | 170
>> ++++++++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_drv.h |  14 ++++
>>   drivers/gpu/drm/i915/i915_reg.h |  38 +++++++++
>>   3 files changed, 222 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c
>> b/drivers/gpu/drm/i915/i915_drv.c
>> index 89d3222..b5f601c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct
>> drm_i915_private *dev_priv)
>>   	DRM_DEBUG_DRIVER("use GPU sempahores? %s\n",
>> yesno(i915.semaphores));
>>   }
>>   
>> +static inline void skl_memdev_info_fill(struct memdev_info *info,
>> uint32_t val)
>> +{
>> +	uint8_t channel_width, rank;
>> +
>> +	info->rank_valid = true;
>> +	channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
>> +		SKL_DRAM_CHANNEL_WIDTH_MASK;
> Why are we looking at the L bits instead of the S bits? What's the
> difference between them?
We need to see both L & S bits. This happen due to incomplete 
interpretation/explanation of register.
There can be two dimm per channel, & each nibble tells the information 
about L/S dimm
>
>> +	if (channel_width == SKL_DRAM_WIDTH_1)
>> +		info->channel_width_bytes = 1;
>> +	else if (channel_width == SKL_DRAM_WIDTH_2)
>> +		info->channel_width_bytes = 2;
>> +	else
>> +		info->channel_width_bytes = 4;
> This is not checking for the invalid value of 0x3. Perhaps a switch()
> instead of an if() ladder would be better here.
>
>> +
>> +	rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
>> +	if (rank == SKL_DRAM_RANK_SINGLE)
>> +		info->rank = DRAM_RANK_SINGLE;
>> +	else
>> +		info->rank = DRAM_RANK_DUAL;
>> +}
>> +
>> +static int
>> +skl_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	uint32_t val = 0;
>> +	uint32_t mem_speed = 0;
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +
>> +	val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
>> +	mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
>> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> But but but isn't it possible to have more than 2GHz in SKL? I think I
> have a big SKL machine with more than 2GHz here... I'll check it later.
> Also, don't we need to check FREQ_TYPE and ODD_RATIO?
It seems it's possible,
here we have to multiply frequency by 2 again to get the actual freq. So 
we can have upto 4GHz. current MAX is 2133MHz
This was not mentioned in the CSpec, but got confirmation from HW team.
>
> Some digging suggests that maybe we need to figure out somehow if we're
> using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ
> register instead of BIOS_DATA).
>
> I'd really need some clarification on how all of this works.
>
> Also, it looks like there's no need for this to be a 64bit calculation
> due to the mask limit.
>
>
>> +
>> +	if (mem_speed == 0)
>> +		return -EINVAL;
>> +
>> +	memdev_info->valid = true;
>> +	memdev_info->mem_speed_khz = mem_speed;
>> +	memdev_info->num_channels  = 0;
>> +
>> +	memdev_info->rank_valid = false;
>> +	val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
>> +
>> +	if (val != 0xFFFFFFFF) {
>> +		memdev_info->num_channels++;
>> +		skl_memdev_info_fill(memdev_info, val);
>> +	}
>> +
>> +	val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
>> +
>> +	if (val != 0xFFFFFFFF) {
>> +		memdev_info->num_channels++;
>> +		if (!memdev_info->rank_valid)
>> +			skl_memdev_info_fill(memdev_info, val);
>> +	}
> A simple iteration over the two register addresses would have saved
> some code :).
>
> Also, isn't it possible to have channels with different width/rank?
Yes, channel can have different RANK, assumption made here is wrong.
need to rewrite the code according to new information got.
> Shouldn't our code be sanity-checking this? Perhaps grab the width/rank
> for each channel and compare. When the specs are unclear or don't
> exist, it's probably better if our code is able to check/validate its
> own assumptions.
>
>
>> +
>> +	if (memdev_info->num_channels == 0)
>> +		memdev_info->valid = false;
>> +	return 0;
>> +}
>> +
>> +static int
>> +bxt_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +	uint32_t dram_channel;
>> +	uint32_t mem_speed, val;
>> +	uint8_t num_channel, dram_type;
>> +	int i;
>> +
>> +	val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
>> +	mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
>> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> AFAIU, there's no need for 64 bit division since the maximum value
> is 8399790.
Will remove 64 bit operation
>
>
>> +
>> +	if (mem_speed == 0)
>> +		return -EINVAL;
>> +
>> +	memdev_info->valid = true;
>> +	memdev_info->mem_speed_khz = mem_speed;
>> +	dram_type = (val >> BXT_DRAM_TYPE_SHIFT) &
>> BXT_DRAM_TYPE_MASK;
>> +	dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) &
>> BXT_DRAM_CHANNEL_MASK;
>> +	num_channel = hweight32(dram_channel);
>> +
>> +	/*
>> +	 * The lpddr3 and lpddr4 technologies can have 1-4 channels
>> and the
>> +	 * channels are 32bits wide; while ddr3l technologies can
>> have 1-2
>> +	 * channels and the channels are 64 bits wide. In case of
>> single 64 bit
>> +	 * wide DDR3L dimm sets two channel-active bits in
>> +	 * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L
>> 64bit dimm
>> +	 * will set all four channel-active bits in above register.
>> +	 * In order to get actual number of channels in DDR3L DRAM
>> we need to
>> +	 * device total channel-active bits set by 2.
>> +	 */
>> +
>> +	switch (dram_type) {
>> +	case BXT_DRAM_TYPE_LPDDR3:
>> +	case BXT_DRAM_TYPE_LPDDR4:
>> +		memdev_info->channel_width_bytes = 4;
>> +		memdev_info->num_channels = num_channel;
>> +		break;
>> +	case BXT_DRAM_TYPE_DDR3L:
>> +		memdev_info->channel_width_bytes = 8;
>> +		memdev_info->num_channels = num_channel / 2;
>> +		break;
>> +	default:
>> +		DRM_DEBUG_KMS("Unknown DRAM type\n");
>> +		memdev_info->valid = false;
>> +		return -EINVAL;
>> +	}
> Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and
> num_channels * 8 for DDR3L?
>
> Or just multiply by 4 before we do the "num_channel / 2"... This would
> avoid even having to find out the DRAM type (in case it's not needed
> anywhere else).
DRAM type is not needed anywhere else, but it's require to correctly 
find the number of channels,
which will be needed while deciding on WA.
>> +
>> +	/*
>> +	 * Now read each DUNIT8/9/10/11 to check the rank of each
>> dimms.
>> +	 * all the dimms should have same rank as in first valid
>> Dimm
>> +	 */
>
> I think we need to explain in the comment why we're iterating over the
> 4 dunits but just looking at the first one that's not 0xFFFFFFFF.
As told above, assumption made that all channel will have same RANK is 
wrong, need to rework.
> Isn't there a way to discover which dunits to look at based on the
> number of channels and DRAM type?
In case of DDR3L dunit-9/10 only can be set as per number of channels.
Don't you think it'll add more complexity in the code?
> Also, does the valid dunit amount reflect the number of channels like
> in SKL?
Yes, number of channel will be equal to DUNIT set.
Oh, using this we can re-factor the code & calculate num of channels 
same way as in SKL & always assume
channel_width_bytes * channel as "4 * active_bits set above",
even instead of storing freq & channel width separately, we can store 
system bandwidth itself. This will avoid same "system bandwidth" 
calculation in each flip.
>
>
>> +	memdev_info->rank_valid = false;
>> +	for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
>> +		val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
>> +		if (val != 0xFFFFFFFF) {
>> +			uint8_t rank;
>> +
>> +			memdev_info->rank_valid = true;
>> +			rank = val & BXT_DRAM_RANK_MASK;
>> +			if (rank == BXT_DRAM_RANK_SINGLE)
>> +				memdev_info->rank =
>> DRAM_RANK_SINGLE;
>> +			else if (rank == BXT_DRAM_RANK_DUAL)
>> +				memdev_info->rank = DRAM_RANK_DUAL;
>> +			else
>> +				memdev_info->rank = DRAM_RANK_NONE;
> Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid
> back to false? Return non-zero?
>
>
>> +
>> +			break;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void
>> +intel_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +	int ret;
>> +
>> +	memdev_info->valid = false;
>> +	if (!IS_GEN9(dev_priv))
>> +		return;
>> +
>> +	if (IS_BROXTON(dev_priv))
>> +		ret = bxt_get_memdev_info(dev_priv);
>> +	else
>> +		ret = skl_get_memdev_info(dev_priv);
>> +	if (ret)
>> +		return;
> What I find a little hard to read is that we set memdev_info->valid
> both here and in the functions we call. Due to the early returns it's
> not super simple to see the possible value it may get, and things get
> even more complicated due to the fact that we may return 0 from
> skl_get_memdev_info() while still setting memdev_info->valid to false.
>
> Things would be much simpler and easier to read if we only set
> memdev_info->valid in this function, based on the return value of the
> get_memdev_info() functions. And then the get_memdev_info() functions
> would appropriately return zero/nonzero based on valid/invalid
> configuration.
>
>
>> +
>> +	if (memdev_info->valid) {
>> +		DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-
>> %d channel-width-%d bytes\n",
>> +				memdev_info->mem_speed_khz,
>> +				memdev_info->num_channels,
>> +				memdev_info->channel_width_bytes);
> As I mentioned in my previous review, please use the appropriate
> specifiers here instead of %d for everybody.
>
> Also, we usually print "variable: value, other variable: value" instead
> of "variable-value other-variable-value". Please use the established
> convention. Reading "total-channels-1" is less clear than "total
> channels: 1" and doesn't raise ambiguity concerns with negative values.
>
>
>> +		if (memdev_info->rank_valid)
>> +			DRM_DEBUG_DRIVER("DRAM rank-%s\n",
>> +			(memdev_info->rank == DRAM_RANK_DUAL) ?
>> +						"dual" : "single");
> But there's also DRAM_RANK_NONE.
>
>
>> +	}
>> +}
>> +
>> +
>>   /**
>>    * i915_driver_init_hw - setup state requiring device access
>>    * @dev_priv: device private
>> @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct
>> drm_i915_private *dev_priv)
>>   			DRM_DEBUG_DRIVER("can't enable MSI");
>>   	}
>>   
>> +	/*
>> +	 * Fill the memdev structure to get the system raw bandwidth
>> +	 * This will be used by WM algorithm, to implement GEN9
>> based WA
>> +	 */
>> +	intel_get_memdev_info(dev_priv);
>> +
>>   	return 0;
>>   
>>   out_ggtt:
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index a219a35..adbd9aa 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2057,6 +2057,20 @@ struct drm_i915_private {
>>   		bool distrust_bios_wm;
>>   	} wm;
>>   
>> +	struct memdev_info {
>> +		bool valid;
>> +		uint32_t mem_speed_khz;
>> +		uint8_t channel_width_bytes;
>> +		uint8_t num_channels;
>> +		bool rank_valid;
>> +		enum {
>> +			DRAM_RANK_NONE = 0,
>> +			DRAM_RANK_SINGLE,
>> +			DRAM_RANK_DUAL
>> +		} rank;
> What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply
> rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when
> rank_valid=true?
>
> Perhaps we could get rid of the rank_valid variable? Maybe rename
> DRAM_RANK_NONE to DRAM_RANK_INVALID?
Agree, will make change to use DRAM_RANK_INVALID instead of rank_valid 
variable :)
>
>
>> +	} memdev_info;
>> +
>> +
>>   	struct i915_runtime_pm pm;
>>   
>>   	/* Abstract the submission mechanism (legacy ringbuffer or
>> execlists) away */
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index acc767a..a9c467c 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -7721,6 +7721,44 @@ enum {
>>   #define  DC_STATE_DEBUG_MASK_CORES	(1<<0)
>>   #define  DC_STATE_DEBUG_MASK_MEMORY_UP	(1<<1)
>>   
>> +#define _MMIO_MCHBAR_PIPE(x, a, b)	_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + _PIPE(x, a, b))
> Wait, what? I may have misunderstood something, but AFAICS the
> registers that use this macro are not per-pipe, so using the _PIPE
> macro makes things super confusing. In this case it should be fine to
> just do something a little more hardcoded, like:
>
> #define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit)
> * 0x200)
I was trying to use already available MACRO, will make changes as 
suggested by you.
>
> 	
>> +#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x7114)
>> +#define BXT_REQ_DATA_MASK		(0x3F << 0)
> Or just 0x3F to be consistent with the other definitions.
>
>
>> +#define BXT_DRAM_TYPE_SHIFT		24
>> +#define BXT_DRAM_TYPE_MASK		0x7
> The spec you sent me lists these bits as reserved. Are you sure these
> come from this register?
bits 26:24 tells the DRAM_TYPE information in 
P_CR_MC_BIOS_REQ_0_0_0_MCHBAR register
thanks for review,

Regards,
-Mahesh
>
>
>> +#define BXT_DRAM_CHANNEL_SHIFT		12
>> +#define BXT_DRAM_CHANNEL_MASK		0xF
> I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}.
>
>
>> +
>> +#define BXT_DRAM_TYPE_LPDDR3		0x1
>> +#define BXT_DRAM_TYPE_LPDDR4		0x2
>> +#define BXT_DRAM_TYPE_DDR3L		0x4
>> +/*
>> + * BIOS programs this field of REQ_DATA [5:0] in integer
>> + * multiple of 133330 KHz (133.33MHz)
>> + */
>> +#define	SKL_MEMORY_FREQ_MULTIPLIER		0x208D2
> There's a bad tab here that should be a space.
>
> Also, it makes a looooot more sense to use the decimal value instead of
> 0x208D2.
>
>
>> +#define BXT_D_CR_DRP0_DUNIT8			0x1000
>> +#define BXT_D_CR_DRP0_DUNIT9			0x1200
>> +#define BXT_D_CR_DRP0_DUNIT_MAX			4
>> +#define BXT_D_CR_DRP0_DUNIT(x)	_MMIO_MCHBAR_PIPE(x,
>> BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
>> +#define BXT_DRAM_RANK_MASK			0x3
>> +#define BXT_DRAM_RANK_SINGLE			0x1
>> +#define BXT_DRAM_RANK_DUAL			0x3
>> +
>> +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR
>> _BASE_SNB + 0x5E04)
> Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't
> SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big
> difference? Also, that comment mentioning that we may sometimes get
> zero is a little worrying: shouldn't we make sure we always read when
> it's non-zero?
>
> Thanks for the patch, and sorry for the huge amount of questions here:
> learning while doing the review is sometimes tricky.
>
>> +#define SKL_REQ_DATA_MASK			(0xF << 0)
>> +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x500C)
>> +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x5010)
>> +#define SKL_DRAM_CHANNEL_WIDTH_MASK		0x3
>> +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT		8
>> +#define SKL_DRAM_WIDTH_1			0x0
>> +#define SKL_DRAM_WIDTH_2			0x1
>> +#define SKL_DRAM_WIDTH_4			0x2
>> +#define SKL_DRAM_RANK_MASK			0x1
>> +#define SKL_DRAM_RANK_SHIFT			10
>> +#define SKL_DRAM_RANK_SINGLE			0x0
>> +#define SKL_DRAM_RANK_DUAL			0x1
>> +
>>   /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
>> this register,
>>    * since on HSW we can't write to it using I915_WRITE. */
>>   #define D_COMP_HSW			_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x5F0C)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround
  2016-11-10  5:54     ` Mahesh Kumar
@ 2016-11-16 12:36       ` Paulo Zanoni
  0 siblings, 0 replies; 27+ messages in thread
From: Paulo Zanoni @ 2016-11-16 12:36 UTC (permalink / raw)
  To: Mahesh Kumar, intel-gfx

Em Qui, 2016-11-10 às 11:24 +0530, Mahesh Kumar escreveu:
> Hi,
> 
> 
> 
(removed a bunch of stuff here)

> > 
> > > 
> > 
> > 
> > > 
> > > +	bool y_tile_enabled = false;
> > > +
> > if (!platforms_that_require_the_wa) {
> > 	wa = WATERMARK_WA_NONE;
> > 	return;
> > }
> this function is not called by any GEN other than GEN9, will it be ok
> to 
> add "if (!GEN9)" check?

Well, there's Gemnilake support floating around the mailing list now...

> > 
> > > 
> > > +	if (!memdev_info->valid)
> > > +		goto exit;
> > Our default behavior should be to apply the WA then in doubt, not
> > to avoid it, so the return value here and in the other error cases
> > should be WATERAMARK_WA_Y_TILED.
> > 
> > Also, you can avoid the gotos by just setting mem_wa at the
> > beginning
> > of the function, then you can just "return" later instead of goto.
> > 
> > 
> > > 
> > > +
> > > +	system_bw = memdev_info->mem_speed_khz * memdev_info-
> > > > 
> > > > num_channels *
> > > +				memdev_info-
> > > >channel_width_bytes;
> > > +
> > > +	if (!system_bw)
> > > +		goto exit;
> > This shouldn't be possible. Otherwise, the previous patch needs to
> > be
> > fixed in a way to not allow system_bw to end up as zero. Please
> > either
> > remove the check or change to "if (WARN_ON(!system_bw))".
> yes, ideally this should not be possible, but what if because of any 
> reason BIOS is not writing the register OR we don't have BIOS
> altogether?

Then the previous patch should be able to deal with it, and leave us
with memdev_info->valid=false.


> 
> If we add WARN_ON here, it'll flood the logs, we can add
> WARN_ON_ONCE, 
> but we are anyway printing the warning in previous patch, if
> frequency 
> or any other variable calculated is ZERO, IMO will add more prints
> in 
> previous patch instead of adding here.

Usually we add WARNs when we expect them to never ever be triggered. So
instead of making it WARN_ON_ONCE, we should make sure the condition is
never met: if the memory info is bad, set memdev_info->valid to false.


Thanks,
Paulo
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-11-16 12:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-13 10:58 [PATCH v4 0/8] New DDB Algo and WM fixes Kumar, Mahesh
2016-10-13 10:58 ` [PATCH v4 1/8] drm/i915/skl+: use linetime latency instead of ddb size Kumar, Mahesh
2016-10-31 18:03   ` Paulo Zanoni
2016-11-09 14:58     ` Mahesh Kumar
2016-11-09 17:02       ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 2/8] drm/i915/skl: New ddb allocation algorithm Kumar, Mahesh
2016-11-04 19:25   ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 3/8] drm/i915: Decode system memory bandwidth Kumar, Mahesh
2016-11-03 19:06   ` Paulo Zanoni
2016-11-16 11:48     ` Mahesh Kumar
2016-10-13 10:58 ` [PATCH v4 4/8] drm/i915/gen9: WM memory bandwidth related workaround Kumar, Mahesh
2016-11-04 17:09   ` Paulo Zanoni
2016-11-04 17:22     ` Ville Syrjälä
2016-11-10  5:54     ` Mahesh Kumar
2016-11-16 12:36       ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 5/8] drm/i915/skl+: reset y_plane ddb structure also during calculation Kumar, Mahesh
2016-11-04 17:39   ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 6/8] drm/i915/skl: Add variables to check x_tile and y_tile Kumar, Mahesh
2016-11-04 17:45   ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 7/8] drm/i915/skl+: change WM calc to fixed point 16.16 Kumar, Mahesh
2016-11-04 19:03   ` Paulo Zanoni
2016-10-13 10:58 ` [PATCH v4 8/8] drm/i915/bxt: Enable IPC support Kumar, Mahesh
2016-10-13 11:19   ` Maarten Lankhorst
2016-10-13 13:01     ` Mahesh Kumar
2016-10-13 15:36       ` Daniel Vetter
2016-11-04 19:20   ` Paulo Zanoni
2016-10-13 13:50 ` ✗ Fi.CI.BAT: warning for New DDB Algo and WM fixes (rev4) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.