All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] DMC/DC state hardening
@ 2016-01-19 19:50 Mika Kuoppala
  2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Hi,

There is not known root cause to why we end up in a situation
where the dmc doesn't anymore obey us on dc state setup.

https://bugs.freedesktop.org/show_bug.cgi?id=93698
https://bugs.freedesktop.org/show_bug.cgi?id=93697

But here are few patches to alleviate the consequences
if that happens. We really want to keep the gpu alive
even if the dmc starts hallucinating.

Mika Kuoppala (2):
  drm/i915: Stop using DC states if firmware disagrees on the state
  drm/i915: Use init power domain during reset

 drivers/gpu/drm/i915/i915_irq.c         | 10 ++++++++++
 drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
 2 files changed, 37 insertions(+), 4 deletions(-)

-- 
2.5.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state
  2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala
@ 2016-01-19 19:50 ` Mika Kuoppala
  2016-01-20  9:36   ` Chris Wilson
  2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala
  2016-01-20  8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork
  2 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Sometimes we get dmesg warnings claiming that DC6 was already
enabled prior to our enabling. Investigations using readback of
the written dc state value indicate that sometimes when we disable
the dc6, the write doesn't stick, or it does but for a very short time.

This leads to state keeping confusion between firmware and driver,
driver thinking that dc6 is off and proceeds with modesets
while firmware still keeps/thinks dc6 is on. Evidence from logs
suggests that the dc6 is still on as flips start to timeout,
leading to eventual gpu hang.

Further complication is that to recover the hang, we reset while
the DC6 is enabled. With this state on the reinit of the gpu side
won't come out clean and rings rehang right after the init.

Harden the detection of this situation and immediately disable
DC6 if we disagree on state with the firmware. This should make it
less probable for driver to do end up in a wrong conclusion and
let things like modeset and gpu reset proceed with dc6 still
enabled.

References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index bbca527184d0..5a21453e38e1 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up(
 	}
 }
 
+static int gen9_dc_state_verified(struct drm_i915_private *dev_priv,
+				  u32 mask, u32 state)
+{
+	int i;
+
+	/* Observe the dc state with handful of reads */
+	for (i = 0; i < 3; i++) {
+		if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100))
+			return false;
+	}
+
+	return true;
+}
+
 static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 {
-	uint32_t val;
+	uint32_t val_pre, val;
 	uint32_t mask;
 
 	mask = DC_STATE_EN_UPTO_DC5;
@@ -491,13 +505,22 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 	if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
 		gen9_set_dc_state_debugmask_memory_up(dev_priv);
 
-	val = I915_READ(DC_STATE_EN);
+	val_pre = I915_READ(DC_STATE_EN);
 	DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
-		      val & mask, state);
+		      val_pre & mask, state);
+	val = val_pre;
 	val &= ~mask;
 	val |= state;
 	I915_WRITE(DC_STATE_EN, val);
-	POSTING_READ(DC_STATE_EN);
+
+	if (gen9_dc_state_verified(dev_priv, mask, state) == false) {
+		DRM_ERROR("DC set timeout, from %02x to %02x, now %x\n",
+			  val_pre & mask, state, I915_READ(DC_STATE_EN));
+
+		i915.enable_dc = 0;
+		I915_WRITE(DC_STATE_EN, DC_STATE_DISABLE);
+		POSTING_READ(DC_STATE_EN);
+	}
 }
 
 void bxt_enable_dc9(struct drm_i915_private *dev_priv)
-- 
2.5.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] drm/i915: Use init power domain during reset
  2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala
  2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala
@ 2016-01-19 19:50 ` Mika Kuoppala
  2016-01-19 20:05   ` Ville Syrjälä
  2016-01-20 10:52   ` Daniel Vetter
  2016-01-20  8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork
  2 siblings, 2 replies; 10+ messages in thread
From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

If we have driver failure in our power well and/or dc
state keeping, we might try to reset without powers.

Evidence shows that resetting the chip with dc6 enabled
don't lead to desired results. The dmc kept it's enabled
dc6 state over the reset. On subsequent init the rings
just hanged right from the start so they didn't reset
properly or that the dmc interfered with the init.

Harden the reset by setting power domains to be in
init mode during reset. This causes the hw state to be
flushed so dc6 will be forcibly disabled also.

References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a89373df63..78e242b5c357 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 		 */
 		intel_runtime_pm_get(dev_priv);
 
+		/* Even if we hold the pm ref, we still might have inconsistent
+		 * power states due to driver failure. Trying to reset without
+		 * powers or with wrong dmc firmware state is futile. Flush
+		 * our power well and dc states ensuring that we reset with
+		 * powers enabled.
+		 */
+		intel_display_set_init_power(dev_priv, true);
+
 		intel_prepare_reset(dev);
 
 		/*
@@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 
 		intel_finish_reset(dev);
 
+		intel_display_set_init_power(dev_priv, false);
+
 		intel_runtime_pm_put(dev_priv);
 
 		if (ret == 0) {
-- 
2.5.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use init power domain during reset
  2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala
@ 2016-01-19 20:05   ` Ville Syrjälä
  2016-01-20  9:53     ` Mika Kuoppala
  2016-01-20 10:52   ` Daniel Vetter
  1 sibling, 1 reply; 10+ messages in thread
From: Ville Syrjälä @ 2016-01-19 20:05 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote:
> If we have driver failure in our power well and/or dc
> state keeping, we might try to reset without powers.
> 
> Evidence shows that resetting the chip with dc6 enabled
> don't lead to desired results. The dmc kept it's enabled
> dc6 state over the reset. On subsequent init the rings
> just hanged right from the start so they didn't reset
> properly or that the dmc interfered with the init.
> 
> Harden the reset by setting power domains to be in
> init mode during reset. This causes the hw state to be
> flushed so dc6 will be forcibly disabled also.

Hmm. Aren't we holding a forcewake around the reset already? Also with
the GT hung it shouldn't really be dropping into RC6 anyway, which
should keep the thing in pc2 (IIRC) or higher. I guess if we do a
simulated hang it might be in rc6, but the forcewake should be enough
protection there.

Deep package C-state is the only link I could think of between the GT
and DMC. Otherwise I can't think of any real link between the two. So
quite baffled by this.

> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 25a89373df63..78e242b5c357 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  		 */
>  		intel_runtime_pm_get(dev_priv);
>  
> +		/* Even if we hold the pm ref, we still might have inconsistent
> +		 * power states due to driver failure. Trying to reset without
> +		 * powers or with wrong dmc firmware state is futile. Flush
> +		 * our power well and dc states ensuring that we reset with
> +		 * powers enabled.
> +		 */
> +		intel_display_set_init_power(dev_priv, true);
> +
>  		intel_prepare_reset(dev);
>  
>  		/*
> @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  
>  		intel_finish_reset(dev);
>  
> +		intel_display_set_init_power(dev_priv, false);
> +
>  		intel_runtime_pm_put(dev_priv);
>  
>  		if (ret == 0) {
> -- 
> 2.5.0

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* ✗ Fi.CI.BAT: warning for DMC/DC state hardening
  2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala
  2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala
  2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala
@ 2016-01-20  8:49 ` Patchwork
  2 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2016-01-20  8:49 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

== Summary ==

Built on 7d26528d30b0d8119c858115b6e5e22deb74ec71 drm-intel-nightly: 2016y-01m-19d-19h-38m-52s UTC integration manifest

Test kms_pipe_crc_basic:
        Subgroup read-crc-pipe-b:
                dmesg-warn -> PASS       (ilk-hp8440p)
        Subgroup read-crc-pipe-b-frame-sequence:
                pass       -> DMESG-WARN (bdw-ultra)

bdw-nuci7        total:140  pass:131  dwarn:0   dfail:0   fail:0   skip:9  
bdw-ultra        total:140  pass:131  dwarn:2   dfail:1   fail:0   skip:6  
byt-nuc          total:143  pass:125  dwarn:3   dfail:0   fail:0   skip:15 
hsw-brixbox      total:143  pass:136  dwarn:0   dfail:0   fail:0   skip:7  
hsw-gt2          total:143  pass:139  dwarn:0   dfail:0   fail:0   skip:4  
ilk-hp8440p      total:143  pass:102  dwarn:3   dfail:0   fail:0   skip:38 
ivb-t430s        total:137  pass:124  dwarn:3   dfail:4   fail:0   skip:6  
skl-i5k-2        total:143  pass:134  dwarn:1   dfail:0   fail:0   skip:8  
skl-i7k-2        total:143  pass:134  dwarn:1   dfail:0   fail:0   skip:8  
snb-dellxps      total:143  pass:124  dwarn:5   dfail:0   fail:0   skip:14 
snb-x220t        total:143  pass:124  dwarn:5   dfail:0   fail:1   skip:13 

Results at /archive/results/CI_IGT_test/Patchwork_1227/

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state
  2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala
@ 2016-01-20  9:36   ` Chris Wilson
  2016-01-20  9:49     ` Mika Kuoppala
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2016-01-20  9:36 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote:
> Sometimes we get dmesg warnings claiming that DC6 was already
> enabled prior to our enabling. Investigations using readback of
> the written dc state value indicate that sometimes when we disable
> the dc6, the write doesn't stick, or it does but for a very short time.
> 
> This leads to state keeping confusion between firmware and driver,
> driver thinking that dc6 is off and proceeds with modesets
> while firmware still keeps/thinks dc6 is on. Evidence from logs
> suggests that the dc6 is still on as flips start to timeout,
> leading to eventual gpu hang.
> 
> Further complication is that to recover the hang, we reset while
> the DC6 is enabled. With this state on the reinit of the gpu side
> won't come out clean and rings rehang right after the init.
> 
> Harden the detection of this situation and immediately disable
> DC6 if we disagree on state with the firmware. This should make it
> less probable for driver to do end up in a wrong conclusion and
> let things like modeset and gpu reset proceed with dc6 still
> enabled.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
> index bbca527184d0..5a21453e38e1 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up(
>  	}
>  }
>  
> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv,
> +				  u32 mask, u32 state)
> +{
> +	int i;
> +
> +	/* Observe the dc state with handful of reads */
> +	for (i = 0; i < 3; i++) {
> +		if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100))
> +			return false;
> +	}

Why two loops of reads?

Since this is equivalent to
	return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0;

In this case

WRITE(DC_STATE_EN, val & ~mask | state);
if (wait_for((READ(DC_STATE_END) & mask) = state, 300) {
	/* oh noes */
}

is more obvious (since we verify with a read of the same register, not
some magic ack sequence).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state
  2016-01-20  9:36   ` Chris Wilson
@ 2016-01-20  9:49     ` Mika Kuoppala
  2016-01-20 10:17       ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2016-01-20  9:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, miku

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote:
>> Sometimes we get dmesg warnings claiming that DC6 was already
>> enabled prior to our enabling. Investigations using readback of
>> the written dc state value indicate that sometimes when we disable
>> the dc6, the write doesn't stick, or it does but for a very short time.
>> 
>> This leads to state keeping confusion between firmware and driver,
>> driver thinking that dc6 is off and proceeds with modesets
>> while firmware still keeps/thinks dc6 is on. Evidence from logs
>> suggests that the dc6 is still on as flips start to timeout,
>> leading to eventual gpu hang.
>> 
>> Further complication is that to recover the hang, we reset while
>> the DC6 is enabled. With this state on the reinit of the gpu side
>> won't come out clean and rings rehang right after the init.
>> 
>> Harden the detection of this situation and immediately disable
>> DC6 if we disagree on state with the firmware. This should make it
>> less probable for driver to do end up in a wrong conclusion and
>> let things like modeset and gpu reset proceed with dc6 still
>> enabled.
>> 
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
>> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
>> Cc: Imre Deak <imre.deak@intel.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
>> index bbca527184d0..5a21453e38e1 100644
>> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
>> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up(
>>  	}
>>  }
>>  
>> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv,
>> +				  u32 mask, u32 state)
>> +{
>> +	int i;
>> +
>> +	/* Observe the dc state with handful of reads */
>> +	for (i = 0; i < 3; i++) {
>> +		if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100))
>> +			return false;
>> +	}
>
> Why two loops of reads?
>

I once saw the write stick but for a very short time, so with one wait_for we
might miss it.

> Since this is equivalent to
> 	return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0;
>
> In this case
>
> WRITE(DC_STATE_EN, val & ~mask | state);
> if (wait_for((READ(DC_STATE_END) & mask) = state, 300) {
> 	/* oh noes */
> }
>
> is more obvious (since we verify with a read of the same register, not
> some magic ack sequence).

I could open code the 2 checks here, would be more obvious yes.

Btw what led you to think that the state doesn't stick? Looking
at the code for correctness and then deducting that if that warn
happens, there is no other explanation?

This is all duct tape until we understand why this
interface doesn't work as advertized.

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use init power domain during reset
  2016-01-19 20:05   ` Ville Syrjälä
@ 2016-01-20  9:53     ` Mika Kuoppala
  0 siblings, 0 replies; 10+ messages in thread
From: Mika Kuoppala @ 2016-01-20  9:53 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, miku

Ville Syrjälä <ville.syrjala@linux.intel.com> writes:

> On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote:
>> If we have driver failure in our power well and/or dc
>> state keeping, we might try to reset without powers.
>> 
>> Evidence shows that resetting the chip with dc6 enabled
>> don't lead to desired results. The dmc kept it's enabled
>> dc6 state over the reset. On subsequent init the rings
>> just hanged right from the start so they didn't reset
>> properly or that the dmc interfered with the init.
>> 
>> Harden the reset by setting power domains to be in
>> init mode during reset. This causes the hw state to be
>> flushed so dc6 will be forcibly disabled also.
>
> Hmm. Aren't we holding a forcewake around the reset already? Also with
> the GT hung it shouldn't really be dropping into RC6 anyway, which
> should keep the thing in pc2 (IIRC) or higher. I guess if we do a
> simulated hang it might be in rc6, but the forcewake should be enough
> protection there.
>

We are holding forcewake and pm ref. So this is paranoia.
But as the dmc don't obey us leading to state mismatch,
I thought this is the way to force (throught the hw sync)
the dm state as disabled before we hit the reset button.

Another way would be to introduce intel_power_domains_reset().
Looks like _*suspend() does too much for reset.

> Deep package C-state is the only link I could think of between the GT
> and DMC. Otherwise I can't think of any real link between the two. So
> quite baffled by this.
>

Me too. This is all ducttape until the root cause is known.
-Mika

>> 
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
>> Cc: Imre Deak <imre.deak@intel.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index 25a89373df63..78e242b5c357 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>>  		 */
>>  		intel_runtime_pm_get(dev_priv);
>>  
>> +		/* Even if we hold the pm ref, we still might have inconsistent
>> +		 * power states due to driver failure. Trying to reset without
>> +		 * powers or with wrong dmc firmware state is futile. Flush
>> +		 * our power well and dc states ensuring that we reset with
>> +		 * powers enabled.
>> +		 */
>> +		intel_display_set_init_power(dev_priv, true);
>> +
>>  		intel_prepare_reset(dev);
>>  
>>  		/*
>> @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>>  
>>  		intel_finish_reset(dev);
>>  
>> +		intel_display_set_init_power(dev_priv, false);
>> +
>>  		intel_runtime_pm_put(dev_priv);
>>  
>>  		if (ret == 0) {
>> -- 
>> 2.5.0
>
> -- 
> Ville Syrjälä
> Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state
  2016-01-20  9:49     ` Mika Kuoppala
@ 2016-01-20 10:17       ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2016-01-20 10:17 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Wed, Jan 20, 2016 at 11:49:51AM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote:
> >> Sometimes we get dmesg warnings claiming that DC6 was already
> >> enabled prior to our enabling. Investigations using readback of
> >> the written dc state value indicate that sometimes when we disable
> >> the dc6, the write doesn't stick, or it does but for a very short time.
> >> 
> >> This leads to state keeping confusion between firmware and driver,
> >> driver thinking that dc6 is off and proceeds with modesets
> >> while firmware still keeps/thinks dc6 is on. Evidence from logs
> >> suggests that the dc6 is still on as flips start to timeout,
> >> leading to eventual gpu hang.
> >> 
> >> Further complication is that to recover the hang, we reset while
> >> the DC6 is enabled. With this state on the reinit of the gpu side
> >> won't come out clean and rings rehang right after the init.
> >> 
> >> Harden the detection of this situation and immediately disable
> >> DC6 if we disagree on state with the firmware. This should make it
> >> less probable for driver to do end up in a wrong conclusion and
> >> let things like modeset and gpu reset proceed with dc6 still
> >> enabled.
> >> 
> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
> >> Cc: Imre Deak <imre.deak@intel.com>
> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
> >>  1 file changed, 27 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> index bbca527184d0..5a21453e38e1 100644
> >> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up(
> >>  	}
> >>  }
> >>  
> >> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv,
> >> +				  u32 mask, u32 state)
> >> +{
> >> +	int i;
> >> +
> >> +	/* Observe the dc state with handful of reads */
> >> +	for (i = 0; i < 3; i++) {
> >> +		if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100))
> >> +			return false;
> >> +	}
> >
> > Why two loops of reads?
> >
> 
> I once saw the write stick but for a very short time, so with one wait_for we
> might miss it.
> 
> > Since this is equivalent to
> > 	return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0;
> >
> > In this case
> >
> > WRITE(DC_STATE_EN, val & ~mask | state);
> > if (wait_for((READ(DC_STATE_END) & mask) = state, 300) {
> > 	/* oh noes */
> > }
> >
> > is more obvious (since we verify with a read of the same register, not
> > some magic ack sequence).
> 
> I could open code the 2 checks here, would be more obvious yes.
> 
> Btw what led you to think that the state doesn't stick? Looking
> at the code for correctness and then deducting that if that warn
> happens, there is no other explanation?

Yes. It has also been used in the past as a form of acknowledging state
changes, though usually it is a separate bit (i.e. a request bit and a
current state bit).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use init power domain during reset
  2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala
  2016-01-19 20:05   ` Ville Syrjälä
@ 2016-01-20 10:52   ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2016-01-20 10:52 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote:
> If we have driver failure in our power well and/or dc
> state keeping, we might try to reset without powers.
> 
> Evidence shows that resetting the chip with dc6 enabled
> don't lead to desired results. The dmc kept it's enabled
> dc6 state over the reset. On subsequent init the rings
> just hanged right from the start so they didn't reset
> properly or that the dmc interfered with the init.
> 
> Harden the reset by setting power domains to be in
> init mode during reset. This causes the hw state to be
> flushed so dc6 will be forcibly disabled also.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 25a89373df63..78e242b5c357 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  		 */
>  		intel_runtime_pm_get(dev_priv);
>  
> +		/* Even if we hold the pm ref, we still might have inconsistent
> +		 * power states due to driver failure. Trying to reset without
> +		 * powers or with wrong dmc firmware state is futile. Flush
> +		 * our power well and dc states ensuring that we reset with
> +		 * powers enabled.
> +		 */
> +		intel_display_set_init_power(dev_priv, true);
> +
>  		intel_prepare_reset(dev);

Since this seems just dc6 related there's good reason to believe it's the
display mmio writes in intel_prepare/finish_reset(). Can you just grab
relevant power wells for the register we write in there instead of
grabbing everything at the top level here?

Asking for that since with atomic this shouldn't happen any more ...
-Daniel

>  
>  		/*
> @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  
>  		intel_finish_reset(dev);
>  
> +		intel_display_set_init_power(dev_priv, false);
> +
>  		intel_runtime_pm_put(dev_priv);
>  
>  		if (ret == 0) {
> -- 
> 2.5.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-01-20 10:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala
2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala
2016-01-20  9:36   ` Chris Wilson
2016-01-20  9:49     ` Mika Kuoppala
2016-01-20 10:17       ` Chris Wilson
2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala
2016-01-19 20:05   ` Ville Syrjälä
2016-01-20  9:53     ` Mika Kuoppala
2016-01-20 10:52   ` Daniel Vetter
2016-01-20  8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.