* [PATCH 0/2] DMC/DC state hardening @ 2016-01-19 19:50 Mika Kuoppala 2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw) To: intel-gfx; +Cc: miku Hi, There is not known root cause to why we end up in a situation where the dmc doesn't anymore obey us on dc state setup. https://bugs.freedesktop.org/show_bug.cgi?id=93698 https://bugs.freedesktop.org/show_bug.cgi?id=93697 But here are few patches to alleviate the consequences if that happens. We really want to keep the gpu alive even if the dmc starts hallucinating. Mika Kuoppala (2): drm/i915: Stop using DC states if firmware disagrees on the state drm/i915: Use init power domain during reset drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++ drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++---- 2 files changed, 37 insertions(+), 4 deletions(-) -- 2.5.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state 2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala @ 2016-01-19 19:50 ` Mika Kuoppala 2016-01-20 9:36 ` Chris Wilson 2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala 2016-01-20 8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork 2 siblings, 1 reply; 10+ messages in thread From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw) To: intel-gfx; +Cc: miku Sometimes we get dmesg warnings claiming that DC6 was already enabled prior to our enabling. Investigations using readback of the written dc state value indicate that sometimes when we disable the dc6, the write doesn't stick, or it does but for a very short time. This leads to state keeping confusion between firmware and driver, driver thinking that dc6 is off and proceeds with modesets while firmware still keeps/thinks dc6 is on. Evidence from logs suggests that the dc6 is still on as flips start to timeout, leading to eventual gpu hang. Further complication is that to recover the hang, we reset while the DC6 is enabled. With this state on the reinit of the gpu side won't come out clean and rings rehang right after the init. Harden the detection of this situation and immediately disable DC6 if we disagree on state with the firmware. This should make it less probable for driver to do end up in a wrong conclusion and let things like modeset and gpu reset proceed with dc6 still enabled. References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index bbca527184d0..5a21453e38e1 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up( } } +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv, + u32 mask, u32 state) +{ + int i; + + /* Observe the dc state with handful of reads */ + for (i = 0; i < 3; i++) { + if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100)) + return false; + } + + return true; +} + static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state) { - uint32_t val; + uint32_t val_pre, val; uint32_t mask; mask = DC_STATE_EN_UPTO_DC5; @@ -491,13 +505,22 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state) if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK) gen9_set_dc_state_debugmask_memory_up(dev_priv); - val = I915_READ(DC_STATE_EN); + val_pre = I915_READ(DC_STATE_EN); DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n", - val & mask, state); + val_pre & mask, state); + val = val_pre; val &= ~mask; val |= state; I915_WRITE(DC_STATE_EN, val); - POSTING_READ(DC_STATE_EN); + + if (gen9_dc_state_verified(dev_priv, mask, state) == false) { + DRM_ERROR("DC set timeout, from %02x to %02x, now %x\n", + val_pre & mask, state, I915_READ(DC_STATE_EN)); + + i915.enable_dc = 0; + I915_WRITE(DC_STATE_EN, DC_STATE_DISABLE); + POSTING_READ(DC_STATE_EN); + } } void bxt_enable_dc9(struct drm_i915_private *dev_priv) -- 2.5.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state 2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala @ 2016-01-20 9:36 ` Chris Wilson 2016-01-20 9:49 ` Mika Kuoppala 0 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2016-01-20 9:36 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx, miku On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote: > Sometimes we get dmesg warnings claiming that DC6 was already > enabled prior to our enabling. Investigations using readback of > the written dc state value indicate that sometimes when we disable > the dc6, the write doesn't stick, or it does but for a very short time. > > This leads to state keeping confusion between firmware and driver, > driver thinking that dc6 is off and proceeds with modesets > while firmware still keeps/thinks dc6 is on. Evidence from logs > suggests that the dc6 is still on as flips start to timeout, > leading to eventual gpu hang. > > Further complication is that to recover the hang, we reset while > the DC6 is enabled. With this state on the reinit of the gpu side > won't come out clean and rings rehang right after the init. > > Harden the detection of this situation and immediately disable > DC6 if we disagree on state with the firmware. This should make it > less probable for driver to do end up in a wrong conclusion and > let things like modeset and gpu reset proceed with dc6 still > enabled. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> > Cc: Imre Deak <imre.deak@intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++---- > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c > index bbca527184d0..5a21453e38e1 100644 > --- a/drivers/gpu/drm/i915/intel_runtime_pm.c > +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c > @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up( > } > } > > +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv, > + u32 mask, u32 state) > +{ > + int i; > + > + /* Observe the dc state with handful of reads */ > + for (i = 0; i < 3; i++) { > + if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100)) > + return false; > + } Why two loops of reads? Since this is equivalent to return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0; In this case WRITE(DC_STATE_EN, val & ~mask | state); if (wait_for((READ(DC_STATE_END) & mask) = state, 300) { /* oh noes */ } is more obvious (since we verify with a read of the same register, not some magic ack sequence). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state 2016-01-20 9:36 ` Chris Wilson @ 2016-01-20 9:49 ` Mika Kuoppala 2016-01-20 10:17 ` Chris Wilson 0 siblings, 1 reply; 10+ messages in thread From: Mika Kuoppala @ 2016-01-20 9:49 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx, miku Chris Wilson <chris@chris-wilson.co.uk> writes: > On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote: >> Sometimes we get dmesg warnings claiming that DC6 was already >> enabled prior to our enabling. Investigations using readback of >> the written dc state value indicate that sometimes when we disable >> the dc6, the write doesn't stick, or it does but for a very short time. >> >> This leads to state keeping confusion between firmware and driver, >> driver thinking that dc6 is off and proceeds with modesets >> while firmware still keeps/thinks dc6 is on. Evidence from logs >> suggests that the dc6 is still on as flips start to timeout, >> leading to eventual gpu hang. >> >> Further complication is that to recover the hang, we reset while >> the DC6 is enabled. With this state on the reinit of the gpu side >> won't come out clean and rings rehang right after the init. >> >> Harden the detection of this situation and immediately disable >> DC6 if we disagree on state with the firmware. This should make it >> less probable for driver to do end up in a wrong conclusion and >> let things like modeset and gpu reset proceed with dc6 still >> enabled. >> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> >> Cc: Imre Deak <imre.deak@intel.com> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> >> --- >> drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++---- >> 1 file changed, 27 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c >> index bbca527184d0..5a21453e38e1 100644 >> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c >> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c >> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up( >> } >> } >> >> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv, >> + u32 mask, u32 state) >> +{ >> + int i; >> + >> + /* Observe the dc state with handful of reads */ >> + for (i = 0; i < 3; i++) { >> + if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100)) >> + return false; >> + } > > Why two loops of reads? > I once saw the write stick but for a very short time, so with one wait_for we might miss it. > Since this is equivalent to > return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0; > > In this case > > WRITE(DC_STATE_EN, val & ~mask | state); > if (wait_for((READ(DC_STATE_END) & mask) = state, 300) { > /* oh noes */ > } > > is more obvious (since we verify with a read of the same register, not > some magic ack sequence). I could open code the 2 checks here, would be more obvious yes. Btw what led you to think that the state doesn't stick? Looking at the code for correctness and then deducting that if that warn happens, there is no other explanation? This is all duct tape until we understand why this interface doesn't work as advertized. -Mika > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state 2016-01-20 9:49 ` Mika Kuoppala @ 2016-01-20 10:17 ` Chris Wilson 0 siblings, 0 replies; 10+ messages in thread From: Chris Wilson @ 2016-01-20 10:17 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx, miku On Wed, Jan 20, 2016 at 11:49:51AM +0200, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote: > >> Sometimes we get dmesg warnings claiming that DC6 was already > >> enabled prior to our enabling. Investigations using readback of > >> the written dc state value indicate that sometimes when we disable > >> the dc6, the write doesn't stick, or it does but for a very short time. > >> > >> This leads to state keeping confusion between firmware and driver, > >> driver thinking that dc6 is off and proceeds with modesets > >> while firmware still keeps/thinks dc6 is on. Evidence from logs > >> suggests that the dc6 is still on as flips start to timeout, > >> leading to eventual gpu hang. > >> > >> Further complication is that to recover the hang, we reset while > >> the DC6 is enabled. With this state on the reinit of the gpu side > >> won't come out clean and rings rehang right after the init. > >> > >> Harden the detection of this situation and immediately disable > >> DC6 if we disagree on state with the firmware. This should make it > >> less probable for driver to do end up in a wrong conclusion and > >> let things like modeset and gpu reset proceed with dc6 still > >> enabled. > >> > >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 > >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> > >> Cc: Imre Deak <imre.deak@intel.com> > >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > >> --- > >> drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++---- > >> 1 file changed, 27 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c > >> index bbca527184d0..5a21453e38e1 100644 > >> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c > >> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c > >> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up( > >> } > >> } > >> > >> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv, > >> + u32 mask, u32 state) > >> +{ > >> + int i; > >> + > >> + /* Observe the dc state with handful of reads */ > >> + for (i = 0; i < 3; i++) { > >> + if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100)) > >> + return false; > >> + } > > > > Why two loops of reads? > > > > I once saw the write stick but for a very short time, so with one wait_for we > might miss it. > > > Since this is equivalent to > > return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0; > > > > In this case > > > > WRITE(DC_STATE_EN, val & ~mask | state); > > if (wait_for((READ(DC_STATE_END) & mask) = state, 300) { > > /* oh noes */ > > } > > > > is more obvious (since we verify with a read of the same register, not > > some magic ack sequence). > > I could open code the 2 checks here, would be more obvious yes. > > Btw what led you to think that the state doesn't stick? Looking > at the code for correctness and then deducting that if that warn > happens, there is no other explanation? Yes. It has also been used in the past as a form of acknowledging state changes, though usually it is a separate bit (i.e. a request bit and a current state bit). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] drm/i915: Use init power domain during reset 2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala 2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala @ 2016-01-19 19:50 ` Mika Kuoppala 2016-01-19 20:05 ` Ville Syrjälä 2016-01-20 10:52 ` Daniel Vetter 2016-01-20 8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork 2 siblings, 2 replies; 10+ messages in thread From: Mika Kuoppala @ 2016-01-19 19:50 UTC (permalink / raw) To: intel-gfx; +Cc: miku If we have driver failure in our power well and/or dc state keeping, we might try to reset without powers. Evidence shows that resetting the chip with dc6 enabled don't lead to desired results. The dmc kept it's enabled dc6 state over the reset. On subsequent init the rings just hanged right from the start so they didn't reset properly or that the dmc interfered with the init. Harden the reset by setting power domains to be in init mode during reset. This causes the hw state to be flushed so dc6 will be forcibly disabled also. References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 25a89373df63..78e242b5c357 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev) */ intel_runtime_pm_get(dev_priv); + /* Even if we hold the pm ref, we still might have inconsistent + * power states due to driver failure. Trying to reset without + * powers or with wrong dmc firmware state is futile. Flush + * our power well and dc states ensuring that we reset with + * powers enabled. + */ + intel_display_set_init_power(dev_priv, true); + intel_prepare_reset(dev); /* @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev) intel_finish_reset(dev); + intel_display_set_init_power(dev_priv, false); + intel_runtime_pm_put(dev_priv); if (ret == 0) { -- 2.5.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use init power domain during reset 2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala @ 2016-01-19 20:05 ` Ville Syrjälä 2016-01-20 9:53 ` Mika Kuoppala 2016-01-20 10:52 ` Daniel Vetter 1 sibling, 1 reply; 10+ messages in thread From: Ville Syrjälä @ 2016-01-19 20:05 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx, miku On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote: > If we have driver failure in our power well and/or dc > state keeping, we might try to reset without powers. > > Evidence shows that resetting the chip with dc6 enabled > don't lead to desired results. The dmc kept it's enabled > dc6 state over the reset. On subsequent init the rings > just hanged right from the start so they didn't reset > properly or that the dmc interfered with the init. > > Harden the reset by setting power domains to be in > init mode during reset. This causes the hw state to be > flushed so dc6 will be forcibly disabled also. Hmm. Aren't we holding a forcewake around the reset already? Also with the GT hung it shouldn't really be dropping into RC6 anyway, which should keep the thing in pc2 (IIRC) or higher. I guess if we do a simulated hang it might be in rc6, but the forcewake should be enough protection there. Deep package C-state is the only link I could think of between the GT and DMC. Otherwise I can't think of any real link between the two. So quite baffled by this. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> > Cc: Imre Deak <imre.deak@intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 25a89373df63..78e242b5c357 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev) > */ > intel_runtime_pm_get(dev_priv); > > + /* Even if we hold the pm ref, we still might have inconsistent > + * power states due to driver failure. Trying to reset without > + * powers or with wrong dmc firmware state is futile. Flush > + * our power well and dc states ensuring that we reset with > + * powers enabled. > + */ > + intel_display_set_init_power(dev_priv, true); > + > intel_prepare_reset(dev); > > /* > @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev) > > intel_finish_reset(dev); > > + intel_display_set_init_power(dev_priv, false); > + > intel_runtime_pm_put(dev_priv); > > if (ret == 0) { > -- > 2.5.0 -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use init power domain during reset 2016-01-19 20:05 ` Ville Syrjälä @ 2016-01-20 9:53 ` Mika Kuoppala 0 siblings, 0 replies; 10+ messages in thread From: Mika Kuoppala @ 2016-01-20 9:53 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx, miku Ville Syrjälä <ville.syrjala@linux.intel.com> writes: > On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote: >> If we have driver failure in our power well and/or dc >> state keeping, we might try to reset without powers. >> >> Evidence shows that resetting the chip with dc6 enabled >> don't lead to desired results. The dmc kept it's enabled >> dc6 state over the reset. On subsequent init the rings >> just hanged right from the start so they didn't reset >> properly or that the dmc interfered with the init. >> >> Harden the reset by setting power domains to be in >> init mode during reset. This causes the hw state to be >> flushed so dc6 will be forcibly disabled also. > > Hmm. Aren't we holding a forcewake around the reset already? Also with > the GT hung it shouldn't really be dropping into RC6 anyway, which > should keep the thing in pc2 (IIRC) or higher. I guess if we do a > simulated hang it might be in rc6, but the forcewake should be enough > protection there. > We are holding forcewake and pm ref. So this is paranoia. But as the dmc don't obey us leading to state mismatch, I thought this is the way to force (throught the hw sync) the dm state as disabled before we hit the reset button. Another way would be to introduce intel_power_domains_reset(). Looks like _*suspend() does too much for reset. > Deep package C-state is the only link I could think of between the GT > and DMC. Otherwise I can't think of any real link between the two. So > quite baffled by this. > Me too. This is all ducttape until the root cause is known. -Mika >> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> >> Cc: Imre Deak <imre.deak@intel.com> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> >> --- >> drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c >> index 25a89373df63..78e242b5c357 100644 >> --- a/drivers/gpu/drm/i915/i915_irq.c >> +++ b/drivers/gpu/drm/i915/i915_irq.c >> @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev) >> */ >> intel_runtime_pm_get(dev_priv); >> >> + /* Even if we hold the pm ref, we still might have inconsistent >> + * power states due to driver failure. Trying to reset without >> + * powers or with wrong dmc firmware state is futile. Flush >> + * our power well and dc states ensuring that we reset with >> + * powers enabled. >> + */ >> + intel_display_set_init_power(dev_priv, true); >> + >> intel_prepare_reset(dev); >> >> /* >> @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev) >> >> intel_finish_reset(dev); >> >> + intel_display_set_init_power(dev_priv, false); >> + >> intel_runtime_pm_put(dev_priv); >> >> if (ret == 0) { >> -- >> 2.5.0 > > -- > Ville Syrjälä > Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use init power domain during reset 2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala 2016-01-19 20:05 ` Ville Syrjälä @ 2016-01-20 10:52 ` Daniel Vetter 1 sibling, 0 replies; 10+ messages in thread From: Daniel Vetter @ 2016-01-20 10:52 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx, miku On Tue, Jan 19, 2016 at 09:50:09PM +0200, Mika Kuoppala wrote: > If we have driver failure in our power well and/or dc > state keeping, we might try to reset without powers. > > Evidence shows that resetting the chip with dc6 enabled > don't lead to desired results. The dmc kept it's enabled > dc6 state over the reset. On subsequent init the rings > just hanged right from the start so they didn't reset > properly or that the dmc interfered with the init. > > Harden the reset by setting power domains to be in > init mode during reset. This causes the hw state to be > flushed so dc6 will be forcibly disabled also. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> > Cc: Imre Deak <imre.deak@intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/i915_irq.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 25a89373df63..78e242b5c357 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2510,6 +2510,14 @@ static void i915_reset_and_wakeup(struct drm_device *dev) > */ > intel_runtime_pm_get(dev_priv); > > + /* Even if we hold the pm ref, we still might have inconsistent > + * power states due to driver failure. Trying to reset without > + * powers or with wrong dmc firmware state is futile. Flush > + * our power well and dc states ensuring that we reset with > + * powers enabled. > + */ > + intel_display_set_init_power(dev_priv, true); > + > intel_prepare_reset(dev); Since this seems just dc6 related there's good reason to believe it's the display mmio writes in intel_prepare/finish_reset(). Can you just grab relevant power wells for the register we write in there instead of grabbing everything at the top level here? Asking for that since with atomic this shouldn't happen any more ... -Daniel > > /* > @@ -2522,6 +2530,8 @@ static void i915_reset_and_wakeup(struct drm_device *dev) > > intel_finish_reset(dev); > > + intel_display_set_init_power(dev_priv, false); > + > intel_runtime_pm_put(dev_priv); > > if (ret == 0) { > -- > 2.5.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* ✗ Fi.CI.BAT: warning for DMC/DC state hardening 2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala 2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala 2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala @ 2016-01-20 8:49 ` Patchwork 2 siblings, 0 replies; 10+ messages in thread From: Patchwork @ 2016-01-20 8:49 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx == Summary == Built on 7d26528d30b0d8119c858115b6e5e22deb74ec71 drm-intel-nightly: 2016y-01m-19d-19h-38m-52s UTC integration manifest Test kms_pipe_crc_basic: Subgroup read-crc-pipe-b: dmesg-warn -> PASS (ilk-hp8440p) Subgroup read-crc-pipe-b-frame-sequence: pass -> DMESG-WARN (bdw-ultra) bdw-nuci7 total:140 pass:131 dwarn:0 dfail:0 fail:0 skip:9 bdw-ultra total:140 pass:131 dwarn:2 dfail:1 fail:0 skip:6 byt-nuc total:143 pass:125 dwarn:3 dfail:0 fail:0 skip:15 hsw-brixbox total:143 pass:136 dwarn:0 dfail:0 fail:0 skip:7 hsw-gt2 total:143 pass:139 dwarn:0 dfail:0 fail:0 skip:4 ilk-hp8440p total:143 pass:102 dwarn:3 dfail:0 fail:0 skip:38 ivb-t430s total:137 pass:124 dwarn:3 dfail:4 fail:0 skip:6 skl-i5k-2 total:143 pass:134 dwarn:1 dfail:0 fail:0 skip:8 skl-i7k-2 total:143 pass:134 dwarn:1 dfail:0 fail:0 skip:8 snb-dellxps total:143 pass:124 dwarn:5 dfail:0 fail:0 skip:14 snb-x220t total:143 pass:124 dwarn:5 dfail:0 fail:1 skip:13 Results at /archive/results/CI_IGT_test/Patchwork_1227/ _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-01-20 10:52 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-19 19:50 [PATCH 0/2] DMC/DC state hardening Mika Kuoppala 2016-01-19 19:50 ` [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state Mika Kuoppala 2016-01-20 9:36 ` Chris Wilson 2016-01-20 9:49 ` Mika Kuoppala 2016-01-20 10:17 ` Chris Wilson 2016-01-19 19:50 ` [PATCH 2/2] drm/i915: Use init power domain during reset Mika Kuoppala 2016-01-19 20:05 ` Ville Syrjälä 2016-01-20 9:53 ` Mika Kuoppala 2016-01-20 10:52 ` Daniel Vetter 2016-01-20 8:49 ` ✗ Fi.CI.BAT: warning for DMC/DC state hardening Patchwork
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.