* [PATCH] drm/i915: Fix random aux transactions failures.
@ 2015-12-07 10:57 Rodrigo Vivi
  2015-12-07 12:04 ` Rodrigo Vivi
  0 siblings, 1 reply; 3+ messages in thread
From: Rodrigo Vivi @ 2015-12-07 10:57 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula, Daniel Vetter, Rodrigo Vivi

AUX communications, mainly the sink_crc reads, were failing randomly
and frequently on recent platforms.
The first solution was to use intel_dp_dpcd_read_wake, but then
it was suggested to move the retries to the drm level.

Since the drm level was already taking care of retries, and we didn't
want to throw random retries in at that level, the second solution was
to put the retries in the aux_transfer layer, which was nacked.

So I realized we had many retries in different places and
started to organize them a bit. During this reorganization I noticed
that we weren't handling at all the case where the message size was
zero. And this was exactly the case that was affecting sink_crc.

We also weren't respecting BSpec, which says that message sizes of 0
or > 20 are forbidden.

We still have no clue why we are getting this forbidden value there.
But we need to handle it for now, so we return -EBUSY and the drm
level takes care of the retries that are already in place.
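
For illustration only, here is a minimal userspace sketch of the check
described above. It is not the driver code: the macro values, the
function names and the array of simulated status reads are assumptions
made for this example, and the retry loop only stands in for whatever
the drm level does when it sees -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed field layout for this sketch; the driver uses its own
 * DP_AUX_CH_CTL_MESSAGE_SIZE_MASK/_SHIFT definitions. */
#define AUX_MSG_SIZE_SHIFT 20
#define AUX_MSG_SIZE_MASK  (0x1fu << AUX_MSG_SIZE_SHIFT)
#define AUX_MSG_SIZE_MAX   20

/* Extract the received byte count from a status word and reject the
 * sizes that BSpec forbids (0 or > 20) with -EBUSY. */
int aux_recv_size(uint32_t status)
{
	unsigned int recv_bytes =
		(status & AUX_MSG_SIZE_MASK) >> AUX_MSG_SIZE_SHIFT;

	if (recv_bytes == 0 || recv_bytes > AUX_MSG_SIZE_MAX)
		return -EBUSY;	/* the caller is expected to retry */

	return (int)recv_bytes;
}

/* Hypothetical caller-side retry: each array entry plays the role of
 * the status read back after one more transfer attempt. */
int aux_retry_until_valid(const uint32_t *statuses, int tries)
{
	int ret = -EBUSY;

	assert(tries > 0);

	for (int i = 0; i < tries; i++) {
		ret = aux_recv_size(statuses[i]);
		if (ret != -EBUSY)
			break;
		/* a real driver would wait a little before retrying */
	}

	return ret;
}
```

With the simulated sequence { 0, 0, 4 << 20 }, two attempts still end
in -EBUSY while a third one yields a 4-byte reply, which is the
behaviour this patch relies on the existing retries to provide.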

v2: Print a debug message when this case is reached, as suggested
    by Jani.
v3: This patch is crucial to make the PSR test cases work reliably
    on SKL. So split this patch from the aux re-org series and add
    a FIXME as a promise to continue that effort, and as a reminder
    to remove the sleep when that is merged.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/intel_dp.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f335c92..2898146 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -915,6 +915,27 @@ done:
 	/* Unload any bytes sent back from the other side */
 	recv_bytes = ((status & DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
 		      DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
+
+	/*
+	 * By BSpec: "Message sizes of 0 or >20 are not allowed."
+	 * We have no idea of what happened so we return -EBUSY so
+	 * drm layer takes care for the necessary retries.
+	 */
+	if (recv_bytes == 0 || recv_bytes > 20) {
+		DRM_DEBUG_KMS("Forbidden recv_bytes = %d on aux transaction\n",
+			      recv_bytes);
+		/*
+		 * FIXME: This patch was created on top of a series that
+		 * organize the retries at drm level. There EBUSY should
+		 * also take care for 1ms wait before retrying.
+		 * That aux retries re-org is still needed and after that is
+		 * merged we remove this sleep from here.
+		 */
+		usleep_range(1000,1000);
+		ret = -EBUSY;
+		goto out;
+	}
+
 	if (recv_bytes > recv_size)
 		recv_bytes = recv_size;
 
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH] drm/i915: Fix random aux transactions failures.
  2015-12-07 10:57 [PATCH] drm/i915: Fix random aux transactions failures Rodrigo Vivi
@ 2015-12-07 12:04 ` Rodrigo Vivi
  2015-12-09 18:53   ` Paulo Zanoni
  0 siblings, 1 reply; 3+ messages in thread
From: Rodrigo Vivi @ 2015-12-07 12:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula, Daniel Vetter, Rodrigo Vivi

AUX communications, mainly the sink_crc reads, were failing randomly
and frequently on recent platforms.
The first solution was to use intel_dp_dpcd_read_wake, but then
it was suggested to move the retries to the drm level.

Since the drm level was already taking care of retries, and we didn't
want to throw random retries in at that level, the second solution was
to put the retries in the aux_transfer layer, which was nacked.

So I realized we had many retries in different places and
started to organize them a bit. During this reorganization I noticed
that we weren't handling at all the case where the message size was
zero. And this was exactly the case that was affecting sink_crc.

We also weren't respecting BSpec, which says that message sizes of 0
or > 20 are forbidden.

We still have no clue why we are getting this forbidden value there.
But we need to handle it for now, so we return -EBUSY and the drm
level takes care of the retries that are already in place.

v2: Print a debug message when this case is reached, as suggested
    by Jani.
v3: This patch is crucial to make the PSR test cases work reliably
    on SKL. So split this patch from the aux re-org series and add
    a FIXME as a promise to continue that effort, and as a reminder
    to remove the sleep when that is merged.
v4: Use a bigger usleep range so the kernel doesn't need to be
    interrupted at an exact time, as suggested by Paulo.
    We should still discuss better time ranges in the EBUSY handling
    re-org at the drm level, since the one here is temporary.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/intel_dp.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f335c92..0d5fe80 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -915,6 +915,27 @@ done:
 	/* Unload any bytes sent back from the other side */
 	recv_bytes = ((status & DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
 		      DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
+
+	/*
+	 * By BSpec: "Message sizes of 0 or >20 are not allowed."
+	 * We have no idea of what happened so we return -EBUSY so
+	 * drm layer takes care for the necessary retries.
+	 */
+	if (recv_bytes == 0 || recv_bytes > 20) {
+		DRM_DEBUG_KMS("Forbidden recv_bytes = %d on aux transaction\n",
+			      recv_bytes);
+		/*
+		 * FIXME: This patch was created on top of a series that
+		 * organize the retries at drm level. There EBUSY should
+		 * also take care for 1ms wait before retrying.
+		 * That aux retries re-org is still needed and after that is
+		 * merged we remove this sleep from here.
+		 */
+		usleep_range(1000,1500);
+		ret = -EBUSY;
+		goto out;
+	}
+
 	if (recv_bytes > recv_size)
 		recv_bytes = recv_size;
 
-- 
2.4.3


* Re: [PATCH] drm/i915: Fix random aux transactions failures.
  2015-12-07 12:04 ` Rodrigo Vivi
@ 2015-12-09 18:53   ` Paulo Zanoni
  0 siblings, 0 replies; 3+ messages in thread
From: Paulo Zanoni @ 2015-12-09 18:53 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Jani Nikula, Daniel Vetter, Intel Graphics Development

2015-12-07 10:04 GMT-02:00 Rodrigo Vivi <rodrigo.vivi@intel.com>:
> AUX communications, mainly the sink_crc reads, were failing randomly
> and frequently on recent platforms.
> The first solution was to use intel_dp_dpcd_read_wake, but then
> it was suggested to move the retries to the drm level.
>
> Since the drm level was already taking care of retries, and we didn't
> want to throw random retries in at that level, the second solution was
> to put the retries in the aux_transfer layer, which was nacked.
>
> So I realized we had many retries in different places and
> started to organize them a bit. During this reorganization I noticed
> that we weren't handling at all the case where the message size was
> zero. And this was exactly the case that was affecting sink_crc.
>
> We also weren't respecting BSpec, which says that message sizes of 0
> or > 20 are forbidden.
>
> We still have no clue why we are getting this forbidden value there.
> But we need to handle it for now, so we return -EBUSY and the drm
> level takes care of the retries that are already in place.
>
> v2: Print a debug message when this case is reached, as suggested
>     by Jani.
> v3: This patch is crucial to make the PSR test cases work reliably
>     on SKL. So split this patch from the aux re-org series and add
>     a FIXME as a promise to continue that effort, and as a reminder
>     to remove the sleep when that is merged.
> v4: Use a bigger usleep range so the kernel doesn't need to be
>     interrupted at an exact time, as suggested by Paulo.
>     We should still discuss better time ranges in the EBUSY handling
>     re-org at the drm level, since the one here is temporary.
>
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Tested-by: Daniel Stone <daniels@collabora.com> # SKL
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index f335c92..0d5fe80 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -915,6 +915,27 @@ done:
>         /* Unload any bytes sent back from the other side */
>         recv_bytes = ((status & DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
>                       DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
> +
> +       /*
> +        * By BSpec: "Message sizes of 0 or >20 are not allowed."
> +        * We have no idea of what happened so we return -EBUSY so
> +        * drm layer takes care for the necessary retries.
> +        */
> +       if (recv_bytes == 0 || recv_bytes > 20) {
> +               DRM_DEBUG_KMS("Forbidden recv_bytes = %d on aux transaction\n",
> +                             recv_bytes);
> +               /*
> +                * FIXME: This patch was created on top of a series that
> +                * organize the retries at drm level. There EBUSY should
> +                * also take care for 1ms wait before retrying.
> +                * That aux retries re-org is still needed and after that is
> +                * merged we remove this sleep from here.
> +                */
> +               usleep_range(1000,1500);

s/1000,1500/1000, 1500/

Although I'm not so sure why 1ms. Shouldn't we stick to the spec
default of 400 or Ville's default of 500?

Anyway, having this seems better than not having this, and the
possible problems are highlighted by the FIXME so, with or without
changes:
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>

> +               ret = -EBUSY;
> +               goto out;
> +       }
> +
>         if (recv_bytes > recv_size)
>                 recv_bytes = recv_size;
>
> --
> 2.4.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Paulo Zanoni
