All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Paul <sean@poorly.run>
To: Lyude Paul <lyude@redhat.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>,
	stable <stable@vger.kernel.org>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Todd Previte <tprevite@gmail.com>,
	Dave Airlie <airlied@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s
Date: Mon, 6 Apr 2020 15:48:34 -0400	[thread overview]
Message-ID: <CAMavQKJdh22Xa82W19UuQ+6P-XYgK-f+VV9maTFO7kE0Zs+hwg@mail.gmail.com> (raw)
In-Reply-To: <3eccd492237ee8797a8af2ea757594bc13ae055f.camel@redhat.com>

On Mon, Apr 6, 2020 at 3:43 PM Lyude Paul <lyude@redhat.com> wrote:
>
> On Mon, 2020-04-06 at 15:41 -0400, Sean Paul wrote:
> > On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <lyude@redhat.com> wrote:
> > > Currently we only poll for an ACT up to 30 times, with a busy-wait delay
> > > of 100µs between each attempt - giving us a timeout of 2900µs. While
> > > this might seem sensible, it would appear that in certain scenarios it
> > > can take dramatically longer then that for us to receive an ACT. On one
> > > of the EVGA MST hubs that I have available, I observed said hub
> > > sometimes taking longer then a second before signalling the ACT. These
> > > delays mostly seem to occur when previous sideband messages we've sent
> > > are NAKd by the hub, however it wouldn't be particularly surprising if
> > > it's possible to reproduce times like this simply by introducing branch
> > > devices with large LCTs since payload allocations have to take effect on
> > > every downstream device up to the payload's target.
> > >
> > > So, instead of just retrying 30 times we poll for the ACT for up to 3ms,
> > > and additionally use usleep_range() to avoid a very long and rude
> > > busy-wait. Note that the previous retry count of 30 appears to have been
> > > arbitrarily chosen, as I can't find any mention of a recommended timeout
> > > or retry count for ACTs in the DisplayPort 2.0 specification. This also
> > > goes for the range we were previously using for udelay(), although I
> > > suspect that was just copied from the recommended delay for link
> > > training on SST devices.
> > >
> > > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper
> > > (v0.6)")
> > > Cc: Sean Paul <sean@poorly.run>
> > > Cc: <stable@vger.kernel.org> # v3.17+
> > > ---
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
> > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 7aaf184a2e5f..f313407374ed 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >   * @mgr: manager to use
> > >   *
> > >   * Tries waiting for the MST hub to finish updating it's payload table by
> > > - * polling for the ACT handled bit.
> > > + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs
> > > really
> > > + * take that long).
> > >   *
> > >   * Returns:
> > >   * 0 if the ACT was handled in time, negative error code on failure.
> > >   */
> > >  int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > >  {
> > > -       int count = 0, ret;
> > > +       /*
> > > +        * There doesn't seem to be any recommended retry count or timeout
> > > in
> > > +        * the MST specification. Since some hubs have been observed to
> > > take
> > > +        * over 1 second to update their payload allocations under certain
> > > +        * conditions, we use a rather large timeout value.
> > > +        */
> > > +       const int timeout_ms = 3000;
> > > +      unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > > +       int ret;
> > > +       bool retrying = false;
> > >         u8 status;
> > >
> > >         do {
> > > +               if (retrying)
> > > +                       usleep_range(100, 1000);
> > > +
> > >                 ret = drm_dp_dpcd_readb(mgr->aux,
> > >                                         DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > >                                         &status);
> > > @@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >
> > >                 if (status & DP_PAYLOAD_ACT_HANDLED)
> > >                         break;
> > > -               count++;
> > > -               udelay(100);
> > > -       } while (count < 30);
> > > +               retrying = true;
> > > +       } while (jiffies < timeout);
> >
> > Somewhat academic, but I think there's an overflow possibility here if
> > timeout is near ulong_max and jiffies overflows during the usleep. In
> > that case we'll be retrying for a very loong time.
> >
> > I wish we had i915's wait_for() macro available to all drm...
>
> Maybe we could add it to the kernel library somewhere? I don't see why we'd
> need to stop at DRM

So You Want To Build A Bikeshed...

Seriously though, I'd be very happy with that. Alternatively you could
shoehorn this into readx_poll_timeout as well.

Sean

>
> >
> > Sean
> >
> > >         if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > > -               DRM_DEBUG_KMS("failed to get ACT bit %d after %d
> > > retries\n",
> > > -                             status, count);
> > > +               DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> > > +                             status, timeout_ms);
> > >                 return -EINVAL;
> > >         }
> > >         return 0;
> > > --
> > > 2.25.1
> > >
> --
> Cheers,
>         Lyude Paul (she/her)
>         Associate Software Engineer at Red Hat
>

WARNING: multiple messages have this Message-ID (diff)
From: Sean Paul <sean@poorly.run>
To: Lyude Paul <lyude@redhat.com>
Cc: Todd Previte <tprevite@gmail.com>,
	David Airlie <airlied@linux.ie>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Dave Airlie <airlied@redhat.com>
Subject: Re: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s
Date: Mon, 6 Apr 2020 15:48:34 -0400	[thread overview]
Message-ID: <CAMavQKJdh22Xa82W19UuQ+6P-XYgK-f+VV9maTFO7kE0Zs+hwg@mail.gmail.com> (raw)
In-Reply-To: <3eccd492237ee8797a8af2ea757594bc13ae055f.camel@redhat.com>

On Mon, Apr 6, 2020 at 3:43 PM Lyude Paul <lyude@redhat.com> wrote:
>
> On Mon, 2020-04-06 at 15:41 -0400, Sean Paul wrote:
> > On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <lyude@redhat.com> wrote:
> > > Currently we only poll for an ACT up to 30 times, with a busy-wait delay
> > > of 100µs between each attempt - giving us a timeout of 2900µs. While
> > > this might seem sensible, it would appear that in certain scenarios it
> > > can take dramatically longer then that for us to receive an ACT. On one
> > > of the EVGA MST hubs that I have available, I observed said hub
> > > sometimes taking longer then a second before signalling the ACT. These
> > > delays mostly seem to occur when previous sideband messages we've sent
> > > are NAKd by the hub, however it wouldn't be particularly surprising if
> > > it's possible to reproduce times like this simply by introducing branch
> > > devices with large LCTs since payload allocations have to take effect on
> > > every downstream device up to the payload's target.
> > >
> > > So, instead of just retrying 30 times we poll for the ACT for up to 3ms,
> > > and additionally use usleep_range() to avoid a very long and rude
> > > busy-wait. Note that the previous retry count of 30 appears to have been
> > > arbitrarily chosen, as I can't find any mention of a recommended timeout
> > > or retry count for ACTs in the DisplayPort 2.0 specification. This also
> > > goes for the range we were previously using for udelay(), although I
> > > suspect that was just copied from the recommended delay for link
> > > training on SST devices.
> > >
> > > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper
> > > (v0.6)")
> > > Cc: Sean Paul <sean@poorly.run>
> > > Cc: <stable@vger.kernel.org> # v3.17+
> > > ---
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
> > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 7aaf184a2e5f..f313407374ed 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >   * @mgr: manager to use
> > >   *
> > >   * Tries waiting for the MST hub to finish updating it's payload table by
> > > - * polling for the ACT handled bit.
> > > + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs
> > > really
> > > + * take that long).
> > >   *
> > >   * Returns:
> > >   * 0 if the ACT was handled in time, negative error code on failure.
> > >   */
> > >  int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > >  {
> > > -       int count = 0, ret;
> > > +       /*
> > > +        * There doesn't seem to be any recommended retry count or timeout
> > > in
> > > +        * the MST specification. Since some hubs have been observed to
> > > take
> > > +        * over 1 second to update their payload allocations under certain
> > > +        * conditions, we use a rather large timeout value.
> > > +        */
> > > +       const int timeout_ms = 3000;
> > > +      unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > > +       int ret;
> > > +       bool retrying = false;
> > >         u8 status;
> > >
> > >         do {
> > > +               if (retrying)
> > > +                       usleep_range(100, 1000);
> > > +
> > >                 ret = drm_dp_dpcd_readb(mgr->aux,
> > >                                         DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > >                                         &status);
> > > @@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >
> > >                 if (status & DP_PAYLOAD_ACT_HANDLED)
> > >                         break;
> > > -               count++;
> > > -               udelay(100);
> > > -       } while (count < 30);
> > > +               retrying = true;
> > > +       } while (jiffies < timeout);
> >
> > Somewhat academic, but I think there's an overflow possibility here if
> > timeout is near ulong_max and jiffies overflows during the usleep. In
> > that case we'll be retrying for a very loong time.
> >
> > I wish we had i915's wait_for() macro available to all drm...
>
> Maybe we could add it to the kernel library somewhere? I don't see why we'd
> need to stop at DRM

So You Want To Build A Bikeshed...

Seriously though, I'd be very happy with that. Alternatively you could
shoehorn this into readx_poll_timeout as well.

Sean

>
> >
> > Sean
> >
> > >         if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > > -               DRM_DEBUG_KMS("failed to get ACT bit %d after %d
> > > retries\n",
> > > -                             status, count);
> > > +               DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> > > +                             status, timeout_ms);
> > >                 return -EINVAL;
> > >         }
> > >         return 0;
> > > --
> > > 2.25.1
> > >
> --
> Cheers,
>         Lyude Paul (she/her)
>         Associate Software Engineer at Red Hat
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-04-06 19:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03 20:07 [PATCH 0/4] drm/dp_mst: drm_dp_check_act_status() fixes Lyude Paul
2020-04-03 20:07 ` Lyude Paul
2020-04-03 20:07 ` [PATCH 1/4] drm/dp_mst: Improve kdocs for drm_dp_check_act_status() Lyude Paul
2020-04-03 20:07   ` Lyude Paul
2020-04-06 19:21   ` Sean Paul
2020-04-06 19:21     ` Sean Paul
2020-04-03 20:07 ` [PATCH 2/4] drm/dp_mst: Reformat drm_dp_check_act_status() a bit Lyude Paul
2020-04-03 20:07   ` Lyude Paul
2020-04-06 19:23   ` Sean Paul
2020-04-06 19:23     ` Sean Paul
2020-04-06 19:27     ` Lyude Paul
2020-04-06 19:27       ` Lyude Paul
2020-04-06 22:11     ` Lyude Paul
2020-04-06 22:11       ` Lyude Paul
2020-04-03 20:07 ` [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s Lyude Paul
2020-04-03 20:07   ` Lyude Paul
2020-04-06 19:41   ` Sean Paul
2020-04-06 19:41     ` Sean Paul
2020-04-06 19:43     ` Lyude Paul
2020-04-06 19:43       ` Lyude Paul
2020-04-06 19:48       ` Sean Paul [this message]
2020-04-06 19:48         ` Sean Paul
2020-04-03 20:07 ` [PATCH 4/4] drm/dp_mst: Print errors on ACT timeouts Lyude Paul
2020-04-03 20:07   ` Lyude Paul
2020-04-06 19:43   ` Sean Paul
2020-04-06 19:43     ` Sean Paul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMavQKJdh22Xa82W19UuQ+6P-XYgK-f+VV9maTFO7kE0Zs+hwg@mail.gmail.com \
    --to=sean@poorly.run \
    --cc=airlied@linux.ie \
    --cc=airlied@redhat.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tprevite@gmail.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.