From: Matthew Auld <matthew.william.auld@gmail.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <Intel-gfx@lists.freedesktop.org>,
	ML dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 5/6] drm/i915: Fail too long user submissions by default
Date: Tue, 23 Mar 2021 15:56:21 +0000
Message-ID: <CAM0jSHOXtwEukFb0ugS8r2_wJFKxb-XHunzvmMFGvvWWp20KwA@mail.gmail.com>
In-Reply-To: <20210318170419.2107512-6-tvrtko.ursulin@linux.intel.com>

On Thu, 18 Mar 2021 at 17:04, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
> to 20s, and this timeout is applied to all users contexts using the
> previously added watchdog facility.
>
> Result of this is that any user submission will simply fail after this
> timeout, either causing a reset (for non-preemptable), or incomplete
> results.
>
> This can have an effect that workloads which used to work fine will
> suddenly start failing. Even workloads comprised of short batches but in
> long dependency chains can be terminated.
>
> And becuase of lack of agreement on usefulness and safety of fence error

   because

> propagation this partial execution can be invisible to userspace even if
> it is "listening" to returned fence status.
>
> Another interaction is with hangcheck where care needs to be taken timeout
> is not set lower or close to three times the heartbeat interval. Otherwise
> a hang in any application can cause complete termination of all
> submissions from unrelated clients. Any users modifying the per engine
> heartbeat intervals therefore need to be aware of this potential denial of
> service to avoid inadvertently enabling it.
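
For anyone tuning heartbeat intervals, the constraint described above can be
sketched as a small check. This is only an illustration, not code from the
patch: request_timeout_is_safe() and the margin_ms parameter are hypothetical,
and since the commit message does not quantify "close to", the margin is an
assumption the caller has to pick.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of i915.
 *
 * Returns true when the request timeout sits above three times the
 * heartbeat interval plus a caller-chosen safety margin. The factor of
 * three comes from the commit message; margin_ms is an assumed stand-in
 * for "close to", which the commit message leaves unspecified.
 */
static bool request_timeout_is_safe(unsigned int request_timeout_ms,
				    unsigned int heartbeat_interval_ms,
				    unsigned int margin_ms)
{
	/* Widen before multiplying so large intervals cannot overflow. */
	unsigned long long floor_ms =
		3ULL * heartbeat_interval_ms + margin_ms;

	return request_timeout_ms > floor_ms;
}
```

With the 20s default from this patch, a heartbeat interval of 2500 ms and a
2500 ms margin (both illustrative values) give a floor of 10000 ms, so the
check passes; a 10000 ms heartbeat interval would fail it.
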
>
> Given all this I am personally not convinced the scheme is a good idea.
> Intuitively it feels object importers would be better positioned to
> enforce the time they are willing to wait for something to complete.
>
> v2:
>  * Improved commit message and Kconfig text.
>  * Pull in some helper code from patch which got dropped.
>
> v3:
>  * Bump timeout to 20s to see if it helps Tigerlake.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Thread overview: 44+ messages
2021-03-18 17:04 [PATCH v3 0/6] Default request/fence expiry + watchdog Tvrtko Ursulin
2021-03-18 17:04 ` [Intel-gfx] " Tvrtko Ursulin
2021-03-18 17:04 ` [PATCH 1/6] drm/i915: Individual request cancellation Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-22 15:38   ` Matthew Auld
2021-03-22 15:38     ` Matthew Auld
2021-03-23  9:48     ` Tvrtko Ursulin
2021-03-23  9:48       ` Tvrtko Ursulin
2021-03-18 17:04 ` [PATCH 2/6] drm/i915: Restrict sentinel requests further Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-22 17:12   ` Matthew Auld
2021-03-22 17:12     ` Matthew Auld
2021-03-23  9:09     ` Tvrtko Ursulin
2021-03-23  9:09       ` Tvrtko Ursulin
2021-03-18 17:04 ` [PATCH 3/6] drm/i915: Handle async cancellation in sentinel assert Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-23 10:09   ` Matthew Auld
2021-03-23 10:09     ` Matthew Auld
2021-03-18 17:04 ` [PATCH 4/6] drm/i915: Request watchdog infrastructure Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-22 13:29   ` [PATCH v3 " Tvrtko Ursulin
2021-03-22 13:29     ` [Intel-gfx] " Tvrtko Ursulin
2021-03-23 10:54     ` Matthew Auld
2021-03-23 10:54       ` Matthew Auld
2021-03-23 11:09       ` Tvrtko Ursulin
2021-03-23 11:09         ` Tvrtko Ursulin
2021-03-23 11:40         ` Matthew Auld
2021-03-23 11:40           ` Matthew Auld
2021-03-18 17:04 ` [PATCH 5/6] drm/i915: Fail too long user submissions by default Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-23 15:56   ` Matthew Auld [this message]
2021-03-23 15:56     ` Matthew Auld
2021-03-18 17:04 ` [PATCH 6/6] drm/i915: Allow configuring default request expiry via modparam Tvrtko Ursulin
2021-03-18 17:04   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-18 19:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Default request/fence expiry + watchdog (rev3) Patchwork
2021-03-18 19:36 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-03-19  1:17 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-03-22 13:37   ` Tvrtko Ursulin
2021-03-22 13:37     ` [Intel-gfx] " Tvrtko Ursulin
2021-03-22 13:41     ` Daniel Vetter
2021-03-22 13:41       ` [Intel-gfx] " Daniel Vetter
2021-03-22 14:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Default request/fence expiry + watchdog (rev4) Patchwork
2021-03-22 14:33 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-03-16 16:23 [PATCH 0/6] Default request/fence expiry + watchdog Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 5/6] drm/i915: Fail too long user submissions by default Tvrtko Ursulin
