From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	dri-devel@lists.freedesktop.org,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: [RFC 0/6] Default request/fence expiry + watchdog
Date: Fri, 12 Mar 2021 15:46:16 +0000
Message-ID: <20210312154622.1767865-1-tvrtko.ursulin@linux.intel.com>

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

"Watchdog" aka "restoring hangcheck" aka default request/fence expiry. This is
the first post of a somewhat controversial feature, so the commit messages,
commentary and implementation may be rough. Hence only RFC for now.

I put "watchdog" in quotes because in the classical sense a watchdog would
allow userspace to ping it and so remain alive.

I put "restoring hangcheck" in quotes because this series, contrary to the old
hangcheck, is not looking at whether the workload is making any progress from
the kernel side either. (Disclaimer: my memory may be leaky. Daniel suspects
the old hangcheck had some stricter, more indiscriminate, angles to it, but
apart from it being prone to both false negatives and false positives I can't
remember that myself.)

Short version - the ask is to fail any user submission after a set time period.
In this RFC that period is ten seconds.

Time counts from the moment a user submission becomes "runnable" (implicit and
explicit dependencies have been cleared) and keeps counting regardless of GPU
contention caused by other users of the system. So the semantics are really a
bit weak but, again, I understand this is really wanted by the DRM core.
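
To make the counting rule concrete, here is a minimal sketch in plain C. It is
not code from the series: struct model_request and model_request_expired() are
hypothetical names, times are plain milliseconds, and treating a zero budget as
"no limit" is my assumption for the illustration. The only point is that the
budget is measured from the runnable moment, so time spent queued behind other
clients is charged to the request as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a user submission; not an i915 structure. */
struct model_request {
	uint64_t runnable_at_ms; /* when all dependencies were cleared */
	uint64_t expiry_ms;      /* time budget; 0 is assumed to mean "no limit" */
};

/*
 * Returns true once the budget, counted from the runnable moment, is used up.
 * Whether the request actually executed or merely waited is not considered.
 */
bool model_request_expired(const struct model_request *rq, uint64_t now_ms)
{
	if (rq->expiry_ms == 0)
		return false;

	/* Not runnable yet from the point of view of this clock reading. */
	if (now_ms < rq->runnable_at_ms)
		return false;

	return now_ms - rq->runnable_at_ms >= rq->expiry_ms;
}
```

With the ten second value used in this RFC, a request which became runnable at
t=1000ms would be failed at t=11000ms even if it spent most of that time waiting
for the engine.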

As an attempt to compensate for this brutish nature, I proposed adding
extensible configurability via a context param as part of the series. That could
allow userspace to pick different semantics (always going more restrictive than
the system default) and so implement interesting things like the long-desired
media watchdog. Modulo trickiness of the implementation there.
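
The "always more restrictive than the system default" rule can be sketched as a
simple clamp. Again this is only an illustration and not the uAPI from the
series: model_effective_expiry_ms() is a hypothetical helper, and the handling
of zero (context value not set, or system default disabled) is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the expiry applied to a context.
 * Assumptions: 0 as requested_ms means userspace did not set anything,
 * and 0 as system_default_ms means the default expiry is disabled.
 */
uint64_t model_effective_expiry_ms(uint64_t system_default_ms,
				   uint64_t requested_ms)
{
	if (requested_ms == 0)
		return system_default_ms;

	if (system_default_ms == 0)
		return requested_ms;

	/* Userspace may only tighten the limit, never relax it. */
	return requested_ms < system_default_ms ? requested_ms
						 : system_default_ms;
}
```

Under these assumptions a media client asking for 2000ms against a 10000ms
default gets 2000ms, while a client asking for 60000ms is held to 10000ms.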

Test-with: 20210312093329.1639502-1-tvrtko.ursulin@linux.intel.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Chris Wilson (1):
  drm/i915: Individual request cancellation

Tvrtko Ursulin (5):
  drm/i915: Restrict sentinel requests further
  drm/i915: Request watchdog infrastructure
  drm/i915: Allow userspace to configure the watchdog
  drm/i915: Fail too long user submissions by default
  drm/i915: Allow configuring default request expiry via modparam

 drivers/gpu/drm/i915/Kconfig.profile          |   8 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  92 ++++++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   4 +
 drivers/gpu/drm/i915/gt/intel_context_param.h |  11 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 .../drm/i915/gt/intel_execlists_submission.c  |  11 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  21 ++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   7 +
 drivers/gpu/drm/i915/i915_params.c            |   5 +
 drivers/gpu/drm/i915/i915_params.h            |   1 +
 drivers/gpu/drm/i915/i915_request.c           | 129 +++++++-
 drivers/gpu/drm/i915/i915_request.h           |  12 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 275 ++++++++++++++++++
 include/uapi/drm/i915_drm.h                   |   5 +-
 18 files changed, 584 insertions(+), 9 deletions(-)

-- 
2.27.0
