dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: DRI Development <dri-devel@lists.freedesktop.org>
Cc: linux-rdma@vger.kernel.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	amd-gfx@lists.freedesktop.org,
	"Chris Wilson" <chris@chris-wilson.co.uk>,
	linaro-mm-sig@lists.linaro.org,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Christian König" <christian.koenig@amd.com>,
	linux-media@vger.kernel.org
Subject: [PATCH 22/25] drm/scheduler: use dma-fence annotations in tdr work
Date: Tue,  7 Jul 2020 22:12:26 +0200	[thread overview]
Message-ID: <20200707201229.472834-23-daniel.vetter@ffwll.ch> (raw)
In-Reply-To: <20200707201229.472834-1-daniel.vetter@ffwll.ch>

In the face of unpriviledged userspace being able to submit bogus gpu
workloads the kernel needs gpu timeout and reset (tdr) to guarantee
that dma_fences actually complete. Annotate this worker to make sure
we don't have any accidental locking inversions or other problems
lurking.

Originally this was part of the overall scheduler annotation patch.
But amdgpu has some glorious inversions here:

- grabs console_lock
- does a full modeset, which grabs all kinds of locks
  (drm_modeset_lock, dma_resv_lock) which can deadlock with
  dma_fence_wait held inside them.
- almost minor at that point, but the modeset code also allocates
  memory

These all look like they'll be very hard to fix properly, the hardware
seems to require a full display reset with any gpu recovery.

Hence split out as a seperate patch.

Since amdgpu isn't the only hardware driver that needs to reset the
display (at least gen2/3 on intel have the same problem) we need a
generic solution for this. There's two tricks we could still from
drm/i915 and lift to dma-fence:

- The big whack, aka force-complete all fences. i915 does this for all
  pending jobs if the reset is somehow stuck. Trouble is we'd need to
  do this for all fences in the entire system, and just the
  book-keeping for that will be fun. Plus lots of drivers use fences
  for all kinds of internal stuff like memory management, so
  unconditionally resetting all of them doesn't work.

  I'm also hoping that with these fence annotations we could enlist
  lockdep in finding the last offenders causing deadlocks, and we
  could remove this get-out-of-jail trick.

- The more feasible approach (across drivers at least as part of the
  dma_fence contract) is what drm/i915 does for gen2/3: When we need
  to reset the display we wake up all dma_fence_wait_interruptible
  calls, or well at least the equivalent of those in i915 internally.

  Relying on ioctl restart we force all other threads to release their
  locks, which means the tdr thread is guaranteed to be able to get
  them. I think we could implement this at the dma_fence level,
  including proper lockdep annotations.

  dma_fence_begin_tdr():
  - must be nested within a dma_fence_begin/end_signalling section
  - will wake up all interruptible (but not the non-interruptible)
    dma_fence_wait() calls and force them to complete with a
    -ERESTARTSYS errno code. All new interrupitble calls to
    dma_fence_wait() will immeidately fail with the same error code.

  dma_fence_end_trdr():
  - this will convert dma_fence_wait() calls back to normal.

  Of course interrupting dma_fence_wait is only ok if the caller
  specified that, which means we need to split the annotations into
  interruptible and non-interruptible version. If we then make sure
  that we only use interruptible dma_fence_wait() calls while holding
  drm_modeset_lock we can grab them in tdr code, and allow display
  resets. Doing the same for dma_resv_lock might be a lot harder, so
  buffer updates must be avoided.

  What's worse, we're not going to be able to make the dma_fence_wait
  calls in mmu-notifiers interruptible, that doesn't work. So
  allocating memory still wont' be allowed, even in tdr sections. Plus
  obviously we can use this trick only in tdr, it is rather intrusive.

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 52f1ab4bc922..a1c091e11ffd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -281,9 +281,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	bool fence_cookie;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
+	fence_cookie = dma_fence_begin_signalling();
+
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->ring_mirror_list,
@@ -315,6 +318,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	spin_lock(&sched->job_list_lock);
 	drm_sched_start_timeout(sched);
 	spin_unlock(&sched->job_list_lock);
+
+	dma_fence_end_signalling(fence_cookie);
 }
 
  /**
-- 
2.27.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  parent reply	other threads:[~2020-07-07 20:14 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 20:12 [PATCH 00/25] dma-fence annotations, round 3 Daniel Vetter
2020-07-07 20:12 ` [PATCH 01/25] dma-fence: basic lockdep annotations Daniel Vetter
2020-07-08 14:57   ` Christian König
2020-07-08 15:12     ` Daniel Vetter
2020-07-08 15:19       ` Alex Deucher
2020-07-08 15:37         ` Daniel Vetter
2020-07-14 11:09           ` Daniel Vetter
2020-07-09  7:32       ` [Intel-gfx] " Daniel Stone
2020-07-09  7:52         ` Daniel Vetter
2020-07-13 16:26     ` Daniel Vetter
2020-07-13 16:39       ` Christian König
2020-07-13 20:31         ` Dave Airlie
2020-07-07 20:12 ` [PATCH 02/25] dma-fence: prime " Daniel Vetter
2020-07-09  8:09   ` Daniel Vetter
2020-07-10 12:43     ` Jason Gunthorpe
2020-07-10 12:48       ` Christian König
2020-07-10 12:54         ` Jason Gunthorpe
2020-07-10 13:01           ` Christian König
2020-07-10 13:48             ` Jason Gunthorpe
2020-07-10 14:02               ` Daniel Vetter
2020-07-10 14:23                 ` Jason Gunthorpe
2020-07-10 20:02                   ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 03/25] dma-buf.rst: Document why idenfinite fences are a bad idea Daniel Vetter
2020-07-09  7:36   ` [Intel-gfx] " Daniel Stone
2020-07-09  8:04     ` Daniel Vetter
2020-07-09 12:11       ` Daniel Stone
2020-07-09 12:31         ` Daniel Vetter
2020-07-09 14:28           ` Christian König
2020-07-09 11:53   ` Christian König
2020-07-09 12:33   ` [PATCH 1/2] dma-buf.rst: Document why indefinite " Daniel Vetter
2020-07-09 12:33     ` [PATCH 2/2] drm/virtio: Remove open-coded commit-tail function Daniel Vetter
2020-07-09 12:48       ` Gerd Hoffmann
2020-07-09 14:05       ` Sam Ravnborg
2020-07-14  9:13         ` Daniel Vetter
2020-08-19 12:43       ` Jiri Slaby
2020-08-19 12:47         ` Jiri Slaby
2020-08-19 13:24         ` Gerd Hoffmann
2020-08-20  6:32           ` Jiri Slaby
2020-08-21  7:01             ` Gerd Hoffmann
2020-07-10 12:30     ` [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea Maarten Lankhorst
2020-07-14 17:46     ` Jason Ekstrand
2020-07-20 11:15     ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2020-07-21  7:41       ` Daniel Vetter
2020-07-21  7:45         ` Christian König
2020-07-21  8:47           ` Thomas Hellström (Intel)
2020-07-21  8:55             ` Christian König
2020-07-21  9:16               ` Daniel Vetter
2020-07-21  9:24                 ` Daniel Vetter
2020-07-21  9:37               ` Thomas Hellström (Intel)
2020-07-21  9:50                 ` Daniel Vetter
2020-07-21 10:47                   ` Thomas Hellström (Intel)
2020-07-21 13:59                     ` Christian König
2020-07-21 17:46                       ` Thomas Hellström (Intel)
2020-07-21 18:18                         ` Daniel Vetter
2020-07-21 21:42                       ` Dave Airlie
2020-07-21 22:45             ` Dave Airlie
2020-07-22  6:45               ` Thomas Hellström (Intel)
2020-07-22  7:11                 ` Daniel Vetter
2020-07-22  8:05                   ` Thomas Hellström (Intel)
2020-07-22  9:45                     ` Daniel Vetter
2020-07-22 10:31                       ` Thomas Hellström (Intel)
2020-07-22 11:39                         ` Daniel Vetter
2020-07-22 12:22                           ` Thomas Hellström (Intel)
2020-07-22 12:41                             ` Daniel Vetter
2020-07-22 13:12                               ` Thomas Hellström (Intel)
2020-07-22 14:07                                 ` Daniel Vetter
2020-07-22 14:23                                   ` Christian König
2020-07-22 14:30                                     ` Thomas Hellström (Intel)
2020-07-22 14:35                                       ` Christian König
2020-07-07 20:12 ` [PATCH 04/25] drm/vkms: Annotate vblank timer Daniel Vetter
2020-07-12 22:27   ` Rodrigo Siqueira
2020-07-14  9:57     ` Melissa Wen
2020-07-14  9:59       ` Daniel Vetter
2020-07-14 14:55         ` Melissa Wen
2020-07-14 15:23           ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 05/25] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-07-07 20:12 ` [PATCH 06/25] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-07-07 20:12 ` [PATCH 07/25] drm/komdea: Annotate dma-fence critical section in " Daniel Vetter
2020-07-08  5:17   ` james qian wang (Arm Technology China)
2020-07-14  8:34     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 08/25] drm/malidp: " Daniel Vetter
2020-07-15 12:53   ` Liviu Dudau
2020-07-15 13:51     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 09/25] drm/atmel: Use drm_atomic_helper_commit Daniel Vetter
2020-07-07 20:37   ` Sam Ravnborg
2020-07-07 21:31   ` [PATCH] " Daniel Vetter
2020-07-14  9:55     ` Sam Ravnborg
2020-07-07 20:12 ` [PATCH 10/25] drm/imx: Annotate dma-fence critical section in commit path Daniel Vetter
2020-07-07 20:12 ` [PATCH 11/25] drm/omapdrm: " Daniel Vetter
2020-07-07 20:12 ` [PATCH 12/25] drm/rcar-du: " Daniel Vetter
2020-07-07 23:32   ` Laurent Pinchart
2020-07-14  8:39     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 13/25] drm/tegra: " Daniel Vetter
2020-07-07 20:12 ` [PATCH 14/25] drm/tidss: " Daniel Vetter
2020-07-08  9:01   ` Jyri Sarha
2020-07-07 20:12 ` [PATCH 15/25] drm/tilcdc: Use standard drm_atomic_helper_commit Daniel Vetter
2020-07-08  9:17   ` Jyri Sarha
2020-07-08  9:27     ` Daniel Vetter
2020-07-08  9:44   ` [PATCH] " Daniel Vetter
2020-07-08 10:21     ` Jyri Sarha
2020-07-08 14:20   ` Daniel Vetter
2020-07-10 11:16     ` Jyri Sarha
2020-07-14  8:32       ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 16/25] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-07-07 20:12 ` [PATCH 17/25] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-07-07 20:12 ` [PATCH 18/25] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-07-07 20:12 ` [PATCH 19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-07-14 10:49   ` Daniel Vetter
2020-07-14 11:40     ` Christian König
2020-07-14 14:31       ` Daniel Vetter
2020-07-15  9:17         ` Christian König
2020-07-15 11:53           ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 20/25] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-07-14 11:12   ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 21/25] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-07-07 20:12 ` Daniel Vetter [this message]
2020-07-07 20:12 ` [PATCH 23/25] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-07-07 20:12 ` [PATCH 24/25] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-07-07 20:12 ` [PATCH 25/25] drm/amdgpu: gpu recovery does full modesets Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200707201229.472834-23-daniel.vetter@ffwll.ch \
    --to=daniel.vetter@ffwll.ch \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).