From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC 1/6] drm/i915: Individual request cancellation
Date: Mon, 15 Mar 2021 17:37:27 +0000
Message-ID: <f361804a-2c51-77ee-dbb4-0caba6bfffd0@linux.intel.com>
In-Reply-To: <20210312154622.1767865-2-tvrtko.ursulin@linux.intel.com>

On 12/03/2021 15:46, Tvrtko Ursulin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> Currently, we cancel outstanding requests within a context when the
> context is closed. We may also want to cancel individual requests using
> the same graceful preemption mechanism.
>
> v2 (Tvrtko):
>  * Cancel waiters carefully considering no timeline lock and RCU.
>  * Fixed selftests.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

[snip]

> +void i915_request_cancel(struct i915_request *rq, int error)
> +{
> +	if (!i915_request_set_error_once(rq, error))
> +		return;
> +
> +	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +
> +	if (i915_sw_fence_signaled(&rq->submit)) {
> +		struct i915_dependency *p;
> +
> +restart:
> +		rcu_read_lock();
> +		for_each_waiter(p, rq) {
> +			struct i915_request *w =
> +				container_of(p->waiter, typeof(*w), sched);
> +
> +			if (__i915_request_is_complete(w) ||
> +			    fatal_error(w->fence.error))
> +				continue;
> +
> +			w = i915_request_get(w);
> +			rcu_read_unlock();
> +			/* Recursion bound by the number of engines */
> +			i915_request_cancel(w, error);
> +			i915_request_put(w);
> +
> +			/* Restart after having to drop the RCU lock. */
> +			goto restart;
> +		}

So I need to fix this error propagation to waiters in order to avoid the
potential stack overflow caught in shards (gem_ctx_ringsize). Alternatively,
we could decide not to propagate fence errors at all. I am not sure the
consequences of either option are particularly better or worse. Things will
break anyway, since what does userspace actually inspect for unexpected
fence errors?! So rendering corruption, more or less. Whether it can cause a
further stream of GPU hangs I am not sure; only if there is an inter-engine
data dependency involving data more complex than images/textures.

Regards,

Tvrtko

> +		rcu_read_unlock();
> +	}
> +
> +	__cancel_request(rq);
> +}
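To make the stack overflow concern concrete: the recursive
i915_request_cancel() above can in principle be flattened into an iterative
traversal with an explicit, heap-allocated worklist. Below is a minimal,
self-contained sketch of that idea; struct request, cancel_request() and the
worklist handling are simplified stand-ins, not the real i915 types, and the
sketch deliberately ignores the RCU and reference-counting concerns that the
"goto restart" dance in the patch exists to handle.

#include <stdio.h>
#include <stdlib.h>

struct request {
	int error;			/* 0 == not cancelled yet */
	struct request **waiters;	/* requests depending on this one */
	int nr_waiters;
};

static void cancel_request(struct request *rq, int error)
{
	struct request **stack;
	int top = 0, cap = 16;

	stack = malloc(cap * sizeof(*stack));
	if (!stack)
		return;

	stack[top++] = rq;

	/* Depth-first over the waiter graph, but with an explicit
	 * worklist so the C call stack stays O(1) no matter how long
	 * the dependency chain is. */
	while (top) {
		struct request *r = stack[--top];
		int i;

		if (r->error)	/* already errored out, stop here */
			continue;

		r->error = error;

		for (i = 0; i < r->nr_waiters; i++) {
			if (top == cap) {
				struct request **tmp;

				tmp = realloc(stack,
					      2 * cap * sizeof(*stack));
				if (!tmp)
					goto out;
				stack = tmp;
				cap *= 2;
			}
			stack[top++] = r->waiters[i];
		}
	}
out:
	free(stack);
}

int main(void)
{
	/* a <- b <- c: b waits on a, c waits on b. Cancelling a
	 * should propagate the error to b and then c. */
	struct request c = { 0, NULL, 0 };
	struct request *c_arr[] = { &c };
	struct request b = { 0, c_arr, 1 };
	struct request *b_arr[] = { &b };
	struct request a = { 0, b_arr, 1 };

	cancel_request(&a, -5);	/* e.g. -EIO */
	printf("a=%d b=%d c=%d\n", a.error, b.error, c.error);

	return 0;
}

The point of the sketch is only that traversal state moves from the kernel
stack to the heap, so dependency-chain depth no longer matters; the hard
part in the real driver remains doing this while safely dropping and
retaking the RCU read lock and holding request references.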