All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Lionel Landwerlin <lionel.g.landwerlin@intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v3 2/2] drm/i915: add syncobj timeline support
Date: Thu, 01 Aug 2019 10:22:13 +0100	[thread overview]
Message-ID: <156465133388.2512.9224045476475482021@skylake-alporthouse-com> (raw)
In-Reply-To: <e6f97cd6-3e0b-0a6d-fa3b-3341b0ccd555@intel.com>

Quoting Lionel Landwerlin (2019-08-01 10:01:44)
> On 01/08/2019 11:08, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-08-01 08:43:24)
> >> On 31/07/2019 23:03, Chris Wilson wrote:
> >>> Quoting Lionel Landwerlin (2019-07-31 15:07:33)
> >>> I think I have convinced myself that with the split between wait before,
> >>> signal after combined with the rule that seqno point along the syncobj
> >>> are monotonic, you should not be able to generate an AB-BA deadlock
> >>> between concurrent clients.
> >>>
> >>> Except that we need to fixup drm_syncobj_add_point() to actually apply
> >>> that rule. Throwing an error and leaving the client stuck is less than
> >>> ideal, we can't even recover with a GPU reset, and as they are native
> >>> fences we don't employ a failsafe timer.
> >>
> >> Unfortunately we're not the only user of add_point() and amdgpu doesn't
> >> want it to fail.
> > It's dangerously exposing the deadlock from incorrect seqno ordering to
> > userspace. We should be able to hit that DRM_ERROR() in testing, and be
> > able to detect the out-of-sequence timeline.
> >
> >> Best I can come up with is take the syncobj lock when signaling in
> >> get_timeline_fence_array() do the check there, unlock in __free_fence_array.
> > I think the intermediate step is a "safe" version of add_point that
> > doesn't leave the timeline broken. That still leaves a glitch visible to
> > userspace, but it should not deadlock.
> 
> 
> Right, sounds doable. What happens in execbuf after add_point() fails?
> 
> We've already handed the request to the scheduler and potentially it 
> might be running already.

Right, at present we've already submitted the request, so the batch will
be run and fence will be signaled. The failure to add to it the
timeline means that someone else already submitted a later fence and
some third party is using that syncpt instead of ours for their
dependency. So that third party will be delayed, but so long as they kept
their dependencies in order it should just be a delay.

The problem comes into play if we go ahead and insert our fence into the
dma_fence_chain out of order, breaking all the monotonic guarantees and
flipping the order of the fences. (The equivalent to taking mutexes out
of order.)

> > The way I was looking at it was to reserve a placeholder syncpt before
> > building the request and allow replacing the placeholder afterwards.
> 
> 
> That sounds fairly tricky to get right :(

Invasive, yeah, turns out one needs to start walking the fence chain
more carefully. The actual placeholder fence is pretty run of the mill
as far as dma-fence*.c goes, which is say...
 
> The fence stuff is already full of magic.

It's full of magic.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-08-01  9:22 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-31 14:07 [PATCH v3 0/2] drm/i915: timeline semaphore support Lionel Landwerlin
2019-07-31 14:07 ` [PATCH v3 1/2] drm/i915: introduce a mechanism to extend execbuf2 Lionel Landwerlin
2019-07-31 21:41   ` Chris Wilson
2019-07-31 14:07 ` [PATCH v3 2/2] drm/i915: add syncobj timeline support Lionel Landwerlin
2019-07-31 20:03   ` Chris Wilson
2019-08-01  7:43     ` Lionel Landwerlin
2019-08-01  8:08       ` Chris Wilson
2019-08-01  9:01         ` Lionel Landwerlin
2019-08-01  9:22           ` Chris Wilson [this message]
2019-08-01  9:37             ` Lionel Landwerlin
2019-08-01 10:18               ` Chris Wilson
2019-08-01 14:29     ` Lionel Landwerlin
2019-08-01 15:16       ` Chris Wilson
2019-08-01 15:35         ` Lionel Landwerlin
2019-08-11  8:51         ` Lionel Landwerlin
2019-07-31 14:27 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: timeline semaphore support (rev3) Patchwork
2019-07-31 14:29 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-08-01 14:31 ` ✓ Fi.CI.BAT: success " Patchwork
2019-08-02  6:05 ` ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156465133388.2512.9224045476475482021@skylake-alporthouse-com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=lionel.g.landwerlin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.