AMD-GFX Archive on lore.kernel.org
 help / color / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	Thomas Hellstrom <thomas.hellstrom@intel.com>,
	amd-gfx mailing list <amd-gfx@lists.freedesktop.org>,
	Daniel Vetter <daniel.vetter@intel.com>,
	Linux Media Mailing List <linux-media@vger.kernel.org>,
	Christian König <christian.koenig@amd.com>,
	Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations
Date: Fri, 19 Jun 2020 14:12:17 +0100
Message-ID: <159257233754.7737.17318605310513355800@build.alporthouse.com> (raw)
In-Reply-To: <20200619094309.GT20149@phenom.ffwll.local>

Quoting Daniel Vetter (2020-06-19 10:43:09)
> On Fri, Jun 19, 2020 at 10:13:35AM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2020-06-19 09:51:59)
> > > On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > Forcing a generic primitive to always be part of the same global map is
> > > > horrible.
> > > 
> > > And  no concrete example or reason for why that's not possible.
> > > Because frankly it's not horrible, this is what upstream is all about:
> > > Shared concepts, shared contracts, shared code.
> > > 
> > > The proposed patches might very well encode the wrong contract, that's
> > > all up for discussion. But fundamentally questioning that we need one
> > > is missing what upstream is all about.
> > 
> > Then I have not clearly communicated, as my opinion is not that
> > validation is worthless, but that the implementation is enshrining a
> > global property on a low level primitive that prevents it from being
> > used elsewhere. And I want to replace completion [chains] with fences, and
> > bio with fences, and closures with fences, and what other equivalencies
> > there are in the kernel. The fence is as central a locking construct as
> > struct completion and deserves to be a foundational primitive provided
> > by kernel/ used throughout all drivers for discrete problem domains.
> > 
> > This is narrowing dma_fence whereby adding
> >       struct lockdep_map *dma_fence::wait_map
> > and annotating linkage, allows you to continue to specify that all
> > dma_fence used for a particular purpose must follow common rules,
> > without restricting the primitive for uses outside of this scope.
> 
> Somewhere else in this thread I had discussions with Jason Gunthorpe about
> this topic. It might maybe change somewhat depending upon exact rules, but
> his take is very much "I don't want dma_fence in rdma". Or pretty close to
> that at least.
> 
> Similar discussions with habanalabs, they're using dma_fence internally
> without any of the uapi. Discussion there has also now concluded that it's
> best if they remove them, and simply switch over to a wait_queue or
> completion like every other driver does.
> 
> The next round of the patches already have a paragraph to at least
> somewhat limit how non-gpu drivers use dma_fence. And I guess actual
> consensus might be pointing even more strongly at dma_fence being solely
> something for gpus and closely related subsystem (maybe media) for syncing
> dma-buf access.
> 
> So dma_fence as general replacement for completion chains I think just
> wont happen.

That is sad. I cannot comprehend going back to pure completions after a
taste of fence scheduling. And we are not even close to fully utilising
them, as not all the async cpu [allocation!] tasks are fully tracked by
fences yet and are still stuck in a FIFO workqueue.

> What might make sense is if e.g. the lockdep annotations could be reused,
> at least in design, for wait_queue or completion or anything else
> really. I do think that has a fair chance compared to the automagic
> cross-release annotations approach, which relied way too heavily on
> guessing where barriers are. My experience from just a bit of playing
> around with these patches here and discussing them with other driver
> maintainers is that accurately deciding where critical sections start and
> end is a job for humans only. And if you get it wrong, you will have a
> false positive.
> 
> And you're indeed correct that if we'd do annotations for completions and
> wait queues, then that would need to have a class per semantically
> equivalent user, like we have lockdep classes for mutexes, not just one
> overall.
> 
> But dma_fence otoh is something very specific, which comes with very
> specific rules attached - it's not a generic wait_queue at all. Originally
> it did start out as one even, but it is a very specialized wait_queue.
> 
> So there's imo two cases:
> 
> - Your completion is entirely orthogonal of dma_fences, and can never ever
>   block a dma_fence. Don't use dma_fence for this, and no problem. It's
>   just another wait_queue somewhere.
> 
> - Your completion can eventually, maybe through lots of convolutions and
>   depdencies, block a dma_fence. In that case full dma_fence rules apply,
>   and the only thing you can do with a custom annotation is make the rules
>   even stricter. E.g. if a sub-timeline in the scheduler isn't allowed to
>   take certain scheduler locks. But the userspace visible/published fence
>   do take them, maybe as part of command submission or retirement.
>   Entirely hypotethical, no idea any driver actually needs this.

I think we are faced with this very real problem.

The papering we have today over userptr is so very thin, and if you
squint you can already see it is coupled into the completion signal. Just
it happens to be on the other side of the fence.

The next batch of priority inversions involve integrating the async cpu
tasks into the scheduler, and have full dependency tracking over every
internal fence. I do not see any way to avoid coupling the completion
signal from the GPU to the earliest resource allocation, as it's an
unbroken chain of work, at least from the user's perspective. [Next up
for annotations is that we need to always assume that userspace has an
implicit lock on GPU resources; having to break that lock with a GPU
reset should be a breach of our data integrity, and best avoided, for
compute does not care one iota about system integrity and insist
userspace knows best.] Such allocations have to be allowed to fail and
for that failure to propagate cancelling the queued work, such that I'm
considering what rules we need for gfp_t. That might allow enough
leverage to break any fs_reclaim loops, but userptr is likely forever
doomed [aside from its fs_reclaim loop is as preventable as the normal
shrinker paths], but we still need to suggest to pin_user_pages that
failure is better than oom and that is not clear atm. Plus the usual
failure can happen at any time after updating the user facing
bookkeeping, but that is just extra layers in the execution monitor
ready to step in and replacing failing work with the error propagation.
Or where the system grinds to a halt, requiring the monitor to patch in
a new page / resource.
-Chris
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply index

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  8:12 [PATCH 00/18] dma-fence lockdep annotations, round 2 Daniel Vetter
2020-06-04  8:12 ` [PATCH 01/18] mm: Track mmu notifiers in fs_reclaim_acquire/release Daniel Vetter
2020-06-10 12:01   ` Thomas Hellström (Intel)
2020-06-10 12:25     ` [Intel-gfx] " Daniel Vetter
2020-06-10 19:41   ` [PATCH] " Daniel Vetter
2020-06-11 14:29     ` Jason Gunthorpe
2020-06-21 17:42     ` Qian Cai
2020-06-21 18:07       ` Daniel Vetter
2020-06-21 20:01         ` Daniel Vetter
2020-06-21 22:09           ` Qian Cai
2020-06-23 16:17           ` Qian Cai
2020-06-23 22:13             ` Daniel Vetter
2020-06-23 22:29               ` Qian Cai
2020-06-23 22:31       ` Dave Chinner
2020-06-23 22:36         ` Daniel Vetter
2020-06-21 17:00   ` [PATCH 01/18] " Qian Cai
2020-06-21 17:28     ` Daniel Vetter
2020-06-21 17:46       ` Qian Cai
2020-06-04  8:12 ` [PATCH 02/18] dma-buf: minor doc touch-ups Daniel Vetter
2020-06-10 13:07   ` Thomas Hellström (Intel)
2020-06-04  8:12 ` [PATCH 03/18] dma-fence: basic lockdep annotations Daniel Vetter
2020-06-04  8:57   ` Thomas Hellström (Intel)
2020-06-04  9:21     ` Daniel Vetter
2020-06-04  9:26       ` Chris Wilson
2020-06-04  9:36         ` [Intel-gfx] " Daniel Vetter
2020-06-05 13:29   ` [PATCH] " Daniel Vetter
2020-06-05 14:30     ` Thomas Hellström (Intel)
2020-06-11  9:57     ` Maarten Lankhorst
2020-06-10 14:21   ` [Intel-gfx] [PATCH 03/18] " Tvrtko Ursulin
2020-06-10 15:17     ` Daniel Vetter
2020-06-11 10:36       ` Tvrtko Ursulin
2020-06-11 11:29         ` Daniel Vetter
2020-06-11 14:29           ` Tvrtko Ursulin
2020-06-11 15:03             ` Daniel Vetter
2020-06-11  8:00   ` Chris Wilson
2020-06-11  8:44     ` Dave Airlie
2020-06-11  9:01       ` [Intel-gfx] " Daniel Stone
2020-06-19  8:25         ` Chris Wilson
2020-06-19  8:51           ` Daniel Vetter
2020-06-19  9:13             ` Chris Wilson
2020-06-19  9:43               ` Daniel Vetter
2020-06-19 13:12                 ` Chris Wilson [this message]
2020-06-22  9:16                   ` Daniel Vetter
2020-07-09  7:29                 ` Daniel Stone
2020-07-09  8:01                   ` Daniel Vetter
2020-06-12  7:06   ` [PATCH] " Daniel Vetter
2020-06-04  8:12 ` [PATCH 04/18] dma-fence: prime " Daniel Vetter
2020-06-11  7:30   ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2020-06-11  8:34     ` Daniel Vetter
2020-06-11 14:15       ` Jason Gunthorpe
2020-06-11 23:35         ` Felix Kuehling
2020-06-12  5:11           ` Daniel Vetter
2020-06-19 18:13           ` Jerome Glisse
2020-06-23  7:39           ` Daniel Vetter
2020-06-23 18:44             ` Felix Kuehling
2020-06-23 19:02               ` Daniel Vetter
2020-06-16 12:07         ` Daniel Vetter
2020-06-16 14:53           ` Jason Gunthorpe
2020-06-17  7:57             ` Daniel Vetter
2020-06-17 15:29               ` Jason Gunthorpe
2020-06-18 14:42                 ` Daniel Vetter
2020-06-17  6:48           ` Daniel Vetter
2020-06-17 15:28             ` Jason Gunthorpe
2020-06-18 15:00               ` Daniel Vetter
2020-06-18 17:23                 ` Jason Gunthorpe
2020-06-19  7:22                   ` Daniel Vetter
2020-06-19 11:39                     ` Jason Gunthorpe
2020-06-19 15:06                       ` Daniel Vetter
2020-06-19 15:15                         ` Jason Gunthorpe
2020-06-19 16:19                           ` Daniel Vetter
2020-06-19 17:23                             ` Jason Gunthorpe
2020-06-19 18:09                               ` Jerome Glisse
2020-06-19 18:18                                 ` Jason Gunthorpe
2020-06-19 19:48                                   ` Felix Kuehling
2020-06-19 19:55                                     ` Jason Gunthorpe
2020-06-19 20:03                                       ` Felix Kuehling
2020-06-19 20:31                                       ` Jerome Glisse
2020-06-22 11:46                                         ` Jason Gunthorpe
2020-06-22 20:15                                           ` Jerome Glisse
2020-06-23  0:02                                             ` Jason Gunthorpe
2020-06-19 20:10                                   ` Jerome Glisse
2020-06-19 20:43                                     ` Daniel Vetter
2020-06-19 20:59                                       ` Jerome Glisse
2020-06-23  0:05                                     ` Jason Gunthorpe
2020-06-19 19:11                                 ` Alex Deucher
2020-06-19 19:30                                   ` Felix Kuehling
2020-06-19 19:40                                     ` Jerome Glisse
2020-06-19 19:51                                     ` Jason Gunthorpe
2020-06-12  7:01   ` [PATCH] " Daniel Vetter
2020-06-04  8:12 ` [PATCH 05/18] drm/vkms: Annotate vblank timer Daniel Vetter
2020-06-04  8:12 ` [PATCH 06/18] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-06-04  8:12 ` [PATCH 07/18] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-06-04  8:12 ` [PATCH 08/18] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-06-23 10:51   ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 09/18] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-06-04  8:12 ` [PATCH 10/18] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-06-04  8:12 ` [PATCH 11/18] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-06-04  8:12 ` [PATCH 12/18] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-06-04  8:12 ` [PATCH 13/18] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-06-05  8:30   ` Pierre-Eric Pelloux-Prayer
2020-06-05 12:41     ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 14/18] drm/scheduler: use dma-fence annotations in tdr work Daniel Vetter
2020-06-04  8:12 ` [PATCH 15/18] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-06-04  8:12 ` [PATCH 16/18] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-06-04  8:12 ` [PATCH 17/18] drm/amdgpu: gpu recovery does full modesets Daniel Vetter
2020-06-04  8:12 ` [PATCH 18/18] drm/i915: Annotate dma_fence_work Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=159257233754.7737.17318605310513355800@build.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel.vetter@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mika.kuoppala@intel.com \
    --cc=thomas.hellstrom@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

AMD-GFX Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/amd-gfx/0 amd-gfx/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 amd-gfx amd-gfx/ https://lore.kernel.org/amd-gfx \
		amd-gfx@lists.freedesktop.org
	public-inbox-index amd-gfx

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.freedesktop.lists.amd-gfx


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git