AMD-GFX Archive on lore.kernel.org
 help / color / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	"Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Christian König" <christian.koenig@amd.com>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Thomas Hellstrom" <thomas.hellstrom@intel.com>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Mika Kuoppala" <mika.kuoppala@intel.com>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations
Date: Fri, 19 Jun 2020 09:22:09 +0200
Message-ID: <CAKMK7uEbqTu4q-amkLXyd1i8KNtLaoO2ZFoGqYiG6D0m0FKpOg@mail.gmail.com> (raw)
In-Reply-To: <20200618172338.GM6578@ziepe.ca>

On Fri, Jun 19, 2020 at 8:58 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Jun 18, 2020 at 05:00:51PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 17, 2020 at 12:28:35PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 17, 2020 at 08:48:50AM +0200, Daniel Vetter wrote:
> > >
> > > > Now my understanding for rdma is that if you don't have hw page fault
> > > > support,
> > >
> > > The RDMA ODP feature is restartable HW page faulting just like nouveau
> > > has. The classical MR feature doesn't have this. Only mlx5 HW supports
> > > ODP today.
> > >
> > > > It's only gpus (I think) which are in this awkward in-between spot
> > > > where dynamic memory management really is much wanted, but the hw
> > > > kinda sucks. Aside, about 10+ years ago we had a similar problem with
> > > > gpu hw, but for security: Many gpu didn't have any kinds of page
> > > > tables to isolate different clients from each another. drivers/gpu
> > > > fixed this by parsing&validating what userspace submitted to make sure
> > > > it's only every accessing its own buffers. Most gpus have become
> > > > reasonable nowadays and do have proper per-process pagetables (gpu
> > > > process, not the pasid stuff), but even today there's still some of
> > > > the old model left in some of the smallest SoC.
> > >
> > > But I still don't understand why a dma fence is needed inside the GPU
> > > driver itself in the notifier.
> > >
> > > Surely the GPU driver can block and release the notifier directly from
> > > its own command processing channel?
> > >
> > > Why does this fence and all it entails need to leak out across
> > > drivers?
> >
> > So 10 years ago we had this world of every gpu driver is its own bucket,
> > nothing leaks out to the world. But the world had a different idea how
> > gpus where supposed to work, with stuff like:
>
> Sure, I understand DMA fence, but why does a *notifier* need it?
>
> The job of the notifier is to guarentee that the device it is
> connected to is not doing DMA before it returns.
>
> That just means you need to prove that device is done with the buffer.
>
> As I've understood GPU that means you need to show that the commands
> associated with the buffer have completed. This is all local stuff
> within the driver, right? Why use fence (other than it already exists)

Because that's the end-of-dma thing. And it's cross-driver for the
above reasons, e.g.
- device A renders some stuff. Userspace gets dma_fence A out of that
(well sync_file or one of the other uapi interfaces, but you get the
idea)
- userspace (across process or just different driver) issues more
rendering for device B, which depends upon the rendering done on
device A. So dma_fence A is an dependency and will block this dma
operation. Userspace (and the kernel) gets dma_fence B out of this
- because unfortunate reasons, the same rendering on device B also
needs a userptr buffer, which means that dma_fence B is also the one
that the mmu_range_notifier needs to wait on before it can tell core
mm that it can go ahead and release those pages
- unhappiness ensues, because now the mmu notifier from device B can
get hung up on the dma operation device A is doing

If you want to avoid this either a) have less shitty hw (not an
option, gpus are gpus, it is slowly getting better though) or b) force
userspace to stall before handing over to next device (about as
uncool) or c) just pin all the memory always, who cares (also rather
unpopular, gpus tend to use all the memory they can get).

I guess the thing with gpus is that dma operations aren't like
read/writes for pretty much everything else, but essentially compute
contexts (usually implemented as ringbuffers where you stream stuff
into) with cross everything dependencies. This even holds within a
single gpu, since pretty much all modern gpus have multiple different
engines special on different things. And yup that's directly exposed
to userspace, for vulkan and other low-level gpu apis even directly to
applications. So dma operation for gpu isn't just "done when the
read/write finishes", but pulls in an entire chain of dependencies and
ordering that needs to happen before it can even start.

-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply index

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  8:12 [PATCH 00/18] dma-fence lockdep annotations, round 2 Daniel Vetter
2020-06-04  8:12 ` [PATCH 01/18] mm: Track mmu notifiers in fs_reclaim_acquire/release Daniel Vetter
2020-06-10 12:01   ` Thomas Hellström (Intel)
2020-06-10 12:25     ` [Intel-gfx] " Daniel Vetter
2020-06-10 19:41   ` [PATCH] " Daniel Vetter
2020-06-11 14:29     ` Jason Gunthorpe
2020-06-21 17:42     ` Qian Cai
2020-06-21 18:07       ` Daniel Vetter
2020-06-21 20:01         ` Daniel Vetter
2020-06-21 22:09           ` Qian Cai
2020-06-23 16:17           ` Qian Cai
2020-06-23 22:13             ` Daniel Vetter
2020-06-23 22:29               ` Qian Cai
2020-06-23 22:31       ` Dave Chinner
2020-06-23 22:36         ` Daniel Vetter
2020-06-21 17:00   ` [PATCH 01/18] " Qian Cai
2020-06-21 17:28     ` Daniel Vetter
2020-06-21 17:46       ` Qian Cai
2020-06-04  8:12 ` [PATCH 02/18] dma-buf: minor doc touch-ups Daniel Vetter
2020-06-10 13:07   ` Thomas Hellström (Intel)
2020-06-04  8:12 ` [PATCH 03/18] dma-fence: basic lockdep annotations Daniel Vetter
2020-06-04  8:57   ` Thomas Hellström (Intel)
2020-06-04  9:21     ` Daniel Vetter
2020-06-04  9:26       ` Chris Wilson
2020-06-04  9:36         ` [Intel-gfx] " Daniel Vetter
2020-06-05 13:29   ` [PATCH] " Daniel Vetter
2020-06-05 14:30     ` Thomas Hellström (Intel)
2020-06-11  9:57     ` Maarten Lankhorst
2020-06-10 14:21   ` [Intel-gfx] [PATCH 03/18] " Tvrtko Ursulin
2020-06-10 15:17     ` Daniel Vetter
2020-06-11 10:36       ` Tvrtko Ursulin
2020-06-11 11:29         ` Daniel Vetter
2020-06-11 14:29           ` Tvrtko Ursulin
2020-06-11 15:03             ` Daniel Vetter
2020-06-11  8:00   ` Chris Wilson
2020-06-11  8:44     ` Dave Airlie
2020-06-11  9:01       ` [Intel-gfx] " Daniel Stone
2020-06-19  8:25         ` Chris Wilson
2020-06-19  8:51           ` Daniel Vetter
2020-06-19  9:13             ` Chris Wilson
2020-06-19  9:43               ` Daniel Vetter
2020-06-19 13:12                 ` Chris Wilson
2020-06-22  9:16                   ` Daniel Vetter
2020-07-09  7:29                 ` Daniel Stone
2020-07-09  8:01                   ` Daniel Vetter
2020-06-12  7:06   ` [PATCH] " Daniel Vetter
2020-06-04  8:12 ` [PATCH 04/18] dma-fence: prime " Daniel Vetter
2020-06-11  7:30   ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2020-06-11  8:34     ` Daniel Vetter
2020-06-11 14:15       ` Jason Gunthorpe
2020-06-11 23:35         ` Felix Kuehling
2020-06-12  5:11           ` Daniel Vetter
2020-06-19 18:13           ` Jerome Glisse
2020-06-23  7:39           ` Daniel Vetter
2020-06-23 18:44             ` Felix Kuehling
2020-06-23 19:02               ` Daniel Vetter
2020-06-16 12:07         ` Daniel Vetter
2020-06-16 14:53           ` Jason Gunthorpe
2020-06-17  7:57             ` Daniel Vetter
2020-06-17 15:29               ` Jason Gunthorpe
2020-06-18 14:42                 ` Daniel Vetter
2020-06-17  6:48           ` Daniel Vetter
2020-06-17 15:28             ` Jason Gunthorpe
2020-06-18 15:00               ` Daniel Vetter
2020-06-18 17:23                 ` Jason Gunthorpe
2020-06-19  7:22                   ` Daniel Vetter [this message]
2020-06-19 11:39                     ` Jason Gunthorpe
2020-06-19 15:06                       ` Daniel Vetter
2020-06-19 15:15                         ` Jason Gunthorpe
2020-06-19 16:19                           ` Daniel Vetter
2020-06-19 17:23                             ` Jason Gunthorpe
2020-06-19 18:09                               ` Jerome Glisse
2020-06-19 18:18                                 ` Jason Gunthorpe
2020-06-19 19:48                                   ` Felix Kuehling
2020-06-19 19:55                                     ` Jason Gunthorpe
2020-06-19 20:03                                       ` Felix Kuehling
2020-06-19 20:31                                       ` Jerome Glisse
2020-06-22 11:46                                         ` Jason Gunthorpe
2020-06-22 20:15                                           ` Jerome Glisse
2020-06-23  0:02                                             ` Jason Gunthorpe
2020-06-19 20:10                                   ` Jerome Glisse
2020-06-19 20:43                                     ` Daniel Vetter
2020-06-19 20:59                                       ` Jerome Glisse
2020-06-23  0:05                                     ` Jason Gunthorpe
2020-06-19 19:11                                 ` Alex Deucher
2020-06-19 19:30                                   ` Felix Kuehling
2020-06-19 19:40                                     ` Jerome Glisse
2020-06-19 19:51                                     ` Jason Gunthorpe
2020-06-12  7:01   ` [PATCH] " Daniel Vetter
2020-06-04  8:12 ` [PATCH 05/18] drm/vkms: Annotate vblank timer Daniel Vetter
2020-06-04  8:12 ` [PATCH 06/18] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-06-04  8:12 ` [PATCH 07/18] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-06-04  8:12 ` [PATCH 08/18] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-06-23 10:51   ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 09/18] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-06-04  8:12 ` [PATCH 10/18] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-06-04  8:12 ` [PATCH 11/18] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-06-04  8:12 ` [PATCH 12/18] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-06-04  8:12 ` [PATCH 13/18] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-06-05  8:30   ` Pierre-Eric Pelloux-Prayer
2020-06-05 12:41     ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 14/18] drm/scheduler: use dma-fence annotations in tdr work Daniel Vetter
2020-06-04  8:12 ` [PATCH 15/18] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-06-04  8:12 ` [PATCH 16/18] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-06-04  8:12 ` [PATCH 17/18] drm/amdgpu: gpu recovery does full modesets Daniel Vetter
2020-06-04  8:12 ` [PATCH 18/18] drm/i915: Annotate dma_fence_work Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKMK7uEbqTu4q-amkLXyd1i8KNtLaoO2ZFoGqYiG6D0m0FKpOg@mail.gmail.com \
    --to=daniel@ffwll.ch \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mika.kuoppala@intel.com \
    --cc=thomas.hellstrom@intel.com \
    --cc=thomas_os@shipmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

AMD-GFX Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/amd-gfx/0 amd-gfx/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 amd-gfx amd-gfx/ https://lore.kernel.org/amd-gfx \
		amd-gfx@lists.freedesktop.org
	public-inbox-index amd-gfx

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.freedesktop.lists.amd-gfx


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git