AMD-GFX Archive on lore.kernel.org
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Felix Kuehling <felix.kuehling@amd.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	"Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Thomas Hellstrom" <thomas.hellstrom@intel.com>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Mika Kuoppala" <mika.kuoppala@intel.com>
Subject: Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations
Date: Fri, 19 Jun 2020 16:55:38 -0300
Message-ID: <20200619195538.GT6578@ziepe.ca> (raw)
In-Reply-To: <56008d64-772d-5757-6136-f20591ef71d2@amd.com>

On Fri, Jun 19, 2020 at 03:48:49PM -0400, Felix Kuehling wrote:
> Am 2020-06-19 um 2:18 p.m. schrieb Jason Gunthorpe:
> > On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> >> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> >>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> >>>
> >>>> The madness is only that device B's mmu notifier might need to wait
> >>>> for fence_B so that the dma operation finishes. Which in turn has to
> >>>> wait for device A to finish first.
> >>> So, it sounds like, fundamentally, you've got this graph of operations across
> >>> an unknown set of drivers and the kernel cannot insert itself in
> >>> dma_fence hand offs to re-validate any of the buffers involved?
> >>> Buffers which by definition cannot be touched by the hardware yet.
> >>>
> >>> That really is a pretty horrible place to end up..
> >>>
> >>> Pinning really is the right answer for this kind of workflow. I think
> >>> converting pinning to notifiers should not be done unless notifier
> >>> invalidation is relatively bounded.
> >>>
> >>> I know people like notifiers because they give a bit nicer performance
> >>> in some happy cases, but this cripples all the bad cases..
> >>>
> >>> If pinning doesn't work for some reason maybe we should address that?
> >> Note that the dma fence issue applies only to userptr buffers, which predate
> >> any HMM work and thus were already using mmu notifiers. You need the
> >> mmu notifier there because of fork and other corner cases.
> > I wonder if we should try to fix the fork case more directly - RDMA
> > has this same problem and added MADV_DONTFORK a long time ago as a
> > hacky way to deal with it.
> >
> > Some crazy page pin that resolved COW in a way that always kept the
> > physical memory with the mm that initiated the pin?
> >
> > (isn't this broken for O_DIRECT as well anyhow?)
> >
> > How does mmu_notifiers help the fork case anyhow? Block fork from
> > progressing?
> 
> How much the mmu_notifier blocks fork progress depends on how quickly we
> can preempt GPU jobs accessing affected memory. If we don't have
> fine-grained preemption capability (graphics), the best we can do is
> wait for the GPU jobs to complete. We can also delay submission of new
> GPU jobs to the same memory until the MMU notifier is done. Future jobs
> would use the new page addresses.
> 
> With fine-grained preemption (ROCm compute), we can preempt GPU work on
> the affected address space to minimize the delay seen by fork.
> 
> With recoverable device page faults, we can invalidate GPU page table
> entries, so device access to the affected pages stops immediately.
> 
> In all cases, the end result is that the device page table gets updated
> with the address of the copied pages before the GPU accesses the COW
> memory again. Without the MMU notifier, we'd end up with the GPU
> corrupting memory of the other process.

The model here in fork has been wrong for a long time, and I do wonder
how O_DIRECT manages to not be broken too.. I guess the time windows
there are too small to get unlucky.

If you have a write pin on a page then it should not be COW'd into the
fork'd process, but copied, with the originating page remaining with the
original mm.
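
To make the difference concrete, here is a toy model of the two fork
policies, written in Python purely for illustration. It is not kernel code:
Frame, Mm, fork_plain_cow and fork_copy_pinned are hypothetical names, and a
single page stands in for the whole address space. It only tracks which mm
ends up owning the frame the device was pinned to.

```python
from dataclasses import dataclass, field
from itertools import count

_frame_ids = count(1)  # hypothetical physical frame numbers


@dataclass
class Frame:
    data: bytearray
    pinned_by: object = None  # the Mm holding a write pin, if any
    pfn: int = field(default_factory=lambda: next(_frame_ids))


class Mm:
    """Toy address space: one virtual page backed by one Frame."""

    def __init__(self, frame, shared=False):
        self.frame = frame
        self.cow = shared  # True while the frame is shared read-only

    def pin_for_write(self):
        self.frame.pinned_by = self
        return self.frame  # the frame a device would DMA into

    def write(self, value):
        if self.cow:
            # COW fault: the writer gets a private copy and the old
            # frame is left behind with the other mm.
            self.frame = Frame(bytearray(self.frame.data))
            self.cow = False
        self.frame.data[:] = value


def fork_plain_cow(parent):
    """Share the frame read-only between parent and child."""
    parent.cow = True
    return Mm(parent.frame, shared=True)


def fork_copy_pinned(parent):
    """Copy a write-pinned frame at fork time; the original stays put."""
    if parent.frame.pinned_by is parent:
        return Mm(Frame(bytearray(parent.frame.data)))
    return fork_plain_cow(parent)


def device_still_targets_parent(fork):
    parent = Mm(Frame(bytearray(b"old!")))
    dma_target = parent.pin_for_write()
    fork(parent)
    parent.write(b"new!")  # parent touches the page after fork
    return dma_target is parent.frame


if __name__ == "__main__":
    print("plain COW:  ", device_still_targets_parent(fork_plain_cow))
    print("copy pinned:", device_still_targets_parent(fork_copy_pinned))
```

In this model, plain COW leaves the pinned frame with the child once the
parent writes, so the device would be writing into the other process. The
copy-at-fork policy keeps the device and the pinning mm on the same frame.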

I wonder if there is some easy way to achieve that - if that is the
main reason to use notifiers then it would be a better solution.
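
For comparison, the existing MADV_DONTFORK workaround mentioned earlier in
the thread sidesteps the question by not giving the child the range at all.
A minimal userspace sketch, assuming Linux and Python 3.8+ (where
mmap.madvise and mmap.MADV_DONTFORK are available); child_survives_read is
a hypothetical helper name:

```python
import mmap
import os


def child_survives_read(dontfork):
    """Fork and report whether the child can read the parent's buffer."""
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf[:4] = b"data"
    if dontfork:
        # Ask the kernel not to make this range available to children.
        buf.madvise(mmap.MADV_DONTFORK)

    pid = os.fork()
    if pid == 0:
        _ = buf[0]   # faults if the range was not inherited
        os._exit(0)  # reached only if the read succeeded

    _, status = os.waitpid(pid, 0)
    buf.close()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("inherited mapping readable:", child_survives_read(False))
    print("DONTFORK mapping readable: ", child_survives_read(True))
```

With the advice applied, the child is expected to die on the access because
the range is simply absent from its address space. The pinned page can never
end up owned by the child, but the child also cannot use that memory, which
is why it is only a hacky fix.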

Jason
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

