linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Felix Kuehling <felix.kuehling@amd.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Thomas Hellstrom" <thomas.hellstrom@intel.com>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Mika Kuoppala" <mika.kuoppala@intel.com>
Subject: Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations
Date: Tue, 23 Jun 2020 14:44:24 -0400	[thread overview]
Message-ID: <99758c09-262a-e9a1-bf65-5702b35b4388@amd.com> (raw)
In-Reply-To: <CAKMK7uHBKrpDWu+DvtYncDK=LOdGJyMK7t6fpOaGovnYFiBUZw@mail.gmail.com>

Am 2020-06-23 um 3:39 a.m. schrieb Daniel Vetter:
> On Fri, Jun 12, 2020 at 1:35 AM Felix Kuehling <felix.kuehling@amd.com> wrote:
>> Am 2020-06-11 um 10:15 a.m. schrieb Jason Gunthorpe:
>>> On Thu, Jun 11, 2020 at 10:34:30AM +0200, Daniel Vetter wrote:
>>>>> I still have my doubts about allowing fence waiting from within shrinkers.
>>>>> IMO ideally they should use a trywait approach, in order to allow memory
>>>>> allocation during command submission for drivers that
>>>>> publish fences before command submission. (Since early reservation object
>>>>> release requires that).
>>>> Yeah it is a bit annoying, e.g. for drm/scheduler I think we'll end up
>>>> with a mempool to make sure it can handle it's allocations.
>>>>
>>>>> But since drivers are already waiting from within shrinkers and I take your
>>>>> word for HMM requiring this,
>>>> Yeah the big trouble is HMM and mmu notifiers. That's the really awkward
>>>> one, the shrinker one is a lot less established.
>>> I really question if HW that needs something like DMA fence should
>>> even be using mmu notifiers - the best use is HW that can fence the
>>> DMA directly without having to get involved with some command stream
>>> processing.
>>>
>>> Or at the very least it should not be a generic DMA fence but a
>>> narrowed completion tied only into the same GPU driver's command
>>> completion processing which should be able to progress without
>>> blocking.
>>>
>>> The intent of notifiers was never to endlessly block while vast
>>> amounts of SW does work.
>>>
>>> Going around and switching everything in a GPU to GFP_ATOMIC seems
>>> like bad idea.
>>>
>>>> I've pinged a bunch of armsoc gpu driver people and ask them how much this
>>>> hurts, so that we have a clear answer. On x86 I don't think we have much
>>>> of a choice on this, with userptr in amd and i915 and hmm work in nouveau
>>>> (but nouveau I think doesn't use dma_fence in there).
>> Soon nouveau will get company. We're working on a recoverable page fault
>> implementation for HMM in amdgpu where we'll need to update page tables
>> using the GPUs SDMA engine and wait for corresponding fences in MMU
>> notifiers.
> Can you pls cc these patches to dri-devel when they show up? Depending
> upon how your hw works there's and endless amount of bad things that
> can happen.

Yes, I'll do that.


>
> Also I think (again depending upon how the hw exactly works) this
> stuff would be a perfect example for the dma_fence annotations.

We have already applied your patch series to our development branch. I
haven't looked into what annotations we'd have to add to our new code yet.


>
> The worst case is if your hw cannot preempt while a hw page fault is
> pending. That means none of the dma_fence will ever signal (the amdkfd
> preempt ctx fences wont, and the classic fences from amdgpu might be
> also stall). At least when you're unlucky and the fence you're waiting
> on somehow (anywhere in its dependency chain really) need the engine
> that's currently blocked waiting for the hw page fault.

Our HW can preempt while handling a page fault, at least on the GPU
generation we're working on now. On other GPUs we haven't included in
our initial effort, we will not be able to preempt while a page fault is
in progress. This is problematic, but that's for reasons related to our
GPU hardware scheduler and unrelated to fences.


>
> That in turn means anything you do in your hw page fault handler is in
> the critical section for dma fence signalling, which has far reaching
> implications.

I'm not sure I agree, at least for KFD. The only place where KFD uses
fences that depend on preemptions is eviction fences. And we can get rid
of those if we can preempt GPU access to specific BOs by invalidating
GPU PTEs. That way we don't need to preempt the GPU queues while a page
fault is in progress. Instead we would create more page faults.

That assumes that we can invalidate GPU PTEs without depending on
fences. We've discussed possible deadlocks due to memory allocations
needed on that code paths for IBs or page tables. We've already
eliminated page table allocations and reservation locks on the PTE
invalidation code path. And we're using a separate scheduler entity so
we can't get stuck behind other IBs that depend on fences. IIRC,
Christian also implemented a separate memory pool for IBs for this code
path.

Regards,
  Felix


> -Daniel
>
>> Regards,
>>   Felix
>>
>>
>>> Right, nor will RDMA ODP.
>>>
>>> Jason
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>

  reply	other threads:[~2020-06-23 18:44 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  8:12 [PATCH 00/18] dma-fence lockdep annotations, round 2 Daniel Vetter
2020-06-04  8:12 ` [PATCH 01/18] mm: Track mmu notifiers in fs_reclaim_acquire/release Daniel Vetter
2020-06-10 12:01   ` Thomas Hellström (Intel)
2020-06-10 12:25     ` [Intel-gfx] " Daniel Vetter
2020-06-10 19:41   ` [PATCH] " Daniel Vetter
2020-06-11 14:29     ` Jason Gunthorpe
2020-06-21 17:42     ` Qian Cai
2020-06-21 18:07       ` Daniel Vetter
2020-06-21 20:01         ` Daniel Vetter
2020-06-21 22:09           ` Qian Cai
2020-06-23 16:17           ` Qian Cai
2020-06-23 22:13             ` Daniel Vetter
2020-06-23 22:29               ` Qian Cai
2020-06-23 22:31       ` Dave Chinner
2020-06-23 22:36         ` Daniel Vetter
2020-06-21 17:00   ` [PATCH 01/18] " Qian Cai
2020-06-21 17:28     ` Daniel Vetter
2020-06-21 17:46       ` Qian Cai
2020-06-04  8:12 ` [PATCH 02/18] dma-buf: minor doc touch-ups Daniel Vetter
2020-06-10 13:07   ` Thomas Hellström (Intel)
2020-06-04  8:12 ` [PATCH 03/18] dma-fence: basic lockdep annotations Daniel Vetter
2020-06-04  8:57   ` Thomas Hellström (Intel)
2020-06-04  9:21     ` Daniel Vetter
     [not found]       ` <159126281827.25109.3992161193069793005@build.alporthouse.com>
2020-06-04  9:36         ` [Intel-gfx] " Daniel Vetter
2020-06-05 13:29   ` [PATCH] " Daniel Vetter
2020-06-05 14:30     ` Thomas Hellström (Intel)
2020-06-11  9:57     ` Maarten Lankhorst
2020-06-10 14:21   ` [Intel-gfx] [PATCH 03/18] " Tvrtko Ursulin
2020-06-10 15:17     ` Daniel Vetter
2020-06-11 10:36       ` Tvrtko Ursulin
2020-06-11 11:29         ` Daniel Vetter
2020-06-11 14:29           ` Tvrtko Ursulin
2020-06-11 15:03             ` Daniel Vetter
     [not found]   ` <159186243606.1506.4437341616828968890@build.alporthouse.com>
2020-06-11  8:44     ` Dave Airlie
2020-06-11  9:01       ` [Intel-gfx] " Daniel Stone
     [not found]         ` <159255511144.7737.12635440776531222029@build.alporthouse.com>
2020-06-19  8:51           ` Daniel Vetter
     [not found]             ` <159255801588.7737.4425728073225310839@build.alporthouse.com>
2020-06-19  9:43               ` Daniel Vetter
     [not found]                 ` <159257233754.7737.17318605310513355800@build.alporthouse.com>
2020-06-22  9:16                   ` Daniel Vetter
2020-07-09  7:29                 ` Daniel Stone
2020-07-09  8:01                   ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 04/18] dma-fence: prime " Daniel Vetter
2020-06-11  7:30   ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2020-06-11  8:34     ` Daniel Vetter
2020-06-11 14:15       ` Jason Gunthorpe
2020-06-11 23:35         ` Felix Kuehling
2020-06-12  5:11           ` Daniel Vetter
2020-06-19 18:13           ` Jerome Glisse
2020-06-23  7:39           ` Daniel Vetter
2020-06-23 18:44             ` Felix Kuehling [this message]
2020-06-23 19:02               ` Daniel Vetter
2020-06-16 12:07         ` Daniel Vetter
2020-06-16 14:53           ` Jason Gunthorpe
2020-06-17  7:57             ` Daniel Vetter
2020-06-17 15:29               ` Jason Gunthorpe
2020-06-18 14:42                 ` Daniel Vetter
2020-06-17  6:48           ` Daniel Vetter
2020-06-17 15:28             ` Jason Gunthorpe
2020-06-18 15:00               ` Daniel Vetter
2020-06-18 17:23                 ` Jason Gunthorpe
2020-06-19  7:22                   ` Daniel Vetter
2020-06-19 11:39                     ` Jason Gunthorpe
2020-06-19 15:06                       ` Daniel Vetter
2020-06-19 15:15                         ` Jason Gunthorpe
2020-06-19 16:19                           ` Daniel Vetter
2020-06-19 17:23                             ` Jason Gunthorpe
2020-06-19 18:09                               ` Jerome Glisse
2020-06-19 18:18                                 ` Jason Gunthorpe
2020-06-19 19:48                                   ` Felix Kuehling
2020-06-19 19:55                                     ` Jason Gunthorpe
2020-06-19 20:03                                       ` Felix Kuehling
2020-06-19 20:31                                       ` Jerome Glisse
2020-06-22 11:46                                         ` Jason Gunthorpe
2020-06-22 20:15                                           ` Jerome Glisse
2020-06-23  0:02                                             ` Jason Gunthorpe
2020-06-19 20:10                                   ` Jerome Glisse
2020-06-19 20:43                                     ` Daniel Vetter
2020-06-19 20:59                                       ` Jerome Glisse
2020-06-23  0:05                                     ` Jason Gunthorpe
2020-06-19 19:11                                 ` Alex Deucher
2020-06-19 19:30                                   ` Felix Kuehling
2020-06-19 19:40                                     ` Jerome Glisse
2020-06-19 19:51                                     ` Jason Gunthorpe
2020-06-04  8:12 ` [PATCH 05/18] drm/vkms: Annotate vblank timer Daniel Vetter
2020-06-04  8:12 ` [PATCH 06/18] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-06-04  8:12 ` [PATCH 07/18] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-06-04  8:12 ` [PATCH 08/18] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-06-23 10:51   ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 09/18] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-06-04  8:12 ` [PATCH 10/18] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-06-04  8:12 ` [PATCH 11/18] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-06-04  8:12 ` [PATCH 12/18] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-06-04  8:12 ` [PATCH 13/18] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-06-05  8:30   ` Pierre-Eric Pelloux-Prayer
2020-06-05 12:41     ` Daniel Vetter
2020-06-04  8:12 ` [PATCH 14/18] drm/scheduler: use dma-fence annotations in tdr work Daniel Vetter
2020-06-04  8:12 ` [PATCH 15/18] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-06-04  8:12 ` [PATCH 16/18] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-06-04  8:12 ` [PATCH 17/18] drm/amdgpu: gpu recovery does full modesets Daniel Vetter
2020-06-04  8:12 ` [PATCH 18/18] drm/i915: Annotate dma_fence_work Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=99758c09-262a-e9a1-bf65-5702b35b4388@amd.com \
    --to=felix.kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mika.kuoppala@intel.com \
    --cc=thomas.hellstrom@intel.com \
    --cc=thomas_os@shipmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).