linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Daniel Vetter" <daniel@ffwll.ch>
Cc: DRI Development <dri-devel@lists.freedesktop.org>,
	Daniel Stone <daniels@collabora.com>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK" 
	<linaro-mm-sig@lists.linaro.org>,
	Steve Pronovost <spronovo@microsoft.com>,
	Daniel Vetter <daniel.vetter@intel.com>,
	Jason Ekstrand <jason@jlekstrand.net>,
	Jesse Natalie <jenatali@microsoft.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	Thomas Hellstrom <thomas.hellstrom@intel.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK" 
	<linux-media@vger.kernel.org>,
	Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea
Date: Tue, 21 Jul 2020 15:59:09 +0200	[thread overview]
Message-ID: <f8f73b9f-ce8d-ea02-7caa-d50b75b72809@amd.com> (raw)
In-Reply-To: <ae4e4188-39e6-ec41-c11d-91e9211b4d3a@shipmail.org>

Am 21.07.20 um 12:47 schrieb Thomas Hellström (Intel):
>
> On 7/21/20 11:50 AM, Daniel Vetter wrote:
>> On Tue, Jul 21, 2020 at 11:38 AM Thomas Hellström (Intel)
>> <thomas_os@shipmail.org> wrote:
>>>
>>> On 7/21/20 10:55 AM, Christian König wrote:
>>>> Am 21.07.20 um 10:47 schrieb Thomas Hellström (Intel):
>>>>> On 7/21/20 9:45 AM, Christian König wrote:
>>>>>> Am 21.07.20 um 09:41 schrieb Daniel Vetter:
>>>>>>> On Mon, Jul 20, 2020 at 01:15:17PM +0200, Thomas Hellström (Intel)
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 7/9/20 2:33 PM, Daniel Vetter wrote:
>>>>>>>>> Comes up every few years, gets somewhat tedious to discuss, let's
>>>>>>>>> write this down once and for all.
>>>>>>>>>
>>>>>>>>> What I'm not sure about is whether the text should be more
>>>>>>>>> explicit in
>>>>>>>>> flat out mandating the amdkfd eviction fences for long running
>>>>>>>>> compute
>>>>>>>>> workloads or workloads where userspace fencing is allowed.
>>>>>>>> Although (in my humble opinion) it might be possible to completely
>>>>>>>> untangle
>>>>>>>> kernel-introduced fences for resource management and dma-fences
>>>>>>>> used for
>>>>>>>> completion- and dependency tracking and lift a lot of restrictions
>>>>>>>> for the
>>>>>>>> dma-fences, including prohibiting infinite ones, I think this
>>>>>>>> makes sense
>>>>>>>> describing the current state.
>>>>>>> Yeah I think a future patch needs to type up how we want to make 
>>>>>>> that
>>>>>>> happen (for some cross driver consistency) and what needs to be
>>>>>>> considered. Some of the necessary parts are already there (with
>>>>>>> like the
>>>>>>> preemption fences amdkfd has as an example), but I think some clear
>>>>>>> docs
>>>>>>> on what's required from both hw, drivers and userspace would be 
>>>>>>> really
>>>>>>> good.
>>>>>> I'm currently writing that up, but probably still need a few days
>>>>>> for this.
>>>>> Great! I put down some (very) initial thoughts a couple of weeks ago
>>>>> building on eviction fences for various hardware complexity levels 
>>>>> here:
>>>>>
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fthomash%2Fdocs%2F-%2Fblob%2Fmaster%2FUntangling%2520dma-fence%2520and%2520memory%2520allocation.odt&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C0af39422c4e744a9303b08d82d637d62%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637309252665326201&amp;sdata=Zk3LVX7bbMpfAMsq%2Fs2jyA0puRQNcjzliJS%2BC7uDLMo%3D&amp;reserved=0 
>>>>>
>>>>>
>>>> I don't think that this will ever be possible.
>>>>
>>>> See that Daniel describes in his text is that indefinite fences are a
>>>> bad idea for memory management, and I think that this is a fixed fact.
>>>>
>>>> In other words the whole concept of submitting work to the kernel
>>>> which depends on some user space interaction doesn't work and never 
>>>> will.
>>> Well the idea here is that memory management will *never* depend on
>>> indefinite fences: As soon as someone waits on a memory manager fence
>>> (be it eviction, shrinker or mmu notifier) it breaks out of any
>>> dma-fence dependencies and /or user-space interaction. The text 
>>> tries to
>>> describe what's required to be able to do that (save for 
>>> non-preemptible
>>> gpus where someone submits a forever-running shader).
>> Yeah I think that part of your text is good to describe how to
>> untangle memory fences from synchronization fences given how much the
>> hw can do.
>>
>>> So while I think this is possible (until someone comes up with a case
>>> where it wouldn't work of course), I guess Daniel has a point in 
>>> that it
>>> won't happen because of inertia and there might be better options.
>> Yeah it's just I don't see much chance for splitting dma-fence itself.

Well that's the whole idea with the timeline semaphores and waiting for 
a signal number to appear.

E.g. instead of doing the wait with the dma_fence we are separating that 
out into the timeline semaphore object.

This not only avoids the indefinite fence problem for the wait before 
signal case in Vulkan, but also prevents userspace to submit stuff which 
can't be processed immediately.

>> That's also why I'm not positive on the "no hw preemption, only
>> scheduler" case: You still have a dma_fence for the batch itself,
>> which means still no userspace controlled synchronization or other
>> form of indefinite batches allowed. So not getting us any closer to
>> enabling the compute use cases people want.

What compute use case are you talking about? I'm only aware about the 
wait before signal case from Vulkan, the page fault case and the KFD 
preemption fence case.

>
> Yes, we can't do magic. As soon as an indefinite batch makes it to 
> such hardware we've lost. But since we can break out while the batch 
> is stuck in the scheduler waiting, what I believe we *can* do with 
> this approach is to avoid deadlocks due to locally unknown 
> dependencies, which has some bearing on this documentation patch, and 
> also to allow memory allocation in dma-fence (not memory-fence) 
> critical sections, like gpu fault- and error handlers without 
> resorting to using memory pools.

Avoiding deadlocks is only the tip of the iceberg here.

When you allow the kernel to depend on user space to proceed with some 
operation there are a lot more things which need consideration.

E.g. what happens when an userspace process which has submitted stuff to 
the kernel is killed? Are the prepared commands send to the hardware or 
aborted as well? What do we do with other processes waiting for that stuff?

How to we do resource accounting? When processes need to block when 
submitting to the hardware stuff which is not ready we have a process we 
can punish for blocking resources. But how is kernel memory used for a 
submission accounted? How do we avoid deny of service attacks here were 
somebody eats up all memory by doing submissions which can't finish?

> But again. I'm not saying we should actually implement this. Better to 
> consider it and reject it than not consider it at all.

Agreed.

Same thing as it turned out with the Wait before Signal for Vulkan, 
initially it looked simpler to do it in the kernel. But as far as I know 
the solution in userspace now works so well that we don't really want 
the pain for a kernel implementation any more.

Christian.

>
> /Thomas
>
>


  reply	other threads:[~2020-07-21 13:59 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 20:12 [PATCH 00/25] dma-fence annotations, round 3 Daniel Vetter
2020-07-07 20:12 ` [PATCH 01/25] dma-fence: basic lockdep annotations Daniel Vetter
2020-07-08 14:57   ` Christian König
2020-07-08 15:12     ` Daniel Vetter
2020-07-08 15:19       ` Alex Deucher
2020-07-08 15:37         ` Daniel Vetter
2020-07-14 11:09           ` Daniel Vetter
2020-07-09  7:32       ` [Intel-gfx] " Daniel Stone
2020-07-09  7:52         ` Daniel Vetter
2020-07-13 16:26     ` Daniel Vetter
2020-07-13 16:39       ` Christian König
2020-07-13 20:31         ` Dave Airlie
2020-07-07 20:12 ` [PATCH 02/25] dma-fence: prime " Daniel Vetter
2020-07-09  8:09   ` Daniel Vetter
2020-07-10 12:43     ` Jason Gunthorpe
2020-07-10 12:48       ` Christian König
2020-07-10 12:54         ` Jason Gunthorpe
2020-07-10 13:01           ` Christian König
2020-07-10 13:48             ` Jason Gunthorpe
2020-07-10 14:02               ` Daniel Vetter
2020-07-10 14:23                 ` Jason Gunthorpe
2020-07-10 20:02                   ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 03/25] dma-buf.rst: Document why idenfinite fences are a bad idea Daniel Vetter
2020-07-09  7:36   ` [Intel-gfx] " Daniel Stone
2020-07-09  8:04     ` Daniel Vetter
2020-07-09 12:11       ` Daniel Stone
2020-07-09 12:31         ` Daniel Vetter
2020-07-09 14:28           ` Christian König
2020-07-09 11:53   ` Christian König
2020-07-09 12:33   ` [PATCH 1/2] dma-buf.rst: Document why indefinite " Daniel Vetter
2020-07-09 12:33     ` [PATCH 2/2] drm/virtio: Remove open-coded commit-tail function Daniel Vetter
2020-07-09 12:48       ` Gerd Hoffmann
2020-07-09 14:05       ` Sam Ravnborg
2020-07-14  9:13         ` Daniel Vetter
2020-08-19 12:43       ` Jiri Slaby
2020-08-19 12:47         ` Jiri Slaby
2020-08-19 13:24         ` Gerd Hoffmann
2020-08-20  6:32           ` Jiri Slaby
2020-08-21  7:01             ` Gerd Hoffmann
2020-07-10 12:30     ` [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea Maarten Lankhorst
2020-07-14 17:46     ` Jason Ekstrand
2020-07-20 11:15     ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2020-07-21  7:41       ` Daniel Vetter
2020-07-21  7:45         ` Christian König
2020-07-21  8:47           ` Thomas Hellström (Intel)
2020-07-21  8:55             ` Christian König
2020-07-21  9:16               ` Daniel Vetter
2020-07-21  9:24                 ` Daniel Vetter
2020-07-21  9:37               ` Thomas Hellström (Intel)
2020-07-21  9:50                 ` Daniel Vetter
2020-07-21 10:47                   ` Thomas Hellström (Intel)
2020-07-21 13:59                     ` Christian König [this message]
2020-07-21 17:46                       ` Thomas Hellström (Intel)
2020-07-21 18:18                         ` Daniel Vetter
2020-07-21 21:42                       ` Dave Airlie
2020-07-21 22:45             ` Dave Airlie
2020-07-22  6:45               ` Thomas Hellström (Intel)
2020-07-22  7:11                 ` Daniel Vetter
2020-07-22  8:05                   ` Thomas Hellström (Intel)
2020-07-22  9:45                     ` Daniel Vetter
2020-07-22 10:31                       ` Thomas Hellström (Intel)
2020-07-22 11:39                         ` Daniel Vetter
2020-07-22 12:22                           ` Thomas Hellström (Intel)
2020-07-22 12:41                             ` Daniel Vetter
2020-07-22 13:12                               ` Thomas Hellström (Intel)
2020-07-22 14:07                                 ` Daniel Vetter
2020-07-22 14:23                                   ` Christian König
2020-07-22 14:30                                     ` Thomas Hellström (Intel)
2020-07-22 14:35                                       ` Christian König
2020-07-07 20:12 ` [PATCH 04/25] drm/vkms: Annotate vblank timer Daniel Vetter
2020-07-12 22:27   ` Rodrigo Siqueira
2020-07-14  9:57     ` Melissa Wen
2020-07-14  9:59       ` Daniel Vetter
2020-07-14 14:55         ` Melissa Wen
2020-07-14 15:23           ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 05/25] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-07-07 20:12 ` [PATCH 06/25] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-07-07 20:12 ` [PATCH 07/25] drm/komdea: Annotate dma-fence critical section in " Daniel Vetter
2020-07-08  5:17   ` james qian wang (Arm Technology China)
2020-07-14  8:34     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 08/25] drm/malidp: " Daniel Vetter
2020-07-15 12:53   ` Liviu Dudau
2020-07-15 13:51     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 09/25] drm/atmel: Use drm_atomic_helper_commit Daniel Vetter
2020-07-07 20:37   ` Sam Ravnborg
2020-07-07 21:31   ` [PATCH] " Daniel Vetter
2020-07-14  9:55     ` Sam Ravnborg
2020-07-07 20:12 ` [PATCH 10/25] drm/imx: Annotate dma-fence critical section in commit path Daniel Vetter
2020-07-07 20:12 ` [PATCH 11/25] drm/omapdrm: " Daniel Vetter
2020-07-07 20:12 ` [PATCH 12/25] drm/rcar-du: " Daniel Vetter
2020-07-07 23:32   ` Laurent Pinchart
2020-07-14  8:39     ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 13/25] drm/tegra: " Daniel Vetter
2020-07-07 20:12 ` [PATCH 14/25] drm/tidss: " Daniel Vetter
2020-07-08  9:01   ` Jyri Sarha
2020-07-07 20:12 ` [PATCH 15/25] drm/tilcdc: Use standard drm_atomic_helper_commit Daniel Vetter
2020-07-08  9:17   ` Jyri Sarha
2020-07-08  9:27     ` Daniel Vetter
2020-07-08  9:44   ` [PATCH] " Daniel Vetter
2020-07-08 10:21     ` Jyri Sarha
2020-07-08 14:20   ` Daniel Vetter
2020-07-10 11:16     ` Jyri Sarha
2020-07-14  8:32       ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 16/25] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-07-07 20:12 ` [PATCH 17/25] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-07-07 20:12 ` [PATCH 18/25] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-07-07 20:12 ` [PATCH 19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-07-14 10:49   ` Daniel Vetter
2020-07-14 11:40     ` Christian König
2020-07-14 14:31       ` Daniel Vetter
2020-07-15  9:17         ` Christian König
2020-07-15 11:53           ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 20/25] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-07-14 11:12   ` Daniel Vetter
2020-07-07 20:12 ` [PATCH 21/25] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-07-07 20:12 ` [PATCH 22/25] drm/scheduler: use dma-fence annotations in tdr work Daniel Vetter
2020-07-07 20:12 ` [PATCH 23/25] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-07-07 20:12 ` [PATCH 24/25] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-07-07 20:12 ` [PATCH 25/25] drm/amdgpu: gpu recovery does full modesets Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8f73b9f-ce8d-ea02-7caa-d50b75b72809@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=daniel.vetter@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=daniels@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jason@jlekstrand.net \
    --cc=jenatali@microsoft.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mika.kuoppala@intel.com \
    --cc=spronovo@microsoft.com \
    --cc=thomas.hellstrom@intel.com \
    --cc=thomas_os@shipmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).