linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Asahi Lina" <lina@asahilina.net>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	Karol Herbst <kherbst@redhat.com>,
	Ella Stanforth <ella@iglunix.org>,
	Faith Ekstrand <faith.ekstrand@collabora.com>,
	Mary <mary@mary.zone>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
	asahi@lists.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback
Date: Wed, 8 Mar 2023 18:57:21 +0100	[thread overview]
Message-ID: <b0aa78b2-b432-200a-8953-a80c462fa6ee@amd.com> (raw)
In-Reply-To: <4bbfc1a3-cfc3-87f4-897b-b6637bac3bd0@asahilina.net>

Am 08.03.23 um 17:44 schrieb Asahi Lina:
> On 09/03/2023 00.30, Christian König wrote:
>> Am 08.03.23 um 15:53 schrieb Asahi Lina:
>>> [SNIP]
>>>> The background is that core memory management requires that signaling a
>>>> fence only depends on signaling other fences and hardware progress and
>>>> nothing else. Otherwise you immediately run into problems because of
>>>> circle dependencies or what we call infinite fences.
>>> And hardware progress is exactly the only dependency here...
>> Well then you should have a fence for that hardware progress.
> I do, it's the prior job hardware completion fences that drm_sched
> already knows about!
>
> Yes, I could return those in the prepare callback, it just means I need
> to start stashing fence references in the underlying firmware job queue
> command objects so I can find out what is the oldest pending fence is,
> and return it when a queue is full. As long as drm_sched doesn't mind if
> I keep giving it fences (since multiple commands can have to complete
> before there is space) or the occasional already signaled fence (because
> this process is inherently racy), it should work fine.

Well this handling is intentional and necessary, but see below for a 
more in deep explanation.

> If you think this is the better way, I'll do it that way and drop this
> patch. It just seemed simpler to do it with another callback, since
> drm_sched is already tracking those fences and doing a hardware queue
> limit check anyway, and that way I can avoid tracking the fences down
> into the hardware queue code... *

Well it's not the better way, it's the only way that works.

I have to admit that my bet on your intentions was wrong, but even that 
use case doesn't work correctly.

See when your callback returns false it is perfectly possible that all 
hw fences are signaled between returning that information and processing it.

The result would be that the scheduler goes to sleep and never wakes up 
again.

That's why we have that rule that all dependencies need to be expressed 
by those dma_fence objects, cause those are design with such races in mind.

> (But I still maintain what I'm trying to do here is entirely correct and
> deadlock-free! If you prefer I use prepare_job and return prior job
> fences from that instead, that's very different from NAKing the patch
> saying it's broken...)

As I said we exercised those ideas before and yes this approach here 
came up before as well and no it doesn't work.

> * If you're wondering how the fences get signaled at all then: callback
> closures that capture a reference to the fence when firmware commands
> are constructed and submitted. I know, I know, fancy Rust stuff... ^^
> If you'd rather have me use the fences for the blocking, I'll probably
> just drop the signaling bit from the closures so we don't need to keep
> two redundant fence references in different places per command. I still
> need the closures for command completion processing though, since I use
> them to process statistics too...
>
>> I see that we have a disconnection here. As far as I can see you can use
>> the can_run callback in only three ways:
>>
>> 1. To check for some userspace dependency (We don't need to discuss
>> that, it's evil and we both know it).
>>
>> 2. You check for some hw resource availability. Similar to VMID on
>> amdgpu hw.
>>
>>       This is what I think you do here (but I might be wrong).
> It isn't... I agree, it would be problematic. It doesn't make any sense
> to check for global resources this way, not just because you might
> deadlock but also because there might be nothing to signal to the
> scheduler that a resource was freed at all once it is!
>
>> But this
>> would be extremely problematic because you can then live lock.
>>       E.g. queue A keeps submitting jobs which take only a few resources
>> and by doing so delays submitting jobs from queue B indefinitely.
> This particular issue aside, fairness in global resource allocation is a
> conversation I'd love to have! Right now the driver doesn't try to
> ensure that, a queue can easily monopolize certain hardware resources
> (though one queue can only monopolize one of each, so you'd need
> something like 63 queues with 63 distinct VMs all submitting
> free-running jobs back to back in order to starve other queues of
> resources forever). For starters, one thing I'm thinking of doing is
> reserving certain subsets of hardware resources for queues with a given
> priority, so you can at least guarantee forward progress of
> higher-priority queues when faced with misbehaving lower-priority
> queues. But if we want to guarantee proper fairness, I think I'll have
> to start doing things like switching to a CPU-roundtrip submission model
> when resources become scarce (to guarantee that queues actually release
> the resources once in a while) and then figure out how to add fairness
> to the allocation code...
>
> But let's have that conversation when we talk about the driver (or maybe
> on IRC or something?), right now I'm more interested in getting the
> abstractions reviewed ^^

Well that stuff is highly problematic as well. The fairness aside you 
risk starvation which in turn breaks the guarantee of forward progress.

In this particular case you can catch this with a timeout for the hw 
operation, but you should consider blocking that from the sw side as well.

>> 3. You have an intra queue dependency. E.g. you have jobs which take X
>> amount of resources, you can submit only to a specific limit.
>>       But in this case you should be able to return fences from the same
>> queue as dependency and won't need that callback.
> Yes, I can do this. I can just do the same check can_run_job() does and
> if it fails, pick the oldest job in the full firmware queue and return
> its fence (it just means I need to keep track of those fences there, as
> I said above).
>
>>       We would just need to adjust drm_sched_entity_add_dependency_cb() a
>> bit because dependencies from the same queue are currently filtered out
>> because it assumes a pipeline nature of submission (e.g. previous
>> submissions are finished before new submissions start).
> Actually that should be fine, because I'd be returning the underlying
> hardware completion fences (what the run() callback returns) which the
> driver owns, and wouldn't be recognized as belonging to the sched.
>
>>> I actually know I have a different theoretical deadlock issue along
>>> these lines in the driver because right now we grab actually global
>>> resources (including a VMID) before job submission to drm_sched. This is
>>> a known issue, and to fix it without reducing performance I need to
>>> introduce some kind of "patching/fixup" system for firmware commands
>>> (because we need to inject those identifiers in dozens of places, but we
>>> don't want to construct those commands from scratch at job run time
>>> because that introduces latency at the wrong time and makes error
>>> handling/validation more complicated and error-prone), and that is
>>> exactly what should happen in prepare_job, as you say. And yes, at that
>>> point that should use fences to block when those resources are
>>> exhausted. But that's a different discussion we should have when
>>> reviewing the driver, it has nothing to do with the DRM abstractions nor
>>> the can_run_job callback I'm adding here nor the firmware queue length
>>> limit issue! (And also the global hardware devices are plentiful enough
>>> that I would be very surprised if anyone ever deadlocks it in practice
>>> even with the current code, so I honestly don't think that should be a
>>> blocker for driver submission either, I can and will fix it later...)
>> Well this is what I thought about those problems in amdgpu as well and
>> it totally shipwrecked.
>>
>> We still have memory allocations in the VMID code path which I'm still
>> not sure how to remove.
> We don't even have a shrinker yet, and I'm sure that's going to be a lot
> of fun when we add it too... but yes, if we can't do any memory
> allocations in some of these callbacks (is this documented anywhere?),
> that's going to be interesting...

Yes, that is all part of the dma_fence documentation. It's just 
absolutely not obvious what all this means.

> It's not all bad news though! All memory allocations are fallible in
> kernel Rust (and therefore explicit, and also failures have to be
> explicitly handled or propagated), so it's pretty easy to point out
> where they are, and there are already discussions of higher-level
> tooling to enforce rules like that (and things like wait contexts).
> Also, Rust makes it a lot easier to refactor code in general and not be
> scared that you're going to regress everything, so I'm not really
> worried if I need to turn a chunk of the driver on its head to solve
> some of these problems in the future ^^ (I already did that when I
> switched it from the "demo" synchronous submission model to the proper
> explicit sync + fences one.)

Yeah, well the problem isn't that you run into memory allocation failure.

The problem is rather something like this:
1. You try to allocate memory to signal your fence.
2. This memory allocation can't be fulfilled and goes to sleep to wait 
for reclaim.
3. On another CPU reclaim is running and through the general purpose 
shrinker, page fault or MMU notifier ends up wait for your dma_fence.

You don't even need to implement the shrinker for this to go boom 
extremely easy.

So everything involved with signaling the fence can allocate memory only 
with GFP_ATOMIC and only if you absolutely have to.

Christian.


>
> ~~ Lina


  reply	other threads:[~2023-03-08 17:57 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 14:25 [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 01/18] rust: drm: ioctl: Add DRM ioctl abstraction Asahi Lina
2023-03-07 14:48   ` Karol Herbst
2023-03-07 14:51     ` Karol Herbst
2023-03-07 15:32   ` Maíra Canal
2023-03-09  5:32     ` Asahi Lina
2023-03-09  6:15       ` Dave Airlie
2023-03-09 12:09         ` Maíra Canal
2023-03-07 17:34   ` Björn Roy Baron
2023-03-09  6:04     ` Asahi Lina
2023-03-09 20:24       ` Faith Ekstrand
2023-03-09 20:39         ` Karol Herbst
2023-03-10  6:21           ` Asahi Lina
2023-04-13  9:23   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions Asahi Lina
2023-03-07 18:19   ` Björn Roy Baron
2023-03-09  6:10     ` Asahi Lina
2023-03-10 18:56   ` Boqun Feng
2023-03-11  5:41   ` Boqun Feng
2023-04-05 17:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 03/18] rust: drm: file: Add File abstraction Asahi Lina
2023-03-09 21:16   ` Faith Ekstrand
2023-03-09 22:16     ` Asahi Lina
2023-03-13 17:49       ` Faith Ekstrand
2023-03-14  2:07         ` Boqun Feng
2023-04-05 11:25           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 04/18] rust: drm: gem: Add GEM object abstraction Asahi Lina
2023-04-05 11:08   ` Daniel Vetter
2023-04-05 11:19     ` Miguel Ojeda
2023-04-05 11:22       ` Daniel Vetter
2023-04-05 12:32         ` Miguel Ojeda
2023-04-05 12:36           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 05/18] drm/gem-shmem: Export VM ops functions Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 06/18] rust: drm: gem: shmem: Add DRM shmem helper abstraction Asahi Lina
2023-03-08 13:38   ` Maíra Canal
2023-03-09  5:25     ` Asahi Lina
2023-03-09 11:47       ` Maíra Canal
2023-03-09 14:16         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 07/18] rust: drm: mm: Add DRM MM Range Allocator abstraction Asahi Lina
2023-04-06 14:15   ` Daniel Vetter
2023-04-06 15:28     ` Miguel Ojeda
2023-04-06 15:45       ` Daniel Vetter
2023-04-06 17:19         ` Miguel Ojeda
2023-04-06 15:53     ` Asahi Lina
2023-04-06 16:13       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 16:39         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 08/18] rust: dma_fence: Add DMA Fence abstraction Asahi Lina
2023-04-05 11:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 09/18] rust: drm: syncobj: Add DRM Sync Object abstraction Asahi Lina
2023-04-05 12:33   ` Daniel Vetter
2023-04-06 16:04     ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback Asahi Lina
2023-03-08  8:46   ` Christian König
2023-03-08  9:41     ` Asahi Lina
2023-03-08 10:00       ` Christian König
2023-03-08 14:53         ` Asahi Lina
2023-03-08 15:30           ` Christian König
2023-03-08 16:44             ` Asahi Lina
2023-03-08 17:57               ` Christian König [this message]
2023-03-08 19:05                 ` Asahi Lina
2023-03-08 19:12                   ` Christian König
2023-03-08 19:45                     ` Asahi Lina
2023-03-08 20:14                       ` Christian König
2023-03-09  6:30                         ` Asahi Lina
2023-03-09  8:05                           ` Christian König
2023-03-09  9:14                             ` Asahi Lina
2023-03-09 18:50                               ` Faith Ekstrand
2023-03-10  9:16                                 ` Asahi Lina
2023-03-08 12:39     ` Karol Herbst
2023-03-08 13:47       ` Christian König
2023-03-08 14:43         ` Karol Herbst
2023-03-08 15:02           ` Christian König
2023-03-08 15:19             ` Karol Herbst
2023-03-16 13:40               ` Daniel Vetter
2023-04-05 13:40   ` Daniel Vetter
2023-04-05 14:14     ` Christian König
2023-04-05 14:21       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-03-08  9:57   ` Maarten Lankhorst
2023-03-08 10:03     ` Christian König
2023-03-08 15:18       ` Asahi Lina
2023-03-08 15:42         ` Christian König
2023-03-08 17:32           ` Asahi Lina
2023-03-08 18:12             ` Christian König
2023-03-08 19:37               ` Asahi Lina
2023-03-09  8:42                 ` Christian König
2023-03-09  9:43                   ` Asahi Lina
2023-03-09 11:47                     ` Christian König
2023-03-09 13:48                       ` Asahi Lina
2023-03-09 19:59                     ` Faith Ekstrand
2023-03-10  9:58                       ` Asahi Lina
2023-03-13 20:11                         ` Faith Ekstrand
2023-03-08 17:39           ` alyssa
2023-03-08 17:44             ` Asahi Lina
2023-03-08 18:13             ` Christian König
2023-04-05 13:52   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction Asahi Lina
2023-04-05 15:43   ` Daniel Vetter
2023-04-05 19:29     ` Daniel Vetter
2023-04-18  8:45       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 13/18] drm/gem: Add a flag to control whether objects can be exported Asahi Lina
2023-04-05 14:55   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 14/18] rust: drm: gem: Add set_exportable() method Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 15/18] drm/asahi: Add the Asahi driver UAPI [DO NOT MERGE] Asahi Lina
2023-03-07 15:28   ` Karol Herbst
2023-03-07 14:25 ` [PATCH RFC 16/18] rust: bindings: Bind the Asahi DRM UAPI Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 17/18] rust: macros: Add versions macro Asahi Lina
2023-03-07 16:17 ` [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
     [not found] ` <20230307-rust-drm-v1-18-917ff5bc80a8@asahilina.net>
2023-04-05 14:44   ` [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs Daniel Vetter
2023-04-06  5:02     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:25       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 13:32         ` Asahi Lina
2023-04-06 13:54           ` Daniel Vetter
     [not found]   ` <ZC2HtBOaoUAzVCVH@phenom.ffwll.local>
2023-04-06  4:44     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:26         ` Daniel Vetter
2023-04-06 10:42       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 11:55       ` Daniel Vetter
2023-04-06 13:15         ` Asahi Lina
2023-04-06 13:48           ` Daniel Vetter
2023-04-06 15:19             ` Asahi Lina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b0aa78b2-b432-200a-8953-a80c462fa6ee@amd.com \
    --to=christian.koenig@amd.com \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=daniel@ffwll.ch \
    --cc=dave.hansen@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ella@iglunix.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=gary@garyguo.net \
    --cc=jarkko@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=lina@asahilina.net \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mary@mary.zone \
    --cc=mripard@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=wedsonaf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).