linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Asahi Lina" <lina@asahilina.net>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	Karol Herbst <kherbst@redhat.com>,
	Ella Stanforth <ella@iglunix.org>,
	Faith Ekstrand <faith.ekstrand@collabora.com>,
	Mary <mary@mary.zone>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
	asahi@lists.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback
Date: Wed, 8 Mar 2023 16:30:52 +0100	[thread overview]
Message-ID: <a39c6b40-f190-002d-ae1c-8b58c6442df2@amd.com> (raw)
In-Reply-To: <9f76bb68-b462-b138-d0ad-d27c972530d4@asahilina.net>

Am 08.03.23 um 15:53 schrieb Asahi Lina:
> [SNIP]
>> The background is that core memory management requires that signaling a
>> fence only depends on signaling other fences and hardware progress and
>> nothing else. Otherwise you immediately run into problems because of
>> circle dependencies or what we call infinite fences.
> And hardware progress is exactly the only dependency here...

Well then you should have a fence for that hardware progress.

>> Jason Ekstrand gave a create presentation on that problem a few years
>> ago on LPC. I strongly suggest you google that one up.
> Faith Ekstrand (it looks like you mistyped that name...)

My fault I was really just mistyping that :)

>   is the person
> who proposed that I should use drm_sched in this way (see below), we've
> had a few private meetings about this design ^^
>
>>> You can see the driver implementation of that callback in
>>> drivers/gpu/drm/asahi/queue/mod.rs (QueueJob::can_run()), which then
>>> calls into drivers/gpu/drm/asahi/workqueue.rs (Job::can_submit()) that
>>> does the actual available slot count checks.
>>>
>>> The can_run_job logic is written to mirror the hw_submission_limit logic
>>> (just a bit later in the sched main loop since we need to actually pick
>>> a job to do the check), and just like for that case, completion of any
>>> job in the same scheduler will cause another run of the main loop and
>>> another check (which is exactly what we want here).
>> Yeah and that hw_submission_limit is based on a fence signaling again.
> I don't think so...? It's just an atomic that gets checked in
> drm_sched_ready(). There are no extra fences involved (other than the
> job completion fences that trigger another scheduler run). The idea is
> that when the hardware queue makes forward progress you check against
> the limit again and submit more jobs as needed. I'm doing the same exact
> thing, I'm just using more complex logic for the notion of in-flight
> queue limits!

Then why can't you express that logic in a dependency fence?

>> When you have some firmware limitation that a job needs resources which
>> are currently in use by other submissions then those other submissions
>> have fences as well and you can return those in the prepare_job callback.
>>
>> If those other submissions don't have fences, then you have a major
>> design problem inside your driver and we need to get back to square one
>> and talk about that dependency handling.
> I think we have a disconnect in our views of what is going on here...
>
> This hardware has firmware-side scheduling with an arbitrary (as far as
> I know) number of queues. There is one scheduler instance and one entity
> per userspace queue (not global!). These queues process jobs in some
> logical sequence, though at the firmware level they get split into up to
> three queues each (and there is some parallelism allowed). The
> limitation here is in the number of in-flight jobs per firmware queue,
> not global.

So far I'm familiar with that design.

> There is no way for things to deadlock. If jobs have been submitted to
> the firmware queue, that means their dependencies were signaled already.
> Jobs have intra-job dependencies via driver barriers (which drm_sched
> knows nothing about), but the submission code in the driver guarantees
> that they are deadlock-free since you can only barrier on past commands,
> which by definition submit first.
>
> If a firmware queue is full, drm_sched blocks. Since it is full, that
> means it will run those commands (since they have no outside
> dependencies and they are already queued and ready to run by the
> firmware), eventually space will be freed, and each time a job completes
> drm_sched will do the can_run_job check again and decide whether to run
> a new job.
>
> Since the firmware queues contain commands which only have past-facing
> barriers on other already submitted commands, by definition they will
> become empty at some point as long as the firmware is making forward
> progress. And therefore, by definition, can_run_job will eventually
> return true at some point after a job completion fence is signaled (the
> one for the last job submitted prior). There is a check in the driver to
> ensure that we do not allow submissions which, by themselves, would
> exceed the queued command limit (we actually just limit to 64 commands
> overall right now, which is conservative but seems reasonable given the
> 128-per-firmware-queue limit).

Well then again why don't you give that fence out as dependency? Is it 
because the scheduler tries to optimize those away?

> I get the feeling that you are conflating pending jobs with submitted
> jobs. This isn't about how many jobs you can have pending in drm_sched
> before running them or anything like that. Of course, at that point,
> arbitrary dependencies come into play and you can end up with deadlocks
> on dependency fences. But that's not the case here. What can_run_job is
> waiting on is guaranteed to make forward progress.

I see that we have a disconnection here. As far as I can see you can use 
the can_run callback in only three ways:

1. To check for some userspace dependency (We don't need to discuss 
that, it's evil and we both know it).

2. You check for some hw resource availability. Similar to VMID on 
amdgpu hw.

     This is what I think you do here (but I might be wrong). But this 
would be extremely problematic because you can then live lock.
     E.g. queue A keeps submitting jobs which take only a few resources 
and by doing so delays submitting jobs from queue B indefinitely.

3. You have an intra queue dependency. E.g. you have jobs which take X 
amount of resources, you can submit only to a specific limit.
     But in this case you should be able to return fences from the same 
queue as dependency and won't need that callback.

     We would just need to adjust drm_sched_entity_add_dependency_cb() a 
bit because dependencies from the same queue are currently filtered out 
because it assumes a pipeline nature of submission (e.g. previous 
submissions are finished before new submissions start).

>>> This case (potentially scheduling more than the FW job limit) is rare
>>> but handling it is necessary, since otherwise the entire job
>>> completion/tracking logic gets screwed up on the firmware end and queues
>>> end up stuck (I've managed to trigger this before).
>> Actually that's a pretty normal use case. I've have rejected similar
>> requirements like this before as well.
>>
>> For an example how this can work see amdgpu_job_prepare_job():
>> https://elixir.bootlin.com/linux/v6.3-rc1/source/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c#L251
>>
>> The gang submit gives and example of a global fence lock and the VMIDs
>> are an example of a global shared firmware resource.
> But the resource can_run_job is checking on isn't globally shared! It's
> specific to this scheduler instance, just like hw_submission_limit is,
> so as long as the firmware behind the scheduler is making forward
> progress, the resource will be guaranteed to be freed until another job
> can run.

Well either it should be globally shared because it is a shared resource 
(similar to our VMID or gangs) or it is an intra queue limitation in 
which case you could just use the fences previously submitted on the 
queue as dependency.

> I actually know I have a different theoretical deadlock issue along
> these lines in the driver because right now we grab actually global
> resources (including a VMID) before job submission to drm_sched. This is
> a known issue, and to fix it without reducing performance I need to
> introduce some kind of "patching/fixup" system for firmware commands
> (because we need to inject those identifiers in dozens of places, but we
> don't want to construct those commands from scratch at job run time
> because that introduces latency at the wrong time and makes error
> handling/validation more complicated and error-prone), and that is
> exactly what should happen in prepare_job, as you say. And yes, at that
> point that should use fences to block when those resources are
> exhausted. But that's a different discussion we should have when
> reviewing the driver, it has nothing to do with the DRM abstractions nor
> the can_run_job callback I'm adding here nor the firmware queue length
> limit issue! (And also the global hardware devices are plentiful enough
> that I would be very surprised if anyone ever deadlocks it in practice
> even with the current code, so I honestly don't think that should be a
> blocker for driver submission either, I can and will fix it later...)

Well this is what I thought about those problems in amdgpu as well and 
it totally shipwrecked.

We still have memory allocations in the VMID code path which I'm still 
not sure how to remove.

Regards,
Christian.

>
> ~~ Lina


  reply	other threads:[~2023-03-08 15:31 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 14:25 [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 01/18] rust: drm: ioctl: Add DRM ioctl abstraction Asahi Lina
2023-03-07 14:48   ` Karol Herbst
2023-03-07 14:51     ` Karol Herbst
2023-03-07 15:32   ` Maíra Canal
2023-03-09  5:32     ` Asahi Lina
2023-03-09  6:15       ` Dave Airlie
2023-03-09 12:09         ` Maíra Canal
2023-03-07 17:34   ` Björn Roy Baron
2023-03-09  6:04     ` Asahi Lina
2023-03-09 20:24       ` Faith Ekstrand
2023-03-09 20:39         ` Karol Herbst
2023-03-10  6:21           ` Asahi Lina
2023-04-13  9:23   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions Asahi Lina
2023-03-07 18:19   ` Björn Roy Baron
2023-03-09  6:10     ` Asahi Lina
2023-03-10 18:56   ` Boqun Feng
2023-03-11  5:41   ` Boqun Feng
2023-04-05 17:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 03/18] rust: drm: file: Add File abstraction Asahi Lina
2023-03-09 21:16   ` Faith Ekstrand
2023-03-09 22:16     ` Asahi Lina
2023-03-13 17:49       ` Faith Ekstrand
2023-03-14  2:07         ` Boqun Feng
2023-04-05 11:25           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 04/18] rust: drm: gem: Add GEM object abstraction Asahi Lina
2023-04-05 11:08   ` Daniel Vetter
2023-04-05 11:19     ` Miguel Ojeda
2023-04-05 11:22       ` Daniel Vetter
2023-04-05 12:32         ` Miguel Ojeda
2023-04-05 12:36           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 05/18] drm/gem-shmem: Export VM ops functions Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 06/18] rust: drm: gem: shmem: Add DRM shmem helper abstraction Asahi Lina
2023-03-08 13:38   ` Maíra Canal
2023-03-09  5:25     ` Asahi Lina
2023-03-09 11:47       ` Maíra Canal
2023-03-09 14:16         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 07/18] rust: drm: mm: Add DRM MM Range Allocator abstraction Asahi Lina
2023-04-06 14:15   ` Daniel Vetter
2023-04-06 15:28     ` Miguel Ojeda
2023-04-06 15:45       ` Daniel Vetter
2023-04-06 17:19         ` Miguel Ojeda
2023-04-06 15:53     ` Asahi Lina
2023-04-06 16:13       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 16:39         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 08/18] rust: dma_fence: Add DMA Fence abstraction Asahi Lina
2023-04-05 11:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 09/18] rust: drm: syncobj: Add DRM Sync Object abstraction Asahi Lina
2023-04-05 12:33   ` Daniel Vetter
2023-04-06 16:04     ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback Asahi Lina
2023-03-08  8:46   ` Christian König
2023-03-08  9:41     ` Asahi Lina
2023-03-08 10:00       ` Christian König
2023-03-08 14:53         ` Asahi Lina
2023-03-08 15:30           ` Christian König [this message]
2023-03-08 16:44             ` Asahi Lina
2023-03-08 17:57               ` Christian König
2023-03-08 19:05                 ` Asahi Lina
2023-03-08 19:12                   ` Christian König
2023-03-08 19:45                     ` Asahi Lina
2023-03-08 20:14                       ` Christian König
2023-03-09  6:30                         ` Asahi Lina
2023-03-09  8:05                           ` Christian König
2023-03-09  9:14                             ` Asahi Lina
2023-03-09 18:50                               ` Faith Ekstrand
2023-03-10  9:16                                 ` Asahi Lina
2023-03-08 12:39     ` Karol Herbst
2023-03-08 13:47       ` Christian König
2023-03-08 14:43         ` Karol Herbst
2023-03-08 15:02           ` Christian König
2023-03-08 15:19             ` Karol Herbst
2023-03-16 13:40               ` Daniel Vetter
2023-04-05 13:40   ` Daniel Vetter
2023-04-05 14:14     ` Christian König
2023-04-05 14:21       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-03-08  9:57   ` Maarten Lankhorst
2023-03-08 10:03     ` Christian König
2023-03-08 15:18       ` Asahi Lina
2023-03-08 15:42         ` Christian König
2023-03-08 17:32           ` Asahi Lina
2023-03-08 18:12             ` Christian König
2023-03-08 19:37               ` Asahi Lina
2023-03-09  8:42                 ` Christian König
2023-03-09  9:43                   ` Asahi Lina
2023-03-09 11:47                     ` Christian König
2023-03-09 13:48                       ` Asahi Lina
2023-03-09 19:59                     ` Faith Ekstrand
2023-03-10  9:58                       ` Asahi Lina
2023-03-13 20:11                         ` Faith Ekstrand
2023-03-08 17:39           ` alyssa
2023-03-08 17:44             ` Asahi Lina
2023-03-08 18:13             ` Christian König
2023-04-05 13:52   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction Asahi Lina
2023-04-05 15:43   ` Daniel Vetter
2023-04-05 19:29     ` Daniel Vetter
2023-04-18  8:45       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 13/18] drm/gem: Add a flag to control whether objects can be exported Asahi Lina
2023-04-05 14:55   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 14/18] rust: drm: gem: Add set_exportable() method Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 15/18] drm/asahi: Add the Asahi driver UAPI [DO NOT MERGE] Asahi Lina
2023-03-07 15:28   ` Karol Herbst
2023-03-07 14:25 ` [PATCH RFC 16/18] rust: bindings: Bind the Asahi DRM UAPI Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 17/18] rust: macros: Add versions macro Asahi Lina
2023-03-07 16:17 ` [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
     [not found] ` <20230307-rust-drm-v1-18-917ff5bc80a8@asahilina.net>
2023-04-05 14:44   ` [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs Daniel Vetter
2023-04-06  5:02     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:25       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 13:32         ` Asahi Lina
2023-04-06 13:54           ` Daniel Vetter
     [not found]   ` <ZC2HtBOaoUAzVCVH@phenom.ffwll.local>
2023-04-06  4:44     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:26         ` Daniel Vetter
2023-04-06 10:42       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 11:55       ` Daniel Vetter
2023-04-06 13:15         ` Asahi Lina
2023-04-06 13:48           ` Daniel Vetter
2023-04-06 15:19             ` Asahi Lina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a39c6b40-f190-002d-ae1c-8b58c6442df2@amd.com \
    --to=christian.koenig@amd.com \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=daniel@ffwll.ch \
    --cc=dave.hansen@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ella@iglunix.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=gary@garyguo.net \
    --cc=jarkko@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=lina@asahilina.net \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mary@mary.zone \
    --cc=mripard@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=wedsonaf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).