linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Karol Herbst <kherbst@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>,
	"Asahi Lina" <lina@asahilina.net>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
	"Ella Stanforth" <ella@iglunix.org>,
	"Faith Ekstrand" <faith.ekstrand@collabora.com>,
	Mary <mary@mary.zone>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
	asahi@lists.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback
Date: Thu, 16 Mar 2023 14:40:42 +0100	[thread overview]
Message-ID: <ZBMcWuu+W2zfP64O@phenom.ffwll.local> (raw)
In-Reply-To: <CACO55tuVVnC-_iVmX=SGGVF=0t6kqqr0Cy0pGe3YoppBgUbiXw@mail.gmail.com>

On Wed, Mar 08, 2023 at 04:19:17PM +0100, Karol Herbst wrote:
> On Wed, Mar 8, 2023 at 4:09 PM Christian König <christian.koenig@amd.com> wrote:
> >
> > Am 08.03.23 um 15:43 schrieb Karol Herbst:
> > > [SNIP]
> > > "further"? There was no discussion at all,
> >
> > Yeah, well that is exactly what I wanted to archive.
> >
> > >   you just started off like
> > > that. If you think somebody misses that connection, you can point out
> > > to documentation/videos whatever so the contributor can understand
> > > what's wrong with an approach. You did that, so that's fine. It's just
> > > starting off _any_ discussion with a "Well complete NAK" is terrible
> > > style. I'd feel uncomfortable if that happened to me and I'm sure
> > > there are enough people like that that we should be more reasonable
> > > with our replies. Just.. don't.
> > >
> > > We are all humans here and people react negatively to such things. And
> > > if people do it on purpose it just makes it worse.
> >
> > I completely see your point, I just don't know how to improve it.
> >
> > I don't stop people like this because I want to make them uncomfortable
> > but because I want to prevent further discussions on that topic.
> >
> > In other words how can I make people notice that this is something
> > fundamental while still being polite?

Ask them to improve the docs. Gets them on board, and for bonus point you
- can check they actually get it when you review the doc patch
- get scheduler docs for free
- have an easily pasteable link for next time around instead of just an
  aggressive NAK that helps no one really (aside from getting people
  boiling).

It's not really about being polite but making sure that efficient
communiction happens and that you don't have to repeat yourself. In rare
cases you get to type the docs themself when people are too dense to learn
(like what I had to do with the various dma_fence docs).

> I think a little improvement over this would be to at least wait a few
> replies before resorting to those strong statements. Just before it
> becomes a risk in just wasting time.

See above what I'm trying to do. When the message doesn't sink in as
either a proper doc patch or when linking to the doc patch for next time
around (because let's face it, this entire concept of "dma_fence committed
for execution" is extremely trick, there will be repeations of this
question until we've sunset dma_fence, which is probably decades away).

If the learning does not happen, then it's the time to whack the big
hammer (and if people don't get it, you can escalate to Dave&me, we have
tools to make sure people get the message). But this really should be the
end, not the start of the escalation chain :-)

Cheers, Daniel

> 
> > >>>> This is clearly going against the idea of having jobs only depend on
> > >>>> fences and nothing else which is mandatory for correct memory management.
> > >>>>
> > >>> I'm sure it's all documented and there is a design document on how
> > >>> things have to look like you can point out? Might help to get a better
> > >>> understanding on how things should be.
> > >> Yeah, that's the problematic part. We have documented this very
> > >> extensively:
> > >> https://www.kernel.org/doc/html/v5.9/driver-api/dma-buf.html#indefinite-dma-fences
> > >>
> > >> And both Jason and Daniel gave talks about the underlying problem and
> > > fyi:
> > > s/Jason/Faith/g
> >
> > +1. I wasn't aware of that.
> >
> > >> try to come up with patches to raise warnings when that happens, but
> > >> people still keep coming up with the same idea over and over again.
> > >>
> > > Yes, and we'll have to tell them over and over again. Nothing wrong
> > > with that. That's just part of maintaining such a big subsystem. And
> > > that's definitely not a valid reason to phrase things like above.
> > >
> > >> It's just that the technical relationship between preventing jobs from
> > >> running and with that preventing dma_fences from signaling and the core
> > >> memory management with page faults and shrinkers waiting for those
> > >> fences is absolutely not obvious.
> > >>
> > >> We had at least 10 different teams from different companies falling into
> > >> the same trap already and either the patches were rejected of hand or
> > >> had to painfully reverted or mitigated later on.
> > >>
> > > Sure, but that's just part of the job. And pointing out fundamental
> > > mistakes early on is important, but the situation won't get any better
> > > by being like that. Yes, we'll have to repeat the same words over and
> > > over again, and yes that might be annoying, but that's just how it is.
> >
> > Well I have no problem explaining people why a solution doesn't work.
> >
> > But what usually happens is that people don't realize that they need to
> > back of from a design and completely start over.
> >
> > Regards,
> > Christian.
> >
> > >
> > >> Regards,
> > >> Christian.
> > >>
> > >>>> If the hw is busy with something you need to return the fence for this
> > >>>> from the prepare_job callback so that the scheduler can be notified when
> > >>>> the hw is available again.
> > >>>>
> > >>>> Regards,
> > >>>> Christian.
> > >>>>
> > >>>>> Signed-off-by: Asahi Lina <lina@asahilina.net>
> > >>>>> ---
> > >>>>>     drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
> > >>>>>     include/drm/gpu_scheduler.h            |  8 ++++++++
> > >>>>>     2 files changed, 18 insertions(+)
> > >>>>>
> > >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>> index 4e6ad6e122bc..5c0add2c7546 100644
> > >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>> @@ -1001,6 +1001,16 @@ static int drm_sched_main(void *param)
> > >>>>>                 if (!entity)
> > >>>>>                         continue;
> > >>>>>
> > >>>>> +             if (sched->ops->can_run_job) {
> > >>>>> +                     sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > >>>>> +                     if (!sched_job) {
> > >>>>> +                             complete_all(&entity->entity_idle);
> > >>>>> +                             continue;
> > >>>>> +                     }
> > >>>>> +                     if (!sched->ops->can_run_job(sched_job))
> > >>>>> +                             continue;
> > >>>>> +             }
> > >>>>> +
> > >>>>>                 sched_job = drm_sched_entity_pop_job(entity);
> > >>>>>
> > >>>>>                 if (!sched_job) {
> > >>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > >>>>> index 9db9e5e504ee..bd89ea9507b9 100644
> > >>>>> --- a/include/drm/gpu_scheduler.h
> > >>>>> +++ b/include/drm/gpu_scheduler.h
> > >>>>> @@ -396,6 +396,14 @@ struct drm_sched_backend_ops {
> > >>>>>         struct dma_fence *(*prepare_job)(struct drm_sched_job *sched_job,
> > >>>>>                                          struct drm_sched_entity *s_entity);
> > >>>>>
> > >>>>> +     /**
> > >>>>> +      * @can_run_job: Called before job execution to check whether the
> > >>>>> +      * hardware is free enough to run the job.  This can be used to
> > >>>>> +      * implement more complex hardware resource policies than the
> > >>>>> +      * hw_submission limit.
> > >>>>> +      */
> > >>>>> +     bool (*can_run_job)(struct drm_sched_job *sched_job);
> > >>>>> +
> > >>>>>         /**
> > >>>>>              * @run_job: Called to execute the job once all of the dependencies
> > >>>>>              * have been resolved.  This may be called multiple times, if
> > >>>>>
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2023-03-16 13:40 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 14:25 [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 01/18] rust: drm: ioctl: Add DRM ioctl abstraction Asahi Lina
2023-03-07 14:48   ` Karol Herbst
2023-03-07 14:51     ` Karol Herbst
2023-03-07 15:32   ` Maíra Canal
2023-03-09  5:32     ` Asahi Lina
2023-03-09  6:15       ` Dave Airlie
2023-03-09 12:09         ` Maíra Canal
2023-03-07 17:34   ` Björn Roy Baron
2023-03-09  6:04     ` Asahi Lina
2023-03-09 20:24       ` Faith Ekstrand
2023-03-09 20:39         ` Karol Herbst
2023-03-10  6:21           ` Asahi Lina
2023-04-13  9:23   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions Asahi Lina
2023-03-07 18:19   ` Björn Roy Baron
2023-03-09  6:10     ` Asahi Lina
2023-03-10 18:56   ` Boqun Feng
2023-03-11  5:41   ` Boqun Feng
2023-04-05 17:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 03/18] rust: drm: file: Add File abstraction Asahi Lina
2023-03-09 21:16   ` Faith Ekstrand
2023-03-09 22:16     ` Asahi Lina
2023-03-13 17:49       ` Faith Ekstrand
2023-03-14  2:07         ` Boqun Feng
2023-04-05 11:25           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 04/18] rust: drm: gem: Add GEM object abstraction Asahi Lina
2023-04-05 11:08   ` Daniel Vetter
2023-04-05 11:19     ` Miguel Ojeda
2023-04-05 11:22       ` Daniel Vetter
2023-04-05 12:32         ` Miguel Ojeda
2023-04-05 12:36           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 05/18] drm/gem-shmem: Export VM ops functions Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 06/18] rust: drm: gem: shmem: Add DRM shmem helper abstraction Asahi Lina
2023-03-08 13:38   ` Maíra Canal
2023-03-09  5:25     ` Asahi Lina
2023-03-09 11:47       ` Maíra Canal
2023-03-09 14:16         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 07/18] rust: drm: mm: Add DRM MM Range Allocator abstraction Asahi Lina
2023-04-06 14:15   ` Daniel Vetter
2023-04-06 15:28     ` Miguel Ojeda
2023-04-06 15:45       ` Daniel Vetter
2023-04-06 17:19         ` Miguel Ojeda
2023-04-06 15:53     ` Asahi Lina
2023-04-06 16:13       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 16:39         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 08/18] rust: dma_fence: Add DMA Fence abstraction Asahi Lina
2023-04-05 11:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 09/18] rust: drm: syncobj: Add DRM Sync Object abstraction Asahi Lina
2023-04-05 12:33   ` Daniel Vetter
2023-04-06 16:04     ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback Asahi Lina
2023-03-08  8:46   ` Christian König
2023-03-08  9:41     ` Asahi Lina
2023-03-08 10:00       ` Christian König
2023-03-08 14:53         ` Asahi Lina
2023-03-08 15:30           ` Christian König
2023-03-08 16:44             ` Asahi Lina
2023-03-08 17:57               ` Christian König
2023-03-08 19:05                 ` Asahi Lina
2023-03-08 19:12                   ` Christian König
2023-03-08 19:45                     ` Asahi Lina
2023-03-08 20:14                       ` Christian König
2023-03-09  6:30                         ` Asahi Lina
2023-03-09  8:05                           ` Christian König
2023-03-09  9:14                             ` Asahi Lina
2023-03-09 18:50                               ` Faith Ekstrand
2023-03-10  9:16                                 ` Asahi Lina
2023-03-08 12:39     ` Karol Herbst
2023-03-08 13:47       ` Christian König
2023-03-08 14:43         ` Karol Herbst
2023-03-08 15:02           ` Christian König
2023-03-08 15:19             ` Karol Herbst
2023-03-16 13:40               ` Daniel Vetter [this message]
2023-04-05 13:40   ` Daniel Vetter
2023-04-05 14:14     ` Christian König
2023-04-05 14:21       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-03-08  9:57   ` Maarten Lankhorst
2023-03-08 10:03     ` Christian König
2023-03-08 15:18       ` Asahi Lina
2023-03-08 15:42         ` Christian König
2023-03-08 17:32           ` Asahi Lina
2023-03-08 18:12             ` Christian König
2023-03-08 19:37               ` Asahi Lina
2023-03-09  8:42                 ` Christian König
2023-03-09  9:43                   ` Asahi Lina
2023-03-09 11:47                     ` Christian König
2023-03-09 13:48                       ` Asahi Lina
2023-03-09 19:59                     ` Faith Ekstrand
2023-03-10  9:58                       ` Asahi Lina
2023-03-13 20:11                         ` Faith Ekstrand
2023-03-08 17:39           ` alyssa
2023-03-08 17:44             ` Asahi Lina
2023-03-08 18:13             ` Christian König
2023-04-05 13:52   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction Asahi Lina
2023-04-05 15:43   ` Daniel Vetter
2023-04-05 19:29     ` Daniel Vetter
2023-04-18  8:45       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 13/18] drm/gem: Add a flag to control whether objects can be exported Asahi Lina
2023-04-05 14:55   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 14/18] rust: drm: gem: Add set_exportable() method Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 15/18] drm/asahi: Add the Asahi driver UAPI [DO NOT MERGE] Asahi Lina
2023-03-07 15:28   ` Karol Herbst
2023-03-07 14:25 ` [PATCH RFC 16/18] rust: bindings: Bind the Asahi DRM UAPI Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 17/18] rust: macros: Add versions macro Asahi Lina
2023-03-07 16:17 ` [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
     [not found] ` <20230307-rust-drm-v1-18-917ff5bc80a8@asahilina.net>
2023-04-05 14:44   ` [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs Daniel Vetter
2023-04-06  5:02     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:25       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 13:32         ` Asahi Lina
2023-04-06 13:54           ` Daniel Vetter
     [not found]   ` <ZC2HtBOaoUAzVCVH@phenom.ffwll.local>
2023-04-06  4:44     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:26         ` Daniel Vetter
2023-04-06 10:42       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 11:55       ` Daniel Vetter
2023-04-06 13:15         ` Asahi Lina
2023-04-06 13:48           ` Daniel Vetter
2023-04-06 15:19             ` Asahi Lina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZBMcWuu+W2zfP64O@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ella@iglunix.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=gary@garyguo.net \
    --cc=jarkko@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=lina@asahilina.net \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mary@mary.zone \
    --cc=mripard@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=wedsonaf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).