linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Asahi Lina" <lina@asahilina.net>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	Karol Herbst <kherbst@redhat.com>,
	Ella Stanforth <ella@iglunix.org>,
	Faith Ekstrand <faith.ekstrand@collabora.com>,
	Mary <mary@mary.zone>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
	asahi@lists.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback
Date: Wed, 8 Mar 2023 21:14:34 +0100	[thread overview]
Message-ID: <d544748c-8a2b-7c08-f199-182a56af22be@amd.com> (raw)
In-Reply-To: <9c3dc2ad-11e4-6004-7230-8ca752e3d9f7@asahilina.net>

Am 08.03.23 um 20:45 schrieb Asahi Lina:
> On 09/03/2023 04.12, Christian König wrote:
>> Am 08.03.23 um 20:05 schrieb Asahi Lina:
>>> [SNIP]
>>>> Well it's not the better way, it's the only way that works.
>>>>
>>>> I have to admit that my bet on your intentions was wrong, but even that
>>>> use case doesn't work correctly.
>>>>
>>>> See when your callback returns false it is perfectly possible that all
>>>> hw fences are signaled between returning that information and processing it.
>>>>
>>>> The result would be that the scheduler goes to sleep and never wakes up
>>>> again.
>>> That can't happen, because it will just go into another iteration of the
>>> drm_sched main loop since there is an entity available still.
>>>
>>> Rather there is probably the opposite bug in this patch: the can_run_job
>>> logic should be moved into the wait_event_interruptible() condition
>>> check, otherwise I think it can end up busy-looping since the condition
>>> itself can be true even when the can_run_job check blocks it.
>>>
>>> But there is no risk of it going to sleep and never waking up because
>>> job completions will wake up the waitqueue by definition, and that
>>> happens after the driver-side queues are popped. If this problem could
>>> happen, then the existing hw_submission_limit logic would be broken in
>>> the same way. It is logically equivalent in how it works.
>>>
>>> Basically, if properly done in wait_event_interruptible, it is exactly
>>> the logic of that macro that prevents this race condition and makes
>>> everything work at all. Without it, drm_sched would be completely broken.
>>>
>>>> As I said we exercised those ideas before and yes this approach here
>>>> came up before as well and no it doesn't work.
>>> It can never deadlock with this patch as it stands (though it could busy
>>> loop), and if properly moved into the wait_event_interruptible(), it
>>> would also never busy loop and work entirely as intended. The actual API
>>> change is sound.
>>>
>>> I don't know why you're trying so hard to convince everyone that this
>>> approach is fundamentally broken... It might be a bad idea for other
>>> reasons, it might encourage incorrect usage, it might not be the best
>>> option, there are plenty of arguments you can make... but you just keep
>>> trying to make an argument that it just can't work at all for some
>>> reason. Why? I already said I'm happy dropping it in favor of the fences...
>> Well because it is broken.
>>
>> When you move the check into the wait_event_interruptible condition then
>> who is going to call wait_event_interruptible when the condition changes?
> I think you mean wake_up_interruptible(). That would be
> drm_sched_job_done(), on the fence callback when a job completes, which
> as I keep saying is the same logic used for
> hw_rq_count/hw_submission_limit tracking.

As the documentation to wait_event says:

  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.

So what you essentially try to do here is to skip that and say 
drm_sched_job_done() would call that anyway, but when you read any 
variable to determine that state then as far as I can see nothing is 
guarantying that order.

The only other possibility how you could use the callback correctly 
would be to call drm_fence_is_signaled() to query the state of your hw 
submission from the same fence which is then signaled. But then the 
question is once more why you don't give that fence directly to the 
scheduler?

> Please think about it for a second,

Yeah, I'm trying to really follow your intentions here. But that doesn't 
really makes sense.

Either you are trying to do something invalid or you are trying to 
circumvent the object model somehow and add a shortcut for the signaling 
API. Both would be more than fishy.

Regards,
Christian.

>   it's really not that complicated to
> see why it works:
>
> - Driver pops off completed commands <-- can_run_job condition satisfied
> - Driver signals fence
>   - drm_sched_job_done_cb()
>    - drm_sched_job_done()
>     - atomic_dec(&sched->hw_rq_count); <-- hw_submission_limit satisfied
>     - ...
>     - wake_up_interruptible(&sched->wake_up_worker);
>        ^- happens after both conditions are potentially satisfied
>
> It really is completely equivalent to just making the hw_rq_count logic
> customizable by the driver. The actual flow is the same. As long as the
> driver guarantees it satisfies the can_run_job() condition before
> signaling the completion fence that triggered that change, it works fine.
>
>> As I said this idea came up before and was rejected multiple times.
> Maybe it was a different idea, or maybe it was rejected for other
> reasons, or maybe it was wrongly rejected for being broken when it isn't ^^
>
> ~~ Lina


  reply	other threads:[~2023-03-08 20:15 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 14:25 [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 01/18] rust: drm: ioctl: Add DRM ioctl abstraction Asahi Lina
2023-03-07 14:48   ` Karol Herbst
2023-03-07 14:51     ` Karol Herbst
2023-03-07 15:32   ` Maíra Canal
2023-03-09  5:32     ` Asahi Lina
2023-03-09  6:15       ` Dave Airlie
2023-03-09 12:09         ` Maíra Canal
2023-03-07 17:34   ` Björn Roy Baron
2023-03-09  6:04     ` Asahi Lina
2023-03-09 20:24       ` Faith Ekstrand
2023-03-09 20:39         ` Karol Herbst
2023-03-10  6:21           ` Asahi Lina
2023-04-13  9:23   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions Asahi Lina
2023-03-07 18:19   ` Björn Roy Baron
2023-03-09  6:10     ` Asahi Lina
2023-03-10 18:56   ` Boqun Feng
2023-03-11  5:41   ` Boqun Feng
2023-04-05 17:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 03/18] rust: drm: file: Add File abstraction Asahi Lina
2023-03-09 21:16   ` Faith Ekstrand
2023-03-09 22:16     ` Asahi Lina
2023-03-13 17:49       ` Faith Ekstrand
2023-03-14  2:07         ` Boqun Feng
2023-04-05 11:25           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 04/18] rust: drm: gem: Add GEM object abstraction Asahi Lina
2023-04-05 11:08   ` Daniel Vetter
2023-04-05 11:19     ` Miguel Ojeda
2023-04-05 11:22       ` Daniel Vetter
2023-04-05 12:32         ` Miguel Ojeda
2023-04-05 12:36           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 05/18] drm/gem-shmem: Export VM ops functions Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 06/18] rust: drm: gem: shmem: Add DRM shmem helper abstraction Asahi Lina
2023-03-08 13:38   ` Maíra Canal
2023-03-09  5:25     ` Asahi Lina
2023-03-09 11:47       ` Maíra Canal
2023-03-09 14:16         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 07/18] rust: drm: mm: Add DRM MM Range Allocator abstraction Asahi Lina
2023-04-06 14:15   ` Daniel Vetter
2023-04-06 15:28     ` Miguel Ojeda
2023-04-06 15:45       ` Daniel Vetter
2023-04-06 17:19         ` Miguel Ojeda
2023-04-06 15:53     ` Asahi Lina
2023-04-06 16:13       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 16:39         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 08/18] rust: dma_fence: Add DMA Fence abstraction Asahi Lina
2023-04-05 11:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 09/18] rust: drm: syncobj: Add DRM Sync Object abstraction Asahi Lina
2023-04-05 12:33   ` Daniel Vetter
2023-04-06 16:04     ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback Asahi Lina
2023-03-08  8:46   ` Christian König
2023-03-08  9:41     ` Asahi Lina
2023-03-08 10:00       ` Christian König
2023-03-08 14:53         ` Asahi Lina
2023-03-08 15:30           ` Christian König
2023-03-08 16:44             ` Asahi Lina
2023-03-08 17:57               ` Christian König
2023-03-08 19:05                 ` Asahi Lina
2023-03-08 19:12                   ` Christian König
2023-03-08 19:45                     ` Asahi Lina
2023-03-08 20:14                       ` Christian König [this message]
2023-03-09  6:30                         ` Asahi Lina
2023-03-09  8:05                           ` Christian König
2023-03-09  9:14                             ` Asahi Lina
2023-03-09 18:50                               ` Faith Ekstrand
2023-03-10  9:16                                 ` Asahi Lina
2023-03-08 12:39     ` Karol Herbst
2023-03-08 13:47       ` Christian König
2023-03-08 14:43         ` Karol Herbst
2023-03-08 15:02           ` Christian König
2023-03-08 15:19             ` Karol Herbst
2023-03-16 13:40               ` Daniel Vetter
2023-04-05 13:40   ` Daniel Vetter
2023-04-05 14:14     ` Christian König
2023-04-05 14:21       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-03-08  9:57   ` Maarten Lankhorst
2023-03-08 10:03     ` Christian König
2023-03-08 15:18       ` Asahi Lina
2023-03-08 15:42         ` Christian König
2023-03-08 17:32           ` Asahi Lina
2023-03-08 18:12             ` Christian König
2023-03-08 19:37               ` Asahi Lina
2023-03-09  8:42                 ` Christian König
2023-03-09  9:43                   ` Asahi Lina
2023-03-09 11:47                     ` Christian König
2023-03-09 13:48                       ` Asahi Lina
2023-03-09 19:59                     ` Faith Ekstrand
2023-03-10  9:58                       ` Asahi Lina
2023-03-13 20:11                         ` Faith Ekstrand
2023-03-08 17:39           ` alyssa
2023-03-08 17:44             ` Asahi Lina
2023-03-08 18:13             ` Christian König
2023-04-05 13:52   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction Asahi Lina
2023-04-05 15:43   ` Daniel Vetter
2023-04-05 19:29     ` Daniel Vetter
2023-04-18  8:45       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 13/18] drm/gem: Add a flag to control whether objects can be exported Asahi Lina
2023-04-05 14:55   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 14/18] rust: drm: gem: Add set_exportable() method Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 15/18] drm/asahi: Add the Asahi driver UAPI [DO NOT MERGE] Asahi Lina
2023-03-07 15:28   ` Karol Herbst
2023-03-07 14:25 ` [PATCH RFC 16/18] rust: bindings: Bind the Asahi DRM UAPI Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 17/18] rust: macros: Add versions macro Asahi Lina
2023-03-07 16:17 ` [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
     [not found] ` <20230307-rust-drm-v1-18-917ff5bc80a8@asahilina.net>
2023-04-05 14:44   ` [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs Daniel Vetter
2023-04-06  5:02     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:25       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 13:32         ` Asahi Lina
2023-04-06 13:54           ` Daniel Vetter
     [not found]   ` <ZC2HtBOaoUAzVCVH@phenom.ffwll.local>
2023-04-06  4:44     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:26         ` Daniel Vetter
2023-04-06 10:42       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 11:55       ` Daniel Vetter
2023-04-06 13:15         ` Asahi Lina
2023-04-06 13:48           ` Daniel Vetter
2023-04-06 15:19             ` Asahi Lina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d544748c-8a2b-7c08-f199-182a56af22be@amd.com \
    --to=christian.koenig@amd.com \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=daniel@ffwll.ch \
    --cc=dave.hansen@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ella@iglunix.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=gary@garyguo.net \
    --cc=jarkko@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=lina@asahilina.net \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mary@mary.zone \
    --cc=mripard@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=wedsonaf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).