linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Asahi Lina <lina@asahilina.net>
Cc: "Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Ella Stanforth" <ella@iglunix.org>,
	"Faith Ekstrand" <faith.ekstrand@collabora.com>,
	Mary <mary@mary.zone>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
	asahi@lists.linux.dev
Subject: Re: [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down
Date: Wed, 5 Apr 2023 15:52:44 +0200	[thread overview]
Message-ID: <ZC19LIsNlhFC+7ME@phenom.ffwll.local> (raw)
In-Reply-To: <20230307-rust-drm-v1-11-917ff5bc80a8@asahilina.net>

On Tue, Mar 07, 2023 at 11:25:36PM +0900, Asahi Lina wrote:
> drm_sched_fini() currently leaves any pending jobs dangling, which
> causes segfaults and other badness when job completion fences are
> signaled after the scheduler is torn down.
> 
> Explicitly detach all jobs from their completion callbacks and free
> them. This makes it possible to write a sensible safe abstraction for
> drm_sched, without having to externally duplicate the tracking of
> in-flight jobs.
> 
> This shouldn't regress any existing drivers, since calling
> drm_sched_fini() with any pending jobs is broken and this change should
> be a no-op if there are no pending jobs.
> 
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5c0add2c7546..0aab1e0aebdd 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1119,10 +1119,33 @@ EXPORT_SYMBOL(drm_sched_init);
>  void drm_sched_fini(struct drm_gpu_scheduler *sched)
>  {
>  	struct drm_sched_entity *s_entity;
> +	struct drm_sched_job *s_job, *tmp;
>  	int i;
>  
> -	if (sched->thread)
> -		kthread_stop(sched->thread);
> +	if (!sched->thread)
> +		return;
> +
> +	/*
> +	 * Stop the scheduler, detaching all jobs from their hardware callbacks
> +	 * and cleaning up complete jobs.
> +	 */
> +	drm_sched_stop(sched, NULL);
> +
> +	/*
> +	 * Iterate through the pending job list and free all jobs.
> +	 * This assumes the driver has either guaranteed jobs are already stopped, or that
> +	 * otherwise it is responsible for keeping any necessary data structures for
> +	 * in-progress jobs alive even when the free_job() callback is called early (e.g. by
> +	 * putting them in its own queue or doing its own refcounting).
> +	 */

This comment makes me wonder whether we shouldn't go one step further and
have a drm_sched_quiescent, which waits for any in-flight jobs to complete
and cancels everything else. Because even if rust guarantees that you
don't have any memory bugs, if you just leak things by sprinkling
reference-counted pointer wrappers everywhere you still have a semantic
bug.

Except now it's much harder to realize that because there's no Oops and
KASAN doesn't tell you about it either. I think it would be much better if
the scheduler code and rust abstraction provider drivers the correct
lifetimes and very strongly encourage them to only have borrowed
references and not additional refcounting of their own.

I think Christian mentioned that this would block in close() or context
destruction, which is no good at all. And with the 1:1
drm_scheduler:drm_sched_entity design for there's no other place. This is
way I've suggested in the Xe threads that we should make the current
drm_scheduler an implementation detail hidden from drivers, with a new
drm_scheduler which is always per-engine for all cases as the driver api
interface.  And the internal scheduler attached to either that (for
current drivers) or drm_sched_entity (for fw scheduling drivers) as
needed. With that
- the sched_entity cleanup could take care of this code here for the fw
  scheduler case
- the drm_sched_fini could take care of blocking appropriately before the
  driver is unloaded for any lagging in-flight jobs, without blocking
  userspace
- drivers should not end up with any need to reference-count either
  per-ctx/drm_sched_entity or per-drm_sched_job data, ever

Because any comment that's along the lines of "drivers need to refcount"
is bad business, because it either means leaks (rust) or crashes (C). I
much prefer when drivers have to put in extra effort to get things wrong
because by default the lifetimes are Just Right(tm).
-Daniel

> +	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> +		spin_lock(&sched->job_list_lock);
> +		list_del_init(&s_job->list);
> +		spin_unlock(&sched->job_list_lock);
> +		sched->ops->free_job(s_job);
> +	}
> +
> +	kthread_stop(sched->thread);
>  
>  	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>  		struct drm_sched_rq *rq = &sched->sched_rq[i];
> 
> -- 
> 2.35.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  parent reply	other threads:[~2023-04-05 13:52 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 14:25 [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 01/18] rust: drm: ioctl: Add DRM ioctl abstraction Asahi Lina
2023-03-07 14:48   ` Karol Herbst
2023-03-07 14:51     ` Karol Herbst
2023-03-07 15:32   ` Maíra Canal
2023-03-09  5:32     ` Asahi Lina
2023-03-09  6:15       ` Dave Airlie
2023-03-09 12:09         ` Maíra Canal
2023-03-07 17:34   ` Björn Roy Baron
2023-03-09  6:04     ` Asahi Lina
2023-03-09 20:24       ` Faith Ekstrand
2023-03-09 20:39         ` Karol Herbst
2023-03-10  6:21           ` Asahi Lina
2023-04-13  9:23   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions Asahi Lina
2023-03-07 18:19   ` Björn Roy Baron
2023-03-09  6:10     ` Asahi Lina
2023-03-10 18:56   ` Boqun Feng
2023-03-11  5:41   ` Boqun Feng
2023-04-05 17:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 03/18] rust: drm: file: Add File abstraction Asahi Lina
2023-03-09 21:16   ` Faith Ekstrand
2023-03-09 22:16     ` Asahi Lina
2023-03-13 17:49       ` Faith Ekstrand
2023-03-14  2:07         ` Boqun Feng
2023-04-05 11:25           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 04/18] rust: drm: gem: Add GEM object abstraction Asahi Lina
2023-04-05 11:08   ` Daniel Vetter
2023-04-05 11:19     ` Miguel Ojeda
2023-04-05 11:22       ` Daniel Vetter
2023-04-05 12:32         ` Miguel Ojeda
2023-04-05 12:36           ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 05/18] drm/gem-shmem: Export VM ops functions Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 06/18] rust: drm: gem: shmem: Add DRM shmem helper abstraction Asahi Lina
2023-03-08 13:38   ` Maíra Canal
2023-03-09  5:25     ` Asahi Lina
2023-03-09 11:47       ` Maíra Canal
2023-03-09 14:16         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 07/18] rust: drm: mm: Add DRM MM Range Allocator abstraction Asahi Lina
2023-04-06 14:15   ` Daniel Vetter
2023-04-06 15:28     ` Miguel Ojeda
2023-04-06 15:45       ` Daniel Vetter
2023-04-06 17:19         ` Miguel Ojeda
2023-04-06 15:53     ` Asahi Lina
2023-04-06 16:13       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 16:39         ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 08/18] rust: dma_fence: Add DMA Fence abstraction Asahi Lina
2023-04-05 11:10   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 09/18] rust: drm: syncobj: Add DRM Sync Object abstraction Asahi Lina
2023-04-05 12:33   ` Daniel Vetter
2023-04-06 16:04     ` Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback Asahi Lina
2023-03-08  8:46   ` Christian König
2023-03-08  9:41     ` Asahi Lina
2023-03-08 10:00       ` Christian König
2023-03-08 14:53         ` Asahi Lina
2023-03-08 15:30           ` Christian König
2023-03-08 16:44             ` Asahi Lina
2023-03-08 17:57               ` Christian König
2023-03-08 19:05                 ` Asahi Lina
2023-03-08 19:12                   ` Christian König
2023-03-08 19:45                     ` Asahi Lina
2023-03-08 20:14                       ` Christian König
2023-03-09  6:30                         ` Asahi Lina
2023-03-09  8:05                           ` Christian König
2023-03-09  9:14                             ` Asahi Lina
2023-03-09 18:50                               ` Faith Ekstrand
2023-03-10  9:16                                 ` Asahi Lina
2023-03-08 12:39     ` Karol Herbst
2023-03-08 13:47       ` Christian König
2023-03-08 14:43         ` Karol Herbst
2023-03-08 15:02           ` Christian König
2023-03-08 15:19             ` Karol Herbst
2023-03-16 13:40               ` Daniel Vetter
2023-04-05 13:40   ` Daniel Vetter
2023-04-05 14:14     ` Christian König
2023-04-05 14:21       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-03-08  9:57   ` Maarten Lankhorst
2023-03-08 10:03     ` Christian König
2023-03-08 15:18       ` Asahi Lina
2023-03-08 15:42         ` Christian König
2023-03-08 17:32           ` Asahi Lina
2023-03-08 18:12             ` Christian König
2023-03-08 19:37               ` Asahi Lina
2023-03-09  8:42                 ` Christian König
2023-03-09  9:43                   ` Asahi Lina
2023-03-09 11:47                     ` Christian König
2023-03-09 13:48                       ` Asahi Lina
2023-03-09 19:59                     ` Faith Ekstrand
2023-03-10  9:58                       ` Asahi Lina
2023-03-13 20:11                         ` Faith Ekstrand
2023-03-08 17:39           ` alyssa
2023-03-08 17:44             ` Asahi Lina
2023-03-08 18:13             ` Christian König
2023-04-05 13:52   ` Daniel Vetter [this message]
2023-03-07 14:25 ` [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction Asahi Lina
2023-04-05 15:43   ` Daniel Vetter
2023-04-05 19:29     ` Daniel Vetter
2023-04-18  8:45       ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 13/18] drm/gem: Add a flag to control whether objects can be exported Asahi Lina
2023-04-05 14:55   ` Daniel Vetter
2023-03-07 14:25 ` [PATCH RFC 14/18] rust: drm: gem: Add set_exportable() method Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 15/18] drm/asahi: Add the Asahi driver UAPI [DO NOT MERGE] Asahi Lina
2023-03-07 15:28   ` Karol Herbst
2023-03-07 14:25 ` [PATCH RFC 16/18] rust: bindings: Bind the Asahi DRM UAPI Asahi Lina
2023-03-07 14:25 ` [PATCH RFC 17/18] rust: macros: Add versions macro Asahi Lina
2023-03-07 16:17 ` [PATCH RFC 00/18] Rust DRM subsystem abstractions (& preview AGX driver) Asahi Lina
     [not found] ` <20230307-rust-drm-v1-18-917ff5bc80a8@asahilina.net>
2023-04-05 14:44   ` [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs Daniel Vetter
2023-04-06  5:02     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:25       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 13:32         ` Asahi Lina
2023-04-06 13:54           ` Daniel Vetter
     [not found]   ` <ZC2HtBOaoUAzVCVH@phenom.ffwll.local>
2023-04-06  4:44     ` Asahi Lina
2023-04-06  5:09       ` Asahi Lina
2023-04-06 11:26         ` Daniel Vetter
2023-04-06 10:42       ` [Linaro-mm-sig] " Daniel Vetter
2023-04-06 11:55       ` Daniel Vetter
2023-04-06 13:15         ` Asahi Lina
2023-04-06 13:48           ` Daniel Vetter
2023-04-06 15:19             ` Asahi Lina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZC19LIsNlhFC+7ME@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ella@iglunix.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=gary@garyguo.net \
    --cc=jarkko@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=lina@asahilina.net \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mary@mary.zone \
    --cc=mripard@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=wedsonaf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).