From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: robdclark@chromium.org, thomas.hellstrom@linux.intel.com,
	sarah.walker@imgtec.com, ketil.johnsen@arm.com,
	Liviu.Dudau@arm.com, mcanal@igalia.com,
	dri-devel@lists.freedesktop.org, christian.koenig@amd.com,
	luben.tuikov@amd.com, dakr@redhat.com, donald.robson@imgtec.com,
	lina@asahilina.net, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [PATCH v6 5/7] drm/sched: Split free_job into own work item
Date: Mon, 23 Oct 2023 14:16:06 +0200
Message-ID: <20231023141606.3dd53c39@collabora.com>
In-Reply-To: <20231017150958.838613-6-matthew.brost@intel.com>

Hi,

On Tue, 17 Oct 2023 08:09:56 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> +static void drm_sched_run_job_work(struct work_struct *w)
> +{
> +	struct drm_gpu_scheduler *sched =
> +		container_of(w, struct drm_gpu_scheduler, work_run_job);
> +	struct drm_sched_entity *entity;
> +	struct dma_fence *fence;
> +	struct drm_sched_fence *s_fence;
> +	struct drm_sched_job *sched_job;
> +	int r;
>  
> -		atomic_inc(&sched->hw_rq_count);
> -		drm_sched_job_begin(sched_job);
> +	if (READ_ONCE(sched->pause_submit))
> +		return;
> +
> +	entity = drm_sched_select_entity(sched, true);
> +	if (!entity)
> +		return;
>  
> -		trace_drm_run_job(sched_job, entity);
> -		fence = sched->ops->run_job(sched_job);
> +	sched_job = drm_sched_entity_pop_job(entity);
> +	if (!sched_job) {
>  		complete_all(&entity->entity_idle);
> -		drm_sched_fence_scheduled(s_fence, fence);
> +		return;	/* No more work */
> +	}
>  
> -		if (!IS_ERR_OR_NULL(fence)) {
> -			/* Drop for original kref_init of the fence */
> -			dma_fence_put(fence);
> +	s_fence = sched_job->s_fence;
>  
> -			r = dma_fence_add_callback(fence, &sched_job->cb,
> -						   drm_sched_job_done_cb);
> -			if (r == -ENOENT)
> -				drm_sched_job_done(sched_job, fence->error);
> -			else if (r)
> -				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
> -					  r);
> -		} else {
> -			drm_sched_job_done(sched_job, IS_ERR(fence) ?
> -					   PTR_ERR(fence) : 0);
> -		}
> +	atomic_inc(&sched->hw_rq_count);
> +	drm_sched_job_begin(sched_job);
>  
> -		wake_up(&sched->job_scheduled);
> +	trace_drm_run_job(sched_job, entity);
> +	fence = sched->ops->run_job(sched_job);
> +	complete_all(&entity->entity_idle);
> +	drm_sched_fence_scheduled(s_fence, fence);
> +
> +	if (!IS_ERR_OR_NULL(fence)) {
> +		/* Drop for original kref_init of the fence */
> +		dma_fence_put(fence);
> +
> +		r = dma_fence_add_callback(fence, &sched_job->cb,
> +					   drm_sched_job_done_cb);
> +		if (r == -ENOENT)
> +			drm_sched_job_done(sched_job, fence->error);
> +		else if (r)
> +			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> +	} else {
> +		drm_sched_job_done(sched_job, IS_ERR(fence) ?
> +				   PTR_ERR(fence) : 0);
>  	}

I just ran into a race condition when using a non-ordered workqueue
for drm_sched:

thread A							thread B

drm_sched_run_job_work()			
  drm_sched_job_begin()
    // inserts jobA in pending_list

								drm_sched_free_job_work()
								  drm_sched_get_cleanup_job()
								    // check first job in pending list
								    // if s_fence->parent == NULL, consider
								    // for cleanup
								  ->free_job(jobA)
								    drm_sched_job_cleanup()
								      // set sched_job->s_fence = NULL

    ->run_job()
    drm_sched_fence_scheduled()
      // sched_job->s_fence->parent = parent_fence
      // BOOM => NULL pointer deref
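
To make the free side of this concrete, the cleanup check I'm referring
to above behaves roughly like this (paraphrased from the diagram, not
quoted verbatim from the patch):

	/*
	 * Paraphrase of the cleanup-side test: the first job in the
	 * pending list is treated as done when its hardware fence was
	 * never set, even though thread A simply hasn't reached
	 * ->run_job() yet.
	 */
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && !job->s_fence->parent)
		return job; /* handed to ->free_job() -> drm_sched_job_cleanup() */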

For now, I'll just use a dedicated ordered wq (see the sketch below),
but if we claim multi-threaded workqueues are supported, this is
probably worth fixing.
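
Here is roughly what I mean by a dedicated ordered wq; a minimal
driver-side sketch with made-up "mydrv" names, the workqueue then being
handed to drm_sched_init() as the submit workqueue this series adds
(whatever that parameter ends up being called):

	/*
	 * Workaround sketch: serialize the scheduler's run_job/free_job
	 * work items on a driver-owned ordered workqueue instead of a
	 * shared multi-threaded one.
	 */
	static int mydrv_sched_wq_init(struct mydrv_device *mdev)
	{
		/* Ordered wq => at most one work item runs at a time, in order. */
		mdev->sched_wq = alloc_ordered_workqueue("mydrv-sched", 0);
		if (!mdev->sched_wq)
			return -ENOMEM;

		/* mdev->sched_wq is what gets passed to the scheduler. */
		return 0;
	}

(plus destroy_workqueue() on teardown, of course)
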
I know there have been discussions about when the timeout should be
started, and the insertion of the job into the pending_list is related
to that. If we want this insertion to happen before ->run_job() is
called, we need a way to flag a job that has been inserted but not
fully submitted yet, as sketched below.
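
Something like the below is what I have in mind, purely illustrative;
the "submitted" field and where exactly it gets set/checked are made
up, not taken from the patch:

	/*
	 * Illustrative only: add a "bool submitted;" to struct
	 * drm_sched_job, set it once ->run_job() has returned and
	 * drm_sched_fence_scheduled() has run, and make the cleanup
	 * side skip jobs that are inserted but not fully submitted yet.
	 */

	/* in drm_sched_run_job_work(), after drm_sched_fence_scheduled(): */
	spin_lock(&sched->job_list_lock);
	sched_job->submitted = true;
	spin_unlock(&sched->job_list_lock);

	/* in drm_sched_get_cleanup_job(), under sched->job_list_lock: */
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && !job->submitted)
		job = NULL; /* inserted, but not fully submitted yet */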

Regards,

Boris
