* [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
@ 2023-06-08  6:55 Boris Brezillon
  2023-06-08  7:05 ` Boris Brezillon
  2023-06-09 11:53 ` Christian König
  0 siblings, 2 replies; 5+ messages in thread
From: Boris Brezillon @ 2023-06-08  6:55 UTC (permalink / raw)
  To: dri-devel
  Cc: Luben Tuikov, Sarah Walker, Christian König,
	Boris Brezillon, Donald Robson, Sumit Semwal

If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
to wait on all the external dependencies (those added to
drm_sched_job::dependencies) before signaling the job finished fence.
This is done this way to prevent jobs depending on these canceled jobs
from considering the resources they want to access as ready, when
they're actually still used by other components, thus leading to
potential data corruptions.

The problem is, the kill_jobs logic is omitting the last fence popped
from the dependencies array that was waited upon before
drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
so we're basically waiting for all dependencies except one.
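
For context, here is roughly where that popped fence ends up (a
from-memory sketch of the current code, so names and details may be
slightly off): drm_sched_entity_pop_job() pops dependencies one at a
time and parks the current one in entity->dependency while it waits for
it to signal, at which point the fence is already gone from
job->dependencies:

	/* Simplified from drm_sched_entity_pop_job(), not the exact code. */
	while ((entity->dependency =
			drm_sched_job_dependency(sched_job, entity))) {
		/* The popped fence now only lives in entity->dependency;
		 * it has been xa_erase()d from job->dependencies, so the
		 * kill_jobs_cb() walk over job->dependencies never sees it.
		 */
		if (drm_sched_entity_add_dependency_cb(entity))
			return NULL;
	}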

This is an attempt at fixing that, but also an opportunity to make sure
I understood the drm_sched_entity_kill(), because I'm not 100% sure if
skipping this currently popped dependency was intentional or not. I can't
see a good reason why we'd want to skip that one, but I might be missing
something.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Frank Binns <frank.binns@imgtec.com>
Cc: Sarah Walker <sarah.walker@imgtec.com>
Cc: Donald Robson <donald.robson@imgtec.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
---
Stumbled across this issue while working on native dependency support
with Donald (which will be posted soon). Flagged as RFC, because I'm
not sure this is legit, and also not sure we want to fix it this way.
I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
because of the asynchronicity of the wait, and the fact we use
drm_sched_entity::dependency to know if we have a clear_dep()
callback registered, so we can easily reset it without removing the
callback.
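
For reference, this is more or less the bookkeeping I mean (again a
from-memory sketch rather than the exact code): the entity->dependency
pointer doubles as the "a callback is registered" marker, so it only
gets cleared from the fence callback itself:

/* Simplified sketch of the existing helper, for illustration only. */
static void drm_sched_entity_clear_dep(struct dma_fence *f,
				       struct dma_fence_cb *cb)
{
	struct drm_sched_entity *entity =
		container_of(cb, struct drm_sched_entity, cb);

	/* Clearing entity->dependency is what tells the rest of the code
	 * that the callback has fired and the wait is over.
	 */
	entity->dependency = NULL;
	dma_fence_put(f);
}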
---
 drivers/gpu/drm/scheduler/sched_entity.c | 40 ++++++++++++++++++------
 drivers/gpu/drm/scheduler/sched_main.c   |  3 ++
 include/drm/gpu_scheduler.h              | 12 +++++++
 3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 68e807ae136a..3821f9adf7bd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -176,20 +176,35 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 {
 	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
 						 finish_cb);
-	int r;
 
 	dma_fence_put(f);
 
-	/* Wait for all dependencies to avoid data corruptions */
-	while (!xa_empty(&job->dependencies)) {
-		f = xa_erase(&job->dependencies, job->last_dependency++);
-		r = dma_fence_add_callback(f, &job->finish_cb,
-					   drm_sched_entity_kill_jobs_cb);
-		if (!r)
+	/* Wait for all remaining dependencies to avoid data corruptions.
+	 *
+	 * We first check the last dependency popped from job->dependencies,
+	 * and then walk job->dependencies.
+	 *
+	 * Note that we don't wait on the last fence returned by
+	 * drm_gpu_scheduler_ops::prepare_job(), nor do we call
+	 * drm_gpu_scheduler_ops::prepare_job() to empty the list of potential
+	 * internal dependencies the driver might want to wait on before
+	 * scheduling the job. We simply assume skipping internal dependencies
+	 * can't cause data corruption on resources passed to the job.
+	 */
+	do {
+		f = job->cur_dep;
+
+		if (!xa_empty(&job->dependencies))
+			job->cur_dep = xa_erase(&job->dependencies, job->last_dependency++);
+		else
+			job->cur_dep = NULL;
+
+		if (f &&
+		    !dma_fence_add_callback(f, &job->finish_cb, drm_sched_entity_kill_jobs_cb))
 			return;
 
 		dma_fence_put(f);
-	}
+	} while (job->cur_dep);
 
 	INIT_WORK(&job->work, drm_sched_entity_kill_jobs_work);
 	schedule_work(&job->work);
@@ -415,8 +430,13 @@ static struct dma_fence *
 drm_sched_job_dependency(struct drm_sched_job *job,
 			 struct drm_sched_entity *entity)
 {
-	if (!xa_empty(&job->dependencies))
-		return xa_erase(&job->dependencies, job->last_dependency++);
+	dma_fence_put(job->cur_dep);
+	job->cur_dep = NULL;
+
+	if (!xa_empty(&job->dependencies)) {
+		job->cur_dep = xa_erase(&job->dependencies, job->last_dependency++);
+		return dma_fence_get(job->cur_dep);
+	}
 
 	if (job->sched->ops->prepare_job)
 		return job->sched->ops->prepare_job(job, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 394010a60821..70ab60e5760c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -839,6 +839,9 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 
 	job->s_fence = NULL;
 
+	if (WARN_ON(job->cur_dep))
+		dma_fence_put(job->cur_dep);
+
 	xa_for_each(&job->dependencies, index, fence) {
 		dma_fence_put(fence);
 	}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e95b4837e5a3..8e8e3decdc80 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -366,6 +366,18 @@ struct drm_sched_job {
 	/** @last_dependency: tracks @dependencies as they signal */
 	unsigned long			last_dependency;
 
+	/*
+	 * @cur_dep:
+	 *
+	 * Last dependency popped from @dependencies. Dependencies returned by
+	 * drm_gpu_scheduler_ops::prepare_job() are not recorded here.
+	 *
+	 * We need to keep track of the last dependencies popped from
+	 * @dependencies so we can wait on the already popped dependency when
+	 * drm_sched_entity_kill_jobs_cb() is called.
+	 */
+	struct dma_fence		*cur_dep;
+
 	/**
 	 * @submit_ts:
 	 *
-- 
2.40.1



* Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
  2023-06-08  6:55 [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb() Boris Brezillon
@ 2023-06-08  7:05 ` Boris Brezillon
  2023-06-09 11:53 ` Christian König
  1 sibling, 0 replies; 5+ messages in thread
From: Boris Brezillon @ 2023-06-08  7:05 UTC (permalink / raw)
  To: dri-devel
  Cc: Sarah Walker, Christian König, Luben Tuikov, Donald Robson,
	Sumit Semwal

On Thu,  8 Jun 2023 08:55:51 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> to wait on all the external dependencies (those added to
> drm_sched_job::dependencies) before signaling the job finished fence.
> This is done this way to prevent jobs depending on these canceled jobs
> from considering the resources they want to access as ready, when
> they're actually still used by other components, thus leading to
> potential data corruptions.
> 
> The problem is, the kill_jobs logic is omitting the last fence popped
> from the dependencies array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
> 
> This is an attempt at fixing that, but also an opportunity to make sure
> I understood the drm_sched_entity_kill(), because I'm not 100% sure if

               ^ the drm_sched_entity_kill() logic correctly, ...

> skipping this currently popped dependency was intentional or not. I can't
> see a good reason why we'd want to skip that one, but I might be missing
> something.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Frank Binns <frank.binns@imgtec.com>
> Cc: Sarah Walker <sarah.walker@imgtec.com>
> Cc: Donald Robson <donald.robson@imgtec.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> ---
> Stumbled across this issue while working on native dependency support
> with Donald (which will be posted soon). Flagged as RFC, because I'm
> not sure this is legit, and also not sure we want to fix it this way.
> I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> because of the asynchronicity of the wait, and the fact we use
> drm_sched_entity::dependency to know if we have a clear_dep()
> callback registered, so we can easily reset it without removing the

                          ^ we can't ...

> callback.


* Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
  2023-06-08  6:55 [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb() Boris Brezillon
  2023-06-08  7:05 ` Boris Brezillon
@ 2023-06-09 11:53 ` Christian König
  2023-06-09 14:10   ` Boris Brezillon
  1 sibling, 1 reply; 5+ messages in thread
From: Christian König @ 2023-06-09 11:53 UTC (permalink / raw)
  To: Boris Brezillon, dri-devel
  Cc: Sarah Walker, Luben Tuikov, Donald Robson, Sumit Semwal

Am 08.06.23 um 08:55 schrieb Boris Brezillon:
> If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> to wait on all the external dependencies (those added to
> drm_sched_job::dependencies) before signaling the job finished fence.
> This is done this way to prevent jobs depending on these canceled jobs
> from considering the resources they want to access as ready, when
> they're actually still used by other components, thus leading to
> potential data corruptions.
>
> The problem is, the kill_jobs logic is omitting the last fence popped
> from the dependencies array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
>
> This is an attempt at fixing that, but also an opportunity to make sure
> I understood the drm_sched_entity_kill(), because I'm not 100% sure if
> skipping this currently popped dependency was intentional or not. I can't
> see a good reason why we'd want to skip that one, but I might be missing
> something.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Frank Binns <frank.binns@imgtec.com>
> Cc: Sarah Walker <sarah.walker@imgtec.com>
> Cc: Donald Robson <donald.robson@imgtec.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> ---
> Stumbled across this issue while working on native dependency support
> with Donald (which will be posted soon). Flagged as RFC, because I'm
> not sure this is legit, and also not sure we want to fix it this way.
> I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> because of the asynchronousity of the wait, and the fact we use
> drm_sched_entity::dependency to know if we have a clear_dep()
> callback registered, so we can easily reset it without removing the
> callback.

Well yes, that's a known problem. But this is really not the right 
approach to fixing this.

Trying to wait for all the dependencies before killing jobs was added
because of the way we kept track of dma_fences in dma_resv objects.
Basically, adding an exclusive fence removed all other fences, which made
the memory management a bit fragile.

This handling has since been removed, so the workaround of waiting for
the dependencies is not really necessary any more, but I think it is
still better to do so.

The right approach to getting this dependency waiting completely straight
is also not to touch entity->dependency in any way, but to stop removing
the fences from the XA in drm_sched_job_dependency(). Otherwise you don't
catch the pipeline-optimized ones either.
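
Something like this, just to illustrate the idea (completely untested):
peek at the next fence with xa_load() instead of xa_erase()ing it, bump
last_dependency, and let drm_sched_entity_kill_jobs_cb() iterate over
the still-populated XA with xa_for_each(); the references are then
dropped in drm_sched_job_cleanup() as they are today.

static struct dma_fence *
drm_sched_job_dependency(struct drm_sched_job *job,
			 struct drm_sched_entity *entity)
{
	struct dma_fence *f;

	/* Leave the fence in job->dependencies so the kill path can still
	 * wait on it; just take an extra reference for the caller.
	 */
	f = xa_load(&job->dependencies, job->last_dependency);
	if (f) {
		job->last_dependency++;
		return dma_fence_get(f);
	}

	if (job->sched->ops->prepare_job)
		return job->sched->ops->prepare_job(job, entity);

	return NULL;
}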

Regards,
Christian.

> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 40 ++++++++++++++++++------
>   drivers/gpu/drm/scheduler/sched_main.c   |  3 ++
>   include/drm/gpu_scheduler.h              | 12 +++++++
>   3 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 68e807ae136a..3821f9adf7bd 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -176,20 +176,35 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   {
>   	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
>   						 finish_cb);
> -	int r;
>   
>   	dma_fence_put(f);
>   
> -	/* Wait for all dependencies to avoid data corruptions */
> -	while (!xa_empty(&job->dependencies)) {
> -		f = xa_erase(&job->dependencies, job->last_dependency++);
> -		r = dma_fence_add_callback(f, &job->finish_cb,
> -					   drm_sched_entity_kill_jobs_cb);
> -		if (!r)
> +	/* Wait for all remaining dependencies to avoid data corruptions.
> +	 *
> +	 * We first check the last dependency popped from job->dependencies,
> +	 * and then walk job->dependencies.
> +	 *
> +	 * Note that we don't wait on the last fence returned by
> +	 * drm_gpu_scheduler_ops::prepare_job(), nor do we call
> +	 * drm_gpu_scheduler_ops::prepare_job() to empty the list of potential
> +	 * internal dependencies the driver might want to wait on before
> +	 * scheduling the job. We simply assume skipping internal dependencies
> +	 * can't cause data corruption on resources passed to the job.
> +	 */
> +	do {
> +		f = job->cur_dep;
> +
> +		if (!xa_empty(&job->dependencies))
> +			job->cur_dep = xa_erase(&job->dependencies, job->last_dependency++);
> +		else
> +			job->cur_dep = NULL;
> +
> +		if (f &&
> +		    !dma_fence_add_callback(f, &job->finish_cb, drm_sched_entity_kill_jobs_cb))
>   			return;
>   
>   		dma_fence_put(f);
> -	}
> +	} while (job->cur_dep);
>   
>   	INIT_WORK(&job->work, drm_sched_entity_kill_jobs_work);
>   	schedule_work(&job->work);
> @@ -415,8 +430,13 @@ static struct dma_fence *
>   drm_sched_job_dependency(struct drm_sched_job *job,
>   			 struct drm_sched_entity *entity)
>   {
> -	if (!xa_empty(&job->dependencies))
> -		return xa_erase(&job->dependencies, job->last_dependency++);
> +	dma_fence_put(job->cur_dep);
> +	job->cur_dep = NULL;
> +
> +	if (!xa_empty(&job->dependencies)) {
> +		job->cur_dep = xa_erase(&job->dependencies, job->last_dependency++);
> +		return dma_fence_get(job->cur_dep);
> +	}
>   
>   	if (job->sched->ops->prepare_job)
>   		return job->sched->ops->prepare_job(job, entity);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 394010a60821..70ab60e5760c 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -839,6 +839,9 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
>   
>   	job->s_fence = NULL;
>   
> +	if (WARN_ON(job->cur_dep))
> +		dma_fence_put(job->cur_dep);
> +
>   	xa_for_each(&job->dependencies, index, fence) {
>   		dma_fence_put(fence);
>   	}
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index e95b4837e5a3..8e8e3decdc80 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -366,6 +366,18 @@ struct drm_sched_job {
>   	/** @last_dependency: tracks @dependencies as they signal */
>   	unsigned long			last_dependency;
>   
> +	/*
> +	 * @cur_dep:
> +	 *
> +	 * Last dependency popped from @dependencies. Dependencies returned by
> +	 * drm_gpu_scheduler_ops::prepare_job() are not recorded here.
> +	 *
> +	 * We need to keep track of the last dependencies popped from
> +	 * @dependencies so we can wait on the already popped dependency when
> +	 * drm_sched_entity_kill_jobs_cb() is called.
> +	 */
> +	struct dma_fence		*cur_dep;
> +
>   	/**
>   	 * @submit_ts:
>   	 *



* Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
  2023-06-09 11:53 ` Christian König
@ 2023-06-09 14:10   ` Boris Brezillon
  2023-06-09 14:29     ` Christian König
  0 siblings, 1 reply; 5+ messages in thread
From: Boris Brezillon @ 2023-06-09 14:10 UTC (permalink / raw)
  To: Christian König
  Cc: Sarah Walker, dri-devel, Luben Tuikov, Donald Robson, Sumit Semwal

Hello Christian,

On Fri, 9 Jun 2023 13:53:59 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 08.06.23 um 08:55 schrieb Boris Brezillon:
> > If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> > to wait on all the external dependencies (those added to
> > drm_sched_job::dependencies) before signaling the job finished fence.
> > This is done this way to prevent jobs depending on these canceled jobs
> > from considering the resources they want to access as ready, when
> > they're actually still used by other components, thus leading to
> > potential data corruptions.
> >
> > The problem is, the kill_jobs logic is omitting the last fence popped
> > from the dependencies array that was waited upon before
> > drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> > so we're basically waiting for all dependencies except one.
> >
> > This is an attempt at fixing that, but also an opportunity to make sure
> > I understood the drm_sched_entity_kill(), because I'm not 100% sure if
> > skipping this currently popped dependency was intentional or not. I can't
> > see a good reason why we'd want to skip that one, but I might be missing
> > something.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Frank Binns <frank.binns@imgtec.com>
> > Cc: Sarah Walker <sarah.walker@imgtec.com>
> > Cc: Donald Robson <donald.robson@imgtec.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > ---
> > Stumbled across this issue while working on native dependency support
> > with Donald (which will be posted soon). Flagged as RFC, because I'm
> > not sure this is legit, and also not sure we want to fix it this way.
> > I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> > because of the asynchronicity of the wait, and the fact we use
> > drm_sched_entity::dependency to know if we have a clear_dep()
> > callback registered, so we can easily reset it without removing the
> > callback.  
> 
> Well yes, that's a known problem. But this is really not the right 
> approach to fixing this.
> 
> Trying to wait for all the dependencies before killing jobs was added 
> because of the way we kept track of dma_fences in dma_resv objects. 
> Basically adding exclusive fences removed all other fences leading to a 
> bit fragile memory management.

Okay.

> 
> This handling was removed by now and so the workaround for waiting for 
> dependencies is not really necessary any more, but I think it is still 
> better to do so.
> 
> The right approach of getting this waiting for dependencies completely 
> straight is also not to touch entity->dependency in any way, but to stop 
> removing them from the XA in drm_sched_job_dependency(). Otherwise you 
> don't catch the pipeline optimized ones either.

Do you want me to post a v2 doing that, or should I forget about it?
If we decide to keep things like that, it might be good to at least
add a comment explaining why we don't care.

Regards,

Boris


* Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
  2023-06-09 14:10   ` Boris Brezillon
@ 2023-06-09 14:29     ` Christian König
  0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2023-06-09 14:29 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Sarah Walker, dri-devel, Luben Tuikov, Donald Robson, Sumit Semwal

Am 09.06.23 um 16:10 schrieb Boris Brezillon:
> Hello Christian,
>
> On Fri, 9 Jun 2023 13:53:59 +0200
> Christian König <christian.koenig@amd.com> wrote:
>
>> Am 08.06.23 um 08:55 schrieb Boris Brezillon:
>>> If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
>>> to wait on all the external dependencies (those added to
>>> drm_sched_job::dependencies) before signaling the job finished fence.
>>> This is done this way to prevent jobs depending on these canceled jobs
>>> from considering the resources they want to access as ready, when
>>> they're actually still used by other components, thus leading to
>>> potential data corruptions.
>>>
>>> The problem is, the kill_jobs logic is omitting the last fence popped
>>> from the dependencies array that was waited upon before
>>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
>>> so we're basically waiting for all dependencies except one.
>>>
>>> This is an attempt at fixing that, but also an opportunity to make sure
>>> I understood the drm_sched_entity_kill(), because I'm not 100% sure if
>>> skipping this currently popped dependency was intentional or not. I can't
>>> see a good reason why we'd want to skip that one, but I might be missing
>>> something.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Cc: Frank Binns <frank.binns@imgtec.com>
>>> Cc: Sarah Walker <sarah.walker@imgtec.com>
>>> Cc: Donald Robson <donald.robson@imgtec.com>
>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>> Cc: David Airlie <airlied@gmail.com>
>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> ---
>>> Stumbled across this issue while working on native dependency support
>>> with Donald (which will be posted soon). Flagged as RFC, because I'm
>>> not sure this is legit, and also not sure we want to fix it this way.
>>> I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> >> because of the asynchronicity of the wait, and the fact we use
>>> drm_sched_entity::dependency to know if we have a clear_dep()
>>> callback registered, so we can easily reset it without removing the
>>> callback.
>> Well yes, that's a known problem. But this is really not the right
>> approach to fixing this.
>>
>> Trying to wait for all the dependencies before killing jobs was added
>> because of the way we kept track of dma_fences in dma_resv objects.
>> Basically adding exclusive fences removed all other fences leading to a
>> bit fragile memory management.
> Okay.
>
>> This handling was removed by now and so the workaround for waiting for
>> dependencies is not really necessary any more, but I think it is still
>> better to do so.
>>
>> The right approach of getting this waiting for dependencies completely
>> straight is also not to touch entity->dependency in any way, but to stop
>> removing them from the XA in drm_sched_job_dependency(). Otherwise you
>> don't catch the pipeline optimized ones either.
> Do you want me to post a v2 doing that, or should I forget about it?
> If we decide to keep things like that, it might be good to at least
> add a comment explaining why we don't care.

Well, if you have the time, fixing this fully while keeping the dependency
handling would certainly be preferred.

Regards,
Christian.

>
> Regards,
>
> Boris


