Why we break out the loops when there are pending bos to be released?

We do this anyway if we can't acquire the necessary locks. Freeing already deleted BOs is just a very lazy background work.

So it did not break anything with this patch I think.

Oh, the patch will certainly work. I'm just not sure if it's the ideal behavior.

https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.

Yes, and that is also a good example of what I mean here:

	if (!mutex_trylock(&slab_mutex))
		/* Give up. Setup the next iteration. */
		goto out;

If the function can't acquire the lock immediately it gives up and waits for the next iteration.

I think it would be better if we do this in TTM as well if we spend to much time cleaning up old BOs.

On the other hand you are right that cond_resched() has the advantage that we could spend more time on cleaning up old BOs if there is nothing else for the CPU TODO.

Regards,
Christian.

Am 09.04.20 um 16:24 schrieb Pan, Xinhui:
https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.

From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: Thursday, April 9, 2020 10:11:08 PM
To: Lucas Stach <l.stach@pengutronix.de>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
 
I think it doesn't matter if workitem schedule out. Even we did not schedule out, the workqueue itself will schedule out later.
So it did not break anything with this patch I think.

From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach <l.stach@pengutronix.de>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
 
Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item callback is called,  the workqueue itself will call cond_resched. So I think

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach <l.stach@pengutronix.de>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
 
Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(&bdev->ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>>>              }
>>>   
>>>              ttm_bo_put(bo);
>>> +           cond_resched();
>>>              spin_lock(&glob->lru_lock);
>>>      }
>>>      list_splice_tail(&removed, &bdev->ddestroy);
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868&amp;sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3D&amp;reserved=0