nouveau.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
@ 2023-01-19 22:53 Tanmay Bhushan
  2023-03-27  8:42 ` John Ogness
  0 siblings, 1 reply; 3+ messages in thread
From: Tanmay Bhushan @ 2023-01-19 22:53 UTC (permalink / raw)
  To: Ben Skeggs, Karol Herbst, Lyude Paul, David Airlie,
	Daniel Vetter, dri-devel, nouveau, linux-kernel
  Cc: Tanmay Bhushan

dma_resv_wait_timeout returns greater than zero on success
as opposed to ttm_bo_wait_ctx. As a result of that relocs
will fail and give failure even when it was a success.

Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..0e3690459144 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
 					    DMA_RESV_USAGE_BOOKKEEP,
 					    false, 15 * HZ);
-		if (ret == 0)
+		if (ret <= 0) {
 			ret = -EBUSY;
-		if (ret) {
 			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
 				  ret);
 			break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
  2023-01-19 22:53 [Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau Tanmay Bhushan
@ 2023-03-27  8:42 ` John Ogness
  2023-03-27 19:53   ` Christian König
  0 siblings, 1 reply; 3+ messages in thread
From: John Ogness @ 2023-03-27  8:42 UTC (permalink / raw)
  To: Tanmay Bhushan, Ben Skeggs, Karol Herbst, Lyude Paul,
	David Airlie, Daniel Vetter, dri-devel, nouveau, linux-kernel,
	Christian König
  Cc: Tanmay Bhushan

On 2023-01-19, Tanmay Bhushan <007047221b@gmail.com> wrote:
> dma_resv_wait_timeout returns greater than zero on success
> as opposed to ttm_bo_wait_ctx. As a result of that relocs
> will fail and give failure even when it was a success.

Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing
lots of new kernel messages:

[  642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
[  642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
[  646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
[  646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500

The graphics seemed to go slower or hang a bit when these messages would
appear. I then found your patch! However, I have some comments about it.

First, it should include a fixes tag:

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")

> Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index f77e44958037..0e3690459144 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>  		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
>  					    DMA_RESV_USAGE_BOOKKEEP,
>  					    false, 15 * HZ);
> -		if (ret == 0)
> +		if (ret <= 0) {
>  			ret = -EBUSY;

This is incorrect for 2 reasons:

* it treats restarts as timeouts

* this function now returns >0 on success

> -		if (ret) {
>  			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
>  				  ret);
>  			break;

I rearranged things to basically correctly translate the return code of
dma_resv_wait_timeout() to match the previous ttm_bo_wait():

		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
					    DMA_RESV_USAGE_BOOKKEEP,
					    false, 15 * HZ);
		if (ret == 0)
			ret = -EBUSY;
		if (ret > 0)
			ret = 0;
		if (ret) {
			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
				  ret);
			break;
		}

So the patch just becomes:

@@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n
 					    false, 15 * HZ);
 		if (ret == 0)
 			ret = -EBUSY;
+		if (ret > 0)
+			ret = 0;
 		if (ret) {
 			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
 				  ret);

With this variant, everything runs correctly on my workstation again.

It probably deserves a comment about why @ret is being translated. Or
perhaps a new variable should be introduced to separate the return value
of dma_resv_wait_timeout() from the return value of this function.

Either way, this is an important fix for 6.3-rc!

John Ogness

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
  2023-03-27  8:42 ` John Ogness
@ 2023-03-27 19:53   ` Christian König
  0 siblings, 0 replies; 3+ messages in thread
From: Christian König @ 2023-03-27 19:53 UTC (permalink / raw)
  To: John Ogness, Tanmay Bhushan, Ben Skeggs, Karol Herbst,
	Lyude Paul, David Airlie, Daniel Vetter, dri-devel, nouveau,
	linux-kernel

Am 27.03.23 um 10:42 schrieb John Ogness:
> On 2023-01-19, Tanmay Bhushan <007047221b@gmail.com> wrote:
>> dma_resv_wait_timeout returns greater than zero on success
>> as opposed to ttm_bo_wait_ctx. As a result of that relocs
>> will fail and give failure even when it was a success.
> Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing
> lots of new kernel messages:
>
> [  642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
> [  642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
> [  646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
> [  646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
>
> The graphics seemed to go slower or hang a bit when these messages would
> appear. I then found your patch! However, I have some comments about it.
>
> First, it should include a fixes tag:
>
> Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
>
>> Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
>> ---
>>   drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> index f77e44958037..0e3690459144 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>>   		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
>>   					    DMA_RESV_USAGE_BOOKKEEP,
>>   					    false, 15 * HZ);
>> -		if (ret == 0)
>> +		if (ret <= 0) {
>>   			ret = -EBUSY;
> This is incorrect for 2 reasons:
>
> * it treats restarts as timeouts
>
> * this function now returns >0 on success
>
>> -		if (ret) {
>>   			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
>>   				  ret);
>>   			break;
> I rearranged things to basically correctly translate the return code of
> dma_resv_wait_timeout() to match the previous ttm_bo_wait():
>
> 		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
> 					    DMA_RESV_USAGE_BOOKKEEP,
> 					    false, 15 * HZ);
> 		if (ret == 0)
> 			ret = -EBUSY;
> 		if (ret > 0)
> 			ret = 0;
> 		if (ret) {
> 			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
> 				  ret);
> 			break;
> 		}
>
> So the patch just becomes:
>
> @@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n
>   					    false, 15 * HZ);
>   		if (ret == 0)
>   			ret = -EBUSY;
> +		if (ret > 0)
> +			ret = 0;
>   		if (ret) {
>   			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
>   				  ret);
>
> With this variant, everything runs correctly on my workstation again.
>
> It probably deserves a comment about why @ret is being translated. Or
> perhaps a new variable should be introduced to separate the return value
> of dma_resv_wait_timeout() from the return value of this function.

I'm going to take a look tomorrow, but your code already looks pretty 
correct to me.

And sorry for the noise, missed the different in the conversion.

Thanks,
Christian.

>
> Either way, this is an important fix for 6.3-rc!
>
> John Ogness


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-05-04 12:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19 22:53 [Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau Tanmay Bhushan
2023-03-27  8:42 ` John Ogness
2023-03-27 19:53   ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).