* question about error handling in ttm_bo_handle_move_mem()
@ 2021-06-10 15:41 Dan Carpenter
  2021-06-16  6:30   ` Dan Carpenter
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Carpenter @ 2021-06-10 15:41 UTC
  To: dri-devel; +Cc: Thomas Hellstrom

The new version of Firefox seems to trigger a refcounting bug in my
nouveau driver.  I tested a v4.15 kernel and that has the bug as well.
It seems like the refcounting is off if ttm_bo_evict() fails.  Dmesg
at the end.

I tried to see if I could spot anything off and I had a question about
ttm_bo_handle_move_mem().

drivers/gpu/drm/ttm/ttm_bo.c
static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
				  struct ttm_resource *mem, bool evict,
				  struct ttm_operation_ctx *ctx,
				  struct ttm_place *hop)
{
	struct ttm_bo_device *bdev = bo->bdev;
	struct ttm_resource_manager *old_man = ttm_manager_type(bdev, bo->mem.mem_type);
	struct ttm_resource_manager *new_man = ttm_manager_type(bdev, mem->mem_type);

old_man and new_man are assigned here.

	int ret;

	ttm_bo_unmap_virtual(bo);

	/*
	 * Create and bind a ttm if required.
	 */

	if (new_man->use_tt) {
		/* Zero init the new TTM structure if the old location should
		 * have used one as well.
		 */
		ret = ttm_tt_create(bo, old_man->use_tt);
		if (ret)
			goto out_err;

This "goto out_err;" is a no-op.  Presumably that is intentional.  I
think if this create succeeds then the error handling is expected to
clean it up?

		if (mem->mem_type != TTM_PL_SYSTEM) {
			ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
			if (ret)
				goto out_err;
		}
	}

	ret = bdev->driver->move(bo, evict, ctx, mem, hop);

On my system ->move() is returning -EINVAL

	if (ret) {
		if (ret == -EMULTIHOP)
			return ret;
		goto out_err;
	}

	ctx->bytes_moved += bo->base.size;
	return 0;

out_err:
	new_man = ttm_manager_type(bdev, bo->mem.mem_type);

This seems like a mistake.  This sets new_man to the same value as
old_man.  I don't understand why it needs to be re-assigned at all
though so maybe I'm missing something.


	if (!new_man->use_tt)

This test seems reversed.

Unfortunately, making these changes doesn't fix my crashes and I'm still
investigating.

		ttm_bo_tt_destroy(bo);

	return ret;
}
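
As a rough illustration of the re-assignment at out_err: if a failed move
leaves bo->mem untouched, then looking the manager up again through
bo->mem.mem_type simply returns old_man.  The sketch uses hypothetical,
heavily simplified stand-ins (struct manager, struct buffer,
manager_type()) rather than the real TTM types, and it assumes that
nothing modifies bo->mem before the error path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the TTM types; not the kernel definitions. */
enum { PL_SYSTEM, PL_TT, PL_VRAM, PL_COUNT };

struct manager { bool use_tt; };
struct device { struct manager man[PL_COUNT]; };
struct resource { int mem_type; };
struct buffer { struct device *bdev; struct resource mem; };

/* Plays the role of ttm_manager_type(): look a manager up by memory type. */
static struct manager *manager_type(struct device *bdev, int mem_type)
{
	assert(mem_type >= 0 && mem_type < PL_COUNT);
	return &bdev->man[mem_type];
}

/*
 * Models a failed move: bo->mem is left untouched, so a second lookup
 * through bo->mem.mem_type yields the manager the buffer started with.
 */
static bool relookup_equals_old_man(struct buffer *bo, const struct resource *new_mem)
{
	struct manager *old_man = manager_type(bo->bdev, bo->mem.mem_type);
	struct manager *new_man = manager_type(bo->bdev, new_mem->mem_type);

	(void)new_man;	/* the move "fails" here, before bo->mem changes */

	new_man = manager_type(bo->bdev, bo->mem.mem_type);	/* as at out_err */
	return new_man == old_man;
}
```

With a buffer currently in PL_VRAM and a target of PL_TT,
relookup_equals_old_man() returns true in this model.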

regards,
dan carpenter

[  159.893081] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
[  159.893089] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
[  159.893091] nouveau 0000:01:00.0: msvld: unable to load firmware data
[  159.893092] nouveau 0000:01:00.0: msvld: init failed, -19
[ 1945.479861] [TTM] Buffer eviction failed
[ 1945.479883] ------------[ cut here ]------------
[ 1945.479886] refcount_t: underflow; use-after-free.
[ 1945.479900] WARNING: CPU: 7 PID: 2528 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[ 1945.479914] Modules linked in: bnep(E) ctr(E) ccm(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_ondemand(E) tun(E) uinput(E) binfmt_misc(E) ath3k(E) btusb(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) intel_rapl_msr(E) intel_rapl_common(E) snd_hda_codec_realtek(E) x86_pkg_temp_thermal(E) snd_hda_codec_generic(E) ath9k(E) intel_powerclamp(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) coretemp(E) ath9k_common(E) kvm_intel(E) ath9k_hw(E) snd_hda_intel(E) snd_intel_dspcfg(E) kvm(E) snd_intel_sdw_acpi(E) ath(E) irqbypass(E) snd_hda_codec(E) mac80211(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) aesni_intel(E) snd_pcm_oss(E) libaes(E) snd_mixer_oss(E) crypto_simd(E) dell_smm_hwmon(E) cfg80211(E) cryptd(E) snd_pcm(E) rapl(E) iTCO_wdt(E) snd_timer(E) intel_cstate(E) intel_pmc_bxt(E) snd(E) rfkill(E) iTCO_vendor_support(E) intel_uncore(E) pcspkr(E) libarc4(E) soundcore(E) mei_me(E) watchdog(E)
[ 1945.480005]  sg(E) at24(E) mei(E) evdev(E) nfsd(E) loop(E) auth_rpcgss(E) msr(E) nfs_acl(E) lockd(E) parport_pc(E) ppdev(E) grace(E) lp(E) parport(E) sunrpc(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) ums_realtek(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) sr_mod(E) t10_pi(E) crc_t10dif(E) cdrom(E) crct10dif_generic(E) nouveau(E) mxm_wmi(E) wmi(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) xhci_pci(E) drm_kms_helper(E) ahci(E) r8169(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_pci(E) realtek(E) lpc_ich(E) libahci(E) mdio_devres(E) cec(E) xhci_hcd(E) crc32_pclmul(E) libphy(E) ehci_hcd(E) crc32c_intel(E) libata(E) i2c_i801(E) i2c_smbus(E) drm(E) scsi_mod(E) usbcore(E) fan(E) video(E) button(E)
[ 1945.480157] CPU: 7 PID: 2528 Comm: Xorg Tainted: G            E     5.12.0+ #1
[ 1945.480164] Hardware name: Dell Inc. XPS 8700/0KWVT8, BIOS A06 11/18/2013
[ 1945.480168] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[ 1945.480177] Code: 05 b9 e2 3d 01 01 e8 79 e5 42 00 0f 0b c3 80 3d a7 e2 3d 01 00 75 95 48 c7 c7 68 61 f2 b1 c6 05 97 e2 3d 01 01 e8 5a e5 42 00 <0f> 0b c3 80 3d 86 e2 3d 01 00 0f 85 72 ff ff ff 48 c7 c7 c0 61 f2
[ 1945.480183] RSP: 0018:ffffbba402fd7d30 EFLAGS: 00010286
[ 1945.480188] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9194fedd8588
[ 1945.480192] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9194fedd8580
[ 1945.480196] RBP: ffff918e03c0d800 R08: 0000000000000000 R09: ffffbba402fd7b50
[ 1945.480199] R10: ffffbba402fd7b48 R11: ffffffffb24cc7c8 R12: ffffffffc08b4d20
[ 1945.480202] R13: ffff918e00c2e000 R14: ffff918e7e348c00 R15: ffff918e7e348c00
[ 1945.480206] FS:  00007fa1278f0a40(0000) GS:ffff9194fedc0000(0000) knlGS:0000000000000000
[ 1945.480211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1945.480215] CR2: 00007f8c0802f000 CR3: 000000013f4bc003 CR4: 00000000001706e0
[ 1945.480219] Call Trace:
[ 1945.480225]  nouveau_gem_new+0xc1/0xf0 [nouveau]
[ 1945.480451]  nouveau_gem_ioctl_new+0x53/0xf0 [nouveau]
[ 1945.480618]  ? nouveau_gem_new+0xf0/0xf0 [nouveau]
[ 1945.480779]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 1945.480837]  drm_ioctl+0x20f/0x3a0 [drm]
[ 1945.480883]  ? nouveau_gem_new+0xf0/0xf0 [nouveau]
[ 1945.481058]  nouveau_drm_ioctl+0x55/0xa0 [nouveau]
[ 1945.481233]  __x64_sys_ioctl+0x83/0xb0
[ 1945.481242]  do_syscall_64+0x33/0x80
[ 1945.481251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1945.481262] RIP: 0033:0x7fa127d5bcc7
[ 1945.481268] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[ 1945.481274] RSP: 002b:00007ffe54852078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1945.481281] RAX: ffffffffffffffda RBX: 00007ffe548520d0 RCX: 00007fa127d5bcc7
[ 1945.481285] RDX: 00007ffe548520d0 RSI: 00000000c0306480 RDI: 0000000000000010
[ 1945.481288] RBP: 00000000c0306480 R08: 0000000000000000 R09: 000055e83020e010
[ 1945.481292] R10: 00007fa127e25b80 R11: 0000000000000246 R12: 00007ffe548520d0
[ 1945.481296] R13: 0000000000000010 R14: 000055e8302c9fd0 R15: 0000000000001000
[ 1945.481302] ---[ end trace 1717583068871a81 ]---
[ 2081.413684] [TTM] Buffer eviction failed
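
For readers unfamiliar with the "refcount_t: underflow; use-after-free."
warning, here is a minimal single-threaded model of how one put too many
on an error path would trip such a check.  The names (struct refcount,
ref_get(), ref_put()) are hypothetical, and this is only a sketch: the
kernel's refcount_t is atomic and saturating, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-threaded model; not the kernel's refcount_t. */
struct refcount { int refs; bool warned; };

static void ref_init(struct refcount *r)
{
	r->refs = 1;
	r->warned = false;
}

static void ref_get(struct refcount *r)
{
	assert(r->refs > 0);	/* taking a reference on a dead object is a bug */
	r->refs++;
}

/* Returns true when the caller dropped the last reference and must free. */
static bool ref_put(struct refcount *r)
{
	if (r->refs <= 0) {
		/* One put too many: the object may already have been freed. */
		r->warned = true;
		fprintf(stderr, "refcount: underflow; use-after-free.\n");
		return false;
	}
	return --r->refs == 0;
}
```

In this model a balanced get/put sequence never sets the warned flag; a
further ref_put() after the count has reached zero does.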




* [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-10 15:41 question about error handling in ttm_bo_handle_move_mem() Dan Carpenter
@ 2021-06-16  6:30   ` Dan Carpenter
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2021-06-16  6:30 UTC
  To: Christian Koenig, Thomas Hellström
  Cc: Huang Rui, David Airlie, Daniel Vetter, dri-devel, linux-kernel

There are three bugs here:
1) We need to call unpopulate() if ttm_tt_populate() succeeds.
2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
   was wrong and it was really assigning "new_man = old_man;".  There
   is no need for this assignment anyway as we already have the value
   for "new_man".
3) The (!new_man->use_tt) condition is reversed.

Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
This is from reading the code and I can't swear that I have understood
it correctly.  My nouveau driver is currently unusable and this patch
has not helped.  But hopefully if I fix enough bugs eventually it will
start to work.

 drivers/gpu/drm/ttm/ttm_bo.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ebcffe794adb..72dde093f754 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -180,12 +180,12 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 		 */
 		ret = ttm_tt_create(bo, old_man->use_tt);
 		if (ret)
-			goto out_err;
+			return ret;
 
 		if (mem->mem_type != TTM_PL_SYSTEM) {
 			ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
 			if (ret)
-				goto out_err;
+				goto err_destroy;
 		}
 	}
 
@@ -193,15 +193,17 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 	if (ret) {
 		if (ret == -EMULTIHOP)
 			return ret;
-		goto out_err;
+		goto err_unpopulate;
 	}
 
 	ctx->bytes_moved += bo->base.size;
 	return 0;
 
-out_err:
-	new_man = ttm_manager_type(bdev, bo->mem.mem_type);
-	if (!new_man->use_tt)
+err_unpopulate:
+	if (new_man->use_tt)
+		ttm_tt_unpopulate(bo->bdev, bo->ttm);
+err_destroy:
+	if (new_man->use_tt)
 		ttm_bo_tt_destroy(bo);
 
 	return ret;
-- 
2.30.2
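
The label structure this patch proposes is the usual staged goto unwind:
each label undoes only the stages that had succeeded before the failure.
A minimal stand-alone sketch of that pattern follows.  Everything in it
(struct mock_bo, mock_create(), mock_populate(), mock_move(),
staged_move()) is hypothetical, and it does not model what the TTM error
path is actually meant to restore.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock object: tracks which setup stages are currently live. */
struct mock_bo {
	bool created;
	bool populated;
	int fail_at;	/* 1 = create, 2 = populate, 3 = move, 0 = never */
};

static int mock_create(struct mock_bo *bo)
{
	if (bo->fail_at == 1)
		return -ENOMEM;
	bo->created = true;
	return 0;
}

static int mock_populate(struct mock_bo *bo)
{
	assert(bo->created);
	if (bo->fail_at == 2)
		return -ENOMEM;
	bo->populated = true;
	return 0;
}

static int mock_move(struct mock_bo *bo)
{
	return bo->fail_at == 3 ? -EINVAL : 0;
}

/* Each label undoes exactly the stages that succeeded before the failure. */
static int staged_move(struct mock_bo *bo)
{
	int ret;

	ret = mock_create(bo);
	if (ret)
		return ret;	/* nothing to undo yet */

	ret = mock_populate(bo);
	if (ret)
		goto err_destroy;

	ret = mock_move(bo);
	if (ret)
		goto err_unpopulate;

	return 0;

err_unpopulate:
	bo->populated = false;
err_destroy:
	bo->created = false;
	return ret;
}
```

Setting fail_at to 1, 2 or 3 exercises each exit; in every failing case
the mock ends with both flags cleared.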


* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16  6:30   ` Dan Carpenter
@ 2021-06-16  6:46     ` Christian König
  -1 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2021-06-16  6:46 UTC
  To: Dan Carpenter, Thomas Hellström
  Cc: Huang Rui, David Airlie, Daniel Vetter, dri-devel, linux-kernel

Sending the first message didn't work, so let's try again.

Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> There are three bugs here:
> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>     was wrong and it was really assigning "new_man = old_man;".  There
>     is no need for this assignment anyway as we already have the value
>     for "new_man".
> 3) The (!new_man->use_tt) condition is reversed.
>
> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> This is from reading the code and I can't swear that I have understood
> it correctly.  My nouveau driver is currently unusable and this patch
> has not helped.  But hopefully if I fix enough bugs eventually it will
> start to work.

Well NAK, the code previously looked fine and you are breaking it now.

What's the problem with nouveau?

>   drivers/gpu/drm/ttm/ttm_bo.c | 14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index ebcffe794adb..72dde093f754 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -180,12 +180,12 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>   		 */
>   		ret = ttm_tt_create(bo, old_man->use_tt);
>   		if (ret)
> -			goto out_err;
> +			return ret;
>   
>   		if (mem->mem_type != TTM_PL_SYSTEM) {
>   			ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
>   			if (ret)
> -				goto out_err;
> +				goto err_destroy;
>   		}
>   	}
>   
> @@ -193,15 +193,17 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>   	if (ret) {
>   		if (ret == -EMULTIHOP)
>   			return ret;
> -		goto out_err;
> +		goto err_unpopulate;
>   	}
>   
>   	ctx->bytes_moved += bo->base.size;
>   	return 0;
>   
> -out_err:
> -	new_man = ttm_manager_type(bdev, bo->mem.mem_type);

This switches the new and old manager, i.e. new_man now points to the
existing resource manager.

> -	if (!new_man->use_tt)

So we should destroy the TT object only if the old manager is not using one.

> +err_unpopulate:
> +	if (new_man->use_tt)
> +		ttm_tt_unpopulate(bo->bdev, bo->ttm);

Unpopulate is not necessary, destroying is sufficient.

Christian.

> +err_destroy:
> +	if (new_man->use_tt)
>   		ttm_bo_tt_destroy(bo);
>   
>   	return ret;
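
The point that destroying is sufficient can be pictured with a small
mock in which the destroy helper releases the backing pages itself
before freeing the object.  This is only a sketch under that assumption:
struct mock_tt and the mock_tt_*() helpers are hypothetical and are not
the kernel's TT implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TT object; not the kernel's struct ttm_tt. */
struct mock_tt {
	void *pages;	/* backing storage, present only while populated */
};

static bool mock_tt_populate(struct mock_tt *tt, size_t size)
{
	assert(tt && !tt->pages);
	tt->pages = malloc(size);
	return tt->pages != NULL;
}

static void mock_tt_unpopulate(struct mock_tt *tt)
{
	free(tt->pages);	/* free(NULL) is a no-op, so this is idempotent */
	tt->pages = NULL;
}

/* Destroy releases the backing pages itself before freeing the object. */
static void mock_tt_destroy(struct mock_tt **ttp)
{
	if (!*ttp)
		return;
	mock_tt_unpopulate(*ttp);
	free(*ttp);
	*ttp = NULL;
}
```

Under that assumption, a caller that is about to destroy the object
gains nothing from an explicit unpopulate call first.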


* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16  6:46     ` Christian König
@ 2021-06-16  8:37       ` Dan Carpenter
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2021-06-16  8:37 UTC
  To: Christian König
  Cc: Thomas Hellström, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-kernel

On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
> Sending the first message didn't work, so let's try again.
> 
> Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> > There are three bugs here:
> > 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> > 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
> >     was wrong and it was really assigning "new_man = old_man;".  There
> >     is no need for this assignment anyway as we already have the value
> >     for "new_man".
> > 3) The (!new_man->use_tt) condition is reversed.
> > 
> > Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > This is from reading the code and I can't swear that I have understood
> > it correctly.  My nouveau driver is currently unusable and this patch
> > has not helped.  But hopefully if I fix enough bugs eventually it will
> > start to work.
> 
> Well NAK, the code previously looked fine and you are breaking it now.
> 
> What's the problem with nouveau?
> 

The new Firefox seems to exercise nouveau more than the old one, so
when I start 10 Firefox windows it just hangs the graphics.

I've added debug code and it seems like the problem is that
nv50_mem_new() is failing.


> >   drivers/gpu/drm/ttm/ttm_bo.c | 14 ++++++++------
> >   1 file changed, 8 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index ebcffe794adb..72dde093f754 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -180,12 +180,12 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
> >   		 */
> >   		ret = ttm_tt_create(bo, old_man->use_tt);
> >   		if (ret)
> > -			goto out_err;
> > +			return ret;
> >   		if (mem->mem_type != TTM_PL_SYSTEM) {
> >   			ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
> >   			if (ret)
> > -				goto out_err;
> > +				goto err_destroy;
> >   		}
> >   	}
> > @@ -193,15 +193,17 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
> >   	if (ret) {
> >   		if (ret == -EMULTIHOP)
> >   			return ret;
> > -		goto out_err;
> > +		goto err_unpopulate;
> >   	}
> >   	ctx->bytes_moved += bo->base.size;
> >   	return 0;
> > -out_err:
> > -	new_man = ttm_manager_type(bdev, bo->mem.mem_type);
> 
> This switches the new and old manager, i.e. new_man now points to the
> existing resource manager.

Why not just use "old_man" instead of basically the equivalent to
"new_man = old_man"?  Can the old_man change part way through the
function?

regards,
dan carpenter


* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16  8:37       ` Dan Carpenter
@ 2021-06-16  8:47         ` Christian König
  -1 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2021-06-16  8:47 UTC
  To: Dan Carpenter
  Cc: Thomas Hellström, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-kernel



Am 16.06.21 um 10:37 schrieb Dan Carpenter:
> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>> Sending the first message didn't work, so let's try again.
>>
>> Am 16.06.21 um 08:30 schrieb Dan Carpenter:
>>> There are three bugs here:
>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>      was wrong and it was really assigning "new_man = old_man;".  There
>>>      is no need for this assignment anyway as we already have the value
>>>      for "new_man".
>>> 3) The (!new_man->use_tt) condition is reversed.
>>>
>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>> ---
>>> This is from reading the code and I can't swear that I have understood
>>> it correctly.  My nouveau driver is currently unusable and this patch
>>> has not helped.  But hopefully if I fix enough bugs eventually it will
>>> start to work.
>> Well NAK, the code previously looked fine and you are breaking it now.
>>
>> What's the problem with nouveau?
>>
> The new Firefox seems to exercise nouveau more than the old one so
> when I start 10 firefox windows it just hangs the graphics.
>
> I've added debug code and it seems like the problem is that
> nv50_mem_new() is failing.

Sounds like it is running out of memory to me.

Do you have a dmesg?

>
>
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 14 ++++++++------
>>>    1 file changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index ebcffe794adb..72dde093f754 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -180,12 +180,12 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>>>    		 */
>>>    		ret = ttm_tt_create(bo, old_man->use_tt);
>>>    		if (ret)
>>> -			goto out_err;
>>> +			return ret;
>>>    		if (mem->mem_type != TTM_PL_SYSTEM) {
>>>    			ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
>>>    			if (ret)
>>> -				goto out_err;
>>> +				goto err_destroy;
>>>    		}
>>>    	}
>>> @@ -193,15 +193,17 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>>>    	if (ret) {
>>>    		if (ret == -EMULTIHOP)
>>>    			return ret;
>>> -		goto out_err;
>>> +		goto err_unpopulate;
>>>    	}
>>>    	ctx->bytes_moved += bo->base.size;
>>>    	return 0;
>>> -out_err:
>>> -	new_man = ttm_manager_type(bdev, bo->mem.mem_type);
>> This switches the new and old manager, i.e. new_man now points to the
>> existing resource manager.
> Why not just use "old_man" instead of basically the equivalent to
> "new_man = old_man"?  Can the old_man change part way through the
> function?

Good question :)

I don't think that old_man could change, and yes, that would be much
easier to understand.

Regards,
Christian.

>
> regards,
> dan carpenter
>
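
As a sketch of what testing old_man directly in the error path could
look like: the types and helpers here (struct mock_man, struct mock_bo,
mock_tt_destroy(), move_failed_cleanup()) are hypothetical and reduced to
the one flag the decision depends on.  It assumes, as discussed in this
exchange, that old_man does not change during the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified types; only the fields the error path looks at. */
struct mock_man { bool use_tt; };
struct mock_bo { bool has_tt; };

static void mock_tt_destroy(struct mock_bo *bo)
{
	bo->has_tt = false;
}

/*
 * Error path after a failed move, written against old_man directly.
 * The buffer stays where it was, so a TT object is kept only when the
 * original placement needs one.
 */
static void move_failed_cleanup(struct mock_bo *bo, const struct mock_man *old_man)
{
	assert(bo && old_man);
	if (!old_man->use_tt)
		mock_tt_destroy(bo);
}
```

In this model, a buffer whose original placement does not use a TT loses
the freshly created one, and a buffer whose original placement does use
a TT keeps it.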


* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16  8:47         ` Christian König
@ 2021-06-16  9:36           ` Dan Carpenter
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2021-06-16  9:36 UTC
  To: Christian König
  Cc: Thomas Hellström, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-kernel

On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
> 
> 
> Am 16.06.21 um 10:37 schrieb Dan Carpenter:
> > On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
> > > Sending the first message didn't work, so let's try again.
> > > 
> > > Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> > > > There are three bugs here:
> > > > 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> > > > 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
> > > >      was wrong and it was really assigning "new_mem = old_mem;".  There
> > > >      is no need for this assignment anyway as we already have the value
> > > >      for "new_mem".
> > > > 3) The (!new_man->use_tt) condition is reversed.
> > > > 
> > > > Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
> > > > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > > ---
> > > > This is from reading the code and I can't swear that I have understood
> > > > it correctly.  My nouveau driver is currently unusable and this patch
> > > > has not helped.  But hopefully if I fix enough bugs eventually it will
> > > > start to work.
> > > Well NAK, the code previously looked quite well and you are breaking it now.
> > > 
> > > What's the problem with nouveau?
> > > 
> > The new Firefox seems to excersize nouveau more than the old one so
> > when I start 10 firefox windows it just hangs the graphics.
> > 
> > I've added debug code and it seems like the problem is that
> > nv50_mem_new() is failing.
> 
> Sounds like it is running out of memory to me.
> 
> Do you have a dmesg?
> 

At first there was a very straightforward use-after-free bug, which I
fixed.
https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u

But now that the use-after-free is gone, the only thing in dmesg is:
"[TTM] Buffer eviction failed".  And I have some firmware missing.

[  205.489763] rfkill: input handler disabled
[  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
[  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
[  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
[  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
[  296.150632] [TTM] Buffer eviction failed
[  417.084265] [TTM] Buffer eviction failed
[  447.295961] [TTM] Buffer eviction failed
[  510.800231] [TTM] Buffer eviction failed
[  556.101384] [TTM] Buffer eviction failed
[  616.495790] [TTM] Buffer eviction failed
[  692.014007] [TTM] Buffer eviction failed

The eviction failed message only shows up a minute after the hang so it
seems more like a symptom than a root cause.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16  9:36           ` Dan Carpenter
@ 2021-06-16 11:00             ` Christian König
  -1 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2021-06-16 11:00 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Thomas Hellström, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-kernel



Am 16.06.21 um 11:36 schrieb Dan Carpenter:
> On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
>>
>> Am 16.06.21 um 10:37 schrieb Dan Carpenter:
>>> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>>>> Sending the first message didn't worked, so let's try again.
>>>>
>>>> Am 16.06.21 um 08:30 schrieb Dan Carpenter:
>>>>> There are three bugs here:
>>>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>>>       was wrong and it was really assigning "new_mem = old_mem;".  There
>>>>>       is no need for this assignment anyway as we already have the value
>>>>>       for "new_mem".
>>>>> 3) The (!new_man->use_tt) condition is reversed.
>>>>>
>>>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>>> ---
>>>>> This is from reading the code and I can't swear that I have understood
>>>>> it correctly.  My nouveau driver is currently unusable and this patch
>>>>> has not helped.  But hopefully if I fix enough bugs eventually it will
>>>>> start to work.
>>>> Well NAK, the code previously looked quite well and you are breaking it now.
>>>>
>>>> What's the problem with nouveau?
>>>>
>>> The new Firefox seems to excersize nouveau more than the old one so
>>> when I start 10 firefox windows it just hangs the graphics.
>>>
>>> I've added debug code and it seems like the problem is that
>>> nv50_mem_new() is failing.
>> Sounds like it is running out of memory to me.
>>
>> Do you have a dmesg?
>>
> At first there was a very straight forward use after free bug which I
> fixed.
> https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
>
> But now the use after free is gone the only thing in dmesg is:
> "[TTM] Buffer eviction failed".  And I have some firmware missing.
>
> [  205.489763] rfkill: input handler disabled
> [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
> [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
> [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
> [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> [  296.150632] [TTM] Buffer eviction failed
> [  417.084265] [TTM] Buffer eviction failed
> [  447.295961] [TTM] Buffer eviction failed
> [  510.800231] [TTM] Buffer eviction failed
> [  556.101384] [TTM] Buffer eviction failed
> [  616.495790] [TTM] Buffer eviction failed
> [  692.014007] [TTM] Buffer eviction failed
>
> The eviction failed message only shows up a minute after the hang so it
> seems more like a symptom than a root cause.

Yeah, look at the timing. What happens is that the buffer eviction timed
out because the hardware is locked up.

No idea what could be causing the lockup. It might not even be
kernel-related at all.

Regards,
Christian.

>
> regards,
> dan carpenter
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16 11:00             ` Christian König
@ 2021-06-16 19:19               ` Dan Carpenter
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2021-06-16 19:19 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellstr, thomas.hellstrom, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-kernel

On Wed, Jun 16, 2021 at 01:00:38PM +0200, Christian König wrote:
> 
> 
> Am 16.06.21 um 11:36 schrieb Dan Carpenter:
> > On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
> > > 
> > > Am 16.06.21 um 10:37 schrieb Dan Carpenter:
> > > > On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
> > > > > Sending the first message didn't worked, so let's try again.
> > > > > 
> > > > > Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> > > > > > There are three bugs here:
> > > > > > 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> > > > > > 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
> > > > > >       was wrong and it was really assigning "new_mem = old_mem;".  There
> > > > > >       is no need for this assignment anyway as we already have the value
> > > > > >       for "new_mem".
> > > > > > 3) The (!new_man->use_tt) condition is reversed.
> > > > > > 
> > > > > > Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
> > > > > > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > > > > ---
> > > > > > This is from reading the code and I can't swear that I have understood
> > > > > > it correctly.  My nouveau driver is currently unusable and this patch
> > > > > > has not helped.  But hopefully if I fix enough bugs eventually it will
> > > > > > start to work.
> > > > > Well NAK, the code previously looked quite well and you are breaking it now.
> > > > > 
> > > > > What's the problem with nouveau?
> > > > > 
> > > > The new Firefox seems to excersize nouveau more than the old one so
> > > > when I start 10 firefox windows it just hangs the graphics.
> > > > 
> > > > I've added debug code and it seems like the problem is that
> > > > nv50_mem_new() is failing.
> > > Sounds like it is running out of memory to me.
> > > 
> > > Do you have a dmesg?
> > > 
> > At first there was a very straight forward use after free bug which I
> > fixed.
> > https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
> > 
> > But now the use after free is gone the only thing in dmesg is:
> > "[TTM] Buffer eviction failed".  And I have some firmware missing.
> > 
> > [  205.489763] rfkill: input handler disabled
> > [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
> > [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
> > [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
> > [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> > [  296.150632] [TTM] Buffer eviction failed
> > [  417.084265] [TTM] Buffer eviction failed
> > [  447.295961] [TTM] Buffer eviction failed
> > [  510.800231] [TTM] Buffer eviction failed
> > [  556.101384] [TTM] Buffer eviction failed
> > [  616.495790] [TTM] Buffer eviction failed
> > [  692.014007] [TTM] Buffer eviction failed
> > 
> > The eviction failed message only shows up a minute after the hang so it
> > seems more like a symptom than a root cause.
> 
> Yeah, look at the timing. What happens is that the buffer eviction timed out
> because the hardware is locked up.
> 
> No idea what that could be. It might not even be kernel related at all.

I don't think it's hardware-related...  Using an old version of Firefox
"fixes" the problem.  I downloaded the firmware, so that's not the issue.
Here's the dmesg load info with the new firmware.

[    1.412458] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    1.412527] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[    1.412710] nouveau 0000:01:00.0: vgaarb: deactivate vga console
[    1.417213] Console: switching to colour dummy device 80x25
[    1.417272] nouveau 0000:01:00.0: NVIDIA GT218 (0a8280b1)
[    1.531565] nouveau 0000:01:00.0: bios: nvkm_bios_new: version 70.18.6f.00.05
[    1.531916] nouveau 0000:01:00.0: fb: nvkm_ram_ctor: 1024 MiB DDR3
[    2.248212] tsc: Refined TSC clocksource calibration: 3392.144 MHz
[    2.248218] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e5517d4e4, max_idle_ns: 440795261668 ns
[    2.252203] clocksource: Switched to clocksource tsc
[    2.848138] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
[    2.848142] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    2.848145] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    2.848147] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    2.848149] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000302 00020030
[    2.848151] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000000
[    2.848154] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011362 00020010
[    2.848155] nouveau 0000:01:00.0: DRM: DCB outp 03: 01022310 00000000
[    2.848157] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
[    2.848159] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[    2.848161] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[    2.850214] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[    2.908409] nouveau 0000:01:00.0: DRM: allocated 1600x900 fb: 0x70000, bo 00000000091fb080
[    2.908518] fbcon: nouveaudrmfb (fb0) is primary device
[    2.955528] Console: switching to colour frame buffer device 200x56
[    2.957780] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
[    2.957926] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[    2.959816] loop: module loaded

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-16 19:19               ` Dan Carpenter
@ 2021-06-17  7:41                 ` Christian König
  -1 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2021-06-17  7:41 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: thomas.hellstrom, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-kernel



Am 16.06.21 um 21:19 schrieb Dan Carpenter:
> On Wed, Jun 16, 2021 at 01:00:38PM +0200, Christian König wrote:
>>
>> Am 16.06.21 um 11:36 schrieb Dan Carpenter:
>>> On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
>>>> Am 16.06.21 um 10:37 schrieb Dan Carpenter:
>>>>> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>>>>>> Sending the first message didn't worked, so let's try again.
>>>>>>
>>>>>> Am 16.06.21 um 08:30 schrieb Dan Carpenter:
>>>>>>> There are three bugs here:
>>>>>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>>>>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>>>>>        was wrong and it was really assigning "new_mem = old_mem;".  There
>>>>>>>        is no need for this assignment anyway as we already have the value
>>>>>>>        for "new_mem".
>>>>>>> 3) The (!new_man->use_tt) condition is reversed.
>>>>>>>
>>>>>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>>>>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>>>>> ---
>>>>>>> This is from reading the code and I can't swear that I have understood
>>>>>>> it correctly.  My nouveau driver is currently unusable and this patch
>>>>>>> has not helped.  But hopefully if I fix enough bugs eventually it will
>>>>>>> start to work.
>>>>>> Well NAK, the code previously looked quite well and you are breaking it now.
>>>>>>
>>>>>> What's the problem with nouveau?
>>>>>>
>>>>> The new Firefox seems to excersize nouveau more than the old one so
>>>>> when I start 10 firefox windows it just hangs the graphics.
>>>>>
>>>>> I've added debug code and it seems like the problem is that
>>>>> nv50_mem_new() is failing.
>>>> Sounds like it is running out of memory to me.
>>>>
>>>> Do you have a dmesg?
>>>>
>>> At first there was a very straight forward use after free bug which I
>>> fixed.
>>> https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
>>>
>>> But now the use after free is gone the only thing in dmesg is:
>>> "[TTM] Buffer eviction failed".  And I have some firmware missing.
>>>
>>> [  205.489763] rfkill: input handler disabled
>>> [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
>>> [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
>>> [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
>>> [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
>>> [  296.150632] [TTM] Buffer eviction failed
>>> [  417.084265] [TTM] Buffer eviction failed
>>> [  447.295961] [TTM] Buffer eviction failed
>>> [  510.800231] [TTM] Buffer eviction failed
>>> [  556.101384] [TTM] Buffer eviction failed
>>> [  616.495790] [TTM] Buffer eviction failed
>>> [  692.014007] [TTM] Buffer eviction failed
>>>
>>> The eviction failed message only shows up a minute after the hang so it
>>> seems more like a symptom than a root cause.
>> Yeah, look at the timing. What happens is that the buffer eviction timed out
>> because the hardware is locked up.
>>
>> No idea what that could be. It might not even be kernel related at all.
> I don't think it's hardware related...  Using an old version of firefox
> "fixes" the problem.  I downloaded the firmware so that's not the issue.
> Here's the dmesg load info with the new firmware.

Oh, I was not suggesting a hardware problem.

The most likely cause is a software issue in userspace, e.g. doing things
in the wrong order, doing things too fast without waiting, etc.

There are tons of ways userspace can crash GPU hardware that you can't
prevent in the kernel. In particular, deciding whether a submitted program
loops forever is Turing's halting problem, which is not even theoretically
solvable.

I suggest you start digging in userspace instead.

Christian.

>
> [    1.412458] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
> [    1.412527] AMD-Vi: AMD IOMMUv2 functionality not available on this system
> [    1.412710] nouveau 0000:01:00.0: vgaarb: deactivate vga console
> [    1.417213] Console: switching to colour dummy device 80x25
> [    1.417272] nouveau 0000:01:00.0: NVIDIA GT218 (0a8280b1)
> [    1.531565] nouveau 0000:01:00.0: bios: nvkm_bios_new: version 70.18.6f.00.05
> [    1.531916] nouveau 0000:01:00.0: fb: nvkm_ram_ctor: 1024 MiB DDR3
> [    2.248212] tsc: Refined TSC clocksource calibration: 3392.144 MHz
> [    2.248218] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e5517d4e4, max_idle_ns: 440795261668 ns
> [    2.252203] clocksource: Switched to clocksource tsc
> [    2.848138] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
> [    2.848142] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
> [    2.848145] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> [    2.848147] nouveau 0000:01:00.0: DRM: DCB version 4.0
> [    2.848149] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000302 00020030
> [    2.848151] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000000
> [    2.848154] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011362 00020010
> [    2.848155] nouveau 0000:01:00.0: DRM: DCB outp 03: 01022310 00000000
> [    2.848157] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
> [    2.848159] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
> [    2.848161] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
> [    2.850214] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> [    2.908409] nouveau 0000:01:00.0: DRM: allocated 1600x900 fb: 0x70000, bo 00000000091fb080
> [    2.908518] fbcon: nouveaudrmfb (fb0) is primary device
> [    2.955528] Console: switching to colour frame buffer device 200x56
> [    2.957780] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
> [    2.957926] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
> [    2.959816] loop: module loaded
>
> regards,
> dan carpenter


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
@ 2021-06-17  7:41                 ` Christian König
  0 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2021-06-17  7:41 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: thomas.hellstrom, David Airlie, linux-kernel, dri-devel, Huang Rui



Am 16.06.21 um 21:19 schrieb Dan Carpenter:
> On Wed, Jun 16, 2021 at 01:00:38PM +0200, Christian König wrote:
>>
>> Am 16.06.21 um 11:36 schrieb Dan Carpenter:
>>> On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
>>>> Am 16.06.21 um 10:37 schrieb Dan Carpenter:
>>>>> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>>>>>> Sending the first message didn't worked, so let's try again.
>>>>>>
>>>>>> Am 16.06.21 um 08:30 schrieb Dan Carpenter:
>>>>>>> There are three bugs here:
>>>>>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>>>>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>>>>>        was wrong and it was really assigning "new_mem = old_mem;".  There
>>>>>>>        is no need for this assignment anyway as we already have the value
>>>>>>>        for "new_mem".
>>>>>>> 3) The (!new_man->use_tt) condition is reversed.
>>>>>>>
>>>>>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>>>>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>>>>> ---
>>>>>>> This is from reading the code and I can't swear that I have understood
>>>>>>> it correctly.  My nouveau driver is currently unusable and this patch
>>>>>>> has not helped.  But hopefully if I fix enough bugs eventually it will
>>>>>>> start to work.
>>>>>> Well NAK, the code previously looked quite well and you are breaking it now.
>>>>>>
>>>>>> What's the problem with nouveau?
>>>>>>
>>>>> The new Firefox seems to excersize nouveau more than the old one so
>>>>> when I start 10 firefox windows it just hangs the graphics.
>>>>>
>>>>> I've added debug code and it seems like the problem is that
>>>>> nv50_mem_new() is failing.
>>>> Sounds like it is running out of memory to me.
>>>>
>>>> Do you have a dmesg?
>>>>
>>> At first there was a very straight forward use after free bug which I
>>> fixed.
>>> https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
>>>
>>> But now the use after free is gone the only thing in dmesg is:
>>> "[TTM] Buffer eviction failed".  And I have some firmware missing.
>>>
>>> [  205.489763] rfkill: input handler disabled
>>> [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
>>> [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
>>> [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
>>> [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
>>> [  296.150632] [TTM] Buffer eviction failed
>>> [  417.084265] [TTM] Buffer eviction failed
>>> [  447.295961] [TTM] Buffer eviction failed
>>> [  510.800231] [TTM] Buffer eviction failed
>>> [  556.101384] [TTM] Buffer eviction failed
>>> [  616.495790] [TTM] Buffer eviction failed
>>> [  692.014007] [TTM] Buffer eviction failed
>>>
>>> The eviction failed message only shows up a minute after the hang so it
>>> seems more like a symptom than a root cause.
>> Yeah, look at the timing. What happens is that the buffer eviction timed out
>> because the hardware is locked up.
>>
>> No idea what that could be. It might not even be kernel related at all.
> I don't think it's hardware related...  Using an old version of firefox
> "fixes" the problem.  I downloaded the firmware so that's not the issue.
> Here's the dmesg load info with the new firmware.

Oh, I was not suggesting a hardware problem.

The most likely cause is a software issue in userspace, e.g. doing things
in the wrong order, doing things too fast without waiting, etc.

There are tons of ways userspace can crash GPU hardware that you can't
prevent in the kernel. In particular, detecting an endless loop is well
known as Turing's halting problem and is not even theoretically solvable.

I suggest starting to dig in userspace instead.

Christian.

>
> [    1.412458] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
> [    1.412527] AMD-Vi: AMD IOMMUv2 functionality not available on this system
> [    1.412710] nouveau 0000:01:00.0: vgaarb: deactivate vga console
> [    1.417213] Console: switching to colour dummy device 80x25
> [    1.417272] nouveau 0000:01:00.0: NVIDIA GT218 (0a8280b1)
> [    1.531565] nouveau 0000:01:00.0: bios: nvkm_bios_new: version 70.18.6f.00.05
> [    1.531916] nouveau 0000:01:00.0: fb: nvkm_ram_ctor: 1024 MiB DDR3
> [    2.248212] tsc: Refined TSC clocksource calibration: 3392.144 MHz
> [    2.248218] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e5517d4e4, max_idle_ns: 440795261668 ns
> [    2.252203] clocksource: Switched to clocksource tsc
> [    2.848138] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
> [    2.848142] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
> [    2.848145] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> [    2.848147] nouveau 0000:01:00.0: DRM: DCB version 4.0
> [    2.848149] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000302 00020030
> [    2.848151] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000000
> [    2.848154] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011362 00020010
> [    2.848155] nouveau 0000:01:00.0: DRM: DCB outp 03: 01022310 00000000
> [    2.848157] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
> [    2.848159] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
> [    2.848161] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
> [    2.850214] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> [    2.908409] nouveau 0000:01:00.0: DRM: allocated 1600x900 fb: 0x70000, bo 00000000091fb080
> [    2.908518] fbcon: nouveaudrmfb (fb0) is primary device
> [    2.955528] Console: switching to colour frame buffer device 200x56
> [    2.957780] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
> [    2.957926] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
> [    2.959816] loop: module loaded
>
> regards,
> dan carpenter


* Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
  2021-06-17  7:41                 ` Christian König
@ 2021-06-17 16:54                   ` Daniel Vetter
  -1 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2021-06-17 16:54 UTC (permalink / raw)
  To: Christian König
  Cc: Dan Carpenter, thomas.hellstrom, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-kernel

On Thu, Jun 17, 2021 at 09:41:35AM +0200, Christian König wrote:
> 
> 
> Am 16.06.21 um 21:19 schrieb Dan Carpenter:
> > On Wed, Jun 16, 2021 at 01:00:38PM +0200, Christian König wrote:
> > > 
> > > Am 16.06.21 um 11:36 schrieb Dan Carpenter:
> > > > On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
> > > > > Am 16.06.21 um 10:37 schrieb Dan Carpenter:
> > > > > > On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
> > > > > > > Sending the first message didn't worked, so let's try again.
> > > > > > > 
> > > > > > > Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> > > > > > > > There are three bugs here:
> > > > > > > > 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> > > > > > > > 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
> > > > > > > >        was wrong and it was really assigning "new_mem = old_mem;".  There
> > > > > > > >        is no need for this assignment anyway as we already have the value
> > > > > > > >        for "new_mem".
> > > > > > > > 3) The (!new_man->use_tt) condition is reversed.
> > > > > > > > 
> > > > > > > > Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
> > > > > > > > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > > > > > > ---
> > > > > > > > This is from reading the code and I can't swear that I have understood
> > > > > > > > it correctly.  My nouveau driver is currently unusable and this patch
> > > > > > > > has not helped.  But hopefully if I fix enough bugs eventually it will
> > > > > > > > start to work.
> > > > > > > Well NAK, the code previously looked quite well and you are breaking it now.
> > > > > > > 
> > > > > > > What's the problem with nouveau?
> > > > > > > 
> > > > > > The new Firefox seems to excersize nouveau more than the old one so
> > > > > > when I start 10 firefox windows it just hangs the graphics.
> > > > > > 
> > > > > > I've added debug code and it seems like the problem is that
> > > > > > nv50_mem_new() is failing.
> > > > > Sounds like it is running out of memory to me.
> > > > > 
> > > > > Do you have a dmesg?
> > > > > 
> > > > At first there was a very straight forward use after free bug which I
> > > > fixed.
> > > > https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
> > > > 
> > > > But now the use after free is gone the only thing in dmesg is:
> > > > "[TTM] Buffer eviction failed".  And I have some firmware missing.
> > > > 
> > > > [  205.489763] rfkill: input handler disabled
> > > > [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
> > > > [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
> > > > [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
> > > > [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> > > > [  296.150632] [TTM] Buffer eviction failed
> > > > [  417.084265] [TTM] Buffer eviction failed
> > > > [  447.295961] [TTM] Buffer eviction failed
> > > > [  510.800231] [TTM] Buffer eviction failed
> > > > [  556.101384] [TTM] Buffer eviction failed
> > > > [  616.495790] [TTM] Buffer eviction failed
> > > > [  692.014007] [TTM] Buffer eviction failed
> > > > 
> > > > The eviction failed message only shows up a minute after the hang so it
> > > > seems more like a symptom than a root cause.
> > > Yeah, look at the timing. What happens is that the buffer eviction timed out
> > > because the hardware is locked up.
> > > 
> > > No idea what that could be. It might not even be kernel related at all.
> > I don't think it's hardware related...  Using an old version of firefox
> > "fixes" the problem.  I downloaded the firmware so that's not the issue.
> > Here's the dmesg load info with the new firmware.
> 
> Oh, I was not suggesting a hardware problem.
> 
> The most likely cause is a software issue in userspace, e.g. wrong order of
> doing thing, doing things to fast without waiting etc...
> 
> There are tons of things how userspace can crash GPU hardware you can't
> prevent in the kernel. Especially sending an endless loop is well known as
> Turing's halting problems and not even theoretically solvable.
> 
> I suggest to start digging in userspace instead.

I guess nouveau doesn't have reset when the fences time out? That would at
least paper over this, plus it makes debugging the bug in mesa3d easier.

Also, as Christian points out, because of the halting problem, the lack of
TDR (timeout and device reset) is actually a security bug in itself.
-Daniel
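
A minimal userspace sketch of the timeout and device reset idea mentioned
above, for illustration only. Nothing here is nouveau or TTM code: fake_job,
fake_engine, engine_tick(), engine_reset() and wait_fence_or_reset() are
hypothetical names, and the "engine" is just a counter. The point it tries to
show is that the waiter cannot know whether a job will ever finish, so it
bounds the wait and resets the engine once the budget is used up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job: either finishes after 'steps' ticks or never finishes. */
struct fake_job {
	bool endless;        /* models a submission stuck in an endless loop */
	unsigned int steps;  /* remaining ticks for a job that does finish */
};

/* Hypothetical engine: runs at most one job and counts how often it was reset. */
struct fake_engine {
	struct fake_job *current;
	unsigned int resets;
};

/* Advance the engine by one tick; return true once the job's fence signals. */
static bool engine_tick(struct fake_engine *e)
{
	struct fake_job *job = e->current;

	if (!job)
		return true;          /* nothing running, nothing to wait for */
	if (job->endless)
		return false;         /* this fence will never signal */
	if (job->steps > 0)
		job->steps--;
	if (job->steps == 0) {
		e->current = NULL;    /* job retired */
		return true;
	}
	return false;
}

/* Drop whatever is running and record that a reset happened. */
static void engine_reset(struct fake_engine *e)
{
	e->current = NULL;
	e->resets++;
}

/*
 * Wait for the current job for at most 'timeout_ticks' ticks.
 * Returns 0 if the fence signaled in time, -1 if the wait timed out
 * and the engine was reset.
 */
static int wait_fence_or_reset(struct fake_engine *e, unsigned int timeout_ticks)
{
	assert(timeout_ticks > 0);

	for (unsigned int t = 0; t < timeout_ticks; t++) {
		if (engine_tick(e))
			return 0;
	}
	engine_reset(e);
	return -1;
}
```

With a job that needs 3 ticks and a budget of 10, wait_fence_or_reset()
returns 0 and the reset counter stays at 0. With an endless job it returns -1
after 10 ticks and the engine is left idle with one reset recorded, so later
submissions are not blocked behind the stuck one.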

> 
> Christian.
> 
> > 
> > [    1.412458] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
> > [    1.412527] AMD-Vi: AMD IOMMUv2 functionality not available on this system
> > [    1.412710] nouveau 0000:01:00.0: vgaarb: deactivate vga console
> > [    1.417213] Console: switching to colour dummy device 80x25
> > [    1.417272] nouveau 0000:01:00.0: NVIDIA GT218 (0a8280b1)
> > [    1.531565] nouveau 0000:01:00.0: bios: nvkm_bios_new: version 70.18.6f.00.05
> > [    1.531916] nouveau 0000:01:00.0: fb: nvkm_ram_ctor: 1024 MiB DDR3
> > [    2.248212] tsc: Refined TSC clocksource calibration: 3392.144 MHz
> > [    2.248218] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e5517d4e4, max_idle_ns: 440795261668 ns
> > [    2.252203] clocksource: Switched to clocksource tsc
> > [    2.848138] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
> > [    2.848142] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
> > [    2.848145] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> > [    2.848147] nouveau 0000:01:00.0: DRM: DCB version 4.0
> > [    2.848149] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000302 00020030
> > [    2.848151] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000000
> > [    2.848154] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011362 00020010
> > [    2.848155] nouveau 0000:01:00.0: DRM: DCB outp 03: 01022310 00000000
> > [    2.848157] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
> > [    2.848159] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
> > [    2.848161] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
> > [    2.850214] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> > [    2.908409] nouveau 0000:01:00.0: DRM: allocated 1600x900 fb: 0x70000, bo 00000000091fb080
> > [    2.908518] fbcon: nouveaudrmfb (fb0) is primary device
> > [    2.955528] Console: switching to colour frame buffer device 200x56
> > [    2.957780] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
> > [    2.957926] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
> > [    2.959816] loop: module loaded
> > 
> > regards,
> > dan carpenter
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

end of thread, other threads:[~2021-06-17 16:54 UTC | newest]

Thread overview: 19+ messages
2021-06-10 15:41 question about error handling in ttm_bo_handle_move_mem() Dan Carpenter
2021-06-16  6:30 ` [PATCH] drm/ttm: fix " Dan Carpenter
2021-06-16  6:30   ` Dan Carpenter
2021-06-16  6:46   ` Christian König
2021-06-16  6:46     ` Christian König
2021-06-16  8:37     ` Dan Carpenter
2021-06-16  8:37       ` Dan Carpenter
2021-06-16  8:47       ` Christian König
2021-06-16  8:47         ` Christian König
2021-06-16  9:36         ` Dan Carpenter
2021-06-16  9:36           ` Dan Carpenter
2021-06-16 11:00           ` Christian König
2021-06-16 11:00             ` Christian König
2021-06-16 19:19             ` Dan Carpenter
2021-06-16 19:19               ` Dan Carpenter
2021-06-17  7:41               ` Christian König
2021-06-17  7:41                 ` Christian König
2021-06-17 16:54                 ` Daniel Vetter
2021-06-17 16:54                   ` Daniel Vetter