linux-kernel.vger.kernel.org archive mirror
* regression:  nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop
@ 2020-12-16 13:31 Mike Galbraith
  2020-12-17 13:30 ` [bisected] " Mike Galbraith
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Galbraith @ 2020-12-16 13:31 UTC
  To: lkml; +Cc: Ben Skeggs, nouveau

When the badness below, which is new in the 5.11 cycle, happens, it's time to reboot.

...
[   27.467260] NFSD: Using UMH upcall client tracking operations.
[   27.467273] NFSD: starting 90-second grace period (net f00000a0)
[   27.965138] Bridge firewalling registered
[   39.096604] fuse: init (API version 7.32)
[  961.579832] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000069f000 engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on channel 1 [00ff73d000 DRM]
[  961.579840] nouveau 0000:01:00.0: fifo: channel 1: killed
[  961.579844] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[  961.579850] nouveau 0000:01:00.0: fifo: runlist 4: scheduled for recovery
[  961.579853] nouveau 0000:01:00.0: fifo: engine 4: scheduled for recovery

The box is an aging generic i4790 desktop with...
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)

	-Mike



* [bisected] Re: regression:  nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop
  2020-12-16 13:31 regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop Mike Galbraith
@ 2020-12-17 13:30 ` Mike Galbraith
  2020-12-17 19:45   ` David Airlie
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Galbraith @ 2020-12-17 13:30 UTC
  To: lkml; +Cc: Ben Skeggs, nouveau, Dave Airlie

On Wed, 2020-12-16 at 14:31 +0100, Mike Galbraith wrote:
> When the below new to 5.11 cycle badness happens, it's time to reboot.
>
> ...
> [   27.467260] NFSD: Using UMH upcall client tracking operations.
> [   27.467273] NFSD: starting 90-second grace period (net f00000a0)
> [   27.965138] Bridge firewalling registered
> [   39.096604] fuse: init (API version 7.32)
> [  961.579832] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000069f000 engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on channel 1 [00ff73d000 DRM]
> [  961.579840] nouveau 0000:01:00.0: fifo: channel 1: killed
> [  961.579844] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> [  961.579850] nouveau 0000:01:00.0: fifo: runlist 4: scheduled for recovery
> [  961.579853] nouveau 0000:01:00.0: fifo: engine 4: scheduled for recovery
>
> Box is aging generic i4790 desktop box with...
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)

Bisection was straightforward.  A post-bisect test revert was equally
straightforward, and seems to confirm the fingered commit.

0c8c0659d7475b6304b67374caf15b56cf0be4f9 is the first bad commit
commit 0c8c0659d7475b6304b67374caf15b56cf0be4f9
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Oct 29 13:59:20 2020 +1000

    drm/nouveau/ttm: use multihop

    This removes the code to move resources directly between
    SYSTEM and VRAM in favour of using the core ttm mulithop code.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20201109005432.861936-4-airlied@gmail.com

 drivers/gpu/drm/nouveau/nouveau_bo.c | 112 ++++-------------------------------
 1 file changed, 13 insertions(+), 99 deletions(-)

git bisect start 'drivers/gpu'
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [accefff5b547a9a1d959c7e76ad539bf2480e78b] Merge tag 'arm-soc-omap-genpd-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad accefff5b547a9a1d959c7e76ad539bf2480e78b
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# bad: [0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect bad 0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb
# good: [f8aab60422c371425365d386dfd51e0c6c5b1041] drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs
git bisect good f8aab60422c371425365d386dfd51e0c6c5b1041
# bad: [fab0fca1da5cdc48be051715cd9787df04fdce3a] Merge tag 'media/v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad fab0fca1da5cdc48be051715cd9787df04fdce3a
# bad: [bcc68bd8161261ceeb1a4ab02b5265758944f90d] Merge tag 'auxdisplay-for-linus-v5.11' of git://github.com/ojeda/linux
git bisect bad bcc68bd8161261ceeb1a4ab02b5265758944f90d
# bad: [22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5] Merge tag 'drm-misc-next-2020-11-18' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-next
git bisect bad 22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5
# bad: [a1ac250a82a5e97db71f14101ff7468291a6aaef] fbcon: Avoid using FNTCHARCNT() and hard-coded built-in font charcount
git bisect bad a1ac250a82a5e97db71f14101ff7468291a6aaef
# good: [a39855076c859b7f6c58ed4da8f195a2a6cd3c7b] drm/cma-helper: Make default object functions the default
git bisect good a39855076c859b7f6c58ed4da8f195a2a6cd3c7b
# bad: [5f1f10998e7f0ba98a8efc27009cd9a11cff6616] drm/atmel-hlcdc/atmel_hlcdc_plane: Staticise local function 'atmel_hlcdc_plane_setup_scaler()'
git bisect bad 5f1f10998e7f0ba98a8efc27009cd9a11cff6616
# good: [55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4] drm: mxsfb: Implement .format_mod_supported
git bisect good 55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4
# bad: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: use multihop
git bisect bad 0c8c0659d7475b6304b67374caf15b56cf0be4f9
# good: [23d6ab1d4c503660632e7b18cbb571d62d9bf792] drm: remove pgprot_decrypted() before calls to io_remap_pfn_range()
git bisect good 23d6ab1d4c503660632e7b18cbb571d62d9bf792
# good: [ebdf565169af006ee3be8c40eecbfc77d28a3b84] drm/ttm: add multihop infrastrucutre (v3)
git bisect good ebdf565169af006ee3be8c40eecbfc77d28a3b84
# good: [f5a89a5cae812a39993be32e74c8ed7856b1e2b2] drm/amdgpu/ttm: use multihop
git bisect good f5a89a5cae812a39993be32e74c8ed7856b1e2b2
# first bad commit: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: use multihop



* Re: [bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop
  2020-12-17 13:30 ` [bisected] " Mike Galbraith
@ 2020-12-17 19:45   ` David Airlie
  2020-12-17 20:04     ` Mike Galbraith
  0 siblings, 1 reply; 4+ messages in thread
From: David Airlie @ 2020-12-17 19:45 UTC
  To: Mike Galbraith; +Cc: lkml, Ben Skeggs, nouveau

[-- Attachment #1: Type: text/plain, Size: 5106 bytes --]

Hi Mike,

Thanks for the report,

Does the attached patch help?

Dave.


[-- Attachment #2: 0001-drm-nouveau-fix-multihop-when-move-doesn-t-work.patch --]
[-- Type: text/x-patch, Size: 2138 bytes --]

From 7e3eef93cdf8228d4f9b8ef2fddd170eedc6a0b0 Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Fri, 18 Dec 2020 05:43:15 +1000
Subject: [PATCH] drm/nouveau: fix multihop when move doesn't work.

As per the radeon/amdgpu fix, don't use multihop if hw moves
aren't enabled.

Reported-by: Mike Galbraith <efault@gmx.de>
Fixes: 0c8c0659d74 ("drm/nouveau/ttm: use multihop")
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 31 ++++++++++++++--------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 1386b0fc1640..c85b1af06b7b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -942,16 +942,6 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict,
 	struct nouveau_drm_tile *new_tile = NULL;
 	int ret = 0;
 
-	if ((old_reg->mem_type == TTM_PL_SYSTEM &&
-	     new_reg->mem_type == TTM_PL_VRAM) ||
-	    (old_reg->mem_type == TTM_PL_VRAM &&
-	     new_reg->mem_type == TTM_PL_SYSTEM)) {
-		hop->fpfn = 0;
-		hop->lpfn = 0;
-		hop->mem_type = TTM_PL_TT;
-		hop->flags = 0;
-		return -EMULTIHOP;
-	}
 
 	if (new_reg->mem_type == TTM_PL_TT) {
 		ret = nouveau_ttm_tt_bind(bo->bdev, bo->ttm, new_reg);
@@ -995,14 +985,25 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict,
 
 	/* Hardware assisted copy. */
 	if (drm->ttm.move) {
+		if ((old_reg->mem_type == TTM_PL_SYSTEM &&
+		     new_reg->mem_type == TTM_PL_VRAM) ||
+		    (old_reg->mem_type == TTM_PL_VRAM &&
+		     new_reg->mem_type == TTM_PL_SYSTEM)) {
+			hop->fpfn = 0;
+			hop->lpfn = 0;
+			hop->mem_type = TTM_PL_TT;
+			hop->flags = 0;
+			return -EMULTIHOP;
+		}
 		ret = nouveau_bo_move_m2mf(bo, evict, ctx,
 					   new_reg);
-		if (!ret)
-			goto out;
-	}
+	} else
+		ret = -ENODEV;
 
-	/* Fallback to software copy. */
-	ret = ttm_bo_move_memcpy(bo, ctx, new_reg);
+	if (ret) {
+		/* Fallback to software copy. */
+		ret = ttm_bo_move_memcpy(bo, ctx, new_reg);
+	}
 
 out:
 	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA) {
-- 
2.27.0



* Re: [bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop
  2020-12-17 19:45   ` David Airlie
@ 2020-12-17 20:04     ` Mike Galbraith
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Galbraith @ 2020-12-17 20:04 UTC
  To: David Airlie; +Cc: lkml, Ben Skeggs, nouveau

On Fri, 2020-12-18 at 05:45 +1000, David Airlie wrote:

> Does the attached patch help?

Yup, that seems to have done the trick.  Fast bug squashing by the drm
guys today, two slowly bisected, two quickly squashed.

	-Mike

