* nouveau TRAP_M2MF still there on G98
@ 2018-04-03 20:00 Adam Borowski
       [not found] ` <20180403200053.iwk62zspnjatw46j-b9QjgO8OEXPVItvQsEIGlw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Adam Borowski @ 2018-04-03 20:00 UTC (permalink / raw)
  To: Māris Nartišs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

Hi!
In commit da5e45e619b3f101420c38b3006a9ae4f3ad19b0:

>  drm/nouveau/mmu: ALIGN_DOWN correct variable
>
>  Commit 7110c89bb8852ff8b0f88ce05b332b3fe22bd11e ("mmu: swap out round
>  for ALIGN") replaced two calls to round/rounddown with ALIGN/ALIGN_DOWN,
>  but erroneously applied ALIGN_DOWN to a different variable (addr) and left
>  intended variable (tail) not rounded/ALIGNed.
>  
>  As a result screen corruption, X lockups are observable. An example of kernel
>  log of affected system with NV98 card where it was bisected:
>  
>  nouveau 0000:01:00.0: gr: TRAP_M2MF 00000002 [IN]
>  nouveau 0000:01:00.0: gr: TRAP_M2MF 00320951 400007c0 00000000 04000000
>  nouveau 0000:01:00.0: gr: 00200000 [] ch 1 [000fbbe000 DRM] subc 4 class 5039
>  mthd 0100 data 00000000
>  nouveau 0000:01:00.0: fb: trapped read at 0040000000 on channel 1
>  [0fbbe000 DRM]
>  engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000006
>  [NULL_DMAOBJ]

yet it is still reproducible for me on 4.16-rc7 and 4.16.0, which already
have your fix.  I don't know about earlier versions -- my newer card went
up in flames just a few days ago, and I replaced it with a brand new 8400GS
(G98) I happened to have in a dusty closet.  Obviously, I can bisect if that
would be helpful, but the error looks the same, so I'm reporting first.

Card's init:
[    7.068779] nouveau 0000:01:00.0: NVIDIA G98 (298200a2)
[    7.101622] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    7.201089] nouveau 0000:01:00.0: bios: version 62.98.29.00.00
[    7.239105] nouveau 0000:01:00.0: bios: M0203T not found
[    7.244530] nouveau 0000:01:00.0: bios: M0203E not matched!
[    7.250200] nouveau 0000:01:00.0: fb: 512 MiB DDR2
[    7.286877] [TTM] Zone  kernel: Available graphics memory: 4084962 kiB
[    7.293526] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    7.300170] [TTM] Initializing pool allocator
[    7.304637] [TTM] Initializing DMA pool allocator
[    7.309469] nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
[    7.314518] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    7.319933] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    7.325796] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    7.331067] nouveau 0000:01:00.0: DRM: DCB outp 00: 02000300 00000028
[    7.337596] nouveau 0000:01:00.0: DRM: DCB outp 01: 01000302 00020030
[    7.344130] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011332 00020018
[    7.350663] nouveau 0000:01:00.0: DRM: DCB outp 03: 04022310 07a20028
[    7.357197] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
[    7.362944] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[    7.368695] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[    7.375539] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    7.382246] [drm] Driver supports precise vblank timestamp query.
[    7.389671] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[    7.464023] nouveau 0000:01:00.0: DRM: allocated 1600x1200 fb: 0x70000, bo 00000000289517a1
[    7.501927] fbcon: nouveaufb (fb0) is primary device
[    7.615433] Console: switching to colour frame buffer device 160x64
[    7.666358] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device

Errors:
[ 4008.457182] nouveau 0000:01:00.0: gr: TRAP_M2MF 00000002 [IN]
[ 4008.463010] nouveau 0000:01:00.0: gr: TRAP_M2MF 00320251 00f72f00 00000000 04000430
[ 4008.470774] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2678]] subc 0 class 5039 mthd 0328 data 00000000
[ 4008.481808] nouveau 0000:01:00.0: fb: trapped read at 0000f72000 on channel 2 [1fa31000 Xorg[2678]] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000003 [PAGE_SYSTEM_ONLY]

[12964.990240] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2678]] get 0000031500 put 0000034394 ib_get 00000172 ib_put 00000173 state 8000e6a0 (err: INVALID_CMD) push 00704031
[12965.017230] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2678]] get 000003732c put 0000034394 ib_get 00000172 ib_put 00000199 state 80000000 (err: INVALID_CMD) push 00406040
[12965.033547] nouveau 0000:01:00.0: fb: trapped write at 00003be000 on channel 2 [1fa31000 Xorg[2678]] engine 00 [PGRAPH] client 0b [PROP] subclient 0c [DST2D] reason 00000002 [PAGE_NOT_PRESENT]
[12965.051014] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000010 [DST2D_FAULT] - Address 00003be000
[12965.060314] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00000011, e24: 0c030000
[12965.072241] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2678]] subc 2 class 502d mthd 0860 data ff020202
[12965.083283] nouveau 0000:01:00.0: fb: trapped write at 00003be200 on channel 2 [1fa31000 Xorg[2678]] engine 00 [PGRAPH] client 0b [PROP] subclient 0c [DST2D] reason 00000002 [PAGE_NOT_PRESENT]
[12965.100766] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000010 [DST2D_FAULT] - Address 0000387000
[12965.109989] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00000011, e24: 0c030000
[12965.121854] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2678]] subc 2 class 502d mthd 0860 data ff020202

[13069.807377] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13069.956879] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13070.103751] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13070.250175] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13070.397110] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13070.543727] nouveau 0000:01:00.0: Xorg[2678]: nv50cal_space: -16
[13070.551840] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2678]] get 0000024bc0 put 0000024f0c ib_get 0000011f ib_put 00000122 state 40000004 (err: INVALID_MTHD) push 00406040
[13070.569224] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 0000 data 0004e680
[13070.589325] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000008 [SURF_HEIGHT_OVERRUN] - Address 0000000000
[13070.599271] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00200110, e20: 00001100, e24: 00000000
[13070.611049] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2678]] subc 7 class 8297 mthd 15e0 data 00000000
[13070.621968] nouveau 0000:01:00.0: fb: trapped read at 000036b500 on channel 6 [1f551000 mplayer[23276]] engine 00 [PGRAPH] client 0a [TEXTURE] subclient 00 [] reason 00000002 [PAGE_NOT_PRESENT]
[13070.639374] nouveau 0000:01:00.0: gr: magic set 0:
[13070.644198] nouveau 0000:01:00.0: gr: 	00408904: 74087b0a
[13070.649648] nouveau 0000:01:00.0: gr: 	00408908: 000036b5
[13070.655117] nouveau 0000:01:00.0: gr: 	0040890c: 40000430
[13070.660569] nouveau 0000:01:00.0: gr: 	00408910: 36b00000
[13070.666010] nouveau 0000:01:00.0: gr: TRAP_TEXTURE - TP0: 00000003 [ FAULT]
[13070.673075] nouveau 0000:01:00.0: gr: 00200000 [] ch 6 [001f551000 mplayer[23276]] subc 3 class 8297 mthd 1b0c data 1000f010

[13191.087642] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 0190 data 001c0000
[13191.097980] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 0194 data 0004e680
[13191.109383] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 0198 data 0033004e
[13191.119791] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 019c data 0008e6a0
[13191.132146] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01a0 data 00300190
[13191.141666] nouveau 0000:01:00.0: gr: 00000010 [ILLEGAL_MTHD] ch 2 [001fa31000 Xorg[2678]] subc 0 class 5039 mthd 01a4 data 00000436
[13191.153642] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01a8 data 0004e680
[13191.165012] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01ac data 0017004e
[13191.176429] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01b0 data 0008e6a0
[13191.187792] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01b4 data 003001a2
[13191.199192] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01b8 data 00000012
[13191.210576] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01bc data 0004e680
[13191.221991] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01c0 data 00170060
[13191.233418] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01c4 data 0004f5e0
[13191.242778] nouveau 0000:01:00.0: gr: 00000010 [ILLEGAL_MTHD] ch 2 [001fa31000 Xorg[2678]] subc 0 class 5039 mthd 01c8 data 00000436
[13191.254745] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01cc data 0008ee04
[13191.266133] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01d0 data 006e0063
[13191.277519] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01d4 data 00250017
[13191.288893] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[2678]] subc 0 mthd 01d8 data 0004f5dc
[13191.298281] nouveau 0000:01:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
[13191.305103] nouveau 0000:01:00.0: gr: 00100000 [] ch 2 [001fa31000 Xorg[2678]] subc 7 class 8297 mthd 1288 data 001003c4
[13191.316031] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000008 [SURF_HEIGHT_OVERRUN] - Address 0000000000
[13191.325920] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 001c0000, e20: 00001100, e24: 00000000
[13191.337708] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2678]] subc 7 class 8297 mthd 15e0 data 00000000

[13965.708639] nouveau 0000:01:00.0: fb: trapped read at 2e0001e46e on channel 2 [1fa31000 Xorg[2678]] engine 0c [SEMAPHORE_BG] client 08 [PFIFO_READ] subclient 00 [] reason 00000000 [PT_NOT_PRESENT]

Artifacts are not correlated with any printk-ed errors.  Somehow, of my two
monitors, they only ever happen on the small left one (via DVI), never on
the big right one (via HDMI, rotated to vertical with xrandr).  A screenshot:
https://angband.pl/tmp/g98-artifacts.png
They never show up on the initial draw; they appear slowly and accumulate,
disappear once the window is redrawn, then start appearing again.

Crash (only one happened so far):
[18827.759324] nouveau 0000:01:00.0: fifo: intr 04000000
[18829.968650] ------------[ cut here ]------------
[18830.073628] WARNING: CPU: 1 PID: 0 at drivers/gpu/drm/nouveau/nvkm/subdev/mc/base.c:72 nvkm_mc_intr+0x144/0x160 [nouveau]
[18830.078080] nouveau 0000:01:00.0: timeout
[18830.089198] Modules linked in: arc4 mt7601u mac80211 sha256_generic cfg80211 rfkill nouveau video ttm
[18830.093352] WARNING: CPU: 3 PID: 7568 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:38 g84_bar_flush+0xb4/0xc0 [nouveau]
[18830.102599] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-rc7-debug-00033-g48f6f9f11263 #1
[18830.113768] Modules linked in:
[18830.122386] Hardware name: System manufacturer System Product Name/M4A77T, BIOS 2401    05/18/2011
[18830.122448] RIP: 0010:nvkm_mc_intr+0x144/0x160 [nouveau]
[18830.125457]  arc4
[18830.134575] RSP: 0018:ffff88022fc43da0 EFLAGS: 00010046
[18830.139921]  mt7601u mac80211
[18830.141889] RAX: 00000000ffffffff RBX: ffff880212651780 RCX: ffff88022fc592e0
[18830.141893] RDX: ffffc90002088068 RSI: ffff88022fc43df7 RDI: ffffc90002000100
[18830.147139]  sha256_generic cfg80211
[18830.150150] RBP: ffff880224c62400 R08: 0000000000000000 R09: 0000000000000000
[18830.150153] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
[18830.157305]  rfkill nouveau
[18830.164605] R13: 000000000000001a R14: ffff8802223bfc00 R15: 0000000000000000
[18830.164610] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[18830.168190]  video ttm
[18830.175422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18830.175426] CR2: 00007f68e65e8030 CR3: 000000021dbc9000 CR4: 00000000000006e0
[18830.182590] CPU: 3 PID: 7568 Comm: <omitted> Not tainted 4.16.0-rc7-debug-00033-g48f6f9f11263 #1
[18830.185467] Call Trace:
[18830.192613] Hardware name: System manufacturer System Product Name/M4A77T, BIOS 2401    05/18/2011
[18830.192633] RIP: 0010:g84_bar_flush+0xb4/0xc0 [nouveau]
[18830.200808]  <IRQ>
[18830.203170] RSP: 0018:ffffc900017fb7f0 EFLAGS: 00010086
[18830.208959]  ? wq_worker_waking_up+0x8/0x30
[18830.216083] RAX: 000000000000001d RBX: ffff880225cda400 RCX: 0000000000000006
[18830.216086] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88022fcd5270
[18830.225253]  nvkm_pci_intr+0x4e/0x80 [nouveau]
[18830.227661] RBP: ffff880224c62400 R08: 0000000000000000 R09: 0000000000013100
[18830.227665] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802223bfb00
[18830.236777]  __handle_irq_event_percpu+0x32/0xe0
[18830.242005] R13: 0000112066bcb3a0 R14: ffff8802223c2c48 R15: 0000000000000246
[18830.242009] FS:  00007f6910fea280(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[18830.244064]  handle_irq_event_percpu+0x23/0x60
[18830.249284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18830.249288] CR2: 000055e159e90bec CR3: 00000001e0617000 CR4: 00000000000006e0
[18830.253544]  handle_irq_event+0x25/0x50
[18830.260680] Call Trace:
[18830.260706]  nv50_instobj_release+0x30/0xc0 [nouveau]
[18830.267914]  ? apic_ack_edge+0x31/0x50
[18830.272387]  nvkm_vmm_iter.constprop.14+0x549/0x7e0 [nouveau]
[18830.279615]  handle_edge_irq+0x6e/0x190
[18830.286792]  ? nvkm_vmm_map_choose+0xb0/0xb0 [nouveau]
[18830.291480]  handle_irq+0x17/0x30
[18830.298670]  ? nvkm_vmm_map_valid+0x50/0x200 [nouveau]
[18830.298693]  nvkm_vmm_map+0x20f/0x3f0 [nouveau]
[18830.306900]  do_IRQ+0x50/0xf0
[18830.311368]  ? nv50_vmm_part+0x60/0x60 [nouveau]
[18830.317205]  common_interrupt+0xf/0xf
[18830.324361]  nvkm_mem_map_dma+0x43/0x50 [nouveau]
[18830.328262] RIP: 0010:__do_softirq+0x66/0x1f2
[18830.330740]  nvkm_uvmm_mthd+0x769/0x8b0 [nouveau]
[18830.335843] RSP: 0018:ffff88022fc43f80 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffdd
[18830.339620]  nvkm_ioctl+0x12d/0x270 [nouveau]
[18830.345425] RAX: 000000000001f080 RBX: ffff88022fc54f00 RCX: 00000000ffffffff
[18830.345429] RDX: 0000000000000000 RSI: 0000000000200042 RDI: 0000000000000380
[18830.349278]  nvif_object_mthd+0x123/0x150 [nouveau]
[18830.354492] RBP: 0000000000000000 R08: 0000000000000080 R09: ffffc9000373bd20
[18830.354495] R10: 0000000000000002 R11: 000000000000000f R12: 0000000000000000
[18830.357823]  ? _cond_resched+0x10/0x40
[18830.362985] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
[18830.362991]  ? common_interrupt+0xa/0xf
[18830.367541]  nvif_vmm_map+0x90/0xd0 [nouveau]
[18830.370549]  ? hrtimer_interrupt+0x112/0x240
[18830.375189]  nouveau_mem_map+0x77/0xd0 [nouveau]
[18830.378884]  irq_exit+0xaa/0xb0
[18830.383618]  nouveau_vma_new+0x1b5/0x1d0 [nouveau]
[18830.388001]  smp_apic_timer_interrupt+0x64/0xa0
[18830.392746]  nouveau_gem_object_open+0x109/0x130 [nouveau]
[18830.400421]  apic_timer_interrupt+0xf/0x20
[18830.404788]  drm_gem_handle_create_tail+0xd3/0x160
[18830.411969]  </IRQ>
[18830.419146]  ? nouveau_gem_new+0x120/0x120 [nouveau]
[18830.424088] RIP: 0010:do_idle+0x15a/0x1c0
[18830.431277]  nouveau_gem_ioctl_new+0x83/0xd0 [nouveau]
[18830.431282]  drm_ioctl_kernel+0x6b/0xd0
[18830.438630] RSP: 0018:ffffc90000ccbf20 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
[18830.442389]  drm_ioctl+0x301/0x3a0
[18830.449643] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88022fc60400
[18830.449648] RDX: 00000000000c0000 RSI: 0000000000000003 RDI: ffff88022fc54f00
[18830.453508]  ? nouveau_gem_new+0x120/0x120 [nouveau]
[18830.457946] RBP: ffffffff8228c8d8 R08: 0000000000080000 R09: ffff88022fca0400
[18830.457950] R10: 0000000000000000 R11: 000000000001a574 R12: 0000000000000000
[18830.462262]  nouveau_drm_ioctl+0x8d/0xd0 [nouveau]
[18830.466906] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[18830.466914]  cpu_startup_entry+0x1b/0x20
[18830.470074]  do_vfs_ioctl+0xa0/0x610
[18830.474926]  secondary_startup_64+0xa5/0xb0
[18830.479477]  ? _copy_to_user+0x22/0x30
[18830.485091] Code: 
[18830.489216]  ? SyS_prlimit64+0x164/0x230
[18830.489221]  SyS_ioctl+0x91/0xa0
[18830.494095] f9 
[18830.496203]  do_syscall_64+0x61/0x110
[18830.501201] 48 
[18830.505225]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[18830.510457] c7 
[18830.514298] RIP: 0033:0x7f690d044f07
[18830.522008] c6 
[18830.525406] RSP: 002b:00007ffd4d6fe078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[18830.532694] 53 
[18830.539845] RAX: ffffffffffffffda RBX: 0000000001c97500 RCX: 00007f690d044f07
[18830.539848] RDX: 00007ffd4d6fe0d0 RSI: 00000000c0306480 RDI: 0000000000000009
[18830.544907] 69 
[18830.552045] RBP: 00007ffd4d6fe0d0 R08: 0000000000000004 R09: 00007f690d30c410
[18830.552048] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0306480
[18830.559259] 1c 
[18830.564076] R13: 0000000000000009 R14: 0000000001c97438 R15: 0000000001c2ef10
[18830.564079] Code: 
[18830.571339] a0 
[18830.575285] 41 
[18830.578906] 48 
[18830.583090] 5f e9 
[18830.586895] 8b 14 
[18830.588913] 70 29 
[18830.592896] c5 
[18830.596134] 87 e1 
[18830.597912] 00 
[18830.601584] 48 8b 
[18830.603364] f0 
[18830.608431] 7d 10 
[18830.610219] 1c 
[18830.613823] 48 8b 
[18830.615609] a0 49 
[18830.623191] 5f 50 
[18830.624959] 8b 46 
[18830.632134] 48 85 
[18830.639356] 10 
[18830.641104] db 74 
[18830.648360] 48 
[18830.655501] 1b e8 
[18830.657306] 8b 
[18830.664496] 7e 32 
[18830.666560] 78 
[18830.668319] 5b e1 
[18830.670105] 10 
[18830.671871] 48 89 
[18830.673937] e8 
[18830.675954] da 48 
[18830.678009] ca 
[18830.679767] 89 c6 
[18830.681824] 22 
[18830.683589] 48 c7 
[18830.685626] 57 
[18830.687393] c7 11 
[18830.689466] e1 
[18830.691241] 52 1c 
[18830.693280] 0f 
[18830.695298] a0 e8 
[18830.697326] b6 54 
[18830.699344] 7c 13 
[18830.701419] 24 
[18830.703177] 03 e1 
[18830.705213] 07 <line noise>


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ ... what's the frequency of that 5V DC?
⠈⠳⣄⠀⠀⠀⠀
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


* Re: nouveau TRAP_M2MF still there on G98
       [not found] ` <20180403200053.iwk62zspnjatw46j-b9QjgO8OEXPVItvQsEIGlw@public.gmane.org>
@ 2018-04-04 12:48   ` Māris Nartišs
       [not found]     ` <CAJHE3DppCEiZc3oOxRrbVr5ZxCTC7mBwtQxtzaUVpVW04UTfKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Māris Nartišs @ 2018-04-04 12:48 UTC (permalink / raw)
  To: Adam Borowski; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

Hi, Adam.
2018-04-03 23:00 GMT+03:00, Adam Borowski <kilobyte@angband.pl>:
> Hi!
> In commit da5e45e619b3f101420c38b3006a9ae4f3ad19b0
>
> yet it is still reproducible for me on 4.16-rc7 and 4.16.0, which already
> have your fix.  I don't know about earlier versions -- my newer card went
> into flames just a few days ago, and I replaced it a brand new 8400GS (G98)
> I happened to have in a dusty closet.  Obviously, I can bisect if that
> would
> be helpful, but the error looks the same thus I'm reporting first.

Unfortunately, I will not be able to help you: the patch fixed the issue
on my system, so I have no means to test anything further. My card is a
G98M [Quadro NVS 160M]. Besides, I'm a geographer, not a programmer
;-)

Still, your report calls into question the original commit I was fixing
(mmu: swap out round for ALIGN). Could you test whether going back to
rounddown fixes the problem on your side?

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -1354,7 +1354,7 @@ nvkm_vmm_get_locked(struct nvkm_vmm *vmm, bool getref, bool mapref, bool sparse,

                tail = this->addr + this->size;
                if (vmm->func->page_block && next && next->page != p)
-                       tail = ALIGN_DOWN(tail, vmm->func->page_block);
+                       tail = rounddown(tail, vmm->func->page_block);

                if (addr <= tail && tail - addr >= size) {
                        rb_erase(&this->tree, &vmm->free);

All best,
Māris.

* Re: nouveau TRAP_M2MF still there on G98
       [not found]     ` <CAJHE3DppCEiZc3oOxRrbVr5ZxCTC7mBwtQxtzaUVpVW04UTfKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-04-04 22:58       ` Adam Borowski
       [not found]         ` <20180404225822.i67jpymqk63uuss3-b9QjgO8OEXPVItvQsEIGlw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Adam Borowski @ 2018-04-04 22:58 UTC (permalink / raw)
  To: Māris Nartišs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

On Wed, Apr 04, 2018 at 03:48:39PM +0300, Māris Nartišs wrote:
> 2018-04-03 23:00 GMT+03:00, Adam Borowski <kilobyte@angband.pl>:
> > In commit da5e45e619b3f101420c38b3006a9ae4f3ad19b0
> >
> > yet it is still reproducible for me on 4.16-rc7 and 4.16.0, which already
> > have your fix.  I don't know about earlier versions -- my newer card went
> > into flames just a few days ago, and I replaced it a brand new 8400GS (G98)
> > I happened to have in a dusty closet.  Obviously, I can bisect if that
> > would be helpful, but the error looks the same thus I'm reporting first.
> 
> Unfortunately I will not be able to help you, as patch fixed issue on
> my system and thus I have no means to test anything more. My card is
> G98M [Quadro NVS 160M]. Besides – I'm a geographer not a programmer
> ;-)

And I'm, it seems, a servant of a particular cat, all else being secondary. :p

> Still your report makes to question the original commit I was fixing
> (mmu: swap out round for ALIGN). Could you test if going back to
> rounddown fixes problem on your side?
> 
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> @@ -1354,7 +1354,7 @@ nvkm_vmm_get_locked(struct nvkm_vmm *vmm, bool
> getref, bool mapref, bool sparse,
> 
>                 tail = this->addr + this->size;
>                 if (vmm->func->page_block && next && next->page != p)
> -                       tail = ALIGN_DOWN(tail, vmm->func->page_block);
> +                       tail = rounddown(tail, vmm->func->page_block);
> 
>                 if (addr <= tail && tail - addr >= size) {
>                         rb_erase(&this->tree, &vmm->free);
> 

Alas, it worked for a few hours, then the display froze completely:

[29982.011795] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]] get 0000037d90 put 000003a2cc ib_get 000001dc ib_put 000001dd state 80004861 (err: INVALID_CMD) push 00704031
[29982.027959] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]] get 000003a2cc put 000003a2cc ib_get 000001dc ib_put 000001f9 state 80000000 (err: INVALID_CMD) push 00406040
[29982.044136] nouveau 0000:01:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
[29982.050934] nouveau 0000:01:00.0: gr: 00100000 [] ch 2 [001fa31000 Xorg[2667]] subc 2 class 502d mthd 0218 data ff000000
[29982.061866] nouveau 0000:01:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
[29982.068658] nouveau 0000:01:00.0: gr: 00100000 [] ch 2 [001fa31000 Xorg[2667]] subc 2 class 502d mthd 021c data ff000000
[29982.079584] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[29982.086651] nouveau 0000:01:00.0: gr: 00100000 [] ch 2 [001fa31000 Xorg[2667]] subc 2 class 502d mthd 0220 data ff000000
[29982.097517] nouveau 0000:01:00.0: fb: trapped write at 00ff000000 on channel 2 [1fa31000 Xorg[2667]] engine 00 [PGRAPH] client 0b [PROP] subclient 0c [DST2D] reason 00000000 [PT_NOT_PRESENT]
[29982.114491] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000010 [DST2D_FAULT] - Address 00ff000000
[29982.123620] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00000011, e24: 0c030000
[29982.135365] nouveau 0000:01:00.0: gr: 00200000 [] ch 2 [001fa31000 Xorg[2667]] subc 2 class 502d mthd 0860 data ff2e2e2e

I did not observe a TRAP_M2MF this time, but the errors above also appeared
in the earlier logs, so which one happens first is probably random.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ 
⢿⡄⠘⠷⠚⠋⠀ ... what's the frequency of that 5V DC?
⠈⠳⣄⠀⠀⠀⠀

* Re: nouveau TRAP_M2MF still there on G98
       [not found]         ` <20180404225822.i67jpymqk63uuss3-b9QjgO8OEXPVItvQsEIGlw@public.gmane.org>
@ 2018-04-04 23:03           ` Ilia Mirkin
       [not found]             ` <CAKb7UvjBViZYJiawJVQKp-74eN1hq=XRRfW+O18MLF8A7iCwsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Ilia Mirkin @ 2018-04-04 23:03 UTC (permalink / raw)
  To: Adam Borowski; +Cc: nouveau, Ben Skeggs

On Wed, Apr 4, 2018 at 6:58 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Wed, Apr 04, 2018 at 03:48:39PM +0300, Māris Nartišs wrote:
>> 2018-04-03 23:00 GMT+03:00, Adam Borowski <kilobyte@angband.pl>:
>> > In commit da5e45e619b3f101420c38b3006a9ae4f3ad19b0
>> >
>> > yet it is still reproducible for me on 4.16-rc7 and 4.16.0, which already
>> > have your fix.  I don't know about earlier versions -- my newer card went
>> > into flames just a few days ago, and I replaced it a brand new 8400GS (G98)
>> > I happened to have in a dusty closet.  Obviously, I can bisect if that
>> > would be helpful, but the error looks the same thus I'm reporting first.
>>
>> Unfortunately I will not be able to help you, as patch fixed issue on
>> my system and thus I have no means to test anything more. My card is
>> G98M [Quadro NVS 160M]. Besides – I'm a geographer not a programmer
>> ;-)
>
> And I'm, it seems, servant of a particular cat, all else being secondary. :p
>
>> Still your report makes to question the original commit I was fixing
>> (mmu: swap out round for ALIGN). Could you test if going back to
>> rounddown fixes problem on your side?
>>
>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
>> @@ -1354,7 +1354,7 @@ nvkm_vmm_get_locked(struct nvkm_vmm *vmm, bool
>> getref, bool mapref, bool sparse,
>>
>>                 tail = this->addr + this->size;
>>                 if (vmm->func->page_block && next && next->page != p)
>> -                       tail = ALIGN_DOWN(tail, vmm->func->page_block);
>> +                       tail = rounddown(tail, vmm->func->page_block);
>>
>>                 if (addr <= tail && tail - addr >= size) {
>>                         rb_erase(&this->tree, &vmm->free);
>>
>
> Alas, it did work for a few hours, then a total display freeze:
>
> [29982.011795] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]] get
> 0000037d90 put 000003a2cc ib_get 000001dc ib_put 000001dd state 80004861 (err:
> INVALID_CMD) push 00704031
> [29982.027959] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]] get
> 000003a2cc put 000003a2cc ib_get 000001dc ib_put 000001f9 state 80000000 (err:
> INVALID_CMD) push 00406040

These, as I call them, 406040 errors, have been around on Tesla for
ages. We have no idea what leads to them, but generally some kind of
fifo desync appears to follow.

  -ilia

* Re: nouveau TRAP_M2MF still there on G98
       [not found]             ` <CAKb7UvjBViZYJiawJVQKp-74eN1hq=XRRfW+O18MLF8A7iCwsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-04-06  8:57               ` Māris Nartišs
  0 siblings, 0 replies; 5+ messages in thread
From: Māris Nartišs @ 2018-04-06  8:57 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: Adam Borowski, Ben Skeggs, nouveau

2018-04-05 2:03 GMT+03:00, Ilia Mirkin <imirkin@alum.mit.edu>:
> On Wed, Apr 4, 2018 at 6:58 PM, Adam Borowski <kilobyte@angband.pl> wrote:
>> On Wed, Apr 04, 2018 at 03:48:39PM +0300, Māris Nartišs wrote:
>>> Still your report makes to question the original commit I was fixing
>>> (mmu: swap out round for ALIGN). Could you test if going back to
>>> rounddown fixes problem on your side?
>>>
>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
>>> @@ -1354,7 +1354,7 @@ nvkm_vmm_get_locked(struct nvkm_vmm *vmm, bool
>>> getref, bool mapref, bool sparse,
>>>
>>>                 tail = this->addr + this->size;
>>>                 if (vmm->func->page_block && next && next->page != p)
>>> -                       tail = ALIGN_DOWN(tail, vmm->func->page_block);
>>> +                       tail = rounddown(tail, vmm->func->page_block);
>>>
>>>                 if (addr <= tail && tail - addr >= size) {
>>>                         rb_erase(&this->tree, &vmm->free);
>>>
>>
>> Alas, it did work for a few hours, then a total display freeze:
>>
>> [29982.011795] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]]
>> get
>> 0000037d90 put 000003a2cc ib_get 000001dc ib_put 000001dd state 80004861
>> (err:
>> INVALID_CMD) push 00704031
>> [29982.027959] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[2667]]
>> get
>> 000003a2cc put 000003a2cc ib_get 000001dc ib_put 000001f9 state 80000000
>> (err:
>> INVALID_CMD) push 00406040
>
> These, as I call them, 406040 errors, have been around on Tesla for
> ages. We have no idea what leads to them, but generally some kind of
> fifo desync appears to follow.
>
>   -ilia

Taking this into account, going back to rounddown from ALIGN_DOWN
seems to fix breakage on some systems. Let's wait for Ben's input on
this matter, as he swapped rounddown for ALIGN_DOWN to fix some kind
of build problem on 32-bit systems.
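
For readers comparing the two helpers discussed above, here is a minimal userspace sketch in C of the two rounding strategies. The function names (align_down_pow2, round_down_any, is_pow2) are made up for illustration and are not the kernel macros themselves; the sketch only assumes that ALIGN_DOWN is a mask-based round-down and rounddown is a modulo-based one. Whether vmm->func->page_block is always a power of two is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the two rounding styles.
 * These are NOT the kernel's ALIGN_DOWN()/rounddown() definitions.
 */

/* Mask-based round down: only correct when align is a power of two. */
static uint64_t align_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Modulo-based round down: correct for any non-zero step. */
static uint64_t round_down_any(uint64_t x, uint64_t step)
{
	return x - (x % step);
}

/* Returns 1 if v is a non-zero power of two, 0 otherwise. */
static int is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Round down safely: use the mask only when it is known to be valid. */
static uint64_t round_down_checked(uint64_t x, uint64_t step)
{
	assert(step != 0);
	return is_pow2(step) ? align_down_pow2(x, step)
			     : round_down_any(x, step);
}
```

With a power-of-two step such as 0x1000 both helpers agree (0x12345 becomes 0x12000), while for a step of 3 the mask version turns 10 into 8 instead of 9. A 64-bit modulo is also the sort of operation that commonly needs a compiler helper routine on 32-bit targets, which may be related to the build problem mentioned above, though the thread does not confirm this.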

Ilia, is there anything we could add to our kernels to shed some light
on the 406040 errors? I am not certain whether I have seen those on my
hardware, but, as you say, they might be rare enough that I would not
remember them.

Māris.