linux-kernel.vger.kernel.org archive mirror
* [BUG][REGRESSION] i915 gpu hangs under load
@ 2017-03-22  8:38 Martin Kepplinger
  2017-03-22 10:36 ` [Intel-gfx] " Jani Nikula
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Kepplinger @ 2017-03-22  8:38 UTC (permalink / raw)
  To: daniel.vetter, airlied; +Cc: intel-gfx, dri-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1910 bytes --]

Hi

I know something similar is here: 
https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.

But this is rc3 and my machine is totally *not usable*. Let me be 
annoying :) I hope I can help:

Since rc1 I get GPU hangs and resets under load. This is almost 
certainly a kernel issue: 4.10 is fine.
I keep a Debian stable userspace; nouveau is running on this machine 
too.

Mar 22 09:17:01 martin-laptop kernel: [ 2409.538706] [drm] GPU HANG: 
ecode 7:0:0xf3cffffe, in gnome-shell [1869], reason: Hang on render 
ring, action: reset
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538711] [drm] GPU hangs can 
indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538713] [drm] Please file a 
_new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538714] [drm] drm/i915 
developers can then reassign to the right component if it's not a kernel 
issue.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538715] [drm] The gpu crash 
dump is required to analyze gpu hangs, so please always attach it.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538716] [drm] GPU crash 
dump saved to /sys/class/drm/card0/error
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538768] drm/i915: Resetting 
chip after gpu hang
Mar 22 09:17:09 martin-laptop kernel: [ 2417.537886] drm/i915: Resetting 
chip after gpu hang
Mar 22 09:17:17 martin-laptop kernel: [ 2425.537152] drm/i915: Resetting 
chip after gpu hang
Mar 22 09:17:25 martin-laptop kernel: [ 2433.536407] drm/i915: Resetting 
chip after gpu hang
Mar 22 09:17:33 martin-laptop kernel: [ 2441.539674] drm/i915: Resetting 
chip after gpu hang


Furthermore, there are weird, small display distortions occurring. I 
don't get any log output about them and
don't have a screenshot. Well, never mind. Please fix 4.11, and CC anyone 
I forgot.


thanks

              martin

[-- Attachment #2: gpu_crash.txt --]
[-- Type: text/plain, Size: 10815 bytes --]

GPU HANG: ecode 7:0:0xf3cffffe, in gnome-shell [1869], reason: Hang on render ring, action: reset
Kernel: 4.11.0-rc3-00003-gbc61cd2
Time: 1490170621 s 524489 us
Boottime: 2409 s 756155 us
Uptime: 2395 s 323536 us
is_mobile: no
is_lp: no
is_alpha_support: no
has_64bit_reloc: no
has_aliasing_ppgtt: yes
has_csr: no
has_ddi: yes
has_decoupled_mmio: no
has_dp_mst: yes
has_fbc: yes
has_fpga_dbg: yes
has_full_ppgtt: yes
has_full_48bit_ppgtt: no
has_gmbus_irq: yes
has_gmch_display: no
has_guc: no
has_hotplug: yes
has_hw_contexts: yes
has_l3_dpf: yes
has_llc: yes
has_logical_ring_contexts: no
has_overlay: no
has_pipe_cxsr: no
has_pooled_eu: no
has_psr: yes
has_rc6: yes
has_rc6p: no
has_resource_streamer: yes
has_runtime_pm: yes
has_snoop: no
cursor_needs_physical: no
hws_needs_physical: no
overlay_needs_physical: no
supports_tv: no
Active process (on ring render): gnome-shell [1869], context bans 0
Reset count: 0
Suspend count: 0
Platform: HASWELL
PCI ID: 0x0416
PCI Revision: 0x06
PCI Subsystem: 10cf:17ac
IOMMU enabled?: 0
EIR: 0x00000000
IER: 0xfc002529
GTIER: 0x00401821
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000001
DERRMR: 0xffffffff
CCID: 0x00ef410d
Missed interrupts: 0x00000000
  fence[0] = 00000000
  fence[1] = 00000000
  fence[2] = 00000000
  fence[3] = 00000000
  fence[4] = 00000000
  fence[5] = 00000000
  fence[6] = 00000000
  fence[7] = 00000000
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = 00000000
  fence[13] = 00000000
  fence[14] = 00000000
  fence[15] = 00000000
  fence[16] = 00000000
  fence[17] = 00000000
  fence[18] = 4b530770374a001
  fence[19] = 00000000
  fence[20] = 00000000
  fence[21] = 00000000
  fence[22] = 00000000
  fence[23] = 00000000
  fence[24] = 00000000
  fence[25] = 00000000
  fence[26] = 00000000
  fence[27] = 00000000
  fence[28] = 00000000
  fence[29] = 00000000
  fence[30] = 00000000
  fence[31] = 00000000
ERROR: 0x00000109
DONE_REG: 0xffffffff
ERR_INT: 0x00000000
render command stream:
  START: 0x007ea000
  HEAD:  0x07a1f6dc [0x0001f648]
  TAIL:  0x0001f8f8 [0x0001f728, 0x0001f760]
  CTL:   0x0001f001
  MODE:  0x00004000
  HWS:   0x7fff0000
  ACTHD: 0x00000000 07a1f6dc
  IPEIR: 0x00000000
  IPEHR: 0x0c000000
  INSTDONE: 0xffcffffe
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  BBADDR: 0x00000000_7fa48330
  BB_STATE: 0x00000000
  INSTPS: 0x00000500
  INSTPM: 0x00006080
  FADDR: 0x00000000 008096d8
  RC PSMI: 0x00000010
  FAULT_REG: 0x000000c5
  SYNC_0: 0x00000000
  SYNC_1: 0x0001c2a1
  SYNC_2: 0x00000000
  GFX_MODE: 0x00002a00
  PP_DIR_BASE: 0x7fdf0000
  seqno: 0x0001c29a
  last_seqno: 0x0001c2a2
  waiting: yes
  ring->head: 0x00016e60
  ring->tail: 0x0001f8f8
  hangcheck stall: yes
  hangcheck action: dead
  hangcheck action timestamp: 4295493232, 204600 ms ago
blt command stream:
  START: 0x0080a000
  HEAD:  0x07e0e8d0 [0x00000000]
  TAIL:  0x0000e8d0 [0x00000000, 0x00000000]
  CTL:   0x0001f001
  MODE:  0x00000200
  HWS:   0x7fff1000
  ACTHD: 0x00000000 07e0e8d0
  IPEIR: 0x00000000
  IPEHR: 0x01000000
  INSTDONE: 0xfffffffe
  BBADDR: 0x00000000_7fff4028
  BB_STATE: 0x00000000
  INSTPS: 0x00000000
  INSTPM: 0x00000000
  FADDR: 0x00000000 008188d0
  RC PSMI: 0x00000011
  FAULT_REG: 0x00000000
  SYNC_0: 0x0001c29a
  SYNC_1: 0x00000000
  SYNC_2: 0x00000000
  GFX_MODE: 0x00000200
  PP_DIR_BASE: 0x7fdf0000
  seqno: 0x0001c2a1
  last_seqno: 0x0001c2a1
  waiting: no
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck stall: no
  hangcheck action: idle
  hangcheck action timestamp: 4295494736, 198584 ms ago
bsd command stream:
  START: 0x0082a000
  HEAD:  0x00000000 [0x00000000]
  TAIL:  0x00000000 [0x00000000, 0x00000000]
  CTL:   0x0001f001
  MODE:  0x00000200
  HWS:   0x7fff2000
  ACTHD: 0x00000000 00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0xfffffffe
  BBADDR: 0x00000000_00000000
  BB_STATE: 0x00000000
  INSTPS: 0x00000000
  INSTPM: 0x00000000
  FADDR: 0x00000000 0082a000
  RC PSMI: 0x00000011
  FAULT_REG: 0x00000000
  SYNC_0: 0x0001c2a1
  SYNC_1: 0x0001c29a
  SYNC_2: 0x00000000
  GFX_MODE: 0x00000200
  PP_DIR_BASE: 0x00000000
  seqno: 0x00000000
  last_seqno: 0x00000000
  waiting: no
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck stall: no
  hangcheck action: idle
  hangcheck action timestamp: 4295494736, 198584 ms ago
vebox command stream:
  START: 0x0084a000
  HEAD:  0x00000000 [0x00000000]
  TAIL:  0x00000000 [0x00000000, 0x00000000]
  CTL:   0x0001f001
  MODE:  0x00000200
  HWS:   0x7fff3000
  ACTHD: 0x00000000 00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0xfffffffe
  BBADDR: 0x00000000_00000000
  BB_STATE: 0x00000000
  INSTPS: 0x00000000
  INSTPM: 0x00000000
  FADDR: 0x00000000 0084a000
  RC PSMI: 0x00000011
  FAULT_REG: 0x00000000
  SYNC_0: 0x0001c2a1
  SYNC_1: 0x0001c29a
  SYNC_2: 0x00000000
  GFX_MODE: 0x00000200
  PP_DIR_BASE: 0x00000000
  seqno: 0x00000000
  last_seqno: 0x00000000
  waiting: no
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck stall: no
  hangcheck action: idle
  hangcheck action timestamp: 4295494736, 198584 ms ago
Active (render ring) [40]:
    00000000_7fff8000     8192 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7e9df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 X uncached (name: 2)
    00000000_7fff7000     4096 36 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7d5df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
    00000000_7c1df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
    00000000_7bcdf000  5242880 36 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7fff6000     4096 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7b2df000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 10)
    00000000_7b2d5000    40960 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7a8d5000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 8)
    00000000_7a655000  2621440 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
    00000000_7fff5000     4096 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
    00000000_7a575000   917504 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 11)
    00000000_7a535000   262144 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
    00000000_79b35000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 5)
    00000000_79535000  6291456 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 12)
    00000000_793b5000  1572864 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
    00000000_793ad000    32768 37 00 [ 1c29e 00 00 00 00 ] 00 dirty LLC
    00000000_00ef3000     4096 09 00 [ 1c29e 00 00 00 00 ] 00 dirty purgeable LLC
    00000000_77f9d000     4096 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty LLC
    00000000_77f9c000     4096 37 00 [ 1c2a0 00 00 00 00 ] 00 LLC
    00000000_77f98000    16384 37 00 [ 1c2a0 00 00 00 00 ] 00 purgeable LLC
    00000000_77f97000     4096 37 00 [ 1c2a0 00 00 00 00 ] 00 LLC
    00000000_77e97000  1048576 37 00 [ 1c2a0 00 00 00 00 ] 00 X LLC
    00000000_77e93000    16384 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty purgeable LLC
    00000000_7fffa000    16384 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty LLC
    00000000_00f06000     4096 09 00 [ 1c2a0 00 00 00 00 ] 00 dirty purgeable LLC
    00000000_77fa8000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77fa1000    28672 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77fa0000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77f9f000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77f9e000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77e56000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 dirty LLC
    00000000_77e55000     4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
    00000000_77e51000    16384 37 00 [ 1c2a2 00 00 00 00 ] 00 purgeable LLC
    00000000_77e5b000   229376 36 00 [ 1c2a2 00 00 00 00 ] 00 X LLC
    00000000_789ad000 10485760 36 00 [ 1c2a2 00 00 00 00 ] 00 X LLC
    00000000_77e4d000    16384 37 00 [ 1c2a2 00 00 00 00 ] 00 purgeable LLC
    00000000_77fa9000    16384 37 00 [ 1c2a2 00 00 00 00 ] 00 dirty LLC
    00000000_00f07000     4096 09 00 [ 1c2a2 00 00 00 00 ] 00 dirty purgeable LLC
Pinned (global) [15]:
    00000000_7fddf000    69632 41 00 [ 00 00 00 00 00 ] 00 LLC
    00000000_7fff0000     4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
    00000000_007ea000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
    00000000_7fffe000     4096 41 00 [ 00 00 00 00 00 ] 00 LLC
    00000000_7fff1000     4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
    00000000_0080a000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
    00000000_7fff2000     4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
    00000000_0082a000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
    00000000_7fff3000     4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
    00000000_0084a000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
    00000000_00000000  8294400 41 00 [ 00 00 00 00 00 ] 00 uncached
    00000000_00f49000    16384 40 00 [ 00 00 00 00 00 ] 00 dirty uncached
    00000000_00ee2000    69632 41 00 [ 00 00 00 00 00 ] 00 LLC
    00000000_00ef4000    69632 41 00 [ 00 00 00 00 00 ] 00 LLC
    00000000_0374a000 20971520 36 00 [ 00 00 00 00 00 ] 00 X dirty uncached (name: 3) (fence: 18)
render ring --- 3 requests
  pid 1869, ban score 0, seqno        4:0001c29e, emitted 207960ms ago, head 0001f648, tail 0001f760
  pid 934, ban score 0, seqno        1:0001c2a0, emitted 207924ms ago, head 0001f760, tail 0001f878
  pid 934, ban score 0, seqno        1:0001c2a2, emitted 207908ms ago, head 0001f878, tail 0001f8f8
render ring --- 2 waiters
 seqno 0x0001c29e for gnome-shell [1869]
 seqno 0x0001c2a0 for Xorg [934]
Num Pipes: 3
PWR_WELL_CTL2: c0000000
Pipe [0]:
  Power: on
  SRC: 077f04af
  STAT: 00000000
Plane [0]:
  CNTR: d9000400
  STRIDE: 00003c00
  SURF: 0374a000
  TILEOFF: 00000000
Cursor [0]:
  CNTR: 05000027
  POS: 02740205
  BASE: 00f49000
Pipe [1]:
  Power: on
  SRC: 077f0437
  STAT: 00000000
Plane [1]:
  CNTR: d9000400
  STRIDE: 00003c00
  SURF: 03759000
  TILEOFF: 00000000
Cursor [1]:
  CNTR: 00000000
  POS: 00000000
  BASE: 00000000
Pipe [2]:
  Power: on
  SRC: 00000000
  STAT: 00000000
Plane [2]:
  CNTR: 00000000
  STRIDE: 00000000
  SURF: 00000000
  TILEOFF: 00000000
Cursor [2]:
  CNTR: 00000000
  POS: 00000000
  BASE: 00000000
CPU transcoder: A
  Power: on
  CONF: c0000000
  HTOTAL: 081f077f
  HBLANK: 081f077f
  HSYNC: 07cf07af
  VTOTAL: 04d204af
  VBLANK: 04d204af
  VSYNC: 04b804b2
CPU transcoder: B
  Power: on
  CONF: 00000000
  HTOTAL: 072f068f
  HBLANK: 072f068f
  HSYNC: 06df06bf
  VTOTAL: 04560437
  VBLANK: 04370419
  VSYNC: 0422041c
CPU transcoder: C
  Power: on
  CONF: 00000000
  HTOTAL: 00000000
  HBLANK: 00000000
  HSYNC: 00000000
  VTOTAL: 00000000
  VBLANK: 00000000
  VSYNC: 00000000
CPU transcoder: EDP
  Power: on
  CONF: c0000000
  HTOTAL: 081f077f
  HBLANK: 081f077f
  HSYNC: 07cf07af
  VTOTAL: 04560437
  VBLANK: 04560437
  VSYNC: 043f043a

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-03-22  8:38 [BUG][REGRESSION] i915 gpu hangs under load Martin Kepplinger
@ 2017-03-22 10:36 ` Jani Nikula
  2017-04-02 11:50   ` Thorsten Leemhuis
  0 siblings, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2017-03-22 10:36 UTC (permalink / raw)
  To: Martin Kepplinger, daniel.vetter, airlied
  Cc: intel-gfx, linux-kernel, dri-devel

On Wed, 22 Mar 2017, Martin Kepplinger <martink@posteo.de> wrote:
> I know something similar is here: 
> https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.
>
> But this is rc3 and my machine is totally *not usable*. Let me be 
> annoying :) I hope I can help:

Please file a bug over at [1].

Thanks,
Jani.


[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel


-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-03-22 10:36 ` [Intel-gfx] " Jani Nikula
@ 2017-04-02 11:50   ` Thorsten Leemhuis
  2017-04-02 12:13     ` Martin Kepplinger
  0 siblings, 1 reply; 21+ messages in thread
From: Thorsten Leemhuis @ 2017-04-02 11:50 UTC (permalink / raw)
  To: Jani Nikula, Martin Kepplinger, daniel.vetter, airlied
  Cc: intel-gfx, linux-kernel, dri-devel

Lo! On 22.03.2017 11:36, Jani Nikula wrote:
> On Wed, 22 Mar 2017, Martin Kepplinger <martink@posteo.de> wrote:
>> I know something similar is here: 
>> https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.
>> But this is rc3 and my machine is totally *not usable*. Let me be 
>> annoying :) I hope I can help:
> Please file a bug over at [1].
> […]
> [1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel

@Martin: did you file that bug? I could not find one :-/

@Jani: In similar situations could you do me a favour and ask people to
send one more reply to the public list containing the link to the bug
they filed? Regression tracking is quite hard already; searching various
bug trackers for follow-up bug entries makes it even harder :-(

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-04-02 11:50   ` Thorsten Leemhuis
@ 2017-04-02 12:13     ` Martin Kepplinger
  2017-04-03 15:09       ` Jani Nikula
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Kepplinger @ 2017-04-02 12:13 UTC (permalink / raw)
  To: Thorsten Leemhuis, Jani Nikula, daniel.vetter, airlied
  Cc: intel-gfx, linux-kernel, dri-devel



On 2 April 2017 at 13:50:26 CEST, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>Lo! On 22.03.2017 11:36, Jani Nikula wrote:
>> On Wed, 22 Mar 2017, Martin Kepplinger <martink@posteo.de> wrote:
>>> I know something similar is here: 
>>> https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.
>>> But this is rc3 and my machine is totally *not usable*. Let me be 
>>> annoying :) I hope I can help:
>> Please file a bug over at [1].
>> […]
>> [1]
>https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel
>
>@Martin: did you file that bug? I could not find one :-/

I did. Got marked as duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=100181 and there's a fix out there. I don't know if it's in rc5 though.

>
>@Jani: In similar situations could you do me a favour and ask people to
>send one more reply to the public list which contains the link to the
>bug filed? Regression tracking is quite hard already; searching various
>bug tracker for follow up bug entries makes it even harder :-(
>
>Ciao, Thorsten

-- 
Martin Kepplinger
http://martinkepplinger.com
sent from mobile

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-04-02 12:13     ` Martin Kepplinger
@ 2017-04-03 15:09       ` Jani Nikula
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2017-04-03 15:09 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, airlied
  Cc: intel-gfx, linux-kernel, dri-devel

On Sun, 02 Apr 2017, Martin Kepplinger <martink@posteo.de> wrote:
> Am 2. April 2017 13:50:26 MESZ schrieb Thorsten Leemhuis <regressions@leemhuis.info>:
>>Lo! On 22.03.2017 11:36, Jani Nikula wrote:
>>> On Wed, 22 Mar 2017, Martin Kepplinger <martink@posteo.de> wrote:
>>>> I know something similar is here: 
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.
>>>> But this is rc3 and my machine is totally *not usable*. Let me be 
>>>> annoying :) I hope I can help:
>>> Please file a bug over at [1].
>>> […]
>>> [1]
>>https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel
>>
>>@Martin: did you file that bug? I could not find one :-/
>
> I did. Got marked as duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=100181 and there's a fix out there. I don't know if it's in rc5 though.

Should be fixed in v4.11-rc5 by

commit 0abfe7e2570d7c729a7662e82c09a23f00f29346
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 22 20:59:30 2017 +0000

    drm/i915: Restore marking context objects as dirty on pinning

>>@Jani: In similar situations could you do me a favour and ask people to
>>send one more reply to the public list which contains the link to the
>>bug filed? Regression tracking is quite hard already; searching various
>>bug tracker for follow up bug entries makes it even harder :-(

I'll try, thanks for the feedback.

BR,
Jani.




-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-04-03 15:09       ` Jani Nikula
@ 2017-04-06 23:23         ` Andrea Arcangeli
  2017-04-06 23:23           ` [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock Andrea Arcangeli
                             ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

I'm also getting kernel hangs every couple of days. For me it's still
not fixed in 4.11-rc5. It's hard to reproduce; the best reproducer is
to build LineageOS 14.1 on the host while running LTP in a guest to
stress the guest VM.

Initially I thought it was related to the fact that I upgraded the
xf86 intel driver just a few weeks ago (I had deferred any upgrade of
the userland intel driver since last July because of a regression that
never got fixed and broke xterm for me). After I found a workaround
for the userland regression (appended at the end for reference) I
started getting kernel hangs, but they are separate issues as far as I
can tell.

It's not well tested so beware... (it survived a couple of builds and
some VM reclaim but that's it).

The first patch 1/5 is the potential fix for the i915 kernel hang. The
rest are incremental improvements.

And I have no great solution for when the shrinker is invoked with the
struct_mutex held and recurses on the lock. I don't think we can
possibly wait in that case (other than the flush_work that the second
patch does), but in practice it shouldn't be a big deal: the big RAM
eater is unlikely to be i915 when the system is low on memory.

Andrea Arcangeli (5):
  i915: avoid kernel hang caused by synchronize rcu struct_mutex
    deadlock
  i915: flush gem obj freeing workqueues to add accuracy to the i915
    shrinker
  i915: initialize the free_list of the fencing atomic_helper
  i915: schedule while freeing the lists of gem objects
  i915: fence workqueue optimization

 drivers/gpu/drm/i915/i915_gem.c          | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 15 +++++++++++----
 drivers/gpu/drm/i915/intel_display.c     |  7 ++++---
 3 files changed, 30 insertions(+), 7 deletions(-)

===
Userland workaround for unusable xterm after commit
3d3d18f086cdda72ee18a454db70ca72c6e3246c (unrelated to this kernel
issue, just for reference of what I'm running in userland).

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 11beb90..d349203 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -17430,11 +17430,15 @@ sna_flush_callback(CallbackListPtr *list, pointer user_data, pointer call_data)
 {
 	struct sna *sna = user_data;
 
+#if 0
 	if (!sna->needs_dri_flush)
 		return;
+#endif
 
 	sna_accel_flush(sna);
+#if 0
 	sna->needs_dri_flush = false;
+#endif
 }
 
 static void

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
@ 2017-04-06 23:23           ` Andrea Arcangeli
  2017-04-07  9:05             ` [Intel-gfx] " Joonas Lahtinen
  2017-04-06 23:23           ` [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Andrea Arcangeli
                             ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
hang until their own workqueue items have run. The i915 gem workqueues
will wait on the struct_mutex to be released. So we cannot wait for a
quiescent state using those RCU primitives while holding the
struct_mutex, or we create a circular lock dependency resulting in
kernel hangs (reproducible, but undetected by lockdep).

This started with commit 3d3d18f086cdda72ee18a454db70ca72c6e3246c;
apparently lockdep didn't detect it.
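
To make the cycle concrete, a rough pseudo-C sketch (the snippets below
are made up for illustration; the real call paths are in the hung task
traces that follow):

	/* task A: reclaim/shrinker path, under the lock */
	mutex_lock(&dev_priv->drm.struct_mutex);
	...
	/* queues its grace-period work on the system_wq and waits */
	synchronize_rcu_expedited();

	/* kworker: already running __i915_gem_free_work() */
	mutex_lock(&dev_priv->drm.struct_mutex);	/* blocks on task A */

	/* task A waits for the kworker, the kworker waits for task A */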

kswapd0         D    0   700      2 0x00000000
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? _synchronize_rcu_expedited.constprop.65+0x2ef/0x300
? wake_up_bit+0x20/0x20
? rcu_stall_kick_kthreads.part.54+0xc0/0xc0
? rcu_exp_wait_wake+0x530/0x530
? i915_gem_shrink+0x34b/0x4b0
? i915_gem_shrinker_scan+0x7c/0x90
? i915_gem_shrinker_scan+0x7c/0x90
? shrink_slab.part.61.constprop.72+0x1c1/0x3a0
? shrink_zone+0x154/0x160
? kswapd+0x40a/0x720
? kthread+0xf4/0x130
? try_to_free_pages+0x450/0x450
? kthread_create_on_node+0x40/0x40
? ret_from_fork+0x23/0x30
plasmashell     D    0  4657   4614 0x00000000
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? schedule_preempt_disabled+0xe/0x10
? __mutex_lock.isra.4+0x1c9/0x790
? i915_gem_close_object+0x26/0xc0
? i915_gem_close_object+0x26/0xc0
? drm_gem_object_release_handle+0x48/0x90
? drm_gem_handle_delete+0x50/0x80
? drm_ioctl+0x1fa/0x420
? drm_gem_handle_create+0x40/0x40
? pipe_write+0x391/0x410
? __vfs_write+0xc6/0x120
? do_vfs_ioctl+0x8b/0x5d0
? SyS_ioctl+0x3b/0x70
? entry_SYSCALL_64_fastpath+0x13/0x94
kworker/0:0     D    0 29186      2 0x00000000
Workqueue: events __i915_gem_free_work
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? schedule_preempt_disabled+0xe/0x10
? __mutex_lock.isra.4+0x1c9/0x790
? del_timer_sync+0x44/0x50
? update_curr+0x57/0x110
? __i915_gem_free_objects+0x31/0x300
? __i915_gem_free_objects+0x31/0x300
? __i915_gem_free_work+0x2d/0x40
? process_one_work+0x13a/0x3b0
? worker_thread+0x4a/0x460
? kthread+0xf4/0x130
? process_one_work+0x3b0/0x3b0
? kthread_create_on_node+0x40/0x40
? ret_from_fork+0x23/0x30

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c          |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 14 ++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 67b1fc5..3982489 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4742,6 +4742,13 @@ int i915_gem_freeze(struct drm_i915_private *dev_priv)
 	i915_gem_shrink_all(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	/*
+	 * Cannot call synchronize_rcu() inside the struct_mutex
+	 * because it may block until workqueues complete, and the
+	 * running workqueue may wait on the struct_mutex.
+	 */
+	synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
+
 	intel_runtime_pm_put(dev_priv);
 
 	return 0;
@@ -4781,6 +4788,8 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
 	}
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	synchronize_rcu_expedited();
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index d5d2b4c..fea1454 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -235,9 +235,6 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 	if (unlock)
 		mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	/* expedite the RCU grace period to free some request slabs */
-	synchronize_rcu_expedited();
-
 	return count;
 }
 
@@ -263,7 +260,6 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_ACTIVE);
-	synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
 
 	return freed;
 }
@@ -324,6 +320,16 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
+	if (likely(__mutex_owner(&dev->struct_mutex) != current))
+		/*
+		 * If reclaim was invoked by an allocation done while
+		 * holding the struct mutex, we cannot call
+		 * synchronize_rcu_expedited() as it depends on
+		 * workqueues to run but the running workqueue may be
+		 * blocked waiting on us to release struct_mutex.
+		 */
+		synchronize_rcu_expedited();
+
 	return freed;
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
  2017-04-06 23:23           ` [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock Andrea Arcangeli
@ 2017-04-06 23:23           ` Andrea Arcangeli
  2017-04-07 10:02             ` Chris Wilson
  2017-04-06 23:23           ` [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper Andrea Arcangeli
                             ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

Waiting for an RCU grace period only guarantees the freeing work gets
queued; until the queued work has run and returned, there is no
guarantee the memory was actually freed. So flush the work to provide
better guarantees to the reclaim code, in addition to waiting for an
RCU grace period to pass.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c          | 2 ++
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3982489..612fde3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4748,6 +4748,7 @@ int i915_gem_freeze(struct drm_i915_private *dev_priv)
 	 * running workqueue may wait on the struct_mutex.
 	 */
 	synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
+	flush_work(&dev_priv->mm.free_work);
 
 	intel_runtime_pm_put(dev_priv);
 
@@ -4789,6 +4790,7 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	synchronize_rcu_expedited();
+	flush_work(&dev_priv->mm.free_work);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index fea1454..30f79af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -329,6 +329,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 		 * blocked waiting on us to release struct_mutex.
 		 */
 		synchronize_rcu_expedited();
+	flush_work(&dev_priv->mm.free_work);
 
 	return freed;
 }

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
  2017-04-06 23:23           ` [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock Andrea Arcangeli
  2017-04-06 23:23           ` [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Andrea Arcangeli
@ 2017-04-06 23:23           ` Andrea Arcangeli
  2017-04-07 10:35             ` Chris Wilson
  2017-04-06 23:23           ` [PATCH 4/5] i915: schedule while freeing the lists of gem objects Andrea Arcangeli
                             ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

Just in case the llist model changes and NULL isn't valid
initialization.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/intel_display.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ed1f4f2..24f303e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16630,6 +16630,7 @@ int intel_modeset_init(struct drm_device *dev)
 
 	dev->mode_config.funcs = &intel_mode_funcs;
 
+	init_llist_head(&dev_priv->atomic_helper.free_list);
 	INIT_WORK(&dev_priv->atomic_helper.free_work,
 		  intel_atomic_helper_free_state_worker);
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] i915: schedule while freeing the lists of gem objects
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
                             ` (2 preceding siblings ...)
  2017-04-06 23:23           ` [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper Andrea Arcangeli
@ 2017-04-06 23:23           ` Andrea Arcangeli
  2017-04-06 23:23           ` [PATCH 5/5] i915: fence workqueue optimization Andrea Arcangeli
  2017-04-10 10:15           ` [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load Martin Kepplinger
  5 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

Add cond_resched().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 612fde3..c81baeb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4205,6 +4205,8 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
 
 		list_del(&obj->global_link);
+
+		cond_resched();
 	}
 	intel_runtime_pm_put(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
@@ -4230,6 +4232,8 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 
 		kfree(obj->bit_17);
 		i915_gem_object_free(obj);
+
+		cond_resched();
 	}
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] i915: fence workqueue optimization
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
                             ` (3 preceding siblings ...)
  2017-04-06 23:23           ` [PATCH 4/5] i915: schedule while freeing the lists of gem objects Andrea Arcangeli
@ 2017-04-06 23:23           ` Andrea Arcangeli
  2017-04-07  9:58             ` Chris Wilson
  2017-04-10 10:15           ` [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load Martin Kepplinger
  5 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-06 23:23 UTC (permalink / raw)
  To: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

Keep running llist_del_all() until the free_list is found empty; this
may avoid having to schedule more work.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/intel_display.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 24f303e..931f0c7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14374,9 +14374,9 @@ static void intel_atomic_helper_free_state(struct drm_i915_private *dev_priv)
 	struct intel_atomic_state *state, *next;
 	struct llist_node *freed;
 
-	freed = llist_del_all(&dev_priv->atomic_helper.free_list);
-	llist_for_each_entry_safe(state, next, freed, freed)
-		drm_atomic_state_put(&state->base);
+	while ((freed = llist_del_all(&dev_priv->atomic_helper.free_list)))
+		llist_for_each_entry_safe(state, next, freed, freed)
+			drm_atomic_state_put(&state->base);
 }
 
 static void intel_atomic_helper_free_state_worker(struct work_struct *work)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Intel-gfx] [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock
  2017-04-06 23:23           ` [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock Andrea Arcangeli
@ 2017-04-07  9:05             ` Joonas Lahtinen
  0 siblings, 0 replies; 21+ messages in thread
From: Joonas Lahtinen @ 2017-04-07  9:05 UTC (permalink / raw)
  To: Andrea Arcangeli, Martin Kepplinger, Thorsten Leemhuis,
	daniel.vetter, Dave Airlie, Chris Wilson
  Cc: intel-gfx, linux-kernel, dri-devel

On pe, 2017-04-07 at 01:23 +0200, Andrea Arcangeli wrote:
> synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
> hang until its own workqueues are run. The i915 gem workqueues will
> wait on the struct_mutex to be released. So we cannot wait for a
> quiescent state using those rcu primitives while holding the
> struct_mutex or it creates a circular lock dependency resulting in
> kernel hangs (which is reproducible but goes undetected by lockdep).
> 
> This started in commit 3d3d18f086cdda72ee18a454db70ca72c6e3246c and
> lockdep didn't detect it apparently.

The right format is:

Fixes: 3d3d18f086cd ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")

> @@ -324,6 +320,16 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  	if (unlock)
>  		mutex_unlock(&dev->struct_mutex);
>  
> +	if (likely(__mutex_owner(&dev->struct_mutex) != current))

This check can be dropped and synchronize_rcu_expedited() should be
embedded directly in the if (unlock) branch, as that is functionally
equivalent. This can be applied to all the unlock cases, not just this
one. That should be the correct way to avoid the deadlock. I've sent a
patch to do this (Cc'd you); can you verify that it gets rid of the
problem for you?
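
I.e. something along these lines (untested sketch):

	if (unlock) {
		mutex_unlock(&dev->struct_mutex);

		/*
		 * We know we are not the struct_mutex holder here, so the
		 * gem freeing work can make progress while we wait.
		 */
		synchronize_rcu_expedited();
	}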

> +		/*
> +		 * If reclaim was invoked by an allocation done while
> +		 * holding the struct mutex, we cannot call
> +		 * synchronize_rcu_expedited() as it depends on
> +		 * workqueues to run but the running workqueue may be
> +		 * blocked waiting on us to release struct_mutex.
> +		 */
> +		synchronize_rcu_expedited();
> +
>  	return freed;
>  }
>  
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] i915: fence workqueue optimization
  2017-04-06 23:23           ` [PATCH 5/5] i915: fence workqueue optimization Andrea Arcangeli
@ 2017-04-07  9:58             ` Chris Wilson
  2017-04-07 13:13               ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2017-04-07  9:58 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 01:23:47AM +0200, Andrea Arcangeli wrote:
> Insist to run llist_del_all() until the free_list is found empty, this
> may avoid having to schedule more workqueues.

The work will already be scheduled (every time we add the first element,
the work is scheduled, and the scheduled bit is cleared before the work
is executed). So we aren't saving the kworker from having to process
another work item, but we may leave that work item with nothing to do.
The question is whether we want to trap the kworker here, and presumably
you will also want to add a cond_resched() between passes.
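
Roughly, the free path looks like this (simplified sketch of the
current code, so take the details with a grain of salt):

	static void __i915_gem_free_object_rcu(struct rcu_head *head)
	{
		struct drm_i915_gem_object *obj =
			container_of(head, typeof(*obj), rcu);
		struct drm_i915_private *i915 = to_i915(obj->base.dev);

		/*
		 * llist_add() returns true only when it inserts the first
		 * element into an empty list, so the work is scheduled
		 * exactly once per batch of frees.
		 */
		if (llist_add(&obj->freed, &i915->mm.free_list))
			schedule_work(&i915->mm.free_work);
	}
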
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-06 23:23           ` [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Andrea Arcangeli
@ 2017-04-07 10:02             ` Chris Wilson
  2017-04-07 13:06               ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2017-04-07 10:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 01:23:44AM +0200, Andrea Arcangeli wrote:
> Waiting a RCU grace period only guarantees the work gets queued, but
> until after the queued workqueue returns, there's no guarantee the
> memory was actually freed. So flush the work to provide better
> guarantees to the reclaim code in addition of waiting a RCU grace
> period to pass.

We are not allowed to call flush_work() from the shrinker; the workqueue
doesn't have, and can't have, the right reclaim flags.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper
  2017-04-06 23:23           ` [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper Andrea Arcangeli
@ 2017-04-07 10:35             ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2017-04-07 10:35 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 01:23:45AM +0200, Andrea Arcangeli wrote:
> Just in case the llist model changes and NULL isn't valid
> initialization.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Applied, thanks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-07 10:02             ` Chris Wilson
@ 2017-04-07 13:06               ` Andrea Arcangeli
  2017-04-07 15:30                 ` Chris Wilson
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-07 13:06 UTC (permalink / raw)
  To: Chris Wilson, Martin Kepplinger, Thorsten Leemhuis,
	daniel.vetter, Dave Airlie, intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 11:02:11AM +0100, Chris Wilson wrote:
> On Fri, Apr 07, 2017 at 01:23:44AM +0200, Andrea Arcangeli wrote:
> > Waiting a RCU grace period only guarantees the work gets queued, but
> > until after the queued workqueue returns, there's no guarantee the
> > memory was actually freed. So flush the work to provide better
> > guarantees to the reclaim code in addition of waiting a RCU grace
> > period to pass.
> 
> We are not allowed to call flush_work() from the shrinker, the workqueue
> doesn't have and can't have the right reclaim flags.

I figured the flush_work had to be conditional on "unlock" being true
too in the i915 shrinker (not only synchronize_rcu_expedited()), and I
already fixed that bit, but I didn't think it would be a problem to
wait for the workqueue as long as reclaim didn't recurse on the
struct_mutex (it is a problem if unlock is false, of course, as we
would be back to square one). I didn't get further hangs, and I assume
I've been through a couple of synchronize_rcu_expedited() and
flush_work calls (I should add dynamic tracing to be sure).

Also note that I didn't get any lockdep warning when I reproduced the
workqueue hang in 4.11-rc5, so at least as far as lockdep is concerned
there's no problem with calling synchronize_rcu_expedited; it couldn't
notice we were holding the struct_mutex while waiting for the queued
work to run.

Also note that recursing on the lock (the unlock == false case) is
something nothing else does. I'm not sure it's worth the risk, and
whether you shouldn't just call mutex_trylock in the shrinker instead
of mutex_trylock_recursive. Recursing on the lock internally in the
same context is one thing, but recursing through the whole reclaim
path is much harder to prove safe.
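
For reference, the locking pattern I mean is roughly this (sketch from
memory, not the exact shrinker code):

	switch (mutex_trylock_recursive(&dev->struct_mutex)) {
	case MUTEX_TRYLOCK_FAILED:
		return SHRINK_STOP;	/* somebody else holds the lock */
	case MUTEX_TRYLOCK_SUCCESS:
		unlock = true;		/* normal case: we took the lock */
		break;
	case MUTEX_TRYLOCK_RECURSIVE:
		unlock = false;		/* reclaim recursed from under our
					 * own struct_mutex */
		break;
	}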

You could start dropping objects and wiping vmas and stuff in the
middle of some kmalloc/alloc_pages that doesn't expect it and then
crash for other reasons. So this reclaim recursion model of the
shrinker is quite unique and quite challenging to prove safe if you
keep using mutex_trylock_recursive in i915_gem_shrinker_scan.

Lock recursion in all other places could be dropped without runtime
downsides; the only place where mutex_trylock_recursive makes a design
difference and makes sense to be used is i915_gem_shrinker_scan. The
rest are implementation issues, not fundamental shrinker design, and
it would be nice if those other mutex_trylock_recursive calls were all
removed so the only one left is in i915_gem_shrinker_scan and nowhere
else (or to drop it from i915_gem_shrinker_scan as well).

mutex_trylock_recursive() should also be patched to use
READ_ONCE(__mutex_owner(lock)) because currently it breaks C.

In fact, in the whole kernel, i915 and msm drm are the only two users
of that function.

Another thing is what value to return from i915_gem_shrinker_scan when
unlock is false and we can't possibly wait for the memory to be freed,
let alone for an RCU grace period. For various reasons I think it's
safer to return the current "freed" count, even though we could also
return "0" in that case. There are different tradeoffs: returning
"freed" is less likely to trigger an early OOM, as the VM thinks it's
still making progress and in fact it will get more free memory shortly;
returning SHRINK_STOP would also be an option, and it would make the VM
insist more on the other slabs, so it would be more reliable at freeing
memory in a timely way, but more at risk of an early OOM. I think
returning "freed" is the better tradeoff of the two, but I suggest
adding a comment, as it's not exactly obvious which is better.
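
Concretely, at the end of i915_gem_shrinker_scan() the choice would be
something like this (sketch only):

	if (!unlock)
		/*
		 * We recursed on the struct_mutex and could not wait for
		 * the RCU grace period nor flush the freeing work.
		 * Returning "freed" claims progress (the memory does show
		 * up shortly); returning SHRINK_STOP instead would make
		 * the VM insist on the other slabs, which is more timely
		 * but more at risk of an early OOM.
		 */
		return freed;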

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] i915: fence workqueue optimization
  2017-04-07  9:58             ` Chris Wilson
@ 2017-04-07 13:13               ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-07 13:13 UTC (permalink / raw)
  To: Chris Wilson, Martin Kepplinger, Thorsten Leemhuis,
	daniel.vetter, Dave Airlie, intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 10:58:38AM +0100, Chris Wilson wrote:
> On Fri, Apr 07, 2017 at 01:23:47AM +0200, Andrea Arcangeli wrote:
> > Insist to run llist_del_all() until the free_list is found empty, this
> > may avoid having to schedule more workqueues.
> 
> The work will already be scheduled (everytime we add the first element,
> the work is scheduled, and the scheduled bit is cleared before the work
> is executed). So we aren't saving the kworker from having to process
> another work, but we may make that having nothing to do. The question is
> whether we want to trap the kworker here, and presumably you will also want
> to add a cond_resched() between passes.

Yes it is somewhat dubious in the two event only case, but it will
save kworker in case of more events if there is a flood of
llist_add. It just looked fast enough but it's up to you, it's a
cmpxchg more for each intel_atomic_helper_free_state. If it's unlikely
more work is added, it's better to drop it. Agree about
cond_resched() if we keep it.

The same issue exists in __i915_gem_free_work, but I guess it's more
likely there that by the time __i915_gem_free_objects returns the
free_list isn't empty anymore because __i915_gem_free_objects has a
longer runtime but then you may want to re-evaluate that too as it's
slower for the two llist_add in a row case and only pays off from the
third.

	while ((freed = llist_del_all(&i915->mm.free_list)))
		__i915_gem_free_objects(i915, freed);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-07 13:06               ` Andrea Arcangeli
@ 2017-04-07 15:30                 ` Chris Wilson
  2017-04-07 16:48                   ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2017-04-07 15:30 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Kepplinger, Thorsten Leemhuis, daniel.vetter, Dave Airlie,
	intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 03:06:00PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 11:02:11AM +0100, Chris Wilson wrote:
> > On Fri, Apr 07, 2017 at 01:23:44AM +0200, Andrea Arcangeli wrote:
> > > Waiting a RCU grace period only guarantees the work gets queued, but
> > > until after the queued workqueue returns, there's no guarantee the
> > > memory was actually freed. So flush the work to provide better
> > > guarantees to the reclaim code in addition of waiting a RCU grace
> > > period to pass.
> > 
> > We are not allowed to call flush_work() from the shrinker, the workqueue
> > doesn't have and can't have the right reclaim flags.
> 
> I figured the flush_work had to be conditional to "unlock" being true
> too in the i915 shrinker (not only synchronize_rcu_expedited()), and I
> already fixed that bit, but I didn't think it would be a problem to
> wait for the workqueue as long as reclaim didn't recurse on the
> struct_mutex (it is a problem if unlock is false of course as we would
> be back to square one). I didn't get further hangs and I assume I've
> been running a couple of synchronize_rcu_expedited() and flush_work (I
> should add dynamic tracing to be sure).

Not getting hangs is a good sign, but lockdep doesn't like it:

[  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
[  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]

If I allocated the workqueue with WQ_MEM_RECLAIM, it complains bitterly
as well.

> Also note, I didn't get any lockdep warning when I reproduced the
> workqueue hang in 4.11-rc5 so at least as far as lockdep is concerned
> there's no problem to call synchronize_rcu_expedited and it couldn't
> notice we were holding the struct_mutex while waiting for the new
> workqueue to run.

Yes, that is concerning. I think it's due to the coupling via the
struct completion not being picked up by lockdep, and I hope the
"crossrelease" patches will fix the lack of warnings.

> Also note recursing on the lock (unlock false case) is something
> nothing else does, I'm not sure if it's worth the risk and if you
> shouldn't just call mutex_trylock in the shrinker instead of
> mutex_trylock_recursive. One thing was to recurse on the lock
> internally in the same context, but recursing through the whole
> reclaim is more dubious as safe.

We know. We don't use trylock in order to reduce the frequency of
users' OOMs. Peter added mutex_trylock_recursive() because we were
already doing recursive locking in the shrinker, and although we know
we shouldn't, getting rid of the recursion is something we are doing,
but slowly.

> You could start dropping objects and wiping vmas and stuff in the
> middle of some kmalloc/alloc_pages that doesn't expect it and then
> crash for other reasons. So this reclaim recursion model of the
> shinker is quite unique and quite challenging to proof as safe if you
> keep using mutex_trylock_recursive in i915_gem_shrinker_scan.

I know. Trying to stay on top of all the kmallocs under the struct_mutex,
and being aware that the shrinker can and will undo your objects as you
work, is a continual battle, and it catches everyone working on i915.ko
by surprise. Our policy to avoid surprises is based around pin before alloc.
 
> Lock recursion in all other places could be dropped without runtime
> downsides, the only place mutex_trylock_recursive makes a design
> difference and makes sense to be used is in i915_gem_shrinker_scan,
> the rest are implementation issues not fundamental shrinker design and
> it'd be nice if those other mutex_trylock_recursive would all be
> removed and the only one that is left is in i915_gem_shrinker_scan and
> nowhere else (or to drop it also from i915_gem_shrinker_scan).

We do need it for shrinker_count as well: if we just report 0 objects,
the shrinker_scan callback will be skipped, iirc. We also need it for
direct calls to i915_gem_shrink(), which themselves may or may not be
underneath the struct_mutex at the time.

> mutex_trylock_recursive() should also be patched to use
> READ_ONCE(__mutex_owner(lock)) because currently it breaks C.

I don't follow,

static inline struct task_struct *__mutex_owner(struct mutex *lock)
{
        return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x07);
}

The atomic read is equivalent to READ_ONCE(). What's broken here? (I
guess strict aliasing and pointer cast?)

> In the whole kernel i915 and msm drm are the only two users of such
> function in fact.

Yes, Peter will continue to remind us to fix our code and complain until
it is.

> Another thing is what value return from i915_gem_shrinker_scan when
> unlock is false, and we can't possibly wait for the memory to be freed
> let alone for a rcu grace period. For various reasons I think it's
> safer to return the current "free" even if we could also return "0" in
> such case. There are different tradeoffs, returning "free" is less
> likely to trigger an early OOM as the VM thinks it's still making
> progress and in fact it will get more free memory shortly, while
> returning SHRINK_STOP would also be an option and it would insist more
> on the other slabs so it would be more reliable at freeing memory
> timely, but it would be more at risk of early OOM. I think returning
> "free" is the better tradeoff of the two, but I suggest to add a
> comment as it's not exactly obvious what is better.

Ah. The RCU freeing is only for the small fry, the slabs from which
requests and objects are allocated. It's the gigabytes of pages we have
pinned that can be released by i915_gem_shrink() that we count as
freed, even though we only return them to the system and hope the anon
LRU page scanner will make them available for reuse (if they are dirty,
they will need to be swapped out, but some will be returned to the
system directly by truncating the shmemfs filp backing an object).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-07 15:30                 ` Chris Wilson
@ 2017-04-07 16:48                   ` Andrea Arcangeli
  2017-04-10  9:39                     ` Chris Wilson
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2017-04-07 16:48 UTC (permalink / raw)
  To: Chris Wilson, Martin Kepplinger, Thorsten Leemhuis,
	daniel.vetter, Dave Airlie, intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> Not getting hangs is a good sign, but lockdep doesn't like it:
> 
> [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> 
> If I allocated the workqueue with WQ_MEM_RELCAIM, it complains bitterly
> as well.

So in PF_MEMALLOC context we can't flush a workqueue with
!WQ_MEM_RECLAIM.

	system_wq = alloc_workqueue("events", 0, 0);

My point is that synchronize_rcu_expedited will still push its work
into the same system_wq workqueue...

		/* Marshall arguments & schedule the expedited grace period. */
		rew.rew_func = func;
		rew.rew_rsp = rsp;
		rew.rew_s = s;
		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
		schedule_work(&rew.rew_work);

It's also using schedule_work, so either the above is a false
positive, or we still have a problem with synchronize_rcu_expedited;
it's just not a reproducible issue anymore after we stop running it
under the struct_mutex.

Even synchronize_sched will wait on the system_wq if
synchronize_rcu_expedited has been issued in parallel by some other
code; it's just that there's no check for it because it's not invoking
flush_work to wait.

The deadlock happens if we flush_work() while holding any lock that
may be taken by any of the work items that could be queued there. i915
makes sure to flush_work only if the struct_mutex was released (unlike
my initial version), but we're not checking whether any of the other
system_wq work items could possibly be taking a lock that may have been
held during the allocation that invoked reclaim. I suppose that is the
problem left, but I don't see how flush_work is special about it;
synchronize_rcu_expedited would still have the same issue, no? (despite
no lockdep warning)

I suspect this means synchronize_rcu_expedited() is not usable in
reclaim context and lockdep should warn if PF_MEMALLOC is set when
synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
called.

Probably to fix this we should create a private workqueue for both RCU
and i915 and stop sharing the system_wq with the rest of the system
(and of course set WQ_MEM_RECLAIM in both workqueues). This makes sure
that when we call synchronize_rcu_expedited(); flush_work() from the
shrinker, we won't risk waiting on other random work that may be taking
locks held by the code that invoked reclaim during an allocation.
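
Something along these lines (untested sketch; "free_wq" is a made up
name, and the RCU side would need an equivalent change in the RCU core):

	/* at driver load, instead of relying on the shared system_wq */
	dev_priv->mm.free_wq = alloc_workqueue("i915-free",
					       WQ_MEM_RECLAIM, 0);
	if (!dev_priv->mm.free_wq)
		return -ENOMEM;

	/* ...and queue the object freeing work on it */
	queue_work(dev_priv->mm.free_wq, &dev_priv->mm.free_work);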

The macro bug of waiting on system_wq 100% of the time while always
holding the struct_mutex is gone, but we need to perfect this further
and stop using the system_wq for RCU and i915 shrinker work.

> We do need it for shrinker_count as well. If we just report 0 objects,

Yes the shrinker_count too.

> the shrinker_scan callback will be skipped, iirc. All we do need it for
> direct calls to i915_gem_shrink() which themselves may or may not be
> underneath the struct_mutex at the time.

Yes.

> I don't follow,
> 
> static inline struct task_struct *__mutex_owner(struct mutex *lock)
> {
>         return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x07);
> }
> 
> The atomic read is equivalent to READ_ONCE(). What's broken here? (I
> guess strict aliasing and pointer cast?)

That was an oversight; atomic64_read does READ_ONCE internally (as it
should, of course, being an atomic read). I didn't recall that it uses
atomic_long_read.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
  2017-04-07 16:48                   ` Andrea Arcangeli
@ 2017-04-10  9:39                     ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2017-04-10  9:39 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Kepplinger, Joonas Lahtinen, Thorsten Leemhuis,
	daniel.vetter, Dave Airlie, intel-gfx, linux-kernel, dri-devel

On Fri, Apr 07, 2017 at 06:48:58PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> > Not getting hangs is a good sign, but lockdep doesn't like it:
> > 
> > [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> > [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> > 
> > If I allocate the workqueue with WQ_MEM_RECLAIM, it complains bitterly
> > as well.
> 
> So in PF_MEMALLOC context we can't flush a workqueue with
> !WQ_MEM_RECLAIM.
> 
> 	system_wq = alloc_workqueue("events", 0, 0);
> 
> My point is that synchronize_rcu_expedited will still push its work
> onto the same system_wq workqueue...
> 
> 		/* Marshall arguments & schedule the expedited grace period. */
> 		rew.rew_func = func;
> 		rew.rew_rsp = rsp;
> 		rew.rew_s = s;
> 		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> 		schedule_work(&rew.rew_work);
> 
> It's also using schedule_work, so either the above is a false
> positive, or we still have a problem with synchronize_rcu_expedited,
> just not a reproducible issue anymore after we stop running it under
> the struct_mutex.

We still do have a problem with using synchronize_rcu_expedited() from
the shrinker, as we may be under someone else's mutex that is involved
in its own RCU dance.

> Even synchronize_sched will wait on the system_wq if
> synchronize_rcu_expedited has been issued in parallel by some other
> code; it's just that there's no check for it because it's not invoking
> flush_work to wait.

Right.
 
> The deadlock happens if we flush_work() while holding any lock that
> may be taken by any of the work items that could be queued there. i915
> makes sure to flush_work only if the struct_mutex was released (not
> my initial version), but we're not checking whether any of the other
> work items queued on system_wq could be taking a lock that may have
> been held during the allocation that invoked reclaim. I suppose that
> is the problem left, but I don't see how flush_work is special about
> it; synchronize_rcu_expedited would still have the same issue, no?
> (despite no lockdep warning)
> 
> I suspect this means synchronize_rcu_expedited() is not usable in
> reclaim context and lockdep should warn if PF_MEMALLOC is set when
> synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
> called.

Yes.

> Probably to fix this we should create a private workqueue for both RCU
> and i915 and stop sharing the system_wq with the rest of the system
> (and of course set WQ_MEM_RECLAIM in both workqueues). That makes sure
> that when we call synchronize_rcu_expedited or flush_work from the
> shrinker, we won't risk waiting on other random work that may be taking
> locks held by the code that invoked reclaim during an allocation.

We simply do not need to do our own synchronize_rcu* -- it's only used
to flush our slab frees on the off chance that (a) we have any and (b)
we do manage to free a whole slab. It is not the bulk of the memory that
we return to the system from the shrinker.

In the other thread, I stated that we should simply remove it. The
kswapd reclaim path should try to reclaim RCU slabs (by doing a
synchronize_sched or equivalent).
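
Roughly, that direction would look like this (a hypothetical sketch
with made-up names, not the actual i915 shrinker):

	static unsigned long example_shrink_scan(struct shrinker *shrinker,
						 struct shrink_control *sc)
	{
		/* example_drop_objects() is a made-up helper standing in
		 * for the real object freeing */
		unsigned long freed = example_drop_objects(sc->nr_to_scan);

		/*
		 * No synchronize_rcu_expedited() here: the RCU-deferred
		 * slab frees are left to the core reclaim path instead of
		 * being forced from shrinker context.
		 */
		return freed;
	}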

> The major bug of waiting on the system_wq 100% of the time while always
> holding the struct_mutex is gone, but we need to refine this further
> and stop using the system_wq for RCU and i915 shrinker work.

Agreed. My preference is to simply not do it and leave the dangling RCU
to the core reclaim paths.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
  2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
                             ` (4 preceding siblings ...)
  2017-04-06 23:23           ` [PATCH 5/5] i915: fence workqueue optimization Andrea Arcangeli
@ 2017-04-10 10:15           ` Martin Kepplinger
  5 siblings, 0 replies; 21+ messages in thread
From: Martin Kepplinger @ 2017-04-10 10:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Thorsten Leemhuis, daniel.vetter, Dave Airlie, Chris Wilson,
	intel-gfx, linux-kernel, dri-devel

Am 07.04.2017 01:23 schrieb Andrea Arcangeli:
> I'm also getting kernel hangs every couple of days. For me it's still
> not fixed here in 4.11-rc5. It's hard to reproduce; the best
> reproducer is to build lineageos 14.1 on the host while running LTP in
> a guest to stress the guest VM.
> 
> Initially I thought it was related to the fact that I upgraded the xf86
> intel driver just a few weeks ago (I had deferred any upgrade of the
> userland intel driver since last July because of a regression that
> never got fixed and broke xterm for me). After I found a workaround
> for the userland regression (appended at the end for reference) I
> started getting kernel hangs, but they are separate issues as far as I
> can tell.
> 
> It's not well tested so beware... (it survived a couple of builds and
> some VM reclaim but that's it).
> 
> The first patch 1/5 is the potential fix for the i915 kernel hang. The
> rest are incremental improvements.
> 
> And I've no great solution for when the shrinker is invoked with the
> struct_mutex held and recurses on the lock. I don't think we can
> possibly wait in such a case (other than flushing work, which the
> second patch does), but in practice it shouldn't be a big deal; the
> big RAM eater is unlikely to be i915 when the system is low on memory.
> 

FWIW, without having insight here, -rc6 seems to be good.
No disturbing gpu hangs under load so far.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-04-10 10:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22  8:38 [BUG][REGRESSION] i915 gpu hangs under load Martin Kepplinger
2017-03-22 10:36 ` [Intel-gfx] " Jani Nikula
2017-04-02 11:50   ` Thorsten Leemhuis
2017-04-02 12:13     ` Martin Kepplinger
2017-04-03 15:09       ` Jani Nikula
2017-04-06 23:23         ` [PATCH 0/5] " Andrea Arcangeli
2017-04-06 23:23           ` [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock Andrea Arcangeli
2017-04-07  9:05             ` [Intel-gfx] " Joonas Lahtinen
2017-04-06 23:23           ` [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Andrea Arcangeli
2017-04-07 10:02             ` Chris Wilson
2017-04-07 13:06               ` Andrea Arcangeli
2017-04-07 15:30                 ` Chris Wilson
2017-04-07 16:48                   ` Andrea Arcangeli
2017-04-10  9:39                     ` Chris Wilson
2017-04-06 23:23           ` [PATCH 3/5] i915: initialize the free_list of the fencing atomic_helper Andrea Arcangeli
2017-04-07 10:35             ` Chris Wilson
2017-04-06 23:23           ` [PATCH 4/5] i915: schedule while freeing the lists of gem objects Andrea Arcangeli
2017-04-06 23:23           ` [PATCH 5/5] i915: fence workqueue optimization Andrea Arcangeli
2017-04-07  9:58             ` Chris Wilson
2017-04-07 13:13               ` Andrea Arcangeli
2017-04-10 10:15           ` [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load Martin Kepplinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).