* [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
@ 2017-12-01  0:15 Chris Wilson
  2017-12-01  0:20 ` Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Chris Wilson @ 2017-12-01  0:15 UTC
  To: intel-gfx

When capturing the bo, we allocate an array for min(vma->size,
vma->node.size) pages, plus a bit for compression overhead. Through my
and CI testing, this was sufficient for the mostly empty NULL context as
it compressed well (or the out-of-bounds access simply didn't cause an
issue). However, in real workloads on Cannonlake, we were overflowing
that array and causing havoc with the random memory corruption.

Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 876be8f1d930..48418fb81066 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1424,6 +1424,7 @@ capture_object(struct drm_i915_private *dev_priv,
 	if (obj && i915_gem_object_has_pages(obj)) {
 		struct i915_vma fake = {
 			.node = { .start = U64_MAX, .size = obj->base.size },
+			.size = obj->base.size,
 			.pages = obj->mm.pages,
 			.obj = obj,
 		};
-- 
2.15.1
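
Editorial illustration, not part of the patch: a minimal userspace sketch
of why the missing initializer matters. It assumes only what the commit
message states, namely that the capture path sizes its page array from
min(vma->size, vma->node.size). The model_* names and the 4 KiB page
shift are hypothetical stand-ins rather than kernel definitions, and the
extra room for compression overhead is not modelled.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the node and vma structures. */
struct model_node {
	uint64_t start;
	uint64_t size;
};

struct model_vma {
	struct model_node node;
	uint64_t size;
};

/* Page slots the capture array would be sized for (overhead ignored). */
static uint64_t model_capture_pages(struct model_vma vma)
{
	uint64_t bytes = vma.size < vma.node.size ? vma.size : vma.node.size;

	return bytes >> MODEL_PAGE_SHIFT;
}

/* Before the patch: .size is omitted, so C zero-initializes it. */
static struct model_vma fake_before_fix(uint64_t obj_size)
{
	struct model_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
	};

	return fake;
}

/* After the patch: both sizes describe the whole object. */
static struct model_vma fake_after_fix(uint64_t obj_size)
{
	struct model_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
		.size = obj_size,
	};

	return fake;
}
```

For a 16-page object, model_capture_pages(fake_before_fix(16 * 4096))
evaluates to 0, while the fake_after_fix() variant gives 16. In this
model the pre-patch array has no page slots at all, so any page written
through it lands out of bounds.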

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  0:15 [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture Chris Wilson
@ 2017-12-01  0:20 ` Chris Wilson
  2017-12-01  0:36 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-12-01  0:20 UTC
  To: intel-gfx

Quoting Chris Wilson (2017-12-01 00:15:36)
> When capturing the bo, we allocate an array for min(vma->size,
> vma->node.size) pages, plus a bit for compression overhead. Through my
> and CI testing, this was sufficient for the mostly empty NULL context as
> it compressed well (or the out-of-bounds access simply didn't cause an
> issue). However, in real workloads on Cannonlake, we were overflowing
> that array and causing havoc with the random memory corruption.
> 
> Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
> Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
-Chris

* ✓ Fi.CI.BAT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  0:15 [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture Chris Wilson
  2017-12-01  0:20 ` Chris Wilson
@ 2017-12-01  0:36 ` Patchwork
  2017-12-01  1:39 ` ✓ Fi.CI.IGT: " Patchwork
  2017-12-01  8:28 ` [PATCH] " Mika Kuoppala
  3 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2017-12-01  0:36 UTC
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
URL   : https://patchwork.freedesktop.org/series/34717/
State : success

== Summary ==

Series 34717v1 drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
https://patchwork.freedesktop.org/api/1.0/series/34717/revisions/1/mbox/

Test gem_exec_reloc:
        Subgroup basic-write-read-active:
                fail       -> PASS       (fi-gdg-551) fdo#102582
Test prime_vgem:
        Subgroup basic-fence-flip:
                fail       -> PASS       (fi-byt-n2820)

fdo#102582 https://bugs.freedesktop.org/show_bug.cgi?id=102582

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:449s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:440s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:386s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:520s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:282s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:505s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:509s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:472s
fi-elk-e7500     total:224  pass:162  dwarn:16  dfail:0   fail:0   skip:45 
fi-gdg-551       total:288  pass:178  dwarn:1   dfail:0   fail:1   skip:108 time:266s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:537s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:371s
fi-hsw-4770r     total:288  pass:224  dwarn:0   dfail:0   fail:0   skip:64  time:257s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:396s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:480s
fi-ivb-3770      total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:446s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:485s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:530s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:480s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:533s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:591s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:538s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:562s
fi-skl-6700k     total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:516s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:499s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:449s
fi-snb-2520m     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:549s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:413s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:612s
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:490s

2147458fbe6d5cd610598f1270265e26c716a596 drm-tip: 2017y-11m-30d-23h-37m-28s UTC integration manifest
bbcabbdba265 drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7384/

* ✓ Fi.CI.IGT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  0:15 [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture Chris Wilson
  2017-12-01  0:20 ` Chris Wilson
  2017-12-01  0:36 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-12-01  1:39 ` Patchwork
  2017-12-01  9:24   ` Chris Wilson
  2017-12-01  8:28 ` [PATCH] " Mika Kuoppala
  3 siblings, 1 reply; 7+ messages in thread
From: Patchwork @ 2017-12-01  1:39 UTC
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
URL   : https://patchwork.freedesktop.org/series/34717/
State : success

== Summary ==

Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
                pass       -> FAIL       (shard-snb) fdo#101623 +1
        Subgroup fbc-rgb101010-draw-mmap-wc:
                skip       -> PASS       (shard-hsw) fdo#103167
Test gem_busy:
        Subgroup close-race:
                fail       -> PASS       (shard-snb) fdo#103829
Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-snb) fdo#102707 +2
Test gem_tiled_swapping:
        Subgroup non-threaded:
                incomplete -> PASS       (shard-hsw) fdo#103525
Test kms_flip_event_leak:
                skip       -> PASS       (shard-hsw)
Test kms_setmode:
        Subgroup basic:
                pass       -> FAIL       (shard-hsw) fdo#99912
Test perf:
        Subgroup blocking:
                pass       -> FAIL       (shard-hsw) fdo#102252

fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103829 https://bugs.freedesktop.org/show_bug.cgi?id=103829
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#103525 https://bugs.freedesktop.org/show_bug.cgi?id=103525
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-hsw        total:2663 pass:1535 dwarn:1   dfail:0   fail:11  skip:1116 time:9509s
shard-snb        total:2663 pass:1305 dwarn:3   dfail:0   fail:13  skip:1342 time:8090s
Blacklisted hosts:
shard-apl        total:2663 pass:1690 dwarn:1   dfail:0   fail:23  skip:949 time:13795s
shard-kbl        total:2650 pass:1796 dwarn:1   dfail:0   fail:23  skip:829 time:10561s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7384/shards.html

* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  0:15 [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture Chris Wilson
                   ` (2 preceding siblings ...)
  2017-12-01  1:39 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-12-01  8:28 ` Mika Kuoppala
  2017-12-01  9:01   ` Chris Wilson
  3 siblings, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2017-12-01  8:28 UTC
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When capturing the bo, we allocate an array for min(vma->size,
> vma->node.size) pages, plus a bit for compression overhead. Through my
> and CI testing, this was sufficient for the mostly empty NULL context as
> it compressed well (or the out-of-bounds access simply didn't cause an
> issue). However, in real workloads on Cannonlake, we were overflowing
> that array and causing havoc with the random memory corruption.
>

When capturing the error object we allocate a struct for bookkeeping
plus an array for min(vma->size, vma->node.size) pages and a bit for
compression overhead. We use this mechanism when capturing state object
by constructing a fake vma for it. We forgot to set the vma size
causing the allocation to cater only for the bookkeeping struct, overflowing
and causing havoc with random memory corruption.

This is how I see it, so with the above and including possible language fixes,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
> Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 876be8f1d930..48418fb81066 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1424,6 +1424,7 @@ capture_object(struct drm_i915_private *dev_priv,
>  	if (obj && i915_gem_object_has_pages(obj)) {
>  		struct i915_vma fake = {
>  			.node = { .start = U64_MAX, .size = obj->base.size },
> +			.size = obj->base.size,
>  			.pages = obj->mm.pages,
>  			.obj = obj,
>  		};
> -- 
> 2.15.1

* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  8:28 ` [PATCH] " Mika Kuoppala
@ 2017-12-01  9:01   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-12-01  9:01 UTC
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-12-01 08:28:45)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > When capturing the bo, we allocate an array for min(vma->size,
> > vma->node.size) pages, plus a bit for compression overhead. Through my
> > and CI testing, this was sufficient for the mostly empty NULL context as
> > it compressed well (or the out-of-bounds access simply didn't cause an
> > issue). However, in real workloads on Cannonlake, we were overflowing
> > that array and causing havoc with the random memory corruption.
> >
> 
> When capturing the error object we allocate a struct for bookkeeping

We are capturing a bo, into the error object. (As opposed to when we are
capturing to just the vma.)

> plus an array for min(vma->size, vma->node.size) pages and a bit for
> compression overhead. We use this mechanism when capturing state object
> by constructing a fake vma for it. We forgot to set the vma size

We set one of the sizes used, I forgot it compared both. Ah, I see, I
missed a sentence saying what was missing, just focussed on how it went
wrong and yet survived testing. Ta,
-Chris

* Re: ✓ Fi.CI.IGT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
  2017-12-01  1:39 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-12-01  9:24   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-12-01  9:24 UTC
  To: Patchwork; +Cc: intel-gfx

Quoting Patchwork (2017-12-01 01:39:53)
> == Series Details ==
> 
> Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
> URL   : https://patchwork.freedesktop.org/series/34717/
> State : success
> 
> == Summary ==
> 
> Test kms_frontbuffer_tracking:
>         Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
>                 pass       -> FAIL       (shard-snb) fdo#101623 +1
>         Subgroup fbc-rgb101010-draw-mmap-wc:
>                 skip       -> PASS       (shard-hsw) fdo#103167
> Test gem_busy:
>         Subgroup close-race:
>                 fail       -> PASS       (shard-snb) fdo#103829
> Test drv_module_reload:
>         Subgroup basic-no-display:
>                 pass       -> DMESG-WARN (shard-snb) fdo#102707 +2
> Test gem_tiled_swapping:
>         Subgroup non-threaded:
>                 incomplete -> PASS       (shard-hsw) fdo#103525
> Test kms_flip_event_leak:
>                 skip       -> PASS       (shard-hsw)
> Test kms_setmode:
>         Subgroup basic:
>                 pass       -> FAIL       (shard-hsw) fdo#99912
> Test perf:
>         Subgroup blocking:
>                 pass       -> FAIL       (shard-hsw) fdo#102252

And pushed. Hopefully this is the silliest brainfart for 4.16.
-Chris

Thread overview: 7+ messages
2017-12-01  0:15 [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture Chris Wilson
2017-12-01  0:20 ` Chris Wilson
2017-12-01  0:36 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-12-01  1:39 ` ✓ Fi.CI.IGT: " Patchwork
2017-12-01  9:24   ` Chris Wilson
2017-12-01  8:28 ` [PATCH] " Mika Kuoppala
2017-12-01  9:01   ` Chris Wilson
