* [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Chris Wilson @ 2017-12-01 0:15 UTC
To: intel-gfx
When capturing the bo, we allocate an array for min(vma->size,
vma->node.size) pages, plus a bit for compression overhead. Through my
and CI testing, this was sufficient for the mostly empty NULL context as
it compressed well (or the out-of-bounds access simply didn't cause an
issue). However, in real workloads on Cannonlake, we were overflowing
that array and causing havoc with the random memory corruption.
Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 876be8f1d930..48418fb81066 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1424,6 +1424,7 @@ capture_object(struct drm_i915_private *dev_priv,
 	if (obj && i915_gem_object_has_pages(obj)) {
 		struct i915_vma fake = {
 			.node = { .start = U64_MAX, .size = obj->base.size },
+			.size = obj->base.size,
 			.pages = obj->mm.pages,
 			.obj = obj,
 		};
--
2.15.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Chris Wilson @ 2017-12-01 0:20 UTC
To: intel-gfx
Quoting Chris Wilson (2017-12-01 00:15:36)
> When capturing the bo, we allocate an array for min(vma->size,
> vma->node.size) pages, plus a bit for compression overhead. Through my
> and CI testing, this was sufficient for the mostly empty NULL context as
> it compressed well (or the out-of-bounds access simply didn't cause an
> issue). However, in real workloads on Cannonlake, we were overflowing
> that array and causing havoc with the random memory corruption.
>
> Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
> Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
-Chris
* ✓ Fi.CI.BAT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Patchwork @ 2017-12-01 0:36 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
URL : https://patchwork.freedesktop.org/series/34717/
State : success
== Summary ==
Series 34717v1 drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
https://patchwork.freedesktop.org/api/1.0/series/34717/revisions/1/mbox/
Test gem_exec_reloc:
        Subgroup basic-write-read-active:
                fail       -> PASS       (fi-gdg-551) fdo#102582
Test prime_vgem:
        Subgroup basic-fence-flip:
                fail       -> PASS       (fi-byt-n2820)
fdo#102582 https://bugs.freedesktop.org/show_bug.cgi?id=102582
fi-bdw-5557u total:288 pass:267 dwarn:0 dfail:0 fail:0 skip:21 time:449s
fi-bdw-gvtdvm total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:440s
fi-blb-e6850 total:288 pass:223 dwarn:1 dfail:0 fail:0 skip:64 time:386s
fi-bsw-n3050 total:288 pass:242 dwarn:0 dfail:0 fail:0 skip:46 time:520s
fi-bwr-2160 total:288 pass:183 dwarn:0 dfail:0 fail:0 skip:105 time:282s
fi-bxt-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:505s
fi-bxt-j4205 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:509s
fi-byt-n2820 total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:472s
fi-elk-e7500 total:224 pass:162 dwarn:16 dfail:0 fail:0 skip:45
fi-gdg-551 total:288 pass:178 dwarn:1 dfail:0 fail:1 skip:108 time:266s
fi-glk-1 total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:537s
fi-hsw-4770 total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:371s
fi-hsw-4770r total:288 pass:224 dwarn:0 dfail:0 fail:0 skip:64 time:257s
fi-ilk-650 total:288 pass:228 dwarn:0 dfail:0 fail:0 skip:60 time:396s
fi-ivb-3520m total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:480s
fi-ivb-3770 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:446s
fi-kbl-7500u total:288 pass:263 dwarn:1 dfail:0 fail:0 skip:24 time:485s
fi-kbl-7560u total:288 pass:269 dwarn:0 dfail:0 fail:0 skip:19 time:530s
fi-kbl-7567u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:480s
fi-kbl-r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:533s
fi-pnv-d510 total:288 pass:222 dwarn:1 dfail:0 fail:0 skip:65 time:591s
fi-skl-6260u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:469s
fi-skl-6600u total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:538s
fi-skl-6700hq total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:562s
fi-skl-6700k total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:516s
fi-skl-6770hq total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:499s
fi-skl-gvtdvm total:288 pass:265 dwarn:0 dfail:0 fail:0 skip:23 time:449s
fi-snb-2520m total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:549s
fi-snb-2600 total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:413s
Blacklisted hosts:
fi-cfl-s2 total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:612s
fi-glk-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:490s
2147458fbe6d5cd610598f1270265e26c716a596 drm-tip: 2017y-11m-30d-23h-37m-28s UTC integration manifest
bbcabbdba265 drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7384/
* ✓ Fi.CI.IGT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Patchwork @ 2017-12-01 1:39 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
URL : https://patchwork.freedesktop.org/series/34717/
State : success
== Summary ==
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
                pass       -> FAIL       (shard-snb) fdo#101623 +1
        Subgroup fbc-rgb101010-draw-mmap-wc:
                skip       -> PASS       (shard-hsw) fdo#103167
Test gem_busy:
        Subgroup close-race:
                fail       -> PASS       (shard-snb) fdo#103829
Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-snb) fdo#102707 +2
Test gem_tiled_swapping:
        Subgroup non-threaded:
                incomplete -> PASS       (shard-hsw) fdo#103525
Test kms_flip_event_leak:
                skip       -> PASS       (shard-hsw)
Test kms_setmode:
        Subgroup basic:
                pass       -> FAIL       (shard-hsw) fdo#99912
Test perf:
        Subgroup blocking:
                pass       -> FAIL       (shard-hsw) fdo#102252
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103829 https://bugs.freedesktop.org/show_bug.cgi?id=103829
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#103525 https://bugs.freedesktop.org/show_bug.cgi?id=103525
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
shard-hsw total:2663 pass:1535 dwarn:1 dfail:0 fail:11 skip:1116 time:9509s
shard-snb total:2663 pass:1305 dwarn:3 dfail:0 fail:13 skip:1342 time:8090s
Blacklisted hosts:
shard-apl total:2663 pass:1690 dwarn:1 dfail:0 fail:23 skip:949 time:13795s
shard-kbl total:2650 pass:1796 dwarn:1 dfail:0 fail:23 skip:829 time:10561s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7384/shards.html
* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Mika Kuoppala @ 2017-12-01 8:28 UTC
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> When capturing the bo, we allocate an array for min(vma->size,
> vma->node.size) pages, plus a bit for compression overhead. Through my
> and CI testing, this was sufficient for the mostly empty NULL context as
> it compressed well (or the out-of-bounds access simply didn't cause an
> issue). However, in real workloads on Cannonlake, we were overflowing
> that array and causing havoc with the random memory corruption.
>
When capturing the error object we allocate a struct for bookkeeping
plus an array for min(vma->size, vma->node.size) pages and a bit for
compression overhead. We use this mechanism when capturing the state object
by constructing a fake vma for it. We forgot to set the vma size,
so the allocation catered only for the bookkeeping struct, overflowing
and causing havoc with random memory corruption.
This is how I see it, so with the above and any language fixes included,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
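As an illustration of the sizing described above, here is a minimal
userspace sketch in C. It is not the i915 code: the sketch_ names, the
4 KiB page size, and the 10/8 headroom factor standing in for "a bit for
compression overhead" are all assumptions made for this example. It only
shows that a zero vma size makes the min() collapse to zero pages, so the
allocation covers just the bookkeeping header.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Number of page-pointer slots to reserve: the pages covered by the
 * smaller of the two sizes, plus 25% headroom (assumed factor),
 * rounded up.
 */
static uint64_t sketch_num_slots(uint64_t vma_size, uint64_t node_size)
{
	uint64_t pages = sketch_min_u64(vma_size, node_size) >> SKETCH_PAGE_SHIFT;

	return (10 * pages + 7) / 8;
}

/* Bytes to allocate: bookkeeping header plus one pointer per slot. */
static size_t sketch_alloc_bytes(size_t header_bytes,
				 uint64_t vma_size, uint64_t node_size)
{
	return header_bytes +
	       (size_t)sketch_num_slots(vma_size, node_size) * sizeof(void *);
}
```

For a 64 KiB object, passing 0 as the vma size yields zero slots, while
passing the object size for both arguments yields 20 slots under the
assumed headroom.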
> Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
> Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 876be8f1d930..48418fb81066 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1424,6 +1424,7 @@ capture_object(struct drm_i915_private *dev_priv,
>  	if (obj && i915_gem_object_has_pages(obj)) {
>  		struct i915_vma fake = {
>  			.node = { .start = U64_MAX, .size = obj->base.size },
> +			.size = obj->base.size,
>  			.pages = obj->mm.pages,
>  			.obj = obj,
>  		};
> --
> 2.15.1
* Re: [PATCH] drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Chris Wilson @ 2017-12-01 9:01 UTC
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2017-12-01 08:28:45)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > When capturing the bo, we allocate an array for min(vma->size,
> > vma->node.size) pages, plus a bit for compression overhead. Through my
> > and CI testing, this was sufficient for the mostly empty NULL context as
> > it compressed well (or the out-of-bounds access simply didn't cause an
> > issue). However, in real workloads on Cannonlake, we were overflowing
> > that array and causing havoc with the random memory corruption.
> >
>
> When capturing the error object we allocate a struct for bookkeeping
We are capturing a bo, into the error object. (As opposed to when we are
capturing to just the vma.)
> plus an array for min(vma->size, vma->node.size) pages and a bit for
> compression overhead. We use this mechanism when capturing state object
> by constructing a fake vma for it. We forgot to set the vma size
We set one of the sizes used, I forgot it compared both. Ah, I see, I
missed a sentence saying what was missing, just focussed on how it went
wrong and yet survived testing. Ta,
-Chris
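For readers less familiar with C designated initializers, a small sketch
of why the omission was silent: members left out of the initializer list
are zero-initialized, so the fake vma's size was 0 rather than garbage.
The struct and function names below are hypothetical stand-ins, not the
real struct i915_vma.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the node and vma structs. */
struct sketch_node {
	uint64_t start;
	uint64_t size;
};

struct sketch_vma {
	struct sketch_node node;
	uint64_t size;
};

/* Before the patch: only node.size is set, so .size is implicitly 0. */
static struct sketch_vma sketch_fake_before(uint64_t obj_size)
{
	struct sketch_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
	};

	return fake;
}

/* After the patch: both sizes carry the object size. */
static struct sketch_vma sketch_fake_after(uint64_t obj_size)
{
	struct sketch_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
		.size = obj_size,
	};

	return fake;
}
```

Any code taking the minimum of the two sizes therefore sees 0 for the
first variant and the object size for the second.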
* Re: ✓ Fi.CI.IGT: success for drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
From: Chris Wilson @ 2017-12-01 9:24 UTC
To: Patchwork; +Cc: intel-gfx
Quoting Patchwork (2017-12-01 01:39:53)
> == Series Details ==
>
> Series: drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
> URL : https://patchwork.freedesktop.org/series/34717/
> State : success
>
> == Summary ==
>
> Test kms_frontbuffer_tracking:
>         Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
>                 pass       -> FAIL       (shard-snb) fdo#101623 +1
>         Subgroup fbc-rgb101010-draw-mmap-wc:
>                 skip       -> PASS       (shard-hsw) fdo#103167
> Test gem_busy:
>         Subgroup close-race:
>                 fail       -> PASS       (shard-snb) fdo#103829
> Test drv_module_reload:
>         Subgroup basic-no-display:
>                 pass       -> DMESG-WARN (shard-snb) fdo#102707 +2
> Test gem_tiled_swapping:
>         Subgroup non-threaded:
>                 incomplete -> PASS       (shard-hsw) fdo#103525
> Test kms_flip_event_leak:
>                 skip       -> PASS       (shard-hsw)
> Test kms_setmode:
>         Subgroup basic:
>                 pass       -> FAIL       (shard-hsw) fdo#99912
> Test perf:
>         Subgroup blocking:
>                 pass       -> FAIL       (shard-hsw) fdo#102252
And pushed. Hopefully this is the silliest brainfart for 4.16.
-Chris