* [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl
@ 2015-11-26 11:34 Daniel Vetter
  0 siblings, 3 replies; 16+ messages in thread

From: Daniel Vetter @ 2015-11-26 11:34 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Chris Wilson

So there are 3 competing proposals for what wait_ioctl should do wrt
-EIO:

- return -EIO when the gpu is wedged. Not terribly useful for
  userspace since it might race with a hang and then there's no
  guarantee that a subsequent execbuf won't end up in an -EIO.
  A terminal wedge really can only be reliably signalled at execbuf
  time, and userspace needs to cope with that (or decide not to
  bother).

- EIO for any obj that suffered from a reset. This means a big internal
  reorganization in the kernel since currently we track reset stats
  per-ctx and not on the obj. That's also what arb robustness wants.
  We could do this, but this feels like new ABI territory with the
  usual userspace requirements and high hurdles.

- No -EIO at all. Consistent with set_domain_ioctl and simplest to
  implement, which is what this patch does.

We can always opt to change this later on if there's a real need.

To make the test really exercise this, do a full wedged gpu hang, to
make sure -EIO doesn't leak out at all.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 tests/gem_eio.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index a24c8f1c53b5..8345d1a7a429 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -161,10 +161,14 @@ static void test_wait(int fd)
 {
 	igt_hang_ring_t hang;
 
+	igt_require(i915_reset_control(false));
+
 	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
 
-	igt_assert_eq(__gem_wait(fd, hang.handle, -1), -EIO);
+	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
 
 	igt_post_hang_ring(fd, hang);
 
+	igt_assert(i915_reset_control(true));
+	trigger_reset(fd);
 }
-- 
2.1.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Daniel Vetter @ 2015-11-26 11:34 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Chris Wilson

Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
hang, we need to be careful not to run into the "hanging too fast"
check:

- don't restore the ban period, but instead keep it at 0.
- make sure we idle the gpu fully before hanging it again (the wait
  subtest missed that).

With this, gem_eio now works reliably even when I don't run the
subtests individually.

Of course it's a bit fishy that the default ctx gets blamed for
essentially doing nothing, but until that's figured out in upstream
it's better to make the test work for now.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 tests/gem_eio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 8345d1a7a429..6c07d9175b95 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -84,13 +84,17 @@ static void trigger_reset(int fd)
 
 static void wedge_gpu(int fd)
 {
+	igt_hang_ring_t hang;
+
+	/* First idle the GPU then disable GPU resets before injecting a hang */
 	gem_quiescent_gpu(fd);
 	igt_require(i915_reset_control(false));
 
 	igt_debug("Wedging GPU by injecting hang\n");
-	igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
+	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
+	hang.ban = 0;
+	igt_post_hang_ring(fd, hang);
 
 	igt_assert(i915_reset_control(true));
 }
@@ -161,10 +165,14 @@ static void test_wait(int fd)
 {
 	igt_hang_ring_t hang;
 
+	/* First idle the GPU then disable GPU resets before injecting a hang */
+	gem_quiescent_gpu(fd);
+
 	igt_require(i915_reset_control(false));
 
 	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
 
 	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
 
+	hang.ban = 0;
 	igt_post_hang_ring(fd, hang);
 
 	igt_assert(i915_reset_control(true));
-- 
2.1.0
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Chris Wilson @ 2015-11-26 12:59 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 12:34:35PM +0100, Daniel Vetter wrote:
> Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
> hang, we need to be careful not to run into the "hanging too fast"
> check:
>
> - don't restore the ban period, but instead keep it at 0.
> - make sure we idle the gpu fully before hanging it again (the wait
>   subtest missed that).
>
> With this, gem_eio now works reliably even when I don't run the
> subtests individually.
>
> Of course it's a bit fishy that the default ctx gets blamed for
> essentially doing nothing, but until that's figured out in upstream
> it's better to make the test work for now.

This used to be reliable. And just disabling all banning in the kernel
forever more is silly.

During igt_post_hang_ring:
1. we wait upon the hanging batch
   - this returns when hangcheck fires
2. reset the ban period to normal
   - this takes mutex_lock_interruptible and so must wait for the reset
     handler to run before it can make the change,
   - ergo the hanging batch never triggers a ban for itself.
   - (a subsequent nonsimulated GPU hang may trigger the ban though)

Nak.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Daniel Vetter @ 2015-11-26 14:46 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 12:59:37PM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 12:34:35PM +0100, Daniel Vetter wrote:
> > Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
> > hang, we need to be careful not to run into the "hanging too fast"
> > check:
[...]
> This used to be reliable. And just disabling all banning in the kernel
> forever more is silly.
>
> During igt_post_hang_ring:
> 1. we wait upon the hanging batch
>    - this returns when hangcheck fires
> 2. reset the ban period to normal
>    - this takes mutex_lock_interruptible and so must wait for the reset
>      handler to run before it can make the change,
>    - ergo the hanging batch never triggers a ban for itself.
>    - (a subsequent nonsimulated GPU hang may trigger the ban though)

This isn't where it dies. It dies when we do the echo 1 > i915_wedged.
I suspect quiescent_gpu or whatever is getting in the way, but I really
only wanted to get things to run first. And since i915_wedged is a
developer feature, and it does work perfectly if you don't intend to
reuse contexts, I didn't see any point in first trying to fix it up.

So I still maintain that this is a good enough approach, at least if
there's no obvious fix in-flight already.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Chris Wilson @ 2015-11-26 15:34 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 03:46:06PM +0100, Daniel Vetter wrote:
> On Thu, Nov 26, 2015 at 12:59:37PM +0000, Chris Wilson wrote:
> > During igt_post_hang_ring:
> > 1. we wait upon the hanging batch
> >    - this returns when hangcheck fires
> > 2. reset the ban period to normal
> >    - this takes mutex_lock_interruptible and so must wait for the reset
> >      handler to run before it can make the change,
> >    - ergo the hanging batch never triggers a ban for itself.
> >    - (a subsequent nonsimulated GPU hang may trigger the ban though)
>
> This isn't where it dies. It dies when we do the echo 1 > i915_wedged.

That is not where it dies.

> I suspect quiescent_gpu or whatever is getting in the way, but I really
> only wanted to get things to run first. And since i915_wedged is a
> developer feature, and it does work perfectly if you don't intend to
> reuse contexts, I didn't see any point in first trying to fix it up.
>
> So I still maintain that this is a good enough approach, at least if
> there's no obvious fix in-flight already.

No way. This is a kernel regression since 4.0, having just tested with
v4.0 on snb/ivb/hsw.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Daniel Vetter @ 2015-11-26 15:51 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 03:34:05PM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 03:46:06PM +0100, Daniel Vetter wrote:
> > This isn't where it dies. It dies when we do the echo 1 > i915_wedged.
>
> That is not where it dies.

Well, at least it happens after we start the hang recovery from
i915_wedged.

> > I suspect quiescent_gpu or whatever is getting in the way, but I really
> > only wanted to get things to run first. And since i915_wedged is a
> > developer feature, and it does work perfectly if you don't intend to
> > reuse contexts, I didn't see any point in first trying to fix it up.
> >
> > So I still maintain that this is a good enough approach, at least if
> > there's no obvious fix in-flight already.
>
> No way. This is a kernel regression since 4.0, having just tested with
> v4.0 on snb/ivb/hsw.

Ok, I didn't realize that. I figured that since i915_wedged will return
-EAGAIN anyway when we are terminally wedged, and that seems to have
been the case ever since we started with reset_counter, this has been
broken forever. I guess I missed something.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Chris Wilson @ 2015-11-26 21:10 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 04:51:13PM +0100, Daniel Vetter wrote:
> On Thu, Nov 26, 2015 at 03:34:05PM +0000, Chris Wilson wrote:
> > No way. This is a kernel regression since 4.0, having just tested with
> > v4.0 on snb/ivb/hsw.
>
> Ok, I didn't realize that. I figured that since i915_wedged will return
> -EAGAIN anyway when we are terminally wedged, and that seems to have
> been the case ever since we started with reset_counter, this has been
> broken forever. I guess I missed something.

The bug I see is SNB specific, and introduced between v4.0 and v4.1.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

From: Daniel Vetter @ 2015-11-30  8:25 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 09:10:57PM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 04:51:13PM +0100, Daniel Vetter wrote:
> > Ok, I didn't realize that. I figured that since i915_wedged will return
> > -EAGAIN anyway when we are terminally wedged, and that seems to have
> > been the case ever since we started with reset_counter, this has been
> > broken forever. I guess I missed something.
>
> The bug I see is SNB specific, and introduced between v4.0 and v4.1.

The irony: I hacked on gem_eio on snb too ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl

From: Chris Wilson @ 2015-11-30 10:11 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> So there are 3 competing proposals for what wait_ioctl should do wrt
> -EIO:
>
> - return -EIO when the gpu is wedged. Not terribly useful for
>   userspace since it might race with a hang and then there's no
>   guarantee that a subsequent execbuf won't end up in an -EIO.
>   A terminal wedge really can only be reliably signalled at execbuf
>   time, and userspace needs to cope with that (or decide not to
>   bother).
>
> - EIO for any obj that suffered from a reset. This means a big internal
>   reorganization in the kernel since currently we track reset stats
>   per-ctx and not on the obj. That's also what arb robustness wants.
>   We could do this, but this feels like new ABI territory with the
>   usual userspace requirements and high hurdles.
>
> - No -EIO at all. Consistent with set_domain_ioctl and simplest to
>   implement, which is what this patch does.

Since no one else is weighing into the ABI discussion, I'm happy with
losing EIO here. I thought it could be useful, but as no one is using or
seems likely to start using it, begone.

> We can always opt to change this later on if there's a real need.
>
> To make the test really exercise this, do a full wedged gpu hang, to
> make sure -EIO doesn't leak out at all.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
[...]
> @@ -161,10 +161,14 @@ static void test_wait(int fd)
>  {
>  	igt_hang_ring_t hang;
>  
> +	igt_require(i915_reset_control(false));

However, this is not required to test the ABI change above, as the wait
itself will still hang, whether or not it wedges the GPU.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl

From: Daniel Vetter @ 2015-12-01  8:28 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
[...]
> Since no one else is weighing into the ABI discussion, I'm happy with
> losing EIO here. I thought it could be useful, but as no one is using or
> seems likely to start using it, begone.
>
> > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> >  {
> >  	igt_hang_ring_t hang;
> >  
> > +	igt_require(i915_reset_control(false));
>
> However, this is not required to test the ABI change above, as the wait
> itself will still hang, whether or not it wedges the GPU.

Yes, it's not strictly required, but without it the testcase is fairly
boring. If we move the check_wedge out of wait_request, then a normal
gpu reset would always return 0 (after retrying a few times perhaps),
so I figured testing the wedged case is the only one that's worth it.

Maybe we should dupe all the subtests and have wedged and non-wedged
cases for all of them? That would imo make more sense.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl

From: Chris Wilson @ 2015-12-01  9:04 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
[...]
> > However, this is not required to test the ABI change above, as the wait
> > itself will still hang, whether or not it wedges the GPU.
>
> Yes, it's not strictly required, but without it the testcase is fairly
> boring. If we move the check_wedge out of wait_request, then a normal
> gpu reset would always return 0 (after retrying a few times perhaps),
> so I figured testing the wedged case is the only one that's worth it.

But wedging during the hang is also not interesting, as we have no
opportunity to see the reset failure in the test case. Putting the GPU
into the wedged state before the wait should be a trivial test that the
object is idle after the reset.

> Maybe we should dupe all the subtests and have wedged and non-wedged
> cases for all of them? That would imo make more sense.

For the others, what matters is how we handle the GPU being wedged
before we queue an execbuf or throttling. In terms of testing that no
error is reported for the hanging case, we should add tests for
set-domain (so that it is explicit and not reliant on implementation
inside lib/), GTT faulting (that would need both wedged and hanging
cases, but we are not likely to be able to cover all the cases like
waiting on a fence and other secondary waits) and throttle.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl 2015-12-01 9:04 ` Chris Wilson @ 2015-12-01 9:13 ` Daniel Vetter 2015-12-01 9:20 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Daniel Vetter @ 2015-12-01 9:13 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, Daniel Vetter, Intel Graphics Development, Daniel Vetter On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote: > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote: > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote: > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote: > > > > So there's 3 competing proposals for what wait_ioctl should do wrt > > > > -EIO: > > > > > > > > - return -EIO when the gpu is wedged. Not terribly useful for > > > > userspace since it might race with a hang and then there's no > > > > guarantee that a subsequent execbuf won't end up in an -EIO. > > > > Terminally wedge really can only be reliably signalled at execbuf > > > > time, and userspace needs to cope with that (or decide not to > > > > bother). > > > > > > > > - EIO for any obj that suffered from a reset. This means big internal > > > > reorginazation in the kernel since currently we track reset stats > > > > per-ctx and not on the obj. That's also what arb robustness wants. > > > > We could do this, but this feels like new ABI territory with the > > > > usual userspace requirements and high hurdles. > > > > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to > > > > implement. Which is what this patch does. > > > > > > Since no one else is weighing into the ABI discussion, I'm happy with > > > losing EIO here. I thought it could be useful, but as no one is using or > > > seems likely to start using it, begone. > > > > > > > We can always opt to change this later on if there's a real need. 
> > > > > > > > To make the test really exercise this do a full wedged gpu hang, to > > > > make sure -EIO doesn't leak out at all. > > > > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > > --- > > > > tests/gem_eio.c | 6 +++++- > > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c > > > > index a24c8f1c53b5..8345d1a7a429 100644 > > > > --- a/tests/gem_eio.c > > > > +++ b/tests/gem_eio.c > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd) > > > > { > > > > igt_hang_ring_t hang; > > > > > > > > + igt_require(i915_reset_control(false)); > > > > > > However, this is not required to test the ABI change above as the wait > > > itself will still hang, whether or not it wedges the GPU. > > > > Yes it's not strictly required, but without it the testcase is fairly > > boring. If we move the check_wedge out of wait_request then a normail gpu > > reset would always return 0 (after retrying a few times perhaps), so I > > figured testing the wedged case is the only one that's worth it. > > But wedging during the hang is also not interesting as we have no > opportunity to see the reset failure in the test case. Putting the GPU > into the wedged state before the wait, should be a trivial test that the > object is idle after the reset. Right now (with current kernels) we see an -EIO with this testcase instead of 0 in the wait. Without disabling reset we see 0 both on fixed and broken kernels. So I don't really see why not testing this case is a good idea? It's the one we're currently failing at and leak -EIO to userspace. Assuming ofc we still go with the "let's curb -EIO except for execbuf" ABI. > > Maybe we should dupe the subtests all and have wedged and non-wedged cases > > for all of them? That would imo make more sense. > > The others, what matters is how we handle the GPU being wedged before we > queue an execbuf or throttling. 
In terms of testing no error is reported > for the hanging case, we should add tests for set-domain (so that it is > explicit and not reliant on implementation inside lib/), GTT faulting > (that would need both wedged and hanging cases, but we are not likely to > be able to cover all the cases like waiting on fence and other secondary > waits) and throttle. Yeah there's more gaps than just the one above. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl 2015-12-01 9:13 ` Daniel Vetter @ 2015-12-01 9:20 ` Chris Wilson 2015-12-03 8:50 ` Daniel Vetter 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-12-01 9:20 UTC (permalink / raw) To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Daniel Vetter On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote: > On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote: > > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote: > > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote: > > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote: > > > > > So there's 3 competing proposals for what wait_ioctl should do wrt > > > > > -EIO: > > > > > > > > > > - return -EIO when the gpu is wedged. Not terribly useful for > > > > > userspace since it might race with a hang and then there's no > > > > > guarantee that a subsequent execbuf won't end up in an -EIO. > > > > > Terminally wedge really can only be reliably signalled at execbuf > > > > > time, and userspace needs to cope with that (or decide not to > > > > > bother). > > > > > > > > > > - EIO for any obj that suffered from a reset. This means big internal > > > > > reorginazation in the kernel since currently we track reset stats > > > > > per-ctx and not on the obj. That's also what arb robustness wants. > > > > > We could do this, but this feels like new ABI territory with the > > > > > usual userspace requirements and high hurdles. > > > > > > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to > > > > > implement. Which is what this patch does. > > > > > > > > Since no one else is weighing into the ABI discussion, I'm happy with > > > > losing EIO here. I thought it could be useful, but as no one is using or > > > > seems likely to start using it, begone. > > > > > > > > > We can always opt to change this later on if there's a real need. 
> > > > > > > > > > To make the test really exercise this do a full wedged gpu hang, to > > > > > make sure -EIO doesn't leak out at all. > > > > > > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > > > --- > > > > > tests/gem_eio.c | 6 +++++- > > > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c > > > > > index a24c8f1c53b5..8345d1a7a429 100644 > > > > > --- a/tests/gem_eio.c > > > > > +++ b/tests/gem_eio.c > > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd) > > > > > { > > > > > igt_hang_ring_t hang; > > > > > > > > > > + igt_require(i915_reset_control(false)); > > > > > > > > However, this is not required to test the ABI change above as the wait > > > > itself will still hang, whether or not it wedges the GPU. > > > > > > Yes it's not strictly required, but without it the testcase is fairly > > > boring. If we move the check_wedge out of wait_request then a normail gpu > > > reset would always return 0 (after retrying a few times perhaps), so I > > > figured testing the wedged case is the only one that's worth it. > > > > But wedging during the hang is also not interesting as we have no > > opportunity to see the reset failure in the test case. Putting the GPU > > into the wedged state before the wait, should be a trivial test that the > > object is idle after the reset. > > Right now (with current kernels) we see an -EIO with this testcase instead > of 0 in the wait. Without disabling reset we see 0 both on fixed and > broken kernels. So I don't really see why not testing this case is a good > idea? It's the one we're currently failing at and leak -EIO to userspace. Because that wasn't the ABI I was trying to test here and expected to maintain! I was just aiming for covering the ioctl calls with expected EIO (to do the opposite would need something like trinity or a guided fuzzer). 
-Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl 2015-12-01 9:20 ` Chris Wilson @ 2015-12-03 8:50 ` Daniel Vetter 2015-12-03 9:00 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Daniel Vetter @ 2015-12-03 8:50 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, Daniel Vetter, Intel Graphics Development, Daniel Vetter On Tue, Dec 01, 2015 at 09:20:02AM +0000, Chris Wilson wrote: > On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote: > > On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote: > > > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote: > > > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote: > > > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote: > > > > > > So there's 3 competing proposals for what wait_ioctl should do wrt > > > > > > -EIO: > > > > > > > > > > > > - return -EIO when the gpu is wedged. Not terribly useful for > > > > > > userspace since it might race with a hang and then there's no > > > > > > guarantee that a subsequent execbuf won't end up in an -EIO. > > > > > > Terminally wedge really can only be reliably signalled at execbuf > > > > > > time, and userspace needs to cope with that (or decide not to > > > > > > bother). > > > > > > > > > > > > - EIO for any obj that suffered from a reset. This means big internal > > > > > > reorginazation in the kernel since currently we track reset stats > > > > > > per-ctx and not on the obj. That's also what arb robustness wants. > > > > > > We could do this, but this feels like new ABI territory with the > > > > > > usual userspace requirements and high hurdles. > > > > > > > > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to > > > > > > implement. Which is what this patch does. > > > > > > > > > > Since no one else is weighing into the ABI discussion, I'm happy with > > > > > losing EIO here. 
I thought it could be useful, but as no one is using or > > > > > seems likely to start using it, begone. > > > > > > > > > > > We can always opt to change this later on if there's a real need. > > > > > > > > > > > > To make the test really exercise this do a full wedged gpu hang, to > > > > > > make sure -EIO doesn't leak out at all. > > > > > > > > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > > > > --- > > > > > > tests/gem_eio.c | 6 +++++- > > > > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c > > > > > > index a24c8f1c53b5..8345d1a7a429 100644 > > > > > > --- a/tests/gem_eio.c > > > > > > +++ b/tests/gem_eio.c > > > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd) > > > > > > { > > > > > > igt_hang_ring_t hang; > > > > > > > > > > > > + igt_require(i915_reset_control(false)); > > > > > > > > > > However, this is not required to test the ABI change above as the wait > > > > > itself will still hang, whether or not it wedges the GPU. > > > > > > > > Yes it's not strictly required, but without it the testcase is fairly > > > > boring. If we move the check_wedge out of wait_request then a normail gpu > > > > reset would always return 0 (after retrying a few times perhaps), so I > > > > figured testing the wedged case is the only one that's worth it. > > > > > > But wedging during the hang is also not interesting as we have no > > > opportunity to see the reset failure in the test case. Putting the GPU > > > into the wedged state before the wait, should be a trivial test that the > > > object is idle after the reset. > > > > Right now (with current kernels) we see an -EIO with this testcase instead > > of 0 in the wait. Without disabling reset we see 0 both on fixed and > > broken kernels. So I don't really see why not testing this case is a good > > idea? 
It's the one we're currently failing at and leak -EIO to userspace. > > Because that wasn't the ABI I was trying to test here and expected to > maintain! I was just aiming for covering the ioctl calls with expected > EIO (to do the opposite would need something like trinity or a guided > fuzzer). Well yeah, but I thought we decided to change it? So Ack or not? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl 2015-12-03 8:50 ` Daniel Vetter @ 2015-12-03 9:00 ` Chris Wilson 0 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-12-03 9:00 UTC (permalink / raw) To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Daniel Vetter On Thu, Dec 03, 2015 at 09:50:25AM +0100, Daniel Vetter wrote: > On Tue, Dec 01, 2015 at 09:20:02AM +0000, Chris Wilson wrote: > > On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote: > > > On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote: > > > > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote: > > > > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote: > > > > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote: > > > > > > > So there's 3 competing proposals for what wait_ioctl should do wrt > > > > > > > -EIO: > > > > > > > > > > > > > > - return -EIO when the gpu is wedged. Not terribly useful for > > > > > > > userspace since it might race with a hang and then there's no > > > > > > > guarantee that a subsequent execbuf won't end up in an -EIO. > > > > > > > Terminally wedge really can only be reliably signalled at execbuf > > > > > > > time, and userspace needs to cope with that (or decide not to > > > > > > > bother). > > > > > > > > > > > > > > - EIO for any obj that suffered from a reset. This means big internal > > > > > > > reorginazation in the kernel since currently we track reset stats > > > > > > > per-ctx and not on the obj. That's also what arb robustness wants. > > > > > > > We could do this, but this feels like new ABI territory with the > > > > > > > usual userspace requirements and high hurdles. > > > > > > > > > > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to > > > > > > > implement. Which is what this patch does. > > > > > > > > > > > > Since no one else is weighing into the ABI discussion, I'm happy with > > > > > > losing EIO here. 
I thought it could be useful, but as no one is using or > > > > > > seems likely to start using it, begone. > > > > > > > > > > > > > We can always opt to change this later on if there's a real need. > > > > > > > > > > > > > > To make the test really exercise this do a full wedged gpu hang, to > > > > > > > make sure -EIO doesn't leak out at all. > > > > > > > > > > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > > > > > --- > > > > > > > tests/gem_eio.c | 6 +++++- > > > > > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c > > > > > > > index a24c8f1c53b5..8345d1a7a429 100644 > > > > > > > --- a/tests/gem_eio.c > > > > > > > +++ b/tests/gem_eio.c > > > > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd) > > > > > > > { > > > > > > > igt_hang_ring_t hang; > > > > > > > > > > > > > > + igt_require(i915_reset_control(false)); > > > > > > > > > > > > However, this is not required to test the ABI change above as the wait > > > > > > itself will still hang, whether or not it wedges the GPU. > > > > > > > > > > Yes it's not strictly required, but without it the testcase is fairly > > > > > boring. If we move the check_wedge out of wait_request then a normail gpu > > > > > reset would always return 0 (after retrying a few times perhaps), so I > > > > > figured testing the wedged case is the only one that's worth it. > > > > > > > > But wedging during the hang is also not interesting as we have no > > > > opportunity to see the reset failure in the test case. Putting the GPU > > > > into the wedged state before the wait, should be a trivial test that the > > > > object is idle after the reset. > > > > > > Right now (with current kernels) we see an -EIO with this testcase instead > > > of 0 in the wait. Without disabling reset we see 0 both on fixed and > > > broken kernels. 
So I don't really see why not testing this case is a good > > > idea? It's the one we're currently failing at and leak -EIO to userspace. > > > > Because that wasn't the ABI I was trying to test here and expected to > > maintain! I was just aiming for covering the ioctl calls with expected > > EIO (to do the opposite would need something like trinity or a guided > > fuzzer). > > Well yeah, but I thought we decided to change it? So Ack or not? If you change the subtest to do both, the more likely case where there is a GPU hang whilst we wait for a request, and the second where the GPU hang leads to a failed reset. There's actually another pair where the GPU hangs (and reset succeeds or not) before we call the wait_ioctl. And then we can use the guts of that function to do the set-domain and gtt-fault testing. As far as wait-ioctl reporting 0 if the request is completed due to a reset, that's an ACK. -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl 2015-11-26 11:34 [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl Daniel Vetter 2015-11-26 11:34 ` [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast" Daniel Vetter 2015-11-30 10:11 ` [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl Chris Wilson @ 2015-12-16 13:48 ` Chris Wilson 2 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-12-16 13:48 UTC (permalink / raw) To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote: > So there's 3 competing proposals for what wait_ioctl should do wrt > -EIO: > > - return -EIO when the gpu is wedged. Not terribly useful for > userspace since it might race with a hang and then there's no > guarantee that a subsequent execbuf won't end up in an -EIO. > Terminally wedge really can only be reliably signalled at execbuf > time, and userspace needs to cope with that (or decide not to > bother). > > - EIO for any obj that suffered from a reset. This means big internal > reorginazation in the kernel since currently we track reset stats > per-ctx and not on the obj. That's also what arb robustness wants. > We could do this, but this feels like new ABI territory with the > usual userspace requirements and high hurdles. > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to > implement. Which is what this patch does. > > We can always opt to change this later on if there's a real need. > > To make the test really exercise this do a full wedged gpu hang, to > make sure -EIO doesn't leak out at all. > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> I changed the test to cover both i915.reset=true and i915.reset=false possibilities and pushed. 
-Chris -- Chris Wilson, Intel Open Source Technology Centre
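[Editorial sketch] Combining the quoted patch with the variants agreed above, the pushed subtest pair — one run with GPU reset enabled (the hang is recovered) and one with reset disabled (the hang wedges the GPU), with wait_ioctl expected to return 0 in both — might look roughly like this. This is a hypothetical reconstruction, not the code Chris actually pushed; it reuses only the IGT helpers visible in the quoted patch (`igt_hang_ring`, `__gem_wait`, `i915_reset_control`, `trigger_reset`) and requires i915 hardware to run:

```c
/* Sketch: one wait subtest parameterized over whether the hang
 * is allowed to wedge the GPU (i915.reset=false) or is recovered
 * by a normal GPU reset (i915.reset=true). Under the new ABI the
 * wait must return 0 in both cases, never -EIO. */
static void test_wait(int fd, bool wedge)
{
	igt_hang_ring_t hang;

	if (wedge)
		/* disable reset so the hang terminally wedges the GPU */
		igt_require(i915_reset_control(false));

	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);

	/* new ABI: no -EIO leaks out of wait_ioctl, wedged or not */
	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);

	igt_post_hang_ring(fd, hang);

	if (wedge) {
		/* re-enable reset and recover the GPU for later tests */
		igt_assert(i915_reset_control(true));
		trigger_reset(fd);
	}
}
```

As Chris notes, the guts of such a helper could then be reused for the set-domain and GTT-fault coverage discussed earlier in the thread.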
end of thread, other threads:[~2015-12-16 13:48 UTC | newest] Thread overview: 16+ messages -- 2015-11-26 11:34 [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl Daniel Vetter 2015-11-26 11:34 ` [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast" Daniel Vetter 2015-11-26 12:59 ` Chris Wilson 2015-11-26 14:46 ` Daniel Vetter 2015-11-26 15:34 ` Chris Wilson 2015-11-26 15:51 ` Daniel Vetter 2015-11-26 21:10 ` Chris Wilson 2015-11-30 8:25 ` Daniel Vetter 2015-11-30 10:11 ` [PATCH 1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl Chris Wilson 2015-12-01 8:28 ` Daniel Vetter 2015-12-01 9:04 ` Chris Wilson 2015-12-01 9:13 ` Daniel Vetter 2015-12-01 9:20 ` Chris Wilson 2015-12-03 8:50 ` Daniel Vetter 2015-12-03 9:00 ` Chris Wilson 2015-12-16 13:48 ` Chris Wilson