* [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Chris Wilson @ 2016-10-20  9:07 UTC
  To: intel-gfx

For the basic error state, we only desire that an error state be created
following a hang. For that purpose, we do not need a real hang (slow
6-12s) but can inject one instead (fast <1s).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/drv_hangman.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/drv_hangman.c b/tests/drv_hangman.c
index 953a4c6..bfe5eaf 100644
--- a/tests/drv_hangman.c
+++ b/tests/drv_hangman.c
@@ -32,6 +32,8 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 
+#include "igt_debugfs.h"
+
 #ifndef I915_PARAM_CMD_PARSER_VERSION
 #define I915_PARAM_CMD_PARSER_VERSION       28
 #endif
@@ -166,15 +168,21 @@ static void test_error_state_basic(void)
 {
 	int fd;
 
-	fd = drm_open_driver(DRIVER_INTEL);
-
 	clear_error_state();
 	assert_error_state_clear();
 
-	submit_hang(fd, I915_EXEC_RENDER);
+	/* Manually trigger a hang by requesting a reset */
+	fd = igt_debugfs_open("i915_wedged", O_WRONLY);
+	igt_ignore_warn(write(fd, "1\n", 2));
+	close(fd);
+
+	/* Wait for the error capture and gpu reset to complete */
+	fd = drm_open_driver(DRIVER_INTEL);
+	gem_quiescent_gpu(fd);
 	close(fd);
 
 	assert_error_state_collected();
+
 	clear_error_state();
 	assert_error_state_clear();
 }
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
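The patch triggers the error capture by writing "1\n" to the i915_wedged debugfs entry and deliberately ignores the result of write(). If a failed write should be reported rather than silently ignored, a minimal sketch of the same step could look like the following. It uses plain POSIX calls instead of the IGT helpers; the helper name trigger_error_capture() is hypothetical, and the debugfs path is left to the caller (assumed to be something like /sys/kernel/debug/dri/0/i915_wedged).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of IGT): request an error capture plus GPU
 * reset by writing "1\n" to the i915_wedged debugfs entry, as the patch
 * does, but report failures instead of ignoring them. The caller supplies
 * the path; on a typical system it is assumed to be something like
 * /sys/kernel/debug/dri/0/i915_wedged.
 *
 * Returns 0 on success, or a negative errno value on failure.
 */
static int trigger_error_capture(const char *wedged_path)
{
	static const char cmd[] = "1\n";
	const ssize_t len = sizeof(cmd) - 1;
	ssize_t written;
	int err;
	int fd;

	fd = open(wedged_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, cmd, len);
	/* Capture errno before close() can overwrite it. */
	if (written < 0)
		err = -errno;
	else if (written != len)
		err = -EIO; /* short write: treat as failure */
	else
		err = 0;

	close(fd);
	return err;
}
```

As in the patch, a successful write only requests the capture and reset; the caller still has to wait for both to complete (the patch does this by reopening the DRM device and calling gem_quiescent_gpu()) before checking the error state.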


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Daniel Vetter @ 2016-10-20  9:29 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> For the basic error state, we only desire that an error state be created
> following a hang. For that purpose, we do not need a real hang (slow
> 6-12s) but can inject one instead (fast <1s).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Should we instead speed up hangcheck? I think there's lots of value in
making sure not just error dumping, but also hang detection works somewhat
in BAT. If it doesn't, any attempt at a full run will lead to pretty
serious disasters. And I have this dream that BAT is the gating thing
deciding whether a patch series deserves a complete pre-merge run ;-)

But since this is a controlled environment, we could make hangcheck
super-fast at timing out with some debugfs knob. That would probably also
help a lot with speeding up the gazillion testcases in gem_reset_stats.
-Daniel


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Chris Wilson @ 2016-10-20  9:46 UTC
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Oct 20, 2016 at 11:29:05AM +0200, Daniel Vetter wrote:
> On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> > For the basic error state, we only desire that an error state be created
> > following a hang. For that purpose, we do not need a real hang (slow
> > 6-12s) but can inject one instead (fast <1s).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Should we instead speed up hangcheck? I think there's lots of value in
> making sure not just error dumping, but also hang detection works somewhat
> in BAT. If it doesn't, any attempt at a full run will lead to pretty
> serious disasters. And I have this dream that BAT is the gating thing
> deciding whether a patch series deserves a complete pre-merge run ;-)

We have full-hang detection in BAT elsewhere as well. This particular
test was only asking the question "do we generate an error state?", which
is why I felt it was safe to just do that and skip a simulated hang.
 
> But since this is a controlled environment, we could make hangcheck
> super-fast at timing out with some debugfs knob. That would probably also
> help a lot with speeding up the gazillion testcases in gem_reset_stats.

I have considered i915.hangcheck_interval_ms many a time. It is not just
the interval but also the hangcheck score threshold that has to be
considered. If we can trust our activity detection, we would be safe with
a hangcheck every jiffy (at some overhead, mind you), but we would declare
a DoS too soon.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
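The point about the interval and the score threshold can be made concrete with a toy model. The sketch below is illustrative only and is not the i915 hangcheck code: the struct, its fields and the example numbers are hypothetical. It shows that shrinking the sampling interval without rescaling the threshold shrinks the time before a hang is declared by the same factor, which is how a long-running but legitimate workload ends up being declared a DoS too soon.

```c
#include <stdbool.h>

/*
 * Toy model of score-based hangcheck (illustrative, not the i915 code):
 * every interval_ms the checker samples the engine; each sample that sees
 * no forward progress adds `increment` to a score, and a hang is declared
 * once the score reaches `threshold`.
 */
struct hangcheck_model {
	unsigned int interval_ms;
	unsigned int increment;
	unsigned int threshold;
};

/*
 * Continuous stall time (ms) after which the model declares a hang.
 * Simplification: ignores where the stall starts relative to a sample.
 */
static unsigned int time_to_declare_ms(const struct hangcheck_model *m)
{
	/* Number of stalled samples needed, rounded up. */
	unsigned int samples = (m->threshold + m->increment - 1) / m->increment;

	return samples * m->interval_ms;
}

/* Would a workload showing no visible progress for stall_ms be flagged? */
static bool flags_as_hang(const struct hangcheck_model *m,
			  unsigned int stall_ms)
{
	return stall_ms >= time_to_declare_ms(m);
}
```

With hypothetical values of a 1500 ms interval, an increment of 1 and a threshold of 4, the model declares a hang after 6000 ms of stall. Keeping the same increment and threshold but sampling every jiffy (4 ms, assuming HZ=250) declares a hang after only 16 ms, so a workload that shows no visible progress for 100 ms would already be flagged.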


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Chris Wilson @ 2016-10-20 10:05 UTC
  To: Daniel Vetter, intel-gfx

On Thu, Oct 20, 2016 at 10:46:01AM +0100, Chris Wilson wrote:
> On Thu, Oct 20, 2016 at 11:29:05AM +0200, Daniel Vetter wrote:
> > On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> > > For the basic error state, we only desire that an error state be created
> > > following a hang. For that purpose, we do not need a real hang (slow
> > > 6-12s) but can inject one instead (fast <1s).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Should we instead speed up hangcheck? I think there's lots of value in
> > making sure not just error dumping, but also hang detection works somewhat
> > in BAT. If it doesn't, any attempt at a full run will lead to pretty
> > serious disasters. And I have this dream that BAT is the gating thing
> > deciding whether a patch series deserves a complete pre-merge run ;-)
> 
> We have full-hang detection in BAT elsewhere as well. This particular
> test was only asking the question "do we generate an error state?", which
> is why I felt it was safe to just do that and skip a simulated hang.
>  
> > But since this is a controlled environment, we could make hangcheck
> > super-fast at timing out with some debugfs knob. That would probably also
> > help a lot with speeding up the gazillion testcases in gem_reset_stats.
> 
> I have considered i915.hangcheck_interval_ms many a time. It is not just
> the interval but also the hangcheck score threshold that has to be
> considered. If we can trust our activity detection, we would be safe with
> a hangcheck every jiffy (at some overhead, mind you), but we would declare
> a DoS too soon.

Speaking of which, Mika did have some patches to move towards a
time-accrued metric...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
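One way to read the "time-accrued metric" mentioned above is to measure how long an engine has been stalled in wall-clock time instead of counting samples. The following is a hypothetical sketch of that idea, not the code from Mika's patches; the struct and function names are made up for illustration. Once the accrued stall time is what gets compared against the timeout, the sampling interval only affects how promptly a hang is noticed (at most one interval late), not when it is considered to have happened.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical time-accrued hang tracker (illustrative only): remember
 * when the engine was first seen stalled and declare a hang once that
 * stall has lasted timeout_ms, however often we sample.
 */
struct stall_tracker {
	uint64_t timeout_ms;     /* stall duration that counts as a hang */
	uint64_t stall_start_ms; /* time of the first stalled sample */
	bool stalled;            /* currently inside a stall? */
};

/* Feed one sample; returns true if the engine should be declared hung. */
static bool stall_tracker_sample(struct stall_tracker *t, uint64_t now_ms,
				 bool made_progress)
{
	if (made_progress) {
		/* Any forward progress resets the accrued stall time. */
		t->stalled = false;
		return false;
	}

	if (!t->stalled) {
		t->stalled = true;
		t->stall_start_ms = now_ms;
	}

	return now_ms - t->stall_start_ms >= t->timeout_ms;
}
```

For example, with a 6000 ms timeout, sampling every 4 ms or every 1500 ms both declare the hang at the 6000 ms mark, while sampling every 7 ms declares it at 6006 ms, the first sample at or after the timeout.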


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Daniel Vetter @ 2016-10-20 13:14 UTC
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Thu, Oct 20, 2016 at 10:46:01AM +0100, Chris Wilson wrote:
> On Thu, Oct 20, 2016 at 11:29:05AM +0200, Daniel Vetter wrote:
> > On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> > > For the basic error state, we only desire that an error state be created
> > > following a hang. For that purpose, we do not need a real hang (slow
> > > 6-12s) but can inject one instead (fast <1s).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Should we instead speed up hangcheck? I think there's lots of value in
> > making sure not just error dumping, but also hang detection works somewhat
> > in BAT. If it doesn't, any attempt at a full run will lead to pretty
> > serious disasters. And I have this dream that BAT is the gating thing
> > deciding whether a patch series deserves a complete pre-merge run ;-)
> 
> We have full-hang detection in BAT elsewhere as well. This particular
> test was only asking the question "do we generate an error state?", which
> is why I felt it was safe to just do that and skip a simulated hang.

Hm, is it worth it in BAT then? Or does the other test not check whether
the error capture part was mildly successful? It might be worth just
combining them (in BAT) to save even more time. Either way, ack on this.

> > But since this is a controlled environment, we could make hangcheck
> > super-fast at timing out with some debugfs knob. That would probably also
> > help a lot with speeding up the gazillion testcases in gem_reset_stats.
> 
> I have considered i915.hangcheck_interval_ms many a time. It is not just
> the interval but also the hangcheck score threshold that has to be
> considered. If we can trust our activity detection, we would be safe with
> a hangcheck every jiffy (at some overhead, mind you), but we would declare
> a DoS too soon.

Yeah, we'd still need to tune hangcheck in normal time. But for regression
testing I think a linear speed-up would be enough (i.e. just scaling the
hangcheck timeout, but leaving all the cadence and accumulation logic
intact), and of course also scaling the DoS loads by the same linear
factor. Maybe a --normal-time option in gem_reset_stats or similar could
then be used for manual testing of changes, or re-tuning.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
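The linear speed-up proposed above can be sketched as scaling two numbers by the same factor: the hangcheck timeout and the duration of each DoS load a test submits. The toy model below uses hypothetical names and is not gem_reset_stats or i915 code; it treats a load as flagged when it runs at least as long as the timeout, so dividing both by the same factor keeps every flagged/not-flagged decision while the test runs that many times faster. A factor of 1 corresponds to running at normal time, as the suggested --normal-time option would.

```c
#include <stdbool.h>

/* Hypothetical timing parameters for one hang/DoS test case. */
struct timing {
	unsigned int hangcheck_timeout_ms; /* stall time that counts as a hang */
	unsigned int dos_load_ms;          /* how long the test's load runs */
};

/*
 * Linear speed-up: divide both durations by the same factor, leaving the
 * detection logic itself untouched. A factor of 0 is treated as 1.
 */
static struct timing scale_timing(struct timing t, unsigned int factor)
{
	if (factor == 0)
		factor = 1;
	t.hangcheck_timeout_ms /= factor;
	t.dos_load_ms /= factor;
	return t;
}

/* Toy decision: is this load long enough to be declared a hang? */
static bool load_flagged_as_hang(struct timing t)
{
	return t.dos_load_ms >= t.hangcheck_timeout_ms;
}
```

Integer division rounds down, which preserves "load at least as long as the timeout" but can turn a load that was just shorter than the timeout into an equal one; choosing a factor that divides both durations evenly avoids flipping such borderline cases.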


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Chris Wilson @ 2016-10-20 13:22 UTC
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Oct 20, 2016 at 03:14:30PM +0200, Daniel Vetter wrote:
> On Thu, Oct 20, 2016 at 10:46:01AM +0100, Chris Wilson wrote:
> > On Thu, Oct 20, 2016 at 11:29:05AM +0200, Daniel Vetter wrote:
> > > On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> > > > For the basic error state, we only desire that an error state be created
> > > > following a hang. For that purpose, we do not need a real hang (slow
> > > > 6-12s) but can inject one instead (fast <1s).
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > 
> > > Should we instead speed up hangcheck? I think there's lots of value in
> > > making sure not just error dumping, but also hang detection works somewhat
> > > in BAT. If it doesn't, any attempt at a full run will lead to pretty
> > > serious disasters. And I have this dream that BAT is the gating thing
> > > deciding whether a patch series deserves a complete pre-merge run ;-)
> > 
> > We have full-hang detection in BAT elsewhere as well. This particular
> > test was only asking the question "do we generate an error state?", which
> > is why I felt it was safe to just do that and skip a simulated hang.
> 
> Hm, is it worth it in BAT then? Or does the other test not check whether
> the error capture part was mildly successful? It might be worth just
> combining them (in BAT) to save even more time. Either way, ack on this.

No, the other tests check that we survive a hang, not that we generate a
post-mortem error state. This test takes approximately 0.2s on a slow
device (at mild debug levels), and I think it is concise enough to keep
separate.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
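Both assert_error_state_clear() and assert_error_state_collected() boil down to reading the i915_error_state debugfs file and deciding whether it contains a captured state. The sketch below shows one way to make that decision; it is hypothetical and not IGT's implementation (the helper names are made up), and it assumes that an empty error state reads back as text beginning with "no error state collected", compared case-insensitively since the exact wording is itself an assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

/*
 * Hypothetical check: does this i915_error_state text contain a capture?
 * Assumption: an empty error state starts with "no error state collected".
 */
static bool error_state_collected(const char *contents)
{
	static const char empty_marker[] = "no error state collected";

	if (contents == NULL || contents[0] == '\0')
		return false;
	return strncasecmp(contents, empty_marker, strlen(empty_marker)) != 0;
}

/*
 * Read up to size-1 bytes of a (debugfs) text file into buf and
 * NUL-terminate it; only the start of the file matters for the check
 * above. Returns the number of bytes read, or a negative errno value.
 */
static ssize_t read_text_prefix(const char *path, char *buf, size_t size)
{
	ssize_t n;
	int err = 0;
	int fd;

	if (size == 0)
		return -EINVAL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, buf, size - 1);
	if (n < 0)
		err = -errno; /* save errno before close() */
	close(fd);
	if (err)
		return err;

	buf[n] = '\0';
	return n;
}
```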


* Re: [PATCH igt] igt/drv_hangman: Use manual error-state generation
From: Daniel Vetter @ 2016-10-24  8:24 UTC
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Thu, Oct 20, 2016 at 02:22:56PM +0100, Chris Wilson wrote:
> On Thu, Oct 20, 2016 at 03:14:30PM +0200, Daniel Vetter wrote:
> > On Thu, Oct 20, 2016 at 10:46:01AM +0100, Chris Wilson wrote:
> > > On Thu, Oct 20, 2016 at 11:29:05AM +0200, Daniel Vetter wrote:
> > > > On Thu, Oct 20, 2016 at 10:07:39AM +0100, Chris Wilson wrote:
> > > > > For the basic error state, we only desire that an error state be created
> > > > > following a hang. For that purpose, we do not need a real hang (slow
> > > > > 6-12s) but can inject one instead (fast <1s).
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > 
> > > > Should we instead speed up hangcheck? I think there's lots of value in
> > > > making sure not just error dumping, but also hang detection works somewhat
> > > > in BAT. If it doesn't, any attempt at a full run will lead to pretty
> > > > serious disasters. And I have this dream that BAT is the gating thing
> > > > deciding whether a patch series deserves a complete pre-merge run ;-)
> > > 
> > > We have full-hang detection in BAT elsewhere as well. This particular
> > > test was only asking the question "do we generate an error state?", which
> > > is why I felt it was safe to just do that and skip a simulated hang.
> > 
> > Hm, is it worth it in BAT then? Or does the other test not check whether
> > the error capture part was mildly successful? It might be worth just
> > combining them (in BAT) to save even more time. Either way, ack on this.
> 
> No, the other tests check that we survive a hang, not that we generate a
> post-mortem error state. This test takes approximately 0.2s on a slow
> device (at mild debug levels), and I think it is concise enough to keep
> separate.

Ok, makes sense.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

