All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
@ 2018-07-13 17:26 Chris Wilson
  2018-07-13 17:48 ` Michal Wajdeczko
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Chris Wilson @ 2018-07-13 17:26 UTC (permalink / raw)
  To: intel-gfx; +Cc: Rodrigo Vivi

Hopefully the final hack to get guc fault-injection happy before we can
clean it up again, starting from a known good baseline...

[  383.017530] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
[  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
[  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G     U            4.18.0-rc4-CI-CI_DRM_4485+ #1
[  383.017581] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
[  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
[  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1 0f 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02 00 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
[  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
[  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX: 0000000000000000
[  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI: 0000000000000000
[  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09: 0000000000000000
[  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012ff47770
[  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15: ffffffffa0639950
[  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000) knlGS:0000000000000000
[  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4: 00000000003606e0
[  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  383.017898] Call Trace:
[  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
[  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
[  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
[  383.018150]  i915_pci_remove+0x10/0x20 [i915]
[  383.018165]  pci_device_remove+0x36/0xb0
[  383.018179]  device_release_driver_internal+0x185/0x250
[  383.018193]  driver_detach+0x35/0x70
[  383.018205]  bus_remove_driver+0x53/0xd0
[  383.018217]  pci_unregister_driver+0x25/0xa0
[  383.018232]  __se_sys_delete_module+0x162/0x210
[  383.018245]  ? do_syscall_64+0xd/0x190
[  383.018257]  do_syscall_64+0x55/0x190
[  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  383.018282] RIP: 0033:0x7fb9c0f7c1b7
[  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
[  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9c0f7c1b7
[  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000560b96856d48
[  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09: 00007fffa01c2ae8
[  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12: 0000560b954f7470

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 22367131d6a1..cc444dc5f3ad 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	guc_clients_destroy(guc);
 	WARN_ON(!guc_verify_doorbells(guc));
 
-	guc_stage_desc_pool_destroy(guc);
+	if (guc->stage_desc_pool)
+		guc_stage_desc_pool_destroy(guc);
 }
 
 static void guc_interrupts_capture(struct drm_i915_private *dev_priv)
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:26 [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown Chris Wilson
@ 2018-07-13 17:48 ` Michal Wajdeczko
  2018-07-13 17:52   ` Chris Wilson
  2018-07-13 17:54 ` Rodrigo Vivi
  2018-07-13 18:04 ` ✓ Fi.CI.BAT: success for " Patchwork
  2 siblings, 1 reply; 7+ messages in thread
From: Michal Wajdeczko @ 2018-07-13 17:48 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Rodrigo Vivi

On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Hopefully the final hack to get guc fault-injection happy before we can
> clean it up again, starting from a known good baseline...
>
> [  383.017530] BUG: unable to handle kernel NULL pointer dereference at  
> 00000000000000a0
> [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G      
> U            4.18.0-rc4-CI-CI_DRM_4485+ #1
> [  383.017581] Hardware name: Micro-Star International Co., Ltd.  
> MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1 0f  
> 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02 00  
> 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:  
> 0000000000000000
> [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:  
> 0000000000000000
> [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:  
> 0000000000000000
> [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:  
> ffff88012ff47770
> [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:  
> ffffffffa0639950
> [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)  
> knlGS:0000000000000000
> [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:  
> 00000000003606e0
> [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:  
> 0000000000000000
> [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:  
> 0000000000000400
> [  383.017898] Call Trace:
> [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
> [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
> [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
> [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
> [  383.018165]  pci_device_remove+0x36/0xb0
> [  383.018179]  device_release_driver_internal+0x185/0x250
> [  383.018193]  driver_detach+0x35/0x70
> [  383.018205]  bus_remove_driver+0x53/0xd0
> [  383.018217]  pci_unregister_driver+0x25/0xa0
> [  383.018232]  __se_sys_delete_module+0x162/0x210
> [  383.018245]  ? do_syscall_64+0xd/0x190
> [  383.018257]  do_syscall_64+0x55/0x190
> [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
> [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83  
> c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f  
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:  
> 00000000000000b0
> [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:  
> 00007fb9c0f7c1b7
> [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:  
> 0000560b96856d48
> [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:  
> 00007fffa01c2ae8
> [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:  
> 0000560b954f7470
>
> Testcase: igt/drv_module_reload/basic-reload-inject
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c  
> b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 22367131d6a1..cc444dc5f3ad 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc  
> *guc)
>  	guc_clients_destroy(guc);
>  	WARN_ON(!guc_verify_doorbells(guc));
> -	guc_stage_desc_pool_destroy(guc);
> +	if (guc->stage_desc_pool)
> +		guc_stage_desc_pool_destroy(guc);

As short-term hack this is probably ok, but maybe to avoid such case by
case hacks we should add single flag at UC level (intel_uc_init) that we
have completed our initialization and then use this flag at cleanup phase
(intel_uc_fini) just once.

Michal

ps.
I recall some earlier reviews saying that using "if" at fini is wrong ;)
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:48 ` Michal Wajdeczko
@ 2018-07-13 17:52   ` Chris Wilson
  2018-07-13 17:57     ` Michal Wajdeczko
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-07-13 17:52 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-gfx; +Cc: Rodrigo Vivi

Quoting Michal Wajdeczko (2018-07-13 18:48:05)
> On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Hopefully the final hack to get guc fault-injection happy before we can
> > clean it up again, starting from a known good baseline...
> >
> > [  383.017530] BUG: unable to handle kernel NULL pointer dereference at  
> > 00000000000000a0
> > [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> > [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G      
> > U            4.18.0-rc4-CI-CI_DRM_4485+ #1
> > [  383.017581] Hardware name: Micro-Star International Co., Ltd.  
> > MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> > [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> > [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1 0f  
> > 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02 00  
> > 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> > [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> > [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:  
> > 0000000000000000
> > [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:  
> > 0000000000000000
> > [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:  
> > 0000000000000000
> > [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:  
> > ffff88012ff47770
> > [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:  
> > ffffffffa0639950
> > [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)  
> > knlGS:0000000000000000
> > [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:  
> > 00000000003606e0
> > [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:  
> > 0000000000000000
> > [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:  
> > 0000000000000400
> > [  383.017898] Call Trace:
> > [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
> > [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
> > [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
> > [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
> > [  383.018165]  pci_device_remove+0x36/0xb0
> > [  383.018179]  device_release_driver_internal+0x185/0x250
> > [  383.018193]  driver_detach+0x35/0x70
> > [  383.018205]  bus_remove_driver+0x53/0xd0
> > [  383.018217]  pci_unregister_driver+0x25/0xa0
> > [  383.018232]  __se_sys_delete_module+0x162/0x210
> > [  383.018245]  ? do_syscall_64+0xd/0x190
> > [  383.018257]  do_syscall_64+0x55/0x190
> > [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
> > [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83  
> > c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f  
> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> > [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:  
> > 00000000000000b0
> > [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:  
> > 00007fb9c0f7c1b7
> > [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:  
> > 0000560b96856d48
> > [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:  
> > 00007fffa01c2ae8
> > [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:  
> > 0000560b954f7470
> >
> > Testcase: igt/drv_module_reload/basic-reload-inject
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c  
> > b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 22367131d6a1..cc444dc5f3ad 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc  
> > *guc)
> >       guc_clients_destroy(guc);
> >       WARN_ON(!guc_verify_doorbells(guc));
> > -     guc_stage_desc_pool_destroy(guc);
> > +     if (guc->stage_desc_pool)
> > +             guc_stage_desc_pool_destroy(guc);
> 
> As short-term hack this is probably ok, but maybe to avoid such case by
> case hacks we should add single flag at UC level (intel_uc_init) that we
> have completed our initialization and then use this flag at cleanup phase
> (intel_uc_fini) just once.
> 
> Michal
> 
> ps.
> I recall some earlier reviews saying that using "if" at fini is wrong ;)

It is so wrong, and I'm cutting every corner in order to get the test in
place for someone else to clean up. Judging by the number of bugs that
have swept by with the non-functional drv_module_reload fault-injection,
we need to get it back working.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:26 [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown Chris Wilson
  2018-07-13 17:48 ` Michal Wajdeczko
@ 2018-07-13 17:54 ` Rodrigo Vivi
  2018-07-13 18:04 ` ✓ Fi.CI.BAT: success for " Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Rodrigo Vivi @ 2018-07-13 17:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jul 13, 2018 at 06:26:58PM +0100, Chris Wilson wrote:
> Hopefully the final hack to get guc fault-injection happy before we can
> clean it up again, starting from a known good baseline...
> 
> [  383.017530] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
> [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G     U            4.18.0-rc4-CI-CI_DRM_4485+ #1
> [  383.017581] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1 0f 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02 00 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX: 0000000000000000
> [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI: 0000000000000000
> [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09: 0000000000000000
> [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012ff47770
> [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15: ffffffffa0639950
> [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000) knlGS:0000000000000000
> [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4: 00000000003606e0
> [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  383.017898] Call Trace:
> [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
> [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
> [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
> [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
> [  383.018165]  pci_device_remove+0x36/0xb0
> [  383.018179]  device_release_driver_internal+0x185/0x250
> [  383.018193]  driver_detach+0x35/0x70
> [  383.018205]  bus_remove_driver+0x53/0xd0
> [  383.018217]  pci_unregister_driver+0x25/0xa0
> [  383.018232]  __se_sys_delete_module+0x162/0x210
> [  383.018245]  ? do_syscall_64+0xd/0x190
> [  383.018257]  do_syscall_64+0x55/0x190
> [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
> [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9c0f7c1b7
> [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000560b96856d48
> [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09: 00007fffa01c2ae8
> [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12: 0000560b954f7470
> 
> Testcase: igt/drv_module_reload/basic-reload-inject
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 22367131d6a1..cc444dc5f3ad 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>  	guc_clients_destroy(guc);
>  	WARN_ON(!guc_verify_doorbells(guc));
>  
> -	guc_stage_desc_pool_destroy(guc);
> +	if (guc->stage_desc_pool)
> +		guc_stage_desc_pool_destroy(guc);

it seems all this fini vs init could have something more solid

but these checks are really needed on the current code and I have
no other suggestion, so:

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

>  }
>  
>  static void guc_interrupts_capture(struct drm_i915_private *dev_priv)
> -- 
> 2.18.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:52   ` Chris Wilson
@ 2018-07-13 17:57     ` Michal Wajdeczko
  2018-07-13 18:24       ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Wajdeczko @ 2018-07-13 17:57 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Rodrigo Vivi

On Fri, 13 Jul 2018 19:52:09 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Michal Wajdeczko (2018-07-13 18:48:05)
>> On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson
>> <chris@chris-wilson.co.uk> wrote:
>>
>> > Hopefully the final hack to get guc fault-injection happy before we  
>> can
>> > clean it up again, starting from a known good baseline...
>> >
>> > [  383.017530] BUG: unable to handle kernel NULL pointer dereference  
>> at
>> > 00000000000000a0
>> > [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
>> > [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G
>> > U            4.18.0-rc4-CI-CI_DRM_4485+ #1
>> > [  383.017581] Hardware name: Micro-Star International Co., Ltd.
>> > MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
>> > [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
>> > [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1  
>> 0f
>> > 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02  
>> 00
>> > 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
>> > [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
>> > [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:
>> > 0000000000000000
>> > [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:
>> > 0000000000000000
>> > [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:
>> > 0000000000000000
>> > [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:
>> > ffff88012ff47770
>> > [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:
>> > ffffffffa0639950
>> > [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)
>> > knlGS:0000000000000000
>> > [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:
>> > 00000000003606e0
>> > [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> > 0000000000000000
>> > [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> > 0000000000000400
>> > [  383.017898] Call Trace:
>> > [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
>> > [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
>> > [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
>> > [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
>> > [  383.018165]  pci_device_remove+0x36/0xb0
>> > [  383.018179]  device_release_driver_internal+0x185/0x250
>> > [  383.018193]  driver_detach+0x35/0x70
>> > [  383.018205]  bus_remove_driver+0x53/0xd0
>> > [  383.018217]  pci_unregister_driver+0x25/0xa0
>> > [  383.018232]  __se_sys_delete_module+0x162/0x210
>> > [  383.018245]  ? do_syscall_64+0xd/0x190
>> > [  383.018257]  do_syscall_64+0x55/0x190
>> > [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> > [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
>> > [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48  
>> 83
>> > c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00  
>> 0f
>> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
>> > [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:
>> > 00000000000000b0
>> > [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> > 00007fb9c0f7c1b7
>> > [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
>> > 0000560b96856d48
>> > [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:
>> > 00007fffa01c2ae8
>> > [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:
>> > 0000560b954f7470
>> >
>> > Testcase: igt/drv_module_reload/basic-reload-inject
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Michał Winiarski <michal.winiarski@intel.com>
>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c
>> > b/drivers/gpu/drm/i915/intel_guc_submission.c
>> > index 22367131d6a1..cc444dc5f3ad 100644
>> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
>> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
>> > @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc
>> > *guc)
>> >       guc_clients_destroy(guc);
>> >       WARN_ON(!guc_verify_doorbells(guc));
>> > -     guc_stage_desc_pool_destroy(guc);
>> > +     if (guc->stage_desc_pool)
>> > +             guc_stage_desc_pool_destroy(guc);
>>
>> As short-term hack this is probably ok, but maybe to avoid such case by
>> case hacks we should add single flag at UC level (intel_uc_init) that we
>> have completed our initialization and then use this flag at cleanup  
>> phase
>> (intel_uc_fini) just once.
>>
>> Michal
>>
>> ps.
>> I recall some earlier reviews saying that using "if" at fini is wrong ;)
>
> It is so wrong, and I'm cutting every corner in order to get the test in
> place for someone else to clean up. Judging by the number of bugs that
> have swept by with the non-functional drv_module_reload fault-injection,
> we need to get it back working.

ok, I hope 'someone' will find time to finish this cleanup, so

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:26 [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown Chris Wilson
  2018-07-13 17:48 ` Michal Wajdeczko
  2018-07-13 17:54 ` Rodrigo Vivi
@ 2018-07-13 18:04 ` Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2018-07-13 18:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Protect against no desc-pool on premature shutdown
URL   : https://patchwork.freedesktop.org/series/46503/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4487 -> Patchwork_9650 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/46503/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9650 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@debugfs_test@read_all_entries:
      fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)

    igt@kms_flip@basic-flip-vs-modeset:
      fi-skl-6700hq:      PASS -> DMESG-WARN (fdo#105998)

    
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998


== Participating hosts (46 -> 41) ==

  Missing    (5): fi-hsw-peppy fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4487 -> Patchwork_9650

  CI_DRM_4487: 627ed020cac6a73f0a014537dac7191efbb57084 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4556: caea9c5b3aa1191c0152d7c0f22a94efca4fd048 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9650: ea65c2feb8a1ce0302608ff2cdea648e6e790d7d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ea65c2feb8a1 drm/i915/guc: Protect against no desc-pool on premature shutdown

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9650/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
  2018-07-13 17:57     ` Michal Wajdeczko
@ 2018-07-13 18:24       ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2018-07-13 18:24 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-gfx; +Cc: Rodrigo Vivi

Quoting Michal Wajdeczko (2018-07-13 18:57:33)
> On Fri, 13 Jul 2018 19:52:09 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Michal Wajdeczko (2018-07-13 18:48:05)
> >> On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson
> >> <chris@chris-wilson.co.uk> wrote:
> >>
> >> > Hopefully the final hack to get guc fault-injection happy before we  
> >> can
> >> > clean it up again, starting from a known good baseline...
> >> >
> >> > [  383.017530] BUG: unable to handle kernel NULL pointer dereference  
> >> at
> >> > 00000000000000a0
> >> > [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> >> > [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G
> >> > U            4.18.0-rc4-CI-CI_DRM_4485+ #1
> >> > [  383.017581] Hardware name: Micro-Star International Co., Ltd.
> >> > MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> >> > [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> >> > [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1  
> >> 0f
> >> > 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02  
> >> 00
> >> > 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> >> > [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> >> > [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:
> >> > 0000000000000000
> >> > [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:
> >> > 0000000000000000
> >> > [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:
> >> > 0000000000000000
> >> > [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:
> >> > ffff88012ff47770
> >> > [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:
> >> > ffffffffa0639950
> >> > [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)
> >> > knlGS:0000000000000000
> >> > [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:
> >> > 00000000003606e0
> >> > [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> >> > 0000000000000000
> >> > [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> >> > 0000000000000400
> >> > [  383.017898] Call Trace:
> >> > [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
> >> > [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
> >> > [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
> >> > [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
> >> > [  383.018165]  pci_device_remove+0x36/0xb0
> >> > [  383.018179]  device_release_driver_internal+0x185/0x250
> >> > [  383.018193]  driver_detach+0x35/0x70
> >> > [  383.018205]  bus_remove_driver+0x53/0xd0
> >> > [  383.018217]  pci_unregister_driver+0x25/0xa0
> >> > [  383.018232]  __se_sys_delete_module+0x162/0x210
> >> > [  383.018245]  ? do_syscall_64+0xd/0x190
> >> > [  383.018257]  do_syscall_64+0x55/0x190
> >> > [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >> > [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
> >> > [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48  
> >> 83
> >> > c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00  
> >> 0f
> >> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> >> > [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:
> >> > 00000000000000b0
> >> > [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> >> > 00007fb9c0f7c1b7
> >> > [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
> >> > 0000560b96856d48
> >> > [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:
> >> > 00007fffa01c2ae8
> >> > [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:
> >> > 0000560b954f7470
> >> >
> >> > Testcase: igt/drv_module_reload/basic-reload-inject
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > b/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > index 22367131d6a1..cc444dc5f3ad 100644
> >> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc
> >> > *guc)
> >> >       guc_clients_destroy(guc);
> >> >       WARN_ON(!guc_verify_doorbells(guc));
> >> > -     guc_stage_desc_pool_destroy(guc);
> >> > +     if (guc->stage_desc_pool)
> >> > +             guc_stage_desc_pool_destroy(guc);
> >>
> >> As short-term hack this is probably ok, but maybe to avoid such case by
> >> case hacks we should add single flag at UC level (intel_uc_init) that we
> >> have completed our initialization and then use this flag at cleanup  
> >> phase
> >> (intel_uc_fini) just once.
> >>
> >> Michal
> >>
> >> ps.
> >> I recall some earlier reviews saying that using "if" at fini is wrong ;)
> >
> > It is so wrong, and I'm cutting every corner in order to get the test in
> > place for someone else to clean up. Judging by the number of bugs that
> > have swept by with the non-functional drv_module_reload fault-injection,
> > we need to get it back working.
> 
> ok, I hope 'someone' will find time to finish this cleanup, so

Let's have a game of musical chairs!

Pushed, now I just need to apply an igt patch and then can try again to
see if drv_module_reload is ready.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-07-13 18:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13 17:26 [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown Chris Wilson
2018-07-13 17:48 ` Michal Wajdeczko
2018-07-13 17:52   ` Chris Wilson
2018-07-13 17:57     ` Michal Wajdeczko
2018-07-13 18:24       ` Chris Wilson
2018-07-13 17:54 ` Rodrigo Vivi
2018-07-13 18:04 ` ✓ Fi.CI.BAT: success for " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.