From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	intel-gfx@lists.freedesktop.org,
	 dri-devel@lists.freedesktop.org
Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com
Subject: Re: [PATCH] drm/i915/guc: Fix recursive lock in GuC submission
Date: Thu, 21 Oct 2021 07:39:48 +0200	[thread overview]
Message-ID: <a2d6a96b3f360991511e6e4969de83cea2f5a97a.camel@linux.intel.com> (raw)
In-Reply-To: <20211020192147.8048-1-matthew.brost@intel.com>

On Wed, 2021-10-20 at 12:21 -0700, Matthew Brost wrote:
> Use __release_guc_id (the caller already holds the lock) rather than
> release_guc_id (which acquires the lock), and add a lockdep annotation.
> 
> [ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
> [ 213.283459] ============================================
> [ 213.283462] WARNING: possible recursive locking detected
> [ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
> [ 213.283470] --------------------------------------------
> [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
> [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-
> {2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
> [ 213.283618]
> but task is already holding lock:
> [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-
> {2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
> [ 213.283720]
> other info that might help us debug this:
> [ 213.283724] Possible unsafe locking scenario:
> [ 213.283727]        CPU0
> [ 213.283728]        ----
> [ 213.283730] lock(&guc->submission_state.lock);
> [ 213.283734] lock(&guc->submission_state.lock);
> [ 213.283737]
> *** DEADLOCK ***
> [ 213.283740] May be due to missing lock nesting notation
> [ 213.283744] 3 locks held by kworker/u24:0/8:
> [ 213.283747] #0: ffff8ffb80059d38
> ((wq_completion)events_unbound){..}-{0:0}, at:
> process_one_work+0x1f3/0x550
> [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc-
> >submission_state.destroyed_worker)){..}-{0:0}, at:
> process_one_work+0x1f3/0x550
> [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc-
> >submission_state.lock){....}-{2:2}, at:
> destroyed_worker_func+0x4f/0x350 [i915]
> [ 213.283860]
> stack backtrace:
> [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W
> 5.15.0-rc6+ #18
> [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A
> AC, BIOS 0403 01/26/2021
> [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
> [ 213.283957] Call Trace:
> [ 213.283960] dump_stack_lvl+0x57/0x72
> [ 213.283966] __lock_acquire.cold+0x191/0x2d3
> [ 213.283972] lock_acquire+0xb5/0x2b0
> [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
> [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
> [ 213.284139] ? lock_release+0xb9/0x280
> [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
> [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
> [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
> [ 213.284310] process_one_work+0x270/0x550
> [ 213.284315] worker_thread+0x52/0x3b0
> [ 213.284319] ? process_one_work+0x550/0x550
> [ 213.284322] kthread+0x135/0x160
> [ 213.284326] ? set_kthread_struct+0x40/0x40
> [ 213.284331] ret_from_fork+0x1f/0x30
> 
> and a bit later in the trace:
> 
> [ 227.499864] do_raw_spin_lock+0x94/0xa0
> [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
> [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
> [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
> [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
> [ 227.500209] ? mark_held_locks+0x49/0x70
> [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
> [ 227.500320] reset_prepare+0x78/0x90 [i915]
> [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
> [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
> [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
> [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
> [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
> [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
> [ 227.500838] local_pci_probe+0x42/0x80
> [ 227.500842] pci_device_probe+0xd9/0x190
> [ 227.500844] really_probe+0x1f2/0x3f0
> [ 227.500847] __driver_probe_device+0xfe/0x180
> [ 227.500848] driver_probe_device+0x1e/0x90
> [ 227.500850] __driver_attach+0xc4/0x1d0
> [ 227.500851] ? __device_attach_driver+0xe0/0xe0
> [ 227.500853] ? __device_attach_driver+0xe0/0xe0
> [ 227.500854] bus_for_each_dev+0x64/0x90
> [ 227.500856] bus_add_driver+0x12e/0x1f0
> [ 227.500857] driver_register+0x8f/0xe0
> [ 227.500859] i915_init+0x1d/0x8f [i915]
> [ 227.500934] ? 0xffffffffc144a000
> [ 227.500936] do_one_initcall+0x58/0x2d0
> [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
> [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
> [ 227.500944] do_init_module+0x5c/0x270
> [ 227.500946] __do_sys_finit_module+0x95/0xe0
> [ 227.500949] do_syscall_64+0x38/0x90
> [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
> [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64
> 89 01 48
> [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX:
> 00007ffa59d2ae0d
> [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI:
> 0000000000000004
> [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09:
> 0000000000000070
> [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12:
> 00000000022e1d90
> [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15:
> 00000000022e42c0
> 
> v2:
>  (CI build)
>   - Fix build error
> 
> Fixes: 1a52faed31311 ("drm/i915/guc: Take engine PM when a context is
> pinned with GuC submission")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: stable@vger.kernel.org

Looks correct to me.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index d7710debcd47..38b47e73e35d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2373,6 +2373,7 @@ static inline void guc_lrc_desc_unpin(struct
> intel_context *ce)
>         unsigned long flags;
>         bool disabled;
>  
> +       lockdep_assert_held(&guc->submission_state.lock);
>         GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
>         GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
>         GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
> @@ -2388,7 +2389,7 @@ static inline void guc_lrc_desc_unpin(struct
> intel_context *ce)
>         }
>         spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>         if (unlikely(disabled)) {
> -               release_guc_id(guc, ce);
> +               __release_guc_id(guc, ce);
>                 __guc_context_destroy(ce);
>                 return;
>         }




Thread overview: 13+ messages
2021-10-20 19:21 [PATCH] drm/i915/guc: Fix recursive lock in GuC submission Matthew Brost
2021-10-20 19:21 ` [Intel-gfx] " Matthew Brost
2021-10-20 20:16 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/guc: Fix recursive lock in GuC submission (rev2) Patchwork
2021-10-20 23:37 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2021-10-21  5:39 ` Thomas Hellström [this message]
2021-10-21  5:39   ` [Intel-gfx] [PATCH] drm/i915/guc: Fix recursive lock in GuC submission Thomas Hellström
2021-10-25 12:23   ` Joonas Lahtinen
2021-10-25 12:23     ` [Intel-gfx] " Joonas Lahtinen
2021-10-25 17:13     ` Matthew Brost
2021-10-25 17:13       ` [Intel-gfx] " Matthew Brost
2021-10-26  9:15       ` Joonas Lahtinen
2021-10-26  9:15         ` [Intel-gfx] " Joonas Lahtinen
  -- strict thread matches above, loose matches on Subject: below --
2021-10-20 19:07 Matthew Brost
