linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
@ 2023-01-27 14:31 James Clark
  2023-01-27 14:31 ` [PATCH 1/1] " James Clark
  0 siblings, 1 reply; 7+ messages in thread
From: James Clark @ 2023-01-27 14:31 UTC (permalink / raw)
  To: linux-perf-users, peterz, ravi.bangoria
  Cc: James Clark, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

Hi Peter and Ravi,

I came across this issue, and I also found that there was a syzbot
report for it, which I've linked on the commit. I have one question
about the fix: I had to remove the if (epc->ctx) check from
put_pmu_ctx(). I assume it was added for a reason, but I can't see a
case where epc->ctx is ever unset at that point. Without removing the
check, the function would have become a lot more complicated in order
to take the lock before the reference decrement.

Also, I now think the atomic type for epc->refcount is redundant,
because every update is done with the lock held. The exception is
get_pmu_ctx(), but that is just a read for a warning. So unless you
think the scope of my extra locking can be reduced, I would probably
send a follow-up commit to remove that as well.

Thanks
James

James Clark (1):
  perf: Fix warning from concurrent read/write of perf_event_pmu_context

 kernel/events/core.c | 42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)


base-commit: 5912e15f25d6d584f3b8449e0b6a244a0f4f0903
-- 
2.39.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
  2023-01-27 14:31 [PATCH 0/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context James Clark
@ 2023-01-27 14:31 ` James Clark
  2023-01-30  5:49   ` Ravi Bangoria
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: James Clark @ 2023-01-27 14:31 UTC (permalink / raw)
  To: linux-perf-users, peterz, ravi.bangoria
  Cc: James Clark, syzbot+697196bc0265049822bd, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-kernel

When running two Perf sessions, the following warning can appear:

  WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
  Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bridge stp llc coresight_stm stm_core coresight_etm4x coresight_tmc coresight_replicator coresight_funnel coresight_tpiu coresight arm_spe_pmu ip_tables x_tables ipv6 xhci_pci xhci_pci_renesas r8169
  CPU: 1 PID: 2245 Comm: perf Not tainted 6.2.0-rc4+ #1
  pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : put_pmu_ctx+0x1f0/0x278
  lr : put_pmu_ctx+0x1b4/0x278
  sp : ffff80000dfcbc20
  x29: ffff80000dfcbca0 x28: ffff008004f00000 x27: ffff00800763a928
  x26: ffff00800763a928 x25: 00000000000000c0 x24: 0000000000000000
  x23: 00000000000a0003 x22: ffff00837df74088 x21: ffff80000dfcbd18
  x20: 0000000000000000 x19: ffff00800763a6c0 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: 0000000000000000 x13: ffff80000dfc8000 x12: ffff80000dfcc000
  x11: be58ab6d2939e700 x10: be58ab6d2939e700 x9 : 0000000000000000
  x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
  x5 : ffff00800093c9c0 x4 : 0000000000000000 x3 : ffff80000dfcbca0
  x2 : ffff008004f00000 x1 : ffff8000082403c4 x0 : 0000000000000000
  Call trace:
   put_pmu_ctx+0x1f0/0x278
   _free_event+0x2bc/0x3d0
   perf_event_release_kernel+0x444/0x4bc
   perf_release+0x20/0x30
   __fput+0xe4/0x25c
   ____fput+0x1c/0x28
   task_work_run+0xc4/0xe8
   do_notify_resume+0x10c/0x164
   el0_svc+0xb4/0xdc
   el0t_64_sync_handler+0x84/0xf0
   el0t_64_sync+0x190/0x194

This is because there is no locking around the access of "if
(!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
put_pmu_ctx().

The decrement of the reference count in put_pmu_ctx() also happens
outside of the spinlock, leading to the possibility of the following
order of events, where the context is cleared in put_pmu_ctx() even
though its refcount is non-zero:

 CPU0                                   CPU1
 find_get_pmu_context()
   if (!epc->ctx) == false
                                        put_pmu_ctx()
                                        atomic_dec_and_test(&epc->refcount) == true
                                        epc->refcount == 0
     atomic_inc(&epc->refcount);
     epc->refcount == 1
                                        list_del_init(&epc->pmu_ctx_entry);
                                        epc->ctx = NULL;

Another issue is that the WARN_ON for no active PMU events in
put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is
an embedded one, it won't be deleted even after it is cleared, and can
be re-used, so the warning can trigger. For this reason it also needs
to be moved inside the lock.

The above warning is very quick to trigger on Arm by running these two
commands at the same time:

  while true; do perf record -- ls; done
  while true; do perf record -- ls; done

Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: James Clark <james.clark@arm.com>
---
 kernel/events/core.c | 42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 380476a934e8..b11edb86d518 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4813,19 +4813,17 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 
 		cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
 		epc = &cpc->epc;
-
+		raw_spin_lock_irq(&ctx->lock);
 		if (!epc->ctx) {
 			atomic_set(&epc->refcount, 1);
 			epc->embedded = 1;
-			raw_spin_lock_irq(&ctx->lock);
 			list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
 			epc->ctx = ctx;
-			raw_spin_unlock_irq(&ctx->lock);
 		} else {
 			WARN_ON_ONCE(epc->ctx != ctx);
 			atomic_inc(&epc->refcount);
 		}
-
+		raw_spin_unlock_irq(&ctx->lock);
 		return epc;
 	}
 
@@ -4897,32 +4895,32 @@ static void free_epc_rcu(struct rcu_head *head)
 static void put_pmu_ctx(struct perf_event_pmu_context *epc)
 {
 	unsigned long flags;
+	struct perf_event_context *ctx = epc->ctx;
 
-	if (!atomic_dec_and_test(&epc->refcount))
+	/*
+	 * XXX
+	 *
+	 * lockdep_assert_held(&ctx->mutex);
+	 *
+	 * can't because of the call-site in _free_event()/put_event()
+	 * which isn't always called under ctx->mutex.
+	 */
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+	if (!atomic_dec_and_test(&epc->refcount)) {
+		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 		return;
+	}
 
-	if (epc->ctx) {
-		struct perf_event_context *ctx = epc->ctx;
-
-		/*
-		 * XXX
-		 *
-		 * lockdep_assert_held(&ctx->mutex);
-		 *
-		 * can't because of the call-site in _free_event()/put_event()
-		 * which isn't always called under ctx->mutex.
-		 */
+	WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
 
-		WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
-		raw_spin_lock_irqsave(&ctx->lock, flags);
-		list_del_init(&epc->pmu_ctx_entry);
-		epc->ctx = NULL;
-		raw_spin_unlock_irqrestore(&ctx->lock, flags);
-	}
+	list_del_init(&epc->pmu_ctx_entry);
+	epc->ctx = NULL;
 
 	WARN_ON_ONCE(!list_empty(&epc->pinned_active));
 	WARN_ON_ONCE(!list_empty(&epc->flexible_active));
 
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
 	if (epc->embedded)
 		return;
 
-- 
2.39.1



* Re: [PATCH 1/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
  2023-01-27 14:31 ` [PATCH 1/1] " James Clark
@ 2023-01-30  5:49   ` Ravi Bangoria
  2023-01-31  8:25     ` Ravi Bangoria
  2023-01-31 13:31   ` Peter Zijlstra
  2023-02-01 16:10   ` [tip: perf/urgent] perf: Fix perf_event_pmu_context serialization tip-bot2 for James Clark
  2 siblings, 1 reply; 7+ messages in thread
From: Ravi Bangoria @ 2023-01-30  5:49 UTC (permalink / raw)
  To: James Clark, linux-perf-users, peterz
  Cc: syzbot+697196bc0265049822bd, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-kernel, Ravi Bangoria,
	Thomas Richter

Hi James,

On 27-Jan-23 8:01 PM, James Clark wrote:
> When running two Perf sessions, the following warning can appear:
> 
>   WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
>   Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bridge stp llc coresight_stm stm_core coresight_etm4x coresight_tmc coresight_replicator coresight_funnel coresight_tpiu coresight arm_spe_pmu ip_tables x_tables ipv6 xhci_pci xhci_pci_renesas r8169
>   CPU: 1 PID: 2245 Comm: perf Not tainted 6.2.0-rc4+ #1
>   pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : put_pmu_ctx+0x1f0/0x278
>   lr : put_pmu_ctx+0x1b4/0x278
>   sp : ffff80000dfcbc20
>   x29: ffff80000dfcbca0 x28: ffff008004f00000 x27: ffff00800763a928
>   x26: ffff00800763a928 x25: 00000000000000c0 x24: 0000000000000000
>   x23: 00000000000a0003 x22: ffff00837df74088 x21: ffff80000dfcbd18
>   x20: 0000000000000000 x19: ffff00800763a6c0 x18: 0000000000000000
>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>   x14: 0000000000000000 x13: ffff80000dfc8000 x12: ffff80000dfcc000
>   x11: be58ab6d2939e700 x10: be58ab6d2939e700 x9 : 0000000000000000
>   x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
>   x5 : ffff00800093c9c0 x4 : 0000000000000000 x3 : ffff80000dfcbca0
>   x2 : ffff008004f00000 x1 : ffff8000082403c4 x0 : 0000000000000000
>   Call trace:
>    put_pmu_ctx+0x1f0/0x278
>    _free_event+0x2bc/0x3d0
>    perf_event_release_kernel+0x444/0x4bc
>    perf_release+0x20/0x30
>    __fput+0xe4/0x25c
>    ____fput+0x1c/0x28
>    task_work_run+0xc4/0xe8
>    do_notify_resume+0x10c/0x164
>    el0_svc+0xb4/0xdc
>    el0t_64_sync_handler+0x84/0xf0
>    el0t_64_sync+0x190/0x194
> 
> This is because there is no locking around the access of "if
> (!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
> put_pmu_ctx().
> 
> The decrement of the reference count in put_pmu_ctx() also happens
> outside of the spinlock, leading to the possibility of the following
> order of events, where the context is cleared in put_pmu_ctx() even
> though its refcount is non-zero:
> 
>  CPU0                                   CPU1
>  find_get_pmu_context()
>    if (!epc->ctx) == false
>                                         put_pmu_ctx()
>                                         atomic_dec_and_test(&epc->refcount) == true
>                                         epc->refcount == 0
>      atomic_inc(&epc->refcount);
>      epc->refcount == 1
>                                         list_del_init(&epc->pmu_ctx_entry);
>                                         epc->ctx = NULL;
> 
> Another issue is that the WARN_ON for no active PMU events in
> put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is
> an embedded one, it won't be deleted even after it is cleared, and can
> be re-used, so the warning can trigger. For this reason it also needs
> to be moved inside the lock.
> 
> The above warning is very quick to trigger on Arm by running these two
> commands at the same time:
> 
>   while true; do perf record -- ls; done
>   while true; do perf record -- ls; done

These commands do not trigger the WARN_ON on my x86 machine; however,
the C reproducer provided by syzbot[1] does trigger it.

[1]: https://syzkaller.appspot.com/text?tag=ReproC&x=17beacbc480000

Thanks,
Ravi


* Re: [PATCH 1/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
  2023-01-30  5:49   ` Ravi Bangoria
@ 2023-01-31  8:25     ` Ravi Bangoria
  0 siblings, 0 replies; 7+ messages in thread
From: Ravi Bangoria @ 2023-01-31  8:25 UTC (permalink / raw)
  To: James Clark, linux-perf-users, peterz
  Cc: syzbot+697196bc0265049822bd, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-kernel, Thomas Richter,
	Ravi Bangoria

On 30-Jan-23 11:19 AM, Ravi Bangoria wrote:
> Hi James,
> 
> On 27-Jan-23 8:01 PM, James Clark wrote:
>> When running two Perf sessions, the following warning can appear:
>>
>>   WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
>>   Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bridge stp llc coresight_stm stm_core coresight_etm4x coresight_tmc coresight_replicator coresight_funnel coresight_tpiu coresight arm_spe_pmu ip_tables x_tables ipv6 xhci_pci xhci_pci_renesas r8169
>>   CPU: 1 PID: 2245 Comm: perf Not tainted 6.2.0-rc4+ #1
>>   pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>   pc : put_pmu_ctx+0x1f0/0x278
>>   lr : put_pmu_ctx+0x1b4/0x278
>>   sp : ffff80000dfcbc20
>>   x29: ffff80000dfcbca0 x28: ffff008004f00000 x27: ffff00800763a928
>>   x26: ffff00800763a928 x25: 00000000000000c0 x24: 0000000000000000
>>   x23: 00000000000a0003 x22: ffff00837df74088 x21: ffff80000dfcbd18
>>   x20: 0000000000000000 x19: ffff00800763a6c0 x18: 0000000000000000
>>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>   x14: 0000000000000000 x13: ffff80000dfc8000 x12: ffff80000dfcc000
>>   x11: be58ab6d2939e700 x10: be58ab6d2939e700 x9 : 0000000000000000
>>   x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
>>   x5 : ffff00800093c9c0 x4 : 0000000000000000 x3 : ffff80000dfcbca0
>>   x2 : ffff008004f00000 x1 : ffff8000082403c4 x0 : 0000000000000000
>>   Call trace:
>>    put_pmu_ctx+0x1f0/0x278
>>    _free_event+0x2bc/0x3d0
>>    perf_event_release_kernel+0x444/0x4bc
>>    perf_release+0x20/0x30
>>    __fput+0xe4/0x25c
>>    ____fput+0x1c/0x28
>>    task_work_run+0xc4/0xe8
>>    do_notify_resume+0x10c/0x164
>>    el0_svc+0xb4/0xdc
>>    el0t_64_sync_handler+0x84/0xf0
>>    el0t_64_sync+0x190/0x194
>>
>> This is because there is no locking around the access of "if
>> (!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
>> put_pmu_ctx().
>>
>> The decrement of the reference count in put_pmu_ctx() also happens
>> outside of the spinlock, leading to the possibility of the following
>> order of events, where the context is cleared in put_pmu_ctx() even
>> though its refcount is non-zero:
>>
>>  CPU0                                   CPU1
>>  find_get_pmu_context()
>>    if (!epc->ctx) == false
>>                                         put_pmu_ctx()
>>                                         atomic_dec_and_test(&epc->refcount) == true
>>                                         epc->refcount == 0
>>      atomic_inc(&epc->refcount);
>>      epc->refcount == 1
>>                                         list_del_init(&epc->pmu_ctx_entry);
>>                                         epc->ctx = NULL;
>>
>> Another issue is that the WARN_ON for no active PMU events in
>> put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is
>> an embedded one, it won't be deleted even after it is cleared, and can
>> be re-used, so the warning can trigger. For this reason it also needs
>> to be moved inside the lock.
>>
>> The above warning is very quick to trigger on Arm by running these two
>> commands at the same time:
>>
>>   while true; do perf record -- ls; done
>>   while true; do perf record -- ls; done
> 
> These commands do not trigger the WARN_ON on my x86 machine; however,
> the C reproducer provided by syzbot[1] does trigger it.
> 
> [1]: https://syzkaller.appspot.com/text?tag=ReproC&x=17beacbc480000

Unless I'm missing some subtle scenario, the patch looks fine to me.

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks,
Ravi


* Re: [PATCH 1/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
  2023-01-27 14:31 ` [PATCH 1/1] " James Clark
  2023-01-30  5:49   ` Ravi Bangoria
@ 2023-01-31 13:31   ` Peter Zijlstra
  2023-02-01 10:49     ` James Clark
  2023-02-01 16:10   ` [tip: perf/urgent] perf: Fix perf_event_pmu_context serialization tip-bot2 for James Clark
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2023-01-31 13:31 UTC (permalink / raw)
  To: James Clark
  Cc: linux-perf-users, ravi.bangoria, syzbot+697196bc0265049822bd,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On Fri, Jan 27, 2023 at 02:31:41PM +0000, James Clark wrote:

> @@ -4897,32 +4895,32 @@ static void free_epc_rcu(struct rcu_head *head)
>  static void put_pmu_ctx(struct perf_event_pmu_context *epc)
>  {
>  	unsigned long flags;
> +	struct perf_event_context *ctx = epc->ctx;
>  
> -	if (!atomic_dec_and_test(&epc->refcount))
> +	/*
> +	 * XXX
> +	 *
> +	 * lockdep_assert_held(&ctx->mutex);
> +	 *
> +	 * can't because of the call-site in _free_event()/put_event()
> +	 * which isn't always called under ctx->mutex.
> +	 */
> +	raw_spin_lock_irqsave(&ctx->lock, flags);
> +	if (!atomic_dec_and_test(&epc->refcount)) {
> +		raw_spin_unlock_irqrestore(&ctx->lock, flags);
>  		return;
> +	}

This is a bit of an anti-pattern and is better done using dec_and_lock.
As such, I'll fold the below into it.

--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -476,6 +476,15 @@ extern int _atomic_dec_and_lock_irqsave(
 #define atomic_dec_and_lock_irqsave(atomic, lock, flags) \
 		__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
 
+extern int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock);
+#define atomic_dec_and_raw_lock(atomic, lock) \
+		__cond_lock(lock, _atomic_dec_and_raw_lock(atomic, lock))
+
+extern int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
+					unsigned long *flags);
+#define atomic_dec_and_raw_lock_irqsave(atomic, lock, flags) \
+		__cond_lock(lock, _atomic_dec_and_raw_lock_irqsave(atomic, lock, &(flags)))
+
 int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
 			     size_t max_size, unsigned int cpu_mult,
 			     gfp_t gfp, const char *name,
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4901,11 +4901,8 @@ static void put_pmu_ctx(struct perf_even
 	 * can't because of the call-site in _free_event()/put_event()
 	 * which isn't always called under ctx->mutex.
 	 */
-	raw_spin_lock_irqsave(&ctx->lock, flags);
-	if (!atomic_dec_and_test(&epc->refcount)) {
-		raw_spin_unlock_irqrestore(&ctx->lock, flags);
+	if (!atomic_dec_and_raw_lock_irqsave(&epc->refcount, &ctx->lock, flags))
 		return;
-	}
 
 	WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
 
--- a/lib/dec_and_lock.c
+++ b/lib/dec_and_lock.c
@@ -49,3 +49,34 @@ int _atomic_dec_and_lock_irqsave(atomic_
 	return 0;
 }
 EXPORT_SYMBOL(_atomic_dec_and_lock_irqsave);
+
+int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock)
+{
+	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+	if (atomic_add_unless(atomic, -1, 1))
+		return 0;
+
+	/* Otherwise do it the slow way */
+	raw_spin_lock(lock);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	raw_spin_unlock(lock);
+	return 0;
+}
+EXPORT_SYMBOL(_atomic_dec_and_raw_lock);
+
+int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
+				     unsigned long *flags)
+{
+	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+	if (atomic_add_unless(atomic, -1, 1))
+		return 0;
+
+	/* Otherwise do it the slow way */
+	raw_spin_lock_irqsave(lock, *flags);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	raw_spin_unlock_irqrestore(lock, *flags);
+	return 0;
+}
+EXPORT_SYMBOL(_atomic_dec_and_raw_lock_irqsave);


* Re: [PATCH 1/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context
  2023-01-31 13:31   ` Peter Zijlstra
@ 2023-02-01 10:49     ` James Clark
  0 siblings, 0 replies; 7+ messages in thread
From: James Clark @ 2023-02-01 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-perf-users, ravi.bangoria, syzbot+697196bc0265049822bd,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel



On 31/01/2023 13:31, Peter Zijlstra wrote:
> On Fri, Jan 27, 2023 at 02:31:41PM +0000, James Clark wrote:
> 
>> @@ -4897,32 +4895,32 @@ static void free_epc_rcu(struct rcu_head *head)
>>  static void put_pmu_ctx(struct perf_event_pmu_context *epc)
>>  {
>>  	unsigned long flags;
>> +	struct perf_event_context *ctx = epc->ctx;
>>  
>> -	if (!atomic_dec_and_test(&epc->refcount))
>> +	/*
>> +	 * XXX
>> +	 *
>> +	 * lockdep_assert_held(&ctx->mutex);
>> +	 *
>> +	 * can't because of the call-site in _free_event()/put_event()
>> +	 * which isn't always called under ctx->mutex.
>> +	 */
>> +	raw_spin_lock_irqsave(&ctx->lock, flags);
>> +	if (!atomic_dec_and_test(&epc->refcount)) {
>> +		raw_spin_unlock_irqrestore(&ctx->lock, flags);
>>  		return;
>> +	}
> 
> This is a bit of an anti-pattern and is better done using dec_and_lock.
> As such, I'll fold the below into it.
> 

Yep that looks better. Thanks Peter.

> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -476,6 +476,15 @@ extern int _atomic_dec_and_lock_irqsave(
>  #define atomic_dec_and_lock_irqsave(atomic, lock, flags) \
>  		__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
>  
> +extern int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock);
> +#define atomic_dec_and_raw_lock(atomic, lock) \
> +		__cond_lock(lock, _atomic_dec_and_raw_lock(atomic, lock))
> +
> +extern int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
> +					unsigned long *flags);
> +#define atomic_dec_and_raw_lock_irqsave(atomic, lock, flags) \
> +		__cond_lock(lock, _atomic_dec_and_raw_lock_irqsave(atomic, lock, &(flags)))
> +
>  int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
>  			     size_t max_size, unsigned int cpu_mult,
>  			     gfp_t gfp, const char *name,
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4901,11 +4901,8 @@ static void put_pmu_ctx(struct perf_even
>  	 * can't because of the call-site in _free_event()/put_event()
>  	 * which isn't always called under ctx->mutex.
>  	 */
> -	raw_spin_lock_irqsave(&ctx->lock, flags);
> -	if (!atomic_dec_and_test(&epc->refcount)) {
> -		raw_spin_unlock_irqrestore(&ctx->lock, flags);
> +	if (!atomic_dec_and_raw_lock_irqsave(&epc->refcount, &ctx->lock, flags))
>  		return;
> -	}
>  
>  	WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
>  
> --- a/lib/dec_and_lock.c
> +++ b/lib/dec_and_lock.c
> @@ -49,3 +49,34 @@ int _atomic_dec_and_lock_irqsave(atomic_
>  	return 0;
>  }
>  EXPORT_SYMBOL(_atomic_dec_and_lock_irqsave);
> +
> +int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock)
> +{
> +	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> +	if (atomic_add_unless(atomic, -1, 1))
> +		return 0;
> +
> +	/* Otherwise do it the slow way */
> +	raw_spin_lock(lock);
> +	if (atomic_dec_and_test(atomic))
> +		return 1;
> +	raw_spin_unlock(lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL(_atomic_dec_and_raw_lock);
> +
> +int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
> +				     unsigned long *flags)
> +{
> +	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> +	if (atomic_add_unless(atomic, -1, 1))
> +		return 0;
> +
> +	/* Otherwise do it the slow way */
> +	raw_spin_lock_irqsave(lock, *flags);
> +	if (atomic_dec_and_test(atomic))
> +		return 1;
> +	raw_spin_unlock_irqrestore(lock, *flags);
> +	return 0;
> +}
> +EXPORT_SYMBOL(_atomic_dec_and_raw_lock_irqsave);


* [tip: perf/urgent] perf: Fix perf_event_pmu_context serialization
  2023-01-27 14:31 ` [PATCH 1/1] " James Clark
  2023-01-30  5:49   ` Ravi Bangoria
  2023-01-31 13:31   ` Peter Zijlstra
@ 2023-02-01 16:10   ` tip-bot2 for James Clark
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for James Clark @ 2023-02-01 16:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+697196bc0265049822bd, James Clark, Peter Zijlstra (Intel),
	Ravi Bangoria, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     4f64a6c9f6f11e8b7314f8e27e2c4568706009e6
Gitweb:        https://git.kernel.org/tip/4f64a6c9f6f11e8b7314f8e27e2c4568706009e6
Author:        James Clark <james.clark@arm.com>
AuthorDate:    Fri, 27 Jan 2023 14:31:41 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 31 Jan 2023 20:37:18 +01:00

perf: Fix perf_event_pmu_context serialization

Syzkaller triggered a WARN in put_pmu_ctx().

  WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278

This is because there is no locking around the access of "if
(!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
put_pmu_ctx().

The decrement of the reference count in put_pmu_ctx() also happens
outside of the spinlock, leading to the possibility of the following
order of events, where the context is cleared in put_pmu_ctx() even
though its refcount is non-zero:

 CPU0                                   CPU1
 find_get_pmu_context()
   if (!epc->ctx) == false
                                        put_pmu_ctx()
                                        atomic_dec_and_test(&epc->refcount) == true
                                        epc->refcount == 0
     atomic_inc(&epc->refcount);
     epc->refcount == 1
                                        list_del_init(&epc->pmu_ctx_entry);
                                        epc->ctx = NULL;

Another issue is that the WARN_ON for no active PMU events in
put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is
an embedded one, it won't be deleted even after it is cleared, and can
be re-used, so the warning can trigger. For this reason it also needs
to be moved inside the lock.

The above warning is very quick to trigger on Arm by running these two
commands at the same time:

  while true; do perf record -- ls; done
  while true; do perf record -- ls; done

[peterz: atomic_dec_and_raw_lock*()]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230127143141.1782804-2-james.clark@arm.com
---
 include/linux/spinlock.h |  9 +++++++++-
 kernel/events/core.c     | 39 +++++++++++++++++----------------------
 lib/dec_and_lock.c       | 31 +++++++++++++++++++++++++++++++-
 3 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1341f7d..be48f1c 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -476,6 +476,15 @@ extern int _atomic_dec_and_lock_irqsave(atomic_t *atomic, spinlock_t *lock,
 #define atomic_dec_and_lock_irqsave(atomic, lock, flags) \
 		__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
 
+extern int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock);
+#define atomic_dec_and_raw_lock(atomic, lock) \
+		__cond_lock(lock, _atomic_dec_and_raw_lock(atomic, lock))
+
+extern int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
+					unsigned long *flags);
+#define atomic_dec_and_raw_lock_irqsave(atomic, lock, flags) \
+		__cond_lock(lock, _atomic_dec_and_raw_lock_irqsave(atomic, lock, &(flags)))
+
 int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
 			     size_t max_size, unsigned int cpu_mult,
 			     gfp_t gfp, const char *name,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d56328e..c4be13e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4813,19 +4813,17 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 
 		cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
 		epc = &cpc->epc;
-
+		raw_spin_lock_irq(&ctx->lock);
 		if (!epc->ctx) {
 			atomic_set(&epc->refcount, 1);
 			epc->embedded = 1;
-			raw_spin_lock_irq(&ctx->lock);
 			list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
 			epc->ctx = ctx;
-			raw_spin_unlock_irq(&ctx->lock);
 		} else {
 			WARN_ON_ONCE(epc->ctx != ctx);
 			atomic_inc(&epc->refcount);
 		}
-
+		raw_spin_unlock_irq(&ctx->lock);
 		return epc;
 	}
 
@@ -4896,33 +4894,30 @@ static void free_epc_rcu(struct rcu_head *head)
 
 static void put_pmu_ctx(struct perf_event_pmu_context *epc)
 {
+	struct perf_event_context *ctx = epc->ctx;
 	unsigned long flags;
 
-	if (!atomic_dec_and_test(&epc->refcount))
+	/*
+	 * XXX
+	 *
+	 * lockdep_assert_held(&ctx->mutex);
+	 *
+	 * can't because of the call-site in _free_event()/put_event()
+	 * which isn't always called under ctx->mutex.
+	 */
+	if (!atomic_dec_and_raw_lock_irqsave(&epc->refcount, &ctx->lock, flags))
 		return;
 
-	if (epc->ctx) {
-		struct perf_event_context *ctx = epc->ctx;
+	WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
 
-		/*
-		 * XXX
-		 *
-		 * lockdep_assert_held(&ctx->mutex);
-		 *
-		 * can't because of the call-site in _free_event()/put_event()
-		 * which isn't always called under ctx->mutex.
-		 */
-
-		WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
-		raw_spin_lock_irqsave(&ctx->lock, flags);
-		list_del_init(&epc->pmu_ctx_entry);
-		epc->ctx = NULL;
-		raw_spin_unlock_irqrestore(&ctx->lock, flags);
-	}
+	list_del_init(&epc->pmu_ctx_entry);
+	epc->ctx = NULL;
 
 	WARN_ON_ONCE(!list_empty(&epc->pinned_active));
 	WARN_ON_ONCE(!list_empty(&epc->flexible_active));
 
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
 	if (epc->embedded)
 		return;
 
diff --git a/lib/dec_and_lock.c b/lib/dec_and_lock.c
index 9555b68..1dcca8f 100644
--- a/lib/dec_and_lock.c
+++ b/lib/dec_and_lock.c
@@ -49,3 +49,34 @@ int _atomic_dec_and_lock_irqsave(atomic_t *atomic, spinlock_t *lock,
 	return 0;
 }
 EXPORT_SYMBOL(_atomic_dec_and_lock_irqsave);
+
+int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock)
+{
+	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+	if (atomic_add_unless(atomic, -1, 1))
+		return 0;
+
+	/* Otherwise do it the slow way */
+	raw_spin_lock(lock);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	raw_spin_unlock(lock);
+	return 0;
+}
+EXPORT_SYMBOL(_atomic_dec_and_raw_lock);
+
+int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock,
+				     unsigned long *flags)
+{
+	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+	if (atomic_add_unless(atomic, -1, 1))
+		return 0;
+
+	/* Otherwise do it the slow way */
+	raw_spin_lock_irqsave(lock, *flags);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	raw_spin_unlock_irqrestore(lock, *flags);
+	return 0;
+}
+EXPORT_SYMBOL(_atomic_dec_and_raw_lock_irqsave);


end of thread, other threads:[~2023-02-01 16:11 UTC | newest]

Thread overview: 7+ messages
-- links below jump to the message on this page --
2023-01-27 14:31 [PATCH 0/1] perf: Fix warning from concurrent read/write of perf_event_pmu_context James Clark
2023-01-27 14:31 ` [PATCH 1/1] " James Clark
2023-01-30  5:49   ` Ravi Bangoria
2023-01-31  8:25     ` Ravi Bangoria
2023-01-31 13:31   ` Peter Zijlstra
2023-02-01 10:49     ` James Clark
2023-02-01 16:10   ` [tip: perf/urgent] perf: Fix perf_event_pmu_context serialization tip-bot2 for James Clark
