linux-kernel.vger.kernel.org archive mirror
* WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
@ 2013-05-20 16:19 Vince Weaver
  2013-05-28 17:00 ` Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Vince Weaver @ 2013-05-20 16:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity

Hello

on 3.10-rc1 with the trinity fuzzer patched to exercise the 
perf_event_open() syscall I am triggering this WARN_ONCE:

[   75.864822] ------------[ cut here ]------------
[   75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
[   75.864832] Can't find any breakpoint slot
[   75.864833] Modules linked in: dn_rtmsg can_raw nfnetlink can_bcm can xfrm_user xfrm_algo nfc rfkill ax25 scsi_transport_iscsi atm ipt_ULOG x_tables ipx p8023 p8022 irda crc_ccitt appletalk psnap llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache sunrpc loop fuse snd_hda_codec_hdmi snd_hda_codec_realtek coretemp kvm_intel kvm evdev nouveau mxm_wmi ttm drm_kms_helper video wmi drm i2c_algo_bit microcode snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq snd_seq_device snd_timer snd acpi_cpufreq mperf processor thermal_sys psmouse serio_raw pcspkr button soundcore shpchp i2c_nforce2 i2c_core ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif ata_generic ahci libahci ehci_pci ohci_hcd ehci_hcd libata r8169 mii scsi_mod usbcore usb_common
[   75.864890] CPU: 0 PID: 3136 Comm: trinity-child0 Not tainted 3.10.0-rc1 #1
[   75.864892] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[   75.864894]  0000000000000000 ffffffff8102e205 ffff880119b01c98 ffff880119abac00
[   75.864897]  ffff880119b01ce8 ffff88011fc15b28 0000000ed19e90a3 ffffffff8102e2b0
[   75.864900]  ffffffff814db382 0000000200000018 ffff880119b01cf8 ffff880119b01cb8
[   75.864903] Call Trace:
[   75.864907]  [<ffffffff8102e205>] ? warn_slowpath_common+0x5b/0x70
[   75.864910]  [<ffffffff8102e2b0>] ? warn_slowpath_fmt+0x47/0x49
[   75.864913]  [<ffffffff81055d08>] ? sched_clock_local+0xc/0x6d
[   75.864916]  [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
[   75.864919]  [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c
[   75.864921]  [<ffffffff810ab69b>] ? group_sched_in+0x46/0x120
[   75.864923]  [<ffffffff810ab898>] ? ctx_sched_in+0x123/0x141
[   75.864926]  [<ffffffff810abdf1>] ? __perf_install_in_context+0xcb/0xea
[   75.864929]  [<ffffffff810a88b2>] ? perf_swevent_add+0xb8/0xb8
[   75.864932]  [<ffffffff810a88c5>] ? remote_function+0x13/0x3b
[   75.864934]  [<ffffffff8106fd87>] ? smp_call_function_single+0x76/0xf1
[   75.864937]  [<ffffffff810a803a>] ? task_function_call+0x42/0x4c
[   75.864939]  [<ffffffff810abd26>] ? perf_event_sched_in+0x69/0x69
[   75.864942]  [<ffffffff810aa018>] ? perf_install_in_context+0x5b/0x97
[   75.864945]  [<ffffffff810aecf0>] ? SYSC_perf_event_open+0x65f/0x7d4
[   75.864949]  [<ffffffff81369792>] ? system_call_fastpath+0x16/0x1b
[   75.864951] ---[ end trace 250def16d8853b8c ]---

Is this condition really drastic enough to deserve a WARNing?

Vince Weaver
vincent.weaver@maine.edu
http://www.eece.maine.edu/~vweaver/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
  2013-05-20 16:19 WARN_ONCE in arch/x86/kernel/hw_breakpoint.c Vince Weaver
@ 2013-05-28 17:00 ` Oleg Nesterov
  2013-05-28 17:28   ` Oleg Nesterov
  2013-06-01 18:20 ` [PATCH 0/2]: " Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-05-28 17:00 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Frédéric Weisbecker

Well. I am not familiar with this code, and when I tried to read it
I feel I will never be able to understand it ;)

On 05/20, Vince Weaver wrote:
>
> on 3.10-rc1 with the trinity fuzzer patched to exercise the
> perf_event_open() syscall I am triggering this WARN_ONCE:
>
> [   75.864822] ------------[ cut here ]------------
> [   75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
...
> [   75.864916]  [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
> [   75.864919]  [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c

I am wondering if we should check attr->pinned before WARN_ONCE...

But it seems that hw_breakpoint.c is buggy anyway.

Suppose that attr.task != NULL and event->cpu = -1.

__reserve_bp_slot() tries to calculate slots.pinned and calls
fetch_bp_busy_slots().

In this case fetch_bp_busy_slots() does

	for_each_online_cpu(cpu)
		...
		nr += task_bp_pinned(cpu, bp, type);

And task_bp_pinned() (in particular) checks cpu == event->cpu,
and this will never be true.

IOW, it seems that __reserve_bp_slot(task, cpu => -1) always
succeeds because task_bp_pinned() returns 0 and thus we can
create more than HBP_NUM breakpoints. Much more ;)
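
Just to illustrate (a made-up user-space sketch, not taken from the trinity
report; the 0x10 address and the loop count are arbitrary): every open below
creates a per-task breakpoint with cpu == -1, so with the accounting bug
above all of them are likely to succeed, although with HBP_NUM == 4 on x86
the 5th one should already fail with -ENOSPC:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>

int main(void)
{
	struct perf_event_attr attr;
	int i;

	for (i = 0; i < 8; i++) {		/* 8 > HBP_NUM */
		memset(&attr, 0, sizeof(attr));
		attr.type = PERF_TYPE_BREAKPOINT;
		attr.size = sizeof(attr);
		attr.bp_type = HW_BREAKPOINT_W;
		attr.bp_addr = 0x10;
		attr.bp_len = HW_BREAKPOINT_LEN_4;
		attr.exclude_kernel = 1;	/* monitor user space only */

		/* per-task breakpoint, "all cpus": pid = getpid(), cpu = -1 */
		long fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
		/* fds are deliberately left open so the reservations accumulate */
		printf("breakpoint %d: %s\n", i, fd >= 0 ? "created" : "failed");
	}
	return 0;
}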

As for _create, I guess we probably need something like

--- x/kernel/events/hw_breakpoint.c
+++ x/kernel/events/hw_breakpoint.c
@@ -156,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots
 		if (!tsk)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(cpu, bp, type);
+			nr += task_bp_pinned(-1, bp, type);
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;

But I simply can't understand toggle_bp_task_slot()->task_bp_pinned().

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
  2013-05-28 17:00 ` Oleg Nesterov
@ 2013-05-28 17:28   ` Oleg Nesterov
  2013-05-28 18:47     ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-05-28 17:28 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Frédéric Weisbecker,
	Jiri Olsa

OK, Jiri explained to me how I can use perf to install the hwbp ;)

And indeed,

	# perl -e 'sleep 1 while 1' &
	[1] 507
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`
	
triggers the same warn/problem.

Interestingly,

	perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 true

correctly fails with ENOSPC; this is because perf installs NR_CPUS counters,
one for each cpu, and so the accounting works.

IIRC, I already tried to complain that perf could be smarter in this case
and install a single counter with event->cpu = -1, but this is offtopic.

Oleg.

On 05/28, Oleg Nesterov wrote:
>
> Well. I am not familiar with this code, and when I tried to read it
> I feel I will never be able to understand it ;)
>
> On 05/20, Vince Weaver wrote:
> >
> > on 3.10-rc1 with the trinity fuzzer patched to exercise the
> > perf_event_open() syscall I am triggering this WARN_ONCE:
> >
> > [   75.864822] ------------[ cut here ]------------
> > [   75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
> ...
> > [   75.864916]  [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
> > [   75.864919]  [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c
>
> I am wondering if we should check attr->pinned before WARN_ONCE...
>
> But it seems that hw_breakpoint.c is buggy anyway.
>
> Suppose that attr.task != NULL and event->cpu = -1.
>
> __reserve_bp_slot() tries to calculate slots.pinned and calls
> fetch_bp_busy_slots().
>
> In this case fetch_bp_busy_slots() does
>
> 	for_each_online_cpu(cpu)
> 		...
> 		nr += task_bp_pinned(cpu, bp, type);
>
> And task_bp_pinned() (in particular) checks cpu == event->cpu,
> and this will never be true.
>
> IOW, it seems that __reserve_bp_slot(task, cpu => -1) always
> succeeds because task_bp_pinned() returns 0 and thus we can
> create more than HBP_NUM breakpoints. Much more ;)
>
> As for _create, I guess we probably need something like
>
> --- x/kernel/events/hw_breakpoint.c
> +++ x/kernel/events/hw_breakpoint.c
> @@ -156,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots
>  		if (!tsk)
>  			nr += max_task_bp_pinned(cpu, type);
>  		else
> -			nr += task_bp_pinned(cpu, bp, type);
> +			nr += task_bp_pinned(-1, bp, type);
>
>  		if (nr > slots->pinned)
>  			slots->pinned = nr;
>
> But I simply can't understand toggle_bp_task_slot()->task_bp_pinned().
>
> Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
  2013-05-28 17:28   ` Oleg Nesterov
@ 2013-05-28 18:47     ` Oleg Nesterov
  2013-05-29 16:32       ` [MAYBEPATCH] : " Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-05-28 18:47 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Frédéric Weisbecker,
	Jiri Olsa

On 05/28, Oleg Nesterov wrote:
>
> IIRC, I already tried to complain that perf could be smarter in this case
> and install a single counter with event->cpu = -1, but this is offtopic.

Just in case, please see my old email

	http://marc.info/?l=linux-kernel&m=136017983113299

I didn't check if perf has changed since then. And I forgot everything
(not too much ;) that I learned when I briefly looked into tools/perf/.

But, again, this is off-topic.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [MAYBEPATCH] : WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
  2013-05-28 18:47     ` Oleg Nesterov
@ 2013-05-29 16:32       ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2013-05-29 16:32 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Frédéric Weisbecker,
	Jiri Olsa

OK, I seem to understand what toggle_bp_task_slot() does, and
it looks equally wrong.

I think we need something like the patch below... I'll try to
recheck/test tomorrow, but I would not mind if someone who knows
this code makes the authoritative fix.

Even if this patch is right, I think this all needs more cleanups,
at least. For example, every DEFINE_PER_CPU() looks bogus. This
is not pcpu memory.

Oleg.

--- x/kernel/events/hw_breakpoint.c
+++ x/kernel/events/hw_breakpoint.c
@@ -111,7 +111,7 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  * Count the number of breakpoints of the same type and same task.
  * The given event must be not on the list.
  */
-static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
+static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
 {
 	struct task_struct *tsk = bp->hw.bp_target;
 	struct perf_event *iter;
@@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
 		if (iter->hw.bp_target == tsk &&
 		    find_slot_idx(iter) == type &&
-		    cpu == iter->cpu)
+		    bp->cpu == iter->cpu)
 			count += hw_breakpoint_weight(iter);
 	}
 
@@ -137,13 +137,17 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 {
 	int cpu = bp->cpu;
 	struct task_struct *tsk = bp->hw.bp_target;
+	int task_pinned;
+
+	if (tsk)
+		task_pinned = task_bp_pinned(bp, type);
 
 	if (cpu >= 0) {
 		slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
 		if (!tsk)
 			slots->pinned += max_task_bp_pinned(cpu, type);
 		else
-			slots->pinned += task_bp_pinned(cpu, bp, type);
+			slots->pinned += task_pinned;
 		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
 
 		return;
@@ -156,7 +160,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(cpu, bp, type);
+			nr += task_pinned;
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
@@ -182,15 +186,13 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 /*
  * Add a pinned breakpoint for the given task in our constraint table
  */
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
+static void toggle_bp_task_slot(int old_count, int cpu, bool enable,
 				enum bp_type_idx type, int weight)
 {
 	unsigned int *tsk_pinned;
-	int old_count = 0;
 	int old_idx = 0;
 	int idx = 0;
 
-	old_count = task_bp_pinned(cpu, bp, type);
 	old_idx = old_count - 1;
 	idx = old_idx + weight;
 
@@ -216,6 +218,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 {
 	int cpu = bp->cpu;
 	struct task_struct *tsk = bp->hw.bp_target;
+	int task_pinned;
 
 	/* Pinned counter cpu profiling */
 	if (!tsk) {
@@ -232,11 +235,13 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	if (!enable)
 		list_del(&bp->hw.bp_list);
 
+	task_pinned = task_bp_pinned(bp, type);
+
 	if (cpu >= 0) {
-		toggle_bp_task_slot(bp, cpu, enable, type, weight);
+		toggle_bp_task_slot(task_pinned, cpu, enable, type, weight);
 	} else {
 		for_each_online_cpu(cpu)
-			toggle_bp_task_slot(bp, cpu, enable, type, weight);
+			toggle_bp_task_slot(task_pinned, cpu, enable, type, weight);
 	}
 
 	if (enable)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
  2013-05-20 16:19 WARN_ONCE in arch/x86/kernel/hw_breakpoint.c Vince Weaver
  2013-05-28 17:00 ` Oleg Nesterov
@ 2013-06-01 18:20 ` Oleg Nesterov
  2013-06-01 18:21   ` [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) Oleg Nesterov
  2013-06-01 18:21   ` [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() Oleg Nesterov
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
  2013-06-02 19:49 ` [PATCH 0/2] hw_breakpoint: more cleanups Oleg Nesterov
  3 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 18:20 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On 05/20, Vince Weaver wrote:
>
> on 3.10-rc1 with the trinity fuzzer patched to exercise the
> perf_event_open() syscall I am triggering this WARN_ONCE:
>
> [   75.864822] ------------[ cut here ]------------
> [   75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()

Ingo,

I am not sure about -stable, but probably this is 3.10 material.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  2013-06-01 18:20 ` [PATCH 0/2]: " Oleg Nesterov
@ 2013-06-01 18:21   ` Oleg Nesterov
  2013-06-13 14:20     ` Frederic Weisbecker
  2013-06-01 18:21   ` [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 18:21 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

The trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint slot")
in arch_install_hw_breakpoint(), but the problem is not arch-specific.

The problem is, task_bp_pinned(cpu) checks "cpu == iter->cpu" but
this doesn't account for the "all cpus" events with iter->cpu < 0.

This means that, say, register_user_hw_breakpoint(tsk) can happily
create an arbitrary number (> HBP_NUM) of breakpoints which can not
be activated. toggle_bp_task_slot() is equally wrong for the same
reason, and nr_task_bp_pinned[] can have negative entries.

Simple test:

	# perl -e 'sleep 1 while 1' &
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

Before this patch this triggers the same problem/WARN_ON(), after
the patch it correctly fails with -ENOSPC.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
---
 kernel/events/hw_breakpoint.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index ed1c897..29d3abe 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
 		if (iter->hw.bp_target == tsk &&
 		    find_slot_idx(iter) == type &&
-		    cpu == iter->cpu)
+		    (iter->cpu < 0 || cpu == iter->cpu))
 			count += hw_breakpoint_weight(iter);
 	}
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  2013-06-01 18:20 ` [PATCH 0/2]: " Oleg Nesterov
  2013-06-01 18:21   ` [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) Oleg Nesterov
@ 2013-06-01 18:21   ` Oleg Nesterov
  2013-06-15 12:46     ` Frederic Weisbecker
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 18:21 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

fetch_bp_busy_slots() and toggle_bp_slot() use for_each_online_cpu();
this is obviously wrong wrt cpu_up() or cpu_down(): we can over/under
account the per-cpu numbers.

For example:

	# echo 0 >> /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10 -p 1 &
	# echo 1 >> /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
	# taskset -p 0x2 1

triggers the same WARN_ONCE("Can't find any breakpoint slot") in
arch_install_hw_breakpoint().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
---
 kernel/events/hw_breakpoint.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 29d3abe..4407e43 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -149,7 +149,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		return;
 	}
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		unsigned int nr;
 
 		nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
@@ -235,7 +235,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	if (cpu >= 0) {
 		toggle_bp_task_slot(bp, cpu, enable, type, weight);
 	} else {
-		for_each_online_cpu(cpu)
+		for_each_possible_cpu(cpu)
 			toggle_bp_task_slot(bp, cpu, enable, type, weight);
 	}
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 0/3] hw_breakpoint: cleanups
  2013-05-20 16:19 WARN_ONCE in arch/x86/kernel/hw_breakpoint.c Vince Weaver
  2013-05-28 17:00 ` Oleg Nesterov
  2013-06-01 18:20 ` [PATCH 0/2]: " Oleg Nesterov
@ 2013-06-01 19:45 ` Oleg Nesterov
  2013-06-01 19:45   ` [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths Oleg Nesterov
                     ` (3 more replies)
  2013-06-02 19:49 ` [PATCH 0/2] hw_breakpoint: more cleanups Oleg Nesterov
  3 siblings, 4 replies; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 19:45 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

Hello.

Cleanups, on top of

	[PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

series.

Oleg.

 kernel/events/hw_breakpoint.c |   91 ++++++++++++++++-------------------------
 1 files changed, 35 insertions(+), 56 deletions(-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
@ 2013-06-01 19:45   ` Oleg Nesterov
  2013-06-15 12:59     ` Frederic Weisbecker
  2013-06-01 19:46   ` [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage " Oleg Nesterov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 19:45 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

The enable/disable logic in toggle_bp_slot() is not symmetrical
and imho very confusing. "old_count" in toggle_bp_task_slot() is
actually new_count because this bp was already removed from the
list.

Change toggle_bp_slot() to always call list_add/list_del after
toggle_bp_task_slot(). This way old_idx comes from task_bp_pinned()
and that entry should be decremented; new_idx is old_idx +/- weight
and that element needs to be incremented. The code/logic looks obvious.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/hw_breakpoint.c |   40 ++++++++++++++++------------------------
 1 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index ed31fe1..21a3c88 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -185,26 +185,20 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
 				enum bp_type_idx type, int weight)
 {
-	unsigned int *tsk_pinned;
-	int old_count = 0;
-	int old_idx = 0;
-	int idx = 0;
-
-	old_count = task_bp_pinned(cpu, bp, type);
-	old_idx = old_count - 1;
-	idx = old_idx + weight;
-
-	/* tsk_pinned[n] is the number of tasks having n breakpoints */
-	tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
-	if (enable) {
-		tsk_pinned[idx]++;
-		if (old_count > 0)
-			tsk_pinned[old_idx]--;
-	} else {
-		tsk_pinned[idx]--;
-		if (old_count > 0)
-			tsk_pinned[old_idx]++;
-	}
+	/* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
+	unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+	int old_idx, new_idx;
+
+	old_idx = task_bp_pinned(cpu, bp, type) - 1;
+	if (enable)
+		new_idx = old_idx + weight;
+	else
+		new_idx = old_idx - weight;
+
+	if (old_idx >= 0)
+		tsk_pinned[old_idx]--;
+	if (new_idx >= 0)
+		tsk_pinned[new_idx]++;
 }
 
 /*
@@ -228,10 +222,6 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	}
 
 	/* Pinned counter task profiling */
-
-	if (!enable)
-		list_del(&bp->hw.bp_list);
-
 	if (cpu >= 0) {
 		toggle_bp_task_slot(bp, cpu, enable, type, weight);
 	} else {
@@ -241,6 +231,8 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 
 	if (enable)
 		list_add_tail(&bp->hw.bp_list, &bp_task_head);
+	else
+		list_del(&bp->hw.bp_list);
 }
 
 /*
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
  2013-06-01 19:45   ` [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths Oleg Nesterov
@ 2013-06-01 19:46   ` Oleg Nesterov
  2013-06-15 13:14     ` Frederic Weisbecker
  2013-06-01 19:46   ` [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp() Oleg Nesterov
  2013-06-13 14:01   ` [PATCH 0/3] hw_breakpoint: cleanups Frederic Weisbecker
  3 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 19:46 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

Change toggle_bp_slot() to make "weight" negative if !enable. This
way we can always use "+ weight" without additional "if (enable)"
check and toggle_bp_task_slot() no longer needs this arg.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/hw_breakpoint.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 21a3c88..2f4d7c4 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -182,7 +182,7 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 /*
  * Add a pinned breakpoint for the given task in our constraint table
  */
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
+static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 				enum bp_type_idx type, int weight)
 {
 	/* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
@@ -190,10 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
 	int old_idx, new_idx;
 
 	old_idx = task_bp_pinned(cpu, bp, type) - 1;
-	if (enable)
-		new_idx = old_idx + weight;
-	else
-		new_idx = old_idx - weight;
+	new_idx = old_idx + weight;
 
 	if (old_idx >= 0)
 		tsk_pinned[old_idx]--;
@@ -211,22 +208,21 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	int cpu = bp->cpu;
 	struct task_struct *tsk = bp->hw.bp_target;
 
+	if (!enable)
+		weight = -weight;
+
 	/* Pinned counter cpu profiling */
 	if (!tsk) {
-
-		if (enable)
-			per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
-		else
-			per_cpu(nr_cpu_bp_pinned[type], bp->cpu) -= weight;
+		per_cpu(nr_cpu_bp_pinned[type], cpu) += weight;
 		return;
 	}
 
 	/* Pinned counter task profiling */
 	if (cpu >= 0) {
-		toggle_bp_task_slot(bp, cpu, enable, type, weight);
+		toggle_bp_task_slot(bp, cpu, type, weight);
 	} else {
 		for_each_possible_cpu(cpu)
-			toggle_bp_task_slot(bp, cpu, enable, type, weight);
+			toggle_bp_task_slot(bp, cpu, type, weight);
 	}
 
 	if (enable)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp()
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
  2013-06-01 19:45   ` [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths Oleg Nesterov
  2013-06-01 19:46   ` [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage " Oleg Nesterov
@ 2013-06-01 19:46   ` Oleg Nesterov
  2013-06-15 13:29     ` Frederic Weisbecker
  2013-06-13 14:01   ` [PATCH 0/3] hw_breakpoint: cleanups Frederic Weisbecker
  3 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-01 19:46 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

Add the trivial helper which simply returns cpumask_of() or
cpu_possible_mask depending on bp->cpu.

Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
code duplication.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/hw_breakpoint.c |   43 ++++++++++++++++------------------------
 1 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 2f4d7c4..57efe5d 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -127,6 +127,13 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 	return count;
 }
 
+static const struct cpumask *cpumask_of_bp(struct perf_event *bp)
+{
+	if (bp->cpu >= 0)
+		return cpumask_of(bp->cpu);
+	return cpu_possible_mask;
+}
+
 /*
  * Report the number of pinned/un-pinned breakpoints we have in
  * a given cpu (cpu > -1) or in all of them (cpu = -1).
@@ -135,25 +142,13 @@ static void
 fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		    enum bp_type_idx type)
 {
-	int cpu = bp->cpu;
-	struct task_struct *tsk = bp->hw.bp_target;
-
-	if (cpu >= 0) {
-		slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
-		if (!tsk)
-			slots->pinned += max_task_bp_pinned(cpu, type);
-		else
-			slots->pinned += task_bp_pinned(cpu, bp, type);
-		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
-
-		return;
-	}
+	const struct cpumask *cpumask = cpumask_of_bp(bp);
+	int cpu;
 
-	for_each_possible_cpu(cpu) {
-		unsigned int nr;
+	for_each_cpu(cpu, cpumask) {
+		unsigned int nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
 
-		nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
-		if (!tsk)
+		if (!bp->hw.bp_target)
 			nr += max_task_bp_pinned(cpu, type);
 		else
 			nr += task_bp_pinned(cpu, bp, type);
@@ -205,25 +200,21 @@ static void
 toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	       int weight)
 {
-	int cpu = bp->cpu;
-	struct task_struct *tsk = bp->hw.bp_target;
+	const struct cpumask *cpumask = cpumask_of_bp(bp);
+	int cpu;
 
 	if (!enable)
 		weight = -weight;
 
 	/* Pinned counter cpu profiling */
-	if (!tsk) {
-		per_cpu(nr_cpu_bp_pinned[type], cpu) += weight;
+	if (!bp->hw.bp_target) {
+		per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
 		return;
 	}
 
 	/* Pinned counter task profiling */
-	if (cpu >= 0) {
+	for_each_cpu(cpu, cpumask)
 		toggle_bp_task_slot(bp, cpu, type, weight);
-	} else {
-		for_each_possible_cpu(cpu)
-			toggle_bp_task_slot(bp, cpu, type, weight);
-	}
 
 	if (enable)
 		list_add_tail(&bp->hw.bp_list, &bp_task_head);
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 0/2] hw_breakpoint: more cleanups
  2013-05-20 16:19 WARN_ONCE in arch/x86/kernel/hw_breakpoint.c Vince Weaver
                   ` (2 preceding siblings ...)
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
@ 2013-06-02 19:49 ` Oleg Nesterov
  2013-06-02 19:50   ` [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint() Oleg Nesterov
  2013-06-02 19:50   ` [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo" Oleg Nesterov
  3 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-02 19:49 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

Hello.

More cleanups, on top of "[PATCH 0/3] hw_breakpoint: cleanups".

Oleg.

 kernel/events/hw_breakpoint.c |  103 +++++++++++++++++++----------------------
 1 files changed, 47 insertions(+), 56 deletions(-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  2013-06-02 19:49 ` [PATCH 0/2] hw_breakpoint: more cleanups Oleg Nesterov
@ 2013-06-02 19:50   ` Oleg Nesterov
  2013-06-18  0:12     ` Frederic Weisbecker
  2013-06-02 19:50   ` [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo" Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-02 19:50 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

1. register_wide_hw_breakpoint() can use unregister_ on failure,
   no need to duplicate the code.

2. "struct perf_event **pevent" adds an unnecessary level of
   indirection and complication; use per_cpu(*cpu_events, cpu).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/hw_breakpoint.c |   34 +++++++++++-----------------------
 1 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 74d739b..17d8093 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -497,8 +497,8 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
 			    void *context)
 {
-	struct perf_event * __percpu *cpu_events, **pevent, *bp;
-	long err;
+	struct perf_event * __percpu *cpu_events, *bp;
+	long err = 0;
 	int cpu;
 
 	cpu_events = alloc_percpu(typeof(*cpu_events));
@@ -507,31 +507,21 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
 
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
-		pevent = per_cpu_ptr(cpu_events, cpu);
 		bp = perf_event_create_kernel_counter(attr, cpu, NULL,
 						      triggered, context);
-
-		*pevent = bp;
-
 		if (IS_ERR(bp)) {
 			err = PTR_ERR(bp);
-			goto fail;
+			break;
 		}
-	}
-	put_online_cpus();
-
-	return cpu_events;
 
-fail:
-	for_each_online_cpu(cpu) {
-		pevent = per_cpu_ptr(cpu_events, cpu);
-		if (IS_ERR(*pevent))
-			break;
-		unregister_hw_breakpoint(*pevent);
+		per_cpu(*cpu_events, cpu) = bp;
 	}
 	put_online_cpus();
 
-	free_percpu(cpu_events);
+	if (likely(!err))
+		return cpu_events;
+
+	unregister_wide_hw_breakpoint(cpu_events);
 	return (void __percpu __force *)ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
@@ -543,12 +533,10 @@ EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
 void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events)
 {
 	int cpu;
-	struct perf_event **pevent;
 
-	for_each_possible_cpu(cpu) {
-		pevent = per_cpu_ptr(cpu_events, cpu);
-		unregister_hw_breakpoint(*pevent);
-	}
+	for_each_possible_cpu(cpu)
+		unregister_hw_breakpoint(per_cpu(*cpu_events, cpu));
+
 	free_percpu(cpu_events);
 }
 EXPORT_SYMBOL_GPL(unregister_wide_hw_breakpoint);
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
  2013-06-02 19:49 ` [PATCH 0/2] hw_breakpoint: more cleanups Oleg Nesterov
  2013-06-02 19:50   ` [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint() Oleg Nesterov
@ 2013-06-02 19:50   ` Oleg Nesterov
  2013-06-18 12:37     ` Frederic Weisbecker
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-02 19:50 UTC (permalink / raw)
  To: Ingo Molnar, Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, trinity, Jiri Olsa

This patch simply moves all per-cpu variables into the new single
per-cpu "struct bp_cpuinfo".

To me this looks more logical and clean, but this can also simplify
the further potential changes. In particular, I do not think this
memory should be per-cpu, it is never used "locally". After this
change it is trivial to turn it into, say, bootmem[nr_cpu_ids].

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/hw_breakpoint.c |   69 +++++++++++++++++++++-------------------
 1 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 17d8093..42c47a8 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -46,23 +46,26 @@
 #include <linux/smp.h>
 
 #include <linux/hw_breakpoint.h>
-
-
 /*
  * Constraints data
  */
+struct bp_cpuinfo {
+	/* Number of pinned cpu breakpoints in a cpu */
+	unsigned int	cpu_pinned;
+	/* tsk_pinned[n] is the number of tasks having n+1 breakpoints */
+	unsigned int	*tsk_pinned;
+	/* Number of non-pinned cpu/task breakpoints in a cpu */
+	unsigned int	flexible; /* XXX: placeholder, see fetch_this_slot() */
+};
 
-/* Number of pinned cpu breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_cpu_bp_pinned[TYPE_MAX]);
-
-/* Number of pinned task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int *, nr_task_bp_pinned[TYPE_MAX]);
-
-/* Number of non-pinned cpu/task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_bp_flexible[TYPE_MAX]);
-
+static DEFINE_PER_CPU(struct bp_cpuinfo, bp_cpuinfo[TYPE_MAX]);
 static int nr_slots[TYPE_MAX];
 
+static struct bp_cpuinfo *get_bp_info(int cpu, enum bp_type_idx type)
+{
+	return per_cpu_ptr(bp_cpuinfo + type, cpu);
+}
+
 /* Keep track of the breakpoints attached to tasks */
 static LIST_HEAD(bp_task_head);
 
@@ -96,8 +99,8 @@ static inline enum bp_type_idx find_slot_idx(struct perf_event *bp)
  */
 static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
 {
+	unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int i;
-	unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
 
 	for (i = nr_slots[type] - 1; i >= 0; i--) {
 		if (tsk_pinned[i] > 0)
@@ -146,8 +149,10 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 	int cpu;
 
 	for_each_cpu(cpu, cpumask) {
-		unsigned int nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
+		struct bp_cpuinfo *info = get_bp_info(cpu, type);
+		int nr;
 
+		nr = info->cpu_pinned;
 		if (!bp->hw.bp_target)
 			nr += max_task_bp_pinned(cpu, type);
 		else
@@ -156,8 +161,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (nr > slots->pinned)
 			slots->pinned = nr;
 
-		nr = per_cpu(nr_bp_flexible[type], cpu);
-
+		nr = info->flexible;
 		if (nr > slots->flexible)
 			slots->flexible = nr;
 	}
@@ -180,8 +184,7 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 				enum bp_type_idx type, int weight)
 {
-	/* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
-	unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+	unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int old_idx, new_idx;
 
 	old_idx = task_bp_pinned(cpu, bp, type) - 1;
@@ -208,7 +211,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 
 	/* Pinned counter cpu profiling */
 	if (!bp->hw.bp_target) {
-		per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
+		get_bp_info(bp->cpu, type)->cpu_pinned += weight;
 		return;
 	}
 
@@ -240,8 +243,8 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
  *
  *   - If attached to a single cpu, check:
  *
- *       (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
- *           + max(per_cpu(nr_task_bp_pinned, cpu)))) < HBP_NUM
+ *       (per_cpu(info->flexible, cpu) || (per_cpu(info->cpu_pinned, cpu)
+ *           + max(per_cpu(info->tsk_pinned, cpu)))) < HBP_NUM
  *
  *       -> If there are already non-pinned counters in this cpu, it means
  *          there is already a free slot for them.
@@ -251,8 +254,8 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
  *
  *   - If attached to every cpus, check:
  *
- *       (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
- *           + max(per_cpu(nr_task_bp_pinned, *)))) < HBP_NUM
+ *       (per_cpu(info->flexible, *) || (max(per_cpu(info->cpu_pinned, *))
+ *           + max(per_cpu(info->tsk_pinned, *)))) < HBP_NUM
  *
  *       -> This is roughly the same, except we check the number of per cpu
  *          bp for every cpu and we keep the max one. Same for the per tasks
@@ -263,16 +266,16 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
  *
  *   - If attached to a single cpu, check:
  *
- *       ((per_cpu(nr_bp_flexible, cpu) > 1) + per_cpu(nr_cpu_bp_pinned, cpu)
- *            + max(per_cpu(nr_task_bp_pinned, cpu))) < HBP_NUM
+ *       ((per_cpu(info->flexible, cpu) > 1) + per_cpu(info->cpu_pinned, cpu)
+ *            + max(per_cpu(info->tsk_pinned, cpu))) < HBP_NUM
  *
- *       -> Same checks as before. But now the nr_bp_flexible, if any, must keep
+ *       -> Same checks as before. But now the info->flexible, if any, must keep
  *          one register at least (or they will never be fed).
  *
  *   - If attached to every cpus, check:
  *
- *       ((per_cpu(nr_bp_flexible, *) > 1) + max(per_cpu(nr_cpu_bp_pinned, *))
- *            + max(per_cpu(nr_task_bp_pinned, *))) < HBP_NUM
+ *       ((per_cpu(info->flexible, *) > 1) + max(per_cpu(info->cpu_pinned, *))
+ *            + max(per_cpu(info->tsk_pinned, *))) < HBP_NUM
  */
 static int __reserve_bp_slot(struct perf_event *bp)
 {
@@ -617,7 +620,6 @@ static struct pmu perf_breakpoint = {
 
 int __init init_hw_breakpoint(void)
 {
-	unsigned int **task_bp_pinned;
 	int cpu, err_cpu;
 	int i;
 
@@ -626,10 +628,11 @@ int __init init_hw_breakpoint(void)
 
 	for_each_possible_cpu(cpu) {
 		for (i = 0; i < TYPE_MAX; i++) {
-			task_bp_pinned = &per_cpu(nr_task_bp_pinned[i], cpu);
-			*task_bp_pinned = kzalloc(sizeof(int) * nr_slots[i],
-						  GFP_KERNEL);
-			if (!*task_bp_pinned)
+			struct bp_cpuinfo *info = get_bp_info(cpu, i);
+
+			info->tsk_pinned = kcalloc(nr_slots[i], sizeof(int),
+							GFP_KERNEL);
+			if (!info->tsk_pinned)
 				goto err_alloc;
 		}
 	}
@@ -643,7 +646,7 @@ int __init init_hw_breakpoint(void)
  err_alloc:
 	for_each_possible_cpu(err_cpu) {
 		for (i = 0; i < TYPE_MAX; i++)
-			kfree(per_cpu(nr_task_bp_pinned[i], err_cpu));
+			kfree(get_bp_info(err_cpu, i)->tsk_pinned);
 		if (err_cpu == cpu)
 			break;
 	}
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] hw_breakpoint: cleanups
  2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
                     ` (2 preceding siblings ...)
  2013-06-01 19:46   ` [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp() Oleg Nesterov
@ 2013-06-13 14:01   ` Frederic Weisbecker
  2013-06-13 15:15     ` Oleg Nesterov
  3 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 14:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
> Hello.
> 
> Cleanups, on top of
> 
> 	[PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

So this series doesn't have the fix for the warning?

> 
> series.
> 
> Oleg.
> 
>  kernel/events/hw_breakpoint.c |   91 ++++++++++++++++-------------------------
>  1 files changed, 35 insertions(+), 56 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  2013-06-01 18:21   ` [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) Oleg Nesterov
@ 2013-06-13 14:20     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 14:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 08:21:20PM +0200, Oleg Nesterov wrote:
> The trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint slot")
> in arch_install_hw_breakpoint(), but the problem is not arch-specific.
> 
> The problem is, task_bp_pinned(cpu) checks "cpu == iter->cpu" but
> this doesn't account for the "all cpus" events with iter->cpu < 0.
> 
> This means that, say, register_user_hw_breakpoint(tsk) can happily
> create an arbitrary number (> HBP_NUM) of breakpoints which can not
> be activated. toggle_bp_task_slot() is equally wrong for the same
> reason, and nr_task_bp_pinned[] can have negative entries.
> 
> Simple test:
> 
> 	# perl -e 'sleep 1 while 1' &
> 	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`
> 
> Before this patch this triggers the same problem/WARN_ON(), after
> the patch it correctly fails with -ENOSPC.
> 
> Reported-by: Vince Weaver <vincent.weaver@maine.edu>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: <stable@vger.kernel.org>

Looks good, thanks!

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>


> ---
>  kernel/events/hw_breakpoint.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index ed1c897..29d3abe 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
>  	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
>  		if (iter->hw.bp_target == tsk &&
>  		    find_slot_idx(iter) == type &&
> -		    cpu == iter->cpu)
> +		    (iter->cpu < 0 || cpu == iter->cpu))
>  			count += hw_breakpoint_weight(iter);
>  	}
>  
> -- 
> 1.5.5.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] hw_breakpoint: cleanups
  2013-06-13 14:01   ` [PATCH 0/3] hw_breakpoint: cleanups Frederic Weisbecker
@ 2013-06-13 15:15     ` Oleg Nesterov
  2013-06-13 15:24       ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-13 15:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On 06/13, Frederic Weisbecker wrote:
>
> On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > Cleanups, on top of
> >
> > 	[PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
>
> So this series doesn't have the fix for the warning?

I don't understand the question ;)

The previous series

	[PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
	[PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()

tried to fix the bugs which lead to this (correct) warning.

This and the next one try to cleanup the code.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] hw_breakpoint: cleanups
  2013-06-13 15:15     ` Oleg Nesterov
@ 2013-06-13 15:24       ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 15:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

2013/6/13 Oleg Nesterov <oleg@redhat.com>:
> On 06/13, Frederic Weisbecker wrote:
>>
>> On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
>> > Hello.
>> >
>> > Cleanups, on top of
>> >
>> >     [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
>>
>> So this series doesn't have the fix for the warning?
>
> I don't understand the question ;)
>
> The previous series
>
>         [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
>         [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
>
> tried to fix the bugs which lead to this (correct) warning.
>
> This and the next one try to cleanup the code.

Ah ok, there are two series (/me confused as usual :)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  2013-06-01 18:21   ` [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() Oleg Nesterov
@ 2013-06-15 12:46     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-15 12:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 08:21:39PM +0200, Oleg Nesterov wrote:
> fetch_bp_busy_slots() and toggle_bp_slot() use for_each_online_cpu();
> this is obviously wrong wrt cpu_up() or cpu_down(): we can over/under
> account the per-cpu numbers.
> 
> For example:
> 
> 	# echo 0 >> /sys/devices/system/cpu/cpu1/online
> 	# perf record -e mem:0x10 -p 1 &
> 	# echo 1 >> /sys/devices/system/cpu/cpu1/online
> 	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
> 	# taskset -p 0x2 1
> 
> triggers the same WARN_ONCE("Can't find any breakpoint slot") in
> arch_install_hw_breakpoint().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: <stable@vger.kernel.org>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  2013-06-01 19:45   ` [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths Oleg Nesterov
@ 2013-06-15 12:59     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-15 12:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 09:45:48PM +0200, Oleg Nesterov wrote:
> The enable/disable logic in toggle_bp_slot() is not symmetrical
> and imho very confusing. "old_count" in toggle_bp_task_slot() is
> actually new_count because this bp was already removed from the
> list.
> 
> Change toggle_bp_slot() to always call list_add/list_del after
> toggle_bp_task_slot(). This way old_idx comes from task_bp_pinned()
> and that entry should be decremented; new_idx is old_idx +/- weight
> and that element needs to be incremented. The code/logic looks obvious.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Nice :-)

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  2013-06-01 19:46   ` [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage " Oleg Nesterov
@ 2013-06-15 13:14     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-15 13:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 09:46:06PM +0200, Oleg Nesterov wrote:
> Change toggle_bp_slot() to make "weight" negative if !enable. This
> way we can always use "+ weight" without additional "if (enable)"
> check and toggle_bp_task_slot() no longer needs this arg.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp()
  2013-06-01 19:46   ` [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp() Oleg Nesterov
@ 2013-06-15 13:29     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-15 13:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sat, Jun 01, 2013 at 09:46:26PM +0200, Oleg Nesterov wrote:
> Add the trivial helper which simply returns cpumask_of() or
> cpu_possible_mask depending on bp->cpu.
> 
> Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
> for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
> code duplication.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  2013-06-02 19:50   ` [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint() Oleg Nesterov
@ 2013-06-18  0:12     ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-18  0:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sun, Jun 02, 2013 at 09:50:11PM +0200, Oleg Nesterov wrote:
> 1. register_wide_hw_breakpoint() can use unregister_ on failure,
>    no need to duplicate the code.
> 
> 2. "struct perf_event **pevent" adds an unnecessary level of
>    indirection and complication; use per_cpu(*cpu_events, cpu).
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
  2013-06-02 19:50   ` [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo" Oleg Nesterov
@ 2013-06-18 12:37     ` Frederic Weisbecker
  2013-06-18 14:42       ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-18 12:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> This patch simply moves all per-cpu variables into the new single
> per-cpu "struct bp_cpuinfo".
> 
> To me this looks more logical and clean, but this can also simplify
> the further potential changes. In particular, I do not think this
> memory should be per-cpu, it is never used "locally". After this
> change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

I'm ok with the patch because it's indeed more logical and clean to pack the info
to a single struct.

But I'm not sure why you think using per-cpu is a problem. It's not only
deemed for optimized local uses, it's also convenient for allocations and
de-allocation, or static definitions. I'm not sure why bootmem would make
more sense.

Other than this in the changelog, the patch is nice, thanks!

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
  2013-06-18 12:37     ` Frederic Weisbecker
@ 2013-06-18 14:42       ` Oleg Nesterov
  2013-06-18 17:01         ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-18 14:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On 06/18, Frederic Weisbecker wrote:
>
> On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> > This patch simply moves all per-cpu variables into the new single
> > per-cpu "struct bp_cpuinfo".
> >
> > To me this looks more logical and clean, but this can also simplify
> > the further potential changes. In particular, I do not think this
> > memory should be per-cpu, it is never used "locally". After this
> > change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> I'm ok with the patch because it's indeed more logical and clean to pack the info
> to a single struct.

Great,

> But I'm not sure why you think using per-cpu is a problem. It's not only
> deemed for optimized local uses,

But it is.

Simplest example,

	for_each_possible_cpu(cpu)
		total_count += per_cpu(per_cpu_count, cpu);

Every per_cpu() likely means a cache miss. Not to mention we need the
additional math to calculate the address of the local counter.

	for_each_possible_cpu(cpu)
		total_count += bootmem_or_kmalloc_array[cpu];

is much better in this respect.

And note also that per_cpu_count above can share the cacheline with
another "hot" per-cpu variable.

> it's also convenient for allocations and
> de-allocation, or static definitions.

Yes, this is an advantage. But afaics the only one.

> I'm not sure why bootmem would make
> more sense.

Or kcalloc(nr_cpu_ids), I didn't really mean that alloc_bootmem() is
necessarily the best option.
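
Something like the rough sketch below is all I mean (illustrative only,
not a patch; the names and error handling are simplified):

	/* a flat array indexed by cpu instead of a per-cpu variable */
	static struct bp_cpuinfo *bp_info[TYPE_MAX];

	static int __init alloc_bp_info(void)
	{
		int i;

		for (i = 0; i < TYPE_MAX; i++) {
			bp_info[i] = kcalloc(nr_cpu_ids, sizeof(*bp_info[i]),
					     GFP_KERNEL);
			if (!bp_info[i])
				return -ENOMEM;	/* freeing earlier entries omitted */
		}
		return 0;
	}

	/* then get_bp_info(cpu, type) is simply &bp_info[type][cpu] */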

> Other than this in the changelog, the patch is nice, thanks!
>
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

Thanks ;)

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
  2013-06-18 14:42       ` Oleg Nesterov
@ 2013-06-18 17:01         ` Frederic Weisbecker
  2013-06-19 15:54           ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-06-18 17:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> On 06/18, Frederic Weisbecker wrote:
> >
> > On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> > > This patch simply moves all per-cpu variables into the new single
> > > per-cpu "struct bp_cpuinfo".
> > >
> > > To me this looks more logical and clean, but this can also simplify
> > > the further potential changes. In particular, I do not think this
> > > memory should be per-cpu, it is never used "locally". After this
> > > change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
> > >
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >
> > I'm ok with the patch because it's indeed more logical and clean to pack the info
> > to a single struct.
> 
> Great,
> 
> > But I'm not sure why you think using per-cpu is a problem. It's not only
> > deemed for optimized local uses,
> 
> But it is.
> 
> Simplest example,
> 
> 	for_each_possible_cpu(cpu)
> 		total_count += per_cpu(per_cpu_count, cpu);
> 
> Every per_cpu() likely means a cache miss. Not to mention we need the
> additional math to calculate the address of the local counter.
> 
> 	for_each_possible_cpu(cpu)
> 		total_count += bootmem_or_kmalloc_array[cpu];
> 
> is much better in this respect.
> 
> And note also that per_cpu_count above can share the cacheline with
> another "hot" per-cpu variable.

Ah I see, that's good to know.

But these variables are supposed to only be touched from slow path
(perf events syscall, ptrace breakpoints creation, etc...), right?
So this is probably not a problem?

> 
> > it's also convenient for allocations and
> > de-allocation, or static definitions.
> 
> Yes, this is an advantage. But afaics the only one.
> 
> > I'm not sure why bootmem would make
> > more sense.
> 
> Or kcalloc(nr_cpu_ids), I didn't really mean that alloc_bootmem() is
> necessarily the best option.

Ok.

Well, if there is any real performance issue I don't mind using arrays,
of course.

> 
> > Other than this in the changelog, the patch is nice, thanks!
> >
> > Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> Thanks ;)
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
  2013-06-18 17:01         ` Frederic Weisbecker
@ 2013-06-19 15:54           ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2013-06-19 15:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, trinity, Jiri Olsa

On 06/18, Frederic Weisbecker wrote:
>
> On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> >
> > Simplest example,
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count += per_cpu(per_cpu_count, cpu);
> >
> > Every per_cpu() likely means a cache miss. Not to mention we need the
> > additional math to calculate the address of the local counter.
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count += bootmem_or_kmalloc_array[cpu];
> >
> > is much better in this respect.
> >
> > And note also that per_cpu_count above can share the cacheline with
> > another "hot" per-cpu variable.
>
> Ah I see, that's good to know.
>
> But these variables are supposed to only be touched from slow path
> (perf events syscall, ptrace breakpoints creation, etc...), right?
> So this is probably not a problem?

Yes, sure. But please note that this can also penalize other CPUs.
For example, toggle_bp_slot() writes to per_cpu(nr_cpu_bp_pinned);
this invalidates the cacheline, which can contain another per-cpu
variable.

But let me clarify. I agree this all is minor; I am not trying to
say this change can actually improve the performance.

The main point of this patch is to make the code look a bit better,
and you seem to agree. The changelog mentions s/percpu/array/ only
as a potential change which obviously needs more discussion; I didn't
mean that we should necessarily do this.

Although yes, personally I really dislike per-cpu in this case, but
of course this is subjective and I won't argue ;)

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2013-05-20 16:19 WARN_ONCE in arch/x86/kernel/hw_breakpoint.c Vince Weaver
2013-05-28 17:00 ` Oleg Nesterov
2013-05-28 17:28   ` Oleg Nesterov
2013-05-28 18:47     ` Oleg Nesterov
2013-05-29 16:32       ` [MAYBEPATCH] : " Oleg Nesterov
2013-06-01 18:20 ` [PATCH 0/2]: " Oleg Nesterov
2013-06-01 18:21   ` [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) Oleg Nesterov
2013-06-13 14:20     ` Frederic Weisbecker
2013-06-01 18:21   ` [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() Oleg Nesterov
2013-06-15 12:46     ` Frederic Weisbecker
2013-06-01 19:45 ` [PATCH 0/3] hw_breakpoint: cleanups Oleg Nesterov
2013-06-01 19:45   ` [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths Oleg Nesterov
2013-06-15 12:59     ` Frederic Weisbecker
2013-06-01 19:46   ` [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage " Oleg Nesterov
2013-06-15 13:14     ` Frederic Weisbecker
2013-06-01 19:46   ` [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp() Oleg Nesterov
2013-06-15 13:29     ` Frederic Weisbecker
2013-06-13 14:01   ` [PATCH 0/3] hw_breakpoint: cleanups Frederic Weisbecker
2013-06-13 15:15     ` Oleg Nesterov
2013-06-13 15:24       ` Frederic Weisbecker
2013-06-02 19:49 ` [PATCH 0/2] hw_breakpoint: more cleanups Oleg Nesterov
2013-06-02 19:50   ` [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint() Oleg Nesterov
2013-06-18  0:12     ` Frederic Weisbecker
2013-06-02 19:50   ` [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo" Oleg Nesterov
2013-06-18 12:37     ` Frederic Weisbecker
2013-06-18 14:42       ` Oleg Nesterov
2013-06-18 17:01         ` Frederic Weisbecker
2013-06-19 15:54           ` Oleg Nesterov
