linux-kernel.vger.kernel.org archive mirror
* [PATCH] perf/core: fix group {cpu,task} validation
@ 2017-06-22 14:41 Mark Rutland
  2017-06-23  0:56 ` zhouchengming
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Mark Rutland @ 2017-06-22 14:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Peter Zijlstra, Zhou Chengming

Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully initialised) groups.

Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.

For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-cpu basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating their (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.

This can be triggered with the below test case, resulting in warnings
from arch backends.

  #define _GNU_SOURCE
  #include <linux/hw_breakpoint.h>
  #include <linux/perf_event.h>
  #include <sched.h>
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
  			   int group_fd, unsigned long flags)
  {
  	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }

  char watched_char;

  struct perf_event_attr wp_attr = {
  	.type = PERF_TYPE_BREAKPOINT,
  	.bp_type = HW_BREAKPOINT_RW,
  	.bp_addr = (unsigned long)&watched_char,
  	.bp_len = 1,
  	.size = sizeof(wp_attr),
  };

  int main(int argc, char *argv[])
  {
  	int leader, ret;
  	cpu_set_t cpus;

  	/*
  	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
  	 */
  	CPU_ZERO(&cpus);
  	CPU_SET(0, &cpus);
  	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
  	if (ret) {
  		printf("Unable to set cpu affinity\n");
  		return 1;
  	}

  	/* open leader event, bound to this task, CPU0 only */
  	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
  	if (leader < 0) {
  		printf("Couldn't open leader: %d\n", leader);
  		return 1;
  	}

  	/*
  	 * Open a follower event that is bound to the same task, but a
  	 * different CPU. This means that it should never be possible to
  	 * schedule the group.
  	 */
  	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
  	if (ret < 0) {
  		printf("Couldn't open mismatched follower: %d\n", ret);
  		return 1;
  	} else {
  		printf("Opened leader/follower with mismatched CPUs\n");
  	}

  	/*
  	 * Open as many independent events as we can, all bound to the same
  	 * task, CPU0 only.
  	 */
  	do {
  		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
  	} while (ret >= 0);

  	/*
  	 * Force enable/disable all events to trigger the erroneous
  	 * installation of the follower event.
  	 */
  	printf("Opened all events. Toggling..\n");
  	for (;;) {
  		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
  		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
  	}

  	return 0;
  }

Fix this by validating this requirement regardless of whether we're
moving events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/events/core.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523..1dca484 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10010,28 +10010,27 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
 			goto err_context;
 
 		/*
-		 * Do not allow to attach to a group in a different
-		 * task or CPU context:
+		 * Make sure we're both events for the same CPU;
+		 * grouping events for different CPUs is broken; since
+		 * you can never concurrently schedule them anyhow.
 		 */
-		if (move_group) {
-			/*
-			 * Make sure we're both on the same task, or both
-			 * per-cpu events.
-			 */
-			if (group_leader->ctx->task != ctx->task)
-				goto err_context;
+		if (group_leader->cpu != event->cpu)
+			goto err_context;
 
-			/*
-			 * Make sure we're both events for the same CPU;
-			 * grouping events for different CPUs is broken; since
-			 * you can never concurrently schedule them anyhow.
-			 */
-			if (group_leader->cpu != event->cpu)
-				goto err_context;
-		} else {
-			if (group_leader->ctx != ctx)
-				goto err_context;
-		}
+		/*
+		 * Make sure we're both on the same task, or both
+		 * per-cpu events.
+		 */
+		if (group_leader->ctx->task != ctx->task)
+			goto err_context;
+
+		/*
+		 * Do not allow to attach to a group in a different task
+		 * or CPU context. If we're moving SW events, we'll fix
+		 * this up later, so allow that.
+		 */
+		if (!move_group && group_leader->ctx != ctx)
+			goto err_context;
 
 		/*
 		 * Only a group leader can be exclusive or pinned
-- 
1.9.1


* Re: [PATCH] perf/core: fix group {cpu,task} validation
  2017-06-22 14:41 [PATCH] perf/core: fix group {cpu,task} validation Mark Rutland
@ 2017-06-23  0:56 ` zhouchengming
  2017-06-23 10:07   ` Mark Rutland
  2017-08-21 15:53 ` Peter Zijlstra
  2017-08-25 11:51 ` [tip:perf/core] perf/core: Fix " tip-bot for Mark Rutland
  2 siblings, 1 reply; 7+ messages in thread
From: zhouchengming @ 2017-06-23  0:56 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Peter Zijlstra, Li Bin

On 2017/6/22 22:41, Mark Rutland wrote:
> Regardless of which events form a group, it does not make sense for the
> events to target different tasks and/or CPUs, as this leaves the group
> inconsistent and impossible to schedule. The core perf code assumes that
> these are consistent across (successfully initialised) groups.
>
> Core perf code only verifies this when moving SW events into a HW
> context. Thus, we can violate this requirement for pure SW groups and
> pure HW groups, unless the relevant PMU driver happens to perform this
> verification itself. These mismatched groups subsequently wreak havoc
> elsewhere.
>
> For example, we handle watchpoints as SW events, and reserve watchpoint
> HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> that is initialised is guaranteed to have a slot at pmu::add() time.
> However, the core code only checks the group leader's cpu filter (via
> event_filter_match()), and can thus install follower events onto CPUs
> violating their (mismatched) CPU filters, potentially installing them
> into a CPU without sufficient reserved slots.
>
> This can be triggered with the below test case, resulting in warnings
> from arch backends.
>
>    #define _GNU_SOURCE
>    #include <linux/hw_breakpoint.h>
>    #include <linux/perf_event.h>
>    #include <sched.h>
>    #include <stdio.h>
>    #include <sys/prctl.h>
>    #include <sys/syscall.h>
>    #include <unistd.h>
>
>    static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
>    			   int group_fd, unsigned long flags)
>    {
>    	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
>    }
>
>    char watched_char;
>
>    struct perf_event_attr wp_attr = {
>    	.type = PERF_TYPE_BREAKPOINT,
>    	.bp_type = HW_BREAKPOINT_RW,
>    	.bp_addr = (unsigned long)&watched_char,
>    	.bp_len = 1,
>    	.size = sizeof(wp_attr),
>    };
>
>    int main(int argc, char *argv[])
>    {
>    	int leader, ret;
>    	cpu_set_t cpus;
>
>    	/*
>    	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
>    	 */
>    	CPU_ZERO(&cpus);
>    	CPU_SET(0, &cpus);
>    	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
>    	if (ret) {
>    		printf("Unable to set cpu affinity\n");
>    		return 1;
>    	}
>
>    	/* open leader event, bound to this task, CPU0 only */
>    	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
>    	if (leader < 0) {
>    		printf("Couldn't open leader: %d\n", leader);
>    		return 1;
>    	}
>
>    	/*
>    	 * Open a follower event that is bound to the same task, but a
>    	 * different CPU. This means that it should never be possible to
>    	 * schedule the group.
>    	 */
>    	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
>    	if (ret < 0) {
>    		printf("Couldn't open mismatched follower: %d\n", ret);
>    		return 1;
>    	} else {
>    		printf("Opened leader/follower with mismatched CPUs\n");
>    	}
>
>    	/*
>    	 * Open as many independent events as we can, all bound to the same
>    	 * task, CPU0 only.
>    	 */
>    	do {
>    		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
>    	} while (ret >= 0);
>
>    	/*
>    	 * Force enable/disable all events to trigger the erroneous
>    	 * installation of the follower event.
>    	 */
>    	printf("Opened all events. Toggling..\n");
>    	for (;;) {
>    		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
>    		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
>    	}
>
>    	return 0;
>    }

Very good example!

>
> Fix this by validating this requirement regardless of whether we're
> moving events.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Zhou Chengming <zhouchengming1@huawei.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>   kernel/events/core.c | 39 +++++++++++++++++++--------------------
>   1 file changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 6c4e523..1dca484 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10010,28 +10010,27 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
>   			goto err_context;
>
>   		/*
> -		 * Do not allow to attach to a group in a different
> -		 * task or CPU context:
> +		 * Make sure we're both events for the same CPU;
> +		 * grouping events for different CPUs is broken; since
> +		 * you can never concurrently schedule them anyhow.
>   		 */
> -		if (move_group) {
> -			/*
> -			 * Make sure we're both on the same task, or both
> -			 * per-cpu events.
> -			 */
> -			if (group_leader->ctx->task != ctx->task)
> -				goto err_context;
> +		if (group_leader->cpu != event->cpu)
> +			goto err_context;
>
> -			/*
> -			 * Make sure we're both events for the same CPU;
> -			 * grouping events for different CPUs is broken; since
> -			 * you can never concurrently schedule them anyhow.
> -			 */
> -			if (group_leader->cpu != event->cpu)
> -				goto err_context;
> -		} else {
> -			if (group_leader->ctx != ctx)
> -				goto err_context;
> -		}
> +		/*
> +		 * Make sure we're both on the same task, or both
> +		 * per-cpu events.
> +		 */
> +		if (group_leader->ctx->task != ctx->task)
> +			goto err_context;
> +
> +		/*
> +		 * Do not allow to attach to a group in a different task
> +		 * or CPU context. If we're moving SW events, we'll fix
> +		 * this up later, so allow that.
> +		 */
> +		if (!move_group && group_leader->ctx != ctx)
> +			goto err_context;

We don't need to check move_group here; the previous two checks already make sure
the events are on the same task and the same CPU. So when move_group is needed, they
will be moved to the same taskctx or cpuctx.

Thanks.

>
>   		/*
>   		 * Only a group leader can be exclusive or pinned


* Re: [PATCH] perf/core: fix group {cpu,task} validation
  2017-06-23  0:56 ` zhouchengming
@ 2017-06-23 10:07   ` Mark Rutland
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Rutland @ 2017-06-23 10:07 UTC (permalink / raw)
  To: zhouchengming
  Cc: linux-kernel, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Peter Zijlstra, Li Bin

Hi,

On Fri, Jun 23, 2017 at 08:56:38AM +0800, zhouchengming wrote:
> On 2017/6/22 22:41, Mark Rutland wrote:
> >Regardless of which events form a group, it does not make sense for the
> >events to target different tasks and/or CPUs, as this leaves the group
> >inconsistent and impossible to schedule. The core perf code assumes that
> >these are consistent across (successfully initialised) groups.
> >
> >Core perf code only verifies this when moving SW events into a HW
> >context. Thus, we can violate this requirement for pure SW groups and
> >pure HW groups, unless the relevant PMU driver happens to perform this
> >verification itself. These mismatched groups subsequently wreak havoc
> >elsewhere.
> >
> >For example, we handle watchpoints as SW events, and reserve watchpoint
> >HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> >that is initialised is guaranteed to have a slot at pmu::add() time.
> >However, the core code only checks the group leader's cpu filter (via
> >event_filter_match()), and can thus install follower events onto CPUs
> >violating their (mismatched) CPU filters, potentially installing them
> >into a CPU without sufficient reserved slots.

[...]

> >Fix this by validating this requirement regardless of whether we're
> >moving events.
> >
> >Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> >Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> >Cc: Ingo Molnar <mingo@redhat.com>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: Zhou Chengming <zhouchengming1@huawei.com>
> >Cc: linux-kernel@vger.kernel.org
> >---
> >  kernel/events/core.c | 39 +++++++++++++++++++--------------------
> >  1 file changed, 19 insertions(+), 20 deletions(-)
> >
> >diff --git a/kernel/events/core.c b/kernel/events/core.c
> >index 6c4e523..1dca484 100644
> >--- a/kernel/events/core.c
> >+++ b/kernel/events/core.c
> >@@ -10010,28 +10010,27 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
> >  			goto err_context;
> >
> >  		/*
> >-		 * Do not allow to attach to a group in a different
> >-		 * task or CPU context:
> >+		 * Make sure we're both events for the same CPU;
> >+		 * grouping events for different CPUs is broken; since
> >+		 * you can never concurrently schedule them anyhow.
> >  		 */
> >-		if (move_group) {
> >-			/*
> >-			 * Make sure we're both on the same task, or both
> >-			 * per-cpu events.
> >-			 */
> >-			if (group_leader->ctx->task != ctx->task)
> >-				goto err_context;
> >+		if (group_leader->cpu != event->cpu)
> >+			goto err_context;
> >
> >-			/*
> >-			 * Make sure we're both events for the same CPU;
> >-			 * grouping events for different CPUs is broken; since
> >-			 * you can never concurrently schedule them anyhow.
> >-			 */
> >-			if (group_leader->cpu != event->cpu)
> >-				goto err_context;
> >-		} else {
> >-			if (group_leader->ctx != ctx)
> >-				goto err_context;
> >-		}
> >+		/*
> >+		 * Make sure we're both on the same task, or both
> >+		 * per-cpu events.
> >+		 */
> >+		if (group_leader->ctx->task != ctx->task)
> >+			goto err_context;
> >+
> >+		/*
> >+		 * Do not allow to attach to a group in a different task
> >+		 * or CPU context. If we're moving SW events, we'll fix
> >+		 * this up later, so allow that.
> >+		 */
> >+		if (!move_group && group_leader->ctx != ctx)
> >+			goto err_context;
> 
> We don't need to check move_group here, the previous two checks
> already make sure the events are on the same task and the same cpu.

That's not sufficient to ensure that they're the same context, however.

> So when move_group needed, they will be moved to the same taskctx or
> cpuctx then.

Consider the case of two "uncore" PMUs, X and Y. Each has their own
cpuctx. You could open PMU X event with cpu == 0 && !task, and you could
subsequently open a PMU Y event following X with cpu == 0, && !task.

Neither event is a SW event, so we won't set move_group, and thus we
won't move either event.

Each event would be placed in its respective PMU's cpuctx, so
group_leader->ctx != event->ctx. We don't check this again prior to
installing the event, which would go wrong:

perf_install_in_context(ctx, event, event->cpu)
-> __perf_install_in_context()
-> add_event_to_ctx(event, ctx)
-> perf_group_attach(event)
-> WARN_ON_ONCE(group_leader->ctx != event->ctx)

... and subsequently a number of other things could go wrong due to this
mismatch.

We need to keep this check in the !move_group case.
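[Editorial note: the check ordering Mark describes can be modelled outside the kernel. The sketch below is purely illustrative; `Ctx`, `Event`, and `validate_group` are invented names for this model and do not exist in the kernel. It mirrors the three checks the patch installs in perf_event_open(), and shows why the uncore X/Y case passes the cpu and task checks yet must still be rejected by the !move_group ctx check.]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Ctx:
    pmu: str                # which PMU owns this context
    task: Optional[str]     # None for per-cpu contexts

@dataclass(frozen=True)
class Event:
    cpu: int
    ctx: Ctx

def validate_group(leader: Event, event: Event, move_group: bool) -> bool:
    """Model of the patched validation in perf_event_open()."""
    # Same CPU filter; mismatched CPUs can never be co-scheduled.
    if leader.cpu != event.cpu:
        return False
    # Same task, or both per-cpu events.
    if leader.ctx.task != event.ctx.task:
        return False
    # Same context, unless a SW->HW move will fix it up later.
    if not move_group and leader.ctx != event.ctx:
        return False
    return True

# Two uncore PMUs, X and Y, each with its own per-cpu context on CPU 0.
ctx_x = Ctx(pmu="uncore_x", task=None)
ctx_y = Ctx(pmu="uncore_y", task=None)
leader = Event(cpu=0, ctx=ctx_x)
follower = Event(cpu=0, ctx=ctx_y)

# cpu and task both match, but the contexts differ and neither event is
# a SW event, so nothing will move them; without the !move_group ctx
# check this group would slip through and later trip
# WARN_ON_ONCE(group_leader->ctx != event->ctx) in perf_group_attach().
assert not validate_group(leader, follower, move_group=False)
```

In this model, dropping the final check (as the reply suggests) would accept the mismatched uncore group, matching the failure mode Mark traces through perf_install_in_context() above.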

Thanks,
Mark.


* Re: [PATCH] perf/core: fix group {cpu,task} validation
  2017-06-22 14:41 [PATCH] perf/core: fix group {cpu,task} validation Mark Rutland
  2017-06-23  0:56 ` zhouchengming
@ 2017-08-21 15:53 ` Peter Zijlstra
  2017-08-21 16:01   ` Mark Rutland
  2017-08-25 11:51 ` [tip:perf/core] perf/core: Fix " tip-bot for Mark Rutland
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-08-21 15:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Zhou Chengming

On Thu, Jun 22, 2017 at 03:41:38PM +0100, Mark Rutland wrote:
> Regardless of which events form a group, it does not make sense for the
> events to target different tasks and/or CPUs, as this leaves the group
> inconsistent and impossible to schedule. The core perf code assumes that
> these are consistent across (successfully initialised) groups.
> 
> Core perf code only verifies this when moving SW events into a HW
> context. Thus, we can violate this requirement for pure SW groups and
> pure HW groups, unless the relevant PMU driver happens to perform this
> verification itself. These mismatched groups subsequently wreak havoc
> elsewhere.
> 
> For example, we handle watchpoints as SW events, and reserve watchpoint
> HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> that is initialised is guaranteed to have a slot at pmu::add() time.
> However, the core code only checks the group leader's cpu filter (via
> event_filter_match()), and can thus install follower events onto CPUs
> violating their (mismatched) CPU filters, potentially installing them
> into a CPU without sufficient reserved slots.
> 

> 
> Fix this by validating this requirement regardless of whether we're
> moving events.

Yes, and this also appears to cure your other problem:

  https://lkml.kernel.org/r/20170810173551.GD12812@leverpostej


Thanks!


* Re: [PATCH] perf/core: fix group {cpu,task} validation
  2017-08-21 15:53 ` Peter Zijlstra
@ 2017-08-21 16:01   ` Mark Rutland
  2017-08-21 16:08     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2017-08-21 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Zhou Chengming

On Mon, Aug 21, 2017 at 05:53:26PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 22, 2017 at 03:41:38PM +0100, Mark Rutland wrote:
> > Regardless of which events form a group, it does not make sense for the
> > events to target different tasks and/or CPUs, as this leaves the group
> > inconsistent and impossible to schedule. The core perf code assumes that
> > these are consistent across (successfully initialised) groups.
> > 
> > Core perf code only verifies this when moving SW events into a HW
> > context. Thus, we can violate this requirement for pure SW groups and
> > pure HW groups, unless the relevant PMU driver happens to perform this
> > verification itself. These mismatched groups subsequently wreak havoc
> > elsewhere.
> > 
> > For example, we handle watchpoints as SW events, and reserve watchpoint
> > HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> > that is initialised is guaranteed to have a slot at pmu::add() time.
> > However, the core code only checks the group leader's cpu filter (via
> > event_filter_match()), and can thus install follower events onto CPUs
> > violating their (mismatched) CPU filters, potentially installing them
> > into a CPU without sufficient reserved slots.
> 
> > Fix this by validating this requirement regardless of whether we're
> > moving events.
> 
> Yes, and this also appears to cure your other problem:
> 
>   https://lkml.kernel.org/r/20170810173551.GD12812@leverpostej

Ah; sorry for the duplicate report! I should have realised.

I guess this will get queued soon?

Thanks,
Mark.


* Re: [PATCH] perf/core: fix group {cpu,task} validation
  2017-08-21 16:01   ` Mark Rutland
@ 2017-08-21 16:08     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2017-08-21 16:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Ingo Molnar, Zhou Chengming

On Mon, Aug 21, 2017 at 05:01:38PM +0100, Mark Rutland wrote:
> On Mon, Aug 21, 2017 at 05:53:26PM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 22, 2017 at 03:41:38PM +0100, Mark Rutland wrote:
> > > Regardless of which events form a group, it does not make sense for the
> > > events to target different tasks and/or CPUs, as this leaves the group
> > > inconsistent and impossible to schedule. The core perf code assumes that
> > > these are consistent across (successfully initialised) groups.
> > > 
> > > Core perf code only verifies this when moving SW events into a HW
> > > context. Thus, we can violate this requirement for pure SW groups and
> > > pure HW groups, unless the relevant PMU driver happens to perform this
> > > verification itself. These mismatched groups subsequently wreak havoc
> > > elsewhere.
> > > 
> > > For example, we handle watchpoints as SW events, and reserve watchpoint
> > > HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> > > that is initialised is guaranteed to have a slot at pmu::add() time.
> > > However, the core code only checks the group leader's cpu filter (via
> > > event_filter_match()), and can thus install follower events onto CPUs
> > > violating their (mismatched) CPU filters, potentially installing them
> > > into a CPU without sufficient reserved slots.
> > 
> > > Fix this by validating this requirement regardless of whether we're
> > > moving events.
> > 
> > Yes, and this also appears to cure your other problem:
> > 
> >   https://lkml.kernel.org/r/20170810173551.GD12812@leverpostej
> 
> Ah; sorry for the duplicate report! I should have realised.
> 
> I guess this will get queued soon?

Done :-) I'll try and hand to Ingo before end of week.


* [tip:perf/core] perf/core: Fix group {cpu,task} validation
  2017-06-22 14:41 [PATCH] perf/core: fix group {cpu,task} validation Mark Rutland
  2017-06-23  0:56 ` zhouchengming
  2017-08-21 15:53 ` Peter Zijlstra
@ 2017-08-25 11:51 ` tip-bot for Mark Rutland
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Mark Rutland @ 2017-08-25 11:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: zhouchengming1, mark.rutland, torvalds, mingo, tglx,
	alexander.shishkin, acme, peterz, hpa, linux-kernel

Commit-ID:  64aee2a965cf2954a038b5522f11d2cd2f0f8f3e
Gitweb:     http://git.kernel.org/tip/64aee2a965cf2954a038b5522f11d2cd2f0f8f3e
Author:     Mark Rutland <mark.rutland@arm.com>
AuthorDate: Thu, 22 Jun 2017 15:41:38 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 25 Aug 2017 11:00:34 +0200

perf/core: Fix group {cpu,task} validation

Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully initialised) groups.

Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.

For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating their (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.

This can be triggered with the below test case, resulting in warnings
from arch backends.

  #define _GNU_SOURCE
  #include <linux/hw_breakpoint.h>
  #include <linux/perf_event.h>
  #include <sched.h>
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
  {
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }

  char watched_char;

  struct perf_event_attr wp_attr = {
	.type = PERF_TYPE_BREAKPOINT,
	.bp_type = HW_BREAKPOINT_RW,
	.bp_addr = (unsigned long)&watched_char,
	.bp_len = 1,
	.size = sizeof(wp_attr),
  };

  int main(int argc, char *argv[])
  {
	int leader, ret;
	cpu_set_t cpus;

	/*
	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
	 */
	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);
	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
	if (ret) {
		printf("Unable to set cpu affinity\n");
		return 1;
	}

	/* open leader event, bound to this task, CPU0 only */
	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
	if (leader < 0) {
		printf("Couldn't open leader: %d\n", leader);
		return 1;
	}

	/*
	 * Open a follower event that is bound to the same task, but a
	 * different CPU. This means that it should never be possible to
	 * schedule the group.
	 */
	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
	if (ret < 0) {
		printf("Couldn't open mismatched follower: %d\n", ret);
		return 1;
	} else {
		printf("Opened leader/follower with mismatched CPUs\n");
	}

	/*
	 * Open as many independent events as we can, all bound to the same
	 * task, CPU0 only.
	 */
	do {
		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
	} while (ret >= 0);

	/*
	 * Force enable/disable all events to trigger the erroneous
	 * installation of the follower event.
	 */
	printf("Opened all events. Toggling..\n");
	for (;;) {
		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
	}

	return 0;
  }

Fix this by validating this requirement regardless of whether we're
moving events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ee20d4c..3504125 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10032,28 +10032,27 @@ SYSCALL_DEFINE5(perf_event_open,
 			goto err_context;
 
 		/*
-		 * Do not allow to attach to a group in a different
-		 * task or CPU context:
+		 * Make sure we're both events for the same CPU;
+		 * grouping events for different CPUs is broken; since
+		 * you can never concurrently schedule them anyhow.
 		 */
-		if (move_group) {
-			/*
-			 * Make sure we're both on the same task, or both
-			 * per-cpu events.
-			 */
-			if (group_leader->ctx->task != ctx->task)
-				goto err_context;
+		if (group_leader->cpu != event->cpu)
+			goto err_context;
 
-			/*
-			 * Make sure we're both events for the same CPU;
-			 * grouping events for different CPUs is broken; since
-			 * you can never concurrently schedule them anyhow.
-			 */
-			if (group_leader->cpu != event->cpu)
-				goto err_context;
-		} else {
-			if (group_leader->ctx != ctx)
-				goto err_context;
-		}
+		/*
+		 * Make sure we're both on the same task, or both
+		 * per-CPU events.
+		 */
+		if (group_leader->ctx->task != ctx->task)
+			goto err_context;
+
+		/*
+		 * Do not allow to attach to a group in a different task
+		 * or CPU context. If we're moving SW events, we'll fix
+		 * this up later, so allow that.
+		 */
+		if (!move_group && group_leader->ctx != ctx)
+			goto err_context;
 
 		/*
 		 * Only a group leader can be exclusive or pinned

