linux-kernel.vger.kernel.org archive mirror
* [PATCH] perf/x86: Consider pinned events for group validation
@ 2019-08-16 17:49 kan.liang
  2019-08-20 14:10 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: kan.liang @ 2019-08-16 17:49 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

perf stat -M metrics relies on weak groups to reject unschedulable
groups and run them as non-groups.
This uses the group validation code in the kernel. Unfortunately
that code doesn't take pinned events, such as the NMI watchdog, into
account. So some groups can pass validation, but then later still
never schedule.
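The NMI watchdog is the typical case: the hardlockup detector opens a
per-CPU cycles event with attr.pinned set, so it claims a counter ahead
of any new group. Roughly (paraphrased from kernel/watchdog_hld.c):

	static struct perf_event_attr wd_hw_attr = {
		.type		= PERF_TYPE_HARDWARE,
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.size		= sizeof(struct perf_event_attr),
		.pinned		= 1,	/* scheduled before any flexible group */
		.disabled	= 1,
	};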

For example,

 $ echo 1 > /proc/sys/kernel/nmi_watchdog
 $ perf stat -M Page_Walks_Utilization

 Performance counter stats for 'system wide':

     <not counted>      itlb_misses.walk_pending                                      (0.00%)
     <not counted>      dtlb_load_misses.walk_pending                                 (0.00%)
     <not counted>      dtlb_store_misses.walk_pending                                (0.00%)
     <not counted>      ept.walk_pending                                              (0.00%)
     <not counted>      cycles                                                        (0.00%)

       1.176613558 seconds time elapsed

Currently, pinned events are always scheduled first, so the new group
must be schedulable together with the pinned events that are already
active; otherwise it will never get a chance to be scheduled later.
The trick is to treat the current pinned events as part of the new
group, and insert them into the fake_cpuc.
The simulation result then tells whether they can all be scheduled
successfully.
The fake_cpuc never touches event state, so the current pinned events
are not affected.

This won't catch every case that cannot be scheduled, such as events
pinned differently on different CPUs, or more complicated constraints.
But for the most common case, the NMI watchdog interacting with the
current perf metrics, it is sufficient.

After applying the patch,

 $ echo 1 > /proc/sys/kernel/nmi_watchdog
 $ perf stat -M Page_Walks_Utilization

 Performance counter stats for 'system wide':

         2,491,910      itlb_misses.walk_pending  #      0.0 Page_Walks_Utilization   (79.94%)
        13,630,942      dtlb_load_misses.walk_pending                                 (80.02%)
           207,255      dtlb_store_misses.walk_pending                                (80.04%)
                 0      ept.walk_pending                                              (80.04%)
       236,204,924      cycles                                                        (79.97%)

       0.901785713 seconds time elapsed

Reported-by: Stephane Eranian <eranian@google.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 81b005e..c8ed441 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2011,9 +2011,11 @@ static int validate_event(struct perf_event *event)
  */
 static int validate_group(struct perf_event *event)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *leader = event->group_leader;
 	struct cpu_hw_events *fake_cpuc;
-	int ret = -EINVAL, n;
+	struct perf_event *pinned_event;
+	int ret = -EINVAL, n, i;
 
 	fake_cpuc = allocate_fake_cpuc();
 	if (IS_ERR(fake_cpuc))
@@ -2033,6 +2035,24 @@ static int validate_group(struct perf_event *event)
 	if (n < 0)
 		goto out;
 
+	/*
+	 * The new group must be schedulable
+	 * together with current pinned events.
+	 * Otherwise, it will never get a chance
+	 * to be scheduled later.
+	 */
+	for (i = 0; i < cpuc->n_events; i++) {
+		pinned_event = cpuc->event_list[i];
+		if (WARN_ON_ONCE(!pinned_event))
+			continue;
+		if (!pinned_event->attr.pinned)
+			continue;
+		fake_cpuc->n_events = n;
+		n = collect_events(fake_cpuc, pinned_event, false);
+		if (n < 0)
+			goto out;
+	}
+
 	fake_cpuc->n_events = 0;
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
 
-- 
2.7.4



* Re: [PATCH] perf/x86: Consider pinned events for group validation
  2019-08-16 17:49 [PATCH] perf/x86: Consider pinned events for group validation kan.liang
@ 2019-08-20 14:10 ` Peter Zijlstra
  2019-08-20 14:52   ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2019-08-20 14:10 UTC (permalink / raw)
  To: kan.liang; +Cc: mingo, acme, linux-kernel, eranian, ak

On Fri, Aug 16, 2019 at 10:49:10AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> perf stat -M metrics relies on weak groups to reject unschedulable
> groups and run them as non-groups.
> This uses the group validation code in the kernel. Unfortunately
> that code doesn't take pinned events, such as the NMI watchdog, into
> account. So some groups can pass validation, but then later still
> never schedule.

But if you first create the group and then a pinned event, it 'works',
which is inconsistent and makes all this timing dependent.

> @@ -2011,9 +2011,11 @@ static int validate_event(struct perf_event *event)
>   */
>  static int validate_group(struct perf_event *event)
>  {
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct perf_event *leader = event->group_leader;
>  	struct cpu_hw_events *fake_cpuc;
> -	int ret = -EINVAL, n;
> +	struct perf_event *pinned_event;
> +	int ret = -EINVAL, n, i;
>  
>  	fake_cpuc = allocate_fake_cpuc();
>  	if (IS_ERR(fake_cpuc))
> @@ -2033,6 +2035,24 @@ static int validate_group(struct perf_event *event)
>  	if (n < 0)
>  		goto out;
>  
> +	/*
> +	 * The new group must be schedulable
> +	 * together with current pinned events.
> +	 * Otherwise, it will never get a chance
> +	 * to be scheduled later.

That's wrapped short; also I don't think it is sufficient; what if you
happen to have a pinned event on CPU1 (and not others) and happen to run
validation for a new CPU1 event on CPUn ?

Also; per that same; it is broken, you're accessing the cpu-local cpuc
without serialization.

> +	 */
> +	for (i = 0; i < cpuc->n_events; i++) {
> +		pinned_event = cpuc->event_list[i];
> +		if (WARN_ON_ONCE(!pinned_event))
> +			continue;
> +		if (!pinned_event->attr.pinned)
> +			continue;
> +		fake_cpuc->n_events = n;
> +		n = collect_events(fake_cpuc, pinned_event, false);
> +		if (n < 0)
> +			goto out;
> +	}
> +
>  	fake_cpuc->n_events = 0;
>  	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
>  
> -- 
> 2.7.4
> 


* Re: [PATCH] perf/x86: Consider pinned events for group validation
  2019-08-20 14:10 ` Peter Zijlstra
@ 2019-08-20 14:52   ` Liang, Kan
  2019-08-20 15:09     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2019-08-20 14:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, eranian, ak



On 8/20/2019 10:10 AM, Peter Zijlstra wrote:
> On Fri, Aug 16, 2019 at 10:49:10AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> perf stat -M metrics relies on weak groups to reject unschedulable
>> groups and run them as non-groups.
>> This uses the group validation code in the kernel. Unfortunately
>> that code doesn't take pinned events, such as the NMI watchdog, into
>> account. So some groups can pass validation, but then later still
>> never schedule.
> 
> But if you first create the group and then a pinned event, it 'works',
> which is inconsistent and makes all this timing dependent.

I don't think so. The pinned event will be validated by
validate_event(), which doesn't simulate the scheduling.
So the validation still passes, but the group still never schedules.
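For reference, validate_event() only checks the single event's
constraint against an otherwise empty fake cpuc; it never runs the
counter-scheduling simulation. A simplified sketch (condensed from
arch/x86/events/core.c, details abbreviated):

	static int validate_event(struct perf_event *event)
	{
		struct cpu_hw_events *fake_cpuc;
		struct event_constraint *c;
		int ret = 0;

		fake_cpuc = allocate_fake_cpuc();
		if (IS_ERR(fake_cpuc))
			return PTR_ERR(fake_cpuc);

		/* Only fetch the event's own constraint ... */
		c = x86_pmu.get_event_constraints(fake_cpuc, -1, event);

		/* ... and check that it maps to at least one counter. */
		if (!c || !c->weight)
			ret = -EINVAL;

		if (x86_pmu.put_event_constraints)
			x86_pmu.put_event_constraints(fake_cpuc, event);

		free_fake_cpuc(fake_cpuc);
		return ret;
	}

There is no call to x86_pmu.schedule_events() here, so a pinned event
that conflicts with an already-created group is still accepted.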

> 
>> @@ -2011,9 +2011,11 @@ static int validate_event(struct perf_event *event)
>>    */
>>   static int validate_group(struct perf_event *event)
>>   {
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>   	struct perf_event *leader = event->group_leader;
>>   	struct cpu_hw_events *fake_cpuc;
>> -	int ret = -EINVAL, n;
>> +	struct perf_event *pinned_event;
>> +	int ret = -EINVAL, n, i;
>>   
>>   	fake_cpuc = allocate_fake_cpuc();
>>   	if (IS_ERR(fake_cpuc))
>> @@ -2033,6 +2035,24 @@ static int validate_group(struct perf_event *event)
>>   	if (n < 0)
>>   		goto out;
>>   
>> +	/*
>> +	 * The new group must be schedulable
>> +	 * together with current pinned events.
>> +	 * Otherwise, it will never get a chance
>> +	 * to be scheduled later.
> 
> That's wrapped short; also I don't think it is sufficient; what if you
> happen to have a pinned event on CPU1 (and not others) and happen to run
> validation for a new CPU1 event on CPUn ?
>

The patch doesn't support this case. It is mentioned in the description.
The patch doesn't intend to catch all possible cases that cannot be
scheduled. I think it's impossible to catch all of them.
We only want to improve validate_group() a little bit to catch some
common cases, e.g. the NMI watchdog interacting with a group.


> Also; per that same; it is broken, you're accessing the cpu-local cpuc
> without serialization.

Do you mean accessing all cpucs serially?
We only check the cpuc of the current CPU here. It is not intended to
access other cpucs.


Thanks,
Kan

> 
>> +	 */
>> +	for (i = 0; i < cpuc->n_events; i++) {
>> +		pinned_event = cpuc->event_list[i];
>> +		if (WARN_ON_ONCE(!pinned_event))
>> +			continue;
>> +		if (!pinned_event->attr.pinned)
>> +			continue;
>> +		fake_cpuc->n_events = n;
>> +		n = collect_events(fake_cpuc, pinned_event, false);
>> +		if (n < 0)
>> +			goto out;
>> +	}
>> +
>>   	fake_cpuc->n_events = 0;
>>   	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
>>   
>> -- 
>> 2.7.4
>>


* Re: [PATCH] perf/x86: Consider pinned events for group validation
  2019-08-20 14:52   ` Liang, Kan
@ 2019-08-20 15:09     ` Peter Zijlstra
  2019-08-20 17:13       ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2019-08-20 15:09 UTC (permalink / raw)
  To: Liang, Kan; +Cc: mingo, acme, linux-kernel, eranian, ak

On Tue, Aug 20, 2019 at 10:52:57AM -0400, Liang, Kan wrote:
> On 8/20/2019 10:10 AM, Peter Zijlstra wrote:
> > On Fri, Aug 16, 2019 at 10:49:10AM -0700, kan.liang@linux.intel.com wrote:
> > > From: Kan Liang <kan.liang@linux.intel.com>
> > > 
> > > perf stat -M metrics relies on weak groups to reject unschedulable
> > > groups and run them as non-groups.
> > > This uses the group validation code in the kernel. Unfortunately
> > > that code doesn't take pinned events, such as the NMI watchdog, into
> > > account. So some groups can pass validation, but then later still
> > > never schedule.
> > 
> > But if you first create the group and then a pinned event, it 'works',
> > which is inconsistent and makes all this timing dependent.
> 
> I don't think so. The pinned event will be validated by validate_event(),
> which doesn't simulate the scheduling.
> So the validation still passes, but the group still never schedules.

Exactly my point; so sometimes it fails creation and sometimes it fails
running. So now we have two failure cases instead of one, and the
reason might not always be evident.

> > > +	/*
> > > +	 * The new group must be schedulable
> > > +	 * together with current pinned events.
> > > +	 * Otherwise, it will never get a chance
> > > +	 * to be scheduled later.
> > 
> > That's wrapped short; also I don't think it is sufficient; what if you
> > happen to have a pinned event on CPU1 (and not others) and happen to run
> > validation for a new CPU1 event on CPUn ?
> > 
> 
> The patch doesn't support this case.

Which makes the whole thing even more random.

> It is mentioned in the description.
> The patch doesn't intend to catch all possible cases that cannot be
> scheduled. I think it's impossible to catch all of them.
> We only want to improve validate_group() a little bit to catch some
> common cases, e.g. the NMI watchdog interacting with a group.
> 
> > Also; per that same; it is broken, you're accessing the cpu-local cpuc
> > without serialization.
> 
> Do you mean accessing all cpucs serially?
> We only check the cpuc of the current CPU here. It is not intended to
> access other cpucs.

There's nothing preventing the cpuc you're looking at changing while
you're looking at it. Heck, afaict it is possible to UaF here. Nothing
prevents the events you're looking at from going away and getting freed.


* Re: [PATCH] perf/x86: Consider pinned events for group validation
  2019-08-20 15:09     ` Peter Zijlstra
@ 2019-08-20 17:13       ` Liang, Kan
  0 siblings, 0 replies; 5+ messages in thread
From: Liang, Kan @ 2019-08-20 17:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, eranian, ak

>>>> +	/*
>>>> +	 * The new group must be schedulable
>>>> +	 * together with current pinned events.
>>>> +	 * Otherwise, it will never get a chance
>>>> +	 * to be scheduled later.
>>>
>>> That's wrapped short; also I don't think it is sufficient; what if you
>>> happen to have a pinned event on CPU1 (and not others) and happen to run
>>> validation for a new CPU1 event on CPUn ?
>>>
>>
>> The patch doesn't support this case.
> 
> Which makes the whole thing even more random.

Maybe we can use the cpuc on event->cpu. That could help a little here:

	cpuc = per_cpu_ptr(&cpu_hw_events,
			   event->cpu >= 0 ? event->cpu : raw_smp_processor_id());

> 
>> It is mentioned in the description.
>> The patch doesn't intend to catch all possible cases that cannot be
>> scheduled. I think it's impossible to catch all of them.
>> We only want to improve validate_group() a little bit to catch some
>> common cases, e.g. the NMI watchdog interacting with a group.
>>
>>> Also; per that same; it is broken, you're accessing the cpu-local cpuc
>>> without serialization.
>>
>> Do you mean accessing all cpucs serially?
>> We only check the cpuc of the current CPU here. It is not intended to
>> access other cpucs.
> 
> There's nothing preventing the cpuc you're looking at changing while
> you're looking at it. Heck, afaict it is possible to UaF here. Nothing
> prevents the events you're looking at from going away and getting freed.

You are right.
I think we can add a lock to protect the event_list[] from concurrent
updates in x86_pmu_add() and x86_pmu_del().
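Something like the sketch below, perhaps (purely illustrative; the lock
name and exact placement are made up, untested):

	static DEFINE_RAW_SPINLOCK(pmu_event_list_lock);

	/* in x86_pmu_add()/x86_pmu_del(), around the event_list[] updates: */
	raw_spin_lock_irqsave(&pmu_event_list_lock, flags);
	/* ... add to / remove from cpuc->event_list[], update n_events ... */
	raw_spin_unlock_irqrestore(&pmu_event_list_lock, flags);

	/* in validate_group(), around the pinned-event walk: */
	raw_spin_lock_irqsave(&pmu_event_list_lock, flags);
	for (i = 0; i < cpuc->n_events; i++) {
		/* collect pinned events into the fake_cpuc, as in the patch */
	}
	raw_spin_unlock_irqrestore(&pmu_event_list_lock, flags);

That would keep the list stable while validate_group() walks it, though
it doesn't by itself address the event lifetime problem you mention.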


Thanks,
Kan

