linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fix race caused by hyperthreads when online an offline cpu
@ 2017-01-14 11:21 Zhou Chengming
  0 siblings, 0 replies; 8+ messages in thread
From: Zhou Chengming @ 2017-01-14 11:21 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, peterz, hpa, ak, eranian, kan.liang, davidcc,
	dave.hansen, qiaonuohan, zhouchengming1, guohanjun

After online an offline cpu, cpu_hw_events.excl_thread_id will always be
set to 1 in intel_pmu_cpu_starting() even when its sibling's excl_thread_id
is also 1. Then the two siblings will use the same state in their shared
cpu_hw_events.excl_cntrs, it will cause race problem.

The race senario is like this:

Two cpu (7 and 19) are siblings, excl_thread_id of 7 and 19 are 0 and 1.
After offline and online cpu 7, intel_pmu_cpu_starting() will set excl_thread_id
of cpu 7 to 1. Then both cpu 7 and 19 will use the same state in their
shared cpu_hw_events.excl_cntrs.

cpu7					cpu19
---					---
intel_start_scheduling()
 set state->sched_started = true
					intel_put_excl_constraints() {
					 if (!state->sched_started)
					  spin_lock	// not executed
intel_stop_scheduling()
 set state->sched_started = false
					if (!state->sched_started)
					 spin_unlock	// excuted

Signed-off-by: NuoHan Qiao <qiaonuohan@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 arch/x86/events/intel/core.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a74a2db..593d8c9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3164,13 +3164,16 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
 		for_each_cpu(i, topology_sibling_cpumask(cpu)) {
+			struct cpu_hw_events *sibling;
 			struct intel_excl_cntrs *c;
 
-			c = per_cpu(cpu_hw_events, i).excl_cntrs;
+			sibling = &per_cpu(cpu_hw_events, i);
+			c = sibling->excl_cntrs;
 			if (c && c->core_id == core_id) {
 				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
 				cpuc->excl_cntrs = c;
-				cpuc->excl_thread_id = 1;
+				if (!sibling->excl_thread_id)
+					cpuc->excl_thread_id = 1;
 				break;
 			}
 		}
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
  2017-01-16 18:36     ` Stephane Eranian
@ 2017-01-17 10:05       ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2017-01-17 10:05 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: zhouchengming, LKML, x86, mingo, Peter Zijlstra, H. Peter Anvin,
	ak, Liang, Kan, David Carrillo Cisneros, dave.hansen, qiaonuohan,
	guohanjun

On Mon, 16 Jan 2017, Stephane Eranian wrote:
> On Mon, Jan 16, 2017 at 1:53 AM, zhouchengming
> <zhouchengming1@huawei.com> wrote:
> > On 2017/1/16 17:05, Thomas Gleixner wrote:
> >>
> >> On Mon, 16 Jan 2017, Zhou Chengming wrote:
> >>
> >> Can you please stop sending the same patch over and over every other day?
> >>
> >> Granted, things get forgotten, but sending a polite reminder after a week
> >> is definitely enough.
> >>
> >> Maintainers are not machines responding within a split second on every
> >> mail
> >> they get. And that patch is not so substantial that it justifies that kind
> >> of spam.
> >>
> >
> > Very sorry for the noise. We are just not sure this is the right fix because
> > it's
> > hard to reproduce.
> >
> I believe this is the right fixed. I tried it and instrumented the
> code to verify thread_id
> assignment. The problem is easy to reproduce.
> 
> $ echo 0 >/sys/devices/system/cpu/cpu2/online
> $ echo 1 >/sys/devices/system/cpu/cpu2/online
> 
> Normally on Haswell Desktop part,  CPU2 gets thread_id 0 on boot, CPU6
> gets thread_id 1.
> If you offline CPU2 and bring it back in, it will get thread_id 1 and
> thus both sibling will point
> to the same exclusive state. The fix is, indeed, to check if the
> sibling is not already assigned 1,
> and if so to keep 0 for the CPU being online'd.

Right. So it's a simple static fully reproducible problem and not a race of
some sorts. I'll amend the changelog ....

Btw, this code has the hardcoded assumption two threads per core. So
anything which has more than two threads is broken vs. that exclusive
access. No idea whether that matters in practice, but I just noticed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
  2017-01-16  9:53   ` zhouchengming
@ 2017-01-16 18:36     ` Stephane Eranian
  2017-01-17 10:05       ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Stephane Eranian @ 2017-01-16 18:36 UTC (permalink / raw)
  To: zhouchengming
  Cc: Thomas Gleixner, LKML, x86, mingo, Peter Zijlstra,
	H. Peter Anvin, ak, Liang, Kan, David Carrillo Cisneros,
	dave.hansen, qiaonuohan, guohanjun

On Mon, Jan 16, 2017 at 1:53 AM, zhouchengming
<zhouchengming1@huawei.com> wrote:
> On 2017/1/16 17:05, Thomas Gleixner wrote:
>>
>> On Mon, 16 Jan 2017, Zhou Chengming wrote:
>>
>> Can you please stop sending the same patch over and over every other day?
>>
>> Granted, things get forgotten, but sending a polite reminder after a week
>> is definitely enough.
>>
>> Maintainers are not machines responding within a split second on every
>> mail
>> they get. And that patch is not so substantial that it justifies that kind
>> of spam.
>>
>
> Very sorry for the noise. We are just not sure this is the right fix because
> it's
> hard to reproduce.
>
I believe this is the right fixed. I tried it and instrumented the
code to verify thread_id
assignment. The problem is easy to reproduce.

$ echo 0 >/sys/devices/system/cpu/cpu2/online
$ echo 1 >/sys/devices/system/cpu/cpu2/online

Normally on Haswell Desktop part,  CPU2 gets thread_id 0 on boot, CPU6
gets thread_id 1.
If you offline CPU2 and bring it back in, it will get thread_id 1 and
thus both sibling will point
to the same exclusive state. The fix is, indeed, to check if the
sibling is not already assigned 1,
and if so to keep 0 for the CPU being online'd.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
  2017-01-16  9:05 ` Thomas Gleixner
@ 2017-01-16  9:53   ` zhouchengming
  2017-01-16 18:36     ` Stephane Eranian
  0 siblings, 1 reply; 8+ messages in thread
From: zhouchengming @ 2017-01-16  9:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, x86, mingo, peterz, hpa, ak, eranian, kan.liang,
	davidcc, dave.hansen, qiaonuohan, guohanjun

On 2017/1/16 17:05, Thomas Gleixner wrote:
> On Mon, 16 Jan 2017, Zhou Chengming wrote:
>
> Can you please stop sending the same patch over and over every other day?
>
> Granted, things get forgotten, but sending a polite reminder after a week
> is definitely enough.
>
> Maintainers are not machines responding within a split second on every mail
> they get. And that patch is not so substantial that it justifies that kind
> of spam.
>

Very sorry for the noise. We are just not sure this is the right fix because it's
hard to reproduce.

Thanks.

> Thanks,
>
> 	tglx
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
  2017-01-16  3:21 Zhou Chengming
@ 2017-01-16  9:05 ` Thomas Gleixner
  2017-01-16  9:53   ` zhouchengming
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2017-01-16  9:05 UTC (permalink / raw)
  To: Zhou Chengming
  Cc: linux-kernel, x86, mingo, peterz, hpa, ak, eranian, kan.liang,
	davidcc, dave.hansen, qiaonuohan, guohanjun

On Mon, 16 Jan 2017, Zhou Chengming wrote:

Can you please stop sending the same patch over and over every other day?

Granted, things get forgotten, but sending a polite reminder after a week
is definitely enough.

Maintainers are not machines responding within a split second on every mail
they get. And that patch is not so substantial that it justifies that kind
of spam.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] fix race caused by hyperthreads when online an offline cpu
@ 2017-01-16  3:21 Zhou Chengming
  2017-01-16  9:05 ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Zhou Chengming @ 2017-01-16  3:21 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, peterz, hpa, ak, eranian, kan.liang, davidcc,
	dave.hansen, qiaonuohan, zhouchengming1, guohanjun

After online an offline cpu, cpu_hw_events.excl_thread_id will always be
set to 1 in intel_pmu_cpu_starting() even when its sibling's excl_thread_id
is also 1. Then the two siblings will use the same state in their shared
cpu_hw_events.excl_cntrs, it will cause race problem.

The race senario is like this:

Two cpu (7 and 19) are siblings, excl_thread_id of 7 and 19 are 0 and 1.
After offline and online cpu 7, intel_pmu_cpu_starting() will set excl_thread_id
of cpu 7 to 1. Then both cpu 7 and 19 will use the same state in their
shared cpu_hw_events.excl_cntrs.

cpu7					cpu19
---					---
intel_start_scheduling()
 set state->sched_started = true
					intel_put_excl_constraints() {
					 if (!state->sched_started)
					  spin_lock	// not executed
intel_stop_scheduling()
 set state->sched_started = false
					if (!state->sched_started)
					 spin_unlock	// excuted

Signed-off-by: NuoHan Qiao <qiaonuohan@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 arch/x86/events/intel/core.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a74a2db..593d8c9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3164,13 +3164,16 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
 		for_each_cpu(i, topology_sibling_cpumask(cpu)) {
+			struct cpu_hw_events *sibling;
 			struct intel_excl_cntrs *c;
 
-			c = per_cpu(cpu_hw_events, i).excl_cntrs;
+			sibling = &per_cpu(cpu_hw_events, i);
+			c = sibling->excl_cntrs;
 			if (c && c->core_id == core_id) {
 				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
 				cpuc->excl_cntrs = c;
-				cpuc->excl_thread_id = 1;
+				if (!sibling->excl_thread_id)
+					cpuc->excl_thread_id = 1;
 				break;
 			}
 		}
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
  2017-01-12  9:32 Zhou Chengming
@ 2017-01-14  1:33 ` qiaonuohan
  0 siblings, 0 replies; 8+ messages in thread
From: qiaonuohan @ 2017-01-14  1:33 UTC (permalink / raw)
  To: Zhou Chengming, tglx, mingo, hpa, x86, peterz, ak, eranian,
	kan.liang, davidcc, dave.hansen, linux-kernel
  Cc: guohanjun

ping...

On 2017/1/12 17:32, Zhou Chengming wrote:
> After online an offline cpu, cpu_hw_events.excl_thread_id will always be
> set to 1 in intel_pmu_cpu_starting() even when its sibling's excl_thread_id
> is also 1. Then the two siblings will use the same state in their shared
> hw_hw_events.excl_cntrs, it will cause race problem.
>
> The race senario is like this:
>
> Two cpu (7 and 19) are siblings, excl_thread_id of 7 and 19 are 0 and 1.
> After offline and online cpu 7, intel_pmu_cpu_starting() will set excl_thread_id
> of cpu 7 to 1. Then both cpu 7 and 19 will use the same state in their
> shared hw_hw_events.excl_cntrs.
>
> cpu7					cpu19
> ---					---
> intel_start_scheduling()
>  set state->sched_started = true
> 					intel_put_excl_constraints() {
> 					 if (!state->sched_started)
> 					  spin_lock	// not executed
> intel_stop_scheduling()
>  set state->sched_started = false
> 					if (!state->sched_started)
> 					 spin_unlock	// excuted
>
> Signed-off-by: NuoHan Qiao <qiaonuohan@huawei.com>
> Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
> ---
>  arch/x86/events/intel/core.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index a74a2db..593d8c9 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3164,13 +3164,16 @@ static void intel_pmu_cpu_starting(int cpu)
>
>  	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
>  		for_each_cpu(i, topology_sibling_cpumask(cpu)) {
> +			struct cpu_hw_events *sibling;
>  			struct intel_excl_cntrs *c;
>
> -			c = per_cpu(cpu_hw_events, i).excl_cntrs;
> +			sibling = &per_cpu(cpu_hw_events, i);
> +			c = sibling->excl_cntrs;
>  			if (c && c->core_id == core_id) {
>  				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
>  				cpuc->excl_cntrs = c;
> -				cpuc->excl_thread_id = 1;
> +				if (!sibling->excl_thread_id)
> +					cpuc->excl_thread_id = 1;
>  				break;
>  			}
>  		}
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] fix race caused by hyperthreads when online an offline cpu
@ 2017-01-12  9:32 Zhou Chengming
  2017-01-14  1:33 ` qiaonuohan
  0 siblings, 1 reply; 8+ messages in thread
From: Zhou Chengming @ 2017-01-12  9:32 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, peterz, ak, eranian, kan.liang, davidcc,
	dave.hansen, linux-kernel
  Cc: qiaonuohan, zhouchengming1, guohanjun

After online an offline cpu, cpu_hw_events.excl_thread_id will always be
set to 1 in intel_pmu_cpu_starting() even when its sibling's excl_thread_id
is also 1. Then the two siblings will use the same state in their shared
hw_hw_events.excl_cntrs, it will cause race problem.

The race senario is like this:

Two cpu (7 and 19) are siblings, excl_thread_id of 7 and 19 are 0 and 1.
After offline and online cpu 7, intel_pmu_cpu_starting() will set excl_thread_id
of cpu 7 to 1. Then both cpu 7 and 19 will use the same state in their
shared hw_hw_events.excl_cntrs.

cpu7					cpu19
---					---
intel_start_scheduling()
 set state->sched_started = true
					intel_put_excl_constraints() {
					 if (!state->sched_started)
					  spin_lock	// not executed
intel_stop_scheduling()
 set state->sched_started = false
					if (!state->sched_started)
					 spin_unlock	// excuted

Signed-off-by: NuoHan Qiao <qiaonuohan@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 arch/x86/events/intel/core.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a74a2db..593d8c9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3164,13 +3164,16 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
 		for_each_cpu(i, topology_sibling_cpumask(cpu)) {
+			struct cpu_hw_events *sibling;
 			struct intel_excl_cntrs *c;
 
-			c = per_cpu(cpu_hw_events, i).excl_cntrs;
+			sibling = &per_cpu(cpu_hw_events, i);
+			c = sibling->excl_cntrs;
 			if (c && c->core_id == core_id) {
 				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
 				cpuc->excl_cntrs = c;
-				cpuc->excl_thread_id = 1;
+				if (!sibling->excl_thread_id)
+					cpuc->excl_thread_id = 1;
 				break;
 			}
 		}
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-01-17 10:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-14 11:21 [PATCH] fix race caused by hyperthreads when online an offline cpu Zhou Chengming
  -- strict thread matches above, loose matches on Subject: below --
2017-01-16  3:21 Zhou Chengming
2017-01-16  9:05 ` Thomas Gleixner
2017-01-16  9:53   ` zhouchengming
2017-01-16 18:36     ` Stephane Eranian
2017-01-17 10:05       ` Thomas Gleixner
2017-01-12  9:32 Zhou Chengming
2017-01-14  1:33 ` qiaonuohan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).