linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread
@ 2016-11-29 13:15 Prarit Bhargava
  2016-11-29 21:07 ` Don Zickus
  2016-12-01 20:06 ` Don Zickus
  0 siblings, 2 replies; 5+ messages in thread
From: Prarit Bhargava @ 2016-11-29 13:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Borislav Petkov, Tejun Heo, Don Zickus,
	Hidehiro Kawai, Thomas Gleixner, Andi Kleen, Joshua Hunt,
	Ingo Molnar, Babu Moger

When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
boot cpu can be replaced.  During the hot add event, the message

NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

is output implying that the NMI watchdog was disabled at some point.  This
is not the case and the message has caused confusion for users of systems
that support the removal of the boot cpu socket.

The watchdog code is coded to assume that cpu 0 is always the first cpu to
initialize the watchdog, and the last to stop its watchdog thread.  That
is not the case for initializing if cpu 0 has been removed and added.  The
removal case has never been correct because the smpboot code will remove
the watchdog threads starting with the lowest cpu number.

This patch adds watchdog_cpus to track the number of cpus with active NMI
watchdog threads so that the first and last thread can be used to set and
clear the value of firstcpu_err.  firstcpu_err is set when the first
watchdog thread is enabled, and cleared when the last watchdog thread is
disabled.

This patch is based on top of linux-next akpm-base.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joshua Hunt <johunt@akamai.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
---
 kernel/watchdog_hld.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 84016c8aee6b..30761f7504ef 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -134,12 +134,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
  * Reduce the watchdog noise by only printing messages
  * that are different from what cpu0 displayed.
  */
-static unsigned long cpu0_err;
+static unsigned long firstcpu_err;
+static atomic_t watchdog_cpus;
 
 int watchdog_nmi_enable(unsigned int cpu)
 {
 	struct perf_event_attr *wd_attr;
 	struct perf_event *event = per_cpu(watchdog_ev, cpu);
+	int firstcpu = 0;
 
 	/* nothing to do if the hard lockup detector is disabled */
 	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
@@ -153,19 +155,22 @@ int watchdog_nmi_enable(unsigned int cpu)
 	if (event != NULL)
 		goto out_enable;
 
+	if (atomic_inc_return(&watchdog_cpus) == 1)
+		firstcpu = 1;
+
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
 	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
 
-	/* save cpu0 error for future comparision */
-	if (cpu == 0 && IS_ERR(event))
-		cpu0_err = PTR_ERR(event);
+	/* save the first cpu's error for future comparision */
+	if (firstcpu && IS_ERR(event))
+		firstcpu_err = PTR_ERR(event);
 
 	if (!IS_ERR(event)) {
-		/* only print for cpu0 or different than cpu0 */
-		if (cpu == 0 || cpu0_err)
+		/* only print for the first cpu initialized */
+		if (firstcpu || firstcpu_err)
 			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
 		goto out_save;
 	}
@@ -183,7 +188,7 @@ int watchdog_nmi_enable(unsigned int cpu)
 	smp_mb__after_atomic();
 
 	/* skip displaying the same error again */
-	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
+	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
 		return PTR_ERR(event);
 
 	/* vary the KERN level based on the returned errno */
@@ -219,9 +224,9 @@ void watchdog_nmi_disable(unsigned int cpu)
 
 		/* should be in cleanup, but blocks oprofile */
 		perf_event_release_kernel(event);
-	}
-	if (cpu == 0) {
+
 		/* watchdog_nmi_enable() expects this to be zero initially. */
-		cpu0_err = 0;
+		if (atomic_dec_and_test(&watchdog_cpus))
+			firstcpu_err = 0;
 	}
 }
-- 
1.7.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread
  2016-11-29 13:15 [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread Prarit Bhargava
@ 2016-11-29 21:07 ` Don Zickus
  2016-12-01 20:06 ` Don Zickus
  1 sibling, 0 replies; 5+ messages in thread
From: Don Zickus @ 2016-11-29 21:07 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Borislav Petkov, Tejun Heo, Hidehiro Kawai,
	Thomas Gleixner, Andi Kleen, Joshua Hunt, Ingo Molnar,
	Babu Moger

On Tue, Nov 29, 2016 at 08:15:21AM -0500, Prarit Bhargava wrote:
> When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
> boot cpu can be replaced.  During the hot add event, the message
> 
> NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> 
> is output implying that the NMI watchdog was disabled at some point.  This
> is not the case and the message has caused confusion for users of systems
> that support the removal of the boot cpu socket.
> 
> The watchdog code is coded to assume that cpu 0 is always the first cpu to
> initialize the watchdog, and the last to stop its watchdog thread.  That
> is not the case for initializing if cpu 0 has been removed and added.  The
> removal case has never been correct because the smpboot code will remove
> the watchdog threads starting with the lowest cpu number.
> 
> This patch adds watchdog_cpus to track the number of cpus with active NMI
> watchdog threads so that the first and last thread can be used to set and
> clear the value of firstcpu_err.  firstcpu_err is set when the first
> watchdog thread is enabled, and cleared when the last watchdog thread is
> disabled.
> 
> This patch is based on top of linux-next akpm-base.

Most of this patch is just renaming cpu0_err to firstcpu_err.  The main
trick is the global atomic watchdog_cpus and the small cleanup in
watchdog_nmi_disable.

Looks good to me.  Will try to test and verify things later this week.

Cheers,
Don

> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Joshua Hunt <johunt@akamai.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Babu Moger <babu.moger@oracle.com>
> ---
>  kernel/watchdog_hld.c |   25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 84016c8aee6b..30761f7504ef 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -134,12 +134,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
>   * Reduce the watchdog noise by only printing messages
>   * that are different from what cpu0 displayed.
>   */
> -static unsigned long cpu0_err;
> +static unsigned long firstcpu_err;
> +static atomic_t watchdog_cpus;
>  
>  int watchdog_nmi_enable(unsigned int cpu)
>  {
>  	struct perf_event_attr *wd_attr;
>  	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	int firstcpu = 0;
>  
>  	/* nothing to do if the hard lockup detector is disabled */
>  	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
> @@ -153,19 +155,22 @@ int watchdog_nmi_enable(unsigned int cpu)
>  	if (event != NULL)
>  		goto out_enable;
>  
> +	if (atomic_inc_return(&watchdog_cpus) == 1)
> +		firstcpu = 1;
> +
>  	wd_attr = &wd_hw_attr;
>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
>  
>  	/* Try to register using hardware perf events */
>  	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
>  
> -	/* save cpu0 error for future comparision */
> -	if (cpu == 0 && IS_ERR(event))
> -		cpu0_err = PTR_ERR(event);
> +	/* save the first cpu's error for future comparision */
> +	if (firstcpu && IS_ERR(event))
> +		firstcpu_err = PTR_ERR(event);
>  
>  	if (!IS_ERR(event)) {
> -		/* only print for cpu0 or different than cpu0 */
> -		if (cpu == 0 || cpu0_err)
> +		/* only print for the first cpu initialized */
> +		if (firstcpu || firstcpu_err)
>  			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
>  		goto out_save;
>  	}
> @@ -183,7 +188,7 @@ int watchdog_nmi_enable(unsigned int cpu)
>  	smp_mb__after_atomic();
>  
>  	/* skip displaying the same error again */
> -	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
> +	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
>  		return PTR_ERR(event);
>  
>  	/* vary the KERN level based on the returned errno */
> @@ -219,9 +224,9 @@ void watchdog_nmi_disable(unsigned int cpu)
>  
>  		/* should be in cleanup, but blocks oprofile */
>  		perf_event_release_kernel(event);
> -	}
> -	if (cpu == 0) {
> +
>  		/* watchdog_nmi_enable() expects this to be zero initially. */
> -		cpu0_err = 0;
> +		if (atomic_dec_and_test(&watchdog_cpus))
> +			firstcpu_err = 0;
>  	}
>  }
> -- 
> 1.7.9.3
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread
  2016-11-29 13:15 [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread Prarit Bhargava
  2016-11-29 21:07 ` Don Zickus
@ 2016-12-01 20:06 ` Don Zickus
  2017-01-03 21:19   ` Prarit Bhargava
  1 sibling, 1 reply; 5+ messages in thread
From: Don Zickus @ 2016-12-01 20:06 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Borislav Petkov, Tejun Heo, Hidehiro Kawai,
	Thomas Gleixner, Andi Kleen, Joshua Hunt, Ingo Molnar,
	Babu Moger

On Tue, Nov 29, 2016 at 08:15:21AM -0500, Prarit Bhargava wrote:
> When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
> boot cpu can be replaced.  During the hot add event, the message
> 
> NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> 
> is output implying that the NMI watchdog was disabled at some point.  This
> is not the case and the message has caused confusion for users of systems
> that support the removal of the boot cpu socket.
> 
> The watchdog code is coded to assume that cpu 0 is always the first cpu to
> initialize the watchdog, and the last to stop its watchdog thread.  That
> is not the case for initializing if cpu 0 has been removed and added.  The
> removal case has never been correct because the smpboot code will remove
> the watchdog threads starting with the lowest cpu number.
> 
> This patch adds watchdog_cpus to track the number of cpus with active NMI
> watchdog threads so that the first and last thread can be used to set and
> clear the value of firstcpu_err.  firstcpu_err is set when the first
> watchdog thread is enabled, and cleared when the last watchdog thread is
> disabled.
> 
> This patch is based on top of linux-next akpm-base.

It passed my tests.  Thanks!

Acked-by: Don Zickus <dzickus@redhat.com>


> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Joshua Hunt <johunt@akamai.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Babu Moger <babu.moger@oracle.com>
> ---
>  kernel/watchdog_hld.c |   25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 84016c8aee6b..30761f7504ef 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -134,12 +134,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
>   * Reduce the watchdog noise by only printing messages
>   * that are different from what cpu0 displayed.
>   */
> -static unsigned long cpu0_err;
> +static unsigned long firstcpu_err;
> +static atomic_t watchdog_cpus;
>  
>  int watchdog_nmi_enable(unsigned int cpu)
>  {
>  	struct perf_event_attr *wd_attr;
>  	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	int firstcpu = 0;
>  
>  	/* nothing to do if the hard lockup detector is disabled */
>  	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
> @@ -153,19 +155,22 @@ int watchdog_nmi_enable(unsigned int cpu)
>  	if (event != NULL)
>  		goto out_enable;
>  
> +	if (atomic_inc_return(&watchdog_cpus) == 1)
> +		firstcpu = 1;
> +
>  	wd_attr = &wd_hw_attr;
>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
>  
>  	/* Try to register using hardware perf events */
>  	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
>  
> -	/* save cpu0 error for future comparision */
> -	if (cpu == 0 && IS_ERR(event))
> -		cpu0_err = PTR_ERR(event);
> +	/* save the first cpu's error for future comparision */
> +	if (firstcpu && IS_ERR(event))
> +		firstcpu_err = PTR_ERR(event);
>  
>  	if (!IS_ERR(event)) {
> -		/* only print for cpu0 or different than cpu0 */
> -		if (cpu == 0 || cpu0_err)
> +		/* only print for the first cpu initialized */
> +		if (firstcpu || firstcpu_err)
>  			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
>  		goto out_save;
>  	}
> @@ -183,7 +188,7 @@ int watchdog_nmi_enable(unsigned int cpu)
>  	smp_mb__after_atomic();
>  
>  	/* skip displaying the same error again */
> -	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
> +	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
>  		return PTR_ERR(event);
>  
>  	/* vary the KERN level based on the returned errno */
> @@ -219,9 +224,9 @@ void watchdog_nmi_disable(unsigned int cpu)
>  
>  		/* should be in cleanup, but blocks oprofile */
>  		perf_event_release_kernel(event);
> -	}
> -	if (cpu == 0) {
> +
>  		/* watchdog_nmi_enable() expects this to be zero initially. */
> -		cpu0_err = 0;
> +		if (atomic_dec_and_test(&watchdog_cpus))
> +			firstcpu_err = 0;
>  	}
>  }
> -- 
> 1.7.9.3
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread
  2016-12-01 20:06 ` Don Zickus
@ 2017-01-03 21:19   ` Prarit Bhargava
  2017-01-05 16:17     ` Don Zickus
  0 siblings, 1 reply; 5+ messages in thread
From: Prarit Bhargava @ 2017-01-03 21:19 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Borislav Petkov, Tejun Heo, Hidehiro Kawai,
	Thomas Gleixner, Andi Kleen, Joshua Hunt, Ingo Molnar,
	Babu Moger



On 12/01/2016 03:06 PM, Don Zickus wrote:
> On Tue, Nov 29, 2016 at 08:15:21AM -0500, Prarit Bhargava wrote:
>> When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
>> boot cpu can be replaced.  During the hot add event, the message
>>
>> NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>>
>> is output implying that the NMI watchdog was disabled at some point.  This
>> is not the case and the message has caused confusion for users of systems
>> that support the removal of the boot cpu socket.
>>
>> The watchdog code is coded to assume that cpu 0 is always the first cpu to
>> initialize the watchdog, and the last to stop its watchdog thread.  That
>> is not the case for initializing if cpu 0 has been removed and added.  The
>> removal case has never been correct because the smpboot code will remove
>> the watchdog threads starting with the lowest cpu number.
>>
>> This patch adds watchdog_cpus to track the number of cpus with active NMI
>> watchdog threads so that the first and last thread can be used to set and
>> clear the value of firstcpu_err.  firstcpu_err is set when the first
>> watchdog thread is enabled, and cleared when the last watchdog thread is
>> disabled.
>>
>> This patch is based on top of linux-next akpm-base.
> 
> It passed my tests.  Thanks!
> 
> Acked-by: Don Zickus <dzickus@redhat.com>

Just re-pinging on this.  I haven't seen it picked up by anyone.

P.

> 
> 
>>
>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Don Zickus <dzickus@redhat.com>
>> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Joshua Hunt <johunt@akamai.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Babu Moger <babu.moger@oracle.com>
>> ---
>>  kernel/watchdog_hld.c |   25 +++++++++++++++----------
>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
>> index 84016c8aee6b..30761f7504ef 100644
>> --- a/kernel/watchdog_hld.c
>> +++ b/kernel/watchdog_hld.c
>> @@ -134,12 +134,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
>>   * Reduce the watchdog noise by only printing messages
>>   * that are different from what cpu0 displayed.
>>   */
>> -static unsigned long cpu0_err;
>> +static unsigned long firstcpu_err;
>> +static atomic_t watchdog_cpus;
>>  
>>  int watchdog_nmi_enable(unsigned int cpu)
>>  {
>>  	struct perf_event_attr *wd_attr;
>>  	struct perf_event *event = per_cpu(watchdog_ev, cpu);
>> +	int firstcpu = 0;
>>  
>>  	/* nothing to do if the hard lockup detector is disabled */
>>  	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
>> @@ -153,19 +155,22 @@ int watchdog_nmi_enable(unsigned int cpu)
>>  	if (event != NULL)
>>  		goto out_enable;
>>  
>> +	if (atomic_inc_return(&watchdog_cpus) == 1)
>> +		firstcpu = 1;
>> +
>>  	wd_attr = &wd_hw_attr;
>>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
>>  
>>  	/* Try to register using hardware perf events */
>>  	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
>>  
>> -	/* save cpu0 error for future comparision */
>> -	if (cpu == 0 && IS_ERR(event))
>> -		cpu0_err = PTR_ERR(event);
>> +	/* save the first cpu's error for future comparision */
>> +	if (firstcpu && IS_ERR(event))
>> +		firstcpu_err = PTR_ERR(event);
>>  
>>  	if (!IS_ERR(event)) {
>> -		/* only print for cpu0 or different than cpu0 */
>> -		if (cpu == 0 || cpu0_err)
>> +		/* only print for the first cpu initialized */
>> +		if (firstcpu || firstcpu_err)
>>  			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
>>  		goto out_save;
>>  	}
>> @@ -183,7 +188,7 @@ int watchdog_nmi_enable(unsigned int cpu)
>>  	smp_mb__after_atomic();
>>  
>>  	/* skip displaying the same error again */
>> -	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
>> +	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
>>  		return PTR_ERR(event);
>>  
>>  	/* vary the KERN level based on the returned errno */
>> @@ -219,9 +224,9 @@ void watchdog_nmi_disable(unsigned int cpu)
>>  
>>  		/* should be in cleanup, but blocks oprofile */
>>  		perf_event_release_kernel(event);
>> -	}
>> -	if (cpu == 0) {
>> +
>>  		/* watchdog_nmi_enable() expects this to be zero initially. */
>> -		cpu0_err = 0;
>> +		if (atomic_dec_and_test(&watchdog_cpus))
>> +			firstcpu_err = 0;
>>  	}
>>  }
>> -- 
>> 1.7.9.3
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread
  2017-01-03 21:19   ` Prarit Bhargava
@ 2017-01-05 16:17     ` Don Zickus
  0 siblings, 0 replies; 5+ messages in thread
From: Don Zickus @ 2017-01-05 16:17 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Borislav Petkov, Tejun Heo, Hidehiro Kawai,
	Thomas Gleixner, Andi Kleen, Joshua Hunt, Ingo Molnar,
	Babu Moger, Andrew Morton

(cc'ing Andrew)

On Tue, Jan 03, 2017 at 04:19:50PM -0500, Prarit Bhargava wrote:
> 
> 
> On 12/01/2016 03:06 PM, Don Zickus wrote:
> > On Tue, Nov 29, 2016 at 08:15:21AM -0500, Prarit Bhargava wrote:
> >> When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
> >> boot cpu can be replaced.  During the hot add event, the message
> >>
> >> NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> >>
> >> is output implying that the NMI watchdog was disabled at some point.  This
> >> is not the case and the message has caused confusion for users of systems
> >> that support the removal of the boot cpu socket.
> >>
> >> The watchdog code is coded to assume that cpu 0 is always the first cpu to
> >> initialize the watchdog, and the last to stop its watchdog thread.  That
> >> is not the case for initializing if cpu 0 has been removed and added.  The
> >> removal case has never been correct because the smpboot code will remove
> >> the watchdog threads starting with the lowest cpu number.
> >>
> >> This patch adds watchdog_cpus to track the number of cpus with active NMI
> >> watchdog threads so that the first and last thread can be used to set and
> >> clear the value of firstcpu_err.  firstcpu_err is set when the first
> >> watchdog thread is enabled, and cleared when the last watchdog thread is
> >> disabled.
> >>
> >> This patch is based on top of linux-next akpm-base.
> > 
> > It passed my tests.  Thanks!
> > 
> > Acked-by: Don Zickus <dzickus@redhat.com>
> 
> Just re-pinging on this.  I haven't seen it picked up by anyone.

Andrew usually picks this up for me.  Any concerns Andrew?

Cheers,
Don

> 
> P.
> 
> > 
> > 
> >>
> >> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> >> Cc: Borislav Petkov <bp@suse.de>
> >> Cc: Tejun Heo <tj@kernel.org>
> >> Cc: Don Zickus <dzickus@redhat.com>
> >> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Andi Kleen <ak@linux.intel.com>
> >> Cc: Joshua Hunt <johunt@akamai.com>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Cc: Babu Moger <babu.moger@oracle.com>
> >> ---
> >>  kernel/watchdog_hld.c |   25 +++++++++++++++----------
> >>  1 file changed, 15 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> >> index 84016c8aee6b..30761f7504ef 100644
> >> --- a/kernel/watchdog_hld.c
> >> +++ b/kernel/watchdog_hld.c
> >> @@ -134,12 +134,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
> >>   * Reduce the watchdog noise by only printing messages
> >>   * that are different from what cpu0 displayed.
> >>   */
> >> -static unsigned long cpu0_err;
> >> +static unsigned long firstcpu_err;
> >> +static atomic_t watchdog_cpus;
> >>  
> >>  int watchdog_nmi_enable(unsigned int cpu)
> >>  {
> >>  	struct perf_event_attr *wd_attr;
> >>  	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> >> +	int firstcpu = 0;
> >>  
> >>  	/* nothing to do if the hard lockup detector is disabled */
> >>  	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
> >> @@ -153,19 +155,22 @@ int watchdog_nmi_enable(unsigned int cpu)
> >>  	if (event != NULL)
> >>  		goto out_enable;
> >>  
> >> +	if (atomic_inc_return(&watchdog_cpus) == 1)
> >> +		firstcpu = 1;
> >> +
> >>  	wd_attr = &wd_hw_attr;
> >>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> >>  
> >>  	/* Try to register using hardware perf events */
> >>  	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
> >>  
> >> -	/* save cpu0 error for future comparision */
> >> -	if (cpu == 0 && IS_ERR(event))
> >> -		cpu0_err = PTR_ERR(event);
> >> +	/* save the first cpu's error for future comparision */
> >> +	if (firstcpu && IS_ERR(event))
> >> +		firstcpu_err = PTR_ERR(event);
> >>  
> >>  	if (!IS_ERR(event)) {
> >> -		/* only print for cpu0 or different than cpu0 */
> >> -		if (cpu == 0 || cpu0_err)
> >> +		/* only print for the first cpu initialized */
> >> +		if (firstcpu || firstcpu_err)
> >>  			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
> >>  		goto out_save;
> >>  	}
> >> @@ -183,7 +188,7 @@ int watchdog_nmi_enable(unsigned int cpu)
> >>  	smp_mb__after_atomic();
> >>  
> >>  	/* skip displaying the same error again */
> >> -	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
> >> +	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
> >>  		return PTR_ERR(event);
> >>  
> >>  	/* vary the KERN level based on the returned errno */
> >> @@ -219,9 +224,9 @@ void watchdog_nmi_disable(unsigned int cpu)
> >>  
> >>  		/* should be in cleanup, but blocks oprofile */
> >>  		perf_event_release_kernel(event);
> >> -	}
> >> -	if (cpu == 0) {
> >> +
> >>  		/* watchdog_nmi_enable() expects this to be zero initially. */
> >> -		cpu0_err = 0;
> >> +		if (atomic_dec_and_test(&watchdog_cpus))
> >> +			firstcpu_err = 0;
> >>  	}
> >>  }
> >> -- 
> >> 1.7.9.3
> >>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-01-05 16:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-29 13:15 [PATCH] kernel/watchdog.c: Do not hardcode CPU 0 as the initial thread Prarit Bhargava
2016-11-29 21:07 ` Don Zickus
2016-12-01 20:06 ` Don Zickus
2017-01-03 21:19   ` Prarit Bhargava
2017-01-05 16:17     ` Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).