linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask bug
@ 2021-09-22  9:52 Jinhui Guo
  2021-09-22 14:32 ` Petr Mladek
  0 siblings, 1 reply; 2+ messages in thread
From: Jinhui Guo @ 2021-09-22  9:52 UTC (permalink / raw)
  To: akpm, pmladek, peterz, valentin.schneider; +Cc: linux-kernel, guojinhui

If NR_CPUS equal to 1, it would trigger hungtask, it can be
triggered by follow command:
	echo 0 > /proc/sys/kernel/watchdog
	echo 1 > /proc/sys/kernel/watchdog
The hungtask stack:
	__schedule
	schedule
	schedule_timeout
	__wait_for_common
	softlockup_stop_fn
	lockup_detector_reconfigure
	proc_watchdog_common
	proc_watchdog
	proc_sys_call_handler
	vfs_write
	ksys_write
The watchdog_allowed_mask is completely cleared when the
watchdog is disabled. But the macro for_each_cpu() assume
all masks are "1" when macro NR_CPUS equal to 1. It makes
watchdog_allowed_mask not work at all.

Fixes: be45bf5395e0 ("watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug")

Cc: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jinhui Guo <guojinhui@huawei.com>
---
 arch/x86/include/asm/smp.h | 5 +++++
 include/linux/cpumask.h    | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 630ff08532be..f5d3ca5696b3 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -21,7 +21,12 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
 
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 {
+#ifdef CONFIG_SMP
 	return per_cpu(cpu_llc_shared_map, cpu);
+#else
+	/* cpu_llc_shared_map is not defined while !CONFIG_SMP */
+	return cpu_all_mask;
+#endif
 }
 
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 5d4d07a9e1ed..1a35dbcc397d 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -175,10 +175,11 @@ static inline int cpumask_any_distribute(const struct cpumask *srcp)
 	return cpumask_first(srcp);
 }
 
+/* It should check cpumask in some special case, such as watchdog */
 #define for_each_cpu(cpu, mask)			\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+	for ((cpu) = 0; (cpu) < 1 && test_bit(0, cpumask_bits(mask)); (cpu)++)
 #define for_each_cpu_not(cpu, mask)		\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+	for ((cpu) = 0; (cpu) < 1 && !test_bit(0, cpumask_bits(mask)); (cpu)++)
 #define for_each_cpu_wrap(cpu, mask, start)	\
 	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)(start))
 #define for_each_cpu_and(cpu, mask1, mask2)	\
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask bug
  2021-09-22  9:52 [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask bug Jinhui Guo
@ 2021-09-22 14:32 ` Petr Mladek
  0 siblings, 0 replies; 2+ messages in thread
From: Petr Mladek @ 2021-09-22 14:32 UTC (permalink / raw)
  To: Jinhui Guo; +Cc: akpm, peterz, valentin.schneider, linux-kernel

On Wed 2021-09-22 17:52:02, Jinhui Guo wrote:
> If NR_CPUS equal to 1, it would trigger hungtask, it can be
> triggered by follow command:
> 	echo 0 > /proc/sys/kernel/watchdog
> 	echo 1 > /proc/sys/kernel/watchdog
> The hungtask stack:
> 	__schedule
> 	schedule
> 	schedule_timeout
> 	__wait_for_common
> 	softlockup_stop_fn
> 	lockup_detector_reconfigure
> 	proc_watchdog_common
> 	proc_watchdog
> 	proc_sys_call_handler
> 	vfs_write
> 	ksys_write
> The watchdog_allowed_mask is completely cleared when the
> watchdog is disabled. But the macro for_each_cpu() assume
> all masks are "1" when macro NR_CPUS equal to 1. It makes
> watchdog_allowed_mask not work at all.
> 
> Fixes: be45bf5395e0 ("watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug")
> 
> Cc: Petr Mladek <pmladek@suse.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Jinhui Guo <guojinhui@huawei.com>
> ---
>  arch/x86/include/asm/smp.h | 5 +++++
>  include/linux/cpumask.h    | 5 +++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 630ff08532be..f5d3ca5696b3 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -21,7 +21,12 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>  
>  static inline struct cpumask *cpu_llc_shared_mask(int cpu)
>  {
> +#ifdef CONFIG_SMP
>  	return per_cpu(cpu_llc_shared_map, cpu);
> +#else
> +	/* cpu_llc_shared_map is not defined while !CONFIG_SMP */
> +	return cpu_all_mask;
> +#endif

This looks dangerous. The cpumask returned can be modified,
for example, in

static void remove_siblinginfo(int cpu)
{
[...]
	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
[...]
	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
[...]
}

With you patch, this code would clear "cpu_all_mask" which
is wrong!

We need to fix this another way. I see few possibilities:

1. Define cpu_llc_shared_map even with NR_CPUS == 1.

2. Do not compile the problematic code with NR_CPUS == 1.
   It seems to be needed for CPU hotplug so it does not
   make sense to have it.

3. Define for_each_cpu_optimized() that would behave the same
   way as the non-patched for_each_cpu(). It can be
   used in remove_siblinginfo().

4. Do not patch for_each_cpu(). Instead, add for_each_cpu_safe()
   that would work correctly even with NR_CPUS == 1.
   It can than be used in watchdog.c.


IMHO, the 2nd solution would be the best if the change
is simple.

The 3rd solution is my other favorite. It would keep
the default for_each_cpu() safe. People might use
for_each_cpu_optimized() only when they know what
they are doing. It will be easy to find these
locations.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-09-22 14:32 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22  9:52 [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask bug Jinhui Guo
2021-09-22 14:32 ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).