linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup
@ 2019-11-29 13:21 Konstantin Khlebnikov
  2019-11-29 13:21 ` [PATCH 2/2] kernel: add sysctl kernel.nr_taints Konstantin Khlebnikov
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Konstantin Khlebnikov @ 2019-11-29 13:21 UTC (permalink / raw)
  To: linux-kernel, linux-doc
  Cc: Sasha Levin, Kees Cook, Greg Kroah-Hartman, rcu, Tejun Heo,
	Andrew Morton, Linus Torvalds, Thomas Gleixner

Any lockup or stall notifies about unexpected lack of progress.
It's better to know about them for further problem investigations.

Right now only softlockup has own taint flag. Let's generalize it.

This patch renames TAINT_SOFTLOCKUP into TAINT_LOCKUP at sets it for:
- softlockup
- hardlockup
- RCU stalls
- stuck in workqueues
- detected task hung

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/admin-guide/sysctl/kernel.rst   |    2 +-
 Documentation/admin-guide/tainted-kernels.rst |    8 ++++++--
 include/linux/kernel.h                        |    2 +-
 kernel/hung_task.c                            |    2 ++
 kernel/panic.c                                |    2 +-
 kernel/rcu/tree_stall.h                       |    1 +
 kernel/watchdog.c                             |    2 +-
 kernel/watchdog_hld.c                         |    1 +
 kernel/workqueue.c                            |    1 +
 tools/debugging/kernel-chktaint               |    2 +-
 10 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 032c7cd3cede..60867ec525a4 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1082,7 +1082,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
   2048  `(I)`  workaround for bug in platform firmware applied
   4096  `(O)`  externally-built ("out-of-tree") module was loaded
   8192  `(E)`  unsigned module was loaded
- 16384  `(L)`  soft lockup occurred
+ 16384  `(L)`  lockup occurred
  32768  `(K)`  kernel has been live patched
  65536  `(X)`  Auxiliary taint, defined and used by for distros
 131072  `(T)`  The kernel was built with the struct randomization plugin
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 71e9184a9079..fc76d0aad9f5 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -96,7 +96,7 @@ Bit  Log  Number  Reason that got the kernel tainted
  11  _/I    2048  workaround for bug in platform firmware applied
  12  _/O    4096  externally-built ("out-of-tree") module was loaded
  13  _/E    8192  unsigned module was loaded
- 14  _/L   16384  soft lockup occurred
+ 14  _/L   16384  lockup occurred
  15  _/K   32768  kernel has been live patched
  16  _/X   65536  auxiliary taint, defined for and used by distros
  17  _/T  131072  kernel was built with the struct randomization plugin
@@ -152,7 +152,11 @@ More detailed explanation for tainting
  13) ``E`` if an unsigned module has been loaded in a kernel supporting
      module signature.
 
- 14) ``L`` if a soft lockup has previously occurred on the system.
+ 14) ``L`` if some kind of lockup has previously occurred on the system:
+     - soft/hardlockup, see Documentation/admin-guide/lockup-watchdogs.rst
+     - RCU stall, see Documentation/RCU/stallwarn.txt
+     - hung task detected, see CONFIG_DETECT_HUNG_TASK
+     - kernel workqueue lockup, see CONFIG_WQ_WATCHDOG
 
  15) ``K`` if the kernel has been live patched.
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d83d403dac2e..e8a6808e4f2f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -591,7 +591,7 @@ extern enum system_states {
 #define TAINT_FIRMWARE_WORKAROUND	11
 #define TAINT_OOT_MODULE		12
 #define TAINT_UNSIGNED_MODULE		13
-#define TAINT_SOFTLOCKUP		14
+#define TAINT_LOCKUP			14
 #define TAINT_LIVEPATCH			15
 #define TAINT_AUX			16
 #define TAINT_RANDSTRUCT		17
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 14a625c16cb3..521eb2fbf5fc 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -139,6 +139,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 		hung_task_show_lock = true;
 	}
 
+	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
+
 	touch_nmi_watchdog();
 }
 
diff --git a/kernel/panic.c b/kernel/panic.c
index f470a038b05b..d7750a45ca8d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	[ TAINT_FIRMWARE_WORKAROUND ]	= { 'I', ' ', false },
 	[ TAINT_OOT_MODULE ]		= { 'O', ' ', true },
 	[ TAINT_UNSIGNED_MODULE ]	= { 'E', ' ', true },
-	[ TAINT_SOFTLOCKUP ]		= { 'L', ' ', false },
+	[ TAINT_LOCKUP ]		= { 'L', ' ', false },
 	[ TAINT_LIVEPATCH ]		= { 'K', ' ', true },
 	[ TAINT_AUX ]			= { 'X', ' ', true },
 	[ TAINT_RANDSTRUCT ]		= { 'T', ' ', true },
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index c0b8c458d8a6..181495efff80 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -74,6 +74,7 @@ early_initcall(check_cpu_stall_init);
 /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
 static void panic_on_rcu_stall(void)
 {
+	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
 	if (sysctl_panic_on_rcu_stall)
 		panic("RCU Stall\n");
 }
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index f41334ef0971..d60b195708f7 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -466,7 +466,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			smp_mb__after_atomic();
 		}
 
-		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
 		__this_cpu_write(soft_watchdog_warn, true);
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..f77256f47422 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -152,6 +152,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
 			trigger_allbutself_cpu_backtrace();
 
+		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc2e09a8ea61..825a92893882 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5741,6 +5741,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
 				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+			add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
 		}
 	}
 
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 2240cb56e6e5..9f24719d8c80 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -168,7 +168,7 @@ if [ `expr $T % 2` -eq 0 ]; then
 	addout " "
 else
 	addout "L"
-	echo " * soft lockup occurred (#14)"
+	echo " * lockup occurred (#14)"
 fi
 
 T=`expr $T / 2`


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] kernel: add sysctl kernel.nr_taints
  2019-11-29 13:21 [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Konstantin Khlebnikov
@ 2019-11-29 13:21 ` Konstantin Khlebnikov
  2019-11-30 16:33   ` Kees Cook
  2019-11-30 16:22 ` [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Kees Cook
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Konstantin Khlebnikov @ 2019-11-29 13:21 UTC (permalink / raw)
  To: linux-kernel, linux-doc
  Cc: Sasha Levin, Kees Cook, Greg Kroah-Hartman, rcu, Tejun Heo,
	Andrew Morton, Linus Torvalds, Thomas Gleixner

Once taint flag is set it's never cleared. Following taint could be detected
only via parsing kernel log messages which are different for each occasion.
For repeatable taints like TAINT_MACHINE_CHECK, TAINT_BAD_PAGE, TAINT_DIE,
TAINT_WARN, TAINT_LOCKUP it would be good to know count to see their rate.

This patch adds sysctl with vector of counters. One for each taint flag.
Counters are non-atomic in favor of simplicity. Exact count doesn't matter.
Writing vector of zeroes resets counters.

This is useful for detecting frequent problems with automatic monitoring.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/kernel.h |    1 +
 kernel/panic.c         |    2 ++
 kernel/sysctl.c        |    9 +++++++++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e8a6808e4f2f..900d02167bbd 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -604,6 +604,7 @@ struct taint_flag {
 };
 
 extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];
+extern int sysctl_nr_taints[TAINT_FLAGS_COUNT];
 
 extern const char hex_asc[];
 #define hex_asc_lo(x)	hex_asc[((x) & 0x0f)]
diff --git a/kernel/panic.c b/kernel/panic.c
index d7750a45ca8d..a3df00ebcba2 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -39,6 +39,7 @@
 int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE;
 static unsigned long tainted_mask =
 	IS_ENABLED(CONFIG_GCC_PLUGIN_RANDSTRUCT) ? (1 << TAINT_RANDSTRUCT) : 0;
+int sysctl_nr_taints[TAINT_FLAGS_COUNT];
 static int pause_on_oops;
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
@@ -434,6 +435,7 @@ void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
 		pr_warn("Disabling lock debugging due to kernel taint\n");
 
 	set_bit(flag, &tainted_mask);
+	sysctl_nr_taints[flag]++;
 }
 EXPORT_SYMBOL(add_taint);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b6f2f35d0bcf..5d9727556cef 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -553,6 +553,15 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_taint,
 	},
+	{
+		.procname	= "nr_taints",
+		.data		= &sysctl_nr_taints,
+		.maxlen		= sizeof(sysctl_nr_taints),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ZERO,
+	},
 	{
 		.procname	= "sysctl_writes_strict",
 		.data		= &sysctl_writes_strict,


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup
  2019-11-29 13:21 [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Konstantin Khlebnikov
  2019-11-29 13:21 ` [PATCH 2/2] kernel: add sysctl kernel.nr_taints Konstantin Khlebnikov
@ 2019-11-30 16:22 ` Kees Cook
  2019-12-02 20:03 ` Paul E. McKenney
  2020-01-16 11:19 ` Thomas Gleixner
  3 siblings, 0 replies; 6+ messages in thread
From: Kees Cook @ 2019-11-30 16:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-kernel, linux-doc, Sasha Levin, Greg Kroah-Hartman, rcu,
	Tejun Heo, Andrew Morton, Linus Torvalds, Thomas Gleixner

On Fri, Nov 29, 2019 at 04:21:46PM +0300, Konstantin Khlebnikov wrote:
> Any lockup or stall notifies about unexpected lack of progress.
> It's better to know about them for further problem investigations.
> 
> Right now only softlockup has own taint flag. Let's generalize it.
> 
> This patch renames TAINT_SOFTLOCKUP into TAINT_LOCKUP at sets it for:
> - softlockup
> - hardlockup
> - RCU stalls
> - stuck in workqueues
> - detected task hung

This seems reasonable to me. The only nit I have is that it might be
nicer to separate the rename from the new call sites, but if there's no
v2, I'm fine with it this way.

Reviewed-by: Kees Cook <keescook@chromium.org>

> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst   |    2 +-
>  Documentation/admin-guide/tainted-kernels.rst |    8 ++++++--
>  include/linux/kernel.h                        |    2 +-
>  kernel/hung_task.c                            |    2 ++
>  kernel/panic.c                                |    2 +-
>  kernel/rcu/tree_stall.h                       |    1 +
>  kernel/watchdog.c                             |    2 +-
>  kernel/watchdog_hld.c                         |    1 +
>  kernel/workqueue.c                            |    1 +
>  tools/debugging/kernel-chktaint               |    2 +-
>  10 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 032c7cd3cede..60867ec525a4 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1082,7 +1082,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
>    2048  `(I)`  workaround for bug in platform firmware applied
>    4096  `(O)`  externally-built ("out-of-tree") module was loaded
>    8192  `(E)`  unsigned module was loaded
> - 16384  `(L)`  soft lockup occurred
> + 16384  `(L)`  lockup occurred
>   32768  `(K)`  kernel has been live patched
>   65536  `(X)`  Auxiliary taint, defined and used by for distros
>  131072  `(T)`  The kernel was built with the struct randomization plugin
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index 71e9184a9079..fc76d0aad9f5 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -96,7 +96,7 @@ Bit  Log  Number  Reason that got the kernel tainted
>   11  _/I    2048  workaround for bug in platform firmware applied
>   12  _/O    4096  externally-built ("out-of-tree") module was loaded
>   13  _/E    8192  unsigned module was loaded
> - 14  _/L   16384  soft lockup occurred
> + 14  _/L   16384  lockup occurred
>   15  _/K   32768  kernel has been live patched
>   16  _/X   65536  auxiliary taint, defined for and used by distros
>   17  _/T  131072  kernel was built with the struct randomization plugin
> @@ -152,7 +152,11 @@ More detailed explanation for tainting
>   13) ``E`` if an unsigned module has been loaded in a kernel supporting
>       module signature.
>  
> - 14) ``L`` if a soft lockup has previously occurred on the system.
> + 14) ``L`` if some kind of lockup has previously occurred on the system:
> +     - soft/hardlockup, see Documentation/admin-guide/lockup-watchdogs.rst
> +     - RCU stall, see Documentation/RCU/stallwarn.txt
> +     - hung task detected, see CONFIG_DETECT_HUNG_TASK
> +     - kernel workqueue lockup, see CONFIG_WQ_WATCHDOG
>  
>   15) ``K`` if the kernel has been live patched.
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d83d403dac2e..e8a6808e4f2f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -591,7 +591,7 @@ extern enum system_states {
>  #define TAINT_FIRMWARE_WORKAROUND	11
>  #define TAINT_OOT_MODULE		12
>  #define TAINT_UNSIGNED_MODULE		13
> -#define TAINT_SOFTLOCKUP		14
> +#define TAINT_LOCKUP			14
>  #define TAINT_LIVEPATCH			15
>  #define TAINT_AUX			16
>  #define TAINT_RANDSTRUCT		17
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 14a625c16cb3..521eb2fbf5fc 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -139,6 +139,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  		hung_task_show_lock = true;
>  	}
>  
> +	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
> +
>  	touch_nmi_watchdog();
>  }
>  
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f470a038b05b..d7750a45ca8d 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
>  	[ TAINT_FIRMWARE_WORKAROUND ]	= { 'I', ' ', false },
>  	[ TAINT_OOT_MODULE ]		= { 'O', ' ', true },
>  	[ TAINT_UNSIGNED_MODULE ]	= { 'E', ' ', true },
> -	[ TAINT_SOFTLOCKUP ]		= { 'L', ' ', false },
> +	[ TAINT_LOCKUP ]		= { 'L', ' ', false },
>  	[ TAINT_LIVEPATCH ]		= { 'K', ' ', true },
>  	[ TAINT_AUX ]			= { 'X', ' ', true },
>  	[ TAINT_RANDSTRUCT ]		= { 'T', ' ', true },
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index c0b8c458d8a6..181495efff80 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -74,6 +74,7 @@ early_initcall(check_cpu_stall_init);
>  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
>  static void panic_on_rcu_stall(void)
>  {
> +	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  	if (sysctl_panic_on_rcu_stall)
>  		panic("RCU Stall\n");
>  }
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index f41334ef0971..d60b195708f7 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -466,7 +466,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			smp_mb__after_atomic();
>  		}
>  
> -		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		if (softlockup_panic)
>  			panic("softlockup: hung tasks");
>  		__this_cpu_write(soft_watchdog_warn, true);
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..f77256f47422 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -152,6 +152,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
>  			trigger_allbutself_cpu_backtrace();
>  
> +		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");
>  
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bc2e09a8ea61..825a92893882 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -5741,6 +5741,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>  			pr_cont_pool_info(pool);
>  			pr_cont(" stuck for %us!\n",
>  				jiffies_to_msecs(jiffies - pool_ts) / 1000);
> +			add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		}
>  	}
>  
> diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
> index 2240cb56e6e5..9f24719d8c80 100755
> --- a/tools/debugging/kernel-chktaint
> +++ b/tools/debugging/kernel-chktaint
> @@ -168,7 +168,7 @@ if [ `expr $T % 2` -eq 0 ]; then
>  	addout " "
>  else
>  	addout "L"
> -	echo " * soft lockup occurred (#14)"
> +	echo " * lockup occurred (#14)"
>  fi
>  
>  T=`expr $T / 2`
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] kernel: add sysctl kernel.nr_taints
  2019-11-29 13:21 ` [PATCH 2/2] kernel: add sysctl kernel.nr_taints Konstantin Khlebnikov
@ 2019-11-30 16:33   ` Kees Cook
  0 siblings, 0 replies; 6+ messages in thread
From: Kees Cook @ 2019-11-30 16:33 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-kernel, linux-doc, Sasha Levin, Greg Kroah-Hartman, rcu,
	Tejun Heo, Andrew Morton, Linus Torvalds, Thomas Gleixner

On Fri, Nov 29, 2019 at 04:21:48PM +0300, Konstantin Khlebnikov wrote:
> Once taint flag is set it's never cleared. Following taint could be detected
> only via parsing kernel log messages which are different for each occasion.
> For repeatable taints like TAINT_MACHINE_CHECK, TAINT_BAD_PAGE, TAINT_DIE,
> TAINT_WARN, TAINT_LOCKUP it would be good to know count to see their rate.
> 
> This patch adds sysctl with vector of counters. One for each taint flag.
> Counters are non-atomic in favor of simplicity. Exact count doesn't matter.
> Writing vector of zeroes resets counters.
> 
> This is useful for detecting frequent problems with automatic monitoring.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

I like this, yeah. This would let LKDTM users reset taint counts to
re-check the same kernel, etc, without explicitly clearing the taint
flags which always seemed like a bad idea. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

One nit below...

> ---
>  include/linux/kernel.h |    1 +
>  kernel/panic.c         |    2 ++
>  kernel/sysctl.c        |    9 +++++++++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index e8a6808e4f2f..900d02167bbd 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -604,6 +604,7 @@ struct taint_flag {
>  };
>  
>  extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];
> +extern int sysctl_nr_taints[TAINT_FLAGS_COUNT];
>  
>  extern const char hex_asc[];
>  #define hex_asc_lo(x)	hex_asc[((x) & 0x0f)]
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d7750a45ca8d..a3df00ebcba2 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -39,6 +39,7 @@
>  int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE;
>  static unsigned long tainted_mask =
>  	IS_ENABLED(CONFIG_GCC_PLUGIN_RANDSTRUCT) ? (1 << TAINT_RANDSTRUCT) : 0;
> +int sysctl_nr_taints[TAINT_FLAGS_COUNT];
>  static int pause_on_oops;
>  static int pause_on_oops_flag;
>  static DEFINE_SPINLOCK(pause_on_oops_lock);
> @@ -434,6 +435,7 @@ void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
>  		pr_warn("Disabling lock debugging due to kernel taint\n");
>  
>  	set_bit(flag, &tainted_mask);
> +	sysctl_nr_taints[flag]++;

As long as we're changing this code, how about adding an explicit check
of "flag" against either ARRAY_SIZE(sysctl_nr_tains) or TAINT_FLAGS_COUNT?

It looks like only 1 caller isn't using a static value, though:
proc_taint(), so it would catch "overflows" there (it's already bounded
to the size of tainted_mask, but proc could set "unknown" taint flags).

-Kees

>  }
>  EXPORT_SYMBOL(add_taint);
>  
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index b6f2f35d0bcf..5d9727556cef 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -553,6 +553,15 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_taint,
>  	},
> +	{
> +		.procname	= "nr_taints",
> +		.data		= &sysctl_nr_taints,
> +		.maxlen		= sizeof(sysctl_nr_taints),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ZERO,
> +	},
>  	{
>  		.procname	= "sysctl_writes_strict",
>  		.data		= &sysctl_writes_strict,
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup
  2019-11-29 13:21 [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Konstantin Khlebnikov
  2019-11-29 13:21 ` [PATCH 2/2] kernel: add sysctl kernel.nr_taints Konstantin Khlebnikov
  2019-11-30 16:22 ` [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Kees Cook
@ 2019-12-02 20:03 ` Paul E. McKenney
  2020-01-16 11:19 ` Thomas Gleixner
  3 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2019-12-02 20:03 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-kernel, linux-doc, Sasha Levin, Kees Cook,
	Greg Kroah-Hartman, rcu, Tejun Heo, Andrew Morton,
	Linus Torvalds, Thomas Gleixner

On Fri, Nov 29, 2019 at 04:21:46PM +0300, Konstantin Khlebnikov wrote:
> Any lockup or stall notifies about unexpected lack of progress.
> It's better to know about them for further problem investigations.
> 
> Right now only softlockup has own taint flag. Let's generalize it.
> 
> This patch renames TAINT_SOFTLOCKUP into TAINT_LOCKUP at sets it for:
> - softlockup
> - hardlockup
> - RCU stalls
> - stuck in workqueues
> - detected task hung
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

From an RCU perspective,

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  Documentation/admin-guide/sysctl/kernel.rst   |    2 +-
>  Documentation/admin-guide/tainted-kernels.rst |    8 ++++++--
>  include/linux/kernel.h                        |    2 +-
>  kernel/hung_task.c                            |    2 ++
>  kernel/panic.c                                |    2 +-
>  kernel/rcu/tree_stall.h                       |    1 +
>  kernel/watchdog.c                             |    2 +-
>  kernel/watchdog_hld.c                         |    1 +
>  kernel/workqueue.c                            |    1 +
>  tools/debugging/kernel-chktaint               |    2 +-
>  10 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 032c7cd3cede..60867ec525a4 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1082,7 +1082,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
>    2048  `(I)`  workaround for bug in platform firmware applied
>    4096  `(O)`  externally-built ("out-of-tree") module was loaded
>    8192  `(E)`  unsigned module was loaded
> - 16384  `(L)`  soft lockup occurred
> + 16384  `(L)`  lockup occurred
>   32768  `(K)`  kernel has been live patched
>   65536  `(X)`  Auxiliary taint, defined and used by for distros
>  131072  `(T)`  The kernel was built with the struct randomization plugin
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index 71e9184a9079..fc76d0aad9f5 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -96,7 +96,7 @@ Bit  Log  Number  Reason that got the kernel tainted
>   11  _/I    2048  workaround for bug in platform firmware applied
>   12  _/O    4096  externally-built ("out-of-tree") module was loaded
>   13  _/E    8192  unsigned module was loaded
> - 14  _/L   16384  soft lockup occurred
> + 14  _/L   16384  lockup occurred
>   15  _/K   32768  kernel has been live patched
>   16  _/X   65536  auxiliary taint, defined for and used by distros
>   17  _/T  131072  kernel was built with the struct randomization plugin
> @@ -152,7 +152,11 @@ More detailed explanation for tainting
>   13) ``E`` if an unsigned module has been loaded in a kernel supporting
>       module signature.
>  
> - 14) ``L`` if a soft lockup has previously occurred on the system.
> + 14) ``L`` if some kind of lockup has previously occurred on the system:
> +     - soft/hardlockup, see Documentation/admin-guide/lockup-watchdogs.rst
> +     - RCU stall, see Documentation/RCU/stallwarn.txt
> +     - hung task detected, see CONFIG_DETECT_HUNG_TASK
> +     - kernel workqueue lockup, see CONFIG_WQ_WATCHDOG
>  
>   15) ``K`` if the kernel has been live patched.
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d83d403dac2e..e8a6808e4f2f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -591,7 +591,7 @@ extern enum system_states {
>  #define TAINT_FIRMWARE_WORKAROUND	11
>  #define TAINT_OOT_MODULE		12
>  #define TAINT_UNSIGNED_MODULE		13
> -#define TAINT_SOFTLOCKUP		14
> +#define TAINT_LOCKUP			14
>  #define TAINT_LIVEPATCH			15
>  #define TAINT_AUX			16
>  #define TAINT_RANDSTRUCT		17
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 14a625c16cb3..521eb2fbf5fc 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -139,6 +139,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  		hung_task_show_lock = true;
>  	}
>  
> +	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
> +
>  	touch_nmi_watchdog();
>  }
>  
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f470a038b05b..d7750a45ca8d 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
>  	[ TAINT_FIRMWARE_WORKAROUND ]	= { 'I', ' ', false },
>  	[ TAINT_OOT_MODULE ]		= { 'O', ' ', true },
>  	[ TAINT_UNSIGNED_MODULE ]	= { 'E', ' ', true },
> -	[ TAINT_SOFTLOCKUP ]		= { 'L', ' ', false },
> +	[ TAINT_LOCKUP ]		= { 'L', ' ', false },
>  	[ TAINT_LIVEPATCH ]		= { 'K', ' ', true },
>  	[ TAINT_AUX ]			= { 'X', ' ', true },
>  	[ TAINT_RANDSTRUCT ]		= { 'T', ' ', true },
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index c0b8c458d8a6..181495efff80 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -74,6 +74,7 @@ early_initcall(check_cpu_stall_init);
>  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
>  static void panic_on_rcu_stall(void)
>  {
> +	add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  	if (sysctl_panic_on_rcu_stall)
>  		panic("RCU Stall\n");
>  }
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index f41334ef0971..d60b195708f7 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -466,7 +466,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			smp_mb__after_atomic();
>  		}
>  
> -		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		if (softlockup_panic)
>  			panic("softlockup: hung tasks");
>  		__this_cpu_write(soft_watchdog_warn, true);
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..f77256f47422 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -152,6 +152,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
>  			trigger_allbutself_cpu_backtrace();
>  
> +		add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");
>  
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bc2e09a8ea61..825a92893882 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -5741,6 +5741,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>  			pr_cont_pool_info(pool);
>  			pr_cont(" stuck for %us!\n",
>  				jiffies_to_msecs(jiffies - pool_ts) / 1000);
> +			add_taint(TAINT_LOCKUP, LOCKDEP_STILL_OK);
>  		}
>  	}
>  
> diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
> index 2240cb56e6e5..9f24719d8c80 100755
> --- a/tools/debugging/kernel-chktaint
> +++ b/tools/debugging/kernel-chktaint
> @@ -168,7 +168,7 @@ if [ `expr $T % 2` -eq 0 ]; then
>  	addout " "
>  else
>  	addout "L"
> -	echo " * soft lockup occurred (#14)"
> +	echo " * lockup occurred (#14)"
>  fi
>  
>  T=`expr $T / 2`
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup
  2019-11-29 13:21 [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  2019-12-02 20:03 ` Paul E. McKenney
@ 2020-01-16 11:19 ` Thomas Gleixner
  3 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2020-01-16 11:19 UTC (permalink / raw)
  To: Konstantin Khlebnikov, linux-kernel, linux-doc
  Cc: Sasha Levin, Kees Cook, Greg Kroah-Hartman, rcu, Tejun Heo,
	Andrew Morton, Linus Torvalds

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> Any lockup or stall notifies about unexpected lack of progress.
> It's better to know about them for further problem investigations.
>
> Right now only softlockup has own taint flag. Let's generalize it.
>
> This patch renames TAINT_SOFTLOCKUP into TAINT_LOCKUP at sets it for:

Please search 'This patch' in Documentation/process/submitting-patches.rst

> - softlockup
> - hardlockup
> - RCU stalls
> - stuck in workqueues
> - detected task hung

This does too many things at once and wants to be split in pieces:

  1) Change the TAINT flag and update documentation

  2) Add the tainting to the places which are not yet covered

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-16 11:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-29 13:21 [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Konstantin Khlebnikov
2019-11-29 13:21 ` [PATCH 2/2] kernel: add sysctl kernel.nr_taints Konstantin Khlebnikov
2019-11-30 16:33   ` Kees Cook
2019-11-30 16:22 ` [PATCH 1/2] kernel: set taint flag 'L' at any kind of lockup Kees Cook
2019-12-02 20:03 ` Paul E. McKenney
2020-01-16 11:19 ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).