linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute
@ 2023-08-08 10:17 Marco Elver
  2023-08-08 10:17 ` [PATCH v3 2/3] list_debug: Introduce inline wrappers for debug checks Marco Elver
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marco Elver @ 2023-08-08 10:17 UTC (permalink / raw)
  To: elver, Andrew Morton, Kees Cook
  Cc: Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen,
	linux-arm-kernel, kvmarm, linux-kernel, llvm, Dmitry Vyukov,
	Alexander Potapenko, kasan-dev, linux-toolchains

[1]: "On X86-64 and AArch64 targets, this attribute changes the calling
convention of a function. The preserve_most calling convention attempts
to make the code in the caller as unintrusive as possible. This
convention behaves identically to the C calling convention on how
arguments and return values are passed, but it uses a different set of
caller/callee-saved registers. This alleviates the burden of saving and
recovering a large register set before and after the call in the caller.
If the arguments are passed in callee-saved registers, then they will be
preserved by the callee across the call. This doesn't apply for values
returned in callee-saved registers.

 * On X86-64 the callee preserves all general purpose registers, except
   for R11. R11 can be used as a scratch register. Floating-point
   registers (XMMs/YMMs) are not preserved and need to be saved by the
   caller.

 * On AArch64 the callee preserve all general purpose registers, except
   x0-X8 and X16-X18."

[1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most

Introduce the attribute to compiler_types.h as __preserve_most.

Use of this attribute results in better code generation for calls to
very rarely called functions, such as error-reporting functions, or
rarely executed slow paths.

Beware that the attribute conflicts with instrumentation calls inserted
on function entry which do not use __preserve_most themselves. A notable
example is function tracing, which assumes the normal C calling convention
for the given architecture. Where the attribute is supported, __preserve_most
will imply notrace. It is recommended to restrict use of the attribute
to functions that should or already disable tracing.

The attribute may be supported by a future GCC version (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899).
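
For illustration, below is a minimal standalone sketch (not part of this
patch; function and variable names are made up) of how the attribute is meant
to be used: the rarely executed reporting path is marked __preserve_most so
that its hot-path caller does not have to spill and reload most registers
around the call.

#include <stdio.h>

#if __has_attribute(__preserve_most__)
# define __preserve_most __attribute__((__preserve_most__))
#else
# define __preserve_most
#endif

/* Rarely executed reporting slow path: with Clang on x86-64/arm64, only R11
 * (x86-64) or X0-X8/X16-X18 (arm64) among the general purpose registers are
 * clobbered by this call; floating-point registers remain caller-saved. */
static __attribute__((cold, noinline)) __preserve_most
void report_bad_value(long v)
{
	printf("bad value: %ld\n", v);
}

/* Hot path: the unlikely call does not force the caller to save most of its
 * live registers. */
static long check_value(long v)
{
	if (__builtin_expect(v < 0, 0))
		report_bad_value(v);
	return v + 1;
}

int main(void)
{
	return (int)check_value(41) - 42;	/* exits 0 */
}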

Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
v3:
* Quote more from LLVM documentation about which registers are
  callee/caller with preserve_most.
* Code comment to restrict use where tracing is meant to be disabled.

v2:
* Imply notrace, to avoid any conflicts with tracing which is inserted
  on function entry. See added comments.
---
 include/linux/compiler_types.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 547ea1ff806e..c88488715a39 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -106,6 +106,34 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
 #define __cold
 #endif
 
+/*
+ * On x86-64 and arm64 targets, __preserve_most changes the calling convention
+ * of a function to make the code in the caller as unintrusive as possible. This
+ * convention behaves identically to the C calling convention on how arguments
+ * and return values are passed, but uses a different set of caller- and callee-
+ * saved registers.
+ *
+ * The purpose is to alleviate the burden of saving and recovering a large
+ * register set before and after the call in the caller.  This is beneficial for
+ * rarely taken slow paths, such as error-reporting functions that may be called
+ * from hot paths.
+ *
+ * Note: This may conflict with instrumentation inserted on function entry which
+ * does not use __preserve_most or equivalent convention (if in assembly). Since
+ * function tracing assumes the normal C calling convention, where the attribute
+ * is supported, __preserve_most implies notrace.  It is recommended to restrict
+ * use of the attribute to functions that should or already disable tracing.
+ *
+ * Optional: not supported by gcc.
+ *
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#preserve-most
+ */
+#if __has_attribute(__preserve_most__)
+# define __preserve_most notrace __attribute__((__preserve_most__))
+#else
+# define __preserve_most
+#endif
+
 /* Builtins */
 
 /*
-- 
2.41.0.640.ga95def55d0-goog



* [PATCH v3 2/3] list_debug: Introduce inline wrappers for debug checks
  2023-08-08 10:17 [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Marco Elver
@ 2023-08-08 10:17 ` Marco Elver
  2023-08-08 10:17 ` [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL Marco Elver
  2023-08-08 12:35 ` [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Mark Rutland
  2 siblings, 0 replies; 12+ messages in thread
From: Marco Elver @ 2023-08-08 10:17 UTC (permalink / raw)
  To: elver, Andrew Morton, Kees Cook
  Cc: Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen,
	linux-arm-kernel, kvmarm, linux-kernel, llvm, Dmitry Vyukov,
	Alexander Potapenko, kasan-dev, linux-toolchains

Turn the list debug checking functions __list_*_valid() into inline
functions that wrap the out-of-line functions. Care is taken to ensure
the inline wrappers are always inlined, so that additional compiler
instrumentation (such as sanitizers) does not result in redundant
outlining.

This change is preparation for performing checks in the inline wrappers.

No functional change intended.

Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* Some documentation.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  6 ++---
 include/linux/list.h                 | 37 +++++++++++++++++++++++++---
 lib/list_debug.c                     | 11 ++++-----
 3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index d68abd7ea124..16266a939a4c 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,8 +26,8 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
-bool __list_add_valid(struct list_head *new, struct list_head *prev,
-		      struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+				struct list_head *next)
 {
 	if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) ||
 	    NVHE_CHECK_DATA_CORRUPTION(prev->next != next) ||
@@ -37,7 +37,7 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
 
diff --git a/include/linux/list.h b/include/linux/list.h
index f10344dbad4d..130c6a1bb45c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,10 +39,39 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
-extern bool __list_add_valid(struct list_head *new,
-			      struct list_head *prev,
-			      struct list_head *next);
-extern bool __list_del_entry_valid(struct list_head *entry);
+/*
+ * Performs the full set of list corruption checks before __list_add().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_add_valid_or_report(struct list_head *new,
+				       struct list_head *prev,
+				       struct list_head *next);
+
+/*
+ * Performs list corruption checks before __list_add(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_add_valid(struct list_head *new,
+					     struct list_head *prev,
+					     struct list_head *next)
+{
+	return __list_add_valid_or_report(new, prev, next);
+}
+
+/*
+ * Performs the full set of list corruption checks before __list_del_entry().
+ * On list corruption reports a warning, and returns false.
+ */
+extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+
+/*
+ * Performs list corruption checks before __list_del_entry(). Returns false if a
+ * corruption is detected, true otherwise.
+ */
+static __always_inline bool __list_del_entry_valid(struct list_head *entry)
+{
+	return __list_del_entry_valid_or_report(entry);
+}
 #else
 static inline bool __list_add_valid(struct list_head *new,
 				struct list_head *prev,
diff --git a/lib/list_debug.c b/lib/list_debug.c
index d98d43f80958..2def33b1491f 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,8 +17,8 @@
  * attempt).
  */
 
-bool __list_add_valid(struct list_head *new, struct list_head *prev,
-		      struct list_head *next)
+bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
+				struct list_head *next)
 {
 	if (CHECK_DATA_CORRUPTION(prev == NULL,
 			"list_add corruption. prev is NULL.\n") ||
@@ -37,9 +37,9 @@ bool __list_add_valid(struct list_head *new, struct list_head *prev,
 
 	return true;
 }
-EXPORT_SYMBOL(__list_add_valid);
+EXPORT_SYMBOL(__list_add_valid_or_report);
 
-bool __list_del_entry_valid(struct list_head *entry)
+bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
 
@@ -65,6 +65,5 @@ bool __list_del_entry_valid(struct list_head *entry)
 		return false;
 
 	return true;
-
 }
-EXPORT_SYMBOL(__list_del_entry_valid);
+EXPORT_SYMBOL(__list_del_entry_valid_or_report);
-- 
2.41.0.640.ga95def55d0-goog



* [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-08 10:17 [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Marco Elver
  2023-08-08 10:17 ` [PATCH v3 2/3] list_debug: Introduce inline wrappers for debug checks Marco Elver
@ 2023-08-08 10:17 ` Marco Elver
  2023-08-08 21:27   ` Kees Cook
  2023-08-08 12:35 ` [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Mark Rutland
  2 siblings, 1 reply; 12+ messages in thread
From: Marco Elver @ 2023-08-08 10:17 UTC (permalink / raw)
  To: elver, Andrew Morton, Kees Cook
  Cc: Guenter Roeck, Peter Zijlstra, Mark Rutland, Steven Rostedt,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, Miguel Ojeda, Sami Tolvanen,
	linux-arm-kernel, kvmarm, linux-kernel, llvm, Dmitry Vyukov,
	Alexander Potapenko, kasan-dev, linux-toolchains

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The feature has never been designed with performance in
mind, yet common list manipulation is happening across hot paths all
over the kernel.

Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
checking inline, and only upon list corruption delegates to the
reporting slow path.

To generate optimal machine code with CONFIG_DEBUG_LIST_MINIMAL:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     ___list_*_valid(), always return false if the inline checks failed.
     This avoids redundant compare and conditional branch right after
     return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_DEBUG_LIST_MINIMAL (using a Clang compiler
with "preserve_most") shows throughput improvements, in my case of ~7%
on average (up to 20-30% on some test cases).
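
To make the fast-path/slow-path split concrete, here is a small standalone
sketch (toy list type and made-up names, not the kernel implementation) of
the same idea: the cheap pointer-consistency checks are inlined at the call
site, and only a detected inconsistency calls the cold, preserve_most-annotated
reporting function.

#include <stdbool.h>
#include <stdio.h>

struct node { struct node *next, *prev; };

#if __has_attribute(__preserve_most__)
# define __preserve_most __attribute__((__preserve_most__))
#else
# define __preserve_most
#endif

/* Cold reporting slow path: performs the checks again and prints a report. */
static __attribute__((cold, noinline)) __preserve_most
bool add_valid_or_report(struct node *new, struct node *prev, struct node *next)
{
	if (next->prev != prev || prev->next != next || new == prev || new == next) {
		fprintf(stderr, "list corruption: new=%p prev=%p next=%p\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Inlined fast path: only the checks whose failure would not fault anyway. */
static inline bool add_valid(struct node *new, struct node *prev, struct node *next)
{
	if (__builtin_expect(next->prev == prev && prev->next == next &&
			     new != prev && new != next, 1))
		return true;
	/*
	 * In the kernel patch the inline checks are a subset of the slow
	 * path's, so a failure here means corruption is certain: report and
	 * return false without re-testing the slow path's return value.
	 */
	add_valid_or_report(new, prev, next);
	return false;
}

int main(void)
{
	struct node head = { &head, &head }, a;

	return add_valid(&a, &head, head.next) ? 0 : 1;	/* exits 0 */
}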

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    | 15 +++++++
 lib/list_debug.c                     |  2 +
 4 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..066fe33e99bf 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_DEBUG_LIST_MINIMAL
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checks
+ * (which do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the minimal config we can afford to inline the checks,
+		 * which also gives the compiler a chance to elide some of them
+		 * completely if they can be proven at compile-time. If one of
+		 * the pre-conditions does not hold, the slow-path will show a
+		 * report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checks
+ * (which do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..e72cf08af0fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1680,6 +1680,21 @@ config DEBUG_LIST
 
 	  If unsure, say N.
 
+config DEBUG_LIST_MINIMAL
+	bool "Minimal linked list debug checks"
+	default !DEBUG_KERNEL
+	depends on DEBUG_LIST
+	help
+	  Only perform the minimal set of checks in the linked-list walking
+	  routines to catch corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  This trades lower quality error reports for improved performance: the
+	  generated code should be more optimal and provide trade-offs that may
+	  better serve safety- and performance-critical environments.
+
+	  If unsure, say Y.
+
 config DEBUG_PLIST
 	bool "Debug priority linked list manipulation"
 	depends on DEBUG_KERNEL
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
-- 
2.41.0.640.ga95def55d0-goog



* Re: [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute
  2023-08-08 10:17 [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Marco Elver
  2023-08-08 10:17 ` [PATCH v3 2/3] list_debug: Introduce inline wrappers for debug checks Marco Elver
  2023-08-08 10:17 ` [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL Marco Elver
@ 2023-08-08 12:35 ` Mark Rutland
  2 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2023-08-08 12:35 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Kees Cook, Guenter Roeck, Peter Zijlstra,
	Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Tue, Aug 08, 2023 at 12:17:25PM +0200, Marco Elver wrote:
> [1]: "On X86-64 and AArch64 targets, this attribute changes the calling
> convention of a function. The preserve_most calling convention attempts
> to make the code in the caller as unintrusive as possible. This
> convention behaves identically to the C calling convention on how
> arguments and return values are passed, but it uses a different set of
> caller/callee-saved registers. This alleviates the burden of saving and
> recovering a large register set before and after the call in the caller.
> If the arguments are passed in callee-saved registers, then they will be
> preserved by the callee across the call. This doesn't apply for values
> returned in callee-saved registers.
> 
>  * On X86-64 the callee preserves all general purpose registers, except
>    for R11. R11 can be used as a scratch register. Floating-point
>    registers (XMMs/YMMs) are not preserved and need to be saved by the
>    caller.
> 
>  * On AArch64 the callee preserve all general purpose registers, except
>    x0-X8 and X16-X18."
> 
> [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most
> 
> Introduce the attribute to compiler_types.h as __preserve_most.
> 
> Use of this attribute results in better code generation for calls to
> very rarely called functions, such as error-reporting functions, or
> rarely executed slow paths.
> 
> Beware that the attribute conflicts with instrumentation calls inserted
> on function entry which do not use __preserve_most themselves. A notable
> example is function tracing, which assumes the normal C calling convention
> for the given architecture. Where the attribute is supported, __preserve_most
> will imply notrace. It is recommended to restrict use of the attribute
> to functions that should or already disable tracing.
> 
> The attribute may be supported by a future GCC version (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899).
> 
> Signed-off-by: Marco Elver <elver@google.com>
> Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

So long as this implies notrace I believe it is safe, so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
> v3:
> * Quote more from LLVM documentation about which registers are
>   callee/caller with preserve_most.
> * Code comment to restrict use where tracing is meant to be disabled.
> 
> v2:
> * Imply notrace, to avoid any conflicts with tracing which is inserted
>   on function entry. See added comments.
> ---
>  include/linux/compiler_types.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 547ea1ff806e..c88488715a39 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -106,6 +106,34 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
>  #define __cold
>  #endif
>  
> +/*
> + * On x86-64 and arm64 targets, __preserve_most changes the calling convention
> + * of a function to make the code in the caller as unintrusive as possible. This
> + * convention behaves identically to the C calling convention on how arguments
> + * and return values are passed, but uses a different set of caller- and callee-
> + * saved registers.
> + *
> + * The purpose is to alleviate the burden of saving and recovering a large
> + * register set before and after the call in the caller.  This is beneficial for
> + * rarely taken slow paths, such as error-reporting functions that may be called
> + * from hot paths.
> + *
> + * Note: This may conflict with instrumentation inserted on function entry which
> + * does not use __preserve_most or equivalent convention (if in assembly). Since
> + * function tracing assumes the normal C calling convention, where the attribute
> + * is supported, __preserve_most implies notrace.  It is recommended to restrict
> + * use of the attribute to functions that should or already disable tracing.
> + *
> + * Optional: not supported by gcc.
> + *
> + * clang: https://clang.llvm.org/docs/AttributeReference.html#preserve-most
> + */
> +#if __has_attribute(__preserve_most__)
> +# define __preserve_most notrace __attribute__((__preserve_most__))
> +#else
> +# define __preserve_most
> +#endif
> +
>  /* Builtins */
>  
>  /*
> -- 
> 2.41.0.640.ga95def55d0-goog
> 


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-08 10:17 ` [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL Marco Elver
@ 2023-08-08 21:27   ` Kees Cook
  2023-08-09  7:35     ` Marco Elver
  0 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2023-08-08 21:27 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland,
	Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote:
> Numerous production kernel configs (see [1, 2]) are choosing to enable
> CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
> configs [3]. The feature has never been designed with performance in
> mind, yet common list manipulation is happening across hot paths all
> over the kernel.
> 
> Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
> checking inline, and only upon list corruption delegates to the
> reporting slow path.

I'd really like to get away from calling this "DEBUG", since it's used
more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
making this better a while back, but the series never landed. Do you
have a bit of time to look through it?

https://github.com/KSPP/linux/issues/10
https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/

-Kees

-- 
Kees Cook


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-08 21:27   ` Kees Cook
@ 2023-08-09  7:35     ` Marco Elver
  2023-08-09  9:57       ` Marco Elver
  0 siblings, 1 reply; 12+ messages in thread
From: Marco Elver @ 2023-08-09  7:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland,
	Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Tue, 8 Aug 2023 at 23:27, Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote:
> > Numerous production kernel configs (see [1, 2]) are choosing to enable
> > CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
> > configs [3]. The feature has never been designed with performance in
> > mind, yet common list manipulation is happening across hot paths all
> > over the kernel.
> >
> > Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
> > checking inline, and only upon list corruption delegates to the
> > reporting slow path.
>
> I'd really like to get away from calling this "DEBUG", since it's used
> more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
> making this better a while back, but the series never landed. Do you
> have a bit of time to look through it?
>
> https://github.com/KSPP/linux/issues/10
> https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/

I'm fine renaming this one. But there are other issues that Will's
series solves, which I don't want this series to depend on. We can try
to sort them out separately.

The main problem here is that DEBUG_LIST has been designed to be
friendly for debugging (incl. checking poison values and NULL). Some
kernel devs may still want that, but for production use it is pointless
and wasteful.

So what I can propose is to introduce CONFIG_LIST_HARDENED that
doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
we still use that code to produce a report.

If there are other list types that have similar debug checks, but
where we can optimize performance by eliding some and moving them
inline, we can do the same (CONFIG_*_HARDENED). If the checks are
already optimized, we could just rename it to CONFIG_*_HARDENED and
remove the DEBUG-name.

I'm not a big fan of the 2 modes either, but that's what we get if we
want to support the debugging and hardening usecases. Inlining the
full set of checks does not work for performance and size.


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-09  7:35     ` Marco Elver
@ 2023-08-09  9:57       ` Marco Elver
  2023-08-09 15:30         ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Marco Elver @ 2023-08-09  9:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Guenter Roeck, Peter Zijlstra, Mark Rutland,
	Steven Rostedt, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Wed, Aug 09, 2023 at 09:35AM +0200, Marco Elver wrote:

> > I'd really like to get away from calling this "DEBUG", since it's used
> > more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
> > making this better a while back, but the series never landed. Do you
> > have a bit of time to look through it?
> >
> > https://github.com/KSPP/linux/issues/10
> > https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/
> 
> I'm fine renaming this one. But there are other issues that Will's
> series solves, which I don't want this series to depend on. We can try
> to sort them out separately.
> 
> The main problem here is that DEBUG_LIST has been designed to be
> friendly for debugging (incl. checking poison values and NULL). Some
> kernel devs may still want that, but for production use it is pointless
> and wasteful.
> 
> So what I can propose is to introduce CONFIG_LIST_HARDENED that
> doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
> we still use that code to produce a report.

How about the below?

We'll add CONFIG_HARDEN_LIST (in Kconfig.hardening), which is
independent of CONFIG_DEBUG_LIST. For the implementation it selects
DEBUG_LIST, but that is irrelevant for users.

This will get us the best of both worlds: a version for hardening that
should remain as fast as possible, and one for debugging with better
reports.

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH v4 3/3] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

To generate optimal machine code with CONFIG_HARDEN_LIST:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     __list_*_valid_or_report(), always return false if the inline
     checks failed.  This avoids redundant compare and conditional
     branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
---
v4:
* Rename to CONFIG_HARDEN_LIST, which can independently be selected from
  CONFIG_DEBUG_LIST.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    | 12 ++++--
 lib/list_debug.c                     |  2 +
 security/Kconfig.hardening           | 14 ++++++
 5 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..1c7f70b7cc7a 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_HARDEN_LIST
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checks (which
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the reduced set of checks, we can afford to inline the
+		 * checks, which also gives the compiler a chance to elide some
+		 * of them completely if they can be proven at compile-time. If
+		 * one of the pre-conditions does not hold, the slow-path will
+		 * show a report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checks (which
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..6b0de78fb2da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1672,11 +1672,15 @@ config HAVE_DEBUG_BUGVERBOSE
 menu "Debug kernel data structures"
 
 config DEBUG_LIST
-	bool "Debug linked list manipulation"
-	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+	bool "Debug linked list manipulation" if !HARDEN_LIST
+	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION || HARDEN_LIST
 	help
-	  Enable this to turn on extended checks in the linked-list
-	  walking routines.
+	  Enable this to turn on extended checks in the linked-list walking
+	  routines.
+
+	  If you care about performance, you should enable CONFIG_HARDEN_LIST
+	  instead.  This option alone trades performance for better quality error
+	  reports, and is more suitable for debugging.
 
 	  If unsure, say N.
 
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..a8aef895f13d 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,20 @@ config ZERO_CALL_USED_REGS
 
 endmenu
 
+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+	bool "Check integrity of linked list manipulation"
+	select DEBUG_LIST
+	help
+	  Minimal integrity checking in the linked-list manipulation routines
+	  to catch memory corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  If unsure, say N.
+
+endmenu
+
 config CC_HAS_RANDSTRUCT
 	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
 	# Randstruct was first added in Clang 15, but it isn't safe to use until
-- 
2.41.0.640.ga95def55d0-goog



* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-09  9:57       ` Marco Elver
@ 2023-08-09 15:30         ` Steven Rostedt
  2023-08-09 16:32           ` Marco Elver
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2023-08-09 15:30 UTC (permalink / raw)
  To: Marco Elver
  Cc: Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra,
	Mark Rutland, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Wed, 9 Aug 2023 11:57:19 +0200
Marco Elver <elver@google.com> wrote:

>  static __always_inline bool __list_add_valid(struct list_head *new,
>  					     struct list_head *prev,
>  					     struct list_head *next)
>  {
> -	return __list_add_valid_or_report(new, prev, next);
> +	bool ret = true;
> +
> +	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
> +		/*
> +		 * With the hardening version, elide checking if next and prev
> +		 * are NULL, since the immediate dereference of them below would
> +		 * result in a fault if NULL.
> +		 *
> +		 * With the reduced set of checks, we can afford to inline the
> +		 * checks, which also gives the compiler a chance to elide some
> +		 * of them completely if they can be proven at compile-time. If
> +		 * one of the pre-conditions does not hold, the slow-path will
> +		 * show a report which pre-condition failed.
> +		 */
> +		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
> +			return true;
> +		ret = false;
> +	}
> +
> +	ret &= __list_add_valid_or_report(new, prev, next);
> +	return ret;
>  }

I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
way around. It logically doesn't make sense that HARDEN_LIST would select
DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
DEBUG_LIST (because who knows, it may add other features I don't want). But
then, I may have stumbled over something and want more info, and enable
DEBUG_LIST (while still having HARDEN_LIST enabled).

I think you are looking at this from an implementation perspective and not
the normal developer one.

This would mean the above function should get enabled by CONFIG_HARDEN_LIST
(and CONFIG_DEBUG would select CONFIG_HARDEN) and would look more like:

static __always_inline bool __list_add_valid(struct list_head *new,
					     struct list_head *prev,
					     struct list_head *next)
{
	bool ret = true;

	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
		/*
		 * With the hardening version, elide checking if next and prev
		 * are NULL, since the immediate dereference of them below would
		 * result in a fault if NULL.
		 *
		 * With the reduced set of checks, we can afford to inline the
		 * checks, which also gives the compiler a chance to elide some
		 * of them completely if they can be proven at compile-time. If
		 * one of the pre-conditions does not hold, the slow-path will
		 * show a report which pre-condition failed.
		 */
		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
			return true;
		ret = false;
	}

	ret &= __list_add_valid_or_report(new, prev, next);
	return ret;
}

That is, if DEBUG_LIST is enabled, we always call the
__list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
do the shortcut.

-- Steve


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-09 15:30         ` Steven Rostedt
@ 2023-08-09 16:32           ` Marco Elver
  2023-08-10 20:11             ` Kees Cook
  0 siblings, 1 reply; 12+ messages in thread
From: Marco Elver @ 2023-08-09 16:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kees Cook, Andrew Morton, Guenter Roeck, Peter Zijlstra,
	Mark Rutland, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
[...]
> 
> I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> way around. It logically doesn't make sense that HARDEN_LIST would select
> DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> DEBUG_LIST (because who knows, it may add other features I don't want). But
> then, I may have stumbled over something and want more info, and enable
> DEBUG_LIST (while still having HARDEN_LIST) enabled.
> 
> I think you are looking at this from an implementation perspective and not
> the normal developer one.
> 
[...]
> 
> That is, if DEBUG_LIST is enabled, we always call the
> __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> do the shortcut.

Good point - I think this is better. See below tentative v4.

Kees: Does that also look more like what you had in mind?

Thanks,
-- Marco

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

Since DEBUG_LIST is functionally a superset of HARDEN_LIST, the Kconfig
variables are designed to reflect that: DEBUG_LIST selects HARDEN_LIST,
whereas HARDEN_LIST itself has no dependency on DEBUG_LIST.

To generate optimal machine code with CONFIG_HARDEN_LIST:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     __list_*_valid_or_report(), always return false if the inline
     checks failed.  This avoids redundant compare and conditional
     branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
---
v4:
* Rename to CONFIG_HARDEN_LIST, which can independently be selected from
  CONFIG_DEBUG_LIST.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/Makefile     |  2 +-
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    |  9 +++-
 lib/Makefile                         |  2 +-
 lib/list_debug.c                     |  5 ++-
 security/Kconfig.hardening           | 13 ++++++
 7 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 9ddc025e4b86..c89c85a41ac4 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -25,7 +25,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
 	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
 hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
-hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+hyp-obj-$(CONFIG_HARDEN_LIST) += list_debug.o
 hyp-obj-y += $(lib-objs)
 
 ##
diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..ef899c27c68b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 	WRITE_ONCE(list->prev, list);
 }
 
+#ifdef CONFIG_HARDEN_LIST
+
 #ifdef CONFIG_DEBUG_LIST
+# define __list_valid_slowpath
+#else
+# define __list_valid_slowpath __cold __preserve_most
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checks (which
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the reduced set of checks, we can afford to inline the
+		 * checks, which also gives the compiler a chance to elide some
+		 * of them completely if they can be proven at compile-time. If
+		 * one of the pre-conditions does not hold, the slow-path will
+		 * show a report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checks (which
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..9e38956d6f50 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1674,9 +1674,14 @@ menu "Debug kernel data structures"
 config DEBUG_LIST
 	bool "Debug linked list manipulation"
 	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+	select HARDEN_LIST
 	help
-	  Enable this to turn on extended checks in the linked-list
-	  walking routines.
+	  Enable this to turn on extended checks in the linked-list walking
+	  routines.
+
+	  This option trades performance for better quality error reports, and
+	  is more suitable for kernel debugging. If you care about performance,
+	  you should only enable CONFIG_HARDEN_LIST instead.
 
 	  If unsure, say N.
 
diff --git a/lib/Makefile b/lib/Makefile
index 42d307ade225..a7ebc9090f04 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -161,7 +161,7 @@ obj-$(CONFIG_BTREE) += btree.o
 obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-$(CONFIG_HARDEN_LIST) += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 
 obj-$(CONFIG_BITREVERSE) += bitrev.o
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..38ddc7c01eab 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -2,7 +2,8 @@
  * Copyright 2006, Red Hat, Inc., Dave Jones
  * Released under the General Public License (GPL).
  *
- * This file contains the linked list validation for DEBUG_LIST.
+ * This file contains the linked list validation and error reporting for
+ * HARDEN_LIST and DEBUG_LIST.
  */
 
 #include <linux/export.h>
@@ -17,6 +18,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +41,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..19aa2b7d2f64 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,19 @@ config ZERO_CALL_USED_REGS
 
 endmenu
 
+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+	bool "Check integrity of linked list manipulation"
+	help
+	  Minimal integrity checking in the linked-list manipulation routines
+	  to catch memory corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  If unsure, say N.
+
+endmenu
+
 config CC_HAS_RANDSTRUCT
 	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
 	# Randstruct was first added in Clang 15, but it isn't safe to use until
-- 
2.41.0.640.ga95def55d0-goog
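
For illustration (not part of the patch; a and b are made-up list heads using
the usual <linux/list.h> API), the kind of corruption the inline fast path
above catches is a stale or overwritten neighbour pointer:

	struct list_head a, b;

	INIT_LIST_HEAD(&a);
	list_add(&b, &a);	/* list is now a <-> b */
	a.next = &a;		/* simulate corruption of a's link to b */
	list_del(&b);		/* prev->next != &b: fast path fails, and
				 * __list_del_entry_valid_or_report() warns */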



* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-09 16:32           ` Marco Elver
@ 2023-08-10 20:11             ` Kees Cook
  2023-08-11  9:10               ` Marco Elver
  2023-08-11 19:33               ` Steven Rostedt
  0 siblings, 2 replies; 12+ messages in thread
From: Kees Cook @ 2023-08-10 20:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Andrew Morton, Guenter Roeck, Peter Zijlstra,
	Mark Rutland, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> [...]
> > 
> > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > way around. It logically doesn't make sense that HARDEN_LIST would select
> > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > then, I may have stumbled over something and want more info, and enable
> > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> > 
> > I think you are looking at this from an implementation perspective and not
> > the normal developer one.
> > 
> [...]
> > 
> > That is, if DEBUG_LIST is enabled, we always call the
> > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > do the shortcut.
> 
> Good point - I think this is better. See below tentative v4.
> 
> Kees: Does that also look more like what you had in mind?

Yeah, this looks good. My only nit would be a naming one. All the
other hardening features are named "HARDENED", but perhaps the "ED"
is redundant in the others. Still, consistency seems nicer. What do you
think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
to keep the subsystem name first and then apply optional elements after.)

One note: do the LKDTM list hardening tests still pass? i.e.
CORRUPT_LIST_ADD
CORRUPT_LIST_DEL
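(These can be triggered via LKDTM's direct interface, e.g., assuming LKDTM is
built in and debugfs is mounted:

	echo CORRUPT_LIST_ADD > /sys/kernel/debug/provoke-crash/DIRECT
	echo CORRUPT_LIST_DEL > /sys/kernel/debug/provoke-crash/DIRECT
)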

> [...]
> +		/*
> +		 * With the hardening version, elide checking if next and prev
> +		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> +		 * dereference of them below would result in a fault.
> +		 */
> +		if (likely(prev->next == entry && next->prev == entry))
> +			return true;

I'm not super excited about skipping those checks, since they are
values that can be reached through kernel list management confusion. If
an attacker is using a system where the zero-page has been mapped
and is accessible (i.e. lacking SMAP etc), then attacks could still
be constructed. However, I do recognize this chain of exploitation
prerequisites is getting rather long, so probably this is a reasonable
trade off on modern systems.
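
(For reference, the checks the fast path elides -- they still run in the
out-of-line function under DEBUG_LIST -- are roughly the following; a
paraphrased sketch, not the verbatim lib/list_debug.c code, with report()
standing in for the CHECK_DATA_CORRUPTION() warnings:

	if (next == NULL || prev == NULL)			/* zeroed or freed entry */
		return report(entry);
	if (next == LIST_POISON1 || prev == LIST_POISON2)	/* double list_del() */
		return report(entry);
	if (prev->next != entry || next->prev != entry)		/* corrupted neighbours */
		return report(entry);
)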

-Kees

-- 
Kees Cook


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-10 20:11             ` Kees Cook
@ 2023-08-11  9:10               ` Marco Elver
  2023-08-11 19:33               ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Marco Elver @ 2023-08-11  9:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Steven Rostedt, Andrew Morton, Guenter Roeck, Peter Zijlstra,
	Mark Rutland, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Thu, 10 Aug 2023 at 22:12, Kees Cook <keescook@chromium.org> wrote:
>
> On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> > On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> > [...]
> > >
> > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > > way around. It logically doesn't make sense that HARDEN_LIST would select
> > > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > > then, I may have stumbled over something and want more info, and enable
> > > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> > >
> > > I think you are looking at this from an implementation perspective and not
> > > the normal developer one.
> > >
> > [...]
> > >
> > > That is, if DEBUG_LIST is enabled, we always call the
> > > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > > do the shortcut.
> >
> > Good point - I think this is better. See below tentative v4.
> >
> > Kees: Does that also look more like what you had in mind?
>
> Yeah, this looks good. My only nit would be a naming one. All the
> other hardening features are named "HARDENED", but perhaps the "ED"
> is redundant in the others. Still, consistency seems nicer. What do you
> think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
> to keep the subsystem name first and then apply optional elements after.)

Naming is a bit all over. :-/
I agree with the <subsystem>_<suboption> scheme, generally. I think
initially I tried to keep the name shorter, and also find a good
counter-part to DEBUG_<suboption>, therefore HARDEN_LIST.

Let's just change it to CONFIG_LIST_HARDENED, given the existing
"HARDENED" options.

I don't have a strong preference.
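
So, roughly, the renamed Kconfig side would become (sketch only, untested;
help texts unchanged from the patch above):

config LIST_HARDENED
	bool "Check integrity of linked list manipulation"

config DEBUG_LIST
	bool "Debug linked list manipulation"
	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
	select LIST_HARDENED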

> One note: do the LKDTM list hardening tests still pass? i.e.
> CORRUPT_LIST_ADD
> CORRUPT_LIST_DEL

Yes, they do. Though I need to also adjust BUG_ON_DATA_CORRUPTION to
select LIST_HARDENED, and the test should check for the new option
(which is implied by DEBUG_LIST now). There will be an additional
patch to adjust that.
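Roughly (sketch of that follow-up, untested; assumes BUG_ON_DATA_CORRUPTION
currently selects DEBUG_LIST):

config BUG_ON_DATA_CORRUPTION
-	select DEBUG_LIST
+	select LIST_HARDENED

plus pointing the LKDTM tests at the new option.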

> > [...]
> > +             /*
> > +              * With the hardening version, elide checking if next and prev
> > +              * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +              * dereference of them below would result in a fault.
> > +              */
> > +             if (likely(prev->next == entry && next->prev == entry))
> > +                     return true;
>
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

Sure, it's a trade-off for systems which do have the bare minimum of
modern hardware security features.


* Re: [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL
  2023-08-10 20:11             ` Kees Cook
  2023-08-11  9:10               ` Marco Elver
@ 2023-08-11 19:33               ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2023-08-11 19:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Marco Elver, Andrew Morton, Guenter Roeck, Peter Zijlstra,
	Mark Rutland, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Miguel Ojeda,
	Sami Tolvanen, linux-arm-kernel, kvmarm, linux-kernel, llvm,
	Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-toolchains

On Thu, 10 Aug 2023 13:11:58 -0700
Kees Cook <keescook@chromium.org> wrote:

> > [...]
> > +		/*
> > +		 * With the hardening version, elide checking if next and prev
> > +		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +		 * dereference of them below would result in a fault.
> > +		 */
> > +		if (likely(prev->next == entry && next->prev == entry))
> > +			return true;  
> 
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

A totally hardened machine is one that doesn't run ;-)

Yes, hopefully when the kernel is configured with HARDENED it will
eliminate steps to a prerequisite attack. I'm sure enabling lockdep would
also help harden the system. But there is a balance between security and
performance. The more that adding security harms performance, the less
people will use that security.

-- Steve


end of thread, other threads:[~2023-08-11 19:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-08 10:17 [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Marco Elver
2023-08-08 10:17 ` [PATCH v3 2/3] list_debug: Introduce inline wrappers for debug checks Marco Elver
2023-08-08 10:17 ` [PATCH v3 3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL Marco Elver
2023-08-08 21:27   ` Kees Cook
2023-08-09  7:35     ` Marco Elver
2023-08-09  9:57       ` Marco Elver
2023-08-09 15:30         ` Steven Rostedt
2023-08-09 16:32           ` Marco Elver
2023-08-10 20:11             ` Kees Cook
2023-08-11  9:10               ` Marco Elver
2023-08-11 19:33               ` Steven Rostedt
2023-08-08 12:35 ` [PATCH v3 1/3] compiler_types: Introduce the Clang __preserve_most function attribute Mark Rutland
