* [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure
@ 2022-07-01 14:22 Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 01/45] x86: add missing include to sparsemem.h Alexander Potapenko
                   ` (44 more replies)
  0 siblings, 45 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
uninitialized memory. It relies on compile-time Clang instrumentation
(similar to userspace MSan [1]) and tracks the state of every bit of
kernel memory, reporting an error whenever an uninitialized value is
used in a condition, dereferenced, or escapes to userspace, USB or DMA.
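
For instance, KMSAN flags code like the following minimal, hypothetical
sketch (not taken from an actual report):

  /* If kstrtoint() fails, @value stays uninitialized, and the
   * comparison below reads an uninitialized value. */
  static int check_input(const char *buf)
  {
  	int value;

  	kstrtoint(buf, 10, &value);	/* return value ignored */
  	if (value > 42)			/* KMSAN: uninit-value here */
  		return -EINVAL;
  	return 0;
  }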

KMSAN has reported more than 300 bugs in the past few years (recently
fixed bugs: [2]), most of them found with the help of syzkaller. Such bugs
keep getting introduced into the kernel despite new compiler warnings
and other analyses (the 5.16 cycle already resulted in several
KMSAN-reported bugs, e.g. [3]). Mitigations like total stack and heap
initialization are unfortunately very far from being deployable.

The proposed patchset contains the KMSAN runtime implementation together
with small changes to other subsystems needed to make KMSAN work.

The latter changes fall into several categories:

1. Changes and refactorings of existing code required to add KMSAN:
 - [01/45] x86: add missing include to sparsemem.h
 - [02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t
 - [03/45] instrumented.h: allow instrumenting both sides of copy_from_user()
 - [04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
 - [05/45] asm-generic: instrument usercopy in cacheflush.h
 - [10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

2. KMSAN-related declarations in generic code, KMSAN runtime library,
   docs and configs:
 - [06/45] kmsan: add ReST documentation
 - [07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
 - [09/45] x86: kmsan: pgtable: reduce vmalloc space
 - [11/45] kmsan: add KMSAN runtime core
 - [13/45] MAINTAINERS: add entry for KMSAN
 - [25/45] kmsan: add tests for KMSAN
 - [32/45] objtool: kmsan: list KMSAN API functions as uaccess-safe
 - [36/45] x86: kmsan: use __msan_ string functions where possible
 - [45/45] x86: kmsan: enable KMSAN builds for x86

3. Adding hooks from different subsystems to notify KMSAN about memory
   state changes:
 - [14/45] mm: kmsan: maintain KMSAN metadata for page operations
 - [15/45] mm: kmsan: call KMSAN hooks from SLUB code
 - [16/45] kmsan: handle task creation and exiting
 - [17/45] init: kmsan: call KMSAN initialization routines
 - [18/45] instrumented.h: add KMSAN support
 - [20/45] kmsan: add iomap support
 - [21/45] Input: libps2: mark data received in __ps2_command() as initialized
 - [22/45] dma: kmsan: unpoison DMA mappings
 - [35/45] x86: kmsan: handle open-coded assembly in lib/iomem.c
 - [37/45] x86: kmsan: sync metadata pages on page fault

4. Changes that prevent false reports by explicitly initializing memory,
   disabling optimized code that may trick KMSAN, and selectively skipping
   instrumentation:
 - [08/45] kmsan: mark noinstr as __no_sanitize_memory
 - [12/45] kmsan: disable instrumentation of unsupported common kernel code
 - [19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
 - [23/45] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
 - [24/45] kmsan: handle memory sent to/from USB
 - [26/45] kmsan: disable strscpy() optimization under KMSAN
 - [27/45] crypto: kmsan: disable accelerated configs under KMSAN
 - [28/45] kmsan: disable physical page merging in biovec
 - [29/45] block: kmsan: skip bio block merging logic for KMSAN
 - [30/45] kcov: kmsan: unpoison area->list in kcov_remote_area_put()
 - [31/45] security: kmsan: fix interoperability with auto-initialization
 - [33/45] x86: kmsan: disable instrumentation of unsupported code
 - [34/45] x86: kmsan: skip shadow checks in __switch_to()
 - [38/45] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
 - [39/45] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
 - [40/45] x86: kmsan: don't instrument stack walking functions
 - [41/45] entry: kmsan: introduce kmsan_unpoison_entry_regs()

5. Fixes for bugs detected with CONFIG_KMSAN_CHECK_PARAM_RETVAL:
 - [42/45] bpf: kmsan: initialize BPF registers with zeroes
 - [43/45] namei: initialize parameters passed to step_into()
 - [44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface


This patchset allows one to boot and run a defconfig+KMSAN kernel under
QEMU without known false positives. However, it doesn't guarantee the
absence of false positives in drivers for certain devices or in less
tested subsystems, although KMSAN is actively tested on syzbot with a
large config.

The biggest difference between this patch series and v3 is the
introduction of CONFIG_KMSAN_CHECK_PARAM_RETVAL, which maps to the
-fsanitize-memory-param-retval compiler flag and enforces conservative
checks of most kernel function parameters passed by value. As discussed
in [4], passing uninitialized values as function parameters is
considered undefined behavior, therefore KMSAN now reports such cases as
errors. Several newly added patches fix known manifestations of these
errors.
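
As a hedged illustration (hypothetical code, not from an actual
report), the following pattern is now reported at the call site rather
than at some later use inside the callee:

  static int foo(int v)	/* hypothetical callee */
  {
  	return v * 2;
  }

  static void bar(void)
  {
  	int v;	/* never initialized */

  	/* With -fsanitize-memory-param-retval, KMSAN reports the
  	 * uninitialized argument right here, at the call site. */
  	foo(v);
  }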

The patchset was generated relative to Linux v5.19-rc4. The most
up-to-date KMSAN tree currently resides at
https://github.com/google/kmsan/. One may find it handy to review these
patches in Gerrit [5].

A huge thanks goes to the reviewers of the RFC patch series sent to LKML
in 2020 ([6]).

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-kmsan-gce
[3] https://lore.kernel.org/all/20211126124746.761278-1-glider@google.com/
[4] https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/ 
[5] https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/12604/ 
[6] https://lore.kernel.org/all/20200325161249.55095-1-glider@google.com/

Alexander Potapenko (44):
  stackdepot: reserve 5 extra bits in depot_stack_handle_t
  instrumented.h: allow instrumenting both sides of copy_from_user()
  x86: asm: instrument usercopy in get_user() and __put_user_size()
  asm-generic: instrument usercopy in cacheflush.h
  kmsan: add ReST documentation
  kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
  kmsan: mark noinstr as __no_sanitize_memory
  x86: kmsan: pgtable: reduce vmalloc space
  libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  kmsan: add KMSAN runtime core
  kmsan: disable instrumentation of unsupported common kernel code
  MAINTAINERS: add entry for KMSAN
  mm: kmsan: maintain KMSAN metadata for page operations
  mm: kmsan: call KMSAN hooks from SLUB code
  kmsan: handle task creation and exiting
  init: kmsan: call KMSAN initialization routines
  instrumented.h: add KMSAN support
  kmsan: unpoison @tlb in arch_tlb_gather_mmu()
  kmsan: add iomap support
  Input: libps2: mark data received in __ps2_command() as initialized
  dma: kmsan: unpoison DMA mappings
  virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
  kmsan: handle memory sent to/from USB
  kmsan: add tests for KMSAN
  kmsan: disable strscpy() optimization under KMSAN
  crypto: kmsan: disable accelerated configs under KMSAN
  kmsan: disable physical page merging in biovec
  block: kmsan: skip bio block merging logic for KMSAN
  kcov: kmsan: unpoison area->list in kcov_remote_area_put()
  security: kmsan: fix interoperability with auto-initialization
  objtool: kmsan: list KMSAN API functions as uaccess-safe
  x86: kmsan: disable instrumentation of unsupported code
  x86: kmsan: skip shadow checks in __switch_to()
  x86: kmsan: handle open-coded assembly in lib/iomem.c
  x86: kmsan: use __msan_ string functions where possible
  x86: kmsan: sync metadata pages on page fault
  x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for
    KASAN/KMSAN
  x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
  x86: kmsan: don't instrument stack walking functions
  entry: kmsan: introduce kmsan_unpoison_entry_regs()
  bpf: kmsan: initialize BPF registers with zeroes
  namei: initialize parameters passed to step_into()
  mm: fs: initialize fsdata passed to write_begin/write_end interface
  x86: kmsan: enable KMSAN builds for x86

Dmitry Vyukov (1):
  x86: add missing include to sparsemem.h

 Documentation/dev-tools/index.rst       |   1 +
 Documentation/dev-tools/kmsan.rst       | 422 ++++++++++++++++++
 MAINTAINERS                             |  12 +
 Makefile                                |   1 +
 arch/s390/lib/uaccess.c                 |   3 +-
 arch/x86/Kconfig                        |   9 +-
 arch/x86/boot/Makefile                  |   1 +
 arch/x86/boot/compressed/Makefile       |   1 +
 arch/x86/entry/vdso/Makefile            |   3 +
 arch/x86/include/asm/checksum.h         |  16 +-
 arch/x86/include/asm/kmsan.h            |  55 +++
 arch/x86/include/asm/page_64.h          |  12 +
 arch/x86/include/asm/pgtable_64_types.h |  41 +-
 arch/x86/include/asm/sparsemem.h        |   2 +
 arch/x86/include/asm/string_64.h        |  23 +-
 arch/x86/include/asm/uaccess.h          |   7 +
 arch/x86/kernel/Makefile                |   2 +
 arch/x86/kernel/cpu/Makefile            |   1 +
 arch/x86/kernel/dumpstack.c             |   6 +
 arch/x86/kernel/process_64.c            |   1 +
 arch/x86/kernel/unwind_frame.c          |  11 +
 arch/x86/lib/Makefile                   |   2 +
 arch/x86/lib/iomem.c                    |   5 +
 arch/x86/mm/Makefile                    |   2 +
 arch/x86/mm/fault.c                     |  23 +-
 arch/x86/mm/init_64.c                   |   2 +-
 arch/x86/mm/ioremap.c                   |   3 +
 arch/x86/realmode/rm/Makefile           |   1 +
 block/bio.c                             |   2 +
 block/blk.h                             |   7 +
 crypto/Kconfig                          |  30 ++
 drivers/firmware/efi/libstub/Makefile   |   1 +
 drivers/input/serio/libps2.c            |   5 +-
 drivers/net/Kconfig                     |   1 +
 drivers/nvdimm/nd.h                     |   2 +-
 drivers/nvdimm/pfn_devs.c               |   2 +-
 drivers/usb/core/urb.c                  |   2 +
 drivers/virtio/virtio_ring.c            |  10 +-
 fs/buffer.c                             |   4 +-
 fs/namei.c                              |  10 +-
 include/asm-generic/cacheflush.h        |   9 +-
 include/linux/compiler-clang.h          |  23 +
 include/linux/compiler-gcc.h            |   6 +
 include/linux/compiler_types.h          |   3 +-
 include/linux/fortify-string.h          |   2 +
 include/linux/highmem.h                 |   3 +
 include/linux/instrumented.h            |  26 +-
 include/linux/kmsan-checks.h            |  83 ++++
 include/linux/kmsan.h                   | 362 ++++++++++++++++
 include/linux/mm_types.h                |  12 +
 include/linux/sched.h                   |   5 +
 include/linux/stackdepot.h              |   8 +
 include/linux/uaccess.h                 |  19 +-
 init/main.c                             |   3 +
 kernel/Makefile                         |   1 +
 kernel/bpf/core.c                       |   2 +-
 kernel/dma/mapping.c                    |   9 +-
 kernel/entry/common.c                   |   5 +
 kernel/exit.c                           |   2 +
 kernel/fork.c                           |   2 +
 kernel/kcov.c                           |   7 +
 kernel/locking/Makefile                 |   3 +-
 lib/Kconfig.debug                       |   1 +
 lib/Kconfig.kmsan                       |  62 +++
 lib/Makefile                            |   3 +
 lib/iomap.c                             |  44 ++
 lib/iov_iter.c                          |   9 +-
 lib/stackdepot.c                        |  29 +-
 lib/string.c                            |   8 +
 lib/usercopy.c                          |   3 +-
 mm/Makefile                             |   1 +
 mm/filemap.c                            |   2 +-
 mm/internal.h                           |   6 +
 mm/kasan/common.c                       |   2 +-
 mm/kmsan/Makefile                       |  28 ++
 mm/kmsan/core.c                         | 468 ++++++++++++++++++++
 mm/kmsan/hooks.c                        | 395 +++++++++++++++++
 mm/kmsan/init.c                         | 238 ++++++++++
 mm/kmsan/instrumentation.c              | 271 ++++++++++++
 mm/kmsan/kmsan.h                        | 195 +++++++++
 mm/kmsan/kmsan_test.c                   | 552 ++++++++++++++++++++++++
 mm/kmsan/report.c                       | 211 +++++++++
 mm/kmsan/shadow.c                       | 297 +++++++++++++
 mm/memory.c                             |   2 +
 mm/mmu_gather.c                         |  10 +
 mm/page_alloc.c                         |  18 +
 mm/slab.h                               |   1 +
 mm/slub.c                               |  18 +
 mm/vmalloc.c                            |  20 +-
 scripts/Makefile.kmsan                  |   8 +
 scripts/Makefile.lib                    |   9 +
 security/Kconfig.hardening              |   4 +
 tools/objtool/check.c                   |  20 +
 93 files changed, 4222 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/dev-tools/kmsan.rst
 create mode 100644 arch/x86/include/asm/kmsan.h
 create mode 100644 include/linux/kmsan-checks.h
 create mode 100644 include/linux/kmsan.h
 create mode 100644 lib/Kconfig.kmsan
 create mode 100644 mm/kmsan/Makefile
 create mode 100644 mm/kmsan/core.c
 create mode 100644 mm/kmsan/hooks.c
 create mode 100644 mm/kmsan/init.c
 create mode 100644 mm/kmsan/instrumentation.c
 create mode 100644 mm/kmsan/kmsan.h
 create mode 100644 mm/kmsan/kmsan_test.c
 create mode 100644 mm/kmsan/report.c
 create mode 100644 mm/kmsan/shadow.c
 create mode 100644 scripts/Makefile.kmsan

-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 01/45] x86: add missing include to sparsemem.h
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t Alexander Potapenko
                   ` (43 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

From: Dmitry Vyukov <dvyukov@google.com>

Including sparsemem.h from other files (e.g. transitively via
asm/pgtable_64_types.h) results in compilation errors due to unknown
types:

sparsemem.h:34:32: error: unknown type name 'phys_addr_t'
extern int phys_to_target_node(phys_addr_t start);
                               ^
sparsemem.h:36:39: error: unknown type name 'u64'
extern int memory_add_physaddr_to_nid(u64 start);
                                      ^

Fix these errors by including linux/types.h from sparsemem.h.
This is required for the upcoming KMSAN patches.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Ifae221ce85d870d8f8d17173bd44d5cf9be2950f
---
 arch/x86/include/asm/sparsemem.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 6a9ccc1b2be5d..64df897c0ee30 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_SPARSEMEM_H
 #define _ASM_X86_SPARSEMEM_H
 
+#include <linux/types.h>
+
 #ifdef CONFIG_SPARSEMEM
 /*
  * generic non-linear memory support:
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 01/45] x86: add missing include to sparsemem.h Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user() Alexander Potapenko
                   ` (42 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Some users (currently only KMSAN) may want to use spare bits in
depot_stack_handle_t. Let them do so by adding @extra_bits to
__stack_depot_save() to store arbitrary flags, and providing
stack_depot_get_extra_bits() to retrieve those flags.

Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN
does not intend to store additional information in the stack handle.
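
A minimal usage sketch (hedged; the extra-bits value 1 below is an
arbitrary example):

  unsigned long entries[16];
  unsigned int nr_entries, extra;
  depot_stack_handle_t handle;

  nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
  /* Store a caller-defined flag in the spare bits of the handle. */
  handle = __stack_depot_save(entries, nr_entries, /*extra_bits=*/1,
			      GFP_ATOMIC, /*can_alloc=*/true);
  /* Later: retrieve the flag without fetching the stack trace. */
  extra = stack_depot_get_extra_bits(handle);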

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v4:
 -- per Marco Elver's request, fold "kasan: common: adapt to the new
    prototype of __stack_depot_save()" into this patch to prevent
    bisection breakages.

Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
---
 include/linux/stackdepot.h |  8 ++++++++
 lib/stackdepot.c           | 29 ++++++++++++++++++++++++-----
 mm/kasan/common.c          |  2 +-
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index bc2797955de90..9ca7798d7a318 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -14,9 +14,15 @@
 #include <linux/gfp.h>
 
 typedef u32 depot_stack_handle_t;
+/*
+ * Number of bits in the handle that stack depot doesn't use. Users may store
+ * information in them.
+ */
+#define STACK_DEPOT_EXTRA_BITS 5
 
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
+					unsigned int extra_bits,
 					gfp_t gfp_flags, bool can_alloc);
 
 /*
@@ -59,6 +65,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries);
 
+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+
 int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
 		       int spaces);
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5ca0d086ef4a3..3d1dbdd5a87f6 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -42,7 +42,8 @@
 #define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
 					STACK_ALLOC_ALIGN)
 #define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
-		STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
+		STACK_ALLOC_NULL_PROTECTION_BITS - \
+		STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
 #define STACK_ALLOC_SLABS_CAP 8192
 #define STACK_ALLOC_MAX_SLABS \
 	(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -55,6 +56,7 @@ union handle_parts {
 		u32 slabindex : STACK_ALLOC_INDEX_BITS;
 		u32 offset : STACK_ALLOC_OFFSET_BITS;
 		u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
+		u32 extra : STACK_DEPOT_EXTRA_BITS;
 	};
 };
 
@@ -76,6 +78,14 @@ static int next_slab_inited;
 static size_t depot_offset;
 static DEFINE_RAW_SPINLOCK(depot_lock);
 
+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
+{
+	union handle_parts parts = { .handle = handle };
+
+	return parts.extra;
+}
+EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
 static bool init_stack_slab(void **prealloc)
 {
 	if (!*prealloc)
@@ -139,6 +149,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->handle.slabindex = depot_index;
 	stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
 	stack->handle.valid = 1;
+	stack->handle.extra = 0;
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	depot_offset += required_size;
 
@@ -343,6 +354,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
  *
  * @entries:		Pointer to storage array
  * @nr_entries:		Size of the storage array
+ * @extra_bits:		Flags to store in unused bits of depot_stack_handle_t
  * @alloc_flags:	Allocation gfp flags
  * @can_alloc:		Allocate stack slabs (increased chance of failure if false)
  *
@@ -354,6 +366,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
  * If the stack trace in @entries is from an interrupt, only the portion up to
  * interrupt entry is saved.
  *
+ * Additional opaque flags can be passed in @extra_bits, stored in the unused
+ * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
+ * without calling stack_depot_fetch().
+ *
  * Context: Any context, but setting @can_alloc to %false is required if
  *          alloc_pages() cannot be used from the current context. Currently
  *          this is the case from contexts where neither %GFP_ATOMIC nor
@@ -363,10 +379,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
  */
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
+					unsigned int extra_bits,
 					gfp_t alloc_flags, bool can_alloc)
 {
 	struct stack_record *found = NULL, **bucket;
-	depot_stack_handle_t retval = 0;
+	union handle_parts retval = { .handle = 0 };
 	struct page *page = NULL;
 	void *prealloc = NULL;
 	unsigned long flags;
@@ -450,9 +467,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
 	}
 	if (found)
-		retval = found->handle.handle;
+		retval.handle = found->handle.handle;
 fast_exit:
-	return retval;
+	retval.extra = extra_bits;
+
+	return retval.handle;
 }
 EXPORT_SYMBOL_GPL(__stack_depot_save);
 
@@ -472,6 +491,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries,
 				      gfp_t alloc_flags)
 {
-	return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+	return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index c40c0e7b3b5f1..ba4fceeec173c 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
 	unsigned int nr_entries;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
-	return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+	return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
 }
 
 void kasan_set_track(struct kasan_track *track, gfp_t flags)
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 01/45] x86: add missing include to sparsemem.h Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
                   ` (41 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Introduce instrument_copy_from_user_before() and
instrument_copy_from_user_after() hooks to be invoked before and after
the call to copy_from_user().

KASAN and KCSAN will only use instrument_copy_from_user_before(), but
KMSAN will also need to insert code after the call to copy_from_user().
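
For KMSAN, the _after hook is expected to unpoison the bytes that
copy_from_user() actually wrote. A rough sketch of the implementation
added later in this series:

  static __always_inline void
  instrument_copy_from_user_after(const void *to, const void __user *from,
  				  unsigned long n, unsigned long left)
  {
  	/* Mark the (n - left) bytes actually copied as initialized. */
  	kmsan_unpoison_memory(to, n - left);
  }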

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v4:
 -- fix _copy_from_user_key() in arch/s390/lib/uaccess.c (Reported-by:
    kernel test robot <lkp@intel.com>)

Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
---
 arch/s390/lib/uaccess.c      |  3 ++-
 include/linux/instrumented.h | 21 +++++++++++++++++++--
 include/linux/uaccess.h      | 19 ++++++++++++++-----
 lib/iov_iter.c               |  9 ++++++---
 lib/usercopy.c               |  3 ++-
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c
index d7b3b193d1088..58033dfcb6d45 100644
--- a/arch/s390/lib/uaccess.c
+++ b/arch/s390/lib/uaccess.c
@@ -81,8 +81,9 @@ unsigned long _copy_from_user_key(void *to, const void __user *from,
 
 	might_fault();
 	if (!should_fail_usercopy()) {
-		instrument_copy_from_user(to, from, n);
+		instrument_copy_from_user_before(to, from, n);
 		res = raw_copy_from_user_key(to, from, n, key);
+		instrument_copy_from_user_after(to, from, n, res);
 	}
 	if (unlikely(res))
 		memset(to + (n - res), 0, res);
diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202a..ee8f7d17d34f5 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
 }
 
 /**
- * instrument_copy_from_user - instrument writes of copy_from_user
+ * instrument_copy_from_user_before - add instrumentation before copy_from_user
  *
  * Instrument writes to kernel memory, that are due to copy_from_user (and
  * variants). The instrumentation should be inserted before the accesses.
@@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
  * @n number of bytes to copy
  */
 static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
 {
 	kasan_check_write(to, n);
 	kcsan_check_write(to, n);
 }
 
+/**
+ * instrument_copy_from_user_after - add instrumentation after copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted after the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ * @left number of bytes not copied (as returned by copy_from_user)
+ */
+static __always_inline void
+instrument_copy_from_user_after(const void *to, const void __user *from,
+				unsigned long n, unsigned long left)
+{
+}
+
 #endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf02b75e..da16e96680cf1 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -58,20 +58,28 @@
 static __always_inline __must_check unsigned long
 __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
 {
-	instrument_copy_from_user(to, from, n);
+	unsigned long res;
+
+	instrument_copy_from_user_before(to, from, n);
 	check_object_size(to, n, false);
-	return raw_copy_from_user(to, from, n);
+	res = raw_copy_from_user(to, from, n);
+	instrument_copy_from_user_after(to, from, n, res);
+	return res;
 }
 
 static __always_inline __must_check unsigned long
 __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
+	unsigned long res;
+
 	might_fault();
+	instrument_copy_from_user_before(to, from, n);
 	if (should_fail_usercopy())
 		return n;
-	instrument_copy_from_user(to, from, n);
 	check_object_size(to, n, false);
-	return raw_copy_from_user(to, from, n);
+	res = raw_copy_from_user(to, from, n);
+	instrument_copy_from_user_after(to, from, n, res);
+	return res;
 }
 
 /**
@@ -115,8 +123,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 	unsigned long res = n;
 	might_fault();
 	if (!should_fail_usercopy() && likely(access_ok(from, n))) {
-		instrument_copy_from_user(to, from, n);
+		instrument_copy_from_user_before(to, from, n);
 		res = raw_copy_from_user(to, from, n);
+		instrument_copy_from_user_after(to, from, n, res);
 	}
 	if (unlikely(res))
 		memset(to + (n - res), 0, res);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 0b64695ab632f..fe5d169314dbf 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -159,13 +159,16 @@ static int copyout(void __user *to, const void *from, size_t n)
 
 static int copyin(void *to, const void __user *from, size_t n)
 {
+	size_t res = n;
+
 	if (should_fail_usercopy())
 		return n;
 	if (access_ok(from, n)) {
-		instrument_copy_from_user(to, from, n);
-		n = raw_copy_from_user(to, from, n);
+		instrument_copy_from_user_before(to, from, n);
+		res = raw_copy_from_user(to, from, n);
+		instrument_copy_from_user_after(to, from, n, res);
 	}
-	return n;
+	return res;
 }
 
 static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516e..1505a52f23a01 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
 	unsigned long res = n;
 	might_fault();
 	if (!should_fail_usercopy() && likely(access_ok(from, n))) {
-		instrument_copy_from_user(to, from, n);
+		instrument_copy_from_user_before(to, from, n);
 		res = raw_copy_from_user(to, from, n);
+		instrument_copy_from_user_after(to, from, n, res);
 	}
 	if (unlikely(res))
 		memset(to + (n - res), 0, res);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (2 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user() Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-02  3:47   ` kernel test robot
                     ` (3 more replies)
  2022-07-01 14:22 ` [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h Alexander Potapenko
                   ` (40 subsequent siblings)
  44 siblings, 4 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Use hooks from instrumented.h to notify bug detection tools about
usercopy events in get_user() and __put_user_size().

It's still unclear how to instrument put_user(), which assumes that
instrumentation code doesn't clobber RAX.
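
A hedged sketch of the effect on a hypothetical caller:

  static int hypothetical_read(int __user *uptr)
  {
  	int val;

  	if (get_user(val, uptr))
  		return -EFAULT;
  	/* Thanks to instrument_copy_from_user_after(), KMSAN treats
  	 * @val as initialized instead of reporting a false positive
  	 * on the comparison below. */
  	return val > 0 ? val : 0;
  }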

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
---
 arch/x86/include/asm/uaccess.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 913e593a3b45f..1a8b5a234474f 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -5,6 +5,7 @@
  * User space memory access functions
  */
 #include <linux/compiler.h>
+#include <linux/instrumented.h>
 #include <linux/kasan-checks.h>
 #include <linux/string.h>
 #include <asm/asm.h>
@@ -99,11 +100,13 @@ extern int __get_user_bad(void);
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
 	__chk_user_ptr(ptr);						\
+	instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
 	asm volatile("call __" #fn "_%P4"				\
 		     : "=a" (__ret_gu), "=r" (__val_gu),		\
 			ASM_CALL_CONSTRAINT				\
 		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
+	instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
 	__builtin_expect(__ret_gu, 0);					\
 })
 
@@ -248,7 +251,9 @@ extern void __put_user_nocheck_8(void);
 
 #define __put_user_size(x, ptr, size, label)				\
 do {									\
+	__typeof__(*(ptr)) __pus_val = x;				\
 	__chk_user_ptr(ptr);						\
+	instrument_copy_to_user(ptr, &(__pus_val), size);		\
 	switch (size) {							\
 	case 1:								\
 		__put_user_goto(x, ptr, "b", "iq", label);		\
@@ -286,6 +291,7 @@ do {									\
 #define __get_user_size(x, ptr, size, label)				\
 do {									\
 	__chk_user_ptr(ptr);						\
+	instrument_copy_from_user_before((void *)&(x), ptr, size);	\
 	switch (size) {							\
 	case 1:	{							\
 		unsigned char x_u8__;					\
@@ -305,6 +311,7 @@ do {									\
 	default:							\
 		(x) = __get_user_bad();					\
 	}								\
+	instrument_copy_from_user_after((void *)&(x), ptr, size, 0);	\
 } while (0)
 
 #define __get_user_asm(x, addr, itype, ltype, label)			\
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (3 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 06/45] kmsan: add ReST documentation Alexander Potapenko
                   ` (39 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Notify memory tools about usercopy events in copy_to_user_page() and
copy_from_user_page().

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
---
 include/asm-generic/cacheflush.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc239..0f63eb325025f 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_GENERIC_CACHEFLUSH_H
 #define _ASM_GENERIC_CACHEFLUSH_H
 
+#include <linux/instrumented.h>
+
 struct mm_struct;
 struct vm_area_struct;
 struct page;
@@ -105,6 +107,7 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
 #ifndef copy_to_user_page
 #define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
 	do { \
+		instrument_copy_to_user(dst, src, len); \
 		memcpy(dst, src, len); \
 		flush_icache_user_page(vma, page, vaddr, len); \
 	} while (0)
@@ -112,7 +115,11 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
 
 #ifndef copy_from_user_page
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
-	memcpy(dst, src, len)
+	do { \
+		instrument_copy_from_user_before(dst, src, len); \
+		memcpy(dst, src, len); \
+		instrument_copy_from_user_after(dst, src, len, 0); \
+	} while (0)
 #endif
 
 #endif /* _ASM_GENERIC_CACHEFLUSH_H */
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 06/45] kmsan: add ReST documentation
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (4 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-07 12:34   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks Alexander Potapenko
                   ` (38 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- added a note that KMSAN is not intended for production use

v4:
 -- describe CONFIG_KMSAN_CHECK_PARAM_RETVAL
 -- drop mentions of cpu_entry_area
 -- add SPDX license

Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
---
 Documentation/dev-tools/index.rst |   1 +
 Documentation/dev-tools/kmsan.rst | 422 ++++++++++++++++++++++++++++++
 2 files changed, 423 insertions(+)
 create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4621eac290f46..6b0663075dc04 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
    kcov
    gcov
    kasan
+   kmsan
    ubsan
    kmemleak
    kcsan
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 0000000000000..3fa5d7fb222c9
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,422 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright (C) 2022, Google LLC.
+
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+An important note is that KMSAN is not intended for production use, because it
+drastically increases kernel memory footprint and slows the whole system down.
+
+Example report
+==============
+
+Here is an example of a KMSAN report::
+
+  =====================================================
+  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+   kunit_run_case_internal lib/kunit/test.c:333
+   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+   kthread+0x721/0x850 kernel/kthread.c:327
+   ret_from_fork+0x1f/0x30 ??:?
+
+  Uninit was stored to memory at:
+   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+   kunit_run_case_internal lib/kunit/test.c:333
+   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+   kthread+0x721/0x850 kernel/kthread.c:327
+   ret_from_fork+0x1f/0x30 ??:?
+
+  Local variable uninit created at:
+   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+  Bytes 4-7 of 8 are uninitialized
+  Memory access of size 8 starts at ffff888083fe3da0
+
+  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G    B       E     5.16.0-rc3+ #104
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+  =====================================================
+
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The lower stack trace corresponds to the place
+where this variable was created.
+
+The upper stack shows where the uninit value was used - in
+``test_uninit_kmsan_check_memory()``. The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+KMSAN reports a use of an uninitialized value ``v`` in the following cases:
+ - in a condition, e.g. ``if (v) { ... }``;
+ - in indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
+ - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
+ - when it is passed as an argument to a function, and
+   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
+
+The mentioned cases (apart from copying data to userspace or hardware, which is
+a security issue) are considered undefined behavior from the C11 Standard point
+of view.
+
+KMSAN and Clang
+===============
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
+
+Compiler instrumentation also tracks the shadow values with help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+  int a = 0xff;  // i.e. 0x000000ff
+  int b;
+  int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them. This origin describes the point in program execution at which the
+uninitialized value was created. Every origin is associated with either the
+full allocation stack (for heap-allocated memory), or the function containing
+the uninitialized variable (for locals).
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs. If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+  int a = 42;
+  int b;
+  int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+  int combine(short a, short b) {
+    union ret_t {
+      int i;
+      short s[2];
+    } ret;
+    ret.s[0] = a;
+    ret.s[1] = b;
+    return ret.i;
+  }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+The Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+  typedef struct {
+    void *shadow, *origin;
+  } shadow_origin_ptr_t
+
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
+
+Handling locals
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+  kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+  struct kmsan_context_state {
+    char param_tls[KMSAN_PARAM_SIZE];
+    char retval_tls[KMSAN_RETVAL_SIZE];
+    char va_arg_tls[KMSAN_PARAM_SIZE];
+    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+    u64 va_arg_overflow_size_tls;
+    char param_origin_tls[KMSAN_PARAM_SIZE];
+    depot_stack_handle_t retval_origin_tls;
+  };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions (unless the parameters are checked immediately by
+``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
+
+Passing uninitialized values to functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The KMSAN instrumentation pass has an option, ``-fsanitize-memory-param-retval``,
+which makes the compiler check function parameters passed by value, as well as
+function return values.
+
+The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
+enabled by default to let KMSAN report uninitialized values earlier.
+Please refer to the `LKML discussion`_ for more details.
+
+Because of the way the checks are implemented in LLVM (they are only applied to
+parameters marked as ``noundef``), not all parameters are guaranteed to be
+checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied
+alongside the data::
+
+  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+  void *__msan_memmove(void *dst, void *src, uintptr_t n)
+  void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each use of a value the compiler emits a shadow check that calls
+``__msan_warning()`` in the case that value is poisoned::
+
+  void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+  void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+This function unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting it,
+which can be helpful if we do not want the compiler to mess up some low-level
+code (e.g. that marked with ``noinstr``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+  KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+  KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+  struct kmsan_context {
+    ...
+    bool allow_reporting;
+    struct kmsan_context_state cstate;
+    ...
+  }
+
+  struct task_struct {
+    ...
+    struct kmsan_context kmsan;
+    ...
+  }
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+When the kernel runs in interrupt, softirq or NMI context, where ``current``
+is unavailable, KMSAN switches to a per-CPU interrupt state::
+
+  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+  struct page {
+    ...
+    struct page *shadow, *origin;
+    ...
+  };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that, in general, the shadow/origin pages of two contiguous memory
+pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there are also no guarantees on metadata contiguity.
+
+When ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
+.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (5 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 06/45] kmsan: add ReST documentation Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory Alexander Potapenko
                   ` (37 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

__no_sanitize_memory is a function attribute that instructs KMSAN to
skip a function during instrumentation. This is needed, for example, to
implement the noinstr functions.

__no_kmsan_checks is a function attribute that makes KMSAN
ignore the uninitialized values coming from the function's
inputs, and initialize the function's outputs.

Functions marked with this attribute can't be inlined into functions
not marked with it, and vice versa. This behavior is overridden by
__always_inline.

__SANITIZE_MEMORY__ is a macro that's defined iff the file is
instrumented with KMSAN. This is not the same as CONFIG_KMSAN, which is
defined for every file when KMSAN is enabled.
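
For illustration, typical uses of the two attributes look as follows (a
hypothetical sketch, not taken from this series):

  /* Completely uninstrumented, e.g. for noinstr-like code: */
  __no_sanitize_memory
  static unsigned long read_raw(unsigned long *p)
  {
          return *p;
  }

  /* Instrumented, but produces no reports and returns initialized data: */
  __no_kmsan_checks
  static unsigned long read_quietly(unsigned long *p)
  {
          return *p;
  }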

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I004ff0360c918d3cd8b18767ddd1381c6d3281be
---
 include/linux/compiler-clang.h | 23 +++++++++++++++++++++++
 include/linux/compiler-gcc.h   |  6 ++++++
 2 files changed, 29 insertions(+)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index c84fec767445d..4fa0cc4cbd2c8 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -51,6 +51,29 @@
 #define __no_sanitize_undefined
 #endif
 
+#if __has_feature(memory_sanitizer)
+#define __SANITIZE_MEMORY__
+/*
+ * Unlike other sanitizers, KMSAN still inserts code into functions marked with
+ * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation
+ * provides the behavior consistent with other __no_sanitize_ attributes,
+ * guaranteeing that __no_sanitize_memory functions remain uninstrumented.
+ */
+#define __no_sanitize_memory __disable_sanitizer_instrumentation
+
+/*
+ * The __no_kmsan_checks attribute ensures that a function does not produce
+ * false positive reports by:
+ *  - initializing all local variables and memory stores in this function;
+ *  - skipping all shadow checks;
+ *  - passing initialized arguments to this function's callees.
+ */
+#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
+#else
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+#endif
+
 /*
  * Support for __has_feature(coverage_sanitizer) was added in Clang 13 together
  * with no_sanitize("coverage"). Prior versions of Clang support coverage
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index a0c55eeaeaf16..63eb90eddad77 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -125,6 +125,12 @@
 #define __SANITIZE_ADDRESS__
 #endif
 
+/*
+ * GCC does not support KMSAN.
+ */
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+
 /*
  * Turn individual warnings and errors on and off locally, depending
  * on version.
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (6 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space Alexander Potapenko
                   ` (36 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

noinstr functions should never be instrumented, so make KMSAN skip them
by applying the __no_sanitize_memory attribute.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- moved this patch earlier in the series per Mark Rutland's request

Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
---
 include/linux/compiler_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index d08dfcb0ac687..fb5777e5228e7 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -227,7 +227,8 @@ struct ftrace_likely_data {
 /* Section for code which can't be instrumented at all */
 #define noinstr								\
 	noinline notrace __attribute((__section__(".noinstr.text")))	\
-	__no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+	__no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
+	__no_sanitize_memory
 
 #endif /* __KERNEL__ */
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (7 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-11 16:12   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE Alexander Potapenko
                   ` (35 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN is going to use 3/4 of existing vmalloc space to hold the
metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
allocate past the first 1/4.
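
For instance, with the default 4-level paging layout (VMALLOC_SIZE_TB == 32)
this results in the following split (illustrative arithmetic, the actual
values depend on the configuration):

  quarter size = 32 TB / 4 = 8 TB
  1st quarter: vmalloc()                  - 8 TB
  2nd quarter: shadow for vmalloc area    - 8 TB
  3rd quarter: origins for vmalloc area   - 8 TB
  4th quarter: shadow and origins for modules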

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- added x86: to the title

Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
---
 arch/x86/include/asm/pgtable_64_types.h | 41 ++++++++++++++++++++++++-
 arch/x86/mm/init_64.c                   |  2 +-
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 70e360a2e5fb7..ad6ded5b1dedf 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -139,7 +139,46 @@ extern unsigned int ptrs_per_p4d;
 # define VMEMMAP_START		__VMEMMAP_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
-#define VMALLOC_END		(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+#define VMEMORY_END		(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+
+#ifndef CONFIG_KMSAN
+#define VMALLOC_END		VMEMORY_END
+#else
+/*
+ * In KMSAN builds the vmalloc area is four times smaller, and the remaining
+ * 3/4 are used to keep the metadata for virtual pages. The memory formerly
+ * belonging to the vmalloc area is now laid out as follows:
+ *
+ * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
+ * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
+ *              VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
+ * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
+ *              VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
+ * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
+ *              - shadow for modules,
+ *              KMSAN_MODULES_ORIGIN_START to
+ *              KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
+ */
+#define VMALLOC_QUARTER_SIZE	((VMALLOC_SIZE_TB << 40) >> 2)
+#define VMALLOC_END		(VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
+
+/*
+ * vmalloc metadata addresses are calculated by adding shadow/origin offsets
+ * to vmalloc address.
+ */
+#define KMSAN_VMALLOC_SHADOW_OFFSET	VMALLOC_QUARTER_SIZE
+#define KMSAN_VMALLOC_ORIGIN_OFFSET	(VMALLOC_QUARTER_SIZE << 1)
+
+#define KMSAN_VMALLOC_SHADOW_START	(VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
+#define KMSAN_VMALLOC_ORIGIN_START	(VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
+
+/*
+ * The shadow and origin regions for modules are placed one after another in
+ * the last 1/4 of the vmalloc space.
+ */
+#define KMSAN_MODULES_SHADOW_START	(VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
+#define KMSAN_MODULES_ORIGIN_START	(KMSAN_MODULES_SHADOW_START + MODULES_LEN)
+#endif /* CONFIG_KMSAN */
 
 #define MODULES_VADDR		(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 /* The module sections ends with the start of the fixmap */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 39c5246964a91..5806331172361 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
 	unsigned long addr;
 	const char *lvl;
 
-	for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+	for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
 		pgd_t *pgd = pgd_offset_k(addr);
 		p4d_t *p4d;
 		pud_t *pud;
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (8 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-11 16:26   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
                   ` (34 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN adds two extra metadata pointers to struct page (kmsan_shadow and
kmsan_origin, 16 bytes total on 64-bit systems), so struct page no longer
fits into 64 bytes. Double MAX_STRUCT_PAGE_SIZE to 128 to accommodate the
growth.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
 drivers/nvdimm/nd.h       | 2 +-
 drivers/nvdimm/pfn_devs.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ec5219680092d..85ca5b4da3cf3 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
 		struct nd_namespace_common *ndns);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
 /* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 #else
 static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 0e92ab4b32833..61af072ac98f9 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 *
-		 * Also make sure size of struct page is less than 64. We
+		 * Also make sure size of struct page is less than 128. We
 		 * want to make sure we use large enough size here so that
 		 * we don't have a dynamic reserve space depending on
 		 * struct page size. But we also want to make sure we notice
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (9 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-02  0:18   ` Hillf Danton
                     ` (2 more replies)
  2022-07-01 14:22 ` [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code Alexander Potapenko
                   ` (33 subsequent siblings)
  44 siblings, 3 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

For each memory location KernelMemorySanitizer maintains two types of
metadata (see the example below):
1. The so-called shadow of that location - a byte:byte mapping describing
   whether individual bits of memory are initialized (shadow bit is 0) or
   not (shadow bit is 1).
2. The origins of that location - a 4-byte:4-byte mapping containing
   4-byte IDs of the stack traces where uninitialized values were
   created.
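
For example (an illustrative sketch of the metadata semantics):

  int a;          /* shadow of a is 0xffffffff: all bits uninitialized,
                     origin is the ID of this stack allocation's trace */
  int b = 0;      /* shadow of b is 0x00000000: all bits initialized */
  int c = a + b;  /* c inherits the shadow and origin of a */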

Each struct page now contains pointers to two struct pages holding
KMSAN metadata (shadow and origins) for the original struct page.
Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
metadata creation, addressing, copying and checking.
mm/kmsan/report.c performs error reporting in the cases an uninitialized
value is used in a way that leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory. mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.
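
For example, for a 4-byte load the compiler conceptually rewrites the access
along the following lines (a simplified sketch, not the exact generated code):

  /* int v = *p; is instrumented roughly as: */
  struct shadow_origin_ptr sop = __msan_metadata_ptr_for_load_4(p);
  int v = *p;
  u32 v_shadow = *(u32 *)sop.shadow;  /* shadow propagates together with v */
  u32 v_origin = *(u32 *)sop.origin;  /* origin is reported if v is used */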

To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_ctx used to track the metadata
of function parameters and return values for that task.

Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and
declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN.
The KMSAN_SANITIZE:=n Makefile directive can be used to completely
disable KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.
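
A hypothetical Makefile snippet (file names are illustrative):

  # Disable KMSAN instrumentation for all objects in this Makefile:
  KMSAN_SANITIZE := n
  # ...or only for a single object file:
  KMSAN_SANITIZE_foo.o := n
  # Instrument foo.o, but report nothing and treat its new stack memory
  # as initialized:
  KMSAN_ENABLE_CHECKS_foo.o := n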

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.
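
A hedged usage sketch (the device-filling helper is hypothetical):

  void receive_from_device(void *buf, size_t len)
  {
          /* The device fills buf in a way KMSAN cannot observe: */
          hypothetical_device_fill(buf, len);
          /* Avoid false positives: mark the received data initialized. */
          kmsan_unpoison_memory(buf, len);
          /* Conversely, report an error if a range is still uninitialized: */
          kmsan_check_memory(buf, len);
  }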

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
    rewrote the patch description;
 -- addressed comments by Dmitry Vyukov;
 -- added a note about KMSAN being not intended for production use.
 -- fix case of unaligned dst in kmsan_internal_memmove_metadata()

v3:
 -- print build IDs in reports where applicable
 -- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
 -- remove a stray BUG()

v4: (mostly fixes suggested by Marco Elver)
 -- add missing SPDX headers
 -- move CC_IS_CLANG && CLANG_VERSION under HAVE_KMSAN_COMPILER
 -- replace occurrences of |var| with @var
 -- reflow KMSAN_WARN_ON(), fix code comments
 -- remove x86-specific code from shadow.c to improve portability
 -- convert kmsan_report_lock to raw spinlock
 -- add enter_runtime/exit_runtime around kmsan_internal_memmove_metadata()
 -- remove unnecessary include from kmsan.h (reported by <lkp@intel.com>)
 -- introduce CONFIG_KMSAN_CHECK_PARAM_RETVAL (on by default), which
    maps to -fsanitize-memory-param-retval and makes KMSAN eagerly check
    values passed as function parameters and returned from functions.
 -- use real shadow in instrumented functions called from runtime

Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
---
 Makefile                     |   1 +
 include/linux/kmsan-checks.h |  64 +++++
 include/linux/kmsan.h        |  46 ++++
 include/linux/mm_types.h     |  12 +
 include/linux/sched.h        |   5 +
 lib/Kconfig.debug            |   1 +
 lib/Kconfig.kmsan            |  50 ++++
 mm/Makefile                  |   1 +
 mm/kmsan/Makefile            |  23 ++
 mm/kmsan/core.c              | 458 +++++++++++++++++++++++++++++++++++
 mm/kmsan/hooks.c             |  66 +++++
 mm/kmsan/instrumentation.c   | 271 +++++++++++++++++++++
 mm/kmsan/kmsan.h             | 190 +++++++++++++++
 mm/kmsan/report.c            | 211 ++++++++++++++++
 mm/kmsan/shadow.c            | 147 +++++++++++
 scripts/Makefile.kmsan       |   8 +
 scripts/Makefile.lib         |   9 +
 17 files changed, 1563 insertions(+)
 create mode 100644 include/linux/kmsan-checks.h
 create mode 100644 include/linux/kmsan.h
 create mode 100644 lib/Kconfig.kmsan
 create mode 100644 mm/kmsan/Makefile
 create mode 100644 mm/kmsan/core.c
 create mode 100644 mm/kmsan/hooks.c
 create mode 100644 mm/kmsan/instrumentation.c
 create mode 100644 mm/kmsan/kmsan.h
 create mode 100644 mm/kmsan/report.c
 create mode 100644 mm/kmsan/shadow.c
 create mode 100644 scripts/Makefile.kmsan

diff --git a/Makefile b/Makefile
index 8973b285ce6c7..7c93482f6df3d 100644
--- a/Makefile
+++ b/Makefile
@@ -1014,6 +1014,7 @@ include-y			:= scripts/Makefile.extrawarn
 include-$(CONFIG_DEBUG_INFO)	+= scripts/Makefile.debug
 include-$(CONFIG_KASAN)		+= scripts/Makefile.kasan
 include-$(CONFIG_KCSAN)		+= scripts/Makefile.kcsan
+include-$(CONFIG_KMSAN)		+= scripts/Makefile.kmsan
 include-$(CONFIG_UBSAN)		+= scripts/Makefile.ubsan
 include-$(CONFIG_KCOV)		+= scripts/Makefile.kcov
 include-$(CONFIG_RANDSTRUCT)	+= scripts/Makefile.randstruct
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
new file mode 100644
index 0000000000000..a6522a0c28df9
--- /dev/null
+++ b/include/linux/kmsan-checks.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN checks to be used for one-off annotations in subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#ifndef _LINUX_KMSAN_CHECKS_H
+#define _LINUX_KMSAN_CHECKS_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_KMSAN
+
+/**
+ * kmsan_poison_memory() - Mark the memory range as uninitialized.
+ * @address: address to start with.
+ * @size:    size of buffer to poison.
+ * @flags:   GFP flags for allocations done by this function.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * uninitialized. Error reports for this memory will reference the call site of
+ * kmsan_poison_memory() as origin.
+ */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
+
+/**
+ * kmsan_unpoison_memory() -  Mark the memory range as initialized.
+ * @address: address to start with.
+ * @size:    size of buffer to unpoison.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * initialized.
+ */
+void kmsan_unpoison_memory(const void *address, size_t size);
+
+/**
+ * kmsan_check_memory() - Check the memory range for being initialized.
+ * @address: address to start with.
+ * @size:    size of buffer to check.
+ *
+ * If any piece of the given range is marked as uninitialized, KMSAN will report
+ * an error.
+ */
+void kmsan_check_memory(const void *address, size_t size);
+
+#else
+
+static inline void kmsan_poison_memory(const void *address, size_t size,
+				       gfp_t flags)
+{
+}
+static inline void kmsan_unpoison_memory(const void *address, size_t size)
+{
+}
+static inline void kmsan_check_memory(const void *address, size_t size)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_CHECKS_H */
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
new file mode 100644
index 0000000000000..99e48c6b049d9
--- /dev/null
+++ b/include/linux/kmsan.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN API for subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+#ifndef _LINUX_KMSAN_H
+#define _LINUX_KMSAN_H
+
+#include <linux/gfp.h>
+#include <linux/kmsan-checks.h>
+#include <linux/stackdepot.h>
+#include <linux/types.h>
+
+struct page;
+
+#ifdef CONFIG_KMSAN
+
+/* These constants are defined in the MSan LLVM instrumentation pass. */
+#define KMSAN_RETVAL_SIZE 800
+#define KMSAN_PARAM_SIZE 800
+
+struct kmsan_context_state {
+	char param_tls[KMSAN_PARAM_SIZE];
+	char retval_tls[KMSAN_RETVAL_SIZE];
+	char va_arg_tls[KMSAN_PARAM_SIZE];
+	char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+	u64 va_arg_overflow_size_tls;
+	char param_origin_tls[KMSAN_PARAM_SIZE];
+	depot_stack_handle_t retval_origin_tls;
+};
+
+#undef KMSAN_PARAM_SIZE
+#undef KMSAN_RETVAL_SIZE
+
+struct kmsan_ctx {
+	struct kmsan_context_state cstate;
+	int kmsan_in_runtime;
+	bool allow_reporting;
+};
+
+#endif
+
+#endif /* _LINUX_KMSAN_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c29ab4c0cd5c6..3cc0ebdd9625f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -218,6 +218,18 @@ struct page {
 					   not kmapped, ie. highmem) */
 #endif /* WANT_PAGE_VIRTUAL */
 
+#ifdef CONFIG_KMSAN
+	/*
+	 * KMSAN metadata for this page:
+	 *  - shadow page: every bit indicates whether the corresponding
+	 *    bit of the original page is initialized (0) or not (1);
+	 *  - origin page: every 4 bytes contain an id of the stack trace
+	 *    where the uninitialized value was created.
+	 */
+	struct page *kmsan_shadow;
+	struct page *kmsan_origin;
+#endif
+
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758f..f9bb2c954e794 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -14,6 +14,7 @@
 #include <linux/pid.h>
 #include <linux/sem.h>
 #include <linux/shm.h>
+#include <linux/kmsan.h>
 #include <linux/mutex.h>
 #include <linux/plist.h>
 #include <linux/hrtimer.h>
@@ -1353,6 +1354,10 @@ struct task_struct {
 #endif
 #endif
 
+#ifdef CONFIG_KMSAN
+	struct kmsan_ctx		kmsan_ctx;
+#endif
+
 #if IS_ENABLED(CONFIG_KUNIT)
 	struct kunit			*kunit_test;
 #endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2e24db4bff192..59819e6fa5865 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW
 
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
+source "lib/Kconfig.kmsan"
 
 endmenu # "Memory Debugging"
 
diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
new file mode 100644
index 0000000000000..8f768d4034e3c
--- /dev/null
+++ b/lib/Kconfig.kmsan
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HAVE_ARCH_KMSAN
+	bool
+
+config HAVE_KMSAN_COMPILER
+	# Clang versions <14.0.0 also support -fsanitize=kernel-memory, but not
+	# all the features necessary to build the kernel with KMSAN.
+	depends on CC_IS_CLANG && CLANG_VERSION >= 140000
+	def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)
+
+config HAVE_KMSAN_PARAM_RETVAL
+	# Separate check for -fsanitize-memory-param-retval support.
+	depends on CC_IS_CLANG && CLANG_VERSION >= 140000
+	def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)
+
+
+config KMSAN
+	bool "KMSAN: detector of uninitialized values use"
+	depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
+	depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
+	select STACKDEPOT
+	select STACKDEPOT_ALWAYS_INIT
+	help
+	  KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
+	  uninitialized values in the kernel. It is based on compiler
+	  instrumentation provided by Clang and thus requires Clang to build.
+
+	  An important note is that KMSAN is not intended for production use,
+	  because it drastically increases kernel memory footprint and slows
+	  the whole system down.
+
+	  See <file:Documentation/dev-tools/kmsan.rst> for more details.
+
+if KMSAN
+
+config KMSAN_CHECK_PARAM_RETVAL
+	bool "Check for uninitialized values passed to and returned from functions"
+	default HAVE_KMSAN_PARAM_RETVAL
+	help
+	  If the compiler supports -fsanitize-memory-param-retval, KMSAN will
+	  eagerly check every function parameter passed by value and every
+	  function return value.
+
+	  Disabling KMSAN_CHECK_PARAM_RETVAL will result in tracking shadow for
+	  function parameters and return values across function borders. This
+	  is a more relaxed mode, but it generates more instrumentation code and
+	  may potentially report errors in corner cases when non-instrumented
+	  functions call instrumented ones.
+
+endif
diff --git a/mm/Makefile b/mm/Makefile
index 6f9ffa968a1a1..ff96830153221 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_KASAN)	+= kasan/
 obj-$(CONFIG_KFENCE) += kfence/
+obj-$(CONFIG_KMSAN)	+= kmsan/
 obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
new file mode 100644
index 0000000000000..550ad8625e4f9
--- /dev/null
+++ b/mm/kmsan/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for KernelMemorySanitizer (KMSAN).
+#
+#
+obj-y := core.o instrumentation.o hooks.o report.o shadow.o
+
+KMSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+UBSAN_SANITIZE := n
+
+# Disable instrumentation of KMSAN runtime with other tools.
+CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
+CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
+CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
+
+CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
new file mode 100644
index 0000000000000..16fb8880a9c6d
--- /dev/null
+++ b/mm/kmsan/core.c
@@ -0,0 +1,458 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN runtime library.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include <asm/page.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/mmzone.h>
+#include <linux/percpu-defs.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Avoid creating too long origin chains: these are unlikely to participate in
+ * real reports.
+ */
+#define MAX_CHAIN_DEPTH 7
+#define NUM_SKIPPED_TO_WARN 10000
+
+bool kmsan_enabled __read_mostly;
+
+/*
+ * Per-CPU KMSAN context to be used in interrupts, where current->kmsan_ctx is
+ * unavailable.
+ */
+DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+				  unsigned int poison_flags)
+{
+	u32 extra_bits =
+		kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
+	bool checked = poison_flags & KMSAN_POISON_CHECK;
+	depot_stack_handle_t handle;
+
+	handle = kmsan_save_stack_with_flags(flags, extra_bits);
+	kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
+}
+
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
+{
+	kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
+}
+
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+						 unsigned int extra)
+{
+	unsigned long entries[KMSAN_STACK_DEPTH];
+	unsigned int nr_entries;
+
+	nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
+
+	/* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
+	flags &= ~__GFP_DIRECT_RECLAIM;
+
+	return __stack_depot_save(entries, nr_entries, extra, flags, true);
+}
+
+/* Copy the metadata following the memmove() behavior. */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
+{
+	depot_stack_handle_t old_origin = 0, new_origin = 0;
+	int src_slots, dst_slots, i, iter, step, skip_bits;
+	depot_stack_handle_t *origin_src, *origin_dst;
+	void *shadow_src, *shadow_dst;
+	u32 *align_shadow_src, shadow;
+	bool backwards;
+
+	shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
+	if (!shadow_dst)
+		return;
+	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
+
+	shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
+	if (!shadow_src) {
+		/*
+		 * @src is untracked: zero out destination shadow, ignore the
+		 * origins, we're done.
+		 */
+		__memset(shadow_dst, 0, n);
+		return;
+	}
+	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
+
+	__memmove(shadow_dst, shadow_src, n);
+
+	origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
+	origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
+	KMSAN_WARN_ON(!origin_dst || !origin_src);
+	src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
+		     ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
+		    KMSAN_ORIGIN_SIZE;
+	dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
+		     ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
+		    KMSAN_ORIGIN_SIZE;
+	KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
+	KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
+		      (dst_slots - src_slots > 1));
+
+	backwards = dst > src;
+	i = backwards ? min(src_slots, dst_slots) - 1 : 0;
+	iter = backwards ? -1 : 1;
+
+	align_shadow_src =
+		(u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
+	for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
+		KMSAN_WARN_ON(i < 0);
+		shadow = align_shadow_src[i];
+		if (i == 0) {
+			/*
+			 * If @src isn't aligned on KMSAN_ORIGIN_SIZE, don't
+			 * look at the first @src % KMSAN_ORIGIN_SIZE bytes
+			 * of the first shadow slot.
+			 */
+			skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
+			shadow = (shadow >> skip_bits) << skip_bits;
+		}
+		if (i == src_slots - 1) {
+			/*
+			 * If @src + n isn't aligned on
+			 * KMSAN_ORIGIN_SIZE, don't look at the last
+			 * (@src + n) % KMSAN_ORIGIN_SIZE bytes of the
+			 * last shadow slot.
+			 */
+			skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
+			shadow = (shadow << skip_bits) >> skip_bits;
+		}
+		/*
+		 * Overwrite the origin only if the corresponding
+		 * shadow is nonempty.
+		 */
+		if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
+			old_origin = origin_src[i];
+			new_origin = kmsan_internal_chain_origin(old_origin);
+			/*
+			 * kmsan_internal_chain_origin() may return
+			 * NULL, but we don't want to lose the previous
+			 * origin value.
+			 */
+			if (!new_origin)
+				new_origin = old_origin;
+		}
+		if (shadow)
+			origin_dst[i] = new_origin;
+		else
+			origin_dst[i] = 0;
+	}
+	/*
+	 * If dst_slots is greater than src_slots (i.e.
+	 * dst_slots == src_slots + 1), there is an extra origin slot at the
+	 * beginning or end of the destination buffer, for which we take the
+	 * origin from the previous slot.
+	 * This is only done if the part of the source shadow corresponding to
+	 * slot is non-zero.
+	 *
+	 * E.g. if we copy 8 aligned bytes that are marked as uninitialized
+	 * and have origins o111 and o222, to an unaligned buffer with offset 1,
+	 * these two origins are copied to three origin slots, so one of them
+	 * needs to be duplicated, depending on the copy direction (@backwards):
+	 *
+	 *   src shadow: |uuuu|uuuu|....|
+	 *   src origin: |o111|o222|....|
+	 *
+	 * backwards = 0:
+	 *   dst shadow: |.uuu|uuuu|u...|
+	 *   dst origin: |....|o111|o222| - fill the empty slot with o111
+	 * backwards = 1:
+	 *   dst shadow: |.uuu|uuuu|u...|
+	 *   dst origin: |o111|o222|....| - fill the empty slot with o222
+	 */
+	if (src_slots < dst_slots) {
+		if (backwards) {
+			shadow = align_shadow_src[src_slots - 1];
+			skip_bits = (((u64)dst + n) % KMSAN_ORIGIN_SIZE) * 8;
+			shadow = (shadow << skip_bits) >> skip_bits;
+			if (shadow)
+				/* src_slots > 0, therefore dst_slots is at least 2 */
+				origin_dst[dst_slots - 1] = origin_dst[dst_slots - 2];
+		} else {
+			shadow = align_shadow_src[0];
+			skip_bits = ((u64)dst % KMSAN_ORIGIN_SIZE) * 8;
+			shadow = (shadow >> skip_bits) << skip_bits;
+			if (shadow)
+				origin_dst[0] = origin_dst[1];
+		}
+	}
+}
+
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
+{
+	unsigned long entries[3];
+	u32 extra_bits;
+	int depth;
+	bool uaf;
+
+	if (!id)
+		return id;
+	/*
+	 * Make sure we have enough spare bits in @id to hold the UAF bit and
+	 * the chain depth.
+	 */
+	BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
+
+	extra_bits = stack_depot_get_extra_bits(id);
+	depth = kmsan_depth_from_eb(extra_bits);
+	uaf = kmsan_uaf_from_eb(extra_bits);
+
+	if (depth >= MAX_CHAIN_DEPTH) {
+		static atomic_long_t kmsan_skipped_origins;
+		long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
+
+		if (skipped % NUM_SKIPPED_TO_WARN == 0) {
+			pr_warn("not chained %ld origins\n", skipped);
+			dump_stack();
+			kmsan_print_origin(id);
+		}
+		return id;
+	}
+	depth++;
+	extra_bits = kmsan_extra_bits(depth, uaf);
+
+	entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
+	entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
+	entries[2] = id;
+	/*
+	 * @entries is a local var in non-instrumented code, so KMSAN does not
+	 * know it is initialized. Explicitly unpoison it to avoid false
+	 * positives when __stack_depot_save() passes it to instrumented code.
+	 */
+	kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
+	return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
+				  GFP_ATOMIC, true);
+}
+
+void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
+				      u32 origin, bool checked)
+{
+	u64 address = (u64)addr;
+	void *shadow_start;
+	u32 *origin_start;
+	size_t pad = 0;
+	int i;
+
+	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+	shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
+	if (!shadow_start) {
+		/*
+		 * kmsan_metadata_is_contiguous() is true, so either all shadow
+		 * and origin pages are NULL, or all are non-NULL.
+		 */
+		if (checked) {
+			pr_err("%s: not memsetting %ld bytes starting at %px, because the shadow is NULL\n",
+			       __func__, size, addr);
+			KMSAN_WARN_ON(true);
+		}
+		return;
+	}
+	__memset(shadow_start, b, size);
+
+	if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+		pad = address % KMSAN_ORIGIN_SIZE;
+		address -= pad;
+		size += pad;
+	}
+	size = ALIGN(size, KMSAN_ORIGIN_SIZE);
+	origin_start =
+		(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
+
+	for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
+		origin_start[i] = origin;
+}
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
+{
+	struct page *page;
+
+	if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
+	    !kmsan_internal_is_module_addr(vaddr))
+		return NULL;
+	page = vmalloc_to_page(vaddr);
+	if (pfn_valid(page_to_pfn(page)))
+		return page;
+	else
+		return NULL;
+}
+
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+				 int reason)
+{
+	depot_stack_handle_t cur_origin = 0, new_origin = 0;
+	unsigned long addr64 = (unsigned long)addr;
+	depot_stack_handle_t *origin = NULL;
+	unsigned char *shadow = NULL;
+	int cur_off_start = -1;
+	int i, chunk_size;
+	size_t pos = 0;
+
+	if (!size)
+		return;
+	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+	while (pos < size) {
+		chunk_size = min(size - pos,
+				 PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
+		shadow = kmsan_get_metadata((void *)(addr64 + pos),
+					    KMSAN_META_SHADOW);
+		if (!shadow) {
+			/*
+			 * This page is untracked. If there were uninitialized
+			 * bytes before, report them.
+			 */
+			if (cur_origin) {
+				kmsan_enter_runtime();
+				kmsan_report(cur_origin, addr, size,
+					     cur_off_start, pos - 1, user_addr,
+					     reason);
+				kmsan_leave_runtime();
+			}
+			cur_origin = 0;
+			cur_off_start = -1;
+			pos += chunk_size;
+			continue;
+		}
+		for (i = 0; i < chunk_size; i++) {
+			if (!shadow[i]) {
+				/*
+				 * This byte is unpoisoned. If there were
+				 * poisoned bytes before, report them.
+				 */
+				if (cur_origin) {
+					kmsan_enter_runtime();
+					kmsan_report(cur_origin, addr, size,
+						     cur_off_start, pos + i - 1,
+						     user_addr, reason);
+					kmsan_leave_runtime();
+				}
+				cur_origin = 0;
+				cur_off_start = -1;
+				continue;
+			}
+			origin = kmsan_get_metadata((void *)(addr64 + pos + i),
+						    KMSAN_META_ORIGIN);
+			KMSAN_WARN_ON(!origin);
+			new_origin = *origin;
+			/*
+			 * Encountered new origin - report the previous
+			 * uninitialized range.
+			 */
+			if (cur_origin != new_origin) {
+				if (cur_origin) {
+					kmsan_enter_runtime();
+					kmsan_report(cur_origin, addr, size,
+						     cur_off_start, pos + i - 1,
+						     user_addr, reason);
+					kmsan_leave_runtime();
+				}
+				cur_origin = new_origin;
+				cur_off_start = pos + i;
+			}
+		}
+		pos += chunk_size;
+	}
+	KMSAN_WARN_ON(pos != size);
+	if (cur_origin) {
+		kmsan_enter_runtime();
+		kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
+			     user_addr, reason);
+		kmsan_leave_runtime();
+	}
+}
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size)
+{
+	char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
+	     *next_origin = NULL;
+	u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
+	depot_stack_handle_t *origin_p;
+	bool all_untracked = false;
+
+	if (!size)
+		return true;
+
+	/* The whole range belongs to the same page. */
+	if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
+	    ALIGN_DOWN(cur_addr, PAGE_SIZE))
+		return true;
+
+	cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
+	if (!cur_shadow)
+		all_untracked = true;
+	cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
+	if (all_untracked && cur_origin)
+		goto report;
+
+	for (; next_addr < (u64)addr + size;
+	     cur_addr = next_addr, cur_shadow = next_shadow,
+	     cur_origin = next_origin, next_addr += PAGE_SIZE) {
+		next_shadow = kmsan_get_metadata((void *)next_addr, false);
+		next_origin = kmsan_get_metadata((void *)next_addr, true);
+		if (all_untracked) {
+			if (next_shadow || next_origin)
+				goto report;
+			if (!next_shadow && !next_origin)
+				continue;
+		}
+		if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
+		    ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
+			continue;
+		goto report;
+	}
+	return true;
+
+report:
+	pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
+	pr_err("Access of size %ld at %px.\n", size, addr);
+	pr_err("Addresses belonging to different ranges: %px and %px\n",
+	       (void *)cur_addr, (void *)next_addr);
+	pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
+	       next_shadow);
+	pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
+	       next_origin);
+	origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
+	if (origin_p) {
+		pr_err("Origin: %08x\n", *origin_p);
+		kmsan_print_origin(*origin_p);
+	} else {
+		pr_err("Origin: unavailable\n");
+	}
+	return false;
+}
+
+bool kmsan_internal_is_module_addr(void *vaddr)
+{
+	return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
+}
+
+bool kmsan_internal_is_vmalloc_addr(void *addr)
+{
+	return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
+}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
new file mode 100644
index 0000000000000..4ac62fa67a02a
--- /dev/null
+++ b/mm/kmsan/hooks.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN hooks for kernel subsystems.
+ *
+ * These functions handle creation of KMSAN metadata for memory allocations.
+ *
+ * Copyright (C) 2018-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include <linux/cacheflush.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "../internal.h"
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Instrumented functions shouldn't be called under
+ * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
+ * skipping effects of functions like memset() inside instrumented code.
+ */
+
+/* Functions from kmsan-checks.h follow. */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	kmsan_enter_runtime();
+	/* The users may want to poison/unpoison random memory. */
+	kmsan_internal_poison_memory((void *)address, size, flags,
+				     KMSAN_POISON_NOCHECK);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_poison_memory);
+
+void kmsan_unpoison_memory(const void *address, size_t size)
+{
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	ua_flags = user_access_save();
+	kmsan_enter_runtime();
+	/* The users may want to poison/unpoison random memory. */
+	kmsan_internal_unpoison_memory((void *)address, size,
+				       KMSAN_POISON_NOCHECK);
+	kmsan_leave_runtime();
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_unpoison_memory);
+
+void kmsan_check_memory(const void *addr, size_t size)
+{
+	if (!kmsan_enabled)
+		return;
+	kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+				    REASON_ANY);
+}
+EXPORT_SYMBOL(kmsan_check_memory);
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
new file mode 100644
index 0000000000000..1b705162be8c2
--- /dev/null
+++ b/mm/kmsan/instrumentation.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN compiler API.
+ *
+ * This file implements __msan_XXX hooks that Clang inserts into the code
+ * compiled with -fsanitize=kernel-memory.
+ * See Documentation/dev-tools/kmsan.rst for more information on how KMSAN
+ * instrumentation works.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include "kmsan.h"
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
+{
+	if ((u64)addr < TASK_SIZE)
+		return true;
+	if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
+		return true;
+	return false;
+}
+
+static inline struct shadow_origin_ptr
+get_shadow_origin_ptr(void *addr, u64 size, bool store)
+{
+	unsigned long ua_flags = user_access_save();
+	struct shadow_origin_ptr ret;
+
+	ret = kmsan_get_shadow_origin_ptr(addr, size, store);
+	user_access_restore(ua_flags);
+	return ret;
+}
+
+/* Get shadow and origin pointers for a memory load with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
+							uintptr_t size)
+{
+	return get_shadow_origin_ptr(addr, size, /*store*/ false);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
+
+/* Get shadow and origin pointers for a memory store with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
+							 uintptr_t size)
+{
+	return get_shadow_origin_ptr(addr, size, /*store*/ true);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
+
+/*
+ * Declare functions that obtain shadow/origin pointers for loads and stores
+ * with fixed size.
+ */
+#define DECLARE_METADATA_PTR_GETTER(size)                                      \
+	struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size(          \
+		void *addr)                                                    \
+	{                                                                      \
+		return get_shadow_origin_ptr(addr, size, /*store*/ false);     \
+	}                                                                      \
+	EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size);                    \
+	struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size(         \
+		void *addr)                                                    \
+	{                                                                      \
+		return get_shadow_origin_ptr(addr, size, /*store*/ true);      \
+	}                                                                      \
+	EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
+
+DECLARE_METADATA_PTR_GETTER(1);
+DECLARE_METADATA_PTR_GETTER(2);
+DECLARE_METADATA_PTR_GETTER(4);
+DECLARE_METADATA_PTR_GETTER(8);
+
+/*
+ * Handle a memory store performed by inline assembly. KMSAN conservatively
+ * attempts to unpoison the outputs of asm() directives to prevent false
+ * positives caused by missed stores.
+ */
+void __msan_instrument_asm_store(void *addr, uintptr_t size)
+{
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	ua_flags = user_access_save();
+	/*
+	 * Most of the accesses are below 32 bytes. The two exceptions so far
+	 * are clwb() (64 bytes) and FPU state (512 bytes).
+	 * It's unlikely that the assembly will touch more than 512 bytes.
+	 */
+	if (size > 512) {
+		WARN_ONCE(1, "assembly store size too big: %ld\n", size);
+		size = 8;
+	}
+	if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
+		user_access_restore(ua_flags);
+		return;
+	}
+	kmsan_enter_runtime();
+	/* Unpoisoning the memory on best effort. */
+	kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
+	kmsan_leave_runtime();
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_instrument_asm_store);
+
+/* Handle llvm.memmove intrinsic. */
+void *__msan_memmove(void *dst, const void *src, uintptr_t n)
+{
+	void *result;
+
+	result = __memmove(dst, src, n);
+	if (!n)
+		/* Some people call memmove() with zero length. */
+		return result;
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return result;
+
+	kmsan_enter_runtime();
+	kmsan_internal_memmove_metadata(dst, (void *)src, n);
+	kmsan_leave_runtime();
+
+	return result;
+}
+EXPORT_SYMBOL(__msan_memmove);
+
+/* Handle llvm.memcpy intrinsic. */
+void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
+{
+	void *result;
+
+	result = __memcpy(dst, src, n);
+	if (!n)
+		/* Some people call memcpy() with zero length. */
+		return result;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return result;
+
+	kmsan_enter_runtime();
+	/* Using memmove instead of memcpy doesn't affect correctness. */
+	kmsan_internal_memmove_metadata(dst, (void *)src, n);
+	kmsan_leave_runtime();
+
+	return result;
+}
+EXPORT_SYMBOL(__msan_memcpy);
+
+/* Handle llvm.memset intrinsic. */
+void *__msan_memset(void *dst, int c, uintptr_t n)
+{
+	void *result;
+
+	result = __memset(dst, c, n);
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return result;
+
+	kmsan_enter_runtime();
+	/*
+	 * Clang doesn't pass parameter metadata here, so it is impossible to
+	 * use shadow of @c to set up the shadow for @dst.
+	 */
+	kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
+	kmsan_leave_runtime();
+
+	return result;
+}
+EXPORT_SYMBOL(__msan_memset);
+
+/*
+ * Create a new origin from an old one. This is done when storing an
+ * uninitialized value to memory. When reporting an error, KMSAN unrolls and
+ * prints the whole chain of stores that preceded the use of this value.
+ */
+depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
+{
+	depot_stack_handle_t ret = 0;
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return ret;
+
+	ua_flags = user_access_save();
+
+	/* Creating new origins may allocate memory. */
+	kmsan_enter_runtime();
+	ret = kmsan_internal_chain_origin(origin);
+	kmsan_leave_runtime();
+	user_access_restore(ua_flags);
+	return ret;
+}
+EXPORT_SYMBOL(__msan_chain_origin);
+
+/* Poison a local variable when entering a function. */
+void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
+{
+	depot_stack_handle_t handle;
+	unsigned long entries[4];
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	ua_flags = user_access_save();
+	entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
+	entries[1] = (u64)descr;
+	entries[2] = (u64)__builtin_return_address(0);
+	/*
+	 * With frame pointers enabled, it is possible to quickly fetch the
+	 * second frame of the caller stack without calling the unwinder.
+	 * Without them, simply do not bother.
+	 */
+	if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
+		entries[3] = (u64)__builtin_return_address(1);
+	else
+		entries[3] = 0;
+
+	/* stack_depot_save() may allocate memory. */
+	kmsan_enter_runtime();
+	handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
+	kmsan_leave_runtime();
+
+	kmsan_internal_set_shadow_origin(address, size, -1, handle,
+					 /*checked*/ true);
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_poison_alloca);
+
+/* Unpoison a local variable. */
+void __msan_unpoison_alloca(void *address, uintptr_t size)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	kmsan_enter_runtime();
+	kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_unpoison_alloca);
+
+/*
+ * Report that an uninitialized value with the given origin was used in a way
+ * that constituted undefined behavior.
+ */
+void __msan_warning(u32 origin)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	kmsan_enter_runtime();
+	kmsan_report(origin, /*address*/ 0, /*size*/ 0,
+		     /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
+		     REASON_ANY);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_warning);
+
+/*
+ * At the beginning of an instrumented function, obtain the pointer to
+ * `struct kmsan_context_state` holding the metadata for function parameters.
+ */
+struct kmsan_context_state *__msan_get_context_state(void)
+{
+	return &kmsan_get_context()->cstate;
+}
+EXPORT_SYMBOL(__msan_get_context_state);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
new file mode 100644
index 0000000000000..d3c400ca097ba
--- /dev/null
+++ b/mm/kmsan/kmsan.h
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Functions used by the KMSAN runtime.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#ifndef __MM_KMSAN_KMSAN_H
+#define __MM_KMSAN_KMSAN_H
+
+#include <asm/pgtable_64_types.h>
+#include <linux/irqflags.h>
+#include <linux/sched.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/nmi.h>
+#include <linux/mm.h>
+#include <linux/printk.h>
+
+#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
+#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
+
+#define KMSAN_POISON_NOCHECK 0x0
+#define KMSAN_POISON_CHECK 0x1
+#define KMSAN_POISON_FREE 0x2
+
+#define KMSAN_ORIGIN_SIZE 4
+
+#define KMSAN_STACK_DEPTH 64
+
+#define KMSAN_META_SHADOW (false)
+#define KMSAN_META_ORIGIN (true)
+
+extern bool kmsan_enabled;
+extern int panic_on_kmsan;
+
+/*
+ * KMSAN performs a lot of consistency checks that are currently enabled by
+ * default. BUG_ON is normally discouraged in the kernel, unless used for
+ * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
+ * recover if something goes wrong.
+ */
+#define KMSAN_WARN_ON(cond)						\
+	({								\
+		const bool __cond = WARN_ON(cond);			\
+		if (unlikely(__cond)) {					\
+			WRITE_ONCE(kmsan_enabled, false);		\
+			if (panic_on_kmsan) {				\
+				/* Can't call panic() here because */	\
+				/* of uaccess checks. */		\
+				BUG();					\
+			}						\
+		}							\
+		__cond;							\
+	})
+
+/*
+ * A pair of metadata pointers to be returned by the instrumentation functions.
+ */
+struct shadow_origin_ptr {
+	void *shadow, *origin;
+};
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
+						     bool store);
+void *kmsan_get_metadata(void *addr, bool is_origin);
+
+enum kmsan_bug_reason {
+	REASON_ANY,
+	REASON_COPY_TO_USER,
+	REASON_SUBMIT_URB,
+};
+
+void kmsan_print_origin(depot_stack_handle_t origin);
+
+/**
+ * kmsan_report() - Report a use of uninitialized value.
+ * @origin:    Stack ID of the uninitialized value.
+ * @address:   Address at which the memory access happens.
+ * @size:      Memory access size.
+ * @off_first: Offset (from @address) of the first byte to be reported.
+ * @off_last:  Offset (from @address) of the last byte to be reported.
+ * @user_addr: When non-NULL, denotes the userspace address to which the kernel
+ *             is leaking data.
+ * @reason:    Error type from enum kmsan_bug_reason.
+ *
+ * kmsan_report() prints an error message for a consecutive group of bytes
+ * sharing the same origin. If an uninitialized value is used in a comparison,
+ * this function is called once without specifying the addresses. When checking
+ * a memory range, KMSAN may call kmsan_report() multiple times with the same
+ * @address, @size, @user_addr and @reason, but different @off_first and
+ * @off_last corresponding to different @origin values.
+ */
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+		  int off_first, int off_last, const void *user_addr,
+		  enum kmsan_bug_reason reason);
+
+DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+static __always_inline struct kmsan_ctx *kmsan_get_context(void)
+{
+	return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
+}
+
+/*
+ * When a compiler hook or KMSAN runtime function is invoked, it may make a
+ * call to instrumented code and eventually call itself recursively. To avoid
+ * that, we guard the runtime entry regions with
+ * kmsan_enter_runtime()/kmsan_leave_runtime() and exit the hook if
+ * kmsan_in_runtime() is true.
+ *
+ * Non-runtime code may occasionally get executed in nested IRQs from the
+ * runtime code (e.g. when called via smp_call_function_single()). Because some
+ * KMSAN routines may take locks (e.g. for memory allocation), we conservatively
+ * bail out instead of calling them. To minimize the effect of this (potentially
+ * missing initialization events) kmsan_in_runtime() is not checked in
+ * non-blocking runtime functions.
+ */
+static __always_inline bool kmsan_in_runtime(void)
+{
+	if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
+		return true;
+	return kmsan_get_context()->kmsan_in_runtime;
+}
+
+static __always_inline void kmsan_enter_runtime(void)
+{
+	struct kmsan_ctx *ctx;
+
+	ctx = kmsan_get_context();
+	KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
+}
+
+static __always_inline void kmsan_leave_runtime(void)
+{
+	struct kmsan_ctx *ctx = kmsan_get_context();
+
+	KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
+}
+
+depot_stack_handle_t kmsan_save_stack(void);
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+						 unsigned int extra_bits);
+
+/*
+ * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
+ * provided by the stack depot.
+ * The UAF flag is stored in the lowest bit, followed by the depth in the upper
+ * bits.
+ * set_dsh_extra_bits() is responsible for clamping the value.
+ */
+static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
+						     bool uaf)
+{
+	return (depth << 1) | uaf;
+}
+
+static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
+{
+	return extra_bits & 1;
+}
+
+static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
+{
+	return extra_bits >> 1;
+}
+
+/*
+ * kmsan_internal_ functions are supposed to be very simple and not require the
+ * kmsan_in_runtime() checks.
+ */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+				  unsigned int poison_flags);
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
+void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
+				      u32 origin, bool checked);
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size);
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+				 int reason);
+bool kmsan_internal_is_module_addr(void *vaddr);
+bool kmsan_internal_is_vmalloc_addr(void *addr);
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+
+#endif /* __MM_KMSAN_KMSAN_H */
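
For reference, the blocking runtime hooks combine the guards above into
one canonical pattern (a minimal sketch, not part of this diff; the real
instances live in mm/kmsan/hooks.c):

  void kmsan_example_hook(void *addr, size_t size)
  {
  	/* Bail out if KMSAN is off or we are already in the runtime. */
  	if (!kmsan_enabled || kmsan_in_runtime())
  		return;
  	kmsan_enter_runtime();
  	/* ... check or update metadata for [addr, addr + size) ... */
  	kmsan_leave_runtime();
  }
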
diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
new file mode 100644
index 0000000000000..c298edcf49ee5
--- /dev/null
+++ b/mm/kmsan/report.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN error reporting routines.
+ *
+ * Copyright (C) 2019-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include <linux/console.h>
+#include <linux/moduleparam.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/uaccess.h>
+
+#include "kmsan.h"
+
+static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
+#define DESCR_SIZE 128
+/* Protected by kmsan_report_lock */
+static char report_local_descr[DESCR_SIZE];
+int panic_on_kmsan __read_mostly;
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "kmsan."
+module_param_named(panic, panic_on_kmsan, int, 0);
+
+/*
+ * Skip internal KMSAN frames.
+ */
+static int get_stack_skipnr(const unsigned long stack_entries[],
+			    int num_entries)
+{
+	int len, skip;
+	char buf[64];
+
+	for (skip = 0; skip < num_entries; ++skip) {
+		len = scnprintf(buf, sizeof(buf), "%ps",
+				(void *)stack_entries[skip]);
+
+		/* Never show __msan_* or kmsan_* functions. */
+		if ((strnstr(buf, "__msan_", len) == buf) ||
+		    (strnstr(buf, "kmsan_", len) == buf))
+			continue;
+
+		/*
+		 * No match for runtime functions: the first @skip entries
+		 * must be skipped to get to the first frame of interest.
+		 */
+		break;
+	}
+
+	return skip;
+}
+
+/*
+ * Currently the descriptions of locals generated by Clang look as follows:
+ *   ----local_name@function_name
+ * We want to print only the name of the local, as other information in that
+ * description can be confusing.
+ * The meaningful part of the description is copied to a global buffer to avoid
+ * allocating memory.
+ */
+static char *pretty_descr(char *descr)
+{
+	int i, pos = 0, len = strlen(descr);
+
+	for (i = 0; i < len; i++) {
+		if (descr[i] == '@')
+			break;
+		if (descr[i] == '-')
+			continue;
+		report_local_descr[pos] = descr[i];
+		if (pos + 1 == DESCR_SIZE)
+			break;
+		pos++;
+	}
+	report_local_descr[pos] = 0;
+	return report_local_descr;
+}
+
+void kmsan_print_origin(depot_stack_handle_t origin)
+{
+	unsigned long *entries = NULL, *chained_entries = NULL;
+	unsigned int nr_entries, chained_nr_entries, skipnr;
+	void *pc1 = NULL, *pc2 = NULL;
+	depot_stack_handle_t head;
+	unsigned long magic;
+	char *descr = NULL;
+
+	if (!origin)
+		return;
+
+	while (true) {
+		nr_entries = stack_depot_fetch(origin, &entries);
+		magic = nr_entries ? entries[0] : 0;
+		if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
+			descr = (char *)entries[1];
+			pc1 = (void *)entries[2];
+			pc2 = (void *)entries[3];
+			pr_err("Local variable %s created at:\n",
+			       pretty_descr(descr));
+			if (pc1)
+				pr_err(" %pSb\n", pc1);
+			if (pc2)
+				pr_err(" %pSb\n", pc2);
+			break;
+		}
+		if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
+			head = entries[1];
+			origin = entries[2];
+			pr_err("Uninit was stored to memory at:\n");
+			chained_nr_entries =
+				stack_depot_fetch(head, &chained_entries);
+			kmsan_internal_unpoison_memory(
+				chained_entries,
+				chained_nr_entries * sizeof(*chained_entries),
+				/*checked*/ false);
+			skipnr = get_stack_skipnr(chained_entries,
+						  chained_nr_entries);
+			stack_trace_print(chained_entries + skipnr,
+					  chained_nr_entries - skipnr, 0);
+			pr_err("\n");
+			continue;
+		}
+		pr_err("Uninit was created at:\n");
+		if (nr_entries) {
+			skipnr = get_stack_skipnr(entries, nr_entries);
+			stack_trace_print(entries + skipnr, nr_entries - skipnr,
+					  0);
+		} else {
+			pr_err("(stack is not available)\n");
+		}
+		break;
+	}
+}
+
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+		  int off_first, int off_last, const void *user_addr,
+		  enum kmsan_bug_reason reason)
+{
+	unsigned long stack_entries[KMSAN_STACK_DEPTH];
+	int num_stack_entries, skipnr;
+	char *bug_type = NULL;
+	unsigned long ua_flags;
+	bool is_uaf;
+
+	if (!kmsan_enabled)
+		return;
+	if (!current->kmsan_ctx.allow_reporting)
+		return;
+	if (!origin)
+		return;
+
+	current->kmsan_ctx.allow_reporting = false;
+	ua_flags = user_access_save();
+	raw_spin_lock(&kmsan_report_lock);
+	pr_err("=====================================================\n");
+	is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
+	switch (reason) {
+	case REASON_ANY:
+		bug_type = is_uaf ? "use-after-free" : "uninit-value";
+		break;
+	case REASON_COPY_TO_USER:
+		bug_type = is_uaf ? "kernel-infoleak-after-free" :
+					  "kernel-infoleak";
+		break;
+	case REASON_SUBMIT_URB:
+		bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
+					  "kernel-usb-infoleak";
+		break;
+	}
+
+	num_stack_entries =
+		stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
+	skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+
+	pr_err("BUG: KMSAN: %s in %pSb\n",
+	       bug_type, (void *)stack_entries[skipnr]);
+	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+			  0);
+	pr_err("\n");
+
+	kmsan_print_origin(origin);
+
+	if (size) {
+		pr_err("\n");
+		if (off_first == off_last)
+			pr_err("Byte %d of %d is uninitialized\n", off_first,
+			       size);
+		else
+			pr_err("Bytes %d-%d of %d are uninitialized\n",
+			       off_first, off_last, size);
+	}
+	if (address)
+		pr_err("Memory access of size %d starts at %px\n", size,
+		       address);
+	if (user_addr && reason == REASON_COPY_TO_USER)
+		pr_err("Data copied to user address %px\n", user_addr);
+	pr_err("\n");
+	dump_stack_print_info(KERN_ERR);
+	pr_err("=====================================================\n");
+	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+	raw_spin_unlock(&kmsan_report_lock);
+	if (panic_on_kmsan)
+		panic("kmsan.panic set ...\n");
+	user_access_restore(ua_flags);
+	current->kmsan_ctx.allow_reporting = true;
+}
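
Pieced together from the pr_err() calls above, a REASON_COPY_TO_USER
report takes roughly the following shape (the symbols and addresses
below are placeholders):

  =====================================================
  BUG: KMSAN: kernel-infoleak in <symbol+offset>
   <call trace of the copying code>

  Uninit was created at:
   <call trace of the allocation>

  Bytes 0-7 of 8 are uninitialized
  Memory access of size 8 starts at <kernel address>
  Data copied to user address <user address>
  =====================================================
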
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
new file mode 100644
index 0000000000000..e5ad2972d7362
--- /dev/null
+++ b/mm/kmsan/shadow.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN shadow implementation.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include <asm/kmsan.h>
+#include <asm/tlbflush.h>
+#include <linux/cacheflush.h>
+#include <linux/memblock.h>
+#include <linux/mm_types.h>
+#include <linux/percpu-defs.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stddef.h>
+
+#include "../internal.h"
+#include "kmsan.h"
+
+#define shadow_page_for(page) ((page)->kmsan_shadow)
+
+#define origin_page_for(page) ((page)->kmsan_origin)
+
+static void *shadow_ptr_for(struct page *page)
+{
+	return page_address(shadow_page_for(page));
+}
+
+static void *origin_ptr_for(struct page *page)
+{
+	return page_address(origin_page_for(page));
+}
+
+static bool page_has_metadata(struct page *page)
+{
+	return shadow_page_for(page) && origin_page_for(page);
+}
+
+static void set_no_shadow_origin_page(struct page *page)
+{
+	shadow_page_for(page) = NULL;
+	origin_page_for(page) = NULL;
+}
+
+/*
+ * Dummy load and store pages to be used when the real metadata is unavailable.
+ * There are separate load and store pages, so that loads always return zeroes
+ * (i.e. look initialized), while stores are discarded and can't affect loads.
+ */
+static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+static unsigned long vmalloc_meta(void *addr, bool is_origin)
+{
+	unsigned long addr64 = (unsigned long)addr, off;
+
+	KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
+	if (kmsan_internal_is_vmalloc_addr(addr)) {
+		off = addr64 - VMALLOC_START;
+		return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
+						KMSAN_VMALLOC_SHADOW_START);
+	}
+	if (kmsan_internal_is_module_addr(addr)) {
+		off = addr64 - MODULES_VADDR;
+		return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
+						KMSAN_MODULES_SHADOW_START);
+	}
+	return 0;
+}
+
+static struct page *virt_to_page_or_null(void *vaddr)
+{
+	if (kmsan_virt_addr_valid(vaddr))
+		return virt_to_page(vaddr);
+	else
+		return NULL;
+}
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
+						     bool store)
+{
+	struct shadow_origin_ptr ret;
+	void *shadow;
+
+	/*
+	 * Even if we redirect this memory access to the dummy page, it will
+	 * go out of bounds.
+	 */
+	KMSAN_WARN_ON(size > PAGE_SIZE);
+
+	if (!kmsan_enabled)
+		goto return_dummy;
+
+	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
+	shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
+	if (!shadow)
+		goto return_dummy;
+
+	ret.shadow = shadow;
+	ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
+	return ret;
+
+return_dummy:
+	if (store) {
+		/* Ignore this store. */
+		ret.shadow = dummy_store_page;
+		ret.origin = dummy_store_page;
+	} else {
+		/* This load will return zero. */
+		ret.shadow = dummy_load_page;
+		ret.origin = dummy_load_page;
+	}
+	return ret;
+}
+
+/*
+ * Obtain the shadow or origin pointer for the given address, or NULL if there
+ * is none. Callers must be prepared to handle a NULL return value.
+ * The return value of this function should not depend on whether we're in the
+ * runtime or not.
+ */
+void *kmsan_get_metadata(void *address, bool is_origin)
+{
+	u64 addr = (u64)address, pad, off;
+	struct page *page;
+
+	if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
+		pad = addr % KMSAN_ORIGIN_SIZE;
+		addr -= pad;
+	}
+	address = (void *)addr;
+	if (kmsan_internal_is_vmalloc_addr(address) ||
+	    kmsan_internal_is_module_addr(address))
+		return (void *)vmalloc_meta(address, is_origin);
+
+	page = virt_to_page_or_null(address);
+	if (!page)
+		return NULL;
+	if (!page_has_metadata(page))
+		return NULL;
+	off = addr % PAGE_SIZE;
+
+	return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
+}
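
Because vmalloc_meta() implements a purely linear mapping, metadata
lookups for vmalloc addresses reduce to constant-offset arithmetic; a
sketch of the invariant for a vmalloc address addr (using the constants
referenced above):

  shadow = KMSAN_VMALLOC_SHADOW_START + (addr - VMALLOC_START);
  origin = KMSAN_VMALLOC_ORIGIN_START + (addr - VMALLOC_START);
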
diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
new file mode 100644
index 0000000000000..b5b0aa61322ec
--- /dev/null
+++ b/scripts/Makefile.kmsan
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+kmsan-cflags := -fsanitize=kernel-memory
+
+ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+kmsan-cflags += -fsanitize-memory-param-retval
+endif
+
+export CFLAGS_KMSAN := $(kmsan-cflags)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index d1425778664b9..46ebf7cb081f6 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -157,6 +157,15 @@ _c_flags += $(if $(patsubst n%,, \
 endif
 endif
 
+ifeq ($(CONFIG_KMSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
+		$(CFLAGS_KMSAN))
+_c_flags += $(if $(patsubst n%,, \
+		$(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
+		, -mllvm -msan-disable-checks=1)
+endif
+
 ifeq ($(CONFIG_UBSAN),y)
 _c_flags += $(if $(patsubst n%,, \
 		$(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (10 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 11:54   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN Alexander Potapenko
                   ` (32 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

EFI stub cannot be linked with KMSAN runtime, so we disable
instrumentation for it.

Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
caused by instrumentation hooks calling instrumented code again.
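
As the hunks below illustrate, a single object file opts out by setting
KMSAN_SANITIZE_<file>.o := n in its Makefile, while KMSAN_SANITIZE := n
covers a whole directory; both knobs are honored by the
scripts/Makefile.lib logic added earlier in the series.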

This patch was previously part of "kmsan: disable KMSAN instrumentation
for certain kernel parts", but was split away per Mark Rutland's
request.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I41ae706bd3474f074f6a870bfc3f0f90e9c720f7
---
 drivers/firmware/efi/libstub/Makefile | 1 +
 kernel/Makefile                       | 1 +
 kernel/locking/Makefile               | 3 ++-
 lib/Makefile                          | 1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e9..81432d0c904b1 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -46,6 +46,7 @@ GCOV_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
+KMSAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
diff --git a/kernel/Makefile b/kernel/Makefile
index a7e1f49ab2b3b..e47f0526c987f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -38,6 +38,7 @@ KCOV_INSTRUMENT_kcov.o := n
 KASAN_SANITIZE_kcov.o := n
 KCSAN_SANITIZE_kcov.o := n
 UBSAN_SANITIZE_kcov.o := n
+KMSAN_SANITIZE_kcov.o := n
 CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector
 
 # Don't instrument error handlers
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f382..ea925731fa40f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,8 +5,9 @@ KCOV_INSTRUMENT		:= n
 
 obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
 
-# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
 KCSAN_SANITIZE_lockdep.o := n
+KMSAN_SANITIZE_lockdep.o := n
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/lib/Makefile b/lib/Makefile
index f99bf61f8bbc6..5056769d00bb6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -272,6 +272,7 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
 CFLAGS_stackdepot.o += -fno-builtin
 obj-$(CONFIG_STACKDEPOT) += stackdepot.o
 KASAN_SANITIZE_stackdepot.o := n
+KMSAN_SANITIZE_stackdepot.o := n
 KCOV_INSTRUMENT_stackdepot.o := n
 
 obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (11 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 12:06   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations Alexander Potapenko
                   ` (31 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Add entry for KMSAN maintainers/reviewers.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
---
 MAINTAINERS | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fe5daf1415013..f56281df30284 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11106,6 +11106,18 @@ F:	kernel/kmod.c
 F:	lib/test_kmod.c
 F:	tools/testing/selftests/kmod/
 
+KMSAN
+M:	Alexander Potapenko <glider@google.com>
+R:	Marco Elver <elver@google.com>
+R:	Dmitry Vyukov <dvyukov@google.com>
+L:	kasan-dev@googlegroups.com
+S:	Maintained
+F:	Documentation/dev-tools/kmsan.rst
+F:	include/linux/kmsan*.h
+F:	lib/Kconfig.kmsan
+F:	mm/kmsan/
+F:	scripts/Makefile.kmsan
+
 KPROBES
 M:	Naveen N. Rao <naveen.n.rao@linux.ibm.com>
 M:	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (12 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 12:20   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code Alexander Potapenko
                   ` (30 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Insert KMSAN hooks that make the necessary bookkeeping changes (a short
lifecycle sketch follows the list):
 - poison page shadow and origins in alloc_pages()/free_page();
 - clear page shadow and origins in clear_page(), copy_user_highpage();
 - copy page metadata in copy_highpage(), wp_page_copy();
 - handle vmap()/vunmap()/iounmap();
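
Taken together, these hooks give page metadata the following lifecycle
(an illustrative sketch assuming CONFIG_KMSAN=y; the function below is
hypothetical and not part of the diff):

  static void page_meta_lifecycle_example(void)
  {
  	struct page *page = alloc_pages(GFP_KERNEL, 0);

  	if (!page)
  		return;
  	/* kmsan_alloc_page() poisoned the page: reads would be reported. */
  	clear_page(page_address(page));
  	/* clear_page() unpoisoned it: contents now count as initialized. */
  	__free_pages(page, 0);
  	/* kmsan_free_page() re-poisoned it: UAF reads will be reported. */
  }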

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move page metadata hooks implementation here
 -- remove call to kmsan_memblock_free_pages()

v3:
 -- use PAGE_SHIFT in kmsan_ioremap_page_range()

v4:
 -- change sizeof(type) to sizeof(*ptr)
 -- replace occurrences of |var| with @var
 -- swap mm: and kmsan: in the subject
 -- drop __no_sanitize_memory from clear_page()

Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
---
 arch/x86/include/asm/page_64.h |  12 ++++
 arch/x86/mm/ioremap.c          |   3 +
 include/linux/highmem.h        |   3 +
 include/linux/kmsan.h          | 123 +++++++++++++++++++++++++++++++++
 mm/internal.h                  |   6 ++
 mm/kmsan/hooks.c               |  87 +++++++++++++++++++++++
 mm/kmsan/shadow.c              | 114 ++++++++++++++++++++++++++++++
 mm/memory.c                    |   2 +
 mm/page_alloc.c                |  11 +++
 mm/vmalloc.c                   |  20 +++++-
 10 files changed, 379 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index baa70451b8df5..227dd33eb4efb 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -45,14 +45,26 @@ void clear_page_orig(void *page);
 void clear_page_rep(void *page);
 void clear_page_erms(void *page);
 
+/* This is an assembly header, avoid including too much of kmsan.h */
+#ifdef CONFIG_KMSAN
+void kmsan_unpoison_memory(const void *addr, size_t size);
+#endif
 static inline void clear_page(void *page)
 {
+#ifdef CONFIG_KMSAN
+	/* alternative_call_2() changes @page. */
+	void *page_copy = page;
+#endif
 	alternative_call_2(clear_page_orig,
 			   clear_page_rep, X86_FEATURE_REP_GOOD,
 			   clear_page_erms, X86_FEATURE_ERMS,
 			   "=D" (page),
 			   "0" (page)
 			   : "cc", "memory", "rax", "rcx");
+#ifdef CONFIG_KMSAN
+	/* Clear KMSAN shadow for the pages that have it. */
+	kmsan_unpoison_memory(page_copy, PAGE_SIZE);
+#endif
 }
 
 void copy_page(void *to, void *from);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 1ad0228f8ceb9..78c5bc654cff5 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
 #include <linux/cc_platform.h>
 #include <linux/efi.h>
 #include <linux/pgtable.h>
+#include <linux/kmsan.h>
 
 #include <asm/set_memory.h>
 #include <asm/e820/api.h>
@@ -479,6 +480,8 @@ void iounmap(volatile void __iomem *addr)
 		return;
 	}
 
+	kmsan_iounmap_page_range((unsigned long)addr,
+		(unsigned long)addr + get_vm_area_size(p));
 	memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));
 
 	/* Finally remove it */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 3af34de54330c..ae82c5aefb018 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
 #include <linux/kernel.h>
 #include <linux/bug.h>
 #include <linux/cacheflush.h>
+#include <linux/kmsan.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
@@ -302,6 +303,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 	vfrom = kmap_local_page(from);
 	vto = kmap_local_page(to);
 	copy_user_page(vto, vfrom, vaddr, to);
+	kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
 	kunmap_local(vto);
 	kunmap_local(vfrom);
 }
@@ -317,6 +319,7 @@ static inline void copy_highpage(struct page *to, struct page *from)
 	vfrom = kmap_local_page(from);
 	vto = kmap_local_page(to);
 	copy_page(vto, vfrom);
+	kmsan_copy_page_meta(to, from);
 	kunmap_local(vto);
 	kunmap_local(vfrom);
 }
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 99e48c6b049d9..699fe4f5b3bee 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -41,6 +41,129 @@ struct kmsan_ctx {
 	bool allow_reporting;
 };
 
+/**
+ * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
+ * @page:  struct page pointer returned by alloc_pages().
+ * @order: order of allocated struct page.
+ * @flags: GFP flags used by alloc_pages().
+ *
+ * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
+ * @flags contain __GFP_ZERO.
+ */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
+
+/**
+ * kmsan_free_page() - Notify KMSAN about a free_pages() call.
+ * @page:  struct page pointer passed to free_pages().
+ * @order: order of deallocated struct page.
+ *
+ * KMSAN marks freed memory as uninitialized.
+ */
+void kmsan_free_page(struct page *page, unsigned int order);
+
+/**
+ * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
+ * @dst: destination page.
+ * @src: source page.
+ *
+ * KMSAN copies the contents of metadata pages for @src into the metadata pages
+ * for @dst. If @dst has no associated metadata pages, nothing happens.
+ * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
+ */
+void kmsan_copy_page_meta(struct page *dst, struct page *src);
+
+/**
+ * kmsan_vmap_pages_range_noflush() - Notify KMSAN about a vmap.
+ * @start:	start of vmapped range.
+ * @end:	end of vmapped range.
+ * @prot:	page protection flags used for vmap.
+ * @pages:	array of pages.
+ * @page_shift:	page_shift passed to vmap_range_noflush().
+ *
+ * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
+ * vmalloc metadata address range.
+ */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+				    pgprot_t prot, struct page **pages,
+				    unsigned int page_shift);
+
+/**
+ * kmsan_vunmap_range_noflush() - Notify KMSAN about a vunmap.
+ * @start: start of vunmapped range.
+ * @end:   end of vunmapped range.
+ *
+ * KMSAN unmaps the contiguous metadata ranges created by
+ * kmsan_vmap_pages_range_noflush().
+ */
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_ioremap_page_range() - Notify KMSAN about an ioremap_page_range() call.
+ * @addr:	range start.
+ * @end:	range end.
+ * @phys_addr:	physical range start.
+ * @prot:	page protection flags used for ioremap_page_range().
+ * @page_shift:	page_shift argument passed to vmap_range_noflush().
+ *
+ * KMSAN creates new metadata pages for the physical pages mapped into the
+ * virtual memory.
+ */
+void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
+			      phys_addr_t phys_addr, pgprot_t prot,
+			      unsigned int page_shift);
+
+/**
+ * kmsan_iounmap_page_range() - Notify KMSAN about an iounmap_page_range() call.
+ * @start: range start.
+ * @end:   range end.
+ *
+ * KMSAN unmaps the metadata pages for the given range and, unlike for
+ * vunmap_page_range(), also deallocates them.
+ */
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
+
+#else
+
+static inline void kmsan_alloc_page(struct page *page, unsigned int order,
+				    gfp_t flags)
+{
+	/* No-op: must match the void prototype used when CONFIG_KMSAN=y. */
+}
+
+static inline void kmsan_free_page(struct page *page, unsigned int order)
+{
+}
+
+static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+}
+
+static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
+						  unsigned long end,
+						  pgprot_t prot,
+						  struct page **pages,
+						  unsigned int page_shift)
+{
+}
+
+static inline void kmsan_vunmap_range_noflush(unsigned long start,
+					      unsigned long end)
+{
+}
+
+static inline void kmsan_ioremap_page_range(unsigned long start,
+					    unsigned long end,
+					    phys_addr_t phys_addr,
+					    pgprot_t prot,
+					    unsigned int page_shift)
+{
+}
+
+static inline void kmsan_iounmap_page_range(unsigned long start,
+					    unsigned long end)
+{
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_H */
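
The mm/internal.h hunk below declares raw __vmap_pages_range_noflush()
and __vunmap_range_noflush() helpers, and the mm/vmalloc.c hunks at the
end of the patch turn the existing vmap_pages_range_noflush() and
vunmap_range_noflush() into thin wrappers that notify KMSAN first. The
runtime itself calls the __-prefixed versions when mapping its own
metadata, so that it does not recurse into its own hooks.
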
diff --git a/mm/internal.h b/mm/internal.h
index c0f8fbe0445b5..dccdba2ac4ecf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -847,8 +847,14 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 }
 #endif
 
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+			       pgprot_t prot, struct page **pages,
+			       unsigned int page_shift);
+
 void vunmap_range_noflush(unsigned long start, unsigned long end);
 
+void __vunmap_range_noflush(unsigned long start, unsigned long end);
+
 int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 		      unsigned long addr, int page_nid, int *flags);
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 4ac62fa67a02a..070756be70e3a 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,93 @@
  * skipping effects of functions like memset() inside instrumented code.
  */
 
+static unsigned long vmalloc_shadow(unsigned long addr)
+{
+	return (unsigned long)kmsan_get_metadata((void *)addr,
+						 KMSAN_META_SHADOW);
+}
+
+static unsigned long vmalloc_origin(unsigned long addr)
+{
+	return (unsigned long)kmsan_get_metadata((void *)addr,
+						 KMSAN_META_ORIGIN);
+}
+
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+	__vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
+	__vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
+	flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+	flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+}
+EXPORT_SYMBOL(kmsan_vunmap_range_noflush);
+
+/*
+ * This function creates new shadow/origin pages for the physical pages mapped
+ * into the virtual memory. If those physical pages already had shadow/origin,
+ * those are ignored.
+ */
+void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
+			      phys_addr_t phys_addr, pgprot_t prot,
+			      unsigned int page_shift)
+{
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+	struct page *shadow, *origin;
+	unsigned long off = 0;
+	int i, nr;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	nr = (end - start) / PAGE_SIZE;
+	kmsan_enter_runtime();
+	for (i = 0; i < nr; i++, off += PAGE_SIZE) {
+		shadow = alloc_pages(gfp_mask, 1);
+		origin = alloc_pages(gfp_mask, 1);
+		__vmap_pages_range_noflush(
+			vmalloc_shadow(start + off),
+			vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
+			PAGE_SHIFT);
+		__vmap_pages_range_noflush(
+			vmalloc_origin(start + off),
+			vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
+			PAGE_SHIFT);
+	}
+	flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+	flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_ioremap_page_range);
+
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
+{
+	unsigned long v_shadow, v_origin;
+	struct page *shadow, *origin;
+	int i, nr;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	nr = (end - start) / PAGE_SIZE;
+	kmsan_enter_runtime();
+	v_shadow = (unsigned long)vmalloc_shadow(start);
+	v_origin = (unsigned long)vmalloc_origin(start);
+	for (i = 0; i < nr; i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
+		shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
+		origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
+		__vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
+		__vunmap_range_noflush(v_origin, vmalloc_origin(end));
+		if (shadow)
+			__free_pages(shadow, 1);
+		if (origin)
+			__free_pages(origin, 1);
+	}
+	flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+	flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_iounmap_page_range);
+
 /* Functions from kmsan-checks.h follow. */
 void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
 {
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index e5ad2972d7362..416cb85487a1a 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -145,3 +145,117 @@ void *kmsan_get_metadata(void *address, bool is_origin)
 
 	return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
 }
+
+void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	if (!dst || !page_has_metadata(dst))
+		return;
+	if (!src || !page_has_metadata(src)) {
+		kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
+					       /*checked*/ false);
+		return;
+	}
+
+	kmsan_enter_runtime();
+	__memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
+	__memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
+	kmsan_leave_runtime();
+}
+
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
+{
+	bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
+	struct page *shadow, *origin;
+	depot_stack_handle_t handle;
+	int pages = 1 << order;
+	int i;
+
+	if (!page)
+		return;
+
+	shadow = shadow_page_for(page);
+	origin = origin_page_for(page);
+
+	if (initialized) {
+		__memset(page_address(shadow), 0, PAGE_SIZE * pages);
+		__memset(page_address(origin), 0, PAGE_SIZE * pages);
+		return;
+	}
+
+	/* Zero pages allocated by the runtime should also be initialized. */
+	if (kmsan_in_runtime())
+		return;
+
+	__memset(page_address(shadow), -1, PAGE_SIZE * pages);
+	kmsan_enter_runtime();
+	handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
+	kmsan_leave_runtime();
+	/*
+	 * Addresses are page-aligned, pages are contiguous, so it's ok
+	 * to just fill the origin pages with @handle.
+	 */
+	for (i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
+		((depot_stack_handle_t *)page_address(origin))[i] = handle;
+}
+
+void kmsan_free_page(struct page *page, unsigned int order)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	kmsan_enter_runtime();
+	kmsan_internal_poison_memory(page_address(page),
+				     PAGE_SIZE << order,
+				     GFP_KERNEL,
+				     KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+	kmsan_leave_runtime();
+}
+
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+				    pgprot_t prot, struct page **pages,
+				    unsigned int page_shift)
+{
+	unsigned long shadow_start, origin_start, shadow_end, origin_end;
+	struct page **s_pages, **o_pages;
+	int nr, i, mapped;
+
+	if (!kmsan_enabled)
+		return;
+
+	shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
+	shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
+	if (!shadow_start)
+		return;
+
+	nr = (end - start) / PAGE_SIZE;
+	s_pages = kcalloc(nr, sizeof(*s_pages), GFP_KERNEL);
+	o_pages = kcalloc(nr, sizeof(*o_pages), GFP_KERNEL);
+	if (!s_pages || !o_pages)
+		goto ret;
+	for (i = 0; i < nr; i++) {
+		s_pages[i] = shadow_page_for(pages[i]);
+		o_pages[i] = origin_page_for(pages[i]);
+	}
+	/* Ignore the caller's prot: metadata is always mapped PAGE_KERNEL. */
+	prot = PAGE_KERNEL;
+
+	origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
+	origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
+	kmsan_enter_runtime();
+	mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
+					    s_pages, page_shift);
+	KMSAN_WARN_ON(mapped);
+	mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
+					    o_pages, page_shift);
+	KMSAN_WARN_ON(mapped);
+	kmsan_leave_runtime();
+	flush_tlb_kernel_range(shadow_start, shadow_end);
+	flush_tlb_kernel_range(origin_start, origin_end);
+	flush_cache_vmap(shadow_start, shadow_end);
+	flush_cache_vmap(origin_start, origin_end);
+
+ret:
+	kfree(s_pages);
+	kfree(o_pages);
+}
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4b..947349399e05c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -52,6 +52,7 @@
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/memremap.h>
+#include <linux/kmsan.h>
 #include <linux/ksm.h>
 #include <linux/rmap.h>
 #include <linux/export.h>
@@ -3120,6 +3121,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			delayacct_wpcopy_end();
 			return 0;
 		}
+		kmsan_copy_page_meta(new_page, old_page);
 	}
 
 	if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485c..785459251145e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -27,6 +27,7 @@
 #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/kasan.h>
+#include <linux/kmsan.h>
 #include <linux/module.h>
 #include <linux/suspend.h>
 #include <linux/pagevec.h>
@@ -1320,6 +1321,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
 	trace_mm_page_free(page, order);
+	kmsan_free_page(page, order);
 
 	if (unlikely(PageHWPoison(page)) && !order) {
 		/*
@@ -3711,6 +3713,14 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 /*
  * Allocate a page from the given zone. Use pcplists for order-0 allocations.
  */
+
+/*
+ * Do not instrument rmqueue() with KMSAN. This function may call
+ * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
+ * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
+ * may call rmqueue() again, which will result in a deadlock.
+ */
+__no_sanitize_memory
 static inline
 struct page *rmqueue(struct zone *preferred_zone,
 			struct zone *zone, unsigned int order,
@@ -5446,6 +5456,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	}
 
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+	kmsan_alloc_page(page, order, alloc_gfp);
 
 	return page;
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index effd1ff6a4b41..6973d7f1ef934 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -320,6 +320,9 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
 				 ioremap_max_page_shift);
 	flush_cache_vmap(addr, end);
+	if (!err)
+		kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+					 ioremap_max_page_shift);
 	return err;
 }
 
@@ -416,7 +419,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
  *
  * This is an internal function only. Do not use outside mm/.
  */
-void vunmap_range_noflush(unsigned long start, unsigned long end)
+void __vunmap_range_noflush(unsigned long start, unsigned long end)
 {
 	unsigned long next;
 	pgd_t *pgd;
@@ -438,6 +441,12 @@ void vunmap_range_noflush(unsigned long start, unsigned long end)
 		arch_sync_kernel_mappings(start, end);
 }
 
+void vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+	kmsan_vunmap_range_noflush(start, end);
+	__vunmap_range_noflush(start, end);
+}
+
 /**
  * vunmap_range - unmap kernel virtual addresses
  * @addr: start of the VM area to unmap
@@ -575,7 +584,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
  *
  * This is an internal function only. Do not use outside mm/.
  */
-int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 		pgprot_t prot, struct page **pages, unsigned int page_shift)
 {
 	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
@@ -601,6 +610,13 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 	return 0;
 }
 
+int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+		pgprot_t prot, struct page **pages, unsigned int page_shift)
+{
+	kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+	return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+}
+
 /**
  * vmap_pages_range - map pages to a kernel virtual address
  * @addr: start of the VM area to map
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (13 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 13:13   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 16/45] kmsan: handle task creation and exiting Alexander Potapenko
                   ` (29 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

In order to report uninitialized memory coming from heap allocations
KMSAN has to poison them unless they're created with __GFP_ZERO.

Conveniently, the KMSAN hooks go in the same places where
init_on_alloc/init_on_free initialization is already performed.

In addition, we apply __no_kmsan_checks to get_freepointer_safe() to
suppress reports when accessing freelist pointers that reside in freed
objects.
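
The resulting behavior can be sketched as follows (a hypothetical
example, not part of the diff):

  static int slab_poison_example(void)
  {
  	int *a = kmalloc(sizeof(*a), GFP_KERNEL);
  	int *b = kzalloc(sizeof(*b), GFP_KERNEL);
  	int ret = 0;

  	if (a && *a)	/* uninit-value: kmsan_slab_alloc() poisoned *a */
  		ret = 1;
  	if (b && *b)	/* no report: __GFP_ZERO marked *b initialized */
  		ret = 2;
  	kfree(a);
  	kfree(b);
  	return ret;
  }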

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move the implementation of SLUB hooks here

v4:
 -- change sizeof(type) to sizeof(*ptr)
 -- swap mm: and kmsan: in the subject
 -- get rid of kmsan_init(), replace it with __no_kmsan_checks

Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
---
 include/linux/kmsan.h | 57 ++++++++++++++++++++++++++++++
 mm/kmsan/hooks.c      | 80 +++++++++++++++++++++++++++++++++++++++++++
 mm/slab.h             |  1 +
 mm/slub.c             | 17 +++++++++
 4 files changed, 155 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 699fe4f5b3bee..fd76cea338878 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -15,6 +15,7 @@
 #include <linux/types.h>
 
 struct page;
+struct kmem_cache;
 
 #ifdef CONFIG_KMSAN
 
@@ -72,6 +73,44 @@ void kmsan_free_page(struct page *page, unsigned int order);
  */
 void kmsan_copy_page_meta(struct page *dst, struct page *src);
 
+/**
+ * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
+ * @s:      slab cache the object belongs to.
+ * @object: object pointer.
+ * @flags:  GFP flags passed to the allocator.
+ *
+ * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
+ * newly created object, marking it as initialized or uninitialized.
+ */
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
+
+/**
+ * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
+ * @s:      slab cache the object belongs to.
+ * @object: object pointer.
+ *
+ * KMSAN marks the freed object as uninitialized.
+ */
+void kmsan_slab_free(struct kmem_cache *s, void *object);
+
+/**
+ * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
+ * @ptr:   object pointer.
+ * @size:  object size.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Similar to kmsan_slab_alloc(), but for large allocations.
+ */
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+
+/**
+ * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
+ * @ptr: object pointer.
+ *
+ * Similar to kmsan_slab_free(), but for large allocations.
+ */
+void kmsan_kfree_large(const void *ptr);
+
 /**
  * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
  * @start:	start of vmapped range.
@@ -138,6 +177,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
 {
 }
 
+static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
+				    gfp_t flags)
+{
+}
+
+static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+}
+
+static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
+				       gfp_t flags)
+{
+}
+
+static inline void kmsan_kfree_large(const void *ptr)
+{
+}
+
 static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
 						  unsigned long end,
 						  pgprot_t prot,
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 070756be70e3a..052e17b7a717d 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,86 @@
  * skipping effects of functions like memset() inside instrumented code.
  */
 
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
+{
+	if (unlikely(object == NULL))
+		return;
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	/*
+	 * There's a ctor or this is an RCU cache - do nothing. The memory
+	 * status hasn't changed since last use.
+	 */
+	if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
+		return;
+
+	kmsan_enter_runtime();
+	if (flags & __GFP_ZERO)
+		kmsan_internal_unpoison_memory(object, s->object_size,
+					       KMSAN_POISON_CHECK);
+	else
+		kmsan_internal_poison_memory(object, s->object_size, flags,
+					     KMSAN_POISON_CHECK);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_alloc);
+
+void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	/* RCU slabs could be legally used after free within the RCU period */
+	if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+		return;
+	/*
+	 * If there's a constructor, freed memory must remain in the same state
+	 * until the next allocation. We cannot save its state to detect
+	 * use-after-free bugs, instead we just keep it unpoisoned.
+	 */
+	if (s->ctor)
+		return;
+	kmsan_enter_runtime();
+	kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+				     KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_free);
+
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+{
+	if (unlikely(ptr == NULL))
+		return;
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	kmsan_enter_runtime();
+	if (flags & __GFP_ZERO)
+		kmsan_internal_unpoison_memory((void *)ptr, size,
+					       /*checked*/ true);
+	else
+		kmsan_internal_poison_memory((void *)ptr, size, flags,
+					     KMSAN_POISON_CHECK);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kmalloc_large);
+
+void kmsan_kfree_large(const void *ptr)
+{
+	struct page *page;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	kmsan_enter_runtime();
+	page = virt_to_head_page((void *)ptr);
+	KMSAN_WARN_ON(ptr != page_address(page));
+	kmsan_internal_poison_memory((void *)ptr,
+				     PAGE_SIZE << compound_order(page),
+				     GFP_KERNEL,
+				     KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kfree_large);
+
 static unsigned long vmalloc_shadow(unsigned long addr)
 {
 	return (unsigned long)kmsan_get_metadata((void *)addr,
diff --git a/mm/slab.h b/mm/slab.h
index db9fb5c8dae73..d0de8195873d8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -752,6 +752,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 			memset(p[i], 0, s->object_size);
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, flags);
+		kmsan_slab_alloc(s, p[i], flags);
 	}
 
 	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slub.c b/mm/slub.c
index b1281b8654bd3..b8b601f165087 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -22,6 +22,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/kasan.h>
+#include <linux/kmsan.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
 #include <linux/mempolicy.h>
@@ -359,6 +360,17 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
 	prefetchw(object + s->offset);
 }
 
+/*
+ * When running under KMSAN, get_freepointer_safe() may return an uninitialized
+ * pointer value in the case the current thread loses the race for the next
+ * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
+ * slab_alloc_node() will fail, so the uninitialized value won't be used, but
+ * KMSAN will still check all arguments of cmpxchg because of imperfect
+ * handling of inline assembly.
+ * To work around this problem, we apply __no_kmsan_checks to ensure that
+ * get_freepointer_safe() returns initialized memory.
+ */
+__no_kmsan_checks
 static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
 {
 	unsigned long freepointer_addr;
@@ -1709,6 +1721,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
 	ptr = kasan_kmalloc_large(ptr, size, flags);
 	/* As ptr might get tagged, call kmemleak hook after KASAN. */
 	kmemleak_alloc(ptr, size, 1, flags);
+	kmsan_kmalloc_large(ptr, size, flags);
 	return ptr;
 }
 
@@ -1716,12 +1729,14 @@ static __always_inline void kfree_hook(void *x)
 {
 	kmemleak_free(x);
 	kasan_kfree_large(x);
+	kmsan_kfree_large(x);
 }
 
 static __always_inline bool slab_free_hook(struct kmem_cache *s,
 						void *x, bool init)
 {
 	kmemleak_free_recursive(x, s->flags);
+	kmsan_slab_free(s, x);
 
 	debug_check_no_locks_freed(x, s->object_size);
 
@@ -5939,6 +5954,7 @@ static char *create_unique_id(struct kmem_cache *s)
 	p += sprintf(p, "%07u", s->size);
 
 	BUG_ON(p > name + ID_STR_LENGTH - 1);
+	kmsan_unpoison_memory(name, p - name);
 	return name;
 }
 
@@ -6040,6 +6056,7 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
 	al->name = name;
 	al->next = alias_list;
 	alias_list = al;
+	kmsan_unpoison_memory(al, sizeof(*al));
 	return 0;
 }
 
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 16/45] kmsan: handle task creation and exiting
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (14 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 13:17   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines Alexander Potapenko
                   ` (28 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Tell KMSAN that a new task is created, so the tool creates a backing
metadata structure for that task.
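
The exit half is symmetric: kmsan_task_exit() clears the allow_reporting
flag in the task's kmsan_ctx, so that kmsan_report() stays silent for a
task whose context is about to be torn down.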

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move implementation of kmsan_task_create() and kmsan_task_exit() here

v4:
 -- change sizeof(type) to sizeof(*ptr)

Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
---
 include/linux/kmsan.h | 17 +++++++++++++++++
 kernel/exit.c         |  2 ++
 kernel/fork.c         |  2 ++
 mm/kmsan/core.c       | 10 ++++++++++
 mm/kmsan/hooks.c      | 19 +++++++++++++++++++
 mm/kmsan/kmsan.h      |  2 ++
 6 files changed, 52 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index fd76cea338878..b71e2032222e9 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -16,6 +16,7 @@
 
 struct page;
 struct kmem_cache;
+struct task_struct;
 
 #ifdef CONFIG_KMSAN
 
@@ -42,6 +43,14 @@ struct kmsan_ctx {
 	bool allow_reporting;
 };
 
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
 /**
  * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
  * @page:  struct page pointer returned by alloc_pages().
@@ -163,6 +172,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
 #else
 
+static inline void kmsan_task_create(struct task_struct *task)
+{
+}
+
+static inline void kmsan_task_exit(struct task_struct *task)
+{
+}
+
 static inline void kmsan_alloc_page(struct page *page, unsigned int order,
 				    gfp_t flags)
 {
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7f..1784b7a741ddd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -60,6 +60,7 @@
 #include <linux/writeback.h>
 #include <linux/shm.h>
 #include <linux/kcov.h>
+#include <linux/kmsan.h>
 #include <linux/random.h>
 #include <linux/rcuwait.h>
 #include <linux/compat.h>
@@ -741,6 +742,7 @@ void __noreturn do_exit(long code)
 	WARN_ON(tsk->plug);
 
 	kcov_task_exit(tsk);
+	kmsan_task_exit(tsk);
 
 	coredump_task_exit(tsk);
 	ptrace_event(PTRACE_EVENT_EXIT, code);
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c696..6dfca6f00ec82 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
 #include <linux/fdtable.h>
 #include <linux/iocontext.h>
 #include <linux/key.h>
+#include <linux/kmsan.h>
 #include <linux/binfmts.h>
 #include <linux/mman.h>
 #include <linux/mmu_notifier.h>
@@ -1026,6 +1027,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->worker_private = NULL;
 
 	kcov_task_init(tsk);
+	kmsan_task_create(tsk);
 	kmap_local_fork(tsk);
 
 #ifdef CONFIG_FAULT_INJECTION
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 16fb8880a9c6d..7eabed03ed10b 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -44,6 +44,16 @@ bool kmsan_enabled __read_mostly;
  */
 DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
 
+void kmsan_internal_task_create(struct task_struct *task)
+{
+	struct kmsan_ctx *ctx = &task->kmsan_ctx;
+	struct thread_info *info = current_thread_info();
+
+	__memset(ctx, 0, sizeof(*ctx));
+	ctx->allow_reporting = true;
+	kmsan_internal_unpoison_memory(info, sizeof(*info), false);
+}
+
 void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
 				  unsigned int poison_flags)
 {
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 052e17b7a717d..43a529569053d 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,25 @@
  * skipping effects of functions like memset() inside instrumented code.
  */
 
+void kmsan_task_create(struct task_struct *task)
+{
+	kmsan_enter_runtime();
+	kmsan_internal_task_create(task);
+	kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_task_create);
+
+void kmsan_task_exit(struct task_struct *task)
+{
+	struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+
+	ctx->allow_reporting = false;
+}
+EXPORT_SYMBOL(kmsan_task_exit);
+
 void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
 {
 	if (unlikely(object == NULL))
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index d3c400ca097ba..c7fb8666607e2 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -179,6 +179,8 @@ void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
 				      u32 origin, bool checked);
 depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
 
+void kmsan_internal_task_create(struct task_struct *task);
+
 bool kmsan_metadata_is_contiguous(void *addr, size_t size);
 void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
 				 int reason);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (15 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 16/45] kmsan: handle task creation and exiting Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:05   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 18/45] instrumented.h: add KMSAN support Alexander Potapenko
                   ` (27 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

kmsan_init_shadow() scans the mappings created at boot time and creates
metadata pages for those mappings.

When the memblock allocator returns pages to pagealloc, we reserve 2/3
of those pages and use them as metadata for the remaining 1/3. Once KMSAN
starts, every page allocated by pagealloc has its associated shadow and
origin pages.

kmsan_initialize() initializes the bookkeeping for init_task and enables
KMSAN.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move mm/kmsan/init.c and kmsan_memblock_free_pages() to this patch
 -- print a warning that KMSAN is a debugging tool (per Greg K-H's
    request)

v4:
 -- change sizeof(type) to sizeof(*ptr)
 -- replace occurrences of |var| with @var
 -- swap init: and kmsan: in the subject
 -- do not export __init functions

Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
---
 include/linux/kmsan.h |  48 +++++++++
 init/main.c           |   3 +
 mm/kmsan/Makefile     |   3 +-
 mm/kmsan/init.c       | 238 ++++++++++++++++++++++++++++++++++++++++++
 mm/kmsan/kmsan.h      |   3 +
 mm/kmsan/shadow.c     |  36 +++++++
 mm/page_alloc.c       |   3 +
 7 files changed, 333 insertions(+), 1 deletion(-)
 create mode 100644 mm/kmsan/init.c

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index b71e2032222e9..82fd564cc72e7 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -51,6 +51,40 @@ void kmsan_task_create(struct task_struct *task);
  */
 void kmsan_task_exit(struct task_struct *task);
 
+/**
+ * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
+ *
+ * Allocate and initialize KMSAN metadata for early allocations.
+ */
+void __init kmsan_init_shadow(void);
+
+/**
+ * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
+ */
+void __init kmsan_init_runtime(void);
+
+/**
+ * kmsan_memblock_free_pages() - handle freeing of memblock pages.
+ * @page:	struct page to free.
+ * @order:	order of @page.
+ *
+ * Freed pages are either returned to buddy allocator or held back to be used
+ * as metadata pages.
+ */
+bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
+
+/**
+ * kmsan_task_create() - Initialize KMSAN state for the task.
+ * @task: task to initialize.
+ */
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
 /**
  * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
  * @page:  struct page pointer returned by alloc_pages().
@@ -172,6 +206,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
 #else
 
+static inline void kmsan_init_shadow(void)
+{
+}
+
+static inline void kmsan_init_runtime(void)
+{
+}
+
+static inline bool kmsan_memblock_free_pages(struct page *page,
+					     unsigned int order)
+{
+	return true;
+}
+
 static inline void kmsan_task_create(struct task_struct *task)
 {
 }
diff --git a/init/main.c b/init/main.c
index 0ee39cdcfcac9..7ba48a9ff1d53 100644
--- a/init/main.c
+++ b/init/main.c
@@ -34,6 +34,7 @@
 #include <linux/percpu.h>
 #include <linux/kmod.h>
 #include <linux/kprobes.h>
+#include <linux/kmsan.h>
 #include <linux/vmalloc.h>
 #include <linux/kernel_stat.h>
 #include <linux/start_kernel.h>
@@ -835,6 +836,7 @@ static void __init mm_init(void)
 	init_mem_debugging_and_hardening();
 	kfence_alloc_pool();
 	report_meminit();
+	kmsan_init_shadow();
 	stack_depot_early_init();
 	mem_init();
 	mem_init_print_info();
@@ -852,6 +854,7 @@ static void __init mm_init(void)
 	init_espfix_bsp();
 	/* Should be run after espfix64 is set up. */
 	pti_init();
+	kmsan_init_runtime();
 }
 
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 550ad8625e4f9..401acb1a491ce 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -3,7 +3,7 @@
 # Makefile for KernelMemorySanitizer (KMSAN).
 #
 #
-obj-y := core.o instrumentation.o hooks.o report.o shadow.o
+obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o
 
 KMSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
@@ -18,6 +18,7 @@ CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
 
 CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
new file mode 100644
index 0000000000000..abbf595a1e359
--- /dev/null
+++ b/mm/kmsan/init.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN initialization routines.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include "kmsan.h"
+
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+
+#include "../internal.h"
+
+#define NUM_FUTURE_RANGES 128
+struct start_end_pair {
+	u64 start, end;
+};
+
+static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
+static int future_index __initdata;
+
+/*
+ * Record a range of memory for which the metadata pages will be created once
+ * the page allocator becomes available.
+ */
+static void __init kmsan_record_future_shadow_range(void *start, void *end)
+{
+	u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
+	bool merged = false;
+	int i;
+
+	KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
+	KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+	nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
+	nend = ALIGN(nend, PAGE_SIZE);
+
+	/*
+	 * Scan the existing ranges to see if any of them overlaps with
+	 * [start, end). In that case, merge the two ranges instead of
+	 * creating a new one.
+	 * The number of ranges is less than 20, so there is no need to organize
+	 * them into a more intelligent data structure.
+	 */
+	for (i = 0; i < future_index; i++) {
+		cstart = start_end_pairs[i].start;
+		cend = start_end_pairs[i].end;
+		if ((cstart < nstart && cend < nstart) ||
+		    (cstart > nend && cend > nend))
+			/* ranges are disjoint - do not merge */
+			continue;
+		start_end_pairs[i].start = min(nstart, cstart);
+		start_end_pairs[i].end = max(nend, cend);
+		merged = true;
+		break;
+	}
+	if (merged)
+		return;
+	start_end_pairs[future_index].start = nstart;
+	start_end_pairs[future_index].end = nend;
+	future_index++;
+}
+
+/*
+ * Initialize the shadow for existing mappings during kernel initialization.
+ * These include kernel text/data sections, NODE_DATA and future ranges
+ * registered while creating other data (e.g. percpu).
+ *
+ * Allocations via memblock can be only done before slab is initialized.
+ */
+void __init kmsan_init_shadow(void)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	phys_addr_t p_start, p_end;
+	int nid;
+	u64 i;
+
+	for_each_reserved_mem_range(i, &p_start, &p_end)
+		kmsan_record_future_shadow_range(phys_to_virt(p_start),
+						 phys_to_virt(p_end));
+	/* Allocate shadow for .data */
+	kmsan_record_future_shadow_range(_sdata, _edata);
+
+	for_each_online_node(nid)
+		kmsan_record_future_shadow_range(
+			NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
+
+	for (i = 0; i < future_index; i++)
+		kmsan_init_alloc_meta_for_range(
+			(void *)start_end_pairs[i].start,
+			(void *)start_end_pairs[i].end);
+}
+
+struct page_pair {
+	struct page *shadow, *origin;
+};
+static struct page_pair held_back[MAX_ORDER] __initdata;
+
+/*
+ * Eager metadata allocation. When the memblock allocator is freeing pages to
+ * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
+ * We store the pointers to the returned blocks of pages in held_back[] grouped
+ * by their order: when kmsan_memblock_free_pages() is called for the first
+ * time with a certain order, it is reserved as a shadow block, for the second
+ * time - as an origin block. On the third time the incoming block receives its
+ * shadow and origin ranges from the previously saved shadow and origin blocks,
+ * after which held_back[order] can be used again.
+ *
+ * At the very end there may be leftover blocks in held_back[]. They are
+ * collected later by kmsan_memblock_discard().
+ */
+bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
+{
+	struct page *shadow, *origin;
+
+	if (!held_back[order].shadow) {
+		held_back[order].shadow = page;
+		return false;
+	}
+	if (!held_back[order].origin) {
+		held_back[order].origin = page;
+		return false;
+	}
+	shadow = held_back[order].shadow;
+	origin = held_back[order].origin;
+	kmsan_setup_meta(page, shadow, origin, order);
+
+	held_back[order].shadow = NULL;
+	held_back[order].origin = NULL;
+	return true;
+}
+
+#define MAX_BLOCKS 8
+struct smallstack {
+	struct page *items[MAX_BLOCKS];
+	int index;
+	int order;
+};
+
+static struct smallstack collect = {
+	.index = 0,
+	.order = MAX_ORDER,
+};
+
+static void smallstack_push(struct smallstack *stack, struct page *pages)
+{
+	KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
+	stack->items[stack->index] = pages;
+	stack->index++;
+}
+#undef MAX_BLOCKS
+
+static struct page *smallstack_pop(struct smallstack *stack)
+{
+	struct page *ret;
+
+	KMSAN_WARN_ON(stack->index == 0);
+	stack->index--;
+	ret = stack->items[stack->index];
+	stack->items[stack->index] = NULL;
+	return ret;
+}
+
+static void do_collection(void)
+{
+	struct page *page, *shadow, *origin;
+
+	while (collect.index >= 3) {
+		page = smallstack_pop(&collect);
+		shadow = smallstack_pop(&collect);
+		origin = smallstack_pop(&collect);
+		kmsan_setup_meta(page, shadow, origin, collect.order);
+		__free_pages_core(page, collect.order);
+	}
+}
+
+static void collect_split(void)
+{
+	struct smallstack tmp = {
+		.order = collect.order - 1,
+		.index = 0,
+	};
+	struct page *page;
+
+	if (!collect.order)
+		return;
+	while (collect.index) {
+		page = smallstack_pop(&collect);
+		smallstack_push(&tmp, &page[0]);
+		smallstack_push(&tmp, &page[1 << tmp.order]);
+	}
+	__memcpy(&collect, &tmp, sizeof(tmp));
+}
+
+/*
+ * Memblock is about to go away. Split the page blocks left over in held_back[]
+ * and return 1/3 of that memory to the system.
+ */
+static void kmsan_memblock_discard(void)
+{
+	int i;
+
+	/*
+	 * For each order=N:
+	 *  - push held_back[N].shadow and .origin to @collect;
+	 *  - while there are >= 3 elements in @collect, do garbage collection:
+	 *    - pop 3 ranges from @collect;
+	 *    - use two of them as shadow and origin for the third one;
+	 *    - repeat;
+	 *  - split each remaining element from @collect into 2 ranges of
+	 *    order=N-1,
+	 *  - repeat.
+	 */
+	collect.order = MAX_ORDER - 1;
+	for (i = MAX_ORDER - 1; i >= 0; i--) {
+		if (held_back[i].shadow)
+			smallstack_push(&collect, held_back[i].shadow);
+		if (held_back[i].origin)
+			smallstack_push(&collect, held_back[i].origin);
+		held_back[i].shadow = NULL;
+		held_back[i].origin = NULL;
+		do_collection();
+		collect_split();
+	}
+}
+
+void __init kmsan_init_runtime(void)
+{
+	/* Assuming current is init_task */
+	kmsan_internal_task_create(current);
+	kmsan_memblock_discard();
+	pr_info("Starting KernelMemorySanitizer\n");
+	pr_info("ATTENTION: KMSAN is a debugging tool! Do not use it on production machines!\n");
+	kmsan_enabled = true;
+}
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index c7fb8666607e2..2f17912ef863f 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -66,6 +66,7 @@ struct shadow_origin_ptr {
 struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
 						     bool store);
 void *kmsan_get_metadata(void *addr, bool is_origin);
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end);
 
 enum kmsan_bug_reason {
 	REASON_ANY,
@@ -188,5 +189,7 @@ bool kmsan_internal_is_module_addr(void *vaddr);
 bool kmsan_internal_is_vmalloc_addr(void *addr);
 
 struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+		      struct page *origin, int order);
 
 #endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 416cb85487a1a..7b254c30d42cc 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -259,3 +259,39 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
 	kfree(s_pages);
 	kfree(o_pages);
 }
+
+/* Allocate metadata for pages allocated at boot time. */
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
+{
+	struct page *shadow_p, *origin_p;
+	void *shadow, *origin;
+	struct page *page;
+	u64 addr, size;
+
+	start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
+	size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
+	shadow = memblock_alloc(size, PAGE_SIZE);
+	origin = memblock_alloc(size, PAGE_SIZE);
+	for (addr = 0; addr < size; addr += PAGE_SIZE) {
+		page = virt_to_page_or_null((char *)start + addr);
+		shadow_p = virt_to_page_or_null((char *)shadow + addr);
+		set_no_shadow_origin_page(shadow_p);
+		shadow_page_for(page) = shadow_p;
+		origin_p = virt_to_page_or_null((char *)origin + addr);
+		set_no_shadow_origin_page(origin_p);
+		origin_page_for(page) = origin_p;
+	}
+}
+
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+		      struct page *origin, int order)
+{
+	int i;
+
+	for (i = 0; i < (1 << order); i++) {
+		set_no_shadow_origin_page(&shadow[i]);
+		set_no_shadow_origin_page(&origin[i]);
+		shadow_page_for(&page[i]) = &shadow[i];
+		origin_page_for(&page[i]) = &origin[i];
+	}
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 785459251145e..e8d5a0b2a3264 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1731,6 +1731,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
 {
 	if (early_page_uninitialised(pfn))
 		return;
+	if (!kmsan_memblock_free_pages(page, order))
+		/* KMSAN will take care of these pages. */
+		return;
 	__free_pages_core(page, order);
 }
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 18/45] instrumented.h: add KMSAN support
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (16 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 13:51   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu() Alexander Potapenko
                   ` (26 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

To avoid false positives, KMSAN needs to unpoison the data copied from
userspace. To detect infoleaks, it checks the memory buffer passed to
copy_to_user().
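
As an illustration, this is the kind of infoleak the check is meant to
catch (a hypothetical snippet, not part of this patch; the struct and
function names are invented):

  /* Hypothetical example: struct padding escaping to userspace. */
  struct stats {
  	u32 used;	/* written below */
  	u32 pad;	/* never written, stays poisoned */
  };

  static long get_stats(struct stats __user *out)
  {
  	struct stats s;

  	s.used = 42;
  	/*
  	 * instrument_copy_to_user() -> kmsan_copy_to_user() reports the
  	 * four uninitialized bytes of s.pad escaping to userspace.
  	 */
  	return copy_to_user(out, &s, sizeof(s)) ? -EFAULT : 0;
  }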

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move implementation of kmsan_copy_to_user() here

Link: https://linux-review.googlesource.com/id/I43e93b9c02709e6be8d222342f1b044ac8bdbaaf
---
 include/linux/instrumented.h |  5 ++++-
 include/linux/kmsan-checks.h | 19 ++++++++++++++++++
 mm/kmsan/hooks.c             | 38 ++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index ee8f7d17d34f5..c73c1b19e9227 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -2,7 +2,7 @@
 
 /*
  * This header provides generic wrappers for memory access instrumentation that
- * the compiler cannot emit for: KASAN, KCSAN.
+ * the compiler cannot emit for: KASAN, KCSAN, KMSAN.
  */
 #ifndef _LINUX_INSTRUMENTED_H
 #define _LINUX_INSTRUMENTED_H
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>
 #include <linux/kasan-checks.h>
 #include <linux/kcsan-checks.h>
+#include <linux/kmsan-checks.h>
 #include <linux/types.h>
 
 /**
@@ -117,6 +118,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
 {
 	kasan_check_read(from, n);
 	kcsan_check_read(from, n);
+	kmsan_copy_to_user(to, from, n, 0);
 }
 
 /**
@@ -151,6 +153,7 @@ static __always_inline void
 instrument_copy_from_user_after(const void *to, const void __user *from,
 				unsigned long n, unsigned long left)
 {
+	kmsan_unpoison_memory(to, n - left);
 }
 
 #endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index a6522a0c28df9..c4cae333deec5 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -46,6 +46,21 @@ void kmsan_unpoison_memory(const void *address, size_t size);
  */
 void kmsan_check_memory(const void *address, size_t size);
 
+/**
+ * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace.
+ * @to:      destination address in the userspace.
+ * @from:    source address in the kernel.
+ * @to_copy: number of bytes to copy.
+ * @left:    number of bytes not copied.
+ *
+ * If this is a real userspace data transfer, KMSAN checks the bytes that were
+ * actually copied to ensure there was no information leak. If @to belongs to
+ * the kernel space (which is possible for compat syscalls), KMSAN just copies
+ * the metadata.
+ */
+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+			size_t left);
+
 #else
 
 static inline void kmsan_poison_memory(const void *address, size_t size,
@@ -58,6 +73,10 @@ static inline void kmsan_unpoison_memory(const void *address, size_t size)
 static inline void kmsan_check_memory(const void *address, size_t size)
 {
 }
+static inline void kmsan_copy_to_user(void __user *to, const void *from,
+				      size_t to_copy, size_t left)
+{
+}
 
 #endif
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 43a529569053d..1cdb4420977f1 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -212,6 +212,44 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
 }
 EXPORT_SYMBOL(kmsan_iounmap_page_range);
 
+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+			size_t left)
+{
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled || kmsan_in_runtime())
+		return;
+	/*
+	 * At this point we've copied the memory already. It's hard to check it
+	 * before copying, as the size of actually copied buffer is unknown.
+	 */
+
+	/* copy_to_user() may copy zero bytes. No need to check. */
+	if (!to_copy)
+		return;
+	/* Or maybe copy_to_user() failed to copy anything. */
+	if (to_copy <= left)
+		return;
+
+	ua_flags = user_access_save();
+	if ((u64)to < TASK_SIZE) {
+		/* This is a user memory access, check it. */
+		kmsan_internal_check_memory((void *)from, to_copy - left, to,
+					    REASON_COPY_TO_USER);
+		user_access_restore(ua_flags);
+		return;
+	}
+	/* Otherwise this is a kernel memory access. This happens when a compat
+	 * syscall passes an argument allocated on the kernel stack to a real
+	 * syscall.
+	 * Don't check anything, just copy the shadow of the copied bytes.
+	 */
+	kmsan_internal_memmove_metadata((void *)to, (void *)from,
+					to_copy - left);
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_copy_to_user);
+
 /* Functions from kmsan-checks.h follow. */
 void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (17 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 18/45] instrumented.h: add KMSAN support Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-13  9:28   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 20/45] kmsan: add iomap support Alexander Potapenko
                   ` (25 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

This is a hack to reduce stackdepot pressure.

struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
int value. The remaining 25 bits remain uninitialized and are never used,
but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
thus creating very long origin chains. This is technically correct, but
consumes too much memory.

Unpoisoning the whole structure will prevent creating such chains.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I76abee411b8323acfdbc29bc3a60dca8cff2de77
---
 mm/mmu_gather.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index a71924bd38c0d..add4244e5790d 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -1,6 +1,7 @@
 #include <linux/gfp.h>
 #include <linux/highmem.h>
 #include <linux/kernel.h>
+#include <linux/kmsan-checks.h>
 #include <linux/mmdebug.h>
 #include <linux/mm_types.h>
 #include <linux/mm_inline.h>
@@ -265,6 +266,15 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
 static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 			     bool fullmm)
 {
+	/*
+	 * struct mmu_gather contains 7 1-bit fields packed into a 32-bit
+	 * unsigned int value. The remaining 25 bits remain uninitialized
+	 * and are never used, but KMSAN updates the origin for them in
+	 * zap_pXX_range() in mm/memory.c, thus creating very long origin
+	 * chains. This is technically correct, but consumes too much memory.
+	 * Unpoisoning the whole structure will prevent creating such chains.
+	 */
+	kmsan_unpoison_memory(tlb, sizeof(*tlb));
 	tlb->mm = mm;
 	tlb->fullmm = fullmm;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 20/45] kmsan: add iomap support
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (18 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu() Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 21/45] Input: libps2: mark data received in __ps2_command() as initialized Alexander Potapenko
                   ` (24 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Functions from lib/iomap.c interact with hardware, so KMSAN must ensure
that:
 - every read function returns an initialized value
 - every write function checks values before sending them to hardware.
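
For example (a hypothetical snippet; the register offsets are invented
for illustration):

  /* Hypothetical driver code showing both directions. */
  static void demo(void __iomem *regs)
  {
  	u32 status = ioread32(regs + 0x00);	/* unpoisoned on return */
  	u32 cmd;				/* never initialized */

  	if (status & 1)		/* no report: @status is initialized */
  		iowrite32(cmd, regs + 0x04);	/* reported: @cmd is not */
  }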

Signed-off-by: Alexander Potapenko <glider@google.com>

---

v4:
  -- switch from __no_sanitize_memory (which now means "no KMSAN
     instrumentation") to __no_kmsan_checks (i.e. "unpoison everything")

Link: https://linux-review.googlesource.com/id/I45527599f09090aca046dfe1a26df453adab100d
---
 lib/iomap.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/lib/iomap.c b/lib/iomap.c
index fbaa3e8f19d6c..4f8b31baa5752 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -6,6 +6,7 @@
  */
 #include <linux/pci.h>
 #include <linux/io.h>
+#include <linux/kmsan-checks.h>
 
 #include <linux/export.h>
 
@@ -70,26 +71,35 @@ static void bad_io_access(unsigned long port, const char *access)
 #define mmio_read64be(addr) swab64(readq(addr))
 #endif
 
+/*
+ * Here and below, we apply __no_kmsan_checks to functions reading data from
+ * hardware, to ensure that KMSAN marks their return values as initialized.
+ */
+__no_kmsan_checks
 unsigned int ioread8(const void __iomem *addr)
 {
 	IO_COND(addr, return inb(port), return readb(addr));
 	return 0xff;
 }
+__no_kmsan_checks
 unsigned int ioread16(const void __iomem *addr)
 {
 	IO_COND(addr, return inw(port), return readw(addr));
 	return 0xffff;
 }
+__no_kmsan_checks
 unsigned int ioread16be(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
 	return 0xffff;
 }
+__no_kmsan_checks
 unsigned int ioread32(const void __iomem *addr)
 {
 	IO_COND(addr, return inl(port), return readl(addr));
 	return 0xffffffff;
 }
+__no_kmsan_checks
 unsigned int ioread32be(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
@@ -142,18 +152,21 @@ static u64 pio_read64be_hi_lo(unsigned long port)
 	return lo | (hi << 32);
 }
 
+__no_kmsan_checks
 u64 ioread64_lo_hi(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
 	return 0xffffffffffffffffULL;
 }
 
+__no_kmsan_checks
 u64 ioread64_hi_lo(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
 	return 0xffffffffffffffffULL;
 }
 
+__no_kmsan_checks
 u64 ioread64be_lo_hi(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64be_lo_hi(port),
@@ -161,6 +174,7 @@ u64 ioread64be_lo_hi(const void __iomem *addr)
 	return 0xffffffffffffffffULL;
 }
 
+__no_kmsan_checks
 u64 ioread64be_hi_lo(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64be_hi_lo(port),
@@ -188,22 +202,32 @@ EXPORT_SYMBOL(ioread64be_hi_lo);
 
 void iowrite8(u8 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, outb(val,port), writeb(val, addr));
 }
 void iowrite16(u16 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, outw(val,port), writew(val, addr));
 }
 void iowrite16be(u16 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
 }
 void iowrite32(u32 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, outl(val,port), writel(val, addr));
 }
 void iowrite32be(u32 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
 }
 EXPORT_SYMBOL(iowrite8);
@@ -239,24 +263,32 @@ static void pio_write64be_hi_lo(u64 val, unsigned long port)
 
 void iowrite64_lo_hi(u64 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write64_lo_hi(val, port),
 		writeq(val, addr));
 }
 
 void iowrite64_hi_lo(u64 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write64_hi_lo(val, port),
 		writeq(val, addr));
 }
 
 void iowrite64be_lo_hi(u64 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write64be_lo_hi(val, port),
 		mmio_write64be(val, addr));
 }
 
 void iowrite64be_hi_lo(u64 val, void __iomem *addr)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(&val, sizeof(val));
 	IO_COND(addr, pio_write64be_hi_lo(val, port),
 		mmio_write64be(val, addr));
 }
@@ -328,14 +360,20 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
 void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+	/* KMSAN must treat values read from devices as initialized. */
+	kmsan_unpoison_memory(dst, count);
 }
 void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+	/* KMSAN must treat values read from devices as initialized. */
+	kmsan_unpoison_memory(dst, count * 2);
 }
 void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+	/* KMSAN must treat values read from devices as initialized. */
+	kmsan_unpoison_memory(dst, count * 4);
 }
 EXPORT_SYMBOL(ioread8_rep);
 EXPORT_SYMBOL(ioread16_rep);
@@ -343,14 +381,20 @@ EXPORT_SYMBOL(ioread32_rep);
 
 void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(src, count);
 	IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
 }
 void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(src, count * 2);
 	IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
 }
 void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
 {
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(src, count * 4);
 	IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
 }
 EXPORT_SYMBOL(iowrite8_rep);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 21/45] Input: libps2: mark data received in __ps2_command() as initialized
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (19 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 20/45] kmsan: add iomap support Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 22/45] dma: kmsan: unpoison DMA mappings Alexander Potapenko
                   ` (23 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN does not know that the device initializes certain bytes in
ps2dev->cmdbuf. Call kmsan_unpoison_memory() to explicitly mark them as
initialized.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I2d26f6baa45271d37320d3f4a528c39cb7e545f0
---
 drivers/input/serio/libps2.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 250e213cc80c6..3e19344eda93c 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -12,6 +12,7 @@
 #include <linux/sched.h>
 #include <linux/interrupt.h>
 #include <linux/input.h>
+#include <linux/kmsan-checks.h>
 #include <linux/serio.h>
 #include <linux/i8042.h>
 #include <linux/libps2.h>
@@ -294,9 +295,11 @@ int __ps2_command(struct ps2dev *ps2dev, u8 *param, unsigned int command)
 
 	serio_pause_rx(ps2dev->serio);
 
-	if (param)
+	if (param) {
 		for (i = 0; i < receive; i++)
 			param[i] = ps2dev->cmdbuf[(receive - 1) - i];
+		kmsan_unpoison_memory(param, receive);
+	}
 
 	if (ps2dev->cmdcnt &&
 	    (command != PS2_CMD_RESET_BAT || ps2dev->cmdcnt != 1)) {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 22/45] dma: kmsan: unpoison DMA mappings
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (20 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 21/45] Input: libps2: mark data received in __ps2_command() as initialized Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 23/45] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg() Alexander Potapenko
                   ` (22 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN doesn't know about DMA memory writes performed by devices.
We unpoison such memory when it's mapped to avoid false positive
reports.
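
For example (a hypothetical receive path; the names are invented),
without this hook every later read of a DMA_FROM_DEVICE buffer would be
flagged:

  /* Hypothetical example of mapping a device-written buffer. */
  static dma_addr_t demo_map_rx(struct device *dev, void *buf, size_t len)
  {
  	/*
  	 * dma_map_single() ends up in dma_map_page_attrs(), which now
  	 * calls kmsan_handle_dma(): for DMA_FROM_DEVICE the buffer is
  	 * unpoisoned, so reading the data the device writes into it
  	 * does not produce false "uninit-value" reports.
  	 */
  	return dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
  }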

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- move implementation of kmsan_handle_dma() and kmsan_handle_dma_sg() here

v4:
 -- swap dma: and kmsan: in the subject

Link: https://linux-review.googlesource.com/id/Ia162dc4c5a92e74d4686c1be32a4dfeffc5c32cd
---
 include/linux/kmsan.h | 41 +++++++++++++++++++++++++++++
 kernel/dma/mapping.c  |  9 ++++---
 mm/kmsan/hooks.c      | 61 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 82fd564cc72e7..55fe673ee1e84 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -9,6 +9,7 @@
 #ifndef _LINUX_KMSAN_H
 #define _LINUX_KMSAN_H
 
+#include <linux/dma-direction.h>
 #include <linux/gfp.h>
 #include <linux/kmsan-checks.h>
 #include <linux/stackdepot.h>
@@ -17,6 +18,7 @@
 struct page;
 struct kmem_cache;
 struct task_struct;
+struct scatterlist;
 
 #ifdef CONFIG_KMSAN
 
@@ -204,6 +206,35 @@ void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
  */
 void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
+/**
+ * kmsan_handle_dma() - Handle a DMA data transfer.
+ * @page:   first page of the buffer.
+ * @offset: offset of the buffer within the first page.
+ * @size:   buffer size.
+ * @dir:    one of possible dma_data_direction values.
+ *
+ * Depending on @dir, KMSAN:
+ * * checks the buffer, if it is copied to device;
+ * * initializes the buffer, if it is copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+		      enum dma_data_direction dir);
+
+/**
+ * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist.
+ * @sg:    scatterlist holding DMA buffers.
+ * @nents: number of scatterlist entries.
+ * @dir:   one of possible dma_data_direction values.
+ *
+ * Depending on @dir, KMSAN:
+ * * checks the buffers in the scatterlist, if they are copied to device;
+ * * initializes the buffers, if they are copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+			 enum dma_data_direction dir);
+
 #else
 
 static inline void kmsan_init_shadow(void)
@@ -286,6 +317,16 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
 {
 }
 
+static inline void kmsan_handle_dma(struct page *page, size_t offset,
+				    size_t size, enum dma_data_direction dir)
+{
+}
+
+static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+				       enum dma_data_direction dir)
+{
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b745..5d17d5d62166b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -156,6 +156,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	kmsan_handle_dma(page, offset, size, dir);
 	debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
 
 	return addr;
@@ -194,11 +195,13 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
 
-	if (ents > 0)
+	if (ents > 0) {
+		kmsan_handle_dma_sg(sg, nents, dir);
 		debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
-	else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
-			      ents != -EIO))
+	} else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
+				ents != -EIO)) {
 		return -EIO;
+	}
 
 	return ents;
 }
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 1cdb4420977f1..8a6947a2a2f22 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -10,9 +10,11 @@
  */
 
 #include <linux/cacheflush.h>
+#include <linux/dma-direction.h>
 #include <linux/gfp.h>
 #include <linux/mm.h>
 #include <linux/mm_types.h>
+#include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 
@@ -250,6 +252,65 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
 }
 EXPORT_SYMBOL(kmsan_copy_to_user);
 
+static void kmsan_handle_dma_page(const void *addr, size_t size,
+				  enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+					    REASON_ANY);
+		kmsan_internal_unpoison_memory((void *)addr, size,
+					       /*checked*/ false);
+		break;
+	case DMA_TO_DEVICE:
+		kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+					    REASON_ANY);
+		break;
+	case DMA_FROM_DEVICE:
+		kmsan_internal_unpoison_memory((void *)addr, size,
+					       /*checked*/ false);
+		break;
+	case DMA_NONE:
+		break;
+	}
+}
+
+/* Helper function to handle DMA data transfers. */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+		      enum dma_data_direction dir)
+{
+	u64 page_offset, to_go, addr;
+
+	if (PageHighMem(page))
+		return;
+	addr = (u64)page_address(page) + offset;
+	/*
+	 * The kernel may occasionally give us adjacent DMA pages not belonging
+	 * to the same allocation. Process them separately to avoid triggering
+	 * internal KMSAN checks.
+	 */
+	while (size > 0) {
+		page_offset = addr % PAGE_SIZE;
+		to_go = min(PAGE_SIZE - page_offset, (u64)size);
+		kmsan_handle_dma_page((void *)addr, to_go, dir);
+		addr += to_go;
+		size -= to_go;
+	}
+}
+EXPORT_SYMBOL(kmsan_handle_dma);
+
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+			 enum dma_data_direction dir)
+{
+	struct scatterlist *item;
+	int i;
+
+	for_each_sg(sg, item, nents, i)
+		kmsan_handle_dma(sg_page(item), item->offset, item->length,
+				 dir);
+}
+EXPORT_SYMBOL(kmsan_handle_dma_sg);
+
 /* Functions from kmsan-checks.h follow. */
 void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 23/45] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (21 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 22/45] dma: kmsan: unpoison DMA mappings Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 24/45] kmsan: handle memory sent to/from USB Alexander Potapenko
                   ` (21 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

If vring doesn't use the DMA API, KMSAN is unable to tell whether the
memory is initialized by hardware. Explicitly call kmsan_handle_dma()
from vring_map_one_sg() in this case to prevent false positives.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>

---
v4:
 -- swap virtio: and kmsan: in the subject

Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
---
 drivers/virtio/virtio_ring.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 13a7348cedfff..2d42a4b38e628 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/hrtimer.h>
 #include <linux/dma-mapping.h>
+#include <linux/kmsan-checks.h>
 #include <linux/spinlock.h>
 #include <xen/xen.h>
 
@@ -329,8 +330,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 				   struct scatterlist *sg,
 				   enum dma_data_direction direction)
 {
-	if (!vq->use_dma_api)
+	if (!vq->use_dma_api) {
+		/*
+		 * If DMA is not used, KMSAN doesn't know that the scatterlist
+		 * is initialized by the hardware. Explicitly check/unpoison it
+		 * depending on the direction.
+		 */
+		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
 		return (dma_addr_t)sg_phys(sg);
+	}
 
 	/*
 	 * We can't use dma_map_sg, because we don't use scatterlists in
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 24/45] kmsan: handle memory sent to/from USB
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (22 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 23/45] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg() Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 25/45] kmsan: add tests for KMSAN Alexander Potapenko
                   ` (20 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Depending on the value of is_out, kmsan_handle_urb() either marks the
data copied to the kernel from a USB device as initialized, or checks
the data sent to the device for being initialized.
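
For instance (a hypothetical OUT transfer; udev, urb and demo_complete
are invented names), submitting an URB whose transfer buffer was never
written is reported:

  /* Hypothetical example: @buf is allocated but never initialized. */
  u8 *buf = kmalloc(8, GFP_KERNEL);

  usb_fill_bulk_urb(urb, udev, usb_sndbulkpipe(udev, 1), buf, 8,
  		  demo_complete, NULL);
  /*
   * is_out is true here, so kmsan_handle_urb() checks @buf and reports
   * its eight uninitialized bytes. An IN transfer would instead have
   * @buf unpoisoned upon submission.
   */
  usb_submit_urb(urb, GFP_KERNEL);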

Signed-off-by: Alexander Potapenko <glider@google.com>

---
v2:
 -- move kmsan_handle_urb() implementation to this patch

Link: https://linux-review.googlesource.com/id/Ifa67fb72015d4de14c30e971556f99fc8b2ee506
---
 drivers/usb/core/urb.c |  2 ++
 include/linux/kmsan.h  | 15 +++++++++++++++
 mm/kmsan/hooks.c       | 17 +++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 33d62d7e3929f..1fe3f23205624 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -8,6 +8,7 @@
 #include <linux/bitops.h>
 #include <linux/slab.h>
 #include <linux/log2.h>
+#include <linux/kmsan-checks.h>
 #include <linux/usb.h>
 #include <linux/wait.h>
 #include <linux/usb/hcd.h>
@@ -426,6 +427,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
 			URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL |
 			URB_DMA_SG_COMBINED);
 	urb->transfer_flags |= (is_out ? URB_DIR_OUT : URB_DIR_IN);
+	kmsan_handle_urb(urb, is_out);
 
 	if (xfertype != USB_ENDPOINT_XFER_CONTROL &&
 			dev->state < USB_STATE_CONFIGURED)
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 55fe673ee1e84..e8b5c306c4aa1 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -19,6 +19,7 @@ struct page;
 struct kmem_cache;
 struct task_struct;
 struct scatterlist;
+struct urb;
 
 #ifdef CONFIG_KMSAN
 
@@ -235,6 +236,16 @@ void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
 void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir);
 
+/**
+ * kmsan_handle_urb() - Handle a USB data transfer.
+ * @urb:    struct urb pointer.
+ * @is_out: data transfer direction (true means output to hardware).
+ *
+ * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise,
+ * KMSAN initializes the transfer buffer.
+ */
+void kmsan_handle_urb(const struct urb *urb, bool is_out);
+
 #else
 
 static inline void kmsan_init_shadow(void)
@@ -327,6 +338,10 @@ static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
 {
 }
 
+static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 8a6947a2a2f22..9aecbf2825837 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -17,6 +17,7 @@
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/usb.h>
 
 #include "../internal.h"
 #include "../slab.h"
@@ -252,6 +253,22 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
 }
 EXPORT_SYMBOL(kmsan_copy_to_user);
 
+/* Helper function to check an URB. */
+void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+	if (!urb)
+		return;
+	if (is_out)
+		kmsan_internal_check_memory(urb->transfer_buffer,
+					    urb->transfer_buffer_length,
+					    /*user_addr*/ 0, REASON_SUBMIT_URB);
+	else
+		kmsan_internal_unpoison_memory(urb->transfer_buffer,
+					       urb->transfer_buffer_length,
+					       /*checked*/ false);
+}
+EXPORT_SYMBOL(kmsan_handle_urb);
+
 static void kmsan_handle_dma_page(const void *addr, size_t size,
 				  enum dma_data_direction dir)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 25/45] kmsan: add tests for KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (23 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 24/45] kmsan: handle memory sent to/from USB Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 14:16   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 26/45] kmsan: disable strscpy() optimization under KMSAN Alexander Potapenko
                   ` (19 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

The testing module triggers KMSAN warnings in different cases and checks
that the errors are properly reported, using console probes to capture
the tool's output.
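
To build the suite into a test kernel, a .kunitconfig fragment along
these lines should suffice (a sketch; KMSAN itself additionally
requires a suitable Clang toolchain and a supported architecture):

  CONFIG_KUNIT=y
  CONFIG_KMSAN=y
  CONFIG_KMSAN_KUNIT_TEST=y

With these options the tests run during boot and print their results in
the usual KUnit TAP format on the console.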

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- add memcpy tests

v4:
 -- change sizeof(type) to sizeof(*ptr)
 -- add test expectations for CONFIG_KMSAN_CHECK_PARAM_RETVAL

Link: https://linux-review.googlesource.com/id/I49c3f59014cc37fd13541c80beb0b75a75244650
---
 lib/Kconfig.kmsan     |  12 +
 mm/kmsan/Makefile     |   4 +
 mm/kmsan/kmsan_test.c | 552 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 568 insertions(+)
 create mode 100644 mm/kmsan/kmsan_test.c

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
index 8f768d4034e3c..f56ed7f7c7090 100644
--- a/lib/Kconfig.kmsan
+++ b/lib/Kconfig.kmsan
@@ -47,4 +47,16 @@ config KMSAN_CHECK_PARAM_RETVAL
 	  may potentially report errors in corner cases when non-instrumented
 	  functions call instrumented ones.
 
+config KMSAN_KUNIT_TEST
+	tristate "KMSAN integration test suite" if !KUNIT_ALL_TESTS
+	default KUNIT_ALL_TESTS
+	depends on TRACEPOINTS && KUNIT
+	help
+	  Test suite for KMSAN, testing various error detection scenarios,
+	  and checking that reports are correctly output to console.
+
+	  Say Y here if you want the test to be built into the kernel and run
+	  during boot; say M if you want the test to build as a module; say N
+	  if you are unsure.
+
 endif
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 401acb1a491ce..98eab2856626f 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -22,3 +22,7 @@ CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
 CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
+
+obj-$(CONFIG_KMSAN_KUNIT_TEST) += kmsan_test.o
+KMSAN_SANITIZE_kmsan_test.o := y
+CFLAGS_kmsan_test.o += $(call cc-disable-warning, uninitialized)
diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
new file mode 100644
index 0000000000000..1b8da71ae0d4f
--- /dev/null
+++ b/mm/kmsan/kmsan_test.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KMSAN.
+ * For each test case checks the presence (or absence) of generated reports.
+ * Relies on 'console' tracepoint to capture reports as they appear in the
+ * kernel log.
+ *
+ * Copyright (C) 2021-2022, Google LLC.
+ * Author: Alexander Potapenko <glider@google.com>
+ *
+ */
+
+#include <kunit/test.h>
+#include "kmsan.h"
+
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+static DEFINE_PER_CPU(int, per_cpu_var);
+
+/* Report as observed from console. */
+static struct {
+	spinlock_t lock;
+	bool available;
+	bool ignore; /* Stop console output collection. */
+	char header[256];
+} observed = {
+	.lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+	unsigned long flags;
+
+	if (observed.ignore)
+		return;
+	spin_lock_irqsave(&observed.lock, flags);
+
+	if (strnstr(buf, "BUG: KMSAN: ", len)) {
+		/*
+		 * KMSAN report and related to the test.
+		 *
+		 * The provided @buf is not NUL-terminated; copy no more than
+		 * @len bytes and let strscpy() add the missing NUL-terminator.
+		 */
+		strscpy(observed.header, buf,
+			min(len + 1, sizeof(observed.header)));
+		WRITE_ONCE(observed.available, true);
+		observed.ignore = true;
+	}
+	spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+	return READ_ONCE(observed.available);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+	const char *error_type; /* Error type. */
+	/*
+	 * Kernel symbol from the error header, or NULL if no report is
+	 * expected.
+	 */
+	const char *symbol;
+};
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+	typeof(observed.header) expected_header;
+	unsigned long flags;
+	bool ret = false;
+	const char *end;
+	char *cur;
+
+	/* Double-checked locking. */
+	if (!report_available() || !r->symbol)
+		return (!report_available() && !r->symbol);
+
+	/* Generate expected report contents. */
+
+	/* Title */
+	cur = expected_header;
+	end = &expected_header[sizeof(expected_header) - 1];
+
+	cur += scnprintf(cur, end - cur, "BUG: KMSAN: %s", r->error_type);
+
+	scnprintf(cur, end - cur, " in %s", r->symbol);
+	/* The exact offset won't match, remove it; also strip module name. */
+	cur = strchr(expected_header, '+');
+	if (cur)
+		*cur = '\0';
+
+	spin_lock_irqsave(&observed.lock, flags);
+	if (!report_available())
+		goto out; /* A new report is being captured. */
+
+	/* Finally match expected output to what we actually observed. */
+	ret = strstr(observed.header, expected_header);
+out:
+	spin_unlock_irqrestore(&observed.lock, flags);
+
+	return ret;
+}
+
+/* ===== Test cases ===== */
+
+/* Prevent replacing branch with select in LLVM. */
+static noinline void check_true(char *arg)
+{
+	pr_info("%s is true\n", arg);
+}
+
+static noinline void check_false(char *arg)
+{
+	pr_info("%s is false\n", arg);
+}
+
+#define USE(x)                                                                 \
+	do {                                                                   \
+		if (x)                                                         \
+			check_true(#x);                                        \
+		else                                                           \
+			check_false(#x);                                       \
+	} while (0)
+
+#define EXPECTATION_ETYPE_FN(e, reason, fn)                                    \
+	struct expect_report e = {                                             \
+		.error_type = reason,                                          \
+		.symbol = fn,                                                  \
+	}
+
+#define EXPECTATION_NO_REPORT(e) EXPECTATION_ETYPE_FN(e, NULL, NULL)
+#define EXPECTATION_UNINIT_VALUE_FN(e, fn)                                     \
+	EXPECTATION_ETYPE_FN(e, "uninit-value", fn)
+#define EXPECTATION_UNINIT_VALUE(e) EXPECTATION_UNINIT_VALUE_FN(e, __func__)
+#define EXPECTATION_USE_AFTER_FREE(e)                                          \
+	EXPECTATION_ETYPE_FN(e, "use-after-free", __func__)
+
+/* Test case: ensure that kmalloc() returns uninitialized memory. */
+static void test_uninit_kmalloc(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE(expect);
+	int *ptr;
+
+	kunit_info(test, "uninitialized kmalloc test (UMR report)\n");
+	ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
+	USE(*ptr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that kmalloc'ed memory becomes initialized after memset().
+ */
+static void test_init_kmalloc(struct kunit *test)
+{
+	EXPECTATION_NO_REPORT(expect);
+	int *ptr;
+
+	kunit_info(test, "initialized kmalloc test (no reports)\n");
+	ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
+	memset(ptr, 0, sizeof(*ptr));
+	USE(*ptr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that kzalloc() returns initialized memory. */
+static void test_init_kzalloc(struct kunit *test)
+{
+	EXPECTATION_NO_REPORT(expect);
+	int *ptr;
+
+	kunit_info(test, "initialized kzalloc test (no reports)\n");
+	ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
+	USE(*ptr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables are uninitialized by default. */
+static void test_uninit_stack_var(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE(expect);
+	volatile int cond;
+
+	kunit_info(test, "uninitialized stack variable (UMR report)\n");
+	USE(cond);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables with initializers are initialized. */
+static void test_init_stack_var(struct kunit *test)
+{
+	EXPECTATION_NO_REPORT(expect);
+	volatile int cond = 1;
+
+	kunit_info(test, "initialized stack variable (no reports)\n");
+	USE(cond);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static noinline void two_param_fn_2(int arg1, int arg2)
+{
+	USE(arg1);
+	USE(arg2);
+}
+
+static noinline void one_param_fn(int arg)
+{
+	two_param_fn_2(arg, arg);
+	USE(arg);
+}
+
+static noinline void two_param_fn(int arg1, int arg2)
+{
+	int init = 0;
+
+	one_param_fn(init);
+	USE(arg1);
+	USE(arg2);
+}
+
+static void test_params(struct kunit *test)
+{
+#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+	/*
+	 * With eager param/retval checking enabled, KMSAN will report an error
+	 * before the call to two_param_fn().
+	 */
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_params");
+#else
+	EXPECTATION_UNINIT_VALUE_FN(expect, "two_param_fn");
+#endif
+	volatile int uninit, init = 1;
+
+	kunit_info(test,
+		   "uninit passed through a function parameter (UMR report)\n");
+	two_param_fn(uninit, init);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static int signed_sum3(int a, int b, int c)
+{
+	return a + b + c;
+}
+
+/*
+ * Test case: ensure that uninitialized values are tracked through function
+ * arguments.
+ */
+static void test_uninit_multiple_params(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE(expect);
+	volatile char b = 3, c;
+	volatile int a;
+
+	kunit_info(test, "uninitialized local passed to fn (UMR report)\n");
+	USE(signed_sum3(a, b, c));
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Helper function to make an array uninitialized. */
+static noinline void do_uninit_local_array(char *array, int start, int stop)
+{
+	volatile char uninit;
+	int i;
+
+	for (i = start; i < stop; i++)
+		array[i] = uninit;
+}
+
+/*
+ * Test case: ensure kmsan_check_memory() reports an error when checking
+ * uninitialized memory.
+ */
+static void test_uninit_kmsan_check_memory(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_uninit_kmsan_check_memory");
+	volatile char local_array[8];
+
+	kunit_info(
+		test,
+		"kmsan_check_memory() called on uninit local (UMR report)\n");
+	do_uninit_local_array((char *)local_array, 5, 7);
+
+	kmsan_check_memory((char *)local_array, 8);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: check that a virtual memory range created with vmap() from
+ * initialized pages is still considered initialized.
+ */
+static void test_init_kmsan_vmap_vunmap(struct kunit *test)
+{
+	EXPECTATION_NO_REPORT(expect);
+	const int npages = 2;
+	struct page **pages;
+	void *vbuf;
+	int i;
+
+	kunit_info(test, "pages initialized via vmap (no reports)\n");
+
+	pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
+	for (i = 0; i < npages; i++)
+		pages[i] = alloc_page(GFP_KERNEL);
+	vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+	memset(vbuf, 0xfe, npages * PAGE_SIZE);
+	for (i = 0; i < npages; i++)
+		kmsan_check_memory(page_address(pages[i]), PAGE_SIZE);
+
+	if (vbuf)
+		vunmap(vbuf);
+	for (i = 0; i < npages; i++)
+		if (pages[i])
+			__free_page(pages[i]);
+	kfree(pages);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memset() can initialize a buffer allocated via
+ * vmalloc().
+ */
+static void test_init_vmalloc(struct kunit *test)
+{
+	EXPECTATION_NO_REPORT(expect);
+	int npages = 8, i;
+	char *buf;
+
+	kunit_info(test, "vmalloc buffer can be initialized (no reports)\n");
+	buf = vmalloc(PAGE_SIZE * npages);
+	buf[0] = 1;
+	memset(buf, 0xfe, PAGE_SIZE * npages);
+	USE(buf[0]);
+	for (i = 0; i < npages; i++)
+		kmsan_check_memory(&buf[PAGE_SIZE * i], PAGE_SIZE);
+	vfree(buf);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that use-after-free reporting works. */
+static void test_uaf(struct kunit *test)
+{
+	EXPECTATION_USE_AFTER_FREE(expect);
+	volatile int value;
+	volatile int *var;
+
+	kunit_info(test, "use-after-free in kmalloc-ed buffer (UMR report)\n");
+	var = kmalloc(80, GFP_KERNEL);
+	var[3] = 0xfeedface;
+	kfree((int *)var);
+	/* Copy the invalid value before checking it. */
+	value = var[3];
+	USE(value);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that uninitialized values are propagated through per-CPU
+ * memory.
+ */
+static void test_percpu_propagate(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE(expect);
+	volatile int uninit, check;
+
+	kunit_info(test,
+		   "uninit local stored to per_cpu memory (UMR report)\n");
+
+	this_cpu_write(per_cpu_var, uninit);
+	check = this_cpu_read(per_cpu_var);
+	USE(check);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that passing uninitialized values to printk() leads to an
+ * error report.
+ */
+static void test_printk(struct kunit *test)
+{
+#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+	/*
+	 * With eager param/retval checking enabled, KMSAN will report an error
+	 * before the call to pr_info().
+	 */
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_printk");
+#else
+	EXPECTATION_UNINIT_VALUE_FN(expect, "number");
+#endif
+	volatile int uninit;
+
+	kunit_info(test, "uninit local passed to pr_info() (UMR report)\n");
+	pr_info("%px contains %d\n", &uninit, uninit);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and `dst`.
+ */
+static void test_memcpy_aligned_to_aligned(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_aligned");
+	volatile int uninit_src;
+	volatile int dst = 0;
+
+	kunit_info(test, "memcpy()ing aligned uninit src to aligned dst (UMR report)\n");
+	memcpy((void *)&dst, (void *)&uninit_src, sizeof(uninit_src));
+	kmsan_check_memory((void *)&dst, sizeof(dst));
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying an aligned 4-byte value to an unaligned one touches two aligned
+ * 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the first of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned");
+	volatile int uninit_src;
+	volatile char dst[8] = {0};
+
+	kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst (UMR report)\n");
+	memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+	kmsan_check_memory((void *)dst, 4);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying an aligned 4-byte value to an unaligned one touches two aligned
+ * 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the second of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned2(struct kunit *test)
+{
+	EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned2");
+	volatile int uninit_src;
+	volatile char dst[8] = {0};
+
+	kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst - part 2 (UMR report)\n");
+	memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+	kmsan_check_memory((void *)&dst[4], sizeof(uninit_src));
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static struct kunit_case kmsan_test_cases[] = {
+	KUNIT_CASE(test_uninit_kmalloc),
+	KUNIT_CASE(test_init_kmalloc),
+	KUNIT_CASE(test_init_kzalloc),
+	KUNIT_CASE(test_uninit_stack_var),
+	KUNIT_CASE(test_init_stack_var),
+	KUNIT_CASE(test_params),
+	KUNIT_CASE(test_uninit_multiple_params),
+	KUNIT_CASE(test_uninit_kmsan_check_memory),
+	KUNIT_CASE(test_init_kmsan_vmap_vunmap),
+	KUNIT_CASE(test_init_vmalloc),
+	KUNIT_CASE(test_uaf),
+	KUNIT_CASE(test_percpu_propagate),
+	KUNIT_CASE(test_printk),
+	KUNIT_CASE(test_memcpy_aligned_to_aligned),
+	KUNIT_CASE(test_memcpy_aligned_to_unaligned),
+	KUNIT_CASE(test_memcpy_aligned_to_unaligned2),
+	{},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&observed.lock, flags);
+	observed.header[0] = '\0';
+	observed.ignore = false;
+	observed.available = false;
+	spin_unlock_irqrestore(&observed.lock, flags);
+
+	return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+}
+
+static struct kunit_suite kmsan_test_suite = {
+	.name = "kmsan",
+	.test_cases = kmsan_test_cases,
+	.init = test_init,
+	.exit = test_exit,
+};
+static struct kunit_suite *kmsan_test_suites[] = { &kmsan_test_suite, NULL };
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+	check_trace_callback_type_console(probe_console);
+	if (!strcmp(tp->name, "console"))
+		WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+	if (!strcmp(tp->name, "console"))
+		tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+/*
+ * We only want to do tracepoints setup and teardown once, therefore we have to
+ * customize the init and exit functions and cannot rely on kunit_test_suite().
+ */
+static int __init kmsan_test_init(void)
+{
+	/*
+	 * Because we want to be able to build the test as a module, we need to
+	 * iterate through all known tracepoints, since the static registration
+	 * won't work here.
+	 */
+	for_each_kernel_tracepoint(register_tracepoints, NULL);
+	return __kunit_test_suites_init(kmsan_test_suites);
+}
+
+static void kmsan_test_exit(void)
+{
+	__kunit_test_suites_exit(kmsan_test_suites);
+	for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+	tracepoint_synchronize_unregister();
+}
+
+late_initcall_sync(kmsan_test_init);
+module_exit(kmsan_test_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alexander Potapenko <glider@google.com>");
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 26/45] kmsan: disable strscpy() optimization under KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (24 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 25/45] kmsan: add tests for KMSAN Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 27/45] crypto: kmsan: disable accelerated configs " Alexander Potapenko
                   ` (18 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Disable the efficient 8-byte reading under KMSAN to avoid false positives.
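
For reference, a minimal sketch of the pattern that trips KMSAN here
(illustrative only, not the exact lib/string.c code):

	char buf[8];

	buf[0] = 'a';
	buf[1] = '\0';	/* buf[2..7] remain uninitialized */

	/*
	 * The optimized loop in strscpy() loads sizeof(long) bytes at
	 * once and feeds them to has_zero() to find the NUL terminator.
	 * KMSAN observes a comparison on partially uninitialized data
	 * and reports an uninit-value error, even though the bytes past
	 * the NUL cannot affect the result.
	 */
	unsigned long c = read_word_at_a_time(buf);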

Signed-off-by: Alexander Potapenko <glider@google.com>

---

Link: https://linux-review.googlesource.com/id/Iffd8336965e88fce915db2e6a9d6524422975f69
---
 lib/string.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 6f334420f6871..3371d26a0e390 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -197,6 +197,14 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
 		max = 0;
 #endif
 
+	/*
+	 * read_word_at_a_time() below may read uninitialized bytes after the
+	 * trailing zero and use them in comparisons. Disable this optimization
+	 * under KMSAN to prevent false positive reports.
+	 */
+	if (IS_ENABLED(CONFIG_KMSAN))
+		max = 0;
+
 	while (max >= sizeof(unsigned long)) {
 		unsigned long c, data;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 27/45] crypto: kmsan: disable accelerated configs under KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (25 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 26/45] kmsan: disable strscpy() optimization under KMSAN Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 28/45] kmsan: disable physical page merging in biovec Alexander Potapenko
                   ` (17 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN is unable to understand when initialized values come from assembly.
Disable accelerated configs in KMSAN builds to prevent false positive
reports.
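
A minimal illustration of the problem (sketch; sha256_transform_avx()
and the surrounding variables are hypothetical, not taken from the
crypto code):

	u8 digest[SHA256_DIGEST_SIZE];

	/*
	 * The hand-written assembly fills all of digest[], but KMSAN
	 * cannot see stores made outside instrumented C code, so the
	 * shadow for digest[] still says "uninitialized".
	 */
	sha256_transform_avx(state, data, digest);

	/* Any later use is then flagged as an uninit-value error: */
	if (memcmp(digest, expected, sizeof(digest)))
		return -EINVAL;

Annotating every such call site with kmsan_unpoison_memory() would be
possible but invasive, so the accelerated configs are disabled wholesale
instead.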

Signed-off-by: Alexander Potapenko <glider@google.com>

---

Link: https://linux-review.googlesource.com/id/Idb2334bf3a1b68b31b399709baefaa763038cc50
---
 crypto/Kconfig      | 30 ++++++++++++++++++++++++++++++
 drivers/net/Kconfig |  1 +
 2 files changed, 31 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 1d44893a997ba..7ddda6072ef35 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -298,6 +298,7 @@ config CRYPTO_CURVE25519
 config CRYPTO_CURVE25519_X86
 	tristate "x86_64 accelerated Curve25519 scalar multiplication library"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_LIB_CURVE25519_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_CURVE25519
 
@@ -346,11 +347,13 @@ config CRYPTO_AEGIS128
 config CRYPTO_AEGIS128_SIMD
 	bool "Support SIMD acceleration for AEGIS-128"
 	depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
+	depends on !KMSAN # avoid false positives from assembly
 	default y
 
 config CRYPTO_AEGIS128_AESNI_SSE2
 	tristate "AEGIS-128 AEAD algorithm (x86_64 AESNI+SSE2 implementation)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_AEAD
 	select CRYPTO_SIMD
 	help
@@ -487,6 +490,7 @@ config CRYPTO_NHPOLY1305
 config CRYPTO_NHPOLY1305_SSE2
 	tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_NHPOLY1305
 	help
 	  SSE2 optimized implementation of the hash function used by the
@@ -495,6 +499,7 @@ config CRYPTO_NHPOLY1305_SSE2
 config CRYPTO_NHPOLY1305_AVX2
 	tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_NHPOLY1305
 	help
 	  AVX2 optimized implementation of the hash function used by the
@@ -608,6 +613,7 @@ config CRYPTO_CRC32C
 config CRYPTO_CRC32C_INTEL
 	tristate "CRC32c INTEL hardware acceleration"
 	depends on X86
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_HASH
 	help
 	  In Intel processor with SSE4.2 supported, the processor will
@@ -648,6 +654,7 @@ config CRYPTO_CRC32
 config CRYPTO_CRC32_PCLMUL
 	tristate "CRC32 PCLMULQDQ hardware acceleration"
 	depends on X86
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_HASH
 	select CRC32
 	help
@@ -713,6 +720,7 @@ config CRYPTO_BLAKE2S
 config CRYPTO_BLAKE2S_X86
 	tristate "BLAKE2s digest algorithm (x86 accelerated version)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_LIB_BLAKE2S_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
 
@@ -727,6 +735,7 @@ config CRYPTO_CRCT10DIF
 config CRYPTO_CRCT10DIF_PCLMUL
 	tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
 	depends on X86 && 64BIT && CRC_T10DIF
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_HASH
 	help
 	  For x86_64 processors with SSE4.2 and PCLMULQDQ supported,
@@ -779,6 +788,7 @@ config CRYPTO_POLY1305
 config CRYPTO_POLY1305_X86_64
 	tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_LIB_POLY1305_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_POLY1305
 	help
@@ -867,6 +877,7 @@ config CRYPTO_SHA1
 config CRYPTO_SHA1_SSSE3
 	tristate "SHA1 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SHA1
 	select CRYPTO_HASH
 	help
@@ -878,6 +889,7 @@ config CRYPTO_SHA1_SSSE3
 config CRYPTO_SHA256_SSSE3
 	tristate "SHA256 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SHA256
 	select CRYPTO_HASH
 	help
@@ -890,6 +902,7 @@ config CRYPTO_SHA256_SSSE3
 config CRYPTO_SHA512_SSSE3
 	tristate "SHA512 digest algorithm (SSSE3/AVX/AVX2)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SHA512
 	select CRYPTO_HASH
 	help
@@ -1065,6 +1078,7 @@ config CRYPTO_WP512
 config CRYPTO_GHASH_CLMUL_NI_INTEL
 	tristate "GHASH hash function (CLMUL-NI accelerated)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_CRYPTD
 	help
 	  This is the x86_64 CLMUL-NI accelerated implementation of
@@ -1115,6 +1129,7 @@ config CRYPTO_AES_TI
 config CRYPTO_AES_NI_INTEL
 	tristate "AES cipher algorithms (AES-NI)"
 	depends on X86
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_AEAD
 	select CRYPTO_LIB_AES
 	select CRYPTO_ALGAPI
@@ -1239,6 +1254,7 @@ config CRYPTO_BLOWFISH_COMMON
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Blowfish cipher algorithm (x86_64)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_BLOWFISH_COMMON
 	imply CRYPTO_CTR
@@ -1269,6 +1285,7 @@ config CRYPTO_CAMELLIA
 config CRYPTO_CAMELLIA_X86_64
 	tristate "Camellia cipher algorithm (x86_64)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	imply CRYPTO_CTR
 	help
@@ -1285,6 +1302,7 @@ config CRYPTO_CAMELLIA_X86_64
 config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
 	tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAMELLIA_X86_64
 	select CRYPTO_SIMD
@@ -1303,6 +1321,7 @@ config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
 config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
 	tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX2)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
 	help
 	  Camellia cipher algorithm module (x86_64/AES-NI/AVX2).
@@ -1348,6 +1367,7 @@ config CRYPTO_CAST5
 config CRYPTO_CAST5_AVX_X86_64
 	tristate "CAST5 (CAST-128) cipher algorithm (x86_64/AVX)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAST5
 	select CRYPTO_CAST_COMMON
@@ -1371,6 +1391,7 @@ config CRYPTO_CAST6
 config CRYPTO_CAST6_AVX_X86_64
 	tristate "CAST6 (CAST-256) cipher algorithm (x86_64/AVX)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAST6
 	select CRYPTO_CAST_COMMON
@@ -1404,6 +1425,7 @@ config CRYPTO_DES_SPARC64
 config CRYPTO_DES3_EDE_X86_64
 	tristate "Triple DES EDE cipher algorithm (x86-64)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_LIB_DES
 	imply CRYPTO_CTR
@@ -1461,6 +1483,7 @@ config CRYPTO_CHACHA20
 config CRYPTO_CHACHA20_X86_64
 	tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_LIB_CHACHA_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_CHACHA
@@ -1504,6 +1527,7 @@ config CRYPTO_SERPENT
 config CRYPTO_SERPENT_SSE2_X86_64
 	tristate "Serpent cipher algorithm (x86_64/SSE2)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
 	select CRYPTO_SIMD
@@ -1523,6 +1547,7 @@ config CRYPTO_SERPENT_SSE2_X86_64
 config CRYPTO_SERPENT_SSE2_586
 	tristate "Serpent cipher algorithm (i586/SSE2)"
 	depends on X86 && !64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
 	select CRYPTO_SIMD
@@ -1542,6 +1567,7 @@ config CRYPTO_SERPENT_SSE2_586
 config CRYPTO_SERPENT_AVX_X86_64
 	tristate "Serpent cipher algorithm (x86_64/AVX)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
 	select CRYPTO_SIMD
@@ -1562,6 +1588,7 @@ config CRYPTO_SERPENT_AVX_X86_64
 config CRYPTO_SERPENT_AVX2_X86_64
 	tristate "Serpent cipher algorithm (x86_64/AVX2)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SERPENT_AVX_X86_64
 	help
 	  Serpent cipher algorithm, by Anderson, Biham & Knudsen.
@@ -1706,6 +1733,7 @@ config CRYPTO_TWOFISH_586
 config CRYPTO_TWOFISH_X86_64
 	tristate "Twofish cipher algorithm (x86_64)"
 	depends on (X86 || UML_X86) && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_ALGAPI
 	select CRYPTO_TWOFISH_COMMON
 	imply CRYPTO_CTR
@@ -1723,6 +1751,7 @@ config CRYPTO_TWOFISH_X86_64
 config CRYPTO_TWOFISH_X86_64_3WAY
 	tristate "Twofish cipher algorithm (x86_64, 3-way parallel)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_TWOFISH_COMMON
 	select CRYPTO_TWOFISH_X86_64
@@ -1743,6 +1772,7 @@ config CRYPTO_TWOFISH_X86_64_3WAY
 config CRYPTO_TWOFISH_AVX_X86_64
 	tristate "Twofish cipher algorithm (x86_64/AVX)"
 	depends on X86 && 64BIT
+	depends on !KMSAN # avoid false positives from assembly
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SIMD
 	select CRYPTO_TWOFISH_COMMON
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b2a4f998c180e..fed89b6981759 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -76,6 +76,7 @@ config WIREGUARD
 	tristate "WireGuard secure network tunnel"
 	depends on NET && INET
 	depends on IPV6 || !IPV6
+	depends on !KMSAN # KMSAN doesn't support the crypto configs below
 	select NET_UDP_TUNNEL
 	select DST_CACHE
 	select CRYPTO
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 28/45] kmsan: disable physical page merging in biovec
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (26 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 27/45] crypto: kmsan: disable accelerated configs " Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN Alexander Potapenko
                   ` (16 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN metadata for adjacent physical pages may not be adjacent, so
accessing such pages together may lead to metadata corruption.
We disable page merging in biovec to prevent such corruption.
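
A sketch of the invariant at stake (illustrative; shadow() stands for
the KMSAN metadata lookup):

	struct page *p1 = alloc_page(GFP_KERNEL);
	struct page *p2 = alloc_page(GFP_KERNEL);

	/*
	 * Even if the two allocations happen to be physically adjacent,
	 * i.e. page_to_phys(p2) == page_to_phys(p1) + PAGE_SIZE, their
	 * shadow pages come from independent metadata allocations, so
	 * shadow(p2) != shadow(p1) + PAGE_SIZE in general. A single
	 * access spanning both pages would make KMSAN write shadow past
	 * the end of shadow(p1), corrupting unrelated metadata.
	 */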

Signed-off-by: Alexander Potapenko <glider@google.com>
---

Link: https://linux-review.googlesource.com/id/Iece16041be5ee47904fbc98121b105e5be5fea5c
---
 block/blk.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index 434017701403f..96309a98a60e3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -93,6 +93,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
 	phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
 
+	/*
+	 * Merging adjacent physical pages may not work correctly under KMSAN
+	 * if their metadata pages aren't adjacent. Just disable merging.
+	 */
+	if (IS_ENABLED(CONFIG_KMSAN))
+		return false;
+
 	if (addr1 + vec1->bv_len != addr2)
 		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (27 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 28/45] kmsan: disable physical page merging in biovec Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-13 10:22   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 30/45] kcov: kmsan: unpoison area->list in kcov_remote_area_put() Alexander Potapenko
                   ` (15 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel, Eric Biggers

KMSAN doesn't allow treating adjacent memory pages as a single
contiguous region if they were allocated by different alloc_pages()
calls. The block layer, however, does exactly that: adjacent pages end
up being used together. To prevent this, make page_is_mergeable()
return false under KMSAN.

Suggested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>

---

v4:
 -- swap block: and kmsan: in the subject

Link: https://linux-review.googlesource.com/id/Ie29cc2464c70032347c32ab2a22e1e7a0b37b905
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 51c99f2c5c908..ce6b3c82159a6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -867,6 +867,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 		return false;
 
 	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+	if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
+		return false;
 	if (*same_page)
 		return true;
 	return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 30/45] kcov: kmsan: unpoison area->list in kcov_remote_area_put()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (28 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 31/45] security: kmsan: fix interoperability with auto-initialization Alexander Potapenko
                   ` (14 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN does not instrument kernel/kcov.c for performance reasons (with
CONFIG_KCOV=y virtually every place in the kernel invokes kcov
instrumentation). Therefore the tool may miss writes from kcov.c that
initialize memory.

When CONFIG_DEBUG_LIST is enabled, list pointers from kernel/kcov.c are
passed to instrumented helpers in lib/list_debug.c, resulting in false
positives.

To work around these reports, we unpoison the contents of area->list after
initializing it.
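
Roughly, the failure mode being silenced looks like this (sketch, with
CONFIG_DEBUG_LIST=y):

	/*
	 * kernel/kcov.c is not instrumented, so this store updates the
	 * list pointers but not their shadow:
	 */
	list_add(&area->list, &kcov_remote_areas);
	...
	/*
	 * lib/list_debug.c is instrumented, so its sanity checks read
	 * shadow that was never marked initialized and produce a false
	 * uninit-value report:
	 */
	__list_del_entry_valid(&area->list);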

Signed-off-by: Alexander Potapenko <glider@google.com>
---

v4:
 -- change sizeof(type) to sizeof(*ptr)
 -- swap kcov: and kmsan: in the subject

Link: https://linux-review.googlesource.com/id/Ie17f2ee47a7af58f5cdf716d585ebf0769348a5a
---
 kernel/kcov.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index e19c84b02452e..e5cd09fd8a050 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
 #include <linux/fs.h>
 #include <linux/hashtable.h>
 #include <linux/init.h>
+#include <linux/kmsan-checks.h>
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/printk.h>
@@ -152,6 +153,12 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
 	INIT_LIST_HEAD(&area->list);
 	area->size = size;
 	list_add(&area->list, &kcov_remote_areas);
+	/*
+	 * KMSAN doesn't instrument this file, so it may not know area->list
+	 * is initialized. Unpoison it explicitly to avoid reports in
+	 * kcov_remote_area_get().
+	 */
+	kmsan_unpoison_memory(&area->list, sizeof(area->list));
 }
 
 static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 31/45] security: kmsan: fix interoperability with auto-initialization
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (29 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 30/45] kcov: kmsan: unpoison area->list in kcov_remote_area_put() Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 32/45] objtool: kmsan: list KMSAN API functions as uaccess-safe Alexander Potapenko
                   ` (13 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Heap and stack initialization is great, but not when we are trying to
catch uses of uninitialized memory. When the kernel is built with KMSAN,
having kernel memory initialization enabled may introduce false
negatives.

We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
under CONFIG_KMSAN, making it impossible to auto-initialize stack
variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
auto-initialization.

We do, however, still let users enable heap auto-initialization at
boot time (by setting init_on_alloc=1 or init_on_free=1), in which case
an informational message is printed at boot.
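
An example of the kind of false negative being avoided (illustrative):

	int *p = kmalloc(sizeof(*p), GFP_KERNEL);

	/*
	 * With init_on_alloc=1 the allocator pre-zeroes *p, so KMSAN
	 * considers it initialized and this buggy read goes unreported;
	 * with auto-initialization disabled, KMSAN flags it.
	 */
	if (*p)
		do_something();	/* hypothetical consumer */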

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
---
 mm/page_alloc.c            | 4 ++++
 security/Kconfig.hardening | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e8d5a0b2a3264..3a0a5e204df7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -854,6 +854,10 @@ void init_mem_debugging_and_hardening(void)
 	else
 		static_branch_disable(&init_on_free);
 
+	if (IS_ENABLED(CONFIG_KMSAN) &&
+	    (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
+		pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 	if (!debug_pagealloc_enabled())
 		return;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index bd2aabb2c60f9..2739a6776454e 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -106,6 +106,7 @@ choice
 	config INIT_STACK_ALL_PATTERN
 		bool "pattern-init everything (strongest)"
 		depends on CC_HAS_AUTO_VAR_INIT_PATTERN
+		depends on !KMSAN
 		help
 		  Initializes everything on the stack (including padding)
 		  with a specific debug value. This is intended to eliminate
@@ -124,6 +125,7 @@ choice
 	config INIT_STACK_ALL_ZERO
 		bool "zero-init everything (strongest and safest)"
 		depends on CC_HAS_AUTO_VAR_INIT_ZERO
+		depends on !KMSAN
 		help
 		  Initializes everything on the stack (including padding)
 		  with a zero value. This is intended to eliminate all
@@ -218,6 +220,7 @@ config STACKLEAK_RUNTIME_DISABLE
 
 config INIT_ON_ALLOC_DEFAULT_ON
 	bool "Enable heap memory zeroing on allocation by default"
+	depends on !KMSAN
 	help
 	  This has the effect of setting "init_on_alloc=1" on the kernel
 	  command line. This can be disabled with "init_on_alloc=0".
@@ -230,6 +233,7 @@ config INIT_ON_ALLOC_DEFAULT_ON
 
 config INIT_ON_FREE_DEFAULT_ON
 	bool "Enable heap memory zeroing on free by default"
+	depends on !KMSAN
 	help
 	  This has the effect of setting "init_on_free=1" on the kernel
 	  command line. This can be disabled with "init_on_free=0".
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 32/45] objtool: kmsan: list KMSAN API functions as uaccess-safe
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (30 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 31/45] security: kmsan: fix interoperability with auto-initialization Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:22 ` [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code Alexander Potapenko
                   ` (12 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN inserts API function calls in many places (function entries and
exits, local variables, memory accesses), so they may get called from
uaccess regions as well.

KMSAN API functions are used to update the metadata (shadow/origin pages)
for kernel memory accesses. The metadata pages for kernel pointers are
also located in kernel memory, so touching them is not a problem.
For userspace pointers, no metadata is allocated.

If an API function is supposed to read or modify the metadata, it does so
for kernel pointers and ignores userspace pointers.
If an API function is supposed to return a pair of metadata pointers for
the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE()
functions do), it returns the allocated metadata for kernel pointers and
special dummy buffers residing in kernel memory for userspace
pointers.

As a result, none of KMSAN API functions perform userspace accesses, but
since they might be called from UACCESS regions they use
user_access_save/restore().
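
For reference, the metadata getters have roughly the following shape
(simplified sketch; see the KMSAN runtime earlier in this series for
the real code):

	/* Pair of pointers the instrumentation uses to access metadata. */
	struct shadow_origin_ptr {
		void *shadow, *origin;
	};

	struct shadow_origin_ptr __msan_metadata_ptr_for_load_8(void *addr)
	{
		/*
		 * Returns real metadata for kernel pointers and dummy
		 * kernel-memory buffers for userspace pointers, so no
		 * userspace access ever happens here.
		 */
		return kmsan_get_shadow_origin_ptr(addr, /*size=*/8,
						   /*store=*/false);
	}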

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v3:
 -- updated the patch description

v4:
 -- add kmsan_unpoison_entry_regs()

Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
---
 tools/objtool/check.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 864bb9dd35845..1cf260c966441 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1013,6 +1013,26 @@ static const char *uaccess_safe_builtin[] = {
 	"__sanitizer_cov_trace_cmp4",
 	"__sanitizer_cov_trace_cmp8",
 	"__sanitizer_cov_trace_switch",
+	/* KMSAN */
+	"kmsan_copy_to_user",
+	"kmsan_report",
+	"kmsan_unpoison_entry_regs",
+	"kmsan_unpoison_memory",
+	"__msan_chain_origin",
+	"__msan_get_context_state",
+	"__msan_instrument_asm_store",
+	"__msan_metadata_ptr_for_load_1",
+	"__msan_metadata_ptr_for_load_2",
+	"__msan_metadata_ptr_for_load_4",
+	"__msan_metadata_ptr_for_load_8",
+	"__msan_metadata_ptr_for_load_n",
+	"__msan_metadata_ptr_for_store_1",
+	"__msan_metadata_ptr_for_store_2",
+	"__msan_metadata_ptr_for_store_4",
+	"__msan_metadata_ptr_for_store_8",
+	"__msan_metadata_ptr_for_store_n",
+	"__msan_poison_alloca",
+	"__msan_warning",
 	/* UBSAN */
 	"ubsan_type_mismatch_common",
 	"__ubsan_handle_type_mismatch",
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (31 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 32/45] objtool: kmsan: list KMSAN API functions as uaccess-safe Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-12 13:43   ` Marco Elver
  2022-07-01 14:22 ` [PATCH v4 34/45] x86: kmsan: skip shadow checks in __switch_to() Alexander Potapenko
                   ` (11 subsequent siblings)
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Instrumenting some files with KMSAN will result in the kernel failing to
link or boot, or crashing at runtime, for various reasons (e.g. infinite
recursion caused by instrumentation hooks calling instrumented code again).

Completely omit KMSAN instrumentation in the following places:
 - arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386;
 - arch/x86/entry/vdso, which isn't linked with KMSAN runtime;
 - three files in arch/x86/kernel - boot problems;
 - arch/x86/mm/cpu_entry_area.c - recursion.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2:
 -- moved the patch earlier in the series so that KMSAN can compile
 -- split off the non-x86 part into a separate patch

v3:
 -- added a comment to lib/Makefile

Link: https://linux-review.googlesource.com/id/Id5e5c4a9f9d53c24a35ebb633b814c414628d81b
---
 arch/x86/boot/Makefile            | 1 +
 arch/x86/boot/compressed/Makefile | 1 +
 arch/x86/entry/vdso/Makefile      | 3 +++
 arch/x86/kernel/Makefile          | 2 ++
 arch/x86/kernel/cpu/Makefile      | 1 +
 arch/x86/mm/Makefile              | 2 ++
 arch/x86/realmode/rm/Makefile     | 1 +
 lib/Makefile                      | 2 ++
 8 files changed, 13 insertions(+)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b5aecb524a8aa..d5623232b763f 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -12,6 +12,7 @@
 # Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
+KMSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Kernel does not boot with kcov instrumentation here.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 19e1905dcbf6f..8d0d4d89a00ae 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -20,6 +20,7 @@
 # Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
+KMSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c2a8b76ae0bce..645bd919f9845 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -11,6 +11,9 @@ include $(srctree)/lib/vdso/Makefile
 
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
+KMSAN_SANITIZE_vclock_gettime.o := n
+KMSAN_SANITIZE_vgetcpu.o	:= n
+
 UBSAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4c8b6ae802ac3..4f2617721d3dc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -33,6 +33,8 @@ KASAN_SANITIZE_sev.o					:= n
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
 KCSAN_SANITIZE := n
+KMSAN_SANITIZE_head$(BITS).o				:= n
+KMSAN_SANITIZE_nmi.o					:= n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o			:= y
 
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be5..f10a921ee7565 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -12,6 +12,7 @@ endif
 # If these files are instrumented, boot hangs during the first second.
 KCOV_INSTRUMENT_common.o := n
 KCOV_INSTRUMENT_perf_event.o := n
+KMSAN_SANITIZE_common.o := n
 
 # As above, instrumenting secondary CPU boot code causes boot hangs.
 KCSAN_SANITIZE_common.o := n
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index f8220fd2c169a..39c0700c9955c 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -12,6 +12,8 @@ KASAN_SANITIZE_mem_encrypt_identity.o	:= n
 # Disable KCSAN entirely, because otherwise we get warnings that some functions
 # reference __initdata sections.
 KCSAN_SANITIZE := n
+# Avoid recursion by not calling KMSAN hooks for CEA code.
+KMSAN_SANITIZE_cpu_entry_area.o := n
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_mem_encrypt.o		= -pg
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..f614009d3e4e2 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -10,6 +10,7 @@
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
+KMSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/lib/Makefile b/lib/Makefile
index 5056769d00bb6..73fea85b76365 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -272,6 +272,8 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
 CFLAGS_stackdepot.o += -fno-builtin
 obj-$(CONFIG_STACKDEPOT) += stackdepot.o
 KASAN_SANITIZE_stackdepot.o := n
+# In particular, instrumenting stackdepot.c with KMSAN will result in infinite
+# recursion.
 KMSAN_SANITIZE_stackdepot.o := n
 KCOV_INSTRUMENT_stackdepot.o := n
 
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 34/45] x86: kmsan: skip shadow checks in __switch_to()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (32 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code Alexander Potapenko
@ 2022-07-01 14:22 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 35/45] x86: kmsan: handle open-coded assembly in lib/iomem.c Alexander Potapenko
                   ` (10 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:22 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

When instrumenting functions, KMSAN obtains the per-task state (mostly
pointers to metadata for function arguments and return values) once per
function at its beginning, using the `current` pointer.

Every time the instrumented function calls another function, this state
(`struct kmsan_context_state`) is updated with shadow/origin data of the
passed and returned values.

When `current` changes in the low-level arch code, instrumented code
cannot notice this, and will still refer to the old state, possibly
corrupting it or using stale data. This may result in false positive
reports.

To deal with that, we need to apply __no_kmsan_checks to the functions
performing context switching - this will result in skipping all KMSAN
shadow checks and marking newly created values as initialized,
preventing all false positive reports in those functions. False negatives
are still possible, but we expect them to be rare and transient.
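
Schematically, what the compiler emits (written out as C, sketch):

	/* Once, at function entry, based on the current task: */
	struct kmsan_context_state *kcs = __msan_get_context_state();

	/* ... every call in the body updates argument/retval metadata
	 * through kcs ... */

	/*
	 * If `current` changes mid-function, as it does in __switch_to(),
	 * kcs still refers to the old task's state, so later updates land
	 * in stale memory and may trigger false positives.
	 */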

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Link: https://linux-review.googlesource.com/id/Ib7d4d70946f08128ade207519c1ee405fd812839

---
v2:
 -- This patch was previously called "kmsan: skip shadow checks in files
    doing context switches". Per Mark Rutland's suggestion, we now only
    skip checks in low-level arch-specific code, as context switches in
    common code should be invisible to KMSAN. We also apply the checks
    to precisely the functions performing the context switch instead of
    the whole file.

v4:
 -- Replace KMSAN_ENABLE_CHECKS_process_64.o with __no_kmsan_checks

Link: https://linux-review.googlesource.com/id/I45e3ed9c5f66ee79b0409d1673d66ae419029bcb
---
 arch/x86/kernel/process_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1962008fe7437..6b3418bff3261 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -553,6 +553,7 @@ void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp, bool x32)
  * Kprobes not supported here. Set the probe on schedule instead.
  * Function graph tracer not supported too.
  */
+__no_kmsan_checks
 __visible __notrace_funcgraph struct task_struct *
 __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 35/45] x86: kmsan: handle open-coded assembly in lib/iomem.c
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (33 preceding siblings ...)
  2022-07-01 14:22 ` [PATCH v4 34/45] x86: kmsan: skip shadow checks in __switch_to() Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 36/45] x86: kmsan: use __msan_ string functions where possible Alexander Potapenko
                   ` (9 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN cannot intercept memory accesses within asm() statements.
That's why we add kmsan_unpoison_memory() and kmsan_check_memory() to
tell it how to handle memory copied from/to I/O memory.
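
The annotation pattern, for reference (sketch; buf and n are
placeholders):

	/* The asm() statement fills buf, invisibly to KMSAN... */
	rep_movs(buf, src, n);
	/* ...so mark the destination as initialized explicitly: */
	kmsan_unpoison_memory(buf, n);

	/* Conversely, before asm() copies data to a device: */
	kmsan_check_memory(buf, n);	/* report if buf is uninitialized */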

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Icb16bf17269087e475debf07a7fe7d4bebc3df23
---
 arch/x86/lib/iomem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index 3e2f33fc33de2..e0411a3774d49 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -1,6 +1,7 @@
 #include <linux/string.h>
 #include <linux/module.h>
 #include <linux/io.h>
+#include <linux/kmsan-checks.h>
 
 #define movs(type,to,from) \
 	asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -37,6 +38,8 @@ static void string_memcpy_fromio(void *to, const volatile void __iomem *from, si
 		n-=2;
 	}
 	rep_movs(to, (const void *)from, n);
+	/* KMSAN must treat values read from devices as initialized. */
+	kmsan_unpoison_memory(to, n);
 }
 
 static void string_memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
@@ -44,6 +47,8 @@ static void string_memcpy_toio(volatile void __iomem *to, const void *from, size
 	if (unlikely(!n))
 		return;
 
+	/* Make sure uninitialized memory isn't copied to devices. */
+	kmsan_check_memory(from, n);
 	/* Align any unaligned destination IO */
 	if (unlikely(1 & (unsigned long)to)) {
 		movs("b", to, from);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [PATCH v4 36/45] x86: kmsan: use __msan_ string functions where possible
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (34 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 35/45] x86: kmsan: handle open-coded assembly in lib/iomem.c Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 37/45] x86: kmsan: sync metadata pages on page fault Alexander Potapenko
                   ` (8 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Unless stated otherwise (by explicitly calling __memcpy(), __memset() or
__memmove()), we want all string functions to call their __msan_ versions
(e.g. __msan_memcpy() instead of memcpy()), so that shadow and origin
values are updated accordingly.

The bootloader must still use the default string functions to avoid
crashes.
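
Conceptually, the replacement behaves like this (simplified sketch; the
real runtime implementation also guards against recursion):

	void *__msan_memcpy(void *dst, const void *src, size_t n)
	{
		/* Copy the data itself... */
		__memcpy(dst, src, n);
		/*
		 * ...and the corresponding shadow and origin values, so
		 * that the initialized-ness of src propagates to dst.
		 */
		kmsan_internal_memmove_metadata(dst, (void *)src, n);
		return dst;
	}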

Signed-off-by: Alexander Potapenko <glider@google.com>
---

Link: https://linux-review.googlesource.com/id/I7ca9bd6b4f5c9b9816404862ae87ca7984395f33
---
 arch/x86/include/asm/string_64.h | 23 +++++++++++++++++++++--
 include/linux/fortify-string.h   |  2 ++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 6e450827f677a..3b87d889b6e16 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -11,11 +11,23 @@
    function. */
 
 #define __HAVE_ARCH_MEMCPY 1
+#if defined(__SANITIZE_MEMORY__)
+#undef memcpy
+void *__msan_memcpy(void *dst, const void *src, size_t size);
+#define memcpy __msan_memcpy
+#else
 extern void *memcpy(void *to, const void *from, size_t len);
+#endif
 extern void *__memcpy(void *to, const void *from, size_t len);
 
 #define __HAVE_ARCH_MEMSET
+#if defined(__SANITIZE_MEMORY__)
+extern void *__msan_memset(void *s, int c, size_t n);
+#undef memset
+#define memset __msan_memset
+#else
 void *memset(void *s, int c, size_t n);
+#endif
 void *__memset(void *s, int c, size_t n);
 
 #define __HAVE_ARCH_MEMSET16
@@ -55,7 +67,13 @@ static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
 }
 
 #define __HAVE_ARCH_MEMMOVE
+#if defined(__SANITIZE_MEMORY__)
+#undef memmove
+void *__msan_memmove(void *dest, const void *src, size_t len);
+#define memmove __msan_memmove
+#else
 void *memmove(void *dest, const void *src, size_t count);
+#endif
 void *__memmove(void *dest, const void *src, size_t count);
 
 int memcmp(const void *cs, const void *ct, size_t count);
@@ -64,8 +82,7 @@ char *strcpy(char *dest, const char *src);
 char *strcat(char *dest, const char *src);
 int strcmp(const char *cs, const char *ct);
 
-#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
-
+#if (defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__))
 /*
  * For files that not instrumented (e.g. mm/slub.c) we
  * should use not instrumented version of mem* functions.
@@ -73,7 +90,9 @@ int strcmp(const char *cs, const char *ct);
 
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
+#undef memmove
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index 3b401fa0f3746..6c8a1a29d0b63 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -285,8 +285,10 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size,
  * __builtin_object_size() must be captured here to avoid evaluating argument
  * side-effects further into the macro layers.
  */
+#ifndef CONFIG_KMSAN
 #define memset(p, c, s) __fortify_memset_chk(p, c, s,			\
 		__builtin_object_size(p, 0), __builtin_object_size(p, 1))
+#endif
 
 /*
  * To make sure the compiler can enforce protection against buffer overflows,
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 37/45] x86: kmsan: sync metadata pages on page fault
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (35 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 36/45] x86: kmsan: use __msan_ string functions where possible Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 38/45] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN Alexander Potapenko
                   ` (7 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

KMSAN assumes that shadow and origin pages for every allocated page are
accessible. For pages in the [VMALLOC_START, VMALLOC_END) range those
metadata pages start at KMSAN_VMALLOC_SHADOW_START and
KMSAN_VMALLOC_ORIGIN_START respectively, therefore we must sync a bigger
memory region.
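
For reference, the metadata addresses sit at a constant offset from the
vmalloc range, so the extra regions to sync can be computed directly
(illustrative arithmetic only, matching the hunk below):

	/* For a vmalloc address addr in [VMALLOC_START, VMALLOC_END): */
	shadow = addr - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START;
	origin = addr - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START;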

Signed-off-by: Alexander Potapenko <glider@google.com>

---

v2:
 -- addressed reports from kernel test robot <lkp@intel.com>

Link: https://linux-review.googlesource.com/id/Ia5bd541e54f1ecc11b86666c3ec87c62ac0bdfb8
---
 arch/x86/mm/fault.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d042..d07fe0801f203 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -260,7 +260,7 @@ static noinline int vmalloc_fault(unsigned long address)
 }
 NOKPROBE_SYMBOL(vmalloc_fault);
 
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+static void __arch_sync_kernel_mappings(unsigned long start, unsigned long end)
 {
 	unsigned long addr;
 
@@ -284,6 +284,27 @@ void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
 	}
 }
 
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+	__arch_sync_kernel_mappings(start, end);
+#ifdef CONFIG_KMSAN
+	/*
+	 * KMSAN maintains two additional metadata page mappings for the
+	 * [VMALLOC_START, VMALLOC_END) range. These mappings start at
+	 * KMSAN_VMALLOC_SHADOW_START and KMSAN_VMALLOC_ORIGIN_START and
+	 * have to be synced together with the vmalloc memory mapping.
+	 */
+	if (start >= VMALLOC_START && end < VMALLOC_END) {
+		__arch_sync_kernel_mappings(
+			start - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START,
+			end - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START);
+		__arch_sync_kernel_mappings(
+			start - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START,
+			end - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START);
+	}
+#endif
+}
+
 static bool low_pfn(unsigned long pfn)
 {
 	return pfn < max_low_pfn;
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 38/45] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (36 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 37/45] x86: kmsan: sync metadata pages on page fault Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 39/45] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS Alexander Potapenko
                   ` (6 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

This is needed to allow memory tools like KASAN and KMSAN to see the
memory accesses from the checksum code. Without CONFIG_GENERIC_CSUM the
tools can't see memory accesses originating from handwritten assembly
code.
For KASAN this is a question of detecting more bugs; for KMSAN, using the
C implementation also helps avoid false positives originating from
seemingly uninitialized checksum values.
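
To illustrate why the C version is visible to the tools, here is a
minimal sketch of a one's-complement 16-bit sum (simplified; not the
actual asm-generic implementation, and it assumes an even length):

/*
 * Sketch only: every byte is fetched with an ordinary C load, so
 * compiler instrumentation (KASAN/KMSAN) observes each access, unlike
 * the handwritten csum-partial_64.S code.
 */
static u32 csum_sketch(const u8 *buf, int len)
{
	u32 sum = 0;
	int i;

	for (i = 0; i < len; i += 2)
		sum += (buf[i] << 8) | buf[i + 1];
	while (sum >> 16)	/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}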

Signed-off-by: Alexander Potapenko <glider@google.com>

---

Link: https://linux-review.googlesource.com/id/I3e95247be55b1112af59dbba07e8cbf34e50a581
---
 arch/x86/Kconfig                |  4 ++++
 arch/x86/include/asm/checksum.h | 16 ++++++++++------
 arch/x86/lib/Makefile           |  2 ++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be0b95e51df66..4a5d0a0f54dea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -324,6 +324,10 @@ config GENERIC_ISA_DMA
 	def_bool y
 	depends on ISA_DMA_API
 
+config GENERIC_CSUM
+	bool
+	default y if KMSAN || KASAN
+
 config GENERIC_BUG
 	def_bool y
 	depends on BUG
diff --git a/arch/x86/include/asm/checksum.h b/arch/x86/include/asm/checksum.h
index bca625a60186c..6df6ece8a28ec 100644
--- a/arch/x86/include/asm/checksum.h
+++ b/arch/x86/include/asm/checksum.h
@@ -1,9 +1,13 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#define  _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
-#define HAVE_CSUM_COPY_USER
-#define _HAVE_ARCH_CSUM_AND_COPY
-#ifdef CONFIG_X86_32
-# include <asm/checksum_32.h>
+#ifdef CONFIG_GENERIC_CSUM
+# include <asm-generic/checksum.h>
 #else
-# include <asm/checksum_64.h>
+# define  _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
+# define HAVE_CSUM_COPY_USER
+# define _HAVE_ARCH_CSUM_AND_COPY
+# ifdef CONFIG_X86_32
+#  include <asm/checksum_32.h>
+# else
+#  include <asm/checksum_64.h>
+# endif
 #endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f76747862bd2e..7ba5f61d72735 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -65,7 +65,9 @@ ifneq ($(CONFIG_X86_CMPXCHG64),y)
 endif
 else
         obj-y += iomap_copy_64.o
+ifneq ($(CONFIG_GENERIC_CSUM),y)
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
+endif
         lib-y += clear_page_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
         lib-y += copy_user_64.o
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 39/45] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (37 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 38/45] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 40/45] x86: kmsan: don't instrument stack walking functions Alexander Potapenko
                   ` (5 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel, Andrey Konovalov

dentry_string_cmp() calls read_word_at_a_time(), which might read
uninitialized bytes to optimize string comparisons.
Disabling CONFIG_DCACHE_WORD_ACCESS should prohibit this optimization,
as well as (probably) similar ones.
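
To illustrate the problem (a simplified sketch, not the actual
dentry_string_cmp() code): a word-sized load of a short name fetches
bytes that lie past the terminating NUL and may never have been written:

	/* "foo" occupies bytes 0..3 (including the NUL), but the load
	 * below fetches bytes 0..7, so bytes 4..7 may carry
	 * uninitialized data into the resulting value. That read past
	 * the string is exactly what KMSAN complains about. */
	const char name[4] = "foo";
	unsigned long word = read_word_at_a_time(name);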

Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I4c0073224ac2897cafb8c037362c49dda9cfa133
---
 arch/x86/Kconfig | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4a5d0a0f54dea..aadbb16a59f01 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -129,7 +129,9 @@ config X86
 	select CLKEVT_I8253
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
-	select DCACHE_WORD_ACCESS
+	# Word-size accesses may read uninitialized data past the trailing \0
+	# in strings and cause false KMSAN reports.
+	select DCACHE_WORD_ACCESS		if !KMSAN
 	select DYNAMIC_SIGFRAME
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 40/45] x86: kmsan: don't instrument stack walking functions
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (38 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 39/45] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 41/45] entry: kmsan: introduce kmsan_unpoison_entry_regs() Alexander Potapenko
                   ` (4 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Upon function exit, KMSAN marks local variables as uninitialized.
Further function calls may result in the compiler creating a new stack
frame in the memory where those local variables resided. As a consequence
the frame pointers saved there get marked as uninitialized data, which is
normally correct, because they are not stack-allocated local variables.

However, stack unwinding functions are supposed to read and dereference
these frame pointers, in which case KMSAN might report uses of
uninitialized values.

To work around that, we mark update_stack_state(), unwind_next_frame()
and show_trace_log_lvl() with __no_kmsan_checks, preventing all KMSAN
reports inside those functions and making them return initialized
values.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I7001eaed630277e8d2ddaff1d6f223d54e997a6f
---
 arch/x86/kernel/dumpstack.c    |  6 ++++++
 arch/x86/kernel/unwind_frame.c | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index afae4dd774951..476eb504084e4 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -177,6 +177,12 @@ static void show_regs_if_on_stack(struct stack_info *info, struct pt_regs *regs,
 	}
 }
 
+/*
+ * This function reads pointers from the stack and dereferences them. The
+ * pointers may not have their KMSAN shadow set up properly, which may result
+ * in false positive reports. Disable instrumentation to avoid those.
+ */
+__no_kmsan_checks
 static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, const char *log_lvl)
 {
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 8e1c50c86e5db..d8ba93778ae32 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -183,6 +183,16 @@ static struct pt_regs *decode_frame_pointer(unsigned long *bp)
 }
 #endif
 
+/*
+ * While walking the stack, KMSAN may stomp on stale locals from other
+ * functions that were marked as uninitialized upon function exit, and
+ * now hold the call frame information for the current function (e.g. the frame
+ * pointer). Because KMSAN does not specifically mark call frames as
+ * initialized, false positive reports are possible. To prevent such reports,
+ * we mark the functions scanning the stack (here and below) with
+ * __no_kmsan_checks.
+ */
+__no_kmsan_checks
 static bool update_stack_state(struct unwind_state *state,
 			       unsigned long *next_bp)
 {
@@ -250,6 +260,7 @@ static bool update_stack_state(struct unwind_state *state,
 	return true;
 }
 
+__no_kmsan_checks
 bool unwind_next_frame(struct unwind_state *state)
 {
 	struct pt_regs *regs;
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 41/45] entry: kmsan: introduce kmsan_unpoison_entry_regs()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (39 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 40/45] x86: kmsan: don't instrument stack walking functions Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 42/45] bpf: kmsan: initialize BPF registers with zeroes Alexander Potapenko
                   ` (3 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

The struct pt_regs passed into IRQ entry code is set up by uninstrumented
asm functions, so KMSAN may not notice that the registers are
initialized.

kmsan_unpoison_entry_regs() unpoisons the contents of struct pt_regs,
preventing potential false positives. Unlike kmsan_unpoison_memory(), it
can be called while kmsan_in_runtime() is true, which is often the case
in IRQ entry code.
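
For context, a sketch of why the generic hook is not enough here (based
on the guard described above; simplified):

	/* kmsan_unpoison_memory() bails out inside the runtime: */
	if (!kmsan_enabled || kmsan_in_runtime())
		return;	/* in entry code the unpoisoning would be skipped */

kmsan_unpoison_entry_regs() only checks kmsan_enabled, so the registers
get unpoisoned even when kmsan_in_runtime() is true.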

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/Ibfd7018ac847fd8e5491681f508ba5d14e4669cf
---
 include/linux/kmsan.h | 15 +++++++++++++++
 kernel/entry/common.c |  5 +++++
 mm/kmsan/hooks.c      | 27 +++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index e8b5c306c4aa1..c4412622b9a78 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -246,6 +246,17 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
  */
 void kmsan_handle_urb(const struct urb *urb, bool is_out);
 
+/**
+ * kmsan_unpoison_entry_regs() - Handle pt_regs in low-level entry code.
+ * @regs:	struct pt_regs pointer received from assembly code.
+ *
+ * KMSAN unpoisons the contents of the passed pt_regs, preventing potential
+ * false positive reports. Unlike kmsan_unpoison_memory(),
+ * kmsan_unpoison_entry_regs() can be called from the regions where
+ * kmsan_in_runtime() returns true, which is the case in early entry code.
+ */
+void kmsan_unpoison_entry_regs(const struct pt_regs *regs);
+
 #else
 
 static inline void kmsan_init_shadow(void)
@@ -342,6 +353,10 @@ static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
 {
 }
 
+static inline void kmsan_unpoison_entry_regs(const struct pt_regs *regs)
+{
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_H */
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 032f164abe7ce..055d3bdb0442c 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -5,6 +5,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/highmem.h>
 #include <linux/jump_label.h>
+#include <linux/kmsan.h>
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
@@ -24,6 +25,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
 	user_exit_irqoff();
 
 	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 }
@@ -352,6 +354,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		lockdep_hardirqs_off(CALLER_ADDR0);
 		rcu_irq_enter();
 		instrumentation_begin();
+		kmsan_unpoison_entry_regs(regs);
 		trace_hardirqs_off_finish();
 		instrumentation_end();
 
@@ -367,6 +370,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
 	instrumentation_end();
@@ -452,6 +456,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	rcu_nmi_enter();
 
 	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
 	instrumentation_end();
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 9aecbf2825837..c7528bcbb2f91 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -358,6 +358,33 @@ void kmsan_unpoison_memory(const void *address, size_t size)
 }
 EXPORT_SYMBOL(kmsan_unpoison_memory);
 
+/*
+ * Version of kmsan_unpoison_memory() that can be called from within the KMSAN
+ * runtime.
+ *
+ * Non-instrumented IRQ entry functions receive struct pt_regs from assembly
+ * code. Those regs need to be unpoisoned, otherwise using them will result in
+ * false positives.
+ * Using kmsan_unpoison_memory() is not an option in entry code, because the
+ * return value of in_task() is inconsistent - as a result, certain calls to
+ * kmsan_unpoison_memory() are ignored. kmsan_unpoison_entry_regs() ensures that
+ * the registers are unpoisoned even if kmsan_in_runtime() is true in the early
+ * entry code.
+ */
+void kmsan_unpoison_entry_regs(const struct pt_regs *regs)
+{
+	unsigned long ua_flags;
+
+	if (!kmsan_enabled)
+		return;
+
+	ua_flags = user_access_save();
+	kmsan_internal_unpoison_memory((void *)regs, sizeof(*regs),
+				       KMSAN_POISON_NOCHECK);
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_unpoison_entry_regs);
+
 void kmsan_check_memory(const void *addr, size_t size)
 {
 	if (!kmsan_enabled)
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 42/45] bpf: kmsan: initialize BPF registers with zeroes
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (40 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 41/45] entry: kmsan: introduce kmsan_unpoison_entry_regs() Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
                   ` (2 subsequent siblings)
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

When executing BPF programs, certain registers may get passed
uninitialized to helper functions. For example, when performing a
JMP_CALL, registers BPF_R1-BPF_R5 are always passed to the helper, no
matter how many of them are actually used.

Passing uninitialized values as function parameters is technically
undefined behavior, so we work around it by always initializing the
registers.
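
A minimal illustration of the call site in question (simplified from the
___bpf_prog_run() interpreter; BPF_R1-BPF_R5 are macros over the regs[]
array):

	/* JMP_CALL forwards all five argument registers, even if the
	 * helper only consumes some of them: */
	BPF_R0 = (__bpf_call_base + insn->imm)(BPF_R1, BPF_R2,
					       BPF_R3, BPF_R4, BPF_R5);

If, say, BPF_R4 and BPF_R5 were never written by the program, the call
reads indeterminate values, hence the `= {}` initializer below.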

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I40f39d26232b14816c14ba64a0ea4a8f336f2675
---
 kernel/bpf/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5f6f3f829b368..0ba7dd90a2ab3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2039,7 +2039,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \
 { \
 	u64 stack[stack_size / sizeof(u64)]; \
-	u64 regs[MAX_BPF_EXT_REG]; \
+	u64 regs[MAX_BPF_EXT_REG] = {}; \
 \
 	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
 	ARG1 = (u64) (unsigned long) ctx; \
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (41 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 42/45] bpf: kmsan: initialize BPF registers with zeroes Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-02 17:23   ` Linus Torvalds
  2022-08-08 16:37   ` Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface Alexander Potapenko
  2022-07-01 14:23 ` [PATCH v4 45/45] x86: kmsan: enable KMSAN builds for x86 Alexander Potapenko
  44 siblings, 2 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel, Evgenii Stepanov, Linus Torvalds,
	Nathan Chancellor, Nick Desaulniers, Segher Boessenkool,
	Vitaly Buka, linux-toolchains

Under certain circumstances the initialization of `unsigned seq` and
`struct inode *inode` passed into step_into() may be skipped.
In particular, if the call to lookup_fast() in walk_component()
returns NULL, and lookup_slow() returns a valid dentry, then `seq` and
`inode` will remain uninitialized until the call to step_into() (see [1]
for more info).

Right now step_into() does not use these uninitialized values, yet
passing uninitialized values to functions is considered undefined
behavior (see [2]). To fix that, we initialize `seq` and `inode` at
their definition.

[1] https://github.com/ClangBuiltLinux/linux/issues/1648#issuecomment-1146608063
[2] https://lore.kernel.org/linux-toolchains/CAHk-=whjz3wO8zD+itoerphWem+JZz4uS3myf6u1Wd6epGRgmQ@mail.gmail.com/
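
Roughly, the problematic path in walk_component() looks like this
(simplified sketch of the flow described in [1]):

	struct inode *inode;	/* uninitialized */
	unsigned seq;		/* uninitialized */

	dentry = lookup_fast(nd, &inode, &seq);	/* NULL: nothing written */
	if (!dentry)
		dentry = lookup_slow(&nd->last, nd->path.dentry,
				     nd->flags);  /* doesn't touch inode/seq */
	return step_into(nd, flags, dentry, inode, seq);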

Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Buka <vitalybuka@google.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-toolchains@vger.kernel.org
Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I94d4e8cc1f0ecc7174659e9506ce96aaf2201d0a
---
 fs/namei.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3b..6b39dfd3b41bc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1995,8 +1995,8 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
+	struct inode *inode = NULL;
+	unsigned seq = 0;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -3393,8 +3393,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
+	unsigned seq = 0;
+	struct inode *inode = NULL;
 	struct dentry *dentry;
 	const char *res;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (42 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-04 20:07   ` Matthew Wilcox
  2022-07-01 14:23 ` [PATCH v4 45/45] x86: kmsan: enable KMSAN builds for x86 Alexander Potapenko
  44 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Functions implementing the a_ops->write_end() interface accept a
`void *fsdata` parameter that is supposed to be initialized by the
corresponding a_ops->write_begin() (which accepts `void **fsdata`).

However, not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to
a_ops->write_end(), resulting in undefined behavior.

Fix this by initializing `fsdata` with NULL before the call to
write_begin(), rather than doing so in all possible a_ops
implementations.

This patch covers only the following cases found by running x86 KMSAN
under syzkaller:

 - generic_perform_write()
 - cont_expand_zero() and generic_cont_expand_simple()
 - page_symlink()

Other cases of passing uninitialized fsdata may persist in the codebase.
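
For reference, the calling contract now looks like this (sketch only;
arguments abbreviated, and exact signatures vary between kernel
versions):

	void *fsdata = NULL;	/* initialized at the call site */

	err = aops->write_begin(file, mapping, pos, len, &page, &fsdata);
	if (err)
		return err;
	/* ... copy data into the page ... */
	err = aops->write_end(file, mapping, pos, len, copied, page,
			      fsdata);	/* fsdata is never indeterminate */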

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I414f0ee3a164c9c335d91d82ce4558f6f2841471
---
 fs/buffer.c  | 4 ++--
 fs/namei.c   | 2 +-
 mm/filemap.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 898c7f301b1b9..d014009cff941 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2349,7 +2349,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
 	struct address_space *mapping = inode->i_mapping;
 	const struct address_space_operations *aops = mapping->a_ops;
 	struct page *page;
-	void *fsdata;
+	void *fsdata = NULL;
 	int err;
 
 	err = inode_newsize_ok(inode, size);
@@ -2375,7 +2375,7 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 	const struct address_space_operations *aops = mapping->a_ops;
 	unsigned int blocksize = i_blocksize(inode);
 	struct page *page;
-	void *fsdata;
+	void *fsdata = NULL;
 	pgoff_t index, curidx;
 	loff_t curpos;
 	unsigned zerofrom, offset, len;
diff --git a/fs/namei.c b/fs/namei.c
index 6b39dfd3b41bc..5e3ff9d65f502 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5051,7 +5051,7 @@ int page_symlink(struct inode *inode, const char *symname, int len)
 	const struct address_space_operations *aops = mapping->a_ops;
 	bool nofs = !mapping_gfp_constraint(mapping, __GFP_FS);
 	struct page *page;
-	void *fsdata;
+	void *fsdata = NULL;
 	int err;
 	unsigned int flags;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index ffdfbc8b0e3ca..72467f00f1916 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3753,7 +3753,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		unsigned long offset;	/* Offset into pagecache page */
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
-		void *fsdata;
+		void *fsdata = NULL;
 
 		offset = (pos & (PAGE_SIZE - 1));
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH v4 45/45] x86: kmsan: enable KMSAN builds for x86
  2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
                   ` (43 preceding siblings ...)
  2022-07-01 14:23 ` [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface Alexander Potapenko
@ 2022-07-01 14:23 ` Alexander Potapenko
  44 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel

Make KMSAN usable by adding the necessary Kconfig bits.

Also declare x86-specific functions checking address validity
in arch/x86/include/asm/kmsan.h.
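
For illustration, a sketch of how the runtime can use such a helper
before touching metadata (hypothetical caller; kmsan_get_metadata() and
KMSAN_META_SHADOW stand in for the runtime's actual lookup from the core
patch):

	/* Bail out early when an address has no valid struct page,
	 * without recursing into instrumented helpers: */
	if (!kmsan_virt_addr_valid(addr))
		return NULL;
	return kmsan_get_metadata(addr, KMSAN_META_SHADOW);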

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v4:
 -- per Marco Elver's request, create arch/x86/include/asm/kmsan.h
    and move arch-specific inline functions there.

Link: https://linux-review.googlesource.com/id/I1d295ce8159ce15faa496d20089d953a919c125e
---
 arch/x86/Kconfig             |  1 +
 arch/x86/include/asm/kmsan.h | 55 ++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 arch/x86/include/asm/kmsan.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aadbb16a59f01..d1a601111b277 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -169,6 +169,7 @@ config X86
 	select HAVE_ARCH_KASAN			if X86_64
 	select HAVE_ARCH_KASAN_VMALLOC		if X86_64
 	select HAVE_ARCH_KFENCE
+	select HAVE_ARCH_KMSAN			if X86_64
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
diff --git a/arch/x86/include/asm/kmsan.h b/arch/x86/include/asm/kmsan.h
new file mode 100644
index 0000000000000..a790b865d0a68
--- /dev/null
+++ b/arch/x86/include/asm/kmsan.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * x86 KMSAN support.
+ *
+ * Copyright (C) 2022, Google LLC
+ * Author: Alexander Potapenko <glider@google.com>
+ */
+
+#ifndef _ASM_X86_KMSAN_H
+#define _ASM_X86_KMSAN_H
+
+#ifndef MODULE
+
+#include <asm/processor.h>
+#include <linux/mmzone.h>
+
+/*
+ * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
+ */
+static inline bool kmsan_phys_addr_valid(unsigned long addr)
+{
+	if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+		return !(addr >> boot_cpu_data.x86_phys_bits);
+	else
+		return true;
+}
+
+/*
+ * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
+ */
+static inline bool kmsan_virt_addr_valid(void *addr)
+{
+	unsigned long x = (unsigned long)addr;
+	unsigned long y = x - __START_KERNEL_map;
+
+	/* use the carry flag to determine if x was < __START_KERNEL_map */
+	if (unlikely(x > y)) {
+		x = y + phys_base;
+
+		if (y >= KERNEL_IMAGE_SIZE)
+			return false;
+	} else {
+		x = y + (__START_KERNEL_map - PAGE_OFFSET);
+
+		/* carry flag will be set if starting x was >= PAGE_OFFSET */
+		if ((x > y) || !kmsan_phys_addr_valid(x))
+			return false;
+	}
+
+	return pfn_valid(x >> PAGE_SHIFT);
+}
+
+#endif /* !MODULE */
+
+#endif /* _ASM_X86_KMSAN_H */
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
@ 2022-07-02  0:18   ` Hillf Danton
  2022-08-03 17:25     ` Alexander Potapenko
  2022-07-11 16:49   ` Marco Elver
  2022-07-13 10:04   ` Marco Elver
  2 siblings, 1 reply; 147+ messages in thread
From: Hillf Danton @ 2022-07-02  0:18 UTC (permalink / raw)
  To: Alexander Potapenko; +Cc: linux-mm, linux-kernel

On Fri,  1 Jul 2022 16:22:36 +0200 Alexander Potapenko wrote:
> +
> +bool kmsan_internal_is_module_addr(void *vaddr)
> +{
> +	return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
> +}
> +
> +bool kmsan_internal_is_vmalloc_addr(void *addr)
> +{
> +	return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
> +}

Given is_vmalloc_addr(), feel free to add a one-line comment showing the
reason for adding the kmsan internal version.
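
Something along these lines, perhaps (a sketch of the suggested comment;
the stated reason is an assumption based on the runtime's constraints):

/*
 * Non-instrumented version of is_vmalloc_addr() that is safe to call
 * from within the KMSAN runtime without recursing into it.
 */
bool kmsan_internal_is_vmalloc_addr(void *addr)
{
	return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
}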

Hillf


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
@ 2022-07-02  3:47   ` kernel test robot
  2022-07-15 14:03       ` Alexander Potapenko
  2022-07-02 10:45   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 147+ messages in thread
From: kernel test robot @ 2022-07-02  3:47 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: kbuild-all, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Linux Memory Management List, Andrey Konovalov, Andy Lutomirski,
	Arnd Bergmann, Borislav Petkov, Christoph Hellwig,
	Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Dumazet,
	Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar,
	Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner

Hi Alexander,

I love your patch! Perhaps something to improve:

[auto build test WARNING on masahiroy-kbuild/for-next]
[also build test WARNING on linus/master v5.19-rc4 next-20220701]
[cannot apply to tip/x86/core tip/x86/mm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220702/202207021129.palrTLrL-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
        git checkout 0ca0e4029535365a65588446ba55a952ca186079
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/kernel/ mm/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
   arch/x86/kernel/signal.c:360:9: sparse:     expected void const volatile [noderef] __user *ptr
   arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>> arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
   arch/x86/kernel/signal.c:360:9: sparse:     expected void [noderef] __user *to
   arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
   arch/x86/kernel/signal.c:420:9: sparse:     expected void const volatile [noderef] __user *ptr
   arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
   arch/x86/kernel/signal.c:420:9: sparse:     expected void [noderef] __user *to
   arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/kernel/signal.c:953:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
   arch/x86/kernel/signal.c:953:9: sparse:     expected struct lockdep_map const *lock
   arch/x86/kernel/signal.c:953:9: sparse:     got struct lockdep_map [noderef] __rcu *

vim +360 arch/x86/kernel/signal.c

75779f05264b996 arch/x86/kernel/signal.c    Hiroshi Shimamoto 2009-02-27  325  
7e907f48980d666 arch/x86/kernel/signal_32.c Ingo Molnar       2008-03-06  326  static int
235b80226b986da arch/x86/kernel/signal.c    Al Viro           2012-11-09  327  __setup_frame(int sig, struct ksignal *ksig, sigset_t *set,
7e907f48980d666 arch/x86/kernel/signal_32.c Ingo Molnar       2008-03-06  328  	      struct pt_regs *regs)
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  329  {
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  330  	struct sigframe __user *frame;
7e907f48980d666 arch/x86/kernel/signal_32.c Ingo Molnar       2008-03-06  331  	void __user *restorer;
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  332  	void __user *fp = NULL;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  333  
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  334  	frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fp);
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  335  
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15  336  	if (!user_access_begin(frame, sizeof(*frame)))
3d0aedd9538e6be arch/x86/kernel/signal_32.c Hiroshi Shimamoto 2008-09-12  337  		return -EFAULT;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  338  
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15  339  	unsafe_put_user(sig, &frame->sig, Efault);
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  340  	unsafe_put_sigcontext(&frame->sc, fp, regs, set, Efault);
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15  341  	unsafe_put_user(set->sig[1], &frame->extramask[0], Efault);
1a3e4ca41c5a389 arch/x86/kernel/signal_32.c Roland McGrath    2008-04-09  342  	if (current->mm->context.vdso)
6f121e548f83674 arch/x86/kernel/signal.c    Andy Lutomirski   2014-05-05  343  		restorer = current->mm->context.vdso +
0a6d1fa0d2b48fb arch/x86/kernel/signal.c    Andy Lutomirski   2015-10-05  344  			vdso_image_32.sym___kernel_sigreturn;
9fbbd4dd17d0712 arch/i386/kernel/signal.c   Andi Kleen        2007-02-13  345  	else
ade1af77129dea6 arch/x86/kernel/signal_32.c Jan Engelhardt    2008-01-30  346  		restorer = &frame->retcode;
235b80226b986da arch/x86/kernel/signal.c    Al Viro           2012-11-09  347  	if (ksig->ka.sa.sa_flags & SA_RESTORER)
235b80226b986da arch/x86/kernel/signal.c    Al Viro           2012-11-09  348  		restorer = ksig->ka.sa.sa_restorer;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  349  
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  350  	/* Set up to return from userspace.  */
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15  351  	unsafe_put_user(restorer, &frame->pretcode, Efault);
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  352  
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  353  	/*
7e907f48980d666 arch/x86/kernel/signal_32.c Ingo Molnar       2008-03-06  354  	 * This is popl %eax ; movl $__NR_sigreturn, %eax ; int $0x80
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  355  	 *
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  356  	 * WE DO NOT USE IT ANY MORE! It's only left here for historical
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  357  	 * reasons and because gdb uses it as a signature to notice
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  358  	 * signal handler stack frames.
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  359  	 */
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15 @360  	unsafe_put_user(*((u64 *)&retcode), (u64 *)frame->retcode, Efault);
5c1f178094631e8 arch/x86/kernel/signal.c    Al Viro           2020-02-15  361  	user_access_end();
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  362  
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  363  	/* Set up registers for signal handler */
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  364  	regs->sp = (unsigned long)frame;
235b80226b986da arch/x86/kernel/signal.c    Al Viro           2012-11-09  365  	regs->ip = (unsigned long)ksig->ka.sa.sa_handler;
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  366  	regs->ax = (unsigned long)sig;
92bc2056855b325 arch/x86/kernel/signal_32.c Harvey Harrison   2008-02-08  367  	regs->dx = 0;
92bc2056855b325 arch/x86/kernel/signal_32.c Harvey Harrison   2008-02-08  368  	regs->cx = 0;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  369  
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  370  	regs->ds = __USER_DS;
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  371  	regs->es = __USER_DS;
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  372  	regs->ss = __USER_DS;
65ea5b034990358 arch/x86/kernel/signal_32.c H. Peter Anvin    2008-01-30  373  	regs->cs = __USER_CS;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  374  
283828f3c19ceb3 arch/i386/kernel/signal.c   David Howells     2006-01-18  375  	return 0;
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  376  
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  377  Efault:
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  378  	user_access_end();
b00d8f8f0b2b392 arch/x86/kernel/signal.c    Al Viro           2020-02-15  379  	return -EFAULT;
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  380  }
^1da177e4c3f415 arch/i386/kernel/signal.c   Linus Torvalds    2005-04-16  381  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
  2022-07-02  3:47   ` kernel test robot
@ 2022-07-02 10:45   ` kernel test robot
  2022-07-15 16:44       ` Alexander Potapenko
  2022-07-02 13:09   ` kernel test robot
  2022-07-07 10:13   ` Marco Elver
  3 siblings, 1 reply; 147+ messages in thread
From: kernel test robot @ 2022-07-02 10:45 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: llvm, kbuild-all, Alexander Viro, Alexei Starovoitov,
	Andrew Morton, Linux Memory Management List, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner

Hi Alexander,

I love your patch! Yet something to improve:

[auto build test ERROR on masahiroy-kbuild/for-next]
[also build test ERROR on linus/master v5.19-rc4 next-20220701]
[cannot apply to tip/x86/core tip/x86/mm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
config: i386-randconfig-a011 (https://download.01.org/0day-ci/archive/20220702/202207021844.0J4s1Gjz-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a9119143a2d1f4d0d0bc1fe0d819e5351b4e0deb)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
        git checkout 0ca0e4029535365a65588446ba55a952ca186079
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
                                                    ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:109:42: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
                                                   ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
                                                    ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:109:42: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
                                                   ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
                                                    ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                       ~~~~~~~~~~~~~~~~~^~~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
>> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
                   FPU_get_user(instruction_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:109:42: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
                                                   ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                       ~~~~~~~~~~~~~~~~~^~~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
   arch/x86/math-emu/reg_ld_str.c:1047:3: error: address of bit-field requested
                   FPU_get_user(operand_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
                                                    ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   arch/x86/math-emu/reg_ld_str.c:1047:3: error: address of bit-field requested
                   FPU_get_user(operand_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:109:42: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
                                                   ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   arch/x86/math-emu/reg_ld_str.c:1047:3: error: address of bit-field requested
                   FPU_get_user(operand_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
                                                    ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   arch/x86/math-emu/reg_ld_str.c:1047:3: error: address of bit-field requested
                   FPU_get_user(operand_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
                                             ^
   arch/x86/include/asm/uaccess.h:109:42: note: expanded from macro 'do_get_user_call'
           instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
                                                   ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   arch/x86/math-emu/reg_ld_str.c:1047:3: error: address of bit-field requested
                   FPU_get_user(operand_address.selector,
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
   #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
                                  ~~~~^~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
   #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })


vim +1043 arch/x86/math-emu/reg_ld_str.c

^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1026  
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1027  u_char __user *fldenv(fpu_addr_modes addr_modes, u_char __user *s)
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1028  {
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1029  	unsigned short tag_word = 0;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1030  	u_char tag;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1031  	int i;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1032  
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1033  	if ((addr_modes.default_mode == VM86) ||
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1034  	    ((addr_modes.default_mode == PM16)
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1035  	     ^ (addr_modes.override.operand_size == OP_SIZE_PREFIX))) {
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1036  		RE_ENTRANT_CHECK_OFF;
96d4f267e40f950 arch/x86/math-emu/reg_ld_str.c  Linus Torvalds 2019-01-03  1037  		FPU_access_ok(s, 0x0e);
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1038  		FPU_get_user(control_word, (unsigned short __user *)s);
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1039  		FPU_get_user(partial_status, (unsigned short __user *)(s + 2));
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1040  		FPU_get_user(tag_word, (unsigned short __user *)(s + 4));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1041  		FPU_get_user(instruction_address.offset,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1042  			     (unsigned short __user *)(s + 6));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30 @1043  		FPU_get_user(instruction_address.selector,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1044  			     (unsigned short __user *)(s + 8));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1045  		FPU_get_user(operand_address.offset,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1046  			     (unsigned short __user *)(s + 0x0a));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1047  		FPU_get_user(operand_address.selector,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1048  			     (unsigned short __user *)(s + 0x0c));
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1049  		RE_ENTRANT_CHECK_ON;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1050  		s += 0x0e;
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1051  		if (addr_modes.default_mode == VM86) {
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1052  			instruction_address.offset
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1053  			    += (instruction_address.selector & 0xf000) << 4;
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1054  			operand_address.offset +=
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1055  			    (operand_address.selector & 0xf000) << 4;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1056  		}
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1057  	} else {
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1058  		RE_ENTRANT_CHECK_OFF;
96d4f267e40f950 arch/x86/math-emu/reg_ld_str.c  Linus Torvalds 2019-01-03  1059  		FPU_access_ok(s, 0x1c);
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1060  		FPU_get_user(control_word, (unsigned short __user *)s);
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1061  		FPU_get_user(partial_status, (unsigned short __user *)(s + 4));
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1062  		FPU_get_user(tag_word, (unsigned short __user *)(s + 8));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1063  		FPU_get_user(instruction_address.offset,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1064  			     (unsigned long __user *)(s + 0x0c));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1065  		FPU_get_user(instruction_address.selector,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1066  			     (unsigned short __user *)(s + 0x10));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1067  		FPU_get_user(instruction_address.opcode,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1068  			     (unsigned short __user *)(s + 0x12));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1069  		FPU_get_user(operand_address.offset,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1070  			     (unsigned long __user *)(s + 0x14));
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1071  		FPU_get_user(operand_address.selector,
3d0d14f983b55a5 arch/x86/math-emu/reg_ld_str.c  Ingo Molnar    2008-01-30  1072  			     (unsigned long __user *)(s + 0x18));
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1073  		RE_ENTRANT_CHECK_ON;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1074  		s += 0x1c;
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1075  	}
^1da177e4c3f415 arch/i386/math-emu/reg_ld_str.c Linus Torvalds 2005-04-16  1076  
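
The diagnostics above come down to a C language rule rather than a
miscompilation: 'selector' in the FPU emulator's struct address is
declared as a bit-field, and the unary '&' operator cannot be applied
to a bit-field. The instrumented do_get_user_call() now evaluates
(void *)&(x) when calling instrument_copy_from_user_before()/after(),
so any get_user() whose destination is a bit-field no longer compiles.
A minimal sketch of the constraint and one way around it (illustrative
userspace C with a made-up helper; memcpy() stands in for the
instrumentation hook, and the struct layout is only modeled on the FPU
emulator's):

	#include <string.h>

	struct address {
		unsigned int offset;
		unsigned int selector:16;	/* bit-field destination */
	};

	static void load_selector(struct address *a, const void *src)
	{
		unsigned short tmp;

		/* memcpy(&a->selector, src, 2); would be rejected:
		 * "address of bit-field requested" */

		/* Reading into a full-width temporary sidesteps the rule:
		 * '&' is only ever applied to tmp, never to the bit-field. */
		memcpy(&tmp, src, sizeof(tmp));
		a->selector = tmp;
	}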

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
  2022-07-02  3:47   ` kernel test robot
  2022-07-02 10:45   ` kernel test robot
@ 2022-07-02 13:09   ` kernel test robot
  2022-07-07 10:13   ` Marco Elver
  3 siblings, 0 replies; 147+ messages in thread
From: kernel test robot @ 2022-07-02 13:09 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: kbuild-all, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Linux Memory Management List, Andrey Konovalov, Andy Lutomirski,
	Arnd Bergmann, Borislav Petkov, Christoph Hellwig,
	Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Dumazet,
	Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar,
	Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 20687 bytes --]

Hi Alexander,

I love your patch! Perhaps something to improve:

[auto build test WARNING on masahiroy-kbuild/for-next]
[also build test WARNING on linus/master v5.19-rc4 next-20220701]
[cannot apply to tip/x86/core tip/x86/mm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
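
(Illustrative only, assuming the series sits on a branch tracking that
tree; '--base=auto' records the base commit in the generated mails:

        git format-patch --base=auto -v4 --cover-letter -45

Adjust the revision range to match the actual series.)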

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
config: i386-randconfig-s002
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
        git checkout 0ca0e4029535365a65588446ba55a952ca186079
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   kernel/signal.c: note: in included file (through arch/x86/include/uapi/asm/signal.h, arch/x86/include/asm/signal.h, include/uapi/linux/signal.h, ...):
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:195:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:195:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:195:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:198:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:198:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:198:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:480:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:480:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:480:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:484:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:484:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:484:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:517:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:517:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:517:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:520:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:520:36: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:520:36: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:542:53: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct k_sigaction *ka @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:542:53: sparse:     expected struct k_sigaction *ka
   kernel/signal.c:542:53: sparse:     got struct k_sigaction [noderef] __rcu *
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:698:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:698:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:698:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:700:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:700:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:700:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:765:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
   kernel/signal.c:765:9: sparse:     expected struct lockdep_map const *lock
   kernel/signal.c:765:9: sparse:     got struct lockdep_map [noderef] __rcu *
   kernel/signal.c:890:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
   kernel/signal.c:890:9: sparse:     expected struct lockdep_map const *lock
   kernel/signal.c:890:9: sparse:     got struct lockdep_map [noderef] __rcu *
   kernel/signal.c:1086:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
   kernel/signal.c:1086:9: sparse:     expected struct lockdep_map const *lock
   kernel/signal.c:1086:9: sparse:     got struct lockdep_map [noderef] __rcu *
   kernel/signal.c:1267:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned char * @@
   kernel/signal.c:1267:29: sparse:     expected void const volatile [noderef] __user *ptr
   kernel/signal.c:1267:29: sparse:     got unsigned char *
>> kernel/signal.c:1267:29: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected void const [noderef] __user *from @@     got unsigned char * @@
   kernel/signal.c:1267:29: sparse:     expected void const [noderef] __user *from
   kernel/signal.c:1267:29: sparse:     got unsigned char *
>> kernel/signal.c:1267:29: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected void const [noderef] __user *from @@     got unsigned char * @@
   kernel/signal.c:1267:29: sparse:     expected void const [noderef] __user *from
   kernel/signal.c:1267:29: sparse:     got unsigned char *
   kernel/signal.c:1328:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1328:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1328:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1329:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *action @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:1329:16: sparse:     expected struct k_sigaction *action
   kernel/signal.c:1329:16: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:1349:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1349:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1349:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1938:36: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1938:36: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1938:36: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2048:44: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2067:65: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2067:65: sparse:     expected struct task_struct *tsk
   kernel/signal.c:2067:65: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2068:40: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2086:14: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *psig @@     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand @@
   kernel/signal.c:2086:14: sparse:     expected struct sighand_struct *psig
   kernel/signal.c:2086:14: sparse:     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand
   kernel/signal.c:2115:53: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct task_struct *t @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2115:53: sparse:     expected struct task_struct *t
   kernel/signal.c:2115:53: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2116:34: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2116:34: sparse:     expected struct task_struct *parent
   kernel/signal.c:2116:34: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2145:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2145:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2145:24: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2148:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2148:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2148:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2181:17: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2181:17: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2181:17: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2221:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2221:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2221:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2223:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2223:39: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2223:39: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2280:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2280:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2280:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2315:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2315:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2315:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2355:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2355:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2355:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2357:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2357:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2357:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2457:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2457:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2457:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2541:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2541:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2541:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2553:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2553:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2553:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2588:52: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2588:52: sparse:     expected struct task_struct *tsk
   kernel/signal.c:2588:52: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2590:49: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2628:49: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2628:49: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2628:49: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2957:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2957:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2957:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2977:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2977:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2977:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3044:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3044:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3044:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3046:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3046:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3046:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3197:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3197:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3197:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3200:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3200:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3200:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3589:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3589:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3589:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3601:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3601:37: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3601:37: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3606:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3606:35: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3606:35: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3611:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3611:29: sparse:     expected struct spinlock [usertype] *lock

vim +1267 kernel/signal.c

7978b567d31555 Sukadev Bhattiprolu 2009-04-02  1254  
4aaefee589f97b Al Viro             2012-11-05  1255  static void print_fatal_signal(int signr)
45807a1df9f51d Ingo Molnar         2007-07-15  1256  {
4aaefee589f97b Al Viro             2012-11-05  1257  	struct pt_regs *regs = signal_pt_regs();
747800efbe8b98 Wang Xiaoqiang      2016-05-23  1258  	pr_info("potentially unexpected fatal signal %d.\n", signr);
45807a1df9f51d Ingo Molnar         2007-07-15  1259  
ca5cd877ae699e Al Viro             2007-10-29  1260  #if defined(__i386__) && !defined(__arch_um__)
747800efbe8b98 Wang Xiaoqiang      2016-05-23  1261  	pr_info("code at %08lx: ", regs->ip);
45807a1df9f51d Ingo Molnar         2007-07-15  1262  	{
45807a1df9f51d Ingo Molnar         2007-07-15  1263  		int i;
45807a1df9f51d Ingo Molnar         2007-07-15  1264  		for (i = 0; i < 16; i++) {
45807a1df9f51d Ingo Molnar         2007-07-15  1265  			unsigned char insn;
45807a1df9f51d Ingo Molnar         2007-07-15  1266  
b45c6e76bc2c72 Andi Kleen          2010-01-08 @1267  			if (get_user(insn, (unsigned char *)(regs->ip + i)))
b45c6e76bc2c72 Andi Kleen          2010-01-08  1268  				break;
747800efbe8b98 Wang Xiaoqiang      2016-05-23  1269  			pr_cont("%02x ", insn);
45807a1df9f51d Ingo Molnar         2007-07-15  1270  		}
45807a1df9f51d Ingo Molnar         2007-07-15  1271  	}
747800efbe8b98 Wang Xiaoqiang      2016-05-23  1272  	pr_cont("\n");
45807a1df9f51d Ingo Molnar         2007-07-15  1273  #endif
3a9f84d354ce1e Ed Swierk           2009-01-26  1274  	preempt_disable();
45807a1df9f51d Ingo Molnar         2007-07-15  1275  	show_regs(regs);
3a9f84d354ce1e Ed Swierk           2009-01-26  1276  	preempt_enable();
45807a1df9f51d Ingo Molnar         2007-07-15  1277  }
45807a1df9f51d Ingo Molnar         2007-07-15  1278  
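
The two warnings marked '>>' are new because
instrument_copy_from_user_before()/after() declare their source
argument as 'const void __user *', while the cast at kernel/signal.c:1267
drops the __user address-space annotation that sparse tracks. A hedged
sketch of the kind of change that would quiet the checker (illustrative;
the maintainers may prefer a different fix for this call site):

	-		if (get_user(insn, (unsigned char *)(regs->ip + i)))
	+		if (get_user(insn, (unsigned char __user *)(regs->ip + i)))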

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 130741 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 5.19.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-11 (Debian 11.3.0-3) 11.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=110300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
CONFIG_BPF_JIT=y
# end of BPF subsystem

CONFIG_PREEMPT_VOLUNTARY_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
# CONFIG_PREEMPT_DYNAMIC is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_FORCE_TASKS_RCU=y
CONFIG_TASKS_RCU=y
CONFIG_FORCE_TASKS_RUDE_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_FORCE_TASKS_TRACE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_TASKS_TRACE_RCU_READ_MB=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
# CONFIG_NAMESPACES is not set
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
# CONFIG_PCSPKR_PLATFORM is not set
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
# CONFIG_IO_URING is not set
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
CONFIG_PC104=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_PERF_USE_VMALLOC=y
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_NR_GPIO=512
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
# CONFIG_X86_MPPARSE is not set
# CONFIG_GOLDFISH is not set
# CONFIG_RETPOLINE is not set
CONFIG_CC_HAS_SLS=y
# CONFIG_X86_CPU_RESCTRL is not set
# CONFIG_X86_EXTENDED_PLATFORM is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_32_IRIS=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_DEBUG=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
CONFIG_PVH=y
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_M486SX is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MELAN is not set
CONFIG_MGEODEGX1=y
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
# CONFIG_PROCESSOR_SELECT is not set
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_CYRIX_32=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
CONFIG_CPU_SUP_UMC_32=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_CPU_SUP_VORTEX_32=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_DMI is not set
CONFIG_NR_CPUS_RANGE_BEGIN=1
CONFIG_NR_CPUS_RANGE_END=1
CONFIG_NR_CPUS_DEFAULT=1
CONFIG_NR_CPUS=1
CONFIG_UP_LATE_INIT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
# CONFIG_X86_MCE is not set

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
# CONFIG_PERF_EVENTS_INTEL_RAPL is not set
# CONFIG_PERF_EVENTS_INTEL_CSTATE is not set
# CONFIG_PERF_EVENTS_AMD_POWER is not set
# CONFIG_PERF_EVENTS_AMD_UNCORE is not set
# CONFIG_PERF_EVENTS_AMD_BRS is not set
# end of Performance monitoring

# CONFIG_X86_LEGACY_VM86 is not set
# CONFIG_X86_IOPL_IOPERM is not set
CONFIG_TOSHIBA=m
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
# CONFIG_MICROCODE_LATE_LOADING is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_VMSPLIT_3G is not set
# CONFIG_VMSPLIT_3G_OPT is not set
CONFIG_VMSPLIT_2G=y
# CONFIG_VMSPLIT_2G_OPT is not set
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0x80000000
CONFIG_HIGHMEM=y
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_HIGHPTE=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
# CONFIG_MTRR is not set
CONFIG_ARCH_RANDOM=y
CONFIG_X86_UMIP=y
CONFIG_CC_HAS_IBT=y
# CONFIG_X86_INTEL_TSX_MODE_OFF is not set
CONFIG_X86_INTEL_TSX_MODE_ON=y
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_EFI is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
# CONFIG_MODIFY_LDT_SYSCALL is not set
# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
# end of Processor type and features

CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
# CONFIG_SUSPEND is not set
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_AUTOSLEEP=y
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_DPM_WATCHDOG=y
CONFIG_DPM_WATCHDOG_TIMEOUT=120
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_PM_CLK=y
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
CONFIG_ACPI_DEBUGGER=y
CONFIG_ACPI_DEBUGGER_USER=m
CONFIG_ACPI_SPCR_TABLE=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_PROCESSOR_AGGREGATOR=y
# CONFIG_ACPI_THERMAL is not set
CONFIG_ACPI_PLATFORM_PROFILE=y
CONFIG_ACPI_CUSTOM_DSDT_FILE=""
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_TABLE_UPGRADE is not set
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_CUSTOM_METHOD is not set
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
# CONFIG_ACPI_APEI is not set
# CONFIG_ACPI_DPTF is not set
CONFIG_ACPI_CONFIGFS=y
CONFIG_ACPI_PCC=y
# CONFIG_PMIC_OPREGION is not set
CONFIG_ACPI_VIOT=y
CONFIG_X86_PM_TIMER=y
CONFIG_X86_APM_BOOT=y
CONFIG_APM=m
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
# CONFIG_APM_CPU_IDLE is not set
# CONFIG_APM_DISPLAY_BLANK is not set
CONFIG_APM_ALLOW_INTS=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
CONFIG_CPU_IDLE_GOV_HALTPOLL=y
# CONFIG_HALTPOLL_CPUIDLE is not set
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
# CONFIG_PCI_GOBIOS is not set
CONFIG_PCI_GOMMCONFIG=y
# CONFIG_PCI_GODIRECT is not set
# CONFIG_PCI_GOANY is not set
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
# CONFIG_ISA_BUS is not set
CONFIG_ISA_DMA_API=y
# CONFIG_ISA is not set
CONFIG_SCx200=m
# CONFIG_SCx200HR_TIMER is not set
# CONFIG_OLPC is not set
CONFIG_ALIX=y
# CONFIG_NET5501 is not set
CONFIG_AMD_NB=y
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_COMPAT_32=y
# end of Binary Emulations

CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
# CONFIG_KVM is not set
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_GENERIC_ENTRY=y
# CONFIG_KPROBES is not set
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_OFF_T=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
# CONFIG_STACKPROTECTOR is not set
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_REL=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=8
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_ISA_BUS_API=y
CONFIG_CLONE_BACKWARDS=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_LOCK_EVENT_COUNTS=y
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SPLIT_ARG64=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
CONFIG_GCC_PLUGINS=y
# CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=1
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=y
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_ICQ=y
# CONFIG_BLK_DEV_BSGLIB is not set
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_WBT=y
# CONFIG_BLK_WBT_MQ is not set
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_AMIGA_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_PM=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
# end of IO Schedulers

CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
# CONFIG_COREDUMP is not set
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SWAP=y
# CONFIG_ZSWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
CONFIG_SLAB_FREELIST_HARDENED=y
# CONFIG_SLUB_STATS is not set
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
CONFIG_COMPAT_BRK=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
# CONFIG_BALLOON_COMPACTION is not set
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
# CONFIG_BOUNCE is not set
CONFIG_VIRT_TO_BUS=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
CONFIG_CMA_DEBUGFS=y
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=7
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_PAGE_IDLE_FLAG=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ARCH_HAS_VM_GET_PAGE_PROT=y
CONFIG_ARCH_HAS_ZONE_DMA_SET=y
# CONFIG_ZONE_DMA is not set
CONFIG_VMAP_PFN=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_MAPPING_DIRTY_HELPERS=y
CONFIG_KMAP_LOCAL=y
CONFIG_SECRETMEM=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
CONFIG_UNIX_DIAG=y
# CONFIG_TLS is not set
# CONFIG_XFRM_USER is not set
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_NET_IP_TUNNEL=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_NET_IPVTI is not set
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
CONFIG_INET_TUNNEL=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_INET_UDP_DIAG is not set
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_IPV6_VTI is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_NETLABEL is not set
# CONFIG_MPTCP is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
CONFIG_ATM=y
# CONFIG_ATM_CLIP is not set
# CONFIG_ATM_LANE is not set
# CONFIG_ATM_BR2684 is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
CONFIG_DECNET=m
# CONFIG_DECNET_ROUTER is not set
CONFIG_LLC=y
CONFIG_LLC2=y
CONFIG_ATALK=y
# CONFIG_DEV_APPLETALK is not set
CONFIG_X25=y
CONFIG_LAPB=y
CONFIG_PHONET=m
# CONFIG_6LOWPAN is not set
CONFIG_IEEE802154=m
CONFIG_IEEE802154_NL802154_EXPERIMENTAL=y
CONFIG_IEEE802154_SOCKET=m
# CONFIG_MAC802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=y
# CONFIG_NET_SCH_ATM is not set
CONFIG_NET_SCH_PRIO=y
# CONFIG_NET_SCH_MULTIQ is not set
CONFIG_NET_SCH_RED=y
# CONFIG_NET_SCH_SFB is not set
CONFIG_NET_SCH_SFQ=y
# CONFIG_NET_SCH_TEQL is not set
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_CBS=y
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=y
# CONFIG_NET_SCH_NETEM is not set
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=y
CONFIG_NET_SCH_SKBPRIO=m
CONFIG_NET_SCH_CHOKE=y
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=y
CONFIG_NET_SCH_PIE=m
CONFIG_NET_SCH_FQ_PIE=m
CONFIG_NET_SCH_INGRESS=m
# CONFIG_NET_SCH_PLUG is not set
# CONFIG_NET_SCH_ETS is not set
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=y
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
CONFIG_NET_CLS_U32=y
# CONFIG_CLS_U32_PERF is not set
# CONFIG_CLS_U32_MARK is not set
CONFIG_NET_CLS_RSVP=y
# CONFIG_NET_CLS_RSVP6 is not set
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=m
# CONFIG_NET_CLS_BPF is not set
CONFIG_NET_CLS_FLOWER=y
# CONFIG_NET_CLS_MATCHALL is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=y
# CONFIG_GACT_PROB is not set
# CONFIG_NET_ACT_MIRRED is not set
CONFIG_NET_ACT_SAMPLE=y
CONFIG_NET_ACT_NAT=y
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=y
CONFIG_NET_ACT_SKBEDIT=y
# CONFIG_NET_ACT_CSUM is not set
CONFIG_NET_ACT_MPLS=y
# CONFIG_NET_ACT_VLAN is not set
CONFIG_NET_ACT_BPF=m
CONFIG_NET_ACT_SKBMOD=y
CONFIG_NET_ACT_IFE=y
CONFIG_NET_ACT_TUNNEL_KEY=y
CONFIG_NET_ACT_GATE=m
CONFIG_NET_IFE_SKBMARK=m
# CONFIG_NET_IFE_SKBPRIO is not set
CONFIG_NET_IFE_SKBTCINDEX=y
CONFIG_NET_TC_SKB_EXT=y
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=m
CONFIG_BATMAN_ADV=y
CONFIG_BATMAN_ADV_BLA=y
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
CONFIG_BATMAN_ADV_DEBUG=y
# CONFIG_BATMAN_ADV_TRACING is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_VSOCKETS=y
CONFIG_VSOCKETS_DIAG=y
# CONFIG_VSOCKETS_LOOPBACK is not set
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_HYPERV_VSOCKETS=y
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
CONFIG_NET_NSH=y
CONFIG_HSR=y
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
CONFIG_QRTR=m
CONFIG_QRTR_SMD=m
# CONFIG_QRTR_TUN is not set
# CONFIG_NET_NCSI is not set
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

CONFIG_HAMRADIO=y

#
# Packet Radio protocols
#
# CONFIG_AX25 is not set
CONFIG_CAN=m
# CONFIG_CAN_RAW is not set
# CONFIG_CAN_BCM is not set
CONFIG_CAN_GW=m
CONFIG_CAN_J1939=m
CONFIG_CAN_ISOTP=m

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
CONFIG_CAN_VXCAN=m
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
# CONFIG_CAN_CALC_BITTIMING is not set
CONFIG_CAN_KVASER_PCIEFD=m
CONFIG_PCH_CAN=m
# CONFIG_CAN_C_CAN is not set
# CONFIG_CAN_CC770 is not set
# CONFIG_CAN_CTUCANFD_PCI is not set
CONFIG_CAN_IFI_CANFD=m
# CONFIG_CAN_M_CAN is not set
CONFIG_CAN_PEAK_PCIEFD=m
CONFIG_CAN_SJA1000=m
# CONFIG_CAN_EMS_PCI is not set
# CONFIG_CAN_EMS_PCMCIA is not set
CONFIG_CAN_F81601=m
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
# CONFIG_CAN_PEAK_PCIEC is not set
CONFIG_CAN_PEAK_PCMCIA=m
CONFIG_CAN_PLX_PCI=m
CONFIG_CAN_SJA1000_ISA=m
CONFIG_CAN_SJA1000_PLATFORM=m
# CONFIG_CAN_SOFTING is not set
# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
# CONFIG_BT_BNEP is not set
# CONFIG_BT_HIDP is not set
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
# CONFIG_BT_AOSPEXT is not set
# CONFIG_BT_DEBUGFS is not set
# CONFIG_BT_SELFTEST is not set
CONFIG_BT_FEATURE_DEBUG=y

#
# Bluetooth device drivers
#
CONFIG_BT_MTK=m
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_SERDEV=y
CONFIG_BT_HCIUART_H4=y
# CONFIG_BT_HCIUART_NOKIA is not set
# CONFIG_BT_HCIUART_BCSP is not set
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_LL is not set
# CONFIG_BT_HCIUART_3WIRE is not set
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_RTL is not set
# CONFIG_BT_HCIUART_QCA is not set
# CONFIG_BT_HCIUART_AG6XX is not set
CONFIG_BT_HCIUART_MRVL=y
CONFIG_BT_HCIDTL1=m
CONFIG_BT_HCIBT3C=m
# CONFIG_BT_HCIBLUECARD is not set
# CONFIG_BT_HCIVHCI is not set
CONFIG_BT_MRVL=m
CONFIG_BT_MTKUART=m
# CONFIG_BT_VIRTIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
# CONFIG_MCTP is not set
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
CONFIG_CFG80211_DEVELOPER_WARNINGS=y
# CONFIG_CFG80211_CERTIFICATION_ONUS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
# CONFIG_CFG80211_DEFAULT_PS is not set
CONFIG_CFG80211_DEBUGFS=y
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
# CONFIG_MAC80211_RC_MINSTREL is not set
CONFIG_MAC80211_RC_DEFAULT=""

#
# Some wireless drivers require a rate control algorithm
#
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_RFKILL is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_FD=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
CONFIG_NFC=m
CONFIG_NFC_DIGITAL=m
CONFIG_NFC_NCI=m
CONFIG_NFC_NCI_UART=m
# CONFIG_NFC_HCI is not set

#
# Near Field Communication (NFC) devices
#
CONFIG_NFC_SIM=m
# CONFIG_NFC_VIRTUAL_NCI is not set
CONFIG_NFC_FDP=m
CONFIG_NFC_FDP_I2C=m
CONFIG_NFC_PN533=m
# CONFIG_NFC_PN533_I2C is not set
CONFIG_NFC_PN532_UART=m
# CONFIG_NFC_MRVL_UART is not set
CONFIG_NFC_ST_NCI=m
CONFIG_NFC_ST_NCI_I2C=m
# CONFIG_NFC_NXP_NCI is not set
CONFIG_NFC_S3FWRN5=m
CONFIG_NFC_S3FWRN5_I2C=m
# CONFIG_NFC_S3FWRN82_UART is not set
# end of Near Field Communication (NFC) devices

CONFIG_PSAMPLE=y
CONFIG_NET_IFE=y
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_FAILOVER=y
# CONFIG_ETHTOOL_NETLINK is not set

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
CONFIG_EISA=y
CONFIG_EISA_VLB_PRIMING=y
CONFIG_EISA_PCI_EISA=y
# CONFIG_EISA_VIRTUAL_ROOT is not set
# CONFIG_EISA_NAMES is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCIEASPM is not set
CONFIG_PCIE_PTM=y
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
CONFIG_PCI_REALLOC_ENABLE_AUTO=y
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_PF_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
CONFIG_PCI_PASID=y
CONFIG_PCI_LABEL=y
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
# CONFIG_VGA_ARB is not set
# CONFIG_HOTPLUG_PCI is not set

#
# PCI controller drivers
#

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
CONFIG_PCCARD=y
CONFIG_PCMCIA=m
CONFIG_PCMCIA_LOAD_CIS=y
# CONFIG_CARDBUS is not set

#
# PC-card bridges
#
# CONFIG_YENTA is not set
CONFIG_PD6729=m
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
# CONFIG_DEVTMPFS_SAFE is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
# CONFIG_FW_LOADER_COMPRESS_ZSTD is not set
CONFIG_FW_CACHE=y
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SLIMBUS=m
CONFIG_REGMAP_MMIO=y
CONFIG_REGMAP_IRQ=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_DMA_FENCE_TRACE=y
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=m

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
# CONFIG_SYSFB_SIMPLEFB is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_GNSS=m
CONFIG_GNSS_SERIAL=m
CONFIG_GNSS_MTK_SERIAL=m
# CONFIG_GNSS_SIRF_SERIAL is not set
CONFIG_GNSS_UBX_SERIAL=m
CONFIG_MTD=m
# CONFIG_MTD_TESTS is not set

#
# Partition parsers
#
# CONFIG_MTD_AR7_PARTS is not set
# CONFIG_MTD_CMDLINE_PARTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=m
# CONFIG_MTD_BLOCK is not set
CONFIG_MTD_BLOCK_RO=m

#
# Note that in some cases UBI block is preferred. See MTD_UBI_BLOCK.
#
# CONFIG_FTL is not set
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
# CONFIG_INFTL is not set
CONFIG_RFD_FTL=m
CONFIG_SSFDC=m
CONFIG_SM_FTL=m
CONFIG_MTD_OOPS=m
# CONFIG_MTD_SWAP is not set
CONFIG_MTD_PARTITIONED_MASTER=y

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
# CONFIG_MTD_ROM is not set
CONFIG_MTD_ABSENT=m
# end of RAM/ROM/Flash chip drivers

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
CONFIG_MTD_PHYSMAP=m
# CONFIG_MTD_PHYSMAP_COMPAT is not set
# CONFIG_MTD_PHYSMAP_GPIO_ADDR is not set
# CONFIG_MTD_SBC_GXX is not set
# CONFIG_MTD_SCx200_DOCFLASH is not set
# CONFIG_MTD_AMD76XROM is not set
CONFIG_MTD_ICHXROM=m
CONFIG_MTD_ESB2ROM=m
CONFIG_MTD_CK804XROM=m
# CONFIG_MTD_SCB2_FLASH is not set
CONFIG_MTD_NETtel=m
# CONFIG_MTD_L440GX is not set
# CONFIG_MTD_PCI is not set
# CONFIG_MTD_PCMCIA is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
CONFIG_MTD_PLATRAM=m
# end of Mapping drivers for chip access

#
# Self-contained MTD device drivers
#
CONFIG_MTD_PMC551=m
CONFIG_MTD_PMC551_BUGFIX=y
# CONFIG_MTD_PMC551_DEBUG is not set
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=m
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOCG3 is not set
# end of Self-contained MTD device drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=m
CONFIG_MTD_ONENAND=m
# CONFIG_MTD_ONENAND_VERIFY_WRITE is not set
CONFIG_MTD_ONENAND_GENERIC=m
# CONFIG_MTD_ONENAND_OTP is not set
CONFIG_MTD_ONENAND_2X_PROGRAM=y
CONFIG_MTD_RAW_NAND=m

#
# Raw/parallel NAND flash controllers
#
# CONFIG_MTD_NAND_DENALI_PCI is not set
CONFIG_MTD_NAND_CAFE=m
# CONFIG_MTD_NAND_CS553X is not set
CONFIG_MTD_NAND_MXIC=m
CONFIG_MTD_NAND_GPIO=m
# CONFIG_MTD_NAND_PLATFORM is not set
CONFIG_MTD_NAND_ARASAN=m

#
# Misc
#
CONFIG_MTD_SM_COMMON=m
# CONFIG_MTD_NAND_NANDSIM is not set
CONFIG_MTD_NAND_RICOH=m
CONFIG_MTD_NAND_DISKONCHIP=m
# CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED is not set
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADDRESS=0
# CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE is not set

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC=y
CONFIG_MTD_NAND_ECC_SW_BCH=y
# CONFIG_MTD_NAND_ECC_MXIC is not set
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
# CONFIG_MTD_LPDDR is not set
# end of LPDDR & LPDDR2 PCM memory drivers

CONFIG_MTD_UBI=m
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_FASTMAP=y
# CONFIG_MTD_UBI_GLUEBI is not set
CONFIG_MTD_UBI_BLOCK=y
CONFIG_MTD_HYPERBUS=m
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=y
# CONFIG_PARPORT_PC is not set
CONFIG_PARPORT_AX88796=m
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_LOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_VIRTIO_BLK is not set
# CONFIG_BLK_DEV_RBD is not set

#
# NVME Support
#
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_NVME=y
# CONFIG_NVME_MULTIPATH is not set
# CONFIG_NVME_VERBOSE_ERRORS is not set
CONFIG_NVME_HWMON=y
CONFIG_NVME_FABRICS=y
CONFIG_NVME_FC=y
# CONFIG_NVME_TCP is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
CONFIG_AD525X_DPOT=m
CONFIG_AD525X_DPOT_I2C=m
# CONFIG_DUMMY_IRQ is not set
CONFIG_IBM_ASM=m
CONFIG_PHANTOM=m
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
CONFIG_ICS932S401=m
CONFIG_ENCLOSURE_SERVICES=y
CONFIG_CS5535_MFGPT=m
CONFIG_CS5535_MFGPT_DEFAULT_IRQ=7
CONFIG_CS5535_CLOCK_EVENT_SRC=m
# CONFIG_HP_ILO is not set
CONFIG_APDS9802ALS=m
# CONFIG_ISL29003 is not set
CONFIG_ISL29020=y
CONFIG_SENSORS_TSL2550=m
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
CONFIG_PCH_PHUB=y
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
CONFIG_XILINX_SDFEC=m
# CONFIG_C2PORT is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=y
CONFIG_EEPROM_LEGACY=y
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=y
CONFIG_EEPROM_IDT_89HPESX=m
CONFIG_EEPROM_EE1004=m
# end of EEPROM support

CONFIG_CB710_CORE=y
CONFIG_CB710_DEBUG=y
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
CONFIG_TI_ST=m
# end of Texas Instruments shared transport line discipline

# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_ALTERA_STAPL=y
CONFIG_INTEL_MEI=y
CONFIG_INTEL_MEI_ME=m
CONFIG_INTEL_MEI_TXE=y
# CONFIG_INTEL_MEI_GSC is not set
# CONFIG_INTEL_MEI_HDCP is not set
CONFIG_INTEL_MEI_PXP=m
CONFIG_VMWARE_VMCI=m
CONFIG_ECHO=m
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
CONFIG_HABANA_AI=m
CONFIG_UACCE=y
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI_COMMON=y
# CONFIG_SCSI is not set
# end of SCSI device support

# CONFIG_ATA is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y
CONFIG_MD_MULTIPATH=m
# CONFIG_MD_FAULTY is not set
CONFIG_BCACHE=y
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
CONFIG_BCACHE_ASYNC_REGISTRATION=y
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=y
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=y
CONFIG_DM_PERSISTENT_DATA=y
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=m
# CONFIG_DM_SNAPSHOT is not set
CONFIG_DM_THIN_PROVISIONING=y
CONFIG_DM_CACHE=y
# CONFIG_DM_CACHE_SMQ is not set
CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=y
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=y
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_MULTIPATH_HST=y
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=y
CONFIG_DM_DUST=m
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
CONFIG_DM_SWITCH=y
CONFIG_DM_LOG_WRITES=m
# CONFIG_DM_INTEGRITY is not set
CONFIG_TARGET_CORE=y
# CONFIG_TCM_IBLOCK is not set
# CONFIG_TCM_FILEIO is not set
CONFIG_TCM_USER2=y
# CONFIG_ISCSI_TARGET is not set
CONFIG_SBP_TARGET=m
CONFIG_FUSION=y
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
# CONFIG_FIREWIRE_NET is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_TUN is not set
# CONFIG_TUN_VNET_CROSS_LE is not set
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL3 is not set
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
# CONFIG_NET_VENDOR_AMD is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ASIX=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
# CONFIG_CX_ECAT is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CIRRUS=y
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
CONFIG_NET_VENDOR_DAVICOM=y
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_ENGLEDER=y
# CONFIG_TSNEP is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_FUJITSU=y
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_NET_VENDOR_FUNGIBLE=y
# CONFIG_FUN_ETH is not set
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
# CONFIG_E1000E is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGBEVF is not set
# CONFIG_I40E is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
# CONFIG_IGC is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_LITEX=y
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MICROSOFT=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_PCMCIA_PCNET is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_PCMCIA_SMC91C92 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VERTEXCOM=y
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
CONFIG_NET_VENDOR_XIRCOM=y
# CONFIG_PCMCIA_XIRC2PS is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_PHYLIB is not set
# CONFIG_MDIO_DEVICE is not set

#
# PCS device drivers
#
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# Host-side USB support is needed for USB Network Adapter support
#
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH6KL is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
# CONFIG_ATH11K is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
# CONFIG_AIRO_CS is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
# CONFIG_IWLMEI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
CONFIG_WLAN_VENDOR_MEDIATEK=y
# CONFIG_MT76x0E is not set
# CONFIG_MT76x2E is not set
# CONFIG_MT7603E is not set
# CONFIG_MT7615E is not set
# CONFIG_MT7915E is not set
# CONFIG_MT7921E is not set
CONFIG_WLAN_VENDOR_MICROCHIP=y
CONFIG_WLAN_VENDOR_PURELIFI=y
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTW88 is not set
# CONFIG_RTW89 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_SILABS=y
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
# CONFIG_PCMCIA_RAYCS is not set
# CONFIG_PCMCIA_WL3501 is not set
# CONFIG_MAC80211_HWSIM is not set
# CONFIG_VIRT_WIFI is not set
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_USB4_NET is not set
# CONFIG_HYPERV_NET is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=m
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_SPARSEKMAP=y
CONFIG_INPUT_MATRIXKMAP=y
CONFIG_INPUT_VIVALDIFMAP=y

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=m
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5520 is not set
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_IQS62X is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_MTK_PMIC is not set
# CONFIG_KEYBOARD_CYPRESS_SF is not set
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
CONFIG_JOYSTICK_A3D=m
CONFIG_JOYSTICK_ADI=m
CONFIG_JOYSTICK_COBRA=m
CONFIG_JOYSTICK_GF2K=m
# CONFIG_JOYSTICK_GRIP is not set
CONFIG_JOYSTICK_GRIP_MP=y
# CONFIG_JOYSTICK_GUILLEMOT is not set
CONFIG_JOYSTICK_INTERACT=y
CONFIG_JOYSTICK_SIDEWINDER=m
CONFIG_JOYSTICK_TMDC=y
CONFIG_JOYSTICK_IFORCE=m
# CONFIG_JOYSTICK_IFORCE_232 is not set
CONFIG_JOYSTICK_WARRIOR=m
CONFIG_JOYSTICK_MAGELLAN=m
CONFIG_JOYSTICK_SPACEORB=y
CONFIG_JOYSTICK_SPACEBALL=y
CONFIG_JOYSTICK_STINGER=y
# CONFIG_JOYSTICK_TWIDJOY is not set
# CONFIG_JOYSTICK_ZHENHUA is not set
CONFIG_JOYSTICK_DB9=y
# CONFIG_JOYSTICK_GAMECON is not set
CONFIG_JOYSTICK_TURBOGRAFX=m
CONFIG_JOYSTICK_AS5011=m
# CONFIG_JOYSTICK_JOYDUMP is not set
# CONFIG_JOYSTICK_XPAD is not set
CONFIG_JOYSTICK_WALKERA0701=m
# CONFIG_JOYSTICK_PXRC is not set
# CONFIG_JOYSTICK_QWIIC is not set
# CONFIG_JOYSTICK_FSIA6B is not set
# CONFIG_JOYSTICK_SENSEHAT is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_USB_ACECAD is not set
# CONFIG_TABLET_USB_AIPTEK is not set
# CONFIG_TABLET_USB_HANWANG is not set
# CONFIG_TABLET_USB_KBTAB is not set
# CONFIG_TABLET_USB_PEGASUS is not set
# CONFIG_TABLET_SERIAL_WACOM4 is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_88PM860X_ONKEY=y
CONFIG_INPUT_AD714X=y
CONFIG_INPUT_AD714X_I2C=m
CONFIG_INPUT_BMA150=m
# CONFIG_INPUT_E3X0_BUTTON is not set
CONFIG_INPUT_MAX77693_HAPTIC=y
CONFIG_INPUT_MMA8450=m
CONFIG_INPUT_APANEL=m
CONFIG_INPUT_GPIO_BEEPER=m
CONFIG_INPUT_GPIO_DECODER=m
CONFIG_INPUT_GPIO_VIBRA=y
CONFIG_INPUT_WISTRON_BTNS=y
CONFIG_INPUT_ATLAS_BTNS=y
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
CONFIG_INPUT_KXTJ9=m
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
# CONFIG_INPUT_REGULATOR_HAPTIC is not set
CONFIG_INPUT_RETU_PWRBUTTON=m
CONFIG_INPUT_AXP20X_PEK=y
CONFIG_INPUT_TWL6040_VIBRA=m
# CONFIG_INPUT_UINPUT is not set
CONFIG_INPUT_PCF8574=m
CONFIG_INPUT_PWM_BEEPER=m
# CONFIG_INPUT_PWM_VIBRA is not set
# CONFIG_INPUT_GPIO_ROTARY_ENCODER is not set
# CONFIG_INPUT_DA7280_HAPTICS is not set
CONFIG_INPUT_DA9055_ONKEY=y
CONFIG_INPUT_DA9063_ONKEY=m
CONFIG_INPUT_ADXL34X=y
CONFIG_INPUT_ADXL34X_I2C=y
CONFIG_INPUT_IQS269A=y
# CONFIG_INPUT_IQS626A is not set
# CONFIG_INPUT_IQS7222 is not set
CONFIG_INPUT_CMA3000=m
CONFIG_INPUT_CMA3000_I2C=m
CONFIG_INPUT_IDEAPAD_SLIDEBAR=m
CONFIG_INPUT_DRV260X_HAPTICS=y
CONFIG_INPUT_DRV2665_HAPTICS=y
CONFIG_INPUT_DRV2667_HAPTICS=y
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SMB=m
# CONFIG_RMI4_F03 is not set
CONFIG_RMI4_2D_SENSOR=y
# CONFIG_RMI4_F11 is not set
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
CONFIG_RMI4_F3A=y
CONFIG_RMI4_F54=y
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=m
CONFIG_SERIO_CT82C710=m
CONFIG_SERIO_PARKBD=y
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
# CONFIG_SERIO_ALTERA_PS2 is not set
CONFIG_SERIO_PS2MULT=m
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=y
CONFIG_SERIO_GPIO_PS2=m
CONFIG_USERIO=y
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=y
CONFIG_GAMEPORT_EMU10K1=y
CONFIG_GAMEPORT_FM801=m
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
# CONFIG_VT is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_LDISC_AUTOLOAD is not set

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_16550A_VARIANTS=y
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
# CONFIG_SERIAL_8250_PCI is not set
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_MEN_MCB=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
# CONFIG_SERIAL_8250_RSA is not set
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=m
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y
CONFIG_SERIAL_8250_PERICOM=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_UARTLITE=m
CONFIG_SERIAL_UARTLITE_NR_UARTS=1
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
CONFIG_SERIAL_SC16IS7XX=y
# CONFIG_SERIAL_SC16IS7XX_I2C is not set
# CONFIG_SERIAL_TIMBERDALE is not set
CONFIG_SERIAL_ALTERA_JTAGUART=m
CONFIG_SERIAL_ALTERA_UART=m
CONFIG_SERIAL_ALTERA_UART_MAXPORTS=4
CONFIG_SERIAL_ALTERA_UART_BAUDRATE=115200
CONFIG_SERIAL_PCH_UART=y
CONFIG_SERIAL_PCH_UART_CONSOLE=y
CONFIG_SERIAL_ARC=y
# CONFIG_SERIAL_ARC_CONSOLE is not set
CONFIG_SERIAL_ARC_NR_PORTS=1
CONFIG_SERIAL_RP2=m
CONFIG_SERIAL_RP2_NR_UARTS=32
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
CONFIG_SERIAL_MEN_Z135=m
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
CONFIG_NOZOMI=y
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
# CONFIG_RPMSG_TTY is not set
CONFIG_SERIAL_DEV_BUS=m
CONFIG_TTY_PRINTK=m
CONFIG_TTY_PRINTK_LEVEL=6
# CONFIG_PRINTER is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_PLAT_DATA=y
# CONFIG_IPMI_PANIC_EVENT is not set
# CONFIG_IPMI_DEVICE_INTERFACE is not set
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
# CONFIG_IPMI_IPMB is not set
# CONFIG_IPMI_WATCHDOG is not set
# CONFIG_IPMI_POWEROFF is not set
CONFIG_IPMB_DEVICE_INTERFACE=m
# CONFIG_HW_RANDOM is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
CONFIG_SCR24X=m
# CONFIG_IPWIRELESS is not set
# end of PCMCIA character devices

CONFIG_MWAVE=y
CONFIG_SCx200_GPIO=m
# CONFIG_PC8736x_GPIO is not set
CONFIG_NSC_GPIO=y
# CONFIG_DEVMEM is not set
# CONFIG_NVRAM is not set
# CONFIG_DEVPORT is not set
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
# CONFIG_TCG_TPM is not set
CONFIG_TELCLOCK=y
# CONFIG_XILLYBUS is not set
# CONFIG_RANDOM_TRUST_CPU is not set
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
# CONFIG_ACPI_I2C_OPREGION is not set
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_CCGX_UCSI=y
CONFIG_I2C_ALI1535=m
# CONFIG_I2C_ALI1563 is not set
CONFIG_I2C_ALI15X3=m
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=y
CONFIG_I2C_ISMT=y
# CONFIG_I2C_PIIX4 is not set
CONFIG_I2C_NFORCE2=y
# CONFIG_I2C_NFORCE2_S4985 is not set
CONFIG_I2C_NVIDIA_GPU=y
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_CBUS_GPIO=y
# CONFIG_I2C_DESIGNWARE_PLATFORM is not set
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EG20T is not set
CONFIG_I2C_EMEV2=m
# CONFIG_I2C_GPIO is not set
CONFIG_I2C_KEMPLD=y
CONFIG_I2C_OCORES=y
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=y
# CONFIG_I2C_TAOS_EVM is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_VIRTIO is not set
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
CONFIG_I2C_SLAVE=y
# CONFIG_I2C_SLAVE_EEPROM is not set
# CONFIG_I2C_SLAVE_TESTUNIT is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
CONFIG_SPMI=y
# CONFIG_SPMI_HISI3670 is not set
CONFIG_HSI=m
CONFIG_HSI_BOARDINFO=y

#
# HSI controllers
#

#
# HSI clients
#
CONFIG_HSI_CHAR=m
# CONFIG_PPS is not set

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
# CONFIG_PINCTRL_AMD is not set
# CONFIG_PINCTRL_DA9062 is not set
# CONFIG_PINCTRL_MCP23S08 is not set
CONFIG_PINCTRL_SX150X=y
CONFIG_PINCTRL_MADERA=m
CONFIG_PINCTRL_CS47L15=y
CONFIG_PINCTRL_CS47L85=y
CONFIG_PINCTRL_CS47L90=y
CONFIG_PINCTRL_CS47L92=y

#
# Intel pinctrl drivers
#
CONFIG_PINCTRL_BAYTRAIL=y
CONFIG_PINCTRL_CHERRYVIEW=m
CONFIG_PINCTRL_LYNXPOINT=m
CONFIG_PINCTRL_INTEL=y
CONFIG_PINCTRL_ALDERLAKE=y
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
# CONFIG_PINCTRL_CEDARFORK is not set
CONFIG_PINCTRL_DENVERTON=y
# CONFIG_PINCTRL_ELKHARTLAKE is not set
CONFIG_PINCTRL_EMMITSBURG=m
CONFIG_PINCTRL_GEMINILAKE=y
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
# CONFIG_PINCTRL_LAKEFIELD is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set
# end of Intel pinctrl drivers

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y
CONFIG_GPIO_GENERIC=y
CONFIG_GPIO_MAX730X=m

#
# Memory mapped GPIO drivers
#
# CONFIG_GPIO_AMDPT is not set
CONFIG_GPIO_DWAPB=m
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=y
CONFIG_GPIO_MB86S7X=m
CONFIG_GPIO_MENZ127=y
# CONFIG_GPIO_SIOX is not set
CONFIG_GPIO_VX855=m
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_104_DIO_48E is not set
CONFIG_GPIO_104_IDIO_16=m
# CONFIG_GPIO_104_IDI_48 is not set
# CONFIG_GPIO_F7188X is not set
CONFIG_GPIO_GPIO_MM=m
# CONFIG_GPIO_IT87 is not set
CONFIG_GPIO_SCH=y
CONFIG_GPIO_SCH311X=y
CONFIG_GPIO_WINBOND=y
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
CONFIG_GPIO_ADP5588=m
CONFIG_GPIO_MAX7300=m
CONFIG_GPIO_MAX732X=y
# CONFIG_GPIO_MAX732X_IRQ is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
CONFIG_GPIO_PCF857X=m
CONFIG_GPIO_TPIC2810=y
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
CONFIG_GPIO_ADP5520=y
CONFIG_GPIO_ARIZONA=m
# CONFIG_GPIO_CS5535 is not set
CONFIG_GPIO_DA9055=y
CONFIG_GPIO_KEMPLD=m
CONFIG_GPIO_LP873X=m
# CONFIG_GPIO_MADERA is not set
CONFIG_GPIO_TPS65912=m
CONFIG_GPIO_TQMX86=y
CONFIG_GPIO_TWL6040=y
CONFIG_GPIO_WHISKEY_COVE=m
# CONFIG_GPIO_WM8350 is not set
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
CONFIG_GPIO_BT8XX=m
# CONFIG_GPIO_ML_IOH is not set
CONFIG_GPIO_PCH=m
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
CONFIG_GPIO_RDC321X=y
# end of PCI GPIO expanders

#
# Virtual GPIO drivers
#
# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# CONFIG_GPIO_VIRTIO is not set
# CONFIG_GPIO_SIM is not set
# end of Virtual GPIO drivers

# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
CONFIG_PDA_POWER=y
# CONFIG_IP5XXX_POWER is not set
CONFIG_WM8350_POWER=m
# CONFIG_TEST_POWER is not set
CONFIG_BATTERY_88PM860X=m
# CONFIG_CHARGER_ADP5061 is not set
CONFIG_BATTERY_CW2015=m
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SAMSUNG_SDI is not set
CONFIG_BATTERY_SBS=m
CONFIG_CHARGER_SBS=m
CONFIG_BATTERY_BQ27XXX=m
# CONFIG_BATTERY_BQ27XXX_I2C is not set
CONFIG_BATTERY_DA9030=y
# CONFIG_BATTERY_DA9150 is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_BATTERY_MAX17042=y
CONFIG_CHARGER_88PM860X=m
CONFIG_CHARGER_MAX8903=m
# CONFIG_CHARGER_LP8727 is not set
CONFIG_CHARGER_GPIO=m
CONFIG_CHARGER_MANAGER=m
CONFIG_CHARGER_LT3651=m
# CONFIG_CHARGER_LTC4162L is not set
CONFIG_CHARGER_MAX14577=y
CONFIG_CHARGER_MAX77693=m
# CONFIG_CHARGER_MAX77976 is not set
CONFIG_CHARGER_BQ2415X=y
CONFIG_CHARGER_BQ24190=m
CONFIG_CHARGER_BQ24257=y
CONFIG_CHARGER_BQ24735=m
CONFIG_CHARGER_BQ2515X=y
CONFIG_CHARGER_BQ25890=m
CONFIG_CHARGER_BQ25980=m
# CONFIG_CHARGER_BQ256XX is not set
# CONFIG_CHARGER_SMB347 is not set
CONFIG_CHARGER_TPS65090=m
CONFIG_BATTERY_GAUGE_LTC2941=y
# CONFIG_BATTERY_GOLDFISH is not set
CONFIG_BATTERY_RT5033=y
CONFIG_CHARGER_RT9455=m
# CONFIG_CHARGER_BD99954 is not set
# CONFIG_BATTERY_UG3105 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_AD7414 is not set
CONFIG_SENSORS_AD7418=y
# CONFIG_SENSORS_ADM1021 is not set
CONFIG_SENSORS_ADM1025=y
CONFIG_SENSORS_ADM1026=y
CONFIG_SENSORS_ADM1029=y
# CONFIG_SENSORS_ADM1031 is not set
CONFIG_SENSORS_ADM1177=m
CONFIG_SENSORS_ADM9240=m
# CONFIG_SENSORS_ADT7410 is not set
# CONFIG_SENSORS_ADT7411 is not set
CONFIG_SENSORS_ADT7462=y
CONFIG_SENSORS_ADT7470=y
CONFIG_SENSORS_ADT7475=y
# CONFIG_SENSORS_AHT10 is not set
CONFIG_SENSORS_AS370=y
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=y
CONFIG_SENSORS_K10TEMP=y
# CONFIG_SENSORS_FAM15H_POWER is not set
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ASPEED=m
CONFIG_SENSORS_ATXP1=y
CONFIG_SENSORS_CORSAIR_CPRO=m
# CONFIG_SENSORS_CORSAIR_PSU is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=y
# CONFIG_SENSORS_DELL_SMM is not set
# CONFIG_SENSORS_DA9055 is not set
CONFIG_SENSORS_I5K_AMB=y
CONFIG_SENSORS_F71805F=y
CONFIG_SENSORS_F71882FG=y
CONFIG_SENSORS_F75375S=y
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_GL518SM=y
CONFIG_SENSORS_GL520SM=y
CONFIG_SENSORS_G760A=m
CONFIG_SENSORS_G762=y
# CONFIG_SENSORS_HIH6130 is not set
# CONFIG_SENSORS_IBMAEM is not set
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=y
# CONFIG_SENSORS_JC42 is not set
CONFIG_SENSORS_POWR1220=y
# CONFIG_SENSORS_LINEAGE is not set
CONFIG_SENSORS_LTC2945=y
CONFIG_SENSORS_LTC2947=m
CONFIG_SENSORS_LTC2947_I2C=m
# CONFIG_SENSORS_LTC2990 is not set
# CONFIG_SENSORS_LTC2992 is not set
# CONFIG_SENSORS_LTC4151 is not set
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=y
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6620 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
# CONFIG_SENSORS_MAX6642 is not set
CONFIG_SENSORS_MAX6650=y
CONFIG_SENSORS_MAX6697=y
CONFIG_SENSORS_MAX31790=m
CONFIG_SENSORS_MCP3021=m
CONFIG_SENSORS_MLXREG_FAN=y
CONFIG_SENSORS_TC654=y
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MENF21BMC_HWMON is not set
CONFIG_SENSORS_MR75203=y
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=y
# CONFIG_SENSORS_LM78 is not set
CONFIG_SENSORS_LM80=y
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
# CONFIG_SENSORS_LM87 is not set
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
CONFIG_SENSORS_LM95234=m
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_LM95245 is not set
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NCT6683=y
CONFIG_SENSORS_NCT6775_CORE=y
CONFIG_SENSORS_NCT6775=y
# CONFIG_SENSORS_NCT6775_I2C is not set
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NPCM7XX is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_PMBUS=y
# CONFIG_SENSORS_PMBUS is not set
CONFIG_SENSORS_ADM1266=m
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_BPA_RS600 is not set
# CONFIG_SENSORS_DELTA_AHE50DC_FAN is not set
# CONFIG_SENSORS_FSP_3Y is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_DPS920AB is not set
CONFIG_SENSORS_INSPUR_IPSPS=m
CONFIG_SENSORS_IR35221=m
# CONFIG_SENSORS_IR36021 is not set
# CONFIG_SENSORS_IR38064 is not set
CONFIG_SENSORS_IRPS5401=m
CONFIG_SENSORS_ISL68137=y
CONFIG_SENSORS_LM25066=y
# CONFIG_SENSORS_LM25066_REGULATOR is not set
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC2978_REGULATOR is not set
CONFIG_SENSORS_LTC3815=y
# CONFIG_SENSORS_MAX15301 is not set
# CONFIG_SENSORS_MAX16064 is not set
# CONFIG_SENSORS_MAX16601 is not set
CONFIG_SENSORS_MAX20730=m
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=y
# CONFIG_SENSORS_MP2888 is not set
CONFIG_SENSORS_MP2975=m
# CONFIG_SENSORS_MP5023 is not set
# CONFIG_SENSORS_PIM4328 is not set
# CONFIG_SENSORS_PLI1209BC is not set
# CONFIG_SENSORS_PM6764TR is not set
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_Q54SJ108A2 is not set
# CONFIG_SENSORS_STPDDC60 is not set
CONFIG_SENSORS_TPS40422=m
CONFIG_SENSORS_TPS53679=y
# CONFIG_SENSORS_UCD9000 is not set
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE152 is not set
CONFIG_SENSORS_XDPE122=m
# CONFIG_SENSORS_XDPE122_REGULATOR is not set
CONFIG_SENSORS_ZL6100=y
# CONFIG_SENSORS_SBTSI is not set
# CONFIG_SENSORS_SBRMI is not set
CONFIG_SENSORS_SHT15=y
# CONFIG_SENSORS_SHT21 is not set
CONFIG_SENSORS_SHT3x=y
# CONFIG_SENSORS_SHT4x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=y
# CONFIG_SENSORS_SY7636A is not set
CONFIG_SENSORS_DME1737=y
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=y
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_STTS751 is not set
CONFIG_SENSORS_SMM665=y
# CONFIG_SENSORS_ADC128D818 is not set
# CONFIG_SENSORS_ADS7828 is not set
CONFIG_SENSORS_AMC6821=m
# CONFIG_SENSORS_INA209 is not set
# CONFIG_SENSORS_INA2XX is not set
# CONFIG_SENSORS_INA238 is not set
CONFIG_SENSORS_INA3221=y
CONFIG_SENSORS_TC74=m
CONFIG_SENSORS_THMC50=y
CONFIG_SENSORS_TMP102=y
CONFIG_SENSORS_TMP103=m
CONFIG_SENSORS_TMP108=m
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=y
# CONFIG_SENSORS_TMP464 is not set
CONFIG_SENSORS_TMP513=y
CONFIG_SENSORS_VIA_CPUTEMP=y
CONFIG_SENSORS_VIA686A=y
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=y
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=y
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=y
CONFIG_SENSORS_W83627EHF=y
# CONFIG_SENSORS_WM8350 is not set
CONFIG_SENSORS_XGENE=m

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
CONFIG_SENSORS_ATK0110=m
# CONFIG_SENSORS_ASUS_WMI is not set
# CONFIG_SENSORS_ASUS_WMI_EC is not set
# CONFIG_SENSORS_ASUS_EC is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_NETLINK=y
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
# CONFIG_THERMAL_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_DEVFREQ_THERMAL is not set
CONFIG_THERMAL_EMULATION=y

#
# Intel thermal drivers
#
# CONFIG_INTEL_POWERCLAMP is not set
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_INTEL_SOC_DTS_IOSF_CORE=y
CONFIG_INTEL_SOC_DTS_THERMAL=y

#
# ACPI INT340X thermal drivers
#
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_BXT_PMIC_THERMAL=m
CONFIG_INTEL_PCH_THERMAL=y
# CONFIG_INTEL_TCC_COOLING is not set
# CONFIG_INTEL_HFI_THERMAL is not set
# end of Intel thermal drivers

# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_CS5535=y
# CONFIG_MFD_AS3711 is not set
CONFIG_PMIC_ADP5520=y
# CONFIG_MFD_AAT2870_CORE is not set
CONFIG_MFD_BCM590XX=y
# CONFIG_MFD_BD9571MWV is not set
CONFIG_MFD_AXP20X=y
CONFIG_MFD_AXP20X_I2C=y
CONFIG_MFD_MADERA=m
# CONFIG_MFD_MADERA_I2C is not set
CONFIG_MFD_CS47L15=y
# CONFIG_MFD_CS47L35 is not set
CONFIG_MFD_CS47L85=y
CONFIG_MFD_CS47L90=y
CONFIG_MFD_CS47L92=y
CONFIG_PMIC_DA903X=y
# CONFIG_MFD_DA9052_I2C is not set
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9062=m
# CONFIG_MFD_DA9063 is not set
CONFIG_MFD_DA9150=y
# CONFIG_MFD_MC13XXX_I2C is not set
CONFIG_MFD_MP2629=m
CONFIG_HTC_PASIC3=m
CONFIG_HTC_I2CPLD=y
CONFIG_MFD_INTEL_QUARK_I2C_GPIO=m
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=y
CONFIG_INTEL_SOC_PMIC_BXTWC=m
CONFIG_INTEL_SOC_PMIC_MRFLD=y
CONFIG_MFD_INTEL_LPSS=m
# CONFIG_MFD_INTEL_LPSS_ACPI is not set
CONFIG_MFD_INTEL_LPSS_PCI=m
CONFIG_MFD_INTEL_PMC_BXT=m
CONFIG_MFD_IQS62X=m
# CONFIG_MFD_JANZ_CMODIO is not set
CONFIG_MFD_KEMPLD=y
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
CONFIG_MFD_88PM860X=y
CONFIG_MFD_MAX14577=y
CONFIG_MFD_MAX77693=y
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
CONFIG_MFD_MT6397=y
CONFIG_MFD_MENF21BMC=y
CONFIG_MFD_RETU=m
# CONFIG_MFD_PCF50633 is not set
CONFIG_MFD_RDC321X=y
# CONFIG_MFD_RT4831 is not set
CONFIG_MFD_RT5033=m
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SI476X_CORE is not set
# CONFIG_MFD_SIMPLE_MFD_I2C is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SKY81452 is not set
CONFIG_MFD_SYSCON=y
CONFIG_MFD_TI_AM335X_TSCADC=y
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
CONFIG_TPS6105X=m
# CONFIG_TPS65010 is not set
CONFIG_TPS6507X=m
# CONFIG_MFD_TPS65086 is not set
CONFIG_MFD_TPS65090=y
CONFIG_MFD_TI_LP873X=m
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
CONFIG_MFD_TPS65912=y
CONFIG_MFD_TPS65912_I2C=y
# CONFIG_TWL4030_CORE is not set
CONFIG_TWL6040_CORE=y
CONFIG_MFD_WL1273_CORE=y
CONFIG_MFD_LM3533=y
# CONFIG_MFD_TIMBERDALE is not set
CONFIG_MFD_TQMX86=y
CONFIG_MFD_VX855=y
CONFIG_MFD_ARIZONA=m
CONFIG_MFD_ARIZONA_I2C=m
CONFIG_MFD_CS47L24=y
# CONFIG_MFD_WM5102 is not set
CONFIG_MFD_WM5110=y
# CONFIG_MFD_WM8997 is not set
# CONFIG_MFD_WM8998 is not set
CONFIG_MFD_WM8400=y
# CONFIG_MFD_WM831X_I2C is not set
CONFIG_MFD_WM8350=y
CONFIG_MFD_WM8350_I2C=y
# CONFIG_MFD_WM8994 is not set
CONFIG_MFD_WCD934X=m
# CONFIG_MFD_ATC260X_I2C is not set
# CONFIG_RAVE_SP_CORE is not set
# end of Multifunction device drivers

CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_FIXED_VOLTAGE=m
# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
CONFIG_REGULATOR_USERSPACE_CONSUMER=y
CONFIG_REGULATOR_88PG86X=m
CONFIG_REGULATOR_88PM8607=y
# CONFIG_REGULATOR_ACT8865 is not set
CONFIG_REGULATOR_AD5398=m
# CONFIG_REGULATOR_AXP20X is not set
# CONFIG_REGULATOR_BCM590XX is not set
# CONFIG_REGULATOR_DA903X is not set
CONFIG_REGULATOR_DA9055=m
CONFIG_REGULATOR_DA9062=m
# CONFIG_REGULATOR_DA9210 is not set
CONFIG_REGULATOR_DA9211=y
CONFIG_REGULATOR_FAN53555=y
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_ISL9305=m
# CONFIG_REGULATOR_ISL6271A is not set
CONFIG_REGULATOR_LP3971=m
# CONFIG_REGULATOR_LP3972 is not set
CONFIG_REGULATOR_LP872X=m
CONFIG_REGULATOR_LP8755=y
# CONFIG_REGULATOR_LTC3589 is not set
# CONFIG_REGULATOR_LTC3676 is not set
# CONFIG_REGULATOR_MAX14577 is not set
CONFIG_REGULATOR_MAX1586=m
CONFIG_REGULATOR_MAX8649=m
CONFIG_REGULATOR_MAX8660=m
# CONFIG_REGULATOR_MAX8893 is not set
CONFIG_REGULATOR_MAX8952=m
# CONFIG_REGULATOR_MAX20086 is not set
CONFIG_REGULATOR_MAX77693=y
CONFIG_REGULATOR_MAX77826=m
# CONFIG_REGULATOR_MP8859 is not set
# CONFIG_REGULATOR_MT6311 is not set
# CONFIG_REGULATOR_MT6315 is not set
CONFIG_REGULATOR_MT6323=m
CONFIG_REGULATOR_MT6358=y
CONFIG_REGULATOR_MT6359=m
CONFIG_REGULATOR_MT6397=m
# CONFIG_REGULATOR_PCA9450 is not set
# CONFIG_REGULATOR_PV88060 is not set
CONFIG_REGULATOR_PV88080=m
CONFIG_REGULATOR_PV88090=m
# CONFIG_REGULATOR_PWM is not set
# CONFIG_REGULATOR_QCOM_SPMI is not set
CONFIG_REGULATOR_QCOM_USB_VBUS=y
# CONFIG_REGULATOR_RT4801 is not set
CONFIG_REGULATOR_RT5033=m
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
# CONFIG_REGULATOR_RT6160 is not set
# CONFIG_REGULATOR_RT6245 is not set
# CONFIG_REGULATOR_RTQ2134 is not set
# CONFIG_REGULATOR_RTMV20 is not set
# CONFIG_REGULATOR_RTQ6752 is not set
# CONFIG_REGULATOR_SLG51000 is not set
# CONFIG_REGULATOR_SY7636A is not set
CONFIG_REGULATOR_TPS51632=m
CONFIG_REGULATOR_TPS6105X=m
CONFIG_REGULATOR_TPS62360=m
CONFIG_REGULATOR_TPS65023=m
CONFIG_REGULATOR_TPS6507X=y
# CONFIG_REGULATOR_TPS65090 is not set
# CONFIG_REGULATOR_TPS65132 is not set
CONFIG_REGULATOR_TPS65912=m
CONFIG_REGULATOR_WM8350=m
CONFIG_REGULATOR_WM8400=m
CONFIG_REGULATOR_QCOM_LABIBB=m
CONFIG_RC_CORE=y
CONFIG_LIRC=y
CONFIG_RC_MAP=m
# CONFIG_RC_DECODERS is not set
# CONFIG_RC_DEVICES is not set
CONFIG_CEC_CORE=y
CONFIG_CEC_NOTIFIER=y

#
# CEC support
#
# CONFIG_MEDIA_CEC_RC is not set
CONFIG_MEDIA_CEC_SUPPORT=y
CONFIG_CEC_CH7322=m
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
# end of CEC support

CONFIG_MEDIA_SUPPORT=y
CONFIG_MEDIA_SUPPORT_FILTER=y
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
# CONFIG_MEDIA_CAMERA_SUPPORT is not set
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
# CONFIG_MEDIA_DIGITAL_TV_SUPPORT is not set
# CONFIG_MEDIA_RADIO_SUPPORT is not set
# CONFIG_MEDIA_SDR_SUPPORT is not set
CONFIG_MEDIA_PLATFORM_SUPPORT=y
# CONFIG_MEDIA_TEST_SUPPORT is not set
# end of Media device types

CONFIG_VIDEO_DEV=y
CONFIG_MEDIA_CONTROLLER=y

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
CONFIG_VIDEO_FIXED_MINOR_RANGES=y
CONFIG_V4L2_MEM2MEM_DEV=y
# CONFIG_V4L2_FLASH_LED_CLASS is not set
CONFIG_V4L2_FWNODE=y
CONFIG_V4L2_ASYNC=y
# end of Video4Linux options

#
# Media controller options
#
# end of Media controller options

#
# Media drivers
#

#
# Drivers filtered as selected at 'Filter media drivers'
#

#
# Media drivers
#
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_MEDIA_PLATFORM_DRIVERS=y
CONFIG_V4L_PLATFORM_DRIVERS=y
CONFIG_V4L_MEM2MEM_DRIVERS=y
CONFIG_VIDEO_MEM2MEM_DEINTERLACE=y

#
# Allegro DVT media platform drivers
#

#
# Amlogic media platform drivers
#

#
# Amphion drivers
#

#
# Aspeed media platform drivers
#
CONFIG_VIDEO_ASPEED=y

#
# Atmel media platform drivers
#

#
# Cadence media platform drivers
#
# CONFIG_VIDEO_CADENCE_CSI2RX is not set
# CONFIG_VIDEO_CADENCE_CSI2TX is not set

#
# Chips&Media media platform drivers
#

#
# Intel media platform drivers
#

#
# Marvell media platform drivers
#
CONFIG_VIDEO_CAFE_CCIC=m

#
# Mediatek media platform drivers
#

#
# NVidia media platform drivers
#

#
# NXP media platform drivers
#

#
# Qualcomm media platform drivers
#

#
# Renesas media platform drivers
#

#
# Rockchip media platform drivers
#

#
# Samsung media platform drivers
#

#
# STMicroelectronics media platform drivers
#

#
# Sunxi media platform drivers
#

#
# Texas Instruments drivers
#

#
# VIA media platform drivers
#

#
# Xilinx media platform drivers
#
CONFIG_VIDEOBUF2_CORE=y
CONFIG_VIDEOBUF2_V4L2=y
CONFIG_VIDEOBUF2_MEMOPS=y
CONFIG_VIDEOBUF2_DMA_CONTIG=y
CONFIG_VIDEOBUF2_VMALLOC=m
CONFIG_VIDEOBUF2_DMA_SG=m
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
# CONFIG_VIDEO_IR_I2C is not set
CONFIG_VIDEO_OV7670=m

#
# Audio decoders, processors and mixers
#
CONFIG_VIDEO_CS3308=m
# CONFIG_VIDEO_CS5345 is not set
CONFIG_VIDEO_CS53L32A=m
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# CONFIG_VIDEO_TDA7432 is not set
CONFIG_VIDEO_TDA9840=m
CONFIG_VIDEO_TEA6415C=y
CONFIG_VIDEO_TEA6420=y
# CONFIG_VIDEO_TLV320AIC23B is not set
CONFIG_VIDEO_TVAUDIO=y
CONFIG_VIDEO_UDA1342=m
CONFIG_VIDEO_VP27SMPX=y
CONFIG_VIDEO_WM8739=m
# CONFIG_VIDEO_WM8775 is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
CONFIG_VIDEO_SAA6588=y
# end of RDS decoders

#
# Video decoders
#
CONFIG_VIDEO_ADV7180=m
CONFIG_VIDEO_ADV7183=m
CONFIG_VIDEO_ADV7604=m
CONFIG_VIDEO_ADV7604_CEC=y
# CONFIG_VIDEO_ADV7842 is not set
CONFIG_VIDEO_BT819=m
# CONFIG_VIDEO_BT856 is not set
CONFIG_VIDEO_BT866=y
CONFIG_VIDEO_KS0127=m
CONFIG_VIDEO_ML86V7667=y
CONFIG_VIDEO_SAA7110=m
CONFIG_VIDEO_SAA711X=m
# CONFIG_VIDEO_TC358743 is not set
CONFIG_VIDEO_TVP514X=y
# CONFIG_VIDEO_TVP5150 is not set
CONFIG_VIDEO_TVP7002=y
CONFIG_VIDEO_TW2804=m
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
CONFIG_VIDEO_VPX3220=m

#
# Video and audio decoders
#
CONFIG_VIDEO_SAA717X=m
CONFIG_VIDEO_CX25840=m
# end of Video decoders

#
# Video encoders
#
CONFIG_VIDEO_AD9389B=m
CONFIG_VIDEO_ADV7170=m
# CONFIG_VIDEO_ADV7175 is not set
CONFIG_VIDEO_ADV7343=y
CONFIG_VIDEO_ADV7393=m
# CONFIG_VIDEO_ADV7511 is not set
CONFIG_VIDEO_AK881X=m
# CONFIG_VIDEO_SAA7127 is not set
CONFIG_VIDEO_SAA7185=m
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
CONFIG_VIDEO_UPD64031A=y
CONFIG_VIDEO_UPD64083=m
# end of Video improvement chips

#
# Audio/Video compression chips
#
CONFIG_VIDEO_SAA6752HS=m
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_M52790 is not set
CONFIG_VIDEO_ST_MIPID02=y
CONFIG_VIDEO_THS7303=y
# end of Miscellaneous helper chips

CONFIG_MEDIA_TUNER=y

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_E4000=y
# CONFIG_MEDIA_TUNER_FC0011 is not set
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=y
# CONFIG_MEDIA_TUNER_FC2580 is not set
CONFIG_MEDIA_TUNER_IT913X=y
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_MAX2165=y
CONFIG_MEDIA_TUNER_MC44S803=y
CONFIG_MEDIA_TUNER_MT2060=m
# CONFIG_MEDIA_TUNER_MT2063 is not set
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MXL301RF=y
# CONFIG_MEDIA_TUNER_MXL5005S is not set
# CONFIG_MEDIA_TUNER_MXL5007T is not set
CONFIG_MEDIA_TUNER_QM1D1B0004=y
# CONFIG_MEDIA_TUNER_QM1D1C0042 is not set
CONFIG_MEDIA_TUNER_QT1010=m
# CONFIG_MEDIA_TUNER_R820T is not set
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18212=y
CONFIG_MEDIA_TUNER_TDA18218=y
CONFIG_MEDIA_TUNER_TDA18250=y
CONFIG_MEDIA_TUNER_TDA18271=y
CONFIG_MEDIA_TUNER_TDA827X=y
CONFIG_MEDIA_TUNER_TDA8290=y
CONFIG_MEDIA_TUNER_TDA9887=m
# CONFIG_MEDIA_TUNER_TEA5761 is not set
# CONFIG_MEDIA_TUNER_TEA5767 is not set
# CONFIG_MEDIA_TUNER_TUA9001 is not set
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC4000=y
CONFIG_MEDIA_TUNER_XC5000=m
# end of Customize TV tuners
# end of Media ancillary drivers

#
# Graphics support
#
CONFIG_AGP=y
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
CONFIG_AGP_INTEL=m
CONFIG_AGP_NVIDIA=m
# CONFIG_AGP_SIS is not set
CONFIG_AGP_SWORKS=y
CONFIG_AGP_VIA=y
CONFIG_AGP_EFFICEON=m
CONFIG_INTEL_GTT=y
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_MIPI_DSI=y
# CONFIG_DRM_DEBUG_MM is not set
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS is not set
CONFIG_DRM_DEBUG_MODESET_LOCK=y
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_DISPLAY_HELPER=y
CONFIG_DRM_DISPLAY_DP_HELPER=y
CONFIG_DRM_DISPLAY_HDCP_HELPER=y
CONFIG_DRM_DISPLAY_HDMI_HELPER=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=y
CONFIG_DRM_BUDDY=y
CONFIG_DRM_VRAM_HELPER=y
CONFIG_DRM_TTM_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
CONFIG_DRM_SCHED=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
CONFIG_DRM_I2C_NXP_TDA998X=m
CONFIG_DRM_I2C_NXP_TDA9950=m
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_FORCE_PROBE=""
# CONFIG_DRM_I915_CAPTURE_ERROR is not set
# CONFIG_DRM_I915_USERPTR is not set
CONFIG_DRM_I915_PXP=y

#
# drm/i915 Debugging
#
CONFIG_DRM_I915_WERROR=y
# CONFIG_DRM_I915_DEBUG is not set
CONFIG_DRM_I915_DEBUG_MMIO=y
# CONFIG_DRM_I915_DEBUG_GEM is not set
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y
# CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is not set
# CONFIG_DRM_I915_DEBUG_GUC is not set
# CONFIG_DRM_I915_SELFTEST is not set
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y
# CONFIG_DRM_I915_DEBUG_VBLANK_EVADE is not set
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y
# end of drm/i915 Debugging

#
# drm/i915 Profile Guided Optimisation
#
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
# end of drm/i915 Profile Guided Optimisation

# CONFIG_DRM_VGEM is not set
# CONFIG_DRM_VKMS is not set
CONFIG_DRM_VMWGFX=m
# CONFIG_DRM_VMWGFX_MKSSTATS is not set
CONFIG_DRM_GMA500=m
CONFIG_DRM_AST=m
# CONFIG_DRM_MGAG200 is not set
CONFIG_DRM_QXL=m
CONFIG_DRM_VIRTIO_GPU=y
CONFIG_DRM_PANEL=y

#
# Display Panels
#
CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN=m
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
CONFIG_DRM_ANALOGIX_ANX78XX=y
CONFIG_DRM_ANALOGIX_DP=y
# end of Display Interface Bridges

CONFIG_DRM_ETNAVIV=y
# CONFIG_DRM_ETNAVIV_THERMAL is not set
CONFIG_DRM_BOCHS=y
CONFIG_DRM_CIRRUS_QEMU=m
CONFIG_DRM_SIMPLEDRM=y
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_SSD130X is not set
CONFIG_DRM_HYPERV=m
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
CONFIG_DRM_NOMODESET=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
# CONFIG_FB is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_LCD_PLATFORM=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_KTD253 is not set
CONFIG_BACKLIGHT_LM3533=y
CONFIG_BACKLIGHT_PWM=y
CONFIG_BACKLIGHT_DA903X=m
# CONFIG_BACKLIGHT_APPLE is not set
CONFIG_BACKLIGHT_QCOM_WLED=m
CONFIG_BACKLIGHT_SAHARA=m
CONFIG_BACKLIGHT_ADP5520=y
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
CONFIG_BACKLIGHT_88PM860X=y
CONFIG_BACKLIGHT_LM3630A=m
CONFIG_BACKLIGHT_LM3639=y
CONFIG_BACKLIGHT_LP855X=y
CONFIG_BACKLIGHT_GPIO=y
CONFIG_BACKLIGHT_LV5207LP=y
CONFIG_BACKLIGHT_BD6107=y
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y
# end of Graphics support

CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_JACK=y
CONFIG_SND_JACK_INPUT_DEV=y
# CONFIG_SND_OSSEMUL is not set
CONFIG_SND_PCM_TIMER=y
CONFIG_SND_HRTIMER=m
# CONFIG_SND_DYNAMIC_MINORS is not set
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_PROC_FS=y
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VERBOSE_PRINTK=y
# CONFIG_SND_DEBUG is not set
CONFIG_SND_DMA_SGBUF=y
# CONFIG_SND_SEQUENCER is not set
# CONFIG_SND_DRIVERS is not set
# CONFIG_SND_PCI is not set

#
# HD-Audio
#
# end of HD-Audio

CONFIG_SND_HDA_PREALLOC_SIZE=0
# CONFIG_SND_FIREWIRE is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
CONFIG_SND_PDAUDIOCF=m
# CONFIG_SND_SOC is not set
CONFIG_SND_X86=y
CONFIG_HDMI_LPE_AUDIO=m
CONFIG_SND_VIRTIO=m

#
# HID support
#
CONFIG_HID=m
# CONFIG_HID_BATTERY_STRENGTH is not set
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=m

#
# Special HID drivers
#
# CONFIG_HID_A4TECH is not set
# CONFIG_HID_ACRUX is not set
# CONFIG_HID_APPLE is not set
# CONFIG_HID_AUREAL is not set
CONFIG_HID_BELKIN=m
CONFIG_HID_CHERRY=m
CONFIG_HID_COUGAR=m
# CONFIG_HID_MACALLY is not set
# CONFIG_HID_CMEDIA is not set
# CONFIG_HID_CYPRESS is not set
CONFIG_HID_DRAGONRISE=m
CONFIG_DRAGONRISE_FF=y
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELECOM is not set
# CONFIG_HID_EZKEY is not set
# CONFIG_HID_GEMBIRD is not set
# CONFIG_HID_GFRM is not set
CONFIG_HID_GLORIOUS=m
# CONFIG_HID_VIVALDI is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
CONFIG_HID_WALTOP=m
CONFIG_HID_VIEWSONIC=m
# CONFIG_HID_XIAOMI is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_ICADE is not set
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_MAGICMOUSE=m
CONFIG_HID_MALTRON=m
CONFIG_HID_MAYFLASH=m
CONFIG_HID_REDRAGON=m
CONFIG_HID_MICROSOFT=m
# CONFIG_HID_MONTEREY is not set
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_NINTENDO=m
# CONFIG_NINTENDO_FF is not set
# CONFIG_HID_NTI is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PETALYNX is not set
CONFIG_HID_PICOLCD=m
# CONFIG_HID_PICOLCD_BACKLIGHT is not set
# CONFIG_HID_PICOLCD_LCD is not set
# CONFIG_HID_PICOLCD_LEDS is not set
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
# CONFIG_HID_RAZER is not set
# CONFIG_HID_PRIMAX is not set
CONFIG_HID_SAITEK=m
# CONFIG_HID_SEMITEK is not set
CONFIG_HID_SPEEDLINK=m
CONFIG_HID_STEAM=m
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
# CONFIG_HID_RMI is not set
CONFIG_HID_GREENASIA=m
CONFIG_GREENASIA_FF=y
# CONFIG_HID_HYPERV_MOUSE is not set
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_UDRAW_PS3=m
# CONFIG_HID_WIIMOTE is not set
# CONFIG_HID_XINMO is not set
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=m
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# end of Special HID drivers

#
# I2C HID support
#
CONFIG_I2C_HID_ACPI=m
# end of I2C HID support

CONFIG_I2C_HID_CORE=m
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
# CONFIG_USB is not set
CONFIG_USB_PCI=y

#
# USB port drivers
#

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_TAHVO_USB is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
# CONFIG_TYPEC is not set
# CONFIG_USB_ROLE_SWITCH is not set
# CONFIG_MMC is not set
CONFIG_MEMSTICK=m
# CONFIG_MEMSTICK_DEBUG is not set

#
# MemoryStick drivers
#
# CONFIG_MEMSTICK_UNSAFE_RESUME is not set
# CONFIG_MSPRO_BLOCK is not set
# CONFIG_MS_BLOCK is not set

#
# MemoryStick Host Controller Drivers
#
CONFIG_MEMSTICK_TIFM_MS=m
# CONFIG_MEMSTICK_JMICRON_38X is not set
CONFIG_MEMSTICK_R592=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=y
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
CONFIG_LEDS_BRIGHTNESS_HW_CHANGED=y

#
# LED drivers
#
# CONFIG_LEDS_88PM860X is not set
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_LM3532 is not set
CONFIG_LEDS_LM3533=m
CONFIG_LEDS_LM3642=m
CONFIG_LEDS_MT6323=y
CONFIG_LEDS_NET48XX=m
CONFIG_LEDS_WRAP=m
CONFIG_LEDS_PCA9532=m
# CONFIG_LEDS_PCA9532_GPIO is not set
CONFIG_LEDS_GPIO=y
# CONFIG_LEDS_LP3944 is not set
CONFIG_LEDS_LP3952=y
# CONFIG_LEDS_LP50XX is not set
CONFIG_LEDS_PCA955X=y
# CONFIG_LEDS_PCA955X_GPIO is not set
CONFIG_LEDS_PCA963X=y
CONFIG_LEDS_WM8350=m
# CONFIG_LEDS_DA903X is not set
CONFIG_LEDS_PWM=m
# CONFIG_LEDS_REGULATOR is not set
CONFIG_LEDS_BD2802=y
# CONFIG_LEDS_LT3593 is not set
# CONFIG_LEDS_ADP5520 is not set
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_OT200=y
CONFIG_LEDS_MENF21BMC=m

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
# CONFIG_LEDS_BLINKM is not set
CONFIG_LEDS_MLXREG=m
CONFIG_LEDS_USER=y
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set
CONFIG_LEDS_TPS6105X=m

#
# Flash and Torch LED drivers
#
CONFIG_LEDS_AS3645A=y
# CONFIG_LEDS_LM3601X is not set
# CONFIG_LEDS_RT8515 is not set
CONFIG_LEDS_SGM3140=y

#
# RGB LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=y
# CONFIG_LEDS_TRIGGER_MTD is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
CONFIG_LEDS_TRIGGER_CPU=y
CONFIG_LEDS_TRIGGER_ACTIVITY=m
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_LEDS_TRIGGER_TRANSIENT is not set
# CONFIG_LEDS_TRIGGER_CAMERA is not set
# CONFIG_LEDS_TRIGGER_PANIC is not set
CONFIG_LEDS_TRIGGER_NETDEV=y
# CONFIG_LEDS_TRIGGER_PATTERN is not set
# CONFIG_LEDS_TRIGGER_AUDIO is not set
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# Simple LED drivers
#
CONFIG_ACCESSIBILITY=y

#
# Speakup console speech
#
# end of Speakup console speech

# CONFIG_INFINIBAND is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
# CONFIG_EDAC_AMD76X is not set
CONFIG_EDAC_E7XXX=y
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82875P is not set
CONFIG_EDAC_I82975X=y
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I3200 is not set
CONFIG_EDAC_IE31200=m
# CONFIG_EDAC_X38 is not set
CONFIG_EDAC_I5400=m
# CONFIG_EDAC_I82860 is not set
CONFIG_EDAC_R82600=m
# CONFIG_EDAC_I5000 is not set
CONFIG_EDAC_I5100=y
CONFIG_EDAC_I7300=y
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
CONFIG_RTC_DEBUG=y
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_88PM860X=y
CONFIG_RTC_DRV_ABB5ZES3=m
CONFIG_RTC_DRV_ABEOZ9=y
CONFIG_RTC_DRV_ABX80X=m
CONFIG_RTC_DRV_DS1307=m
CONFIG_RTC_DRV_DS1307_CENTURY=y
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=y
CONFIG_RTC_DRV_MAX6900=y
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
# CONFIG_RTC_DRV_ISL12022 is not set
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
CONFIG_RTC_DRV_PCF85363=m
# CONFIG_RTC_DRV_PCF8563 is not set
CONFIG_RTC_DRV_PCF8583=y
CONFIG_RTC_DRV_M41T80=y
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=y
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
CONFIG_RTC_DRV_RX8010=m
CONFIG_RTC_DRV_RX8581=m
# CONFIG_RTC_DRV_RX8025 is not set
# CONFIG_RTC_DRV_EM3027 is not set
CONFIG_RTC_DRV_RV3028=y
CONFIG_RTC_DRV_RV3032=m
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=y
CONFIG_RTC_DRV_DS3232_HWMON=y
CONFIG_RTC_DRV_PCF2127=y
# CONFIG_RTC_DRV_RV3029C2 is not set
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
CONFIG_RTC_DRV_DS1286=y
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1685_FAMILY=m
# CONFIG_RTC_DRV_DS1685 is not set
CONFIG_RTC_DRV_DS1689=y
# CONFIG_RTC_DRV_DS17285 is not set
# CONFIG_RTC_DRV_DS17485 is not set
# CONFIG_RTC_DRV_DS17885 is not set
CONFIG_RTC_DRV_DS1742=y
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_DA9055=y
CONFIG_RTC_DRV_DA9063=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
CONFIG_RTC_DRV_M48T59=m
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m
# CONFIG_RTC_DRV_WM8350 is not set

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set
CONFIG_RTC_DRV_MT6397=m

#
# HID Sensor RTC drivers
#
# CONFIG_RTC_DRV_GOLDFISH is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
CONFIG_DMABUF_MOVE_NOTIFY=y
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

CONFIG_AUXDISPLAY=y
CONFIG_CHARLCD=y
CONFIG_LINEDISP=m
CONFIG_HD44780_COMMON=y
# CONFIG_HD44780 is not set
CONFIG_IMG_ASCII_LCD=m
# CONFIG_LCD2S is not set
CONFIG_PARPORT_PANEL=y
CONFIG_PANEL_PARPORT=0
CONFIG_PANEL_PROFILE=5
# CONFIG_PANEL_CHANGE_MESSAGE is not set
# CONFIG_CHARLCD_BL_OFF is not set
# CONFIG_CHARLCD_BL_ON is not set
CONFIG_CHARLCD_BL_FLASH=y
CONFIG_PANEL=y
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
CONFIG_UIO_PDRV_GENIRQ=y
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=y
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_UIO_PCI_GENERIC is not set
CONFIG_UIO_NETX=y
CONFIG_UIO_PRUSS=y
CONFIG_UIO_MF624=y
# CONFIG_UIO_HV_GENERIC is not set
# CONFIG_UIO_DFL is not set
# CONFIG_VFIO is not set
CONFIG_VIRT_DRIVERS=y
CONFIG_VMGENID=y
CONFIG_VBOXGUEST=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
# CONFIG_VIRTIO_PCI_LEGACY is not set
# CONFIG_VIRTIO_BALLOON is not set
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
CONFIG_VIRTIO_DMA_SHARED_BUFFER=y
# CONFIG_VDPA is not set
# CONFIG_VHOST_MENU is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=y
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=y
# end of Microsoft Hyper-V guest support

CONFIG_GREYBUS=m
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=y
CONFIG_WMI_BMOF=y
CONFIG_MXM_WMI=y
CONFIG_PEAQ_WMI=y
# CONFIG_NVIDIA_WMI_EC_BACKLIGHT is not set
CONFIG_XIAOMI_WMI=y
# CONFIG_GIGABYTE_WMI is not set
# CONFIG_YOGABOOK_WMI is not set
CONFIG_ACERHDF=m
CONFIG_ACER_WIRELESS=m
# CONFIG_ACER_WMI is not set
# CONFIG_AMD_PMC is not set
# CONFIG_ADV_SWBUTTON is not set
# CONFIG_APPLE_GMUX is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_ASUS_WIRELESS is not set
# CONFIG_ASUS_TF103C_DOCK is not set
# CONFIG_MERAKI_MX100 is not set
# CONFIG_X86_PLATFORM_DRIVERS_DELL is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
CONFIG_GPD_POCKET_FAN=y
CONFIG_HP_ACCEL=m
# CONFIG_WIRELESS_HOTKEY is not set
CONFIG_HP_WMI=y
CONFIG_TC1100_WMI=m
CONFIG_IBM_RTL=y
CONFIG_SENSORS_HDAPS=y
# CONFIG_THINKPAD_LMI is not set
CONFIG_INTEL_ATOMISP2_PDX86=y
CONFIG_INTEL_ATOMISP2_LED=m
# CONFIG_INTEL_ATOMISP2_PM is not set
# CONFIG_INTEL_SAR_INT1092 is not set
# CONFIG_INTEL_SKL_INT3472 is not set
CONFIG_INTEL_PMC_CORE=y
CONFIG_INTEL_WMI=y
CONFIG_INTEL_WMI_SBL_FW_UPDATE=y
# CONFIG_INTEL_WMI_THUNDERBOLT is not set
CONFIG_INTEL_HID_EVENT=m
# CONFIG_INTEL_VBTN is not set
CONFIG_INTEL_INT0002_VGPIO=m
CONFIG_INTEL_BXTWC_PMIC_TMU=m
# CONFIG_INTEL_MRFLD_PWRBTN is not set
CONFIG_INTEL_PUNIT_IPC=m
CONFIG_INTEL_RST=y
CONFIG_INTEL_SMARTCONNECT=m
# CONFIG_INTEL_VSEC is not set
CONFIG_MSI_WMI=y
# CONFIG_PCENGINES_APU2 is not set
# CONFIG_BARCO_P50_GPIO is not set
CONFIG_SAMSUNG_LAPTOP=y
CONFIG_SAMSUNG_Q10=y
CONFIG_TOSHIBA_BT_RFKILL=m
CONFIG_TOSHIBA_HAPS=m
CONFIG_TOSHIBA_WMI=y
CONFIG_ACPI_CMPC=m
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_MLX_PLATFORM is not set
CONFIG_INTEL_IPS=y
CONFIG_INTEL_SCU_IPC=y
CONFIG_INTEL_SCU=y
CONFIG_INTEL_SCU_PCI=y
CONFIG_INTEL_SCU_PLATFORM=y
CONFIG_INTEL_SCU_IPC_UTIL=y
# CONFIG_SIEMENS_SIMATIC_IPC is not set
# CONFIG_WINMATE_FM07_KEYS is not set
CONFIG_PMC_ATOM=y
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
CONFIG_MLXREG_IO=y
# CONFIG_MLXREG_LC is not set
# CONFIG_NVSW_SN2201 is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_HOTPLUG is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
# CONFIG_SURFACE_AGGREGATOR is not set
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
CONFIG_COMMON_CLK_SI5341=y
CONFIG_COMMON_CLK_SI5351=y
# CONFIG_COMMON_CLK_SI544 is not set
CONFIG_COMMON_CLK_CDCE706=y
CONFIG_COMMON_CLK_CS2000_CP=m
CONFIG_CLK_TWL6040=m
# CONFIG_COMMON_CLK_PWM is not set
CONFIG_XILINX_VCU=m
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKSRC_I8253=y
CONFIG_CLKEVT_I8253=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
CONFIG_ALTERA_MBOX=y
CONFIG_IOMMU_IOVA=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

CONFIG_IOMMU_DEBUGFS=y
# CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y
CONFIG_IOMMU_DMA=y
# CONFIG_INTEL_IOMMU is not set
# CONFIG_HYPERV_IOMMU is not set
CONFIG_VIRTIO_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
CONFIG_RPMSG=y
# CONFIG_RPMSG_CHAR is not set
# CONFIG_RPMSG_CTRL is not set
CONFIG_RPMSG_NS=y
CONFIG_RPMSG_QCOM_GLINK=y
CONFIG_RPMSG_QCOM_GLINK_RPM=y
CONFIG_RPMSG_VIRTIO=y
# end of Rpmsg drivers

CONFIG_SOUNDWIRE=y

#
# SoundWire Devices
#

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

CONFIG_SOC_TI=y

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

CONFIG_PM_DEVFREQ=y

#
# DEVFREQ Governors
#
CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND=m
CONFIG_DEVFREQ_GOV_PERFORMANCE=y
CONFIG_DEVFREQ_GOV_POWERSAVE=m
CONFIG_DEVFREQ_GOV_USERSPACE=m
CONFIG_DEVFREQ_GOV_PASSIVE=m

#
# DEVFREQ Drivers
#
# CONFIG_PM_DEVFREQ_EVENT is not set
CONFIG_EXTCON=m

#
# Extcon Device Drivers
#
# CONFIG_EXTCON_AXP288 is not set
CONFIG_EXTCON_FSA9480=m
# CONFIG_EXTCON_GPIO is not set
CONFIG_EXTCON_INTEL_INT3496=m
# CONFIG_EXTCON_INTEL_MRFLD is not set
CONFIG_EXTCON_MAX14577=m
CONFIG_EXTCON_MAX3355=m
# CONFIG_EXTCON_MAX77693 is not set
# CONFIG_EXTCON_PTN5150 is not set
CONFIG_EXTCON_RT8973A=m
CONFIG_EXTCON_SM5502=m
CONFIG_EXTCON_USB_GPIO=m
# CONFIG_EXTCON_USBC_TUSB320 is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
# CONFIG_PWM_DWC is not set
CONFIG_PWM_IQS620A=m
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
CONFIG_MADERA_IRQ=m
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
# CONFIG_USB_LGM_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

CONFIG_PHY_PXA_28NM_HSIC=y
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
CONFIG_MCB=y
# CONFIG_MCB_PCI is not set
CONFIG_MCB_LPC=m

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
CONFIG_USB4=y
CONFIG_USB4_DEBUGFS_WRITE=y
# CONFIG_USB4_DMA_TEST is not set

#
# Android
#
CONFIG_ANDROID=y
# CONFIG_ANDROID_BINDER_IPC is not set
# end of Android

CONFIG_DAX=y
# CONFIG_DEV_DAX is not set
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
CONFIG_NVMEM_SPMI_SDAM=m
# CONFIG_NVMEM_RMEM is not set

#
# HW tracing support
#
CONFIG_STM=m
CONFIG_STM_PROTO_BASIC=m
# CONFIG_STM_PROTO_SYS_T is not set
# CONFIG_STM_DUMMY is not set
# CONFIG_STM_SOURCE_CONSOLE is not set
# CONFIG_STM_SOURCE_HEARTBEAT is not set
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=y
# CONFIG_INTEL_TH_PCI is not set
CONFIG_INTEL_TH_ACPI=y
CONFIG_INTEL_TH_GTH=y
CONFIG_INTEL_TH_STH=m
# CONFIG_INTEL_TH_MSU is not set
CONFIG_INTEL_TH_PTI=y
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

CONFIG_FPGA=y
CONFIG_ALTERA_PR_IP_CORE=m
CONFIG_FPGA_MGR_ALTERA_CVP=y
CONFIG_FPGA_BRIDGE=y
CONFIG_ALTERA_FREEZE_BRIDGE=m
CONFIG_XILINX_PR_DECOUPLER=m
CONFIG_FPGA_REGION=y
CONFIG_FPGA_DFL=y
CONFIG_FPGA_DFL_FME=m
CONFIG_FPGA_DFL_FME_MGR=m
# CONFIG_FPGA_DFL_FME_BRIDGE is not set
CONFIG_FPGA_DFL_FME_REGION=m
# CONFIG_FPGA_DFL_AFU is not set
# CONFIG_FPGA_DFL_NIOS_INTEL_PAC_N3000 is not set
# CONFIG_FPGA_DFL_PCI is not set
CONFIG_TEE=y
CONFIG_PM_OPP=y
CONFIG_SIOX=m
CONFIG_SIOX_BUS_GPIO=m
CONFIG_SLIMBUS=m
CONFIG_SLIM_QCOM_CTRL=m
CONFIG_INTERCONNECT=y
# CONFIG_COUNTER is not set
CONFIG_MOST=m
# CONFIG_MOST_CDEV is not set
# CONFIG_MOST_SND is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4_FS is not set
CONFIG_REISERFS_FS=m
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
# CONFIG_REISERFS_FS_POSIX_ACL is not set
CONFIG_REISERFS_FS_SECURITY=y
CONFIG_JFS_FS=y
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
CONFIG_JFS_DEBUG=y
# CONFIG_JFS_STATISTICS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
CONFIG_XFS_RT=y
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
CONFIG_BTRFS_FS_REF_VERIFY=y
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
# CONFIG_F2FS_FS_POSIX_ACL is not set
CONFIG_F2FS_FS_SECURITY=y
CONFIG_F2FS_CHECK_FS=y
CONFIG_F2FS_FAULT_INJECTION=y
CONFIG_F2FS_FS_COMPRESSION=y
# CONFIG_F2FS_FS_LZO is not set
# CONFIG_F2FS_FS_LZ4 is not set
# CONFIG_F2FS_FS_ZSTD is not set
CONFIG_F2FS_IOSTAT=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
CONFIG_FS_VERITY=y
CONFIG_FS_VERITY_DEBUG=y
# CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not set
CONFIG_FSNOTIFY=y
# CONFIG_DNOTIFY is not set
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
# CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QFMT_V1=m
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
CONFIG_AUTOFS_FS=y
# CONFIG_FUSE_FS is not set
CONFIG_OVERLAY_FS=m
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
CONFIG_OVERLAY_FS_METACOPY=y

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
# CONFIG_NETFS_STATS is not set
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
CONFIG_FSCACHE_DEBUG=y
# CONFIG_CACHEFILES is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_FAT_DEFAULT_UTF8 is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=y
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_CONFIGFS_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
CONFIG_ORANGEFS_FS=y
CONFIG_ADFS_FS=m
CONFIG_ADFS_FS_RW=y
CONFIG_AFFS_FS=y
CONFIG_ECRYPT_FS=m
# CONFIG_ECRYPT_FS_MESSAGING is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=y
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
CONFIG_EFS_FS=y
# CONFIG_JFFS2_FS is not set
# CONFIG_UBIFS_FS is not set
CONFIG_CRAMFS=m
# CONFIG_CRAMFS_BLOCKDEV is not set
# CONFIG_CRAMFS_MTD is not set
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
CONFIG_SQUASHFS_DECOMP_MULTI=y
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
# CONFIG_SQUASHFS_XATTR is not set
# CONFIG_SQUASHFS_ZLIB is not set
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
# CONFIG_SQUASHFS_XZ is not set
CONFIG_SQUASHFS_ZSTD=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
CONFIG_OMFS_FS=y
CONFIG_HPFS_FS=m
CONFIG_QNX4FS_FS=m
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
# CONFIG_PSTORE_DEFLATE_COMPRESS is not set
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
CONFIG_PSTORE_LZ4HC_COMPRESS=y
CONFIG_PSTORE_842_COMPRESS=y
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_LZ4HC_COMPRESS_DEFAULT=y
# CONFIG_PSTORE_842_COMPRESS_DEFAULT is not set
CONFIG_PSTORE_COMPRESS_DEFAULT="lz4hc"
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_PMSG is not set
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
CONFIG_UFS_FS=m
CONFIG_UFS_FS_WRITE=y
# CONFIG_UFS_DEBUG is not set
# CONFIG_EROFS_FS is not set
CONFIG_VBOXSF_FS=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
# CONFIG_NFS_V4_1 is not set
# CONFIG_ROOT_NFS is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFSD is not set
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
# CONFIG_SUNRPC_DEBUG is not set
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
# CONFIG_CIFS_UPCALL is not set
# CONFIG_CIFS_XATTR is not set
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
# CONFIG_CIFS_DFS_UPCALL is not set
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_737=y
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=y
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
CONFIG_NLS_CODEPAGE_863=y
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=y
CONFIG_NLS_CODEPAGE_1251=y
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
CONFIG_NLS_ISO8859_3=y
CONFIG_NLS_ISO8859_4=m
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=y
CONFIG_NLS_ISO8859_9=m
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_KOI8_R=y
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_MAC_ROMAN=y
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=y
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
CONFIG_NLS_MAC_ICELAND=m
# CONFIG_NLS_MAC_INUIT is not set
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set
CONFIG_UNICODE=y
# CONFIG_UNICODE_NORMALIZATION_SELFTEST is not set
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_USER_DECRYPTED_DATA is not set
CONFIG_KEY_DH_OPERATIONS=y
# CONFIG_KEY_NOTIFICATIONS is not set
CONFIG_SECURITY_DMESG_RESTRICT=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_NETWORK is not set
# CONFIG_SECURITY_PATH is not set
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_LOADPIN is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
# CONFIG_IMA is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_GCC_PLUGIN_STRUCTLEAK_USER is not set
# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF is not set
# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is not set
# CONFIG_GCC_PLUGIN_STACKLEAK is not set
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# CONFIG_RANDSTRUCT_FULL is not set
# CONFIG_RANDSTRUCT_PERFORMANCE is not set
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=y
CONFIG_ASYNC_XOR=y
CONFIG_ASYNC_PQ=y
CONFIG_ASYNC_RAID6_RECOV=y
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set
CONFIG_CRYPTO_SIMD=m

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=y
CONFIG_CRYPTO_ECDH=y
# CONFIG_CRYPTO_ECDSA is not set
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_SM2=y
CONFIG_CRYPTO_CURVE25519=y

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_CHACHA20POLY1305=y
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_SEQIV is not set
CONFIG_CRYPTO_ECHAINIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=y
# CONFIG_CRYPTO_XTS is not set
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_NHPOLY1305=y
CONFIG_CRYPTO_ADIANTUM=y
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=y

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=y
# CONFIG_CRYPTO_CRC32_PCLMUL is not set
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_POLY1305=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_SM3=y
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=m
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_AES_NI_INTEL=m
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
# CONFIG_CRYPTO_BLOWFISH is not set
CONFIG_CRYPTO_CAMELLIA=m
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
CONFIG_CRYPTO_KHAZAD=y
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_SEED=y
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SERPENT_SSE2_586 is not set
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TEA=y
# CONFIG_CRYPTO_TWOFISH is not set
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_586=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=m
CONFIG_CRYPTO_842=y
# CONFIG_CRYPTO_LZ4 is not set
CONFIG_CRYPTO_LZ4HC=y
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
CONFIG_CRYPTO_USER_API_RNG=y
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=m
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
CONFIG_CRYPTO_STATS=y
CONFIG_CRYPTO_HASH_INFO=y
# CONFIG_CRYPTO_HW is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
# CONFIG_SIGNED_PE_FILE_VERIFICATION is not set
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE_SIZE=4096
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# CONFIG_SYSTEM_REVOCATION_LIST is not set
# CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
CONFIG_LINEAR_RANGES=y
CONFIG_PACKING=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=m
CONFIG_PRIME_NUMBERS=y
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=m
CONFIG_CRYPTO_LIB_ARC4=y
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=y
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
CONFIG_CRYPTO_LIB_POLY1305=y
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_LIB_MEMNEQ=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
# CONFIG_CRC32_SLICEBY8 is not set
CONFIG_CRC32_SLICEBY4=y
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
# CONFIG_CRC4 is not set
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=y
CONFIG_842_DECOMPRESS=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4HC_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
# CONFIG_XZ_DEC_IA64 is not set
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=y
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_BCH=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_PERCENTAGE=0
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
CONFIG_CMA_SIZE_SEL_MIN=y
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
# CONFIG_IRQ_POLL is not set
CONFIG_MPILIB=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_32=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_PRINTK_CALLER=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_DYNAMIC_DEBUG is not set
CONFIG_DYNAMIC_DEBUG_CORE=y
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_MISC is not set

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_FRAME_WARN=8192
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_READABLE_ASM=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_FRAME_POINTER=y
# CONFIG_VMLINUX_MAP is not set
CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_FS_ALLOW_ALL is not set
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
CONFIG_DEBUG_FS_ALLOW_NONE=y
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ONLY_BOUNDS=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_UNREACHABLE is not set
# CONFIG_UBSAN_BOOL is not set
# CONFIG_UBSAN_ENUM is not set
# CONFIG_UBSAN_ALIGNMENT is not set
CONFIG_UBSAN_SANITIZE_ALL=y
# CONFIG_TEST_UBSAN is not set
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_PAGE_OWNER=y
CONFIG_PAGE_POISONING=y
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
# CONFIG_DEBUG_OBJECTS_WORK is not set
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_MEMORY_INIT is not set
CONFIG_DEBUG_KMAP_LOCAL=y
CONFIG_ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP=y
CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
# CONFIG_HARDLOCKUP_DETECTOR is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=480
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
CONFIG_TRACE_IRQFLAGS_NMI=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PLIST=y
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_LIST=y
CONFIG_TORTURE_TEST=m
CONFIG_RCU_SCALE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_RCU_STRICT_GRACE_PERIOD is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
CONFIG_SAMPLES=y
CONFIG_SAMPLE_AUXDISPLAY=y
CONFIG_SAMPLE_TRACE_EVENTS=m
# CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS is not set
CONFIG_SAMPLE_TRACE_PRINTK=m
CONFIG_SAMPLE_TRACE_ARRAY=m
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_HW_BREAKPOINT is not set
# CONFIG_SAMPLE_KFIFO is not set
CONFIG_SAMPLE_RPMSG_CLIENT=m
CONFIG_SAMPLE_CONFIGFS=m
# CONFIG_SAMPLE_WATCHDOG is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
CONFIG_DEBUG_ENTRY=y
# CONFIG_DEBUG_NMI_SELFTEST is not set
CONFIG_X86_DEBUG_FPU=y
CONFIG_PUNIT_ATOM_DEBUG=m
CONFIG_UNWINDER_FRAME_POINTER=y
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_PM_NOTIFIER_ERROR_INJECT=m
CONFIG_NETDEV_NOTIFIER_ERROR_INJECT=m
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
# CONFIG_FAIL_PAGE_ALLOC is not set
# CONFIG_FAULT_INJECTION_USERCOPY is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
CONFIG_FAIL_FUTEX=y
# CONFIG_FAULT_INJECTION_DEBUG_FS is not set
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_DIV64 is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_DEBUG_VIRTUAL is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_FREE_PAGES is not set
# CONFIG_TEST_FPU is not set
# CONFIG_TEST_CLOCKSOURCE_WATCHDOG is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
@ 2022-07-02 17:23   ` Linus Torvalds
  2022-07-03  3:59     ` Al Viro
  2022-07-04  2:52     ` Al Viro
  2022-08-08 16:37   ` Alexander Potapenko
  1 sibling, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2022-07-02 17:23 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Fri, Jul 1, 2022 at 7:25 AM Alexander Potapenko <glider@google.com> wrote:
>
> Under certain circumstances initialization of `unsigned seq` and
> `struct inode *inode` passed into step_into() may be skipped.
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into() (see [1] for more info).

So while I think this needs to be fixed, I'd really prefer to make
the initialization and/or usage rules stricter, or at least clearer.

For example, looking around, I think "handle_dotdot()" has the exact
same kind of issue, where follow_dotdot[_rcu]() doesn't initialize
seq/inode for certain cases, and it's *really* hard to see exactly
what the rules are.

It turns out that the rules are that seq/inode only get initialized if
these routines return a non-NULL and non-error result.
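
For reference, a heavily reduced sketch of the walk_component() shape
being discussed -- the function names match fs/namei.c of that era,
but the bodies, flags handling and put_link() step are elided, so
read it as a diagram of the control flow rather than the actual code:

	static const char *walk_component(struct nameidata *nd, int flags)
	{
		struct dentry *dentry;
		struct inode *inode;	/* written only by lookup_fast() */
		unsigned seq;		/* ditto, and only on success */

		dentry = lookup_fast(nd, &inode, &seq);
		if (IS_ERR(dentry))
			return ERR_CAST(dentry);
		if (!dentry) {
			/* slow path: yields a dentry, never touches inode/seq */
			dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
			if (IS_ERR(dentry))
				return ERR_CAST(dentry);
		}
		/* on the slow path, inode/seq arrive here uninitialized */
		return step_into(nd, flags, dentry, inode, seq);
	}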

Now, that is true for all of these cases - both follow_dotdot*() and
lookup_fast(). Possibly others.

But the reason follow_dotdot*() doesn't cause the same issue is that
the caller actually does the checks that avoid it, and doesn't pass
down the uninitialized cases.

Now, the other part of the rule is that they only get _used_ for
LOOKUP_RCU cases, where they are used to validate the lookup after
we've finalized things.

Of course, sometimes the "only get used for LOOKUP_RCU" rule is very
very unclear, because even without being an RCU lookup, step_into()
will save the values into nd->inode/seq. So the values were "used",
and initializing them makes them valid, but *that* copy must not then
be used unless RCU was set.

Also, sometimes the LOOKUP_RCU check is in the caller, and the flag
has actually been cleared, so by the time the actual use comes
around, you just have to trust that it was an RCU lookup (i.e.
legitimize_links/root()).

So it all seems to work, and this patch gets rid of one particular
odd case, but I think it basically hides the compiler warning without
really clarifying the code or the rules.

Anyway, what I'm building up to here is that I think we should
*document* this a bit more, and then make those initializations
follow that documentation. I get the feeling that "nd->inode/nd->seq"
should also be initialized.

Right now we have those quite subtle rules about "set vs use", and
while having a lot of the uses be conditional on LOOKUP_RCU makes the
code correct, it doesn't solve the "pass uninitialized values as
arguments" case.

I also think it's very unclear when nd->inode/nd->seq are initialized,
and the compiler warning only caught the case where they were *set*
(but by arguments that weren't initialized), but didn't necessarily
catch the case where they weren't set at all in the first place and
then passed around.

End result:

 - I think I'd like path_init() (or set_nameidata) to actually
initialize nd->inode and nd->seq unconditionally too.

   Right now, they get initialized only for that LOOKUP_RCU case.
Pretty much exactly the same issue as the one this patch tries to
solve, except the compiler didn't notice because it's all indirect
through those structure fields and it just didn't track far enough.

 - I suspect it would be good to initialize them to actual invalid
values (rather than NULL/0 - particularly the sequence number; see
the sketch below)

 - I look at that follow_dotdot*() caller case, and think "that looks
very similar to the lookup_fast() case, but then we have *very*
different initialization rules".
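
As a sketch of that invalid-value idea (illustrative only - the poison
value is a strawman; any value that read_seqcount_begin() can never
return would do, e.g. an odd one):

	/* hypothetically, in path_init()/set_nameidata(), unconditionally: */
	nd->inode = NULL;
	nd->seq = 1;	/* odd - never a validly sampled ->d_seq */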

Al - can you please take a quick look?

                    Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-02 17:23   ` Linus Torvalds
@ 2022-07-03  3:59     ` Al Viro
  2022-07-04  2:52     ` Al Viro
  1 sibling, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-03  3:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:
> On Fri, Jul 1, 2022 at 7:25 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > Under certain circumstances initialization of `unsigned seq` and
> > `struct inode *inode` passed into step_into() may be skipped.
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into() (see [1] for more info).

> So while I think this needs to be fixed, I think I'd really prefer to
> make the initialization and/or usage rules stricter or at least
> clearer.

Disclaimer: the bits below are nowhere near what I consider a decent
explanation; this might serve as a first approximation, but I really
need to get some sleep before I get it into coherent shape.  4 hours
of sleep today...

The rules are
	* no pathname resolution without successful path_init().
IOW, path_init() failure is an instant fuck off.
	* path_init() success sets nd->inode.  In all cases.
	* nd->inode must be set - LOOKUP_RCU or not, we simply cannot
proceed without it.

	* in non-RCU mode nd->inode must be equal to nd->path.dentry->d_inode.
	* in RCU mode nd->inode must be equal to a value observed in
nd->path.dentry->d_inode while nd->path.dentry->d_seq had been equal to
nd->seq.

	* step_into() gets a dentry/inode/seq triple.  In non-RCU
mode inode and seq are ignored; in RCU mode they must satisfy the
same relationship we have for nd->path.dentry/nd->inode/nd->seq.
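
In code, establishing that triple is the usual seqcount sampling
pattern (a sketch using fs/namei.c names, not a quote of the actual
code):

	seq = read_seqcount_begin(&dentry->d_seq);
	inode = dentry->d_inode;
	if (read_seqcount_retry(&dentry->d_seq, seq))
		return ERR_PTR(-ECHILD);	/* raced with a d_seq writer */
	/* (dentry, inode, seq) now satisfy the relationship above */
	nd->inode = inode;
	nd->seq = seq;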

> Of course, sometimes the "only get used for LOOKUP_RCU" is very very
> unclear, because even without being an RCU lookup, step_into() will
> save it into nd->inode/seq. So the values were "used", and
> initializing them makes them valid, but then *that* copy must not then
> be used unless RCU was set.

You are misreading that (and I admit that it badly needs documentation).
The whole point of step_into() is to move over to new place.  nd->inode
*MUST* be set on success, no matter what.

>  - I look at that follow_dotdot*() caller case, and think "that looks
> very similar to the lookup_fast() case, but then we have *very*
> different initialization rules".

follow_dotdot() might as well lose inodep and seqp arguments - everything
would've worked just as well without those.  We would've gotten the same
complaints about uninitialized values passed to step_into(), though.

This
                if (unlikely(!parent))
                        error = step_into(nd, WALK_NOFOLLOW,
                                         nd->path.dentry, nd->inode, nd->seq);
in handle_dots() probably contributes to confusion - it's the "we
have stepped on .. in the root, just jump into whatever's mounted on
it" case.  In non-RCU case it looks like a use of nd->seq in non-RCU
mode; however, in that case step_into() will end up ignoring the
last two arguments.

I'll post something more coherent after I get some sleep.  Sorry... ;-/

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-02 17:23   ` Linus Torvalds
  2022-07-03  3:59     ` Al Viro
@ 2022-07-04  2:52     ` Al Viro
  2022-07-04  8:20       ` Alexander Potapenko
  2022-07-04 17:36       ` Linus Torvalds
  1 sibling, 2 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04  2:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:

> Al - can you please take a quick look?

FWIW, trying to write coherent documentation had its usual effect...
The thing is, we don't really need to fetch the inode that early.
All we really care about is that in RCU mode ->d_seq gets sampled
before we fetch ->d_inode *and* we don't treat "it looks negative"
as hard -ENOENT in case of ->d_seq mismatch.

Which can be bloody well left to step_into().  So we don't need
to pass it the inode argument at all - just dentry and seq.  Makes
a bunch of functions simpler as well...

It does *not* deal with the "uninitialized" seq argument in the
!RCU case; I'll handle that in the followup, but that's a separate
story, IMO (and very clearly a false positive).

Cumulative diff follows; splitup is in #work.namei.  Comments?

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..7f4f61ade9e3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1467,7 +1467,7 @@ EXPORT_SYMBOL(follow_down);
  * we meet a managed dentry that would need blocking.
  */
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+			       unsigned *seqp)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1497,13 +1497,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
 				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
 				flags = dentry->d_flags;
 				continue;
 			}
@@ -1515,8 +1508,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path, unsigned int *seqp)
 {
 	bool jumped;
 	int ret;
@@ -1525,9 +1517,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		if (likely(__follow_mount_rcu(nd, path, seqp)))
 			return 0;
 		if (!try_to_unlazy_next(nd, dentry, seq))
 			return -ECHILD;
@@ -1547,7 +1537,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
 	} else {
-		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
@@ -1607,9 +1596,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1628,22 +1615,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
 		*seqp = seq;
@@ -1838,13 +1814,21 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * for the common case.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry, unsigned seq)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path, &seq);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
+	if (unlikely(!inode)) {
+		if ((nd->flags & LOOKUP_RCU) &&
+		    read_seqcount_retry(&path.dentry->d_seq, seq))
+			return ERR_PTR(-ECHILD);
+		return ERR_PTR(-ENOENT);
+	}
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
@@ -1870,9 +1854,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *parent, *old;
 
@@ -1895,7 +1877,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
 	*seqp = read_seqcount_begin(&parent->d_seq);
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
@@ -1910,9 +1891,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	return NULL;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *parent;
 
@@ -1937,7 +1916,6 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		return ERR_PTR(-ENOENT);
 	}
 	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
@@ -1952,7 +1930,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
 		unsigned seq;
 
 		if (!nd->root.mnt) {
@@ -1961,17 +1938,17 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd, &seq);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd, &seq);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
 		if (unlikely(!parent))
 			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
+					 nd->path.dentry, nd->seq);
 		else
 			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+					 parent, seq);
 		if (unlikely(error))
 			return error;
 
@@ -1995,7 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
 	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
@@ -2007,7 +1983,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd, &seq);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1993,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry, seq);
 }
 
 /*
@@ -2473,8 +2449,7 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry, nd->seq));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3394,7 +3369,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	int open_flag = op->open_flag;
 	bool got_write = false;
 	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3384,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd, &seq);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3438,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry, seq);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  2:52     ` Al Viro
@ 2022-07-04  8:20       ` Alexander Potapenko
  2022-07-04 13:44         ` Al Viro
  2022-07-04 17:36       ` Linus Torvalds
  1 sibling, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-04  8:20 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 4:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:
>
> > Al - can you please take a quick look?
>
> FWIW, trying to write coherent documentation had its usual effect...
> The thing is, we don't really need to fetch the inode that early.
> All we really care about is that in RCU mode ->d_seq gets sampled
> before we fetch ->d_inode *and* we don't treat "it looks negative"
> as hard -ENOENT in case of ->d_seq mismatch.
>
> Which can be bloody well left to step_into().  So we don't need
> to pass it the inode argument at all - just dentry and seq.  Makes
> a bunch of functions simpler as well...
>
> It does *not* deal with the "uninitialized" seq argument in the
> !RCU case; I'll handle that in the followup, but that's a separate
> story, IMO (and very clearly a false positive).

I can confirm that your patch fixes KMSAN reports on inode, yet the
following reports still persist:

=====================================================
BUG: KMSAN: uninit-value in walk_component+0x5e7/0x6c0 fs/namei.c:1996
 walk_component+0x5e7/0x6c0 fs/namei.c:1996
 lookup_last fs/namei.c:2445
 path_lookupat+0x27d/0x6f0 fs/namei.c:2468
 filename_lookup+0x24c/0x800 fs/namei.c:2497
 kern_path+0x79/0x3a0 fs/namei.c:2587
 init_stat+0x72/0x13f fs/init.c:132
 clean_path+0x74/0x24c init/initramfs.c:339
 do_name+0x12d/0xc17 init/initramfs.c:371
 write_buffer init/initramfs.c:457
 unpack_to_rootfs+0x49a/0xd9e init/initramfs.c:510
 do_populate_rootfs+0x57/0x40f init/initramfs.c:699
 async_run_entry_fn+0x8f/0x400 kernel/async.c:127
 process_one_work+0xb27/0x13e0 kernel/workqueue.c:2289
 worker_thread+0x1076/0x1d60 kernel/workqueue.c:2436
 kthread+0x31b/0x430 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 ??:?

Local variable seq created at:
 walk_component+0x46/0x6c0 fs/namei.c:1981
 lookup_last fs/namei.c:2445
 path_lookupat+0x27d/0x6f0 fs/namei.c:2468

CPU: 0 PID: 10 Comm: kworker/u9:0 Tainted: G    B
5.19.0-rc4-00059-gcf2d25715943-dirty #103
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_unbound async_run_entry_fn
=====================================================

What makes you think they are false positives? Is the scenario I
described above:

"""
In particular, if the call to lookup_fast() in walk_component()
returns NULL, and lookup_slow() returns a valid dentry, then the
`seq` and `inode` will remain uninitialized until the call to
step_into()
"""

impossible?

> Cumulative diff follows; splitup is in #work.namei.  Comments?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 1f28d3f463c3..7f4f61ade9e3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1467,7 +1467,7 @@ EXPORT_SYMBOL(follow_down);
>   * we meet a managed dentry that would need blocking.
>   */
>  static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
> -                              struct inode **inode, unsigned *seqp)
> +                              unsigned *seqp)
>  {
>         struct dentry *dentry = path->dentry;
>         unsigned int flags = dentry->d_flags;
> @@ -1497,13 +1497,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
>                                 dentry = path->dentry = mounted->mnt.mnt_root;
>                                 nd->state |= ND_JUMPED;
>                                 *seqp = read_seqcount_begin(&dentry->d_seq);
> -                               *inode = dentry->d_inode;
> -                               /*
> -                                * We don't need to re-check ->d_seq after this
> -                                * ->d_inode read - there will be an RCU delay
> -                                * between mount hash removal and ->mnt_root
> -                                * becoming unpinned.
> -                                */
>                                 flags = dentry->d_flags;
>                                 continue;
>                         }
> @@ -1515,8 +1508,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
>  }
>
>  static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
> -                         struct path *path, struct inode **inode,
> -                         unsigned int *seqp)
> +                         struct path *path, unsigned int *seqp)
>  {
>         bool jumped;
>         int ret;
> @@ -1525,9 +1517,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
>         path->dentry = dentry;
>         if (nd->flags & LOOKUP_RCU) {
>                 unsigned int seq = *seqp;
> -               if (unlikely(!*inode))
> -                       return -ENOENT;
> -               if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
> +               if (likely(__follow_mount_rcu(nd, path, seqp)))
>                         return 0;
>                 if (!try_to_unlazy_next(nd, dentry, seq))
>                         return -ECHILD;
> @@ -1547,7 +1537,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
>                 if (path->mnt != nd->path.mnt)
>                         mntput(path->mnt);
>         } else {
> -               *inode = d_backing_inode(path->dentry);
>                 *seqp = 0; /* out of RCU mode, so the value doesn't matter */
>         }
>         return ret;
> @@ -1607,9 +1596,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
>         return dentry;
>  }
>
> -static struct dentry *lookup_fast(struct nameidata *nd,
> -                                 struct inode **inode,
> -                                 unsigned *seqp)
> +static struct dentry *lookup_fast(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *dentry, *parent = nd->path.dentry;
>         int status = 1;
> @@ -1628,22 +1615,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
>                         return NULL;
>                 }
>
> -               /*
> -                * This sequence count validates that the inode matches
> -                * the dentry name information from lookup.
> -                */
> -               *inode = d_backing_inode(dentry);
> -               if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
> -                       return ERR_PTR(-ECHILD);
> -
> -               /*
> +               /*
>                  * This sequence count validates that the parent had no
>                  * changes while we did the lookup of the dentry above.
> -                *
> -                * The memory barrier in read_seqcount_begin of child is
> -                *  enough, we can use __read_seqcount_retry here.
>                  */
> -               if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
> +               if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
>                         return ERR_PTR(-ECHILD);
>
>                 *seqp = seq;
> @@ -1838,13 +1814,21 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
>   * for the common case.
>   */
>  static const char *step_into(struct nameidata *nd, int flags,
> -                    struct dentry *dentry, struct inode *inode, unsigned seq)
> +                    struct dentry *dentry, unsigned seq)
>  {
>         struct path path;
> -       int err = handle_mounts(nd, dentry, &path, &inode, &seq);
> +       struct inode *inode;
> +       int err = handle_mounts(nd, dentry, &path, &seq);
>
>         if (err < 0)
>                 return ERR_PTR(err);
> +       inode = path.dentry->d_inode;
> +       if (unlikely(!inode)) {
> +               if ((nd->flags & LOOKUP_RCU) &&
> +                   read_seqcount_retry(&path.dentry->d_seq, seq))
> +                       return ERR_PTR(-ECHILD);
> +               return ERR_PTR(-ENOENT);
> +       }
>         if (likely(!d_is_symlink(path.dentry)) ||
>            ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
>            (flags & WALK_NOFOLLOW)) {
> @@ -1870,9 +1854,7 @@ static const char *step_into(struct nameidata *nd, int flags,
>         return pick_link(nd, &path, inode, seq, flags);
>  }
>
> -static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
> -                                       struct inode **inodep,
> -                                       unsigned *seqp)
> +static struct dentry *follow_dotdot_rcu(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *parent, *old;
>
> @@ -1895,7 +1877,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
>         }
>         old = nd->path.dentry;
>         parent = old->d_parent;
> -       *inodep = parent->d_inode;
>         *seqp = read_seqcount_begin(&parent->d_seq);
>         if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
>                 return ERR_PTR(-ECHILD);
> @@ -1910,9 +1891,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
>         return NULL;
>  }
>
> -static struct dentry *follow_dotdot(struct nameidata *nd,
> -                                struct inode **inodep,
> -                                unsigned *seqp)
> +static struct dentry *follow_dotdot(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *parent;
>
> @@ -1937,7 +1916,6 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
>                 return ERR_PTR(-ENOENT);
>         }
>         *seqp = 0;
> -       *inodep = parent->d_inode;
>         return parent;
>
>  in_root:
> @@ -1952,7 +1930,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
>         if (type == LAST_DOTDOT) {
>                 const char *error = NULL;
>                 struct dentry *parent;
> -               struct inode *inode;
>                 unsigned seq;
>
>                 if (!nd->root.mnt) {
> @@ -1961,17 +1938,17 @@ static const char *handle_dots(struct nameidata *nd, int type)
>                                 return error;
>                 }
>                 if (nd->flags & LOOKUP_RCU)
> -                       parent = follow_dotdot_rcu(nd, &inode, &seq);
> +                       parent = follow_dotdot_rcu(nd, &seq);
>                 else
> -                       parent = follow_dotdot(nd, &inode, &seq);
> +                       parent = follow_dotdot(nd, &seq);
>                 if (IS_ERR(parent))
>                         return ERR_CAST(parent);
>                 if (unlikely(!parent))
>                         error = step_into(nd, WALK_NOFOLLOW,
> -                                        nd->path.dentry, nd->inode, nd->seq);
> +                                        nd->path.dentry, nd->seq);
>                 else
>                         error = step_into(nd, WALK_NOFOLLOW,
> -                                        parent, inode, seq);
> +                                        parent, seq);
>                 if (unlikely(error))
>                         return error;
>
> @@ -1995,7 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
>  static const char *walk_component(struct nameidata *nd, int flags)
>  {
>         struct dentry *dentry;
> -       struct inode *inode;
>         unsigned seq;
>         /*
>          * "." and ".." are special - ".." especially so because it has
> @@ -2007,7 +1983,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
>                         put_link(nd);
>                 return handle_dots(nd, nd->last_type);
>         }
> -       dentry = lookup_fast(nd, &inode, &seq);
> +       dentry = lookup_fast(nd, &seq);
>         if (IS_ERR(dentry))
>                 return ERR_CAST(dentry);
>         if (unlikely(!dentry)) {
> @@ -2017,7 +1993,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
>         }
>         if (!(flags & WALK_MORE) && nd->depth)
>                 put_link(nd);
> -       return step_into(nd, flags, dentry, inode, seq);
> +       return step_into(nd, flags, dentry, seq);
>  }
>
>  /*
> @@ -2473,8 +2449,7 @@ static int handle_lookup_down(struct nameidata *nd)
>  {
>         if (!(nd->flags & LOOKUP_RCU))
>                 dget(nd->path.dentry);
> -       return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
> -                       nd->path.dentry, nd->inode, nd->seq));
> +       return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry, nd->seq));
>  }
>
>  /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
> @@ -3394,7 +3369,6 @@ static const char *open_last_lookups(struct nameidata *nd,
>         int open_flag = op->open_flag;
>         bool got_write = false;
>         unsigned seq;
> -       struct inode *inode;
>         struct dentry *dentry;
>         const char *res;
>
> @@ -3410,7 +3384,7 @@ static const char *open_last_lookups(struct nameidata *nd,
>                 if (nd->last.name[nd->last.len])
>                         nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
>                 /* we _can_ be in RCU mode here */
> -               dentry = lookup_fast(nd, &inode, &seq);
> +               dentry = lookup_fast(nd, &seq);
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
>                 if (likely(dentry))
> @@ -3464,7 +3438,7 @@ static const char *open_last_lookups(struct nameidata *nd,
>  finish_lookup:
>         if (nd->depth)
>                 put_link(nd);
> -       res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
> +       res = step_into(nd, WALK_TRAILING, dentry, seq);
>         if (unlikely(res))
>                 nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
>         return res;



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  8:20       ` Alexander Potapenko
@ 2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
                             ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 13:44 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:

> What makes you think they are false positives? Is the scenario I
> described above:
> 
> """
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into()
> """
> 
> impossible?

Suppose step_into() has been called in non-RCU mode.  The first
thing it does is
	int err = handle_mounts(nd, dentry, &path, &seq);
	if (err < 0) 
		return ERR_PTR(err);

And handle_mounts() in non-RCU mode is
	path->mnt = nd->path.mnt;
	path->dentry = dentry;
	if (nd->flags & LOOKUP_RCU) {
		[unreachable code]
	}
	[code not touching seqp]
	if (unlikely(ret)) {
		[code not touching seqp]
	} else {
		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
	}
	return ret;

In other words, the value the seq argument of step_into() arrived
with ends up never being fetched, and in case step_into() gets past
that if (err < 0), that value is replaced with zero before any
further accesses.

So it's a false positive; yes, strictly speaking the compiler is
allowed to do anything whatsoever if it manages to prove that the
value is uninitialized.  Realistically, though, especially since
unsigned int is not allowed any trapping representations...

If you want a test stripped of VFS specifics, consider this:

int g(int n, _Bool flag)
{
	if (!flag)
		n = 0;
	return n + 1;
}

int f(int n, _Bool flag)
{
	int x;

	if (flag)
		x = n + 2;
	return g(x, flag);
}

Do your tools trigger on it?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
@ 2022-07-04 13:55           ` Al Viro
  2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:00           ` Al Viro
  2 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 13:55 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> 
> > What makes you think they are false positives? Is the scenario I
> > described above:
> > 
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> > 
> > impossible?
> 
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
> 	int err = handle_mounts(nd, dentry, &path, &seq);
> 	if (err < 0) 
> 		return ERR_PTR(err);
> 
> And handle_mounts() in non-RCU mode is
> 	path->mnt = nd->path.mnt;
> 	path->dentry = dentry;
> 	if (nd->flags & LOOKUP_RCU) {
> 		[unreachable code]
> 	}
> 	[code not touching seqp]
> 	if (unlikely(ret)) {
> 		[code not touching seqp]
> 	} else {
> 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
> 	}
> 	return ret;

Make that
 	[code assigning ret a non-negative value and never using seqp]
 	if (unlikely(ret)) {
 		[code never using seqp or ret]
 	} else {
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;

so if (err < 0) in the caller is equivalent to if (err).

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
@ 2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:03             ` Greg Kroah-Hartman
  2022-07-04 18:23             ` Segher Boessenkool
  2022-07-04 16:00           ` Al Viro
  2 siblings, 2 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-04 15:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 3:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
>
> > What makes you think they are false positives? Is the scenario I
> > described above:
> >
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> >
> > impossible?
>
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
>         int err = handle_mounts(nd, dentry, &path, &seq);
>         if (err < 0)
>                 return ERR_PTR(err);
>
> And handle_mounts() in non-RCU mode is
>         path->mnt = nd->path.mnt;
>         path->dentry = dentry;
>         if (nd->flags & LOOKUP_RCU) {
>                 [unreachable code]
>         }
>         [code not touching seqp]
>         if (unlikely(ret)) {
>                 [code not touching seqp]
>         } else {
>                 *seqp = 0; /* out of RCU mode, so the value doesn't matter */
>         }
>         return ret;
>
> In other words, the value the seq argument of step_into() arrived
> with ends up never being fetched, and in case step_into() gets past
> that if (err < 0), that value is replaced with zero before any
> further accesses.

Oh, I see. That is actually what had been discussed here:
https://lore.kernel.org/linux-toolchains/20220614144853.3693273-1-glider@google.com/
Indeed, step_into() in its current implementation does not use `seq`
(which is noted in the patch description ;)), but the question is
whether we want to catch such cases regardless of that.
One of the reasons to do so is standard compliance - passing an
uninitialized value to a function is UB in C11, as Segher pointed out
here: https://lore.kernel.org/linux-toolchains/20220614214039.GA25951@gate.crashing.org/
The compilers may not be smart enough to take advantage of this _yet_,
but I wouldn't underestimate their ability to evolve (especially that
of Clang).
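
To make that concrete, here is a purely hypothetical folding of the
f()/g() example quoted below - not something any compiler is known to
do today.  Since `x` is indeterminate when `flag` is false, a
conforming compiler may assume the call to g() is only ever reached
with `flag` set, and emit

int f(int n, _Bool flag)
{
        return n + 3;   /* == g(n + 2, true) */
}

dropping the uninitialized path entirely.
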
I also believe it's fragile to rely on the callee to ignore certain
parameters: it may be doing so today, but if someone changes
step_into() tomorrow we may miss it.

If I am reading Linus's message here (and the following one from him
in the same thread):
https://lore.kernel.org/linux-toolchains/CAHk-=whjz3wO8zD+itoerphWem+JZz4uS3myf6u1Wd6epGRgmQ@mail.gmail.com/
correctly, we should be reporting uninitialized values passed to
functions, unless those values dissolve after inlining.
While this is a bit of a vague criterion, at least for Clang we always
know that KMSAN instrumentation is applied after inlining, so the
reports we see are due to values that are actually passed between
functions.
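
As a sketch of what "dissolve after inlining" means (made-up
functions, not kernel code):

static inline int pick(int v, _Bool ok) { return ok ? v : 0; }

int caller(_Bool ok)
{
        int x;          /* initialized only when ok is true */

        if (ok)
                x = 42;
        return pick(x, ok);
}

Once pick() is inlined, `x` is never even loaded on the !ok path, so
no value crosses a function boundary and KMSAN - which instruments
after inlining - stays silent.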

> So it's a false positive; yes, strictly speaking the compiler is
> allowed to do anything whatsoever if it manages to prove that the
> value is uninitialized.  Realistically, though, especially since
> unsigned int is not allowed any trapping representations...
>
> If you want a test stripped of VFS specifics, consider this:
>
> int g(int n, _Bool flag)
> {
>         if (!flag)
>                 n = 0;
>         return n + 1;
> }
>
> int f(int n, _Bool flag)
> {
>         int x;
>
>         if (flag)
>                 x = n + 2;
>         return g(x, flag);
> }
>
> Do your tools trigger on it?

Currently KMSAN has two modes of operation controlled by
CONFIG_KMSAN_CHECK_PARAM_RETVAL.
When enabled, that config enforces checks of function parameters at
call sites (by applying Clang's -fsanitize-memory-param-retval flag).
In that mode the tool would report the attempt to call `g(x, false)`
if g() survives inlining.
In the case CONFIG_KMSAN_CHECK_PARAM_RETVAL=n, KMSAN won't be
reporting the error.
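
Conceptually, with the check enabled the call in the f() above behaves
as if the compiler had inserted something like this (kmsan_check_param()
is a made-up name standing in for the real instrumentation):

int f(int n, _Bool flag)
{
        int x;

        if (flag)
                x = n + 2;
        kmsan_check_param(x);   /* fires here if x's shadow is poisoned */
        return g(x, flag);
}

so the report is issued at the call site, before g() even runs.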

Based on the mentioned discussion I decided to make
CONFIG_KMSAN_CHECK_PARAM_RETVAL=y the default option.
So far it has only reported a handful of errors (i.e. enforcing this rule
shouldn't be very problematic for the kernel), but it simplifies
handling of calls between instrumented and non-instrumented functions
that occur e.g. at syscall entry points: knowing that passed-by-value
arguments are checked at call sites, we can assume they are
initialized in the callees.


HTH,
Alex

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
  2022-07-04 15:49           ` Alexander Potapenko
@ 2022-07-04 16:00           ` Al Viro
  2022-07-04 16:47             ` Alexander Potapenko
  2 siblings, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 16:00 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> 
> > What makes you think they are false positives? Is the scenario I
> > described above:
> > 
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> > 
> > impossible?
> 
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
> 	int err = handle_mounts(nd, dentry, &path, &seq);
> 	if (err < 0) 
> 		return ERR_PTR(err);
> 
> And handle_mounts() in non-RCU mode is
> 	path->mnt = nd->path.mnt;
> 	path->dentry = dentry;
> 	if (nd->flags & LOOKUP_RCU) {
> 		[unreachable code]
> 	}
> 	[code not touching seqp]
> 	if (unlikely(ret)) {
> 		[code not touching seqp]
> 	} else {
> 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
> 	}
> 	return ret;
> 
> In other words, the value the seq argument of step_into() arrived
> with ends up never being fetched, and in case step_into() gets past
> that if (err < 0), that value is replaced with zero before any
> further accesses.
> 
> So it's a false positive; yes, strictly speaking the compiler is
> allowed to do anything whatsoever if it manages to prove that the
> value is uninitialized.  Realistically, though, especially since
> unsigned int is not allowed any trapping representations...

FWIW, an updated (and as yet untested) branch is in #work.namei.
Compared to the previous one, we store the sampled ->d_seq of the next
dentry in nd->next_seq, rather than bothering with local variables.
AFAICS, it ends up with better code that way.  And both ->seq and
->next_seq are zeroed at the moments when we switch to non-RCU mode
(as well as in non-RCU path_init()).

IMO it looks saner that way.  NOTE: it still needs to be tested and probably
reordered and massaged; it's not for merge at the moment.  Current cumulative
diff follows:

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..b12c6b53579d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1466,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,14 +1499,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				flags = dentry->d_flags;
 				continue;
 			}
@@ -1515,8 +1511,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1524,16 +1519,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been just clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1546,9 +1540,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1607,9 +1598,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1620,37 +1609,24 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1731,7 +1707,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1746,7 +1722,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1760,11 +1736,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1774,7 +1750,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1836,15 +1812,25 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: nd->next_seq must be the validated sampled dentry->d_seq
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
+	if (unlikely(!inode)) {
+		if ((nd->flags & LOOKUP_RCU) &&
+		    read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+			return ERR_PTR(-ECHILD);
+		return ERR_PTR(-ENOENT);
+	}
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
@@ -1856,23 +1842,21 @@ static const char *step_into(struct nameidata *nd, int flags,
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1895,8 +1879,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1907,12 +1890,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	nd->next_seq = nd->seq;
+	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1936,15 +1918,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1952,8 +1931,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1961,17 +1938,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -1995,8 +1967,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +1977,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1987,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2372,6 +2342,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2445,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	nd->next_seq = nd->seq;
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,8 +3365,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3380,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3434,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 15:49           ` Alexander Potapenko
@ 2022-07-04 16:03             ` Greg Kroah-Hartman
  2022-07-04 16:33               ` Alexander Potapenko
  2022-07-04 18:23             ` Segher Boessenkool
  1 sibling, 1 reply; 147+ messages in thread
From: Greg Kroah-Hartman @ 2022-07-04 16:03 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> This e-mail is confidential. If you received this communication by
> mistake, please don't forward it to anyone else, please erase all
> copies and attachments, and please let me know that it has gone to the
> wrong person.

This is not compatible with Linux kernel development, sorry.

Now deleted.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 16:03             ` Greg Kroah-Hartman
@ 2022-07-04 16:33               ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-04 16:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 6:03 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> > This e-mail is confidential. If you received this communication by
> > mistake, please don't forward it to anyone else, please erase all
> > copies and attachments, and please let me know that it has gone to the
> > wrong person.
>
> This is not compatible with Linux kernel development, sorry.
>
> Now deleted.

Sorry, I shouldn't have added those to public emails.
Apologies for the inconvenience.

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 16:00           ` Al Viro
@ 2022-07-04 16:47             ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-04 16:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 6:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> > On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> >
> > > What makes you think they are false positives? Is the scenario I
> > > described above:
> > >
> > > """
> > > In particular, if the call to lookup_fast() in walk_component()
> > > returns NULL, and lookup_slow() returns a valid dentry, then the
> > > `seq` and `inode` will remain uninitialized until the call to
> > > step_into()
> > > """
> > >
> > > impossible?
> >
> > Suppose step_into() has been called in non-RCU mode.  The first
> > thing it does is
> >       int err = handle_mounts(nd, dentry, &path, &seq);
> >       if (err < 0)
> >               return ERR_PTR(err);
> >
> > And handle_mounts() in non-RCU mode is
> >       path->mnt = nd->path.mnt;
> >       path->dentry = dentry;
> >       if (nd->flags & LOOKUP_RCU) {
> >               [unreachable code]
> >       }
> >       [code not touching seqp]
> >       if (unlikely(ret)) {
> >               [code not touching seqp]
> >       } else {
> >               *seqp = 0; /* out of RCU mode, so the value doesn't matter */
> >       }
> >       return ret;
> >
> > In other words, the value seq argument of step_into() used to have ends up
> > being never fetched and, in case step_into() gets past that if (err < 0)
> > that value is replaced with zero before any further accesses.
> >
> > So it's a false positive; yes, strictly speaking compiler is allowd
> > to do anything whatsoever if it manages to prove that the value is
> > uninitialized.  Realistically, though, especially since unsigned int
> > is not allowed any trapping representations...
>
> FWIW, update (and yet untested) branch is in #work.namei.  Compared to the
> previous, we store sampled ->d_seq of the next dentry in nd->next_seq,
> rather than bothering with local variables.  AFAICS, it ends up with
> better code that way.  And both ->seq and ->next_seq are zeroed at the
> moments when we switch to non-RCU mode (as well as non-RCU path_init()).
>
> IMO it looks saner that way.  NOTE: it still needs to be tested and probably
> reordered and massaged; it's not for merge at the moment.  Current cumulative
> diff follows:

I confirm all KMSAN reports are gone as a result of applying this patch.


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  2:52     ` Al Viro
  2022-07-04  8:20       ` Alexander Potapenko
@ 2022-07-04 17:36       ` Linus Torvalds
  2022-07-04 19:02         ` Al Viro
  1 sibling, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2022-07-04 17:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sun, Jul 3, 2022 at 7:53 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, trying to write a coherent documentation had its usual effect...
> The thing is, we don't really need to fetch the inode that early.

Hmm. I like the patch, but as I was reading through it, I had a question...

In particular, I'd like it even more if each step when the sequence
number is updated also had a comment about what then protects the
previous sequence number up to and over that new sequence point.

For example, in __follow_mount_rcu(), when we jump to a new mount
point, and that sequence has

                *seqp = read_seqcount_begin(&dentry->d_seq);

to reset the sequence number to the new path we jumped into.

But I don't actually see what checks the previous sequence number in
that path. We just reset it to the new one.

In contrast, in lookup_fast(), we get the new sequence number from
__d_lookup_rcu(), and then after getting the new one and before
"instantiating" it, we will revalidate the parent sequence number.

So lookup_fast() has that "chain of sequence numbers".

For __follow_mount_rcu it looks like validating the previous sequence
number is left to the caller, which then does try_to_unlazy_next().

So when reading this code, my reaction was that it really would have
been much nicer to have that kind of clear "handoff" of one sequence
number domain to the next that lookup_fast() has.

IOW, I think it would be lovely to clarify the sequence number handoff.

I only quickly scanned your second patch for this, it does seem to at
least collect it all into try_to_unlazy_next().

So maybe you already looked at exactly this, but it would be good to
be quite explicit about the sequence number logic because it's "a bit
opaque" to say the least.

                   Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:03             ` Greg Kroah-Hartman
@ 2022-07-04 18:23             ` Segher Boessenkool
  1 sibling, 0 replies; 147+ messages in thread
From: Segher Boessenkool @ 2022-07-04 18:23 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Vitaly Buka, linux-toolchains

On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> One of the reasons to do so is standard compliance - passing an
> uninitialized value to a function is UB in C11, as Segher pointed out
> here: https://lore.kernel.org/linux-toolchains/20220614214039.GA25951@gate.crashing.org/
> The compilers may not be smart enough to take advantage of this _yet_,
> but I wouldn't underestimate their ability to evolve (especially that
> of Clang).

GCC doesn't currently detect this UB, and doesn't even warn or error for
this, although that shouldn't be hard to do: it is all completely local.
An error is warranted here, and then you won't ever hit the UB either.
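
To make the pattern under discussion concrete, here is a toy sketch
(illustrative only, not kernel code):

	static void callee(unsigned seq)
	{
		/* seq is never read today, but nothing enforces that */
	}

	static void caller(void)
	{
		unsigned seq;	/* automatic, never initialized */
		callee(seq);	/* C11 6.3.2.1p2: this read of seq is UB */
	}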

> I also believe it's fragile to rely on the callee to ignore certain
> parameters: it may be doing so today, but if someone changes
> step_into() tomorrow we may miss it.

There isn't any choice usually, this is C, do you want varargs?  :-)

But yes, you always should only pass "safe" values; callers should do
their part, and not assume the callee will do in the future as it does
now.  Defensive programming is mostly about defending your own sanity!


Segher

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 17:36       ` Linus Torvalds
@ 2022-07-04 19:02         ` Al Viro
  2022-07-04 19:16           ` Linus Torvalds
  0 siblings, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 10:36:05AM -0700, Linus Torvalds wrote:

> For example, in __follow_mount_rcu(), when we jump to a new mount
> point, and that sequence has
> 
>                 *seqp = read_seqcount_begin(&dentry->d_seq);
> 
> to reset the sequence number to the new path we jumped into.
> 
> But I don't actually see what checks the previous sequence number in
> that path. We just reset it to the new one.

Theoretically it could be a problem.  We have /mnt/foo/bar and
/mnt/baz/bar.  Something's mounted on /mnt/foo, hiding /mnt/foo/bar.
We start a pathwalk for /mnt/baz/bar,
someone umounts /mnt/foo and swaps /mnt/foo to /mnt/baz before
we get there.  We are doomed to get -ECHILD from an attempt to
legitimize in the end, no matter what.  However, we might get
a hard error (-ENOENT, for example) before that, if we pick up
the old mount that used to be on top of /mnt/foo (now /mnt/baz)
and had been detached before the damn thing had become /mnt/baz
and notice that there's no "bar" in its root.

It used to be impossible (rename would've failed if the target had
been non-empty and had we managed to empty it first, well, there's
your point when -ENOENT would've been accurate).  With exchange...
Yes, it's a possible race.

Might need to add
                                if (read_seqretry(&mount_lock, nd->m_seq))
					return false;
in there.  And yes, it's a nice demonstration of how subtle and
brittle RCU pathwalk is - nobody noticed this bit of fun back when
RENAME_EXCHANGE had been added...  It got a lot more readable these
days, but...

> For __follow_mount_rcu it looks like validating the previous sequence
> number is left to the caller, which then does try_to_unlazy_next().

Not really - the caller goes there only if we have __follow_mount_rcu()
say "it's too tricky for me, get out of RCU mode and deal with it
there".

Anyway, I've thrown a mount_lock check in there, running xfstests to
see how it goes...

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:02         ` Al Viro
@ 2022-07-04 19:16           ` Linus Torvalds
  2022-07-04 19:55             ` Al Viro
  0 siblings, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2022-07-04 19:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Anyway, I've thrown a mount_lock check in there, running xfstests to
> see how it goes...

So my reaction had been that it would be good to just do something like this:

  diff --git a/fs/namei.c b/fs/namei.c
  index 1f28d3f463c3..25c4bcc91142 100644
  --- a/fs/namei.c
  +++ b/fs/namei.c
  @@ -1493,11 +1493,18 @@ static bool __follow_mount_rcu(struct n...
      if (flags & DCACHE_MOUNTED) {
          struct mount *mounted = __lookup_mnt(path->mnt, dentry);
          if (mounted) {
  +           struct dentry *old_dentry = dentry;
  +           unsigned old_seq = *seqp;
  +
              path->mnt = &mounted->mnt;
              dentry = path->dentry = mounted->mnt.mnt_root;
              nd->state |= ND_JUMPED;
              *seqp = read_seqcount_begin(&dentry->d_seq);
              *inode = dentry->d_inode;
  +
  +           if (read_seqcount_retry(&old_dentry->d_seq, old_seq))
  +               return false;
  +
              /*
               * We don't need to re-check ->d_seq after this
               * ->d_inode read - there will be an RCU delay

but the above is just whitespace-damaged random monkey-scribbling by
yours truly.

More like a "shouldn't we do something like this" than a serious
patch, in other words.

IOW, it has *NOT* had a lot of real thought behind it. Purely a
"shouldn't we always clearly check the old sequence number after we've
picked up the new one?"

                   Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:16           ` Linus Torvalds
@ 2022-07-04 19:55             ` Al Viro
  2022-07-04 20:24               ` Linus Torvalds
  0 siblings, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 12:16:24PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Anyway, I've thrown a mount_lock check in there, running xfstests to
> > see how it goes...
> 
> So my reaction had been that it would be good to just do something like this:
> 
>   diff --git a/fs/namei.c b/fs/namei.c
>   index 1f28d3f463c3..25c4bcc91142 100644
>   --- a/fs/namei.c
>   +++ b/fs/namei.c
>   @@ -1493,11 +1493,18 @@ static bool __follow_mount_rcu(struct n...
>       if (flags & DCACHE_MOUNTED) {
>           struct mount *mounted = __lookup_mnt(path->mnt, dentry);
>           if (mounted) {
>   +           struct dentry *old_dentry = dentry;
>   +           unsigned old_seq = *seqp;
>   +
>               path->mnt = &mounted->mnt;
>               dentry = path->dentry = mounted->mnt.mnt_root;
>               nd->state |= ND_JUMPED;
>               *seqp = read_seqcount_begin(&dentry->d_seq);
>               *inode = dentry->d_inode;
>   +
>   +           if (read_seqcount_retry(&old_dentry->d_seq, old_seq))
>   +               return false;
>   +
>               /*
>                * We don't need to re-check ->d_seq after this
>                * ->d_inode read - there will be an RCU delay
> 
> but the above is just whitespace-damaged random monkey-scribbling by
> yours truly.
> 
> More like a "shouldn't we do something like this" than a serious
> patch, in other words.
> 
> IOW, it has *NOT* had a lot of real thought behind it. Purely a
> "shouldn't we always clearly check the old sequence number after we've
> picked up the new one?"

You are checking the wrong thing here.  It's really about mount_lock -
->d_seq is *not* bumped when we detach or attach in some namespace.  If there's
a mismatch, RCU pathwalk is doomed anyway (it'll fail any form of unlazy)
and we might as well bugger off.  If it *does* match, we know that both
mountpoint and root had been pinned since before the pathwalk, remain
pinned as of that check and had a mount connecting them all along.
IOW, if we could have arrived to this dentry at any point, we would have
gotten that dentry as the next step.

We sample mount_lock into nd->m_seq in path_init() and we want it to stay unchanged
all along.  If it does, all mountpoints and roots we observe are pinned
and their association with each other is stable.

It's not dentry -> dentry, it's dentry -> mount -> dentry.  The following
would've been safe:

	find mountpoint
	sample ->d_seq
	verify whatever had lead us to mountpoint

	sample mount_lock
	find mount
	verify mountpoint's ->d_seq

	find root of mounted
	sample its ->d_seq
	verify mount_lock

Correct?  Now, note that the last step done against the value we'd sampled
in path_init() guarantees that mount hash had not changed through all of
that.  Which is to say, we can pretend that we'd found mount before ->d_seq
of mountpoint might've changed, leaving us with

	find mountpoint
	sample ->d_seq
	verify whatever had lead us to mountpoint
	find mount
	find root of mounted
	sample its ->d_seq
	verify mount_lock
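
Expressed with the actual primitives, that shortened sequence comes out
roughly as follows (a sketch with placeholder names such as `mountpoint`,
`mp_seq` and `root_seq`, not a quote from fs/namei.c):

	/* find mountpoint; sample its ->d_seq (verified against later) */
	unsigned mp_seq = read_seqcount_begin(&mountpoint->d_seq);

	/* find mount, find root of mounted */
	struct mount *m = __lookup_mnt(path->mnt, mountpoint);
	struct dentry *root = m->mnt.mnt_root;

	/* sample root's ->d_seq, then seal it with the mount_lock check */
	unsigned root_seq = read_seqcount_begin(&root->d_seq);
	if (read_seqretry(&mount_lock, nd->m_seq))
		return false;	/* drop out of RCU mode */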

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-07-01 14:23 ` [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface Alexander Potapenko
@ 2022-07-04 20:07   ` Matthew Wilcox
  2022-07-04 20:30     ` Al Viro
  2022-08-25 15:39     ` Alexander Potapenko
  0 siblings, 2 replies; 147+ messages in thread
From: Matthew Wilcox @ 2022-07-04 20:07 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, Jul 01, 2022 at 04:23:09PM +0200, Alexander Potapenko wrote:
> Functions implementing the a_ops->write_end() interface accept the
> `void *fsdata` parameter that is supposed to be initialized by the
> corresponding a_ops->write_begin() (which accepts `void **fsdata`).
> 
> However not all a_ops->write_begin() implementations initialize `fsdata`
> unconditionally, so it may get passed uninitialized to a_ops->write_end(),
> resulting in undefined behavior.

... wait, passing an uninitialised variable to a function *which doesn't
actually use it* is now UB?  What genius came up with that rule?  What
purpose does it serve?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:55             ` Al Viro
@ 2022-07-04 20:24               ` Linus Torvalds
  2022-07-04 20:46                 ` Al Viro
  2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
  0 siblings, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:24 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 12:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> You are checking the wrong thing here.  It's really about mount_lock -
> ->d_seq is *not* bumped when we or attach in some namespace.

I think we're talking past each other.

Yes, we need to check the mount sequence lock too, because we're doing
that mount traversal.

But I think we *also* need to check the dentry sequence count, because
the dentry itself could have been moved to another parent.

The two are entirely independent, aren't they?

And the dentry sequence point check should go along with the "we're
now updating the sequence point from the old dentry to the new".

The mount point check should go around the "check dentry mount point",
but it's a separate issue from the whole "we are now jumping to a
different dentry, we should check that the previous dentry hasn't
changed".

                    Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-07-04 20:07   ` Matthew Wilcox
@ 2022-07-04 20:30     ` Al Viro
  2022-08-25 15:39     ` Alexander Potapenko
  1 sibling, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 20:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Mon, Jul 04, 2022 at 09:07:43PM +0100, Matthew Wilcox wrote:
> On Fri, Jul 01, 2022 at 04:23:09PM +0200, Alexander Potapenko wrote:
> > Functions implementing the a_ops->write_end() interface accept the
> > `void *fsdata` parameter that is supposed to be initialized by the
> > corresponding a_ops->write_begin() (which accepts `void **fsdata`).
> > 
> > However not all a_ops->write_begin() implementations initialize `fsdata`
> > unconditionally, so it may get passed uninitialized to a_ops->write_end(),
> > resulting in undefined behavior.
> 
> ... wait, passing an uninitialised variable to a function *which doesn't
> actually use it* is now UB?  What genius came up with that rule?  What
> purpose does it serve?

"The value we are passing might be utter bollocks, but that way it's
obfuscated enough to confuse anyone, compiler included".

Defensive programming, don'cha know?

I would suggest a different way to obfuscate it, though - pass const void **
and leave it for the callee to decide whether they want to dereference it.
It is still 100% dependent upon the ->write_end() being correctly matched
with ->write_begin(), with zero assistance from the compiler, but it does
look, er, safer.  Or something.

	Of course, a clean way to handle that would be to have
->write_begin() return a partial application of foo_write_end to
whatever it wants for fsdata, to be evaluated where we would currently
call ->write_end().  _That_ could be usefully typechecked, but... we
don't have usable partial application.
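
For illustration only (simplified signatures, not the real
address_space_operations prototypes; the foo_* names are made up), the
defensive contract being asked for amounts to:

	static int foo_write_begin(void **fsdata)
	{
		*fsdata = NULL;	/* always store something, even if unused */
		return 0;
	}

	static int foo_write_end(void *fsdata)
	{
		/* the matching ->write_end() may ignore fsdata entirely */
		return 0;
	}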

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:24               ` Linus Torvalds
@ 2022-07-04 20:46                 ` Al Viro
  2022-07-04 20:51                   ` Linus Torvalds
  2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
  1 sibling, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 20:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 01:24:48PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 12:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > You are checking the wrong thing here.  It's really about mount_lock -
> > ->d_seq is *not* bumped when we detach or attach in some namespace.
> 
> I think we're talking past each other.

We might be.
 
> Yes, we need to check the mount sequence lock too, because we're doing
> that mount traversal.
> 
> But I think we *also* need to check the dentry sequence count, because
> the dentry itself could have been moved to another parent.

Why is that a problem?  It could have been moved to another parent,
but so it could after we'd crossed into the mount, and we wouldn't have
noticed (or cared).

What the chain of seqcount checks gives us is that with some timings
it would be possible to traverse that path, not that it had remained
valid through the entire pathwalk.

What I'm suggesting is to treat transition from mountpoint to mount
as happening instantly, with transition from mount to root sealed by
mount_lock check.

If that succeeds, there had been possible history in which refwalk
would have passed through the same dentry/mount/dentry and arrived
to the root dentry when it had the sampled ->d_seq value.

Sure, mountpoint might be moved since we'd reached it.  And the mount
would move with it, so we can pretend that we'd won the race and got
into the mount before the mountpoint had been moved.

Am I missing something fundamental about the things the sequence of
sampling and verifications gives us?  I'd always thought it's about
verifying that resulting history would be possible for a non-RCU
pathwalk with the right timings.  What am I missing?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:24               ` Linus Torvalds
  2022-07-04 20:46                 ` Al Viro
@ 2022-07-04 20:47                 ` Linus Torvalds
  1 sibling, 0 replies; 147+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 1:24 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The mount point check should go around the "check dentry mount point",
> but it's a separate issue from the whole "we are now jumping to a
> different dentry, we should check that the previous dentry hasn't
> changed".

Maybe it doesn't really matter, because we never actually end up
dereferencing the previous dentry (exactly since we're following the
mount point on it).

It feels like the sequence point checks are basically tied to the
"we're looking at the inode that the dentry pointed to", and because
the mount-point traversal doesn't need to look at the inode, the
sequence point check also isn't done.

But it feels wrong to traverse a dentry under RCU - even if we don't
then look at the inode itself - without having verified that the
dentry is still valid.

Yes, the d_seq lock protects against the inode going away (aka
"unlink()") and that cannot happen when it's a mount-point.

But it _also_ ends up changing for __d_move() when the name of the
dentry changes.

And I think that name change is relevant even to "look up a mount
point", exactly because we used that name to look up the dentry in the
first place, so if the name is changing, we shouldn't traverse that
mount point.

But I may have just confused myself terminally here.

             Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:46                 ` Al Viro
@ 2022-07-04 20:51                   ` Linus Torvalds
  2022-07-04 21:04                     ` Al Viro
  0 siblings, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 1:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Why is that a problem?  It could have been moved to another parent,
> but so it could after we'd crossed into the mount, and we wouldn't have
> noticed (or cared).

Yeah, see my other email.

I agree that it might be a "we don't actually care" situation, where
all we care about is that the name was valid at one point (when we picked
up that sequence point). So maybe we don't care about closing it.

But even if so, I think it might warrant a comment, because I still
feel like we're basically "throwing away" our previous sequence point
information without ever checking it.

Maybe all we ever care about is basically "this sequence point
protects the dentry inode pointer for the next lookup", and when it
comes to mount points that ends up being immaterial.
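
Concretely, that reading matches the pattern lookup_fast() already
relies on; a stripped-down sketch of it (a fragment, not the full
function):

	seq = read_seqcount_begin(&dentry->d_seq);
	inode = dentry->d_inode;	/* the pointer ->d_seq protects */
	if (read_seqcount_retry(&dentry->d_seq, seq))
		return ERR_PTR(-ECHILD);	/* dentry changed under us */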

             Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:51                   ` Linus Torvalds
@ 2022-07-04 21:04                     ` Al Viro
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  0 siblings, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 01:51:16PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 1:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Why is that a problem?  It could have been moved to another parent,
> > but so it could after we'd crossed to the mounted and we wouldn't have
> > noticed (or cared).
> 
> Yeah, see my other email.
> 
> I agree that it might be a "we don't actually care" situation, where
> all we care about that the name was valid at one point (when we picked
> up that sequence point). So maybe we don't care about closing it.
> 
> But even if so, I think it might warrant a comment, because I still
> feel like we're basically "throwing away" our previous sequence point
> information without ever checking it.
> 
> Maybe all we ever care about is basically "this sequence point
> protects the dentry inode pointer for the next lookup", and when it
> comes to mount points that ends up being immaterial.

	There is a problem, actually, but it's in a different place...
OK, let me try to write something resembling a formal proof and see
what falls out.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 21:04                     ` Al Viro
@ 2022-07-04 23:13                       ` Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
                                           ` (6 more replies)
  0 siblings, 7 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Validate mount_lock seqcount as soon as we cross into mount in RCU
mode.  Sure, ->mnt_root is pinned and will remain so until we
do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
the mount_lock had been touched, but we might run into a hard error
(e.g. -ENOENT) before trying to unlazy.  And it's possible to end
up with RCU pathwalk racing with rename() and umount() in a way
that would fail with -ENOENT while non-RCU pathwalk would've
succeeded with any timings.

Once upon a time we hadn't needed that, but analysis had been subtle,
brittle and went out of window as soon as RENAME_EXCHANGE had been
added.

It's narrow, hard to hit and won't get you anything other than
stray -ENOENT that could be arranged in a much easier way with the
same privileges, but it's a bug all the same.

Cc: stable@kernel.org
X-sky-is-falling: unlikely
Fixes: da1ce0670c14 "vfs: add cross-rename"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..4dbf55b37ec6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1505,6 +1505,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				 * becoming unpinned.
 				 */
 				flags = dentry->d_flags;
+				if (read_seqretry(&mount_lock, nd->m_seq))
+					return false;
 				continue;
 			}
 			if (read_seqretry(&mount_lock, nd->m_seq))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
@ 2022-07-04 23:14                         ` Al Viro
  2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
                                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Instead of returning NULL when we are in root, just make it return
the current position (and set *seqp and *inodep accordingly).
That collapses the calls of step_into() in handle_dots()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4dbf55b37ec6..ecdb9ac21ece 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1909,7 +1909,9 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	*seqp = nd->seq;
+	*inodep = nd->path.dentry->d_inode;
+	return nd->path.dentry;
 }
 
 static struct dentry *follow_dotdot(struct nameidata *nd,
@@ -1945,8 +1947,9 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	*seqp = 0;
+	*inodep = nd->path.dentry->d_inode;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1968,12 +1971,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
 			parent = follow_dotdot(nd, &inode, &seq);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
 		if (unlikely(error))
 			return error;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
@ 2022-07-04 23:14                         ` Al Viro
  2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
                                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

New field: nd->next_seq.  Set to 0 outside of RCU mode, holds the sampled
value for the next dentry to be considered.  Used instead of an arseload
of local variables, arguments, etc.

step_into() has lost seq argument; nd->next_seq is used, so dentry passed
to it must be the one ->next_seq is about.

There are two requirements for RCU pathwalk:
	1) it should not give a hard failure (other than -ECHILD) unless
non-RCU pathwalk might fail that way given suitable timings.
	2) it should not succeed unless non-RCU pathwalk might succeed
with the same end location given suitable timings.

The use of seq numbers is the way we achieve that.  Invariant we want
to maintain is:
	if RCU pathwalk can reach the state with given nd->path, nd->inode
and nd->seq after having traversed some part of pathname, it must be possible
for non-RCU pathwalk to reach the same nd->path and nd->inode after having
traversed the same part of pathname, and observe the nd->path.dentry->d_seq
equal to what RCU pathwalk has in nd->seq

	For transition from parent to child, we sample child's ->d_seq
and verify that parent's ->d_seq remains unchanged.
	For transitions from child to parent we sample parent's ->d_seq
and verify that child's ->d_seq has not changed.
	For transition from mountpoint to root of mounted we sample
the ->d_seq of root and verify that nobody has touched mount_lock since
the beginning of pathwalk.  That guarantees that mount we'd found had
been there all along, with these mountpoint and root of mounted.
It would be possible for a non-RCU pathwalk to reach the previous state,
find the same mount and observe its root at the moment we'd sampled
->d_seq of that root.
	For transitions from root of mounted to mountpoint we sample
->d_seq of mountpoint and verify that mount_lock had not been touched
since the beginning of pathwalk.  The same reasoning as in the
previous case applies.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 104 ++++++++++++++++++++++++++---------------------------
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ecdb9ac21ece..c7c9e88add85 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1467,7 +1471,7 @@ EXPORT_SYMBOL(follow_down);
  * we meet a managed dentry that would need blocking.
  */
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+			       struct inode **inode)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,7 +1500,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				*inode = dentry->d_inode;
 				/*
 				 * We don't need to re-check ->d_seq after this
@@ -1505,6 +1509,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				 * becoming unpinned.
 				 */
 				flags = dentry->d_flags;
+				// makes sure that non-RCU pathwalk could reach
+				// this state.
 				if (read_seqretry(&mount_lock, nd->m_seq))
 					return false;
 				continue;
@@ -1517,8 +1523,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path, struct inode **inode)
 {
 	bool jumped;
 	int ret;
@@ -1526,16 +1531,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path, inode)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1550,7 +1554,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 			mntput(path->mnt);
 	} else {
 		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1610,8 +1613,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 }
 
 static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+				  struct inode **inode)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1622,8 +1624,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
@@ -1635,7 +1636,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		 * the dentry name information from lookup.
 		 */
 		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+		if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 			return ERR_PTR(-ECHILD);
 
 		/*
@@ -1648,11 +1649,10 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1733,7 +1733,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1748,7 +1748,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1762,11 +1762,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1776,7 +1776,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1838,12 +1838,14 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry, struct inode *inode)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	int err = handle_mounts(nd, dentry, &path, &inode);
 
 	if (err < 0)
 		return ERR_PTR(err);
@@ -1858,23 +1860,22 @@ static const char *step_into(struct nameidata *nd, int flags,
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
 static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+					struct inode **inodep)
 {
 	struct dentry *parent, *old;
 
@@ -1891,6 +1892,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		nd->path = path;
 		nd->inode = path.dentry->d_inode;
 		nd->seq = seq;
+		// makes sure that non-RCU pathwalk could reach this state
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 			return ERR_PTR(-ECHILD);
 		/* we know that mountpoint was pinned */
@@ -1898,7 +1900,8 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	old = nd->path.dentry;
 	parent = old->d_parent;
 	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
+	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1909,14 +1912,13 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	*seqp = nd->seq;
+	nd->next_seq = nd->seq;
 	*inodep = nd->path.dentry->d_inode;
 	return nd->path.dentry;
 }
 
 static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+				 struct inode **inodep)
 {
 	struct dentry *parent;
 
@@ -1940,14 +1942,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
 	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	*seqp = 0;
 	*inodep = nd->path.dentry->d_inode;
 	return dget(nd->path.dentry);
 }
@@ -1958,7 +1958,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 		const char *error = NULL;
 		struct dentry *parent;
 		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1966,12 +1965,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd, &inode);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd, &inode);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		error = step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent, inode);
 		if (unlikely(error))
 			return error;
 
@@ -1996,7 +1995,6 @@ static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
 	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +2005,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd, &inode);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +2015,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry, inode);
 }
 
 /*
@@ -2372,6 +2370,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2473,9 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
+	nd->next_seq = nd->seq;
 	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+			nd->path.dentry, nd->inode));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,7 +3394,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
@@ -3410,7 +3410,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd, &inode);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3464,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry, inode);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 4/7] step_into(): lose inode argument
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
  2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
@ 2022-07-04 23:15                         ` Al Viro
  2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
                                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

make handle_mounts() always fetch it.  This is just the first step -
the callers of step_into() will stop trying to calculate the sucker,
etc.

The passed value should be equal to dentry->d_inode in all cases;
in RCU mode - fetched after we'd sampled ->d_seq.  Might as well
fetch it here.  We do need to validate ->d_seq, which duplicates
the check currently done in lookup_fast(); that duplication will
go away shortly.

After that change handle_mounts() always ignores the initial value of
*inode and always sets it on success.
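
As a standalone sketch of that sample-then-validate pattern (simplified,
not the actual fs/namei.c code; it assumes the nameidata definitions
private to that file):

	/* trust dentry->d_inode only while ->d_seq is what we sampled */
	static int fetch_inode_rcu(struct nameidata *nd, struct dentry *dentry,
				   struct inode **inode)
	{
		*inode = dentry->d_inode;
		if (read_seqcount_retry(&dentry->d_seq, nd->next_seq))
			return -ECHILD;	/* changed under us; retry in ref-walk */
		if (unlikely(!*inode))
			return -ENOENT;	/* negative dentry */
		return 0;
	}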

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c7c9e88add85..dddbebf92b48 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1532,6 +1532,11 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = nd->next_seq;
+		*inode = dentry->d_inode;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			return -ECHILD;
+		if (unlikely(!*inode))
+			return -ENOENT;
 		if (likely(__follow_mount_rcu(nd, path, inode)))
 			return 0;
 		// *path and nd->next_seq might've been clobbered
@@ -1842,9 +1847,10 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode)
+		     struct dentry *dentry)
 {
 	struct path path;
+	struct inode *inode;
 	int err = handle_mounts(nd, dentry, &path, &inode);
 
 	if (err < 0)
@@ -1970,7 +1976,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
 			parent = follow_dotdot(nd, &inode);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		error = step_into(nd, WALK_NOFOLLOW, parent, inode);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -2015,7 +2021,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2474,8 +2480,7 @@ static int handle_lookup_down(struct nameidata *nd)
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
 	nd->next_seq = nd->seq;
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode));
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3464,7 +3469,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (2 preceding siblings ...)
  2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
@ 2022-07-04 23:15                         ` Al Viro
  2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
                                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

step_into() will fetch it, TYVM.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dddbebf92b48..fe95fe39634c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1880,8 +1880,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1905,7 +1904,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
 	nd->next_seq = read_seqcount_begin(&parent->d_seq);
 	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
@@ -1919,12 +1917,10 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	nd->next_seq = nd->seq;
-	*inodep = nd->path.dentry->d_inode;
 	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1948,13 +1944,11 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	*inodep = nd->path.dentry->d_inode;
 	return dget(nd->path.dentry);
 }
 
@@ -1963,7 +1957,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1971,9 +1964,9 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
 		error = step_into(nd, WALK_NOFOLLOW, parent);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 6/7] lookup_fast(): don't bother with inode
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (3 preceding siblings ...)
  2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
@ 2022-07-04 23:16                         ` Al Viro
  2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Note that validation of ->d_seq after ->d_inode fetch is gone, along
with fetching of ->d_inode itself.
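
The switch from __read_seqcount_retry() to read_seqcount_retry() on the
parent check below is what pays for that: with the child's check (and the
barrier its read_seqcount_begin() provided) gone, the parent validation
has to supply its own read barrier.  Paraphrased from
include/linux/seqlock.h (a sketch, not the verbatim header):

	/* full variant: orders the critical-section loads before the check */
	static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
	{
		smp_rmb();
		return unlikely(READ_ONCE(s->sequence) != start);
	}
	/* __read_seqcount_retry(): same check, no smp_rmb(); the caller
	 * must guarantee the ordering some other way */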

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe95fe39634c..cdb61d09df79 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1617,8 +1617,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1636,22 +1635,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
 		status = d_revalidate(dentry, nd->flags);
@@ -1993,7 +1981,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2004,7 +1991,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -3392,7 +3379,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3408,7 +3394,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts()
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (4 preceding siblings ...)
  2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
@ 2022-07-04 23:17                         ` Al Viro
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  6 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

... and lose messing with it in __follow_mount_rcu()
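
Net effect, condensed from the hunks below: step_into() now crosses
mounts first and only then fetches and validates the inode, i.e.

	inode = path.dentry->d_inode;
	if (nd->flags & LOOKUP_RCU) {
		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
			return ERR_PTR(-ECHILD);	/* raced with rename/umount */
		if (unlikely(!inode))
			return ERR_PTR(-ENOENT);	/* went negative under us */
	}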

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cdb61d09df79..f2c99e75b578 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1470,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1501,13 +1500,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
 				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
 				flags = dentry->d_flags;
 				// makes sure that non-RCU pathwalk could reach
 				// this state.
@@ -1523,7 +1515,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1532,12 +1524,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = nd->next_seq;
-		*inode = dentry->d_inode;
-		if (read_seqcount_retry(&dentry->d_seq, seq))
-			return -ECHILD;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode)))
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
 		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
@@ -1557,8 +1544,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
 	}
 	return ret;
 }
@@ -1839,15 +1824,21 @@ static const char *step_into(struct nameidata *nd, int flags,
 {
 	struct path path;
 	struct inode *inode;
-	int err = handle_mounts(nd, dentry, &path, &inode);
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
-		if (!(nd->flags & LOOKUP_RCU)) {
+		if (nd->flags & LOOKUP_RCU) {
+			if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+				return ERR_PTR(-ECHILD);
+			if (unlikely(!inode))
+				return ERR_PTR(-ENOENT);
+		} else {
 			dput(nd->path.dentry);
 			if (nd->path.mnt != path.mnt)
 				mntput(nd->path.mnt);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (5 preceding siblings ...)
  2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
@ 2022-07-04 23:19                         ` Al Viro
  2022-07-05  0:06                           ` Linus Torvalds
  6 siblings, 1 reply; 147+ messages in thread
From: Al Viro @ 2022-07-04 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

	Just in case - this series is RFC only; it's very lightly
tested.  Might be carved into too many steps, at that.  Cumulative
delta follows; might or might not be more convenient for review.

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..f2c99e75b578 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1466,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,15 +1499,12 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				flags = dentry->d_flags;
+				// makes sure that non-RCU pathwalk could reach
+				// this state.
+				if (read_seqretry(&mount_lock, nd->m_seq))
+					return false;
 				continue;
 			}
 			if (read_seqretry(&mount_lock, nd->m_seq))
@@ -1515,8 +1515,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1524,16 +1523,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1546,9 +1544,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1607,9 +1602,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1620,37 +1613,24 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1731,7 +1711,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1746,7 +1726,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1760,11 +1740,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1774,7 +1754,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1836,43 +1816,50 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
-		if (!(nd->flags & LOOKUP_RCU)) {
+		if (nd->flags & LOOKUP_RCU) {
+			if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+				return ERR_PTR(-ECHILD);
+			if (unlikely(!inode))
+				return ERR_PTR(-ENOENT);
+		} else {
 			dput(nd->path.dentry);
 			if (nd->path.mnt != path.mnt)
 				mntput(nd->path.mnt);
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1889,14 +1876,15 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		nd->path = path;
 		nd->inode = path.dentry->d_inode;
 		nd->seq = seq;
+		// makes sure that non-RCU pathwalk could reach this state
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 			return ERR_PTR(-ECHILD);
 		/* we know that mountpoint was pinned */
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
+	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1907,12 +1895,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	nd->next_seq = nd->seq;
+	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1936,15 +1923,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1952,8 +1936,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1961,17 +1943,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -1995,8 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +1982,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1992,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2372,6 +2347,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2450,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	nd->next_seq = nd->seq;
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,8 +3370,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3385,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3439,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
@ 2022-07-05  0:06                           ` Linus Torvalds
  2022-07-05  3:48                             ` Al Viro
  0 siblings, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2022-07-05  0:06 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 4:19 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -       unsigned        seq, m_seq, r_seq;
> +       unsigned        seq, next_seq, m_seq, r_seq;

So the main thing I react to here is how "next_seq" is in the "struct
nameidata", but then it always goes together with a "struct dentry"
that you end up having to pass separately (and that is *not* in that
"struct nameidata").

Now, saving the associated dentry (as "next_dentry") in the nd would
solve that, but ends up benign ugly since everything then wants to
look at the dentry anyway, so while it would solve the inconsistency,
it would be ugly.

I wonder if the solution might not be to create a new structure like

        struct rcu_dentry {
                struct dentry *dentry;
                unsigned seq;
        };

and in fact then we could make __d_lookup_rcu() return one of these
things (we already rely on that "returning a two-word structure is
efficient" elsewhere).

That would then make that "this dentry goes with this sequence number"
be a very clear thing, and I actually think that it would make
__d_lookup_rcu() have a cleaner calling convention too, ie we'd go
from

        dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);

to

       dseq = __d_lookup_rcu(parent, &nd->last);

and it would even improve code generation because it now returns the
dentry and the sequence number in registers, instead of returning one
in a register and one in memory.
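
As a sketch of that convention (body elided; on the x86-64 SysV ABI a
16-byte POD struct really does come back in %rax:%rdx rather than
through a hidden pointer):

        struct rcu_dentry __d_lookup_rcu(struct dentry *parent,
                                         const struct qstr *name)
        {
                struct rcu_dentry ret = { NULL, 0 };
                /* ... hash lookup, sampling ->d_seq into ret.seq ... */
                return ret;     /* dentry in %rax, seq in %rdx */
        }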

I did *not* look at how it would change some of the other places, but
I do like the notion of "keep the dentry and the sequence number that
goes with it together".

That "keep dentry as a local, keep the sequence number that goes with
it as a field in the 'nd'" really does seem an odd thing. So I'm
throwing the above out as a "maybe we could do this instead..".

Not a huge deal. That oddity or not, I think the patch series is an improvement.

I do have a minor gripe with this too:

> +       nd->seq = nd->next_seq = 0;

I'm not convinced "0" is a good value.

It's not supposed to match anything, but it *could* match a valid
sequence number. Wouldn't it be better to pick something that is
explicitly invalid and has the low bit set (ie 1 or -1).

We don't seem to have a SEQ_INVAL or anything like that, but it does
seem that if the intent is to make it clear it's not a real sequence
number any more at that point, then 0 isn't great.
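
Sketch of why an odd value can never collide, paraphrasing what
raw_read_seqcount_begin() does -- readers spin until the count is even,
so a successfully sampled sequence number always has the low bit clear:

        static unsigned raw_read_seqcount_begin(const seqcount_t *s)
        {
                unsigned seq;

                while ((seq = READ_ONCE(s->sequence)) & 1)
                        cpu_relax();    /* odd: a writer is in flight */
                smp_rmb();
                return seq;             /* always even; 1 or -1 never match */
        }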

But again, this is more of a stylistic detail thing than a real complaint.

             Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-05  0:06                           ` Linus Torvalds
@ 2022-07-05  3:48                             ` Al Viro
  0 siblings, 0 replies; 147+ messages in thread
From: Al Viro @ 2022-07-05  3:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 05:06:17PM -0700, Linus Torvalds wrote:

> I wonder if the solution might not be to create a new structure like
> 
>         struct rcu_dentry {
>                 struct dentry *dentry;
>                 unsigned seq;
>         };
> 
> and in fact then we could make __d_lookup_rcu() return one of these
> things (we already rely on that "returning a two-word structure is
> efficient" elsewhere).
>
> That would then make that "this dentry goes with this sequence number"
> be a very clear thing, and I actually think that it would make
> __d_lookup_rcu() have a cleaner calling convention too, ie we'd go
> from
> 
>         dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
> 
> to
> 
>        dseq = __d_lookup_rcu(parent, &nd->last);
> 
> and it would even improve code generation because it now returns the
> dentry and the sequence number in registers, instead of returning one
> in a register and one in memory.
> 
> I did *not* look at how it would change some of the other places, but
> I do like the notion of "keep the dentry and the sequence number that
> goes with it together".
> 
> That "keep dentry as a local, keep the sequence number that goes with
> it as a field in the 'nd'" really does seem an odd thing. So I'm
> throwing the above out as a "maybe we could do this instead..".

I looked into that; turns out to be quite messy, unfortunately.  For one
thing, the distance between the places where we get the seq count and
the place where we consume it is large; worse, there's a bunch of paths
where we are in non-RCU mode converging to the same consumer and those
need a 0/1/-1/whatever paired with dentry.  Gets very clumsy...

There might be a clever way to deal with pairs cleanly, but I don't see it
at the moment.  I'll look into that some more, but...

BTW, how good gcc and clang are at figuring out that e.g.

static int foo(int n)
{
	if (likely(n >= 0))
		return 0;
	....
}

....
	if (foo(n))
		whatever();

should be treated as
	if (unlikely(foo(n)))
		whatever();

They certainly do it just fine if the damn thing is inlined (e.g.
all those unlikely(read_seqcount_retry(....)) can and should lose
unlikely), but do they manage that for non-inlined functions in
the same compilation unit?  Relatively recent gcc seems to...
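
For reference, the complete test the question boils down to, as a
standalone compilation unit:

#define likely(x)	__builtin_expect(!!(x), 1)

static __attribute__((noinline)) int foo(int n)
{
	if (likely(n >= 0))
		return 0;
	return n;		/* cold path */
}

void whatever(void);

void caller(int n)
{
	if (foo(n))		/* hopefully compiled as unlikely(foo(n)) */
		whatever();
}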

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
                     ` (2 preceding siblings ...)
  2022-07-02 13:09   ` kernel test robot
@ 2022-07-07 10:13   ` Marco Elver
  2022-08-07 17:33     ` Alexander Potapenko
  3 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-07 10:13 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Use hooks from instrumented.h to notify bug detection tools about
> usercopy events in get_user() and put_user_size().
>
> It's still unclear how to instrument put_user(), which assumes that
> instrumentation code doesn't clobber RAX.

do_put_user_call() has a comment about KASAN clobbering %ax, doesn't
this also apply to KMSAN? If not, could we have a <asm/instrumented.h>
that provides helpers to push registers on the stack and pop them back
on return?

Also it seems the test robot complained about this patch.

> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
> ---
>  arch/x86/include/asm/uaccess.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 913e593a3b45f..1a8b5a234474f 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -5,6 +5,7 @@
>   * User space memory access functions
>   */
>  #include <linux/compiler.h>
> +#include <linux/instrumented.h>
>  #include <linux/kasan-checks.h>
>  #include <linux/string.h>
>  #include <asm/asm.h>
> @@ -99,11 +100,13 @@ extern int __get_user_bad(void);
>         int __ret_gu;                                                   \
>         register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
>         __chk_user_ptr(ptr);                                            \
> +       instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
>         asm volatile("call __" #fn "_%P4"                               \
>                      : "=a" (__ret_gu), "=r" (__val_gu),                \
>                         ASM_CALL_CONSTRAINT                             \
>                      : "0" (ptr), "i" (sizeof(*(ptr))));                \
>         (x) = (__force __typeof__(*(ptr))) __val_gu;                    \
> +       instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
>         __builtin_expect(__ret_gu, 0);                                  \
>  })
>
> @@ -248,7 +251,9 @@ extern void __put_user_nocheck_8(void);
>
>  #define __put_user_size(x, ptr, size, label)                           \
>  do {                                                                   \
> +       __typeof__(*(ptr)) __pus_val = x;                               \
>         __chk_user_ptr(ptr);                                            \
> +       instrument_copy_to_user(ptr, &(__pus_val), size);               \
>         switch (size) {                                                 \
>         case 1:                                                         \
>                 __put_user_goto(x, ptr, "b", "iq", label);              \
> @@ -286,6 +291,7 @@ do {                                                                        \
>  #define __get_user_size(x, ptr, size, label)                           \
>  do {                                                                   \
>         __chk_user_ptr(ptr);                                            \
> +       instrument_copy_from_user_before((void *)&(x), ptr, size);      \
>         switch (size) {                                                 \
>         case 1: {                                                       \
>                 unsigned char x_u8__;                                   \
> @@ -305,6 +311,7 @@ do {                                                                        \
>         default:                                                        \
>                 (x) = __get_user_bad();                                 \
>         }                                                               \
> +       instrument_copy_from_user_after((void *)&(x), ptr, size, 0);    \
>  } while (0)
>
>  #define __get_user_asm(x, addr, itype, ltype, label)                   \
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 06/45] kmsan: add ReST documentation
  2022-07-01 14:22 ` [PATCH v4 06/45] kmsan: add ReST documentation Alexander Potapenko
@ 2022-07-07 12:34   ` Marco Elver
  2022-07-15  7:42     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-07 12:34 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
> index.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- added a note that KMSAN is not intended for production use
>
> v4:
>  -- describe CONFIG_KMSAN_CHECK_PARAM_RETVAL
>  -- drop mentions of cpu_entry_area
>  -- add SPDX license
>
> Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
> ---
>  Documentation/dev-tools/index.rst |   1 +
>  Documentation/dev-tools/kmsan.rst | 422 ++++++++++++++++++++++++++++++
>  2 files changed, 423 insertions(+)
>  create mode 100644 Documentation/dev-tools/kmsan.rst
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index 4621eac290f46..6b0663075dc04 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
>     kcov
>     gcov
>     kasan
> +   kmsan
>     ubsan
>     kmemleak
>     kcsan
> diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
> new file mode 100644
> index 0000000000000..3fa5d7fb222c9
> --- /dev/null
> +++ b/Documentation/dev-tools/kmsan.rst
> @@ -0,0 +1,422 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. Copyright (C) 2022, Google LLC.
> +
> +=============================
> +KernelMemorySanitizer (KMSAN)
> +=============================

To be consistent with other tools, I think we have settled on "The
Kernel <...> Sanitizer (K?SAN)", see
Documentation/dev-tools/k[ac]san.rst. So this will be "The Kernel
Memory Sanitizer (KMSAN)".

> +KMSAN is a dynamic error detector aimed at finding uses of uninitialized
> +values. It is based on compiler instrumentation, and is quite similar to the
> +userspace `MemorySanitizer tool`_.
> +
> +An important note is that KMSAN is not intended for production use, because it
> +drastically increases kernel memory footprint and slows the whole system down.
> +
> +Example report
> +==============
> +
> +Here is an example of a KMSAN report::
> +
> +  =====================================================
> +  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
> +   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
> +   kunit_run_case_internal lib/kunit/test.c:333
> +   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
> +   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
> +   kthread+0x721/0x850 kernel/kthread.c:327
> +   ret_from_fork+0x1f/0x30 ??:?
> +
> +  Uninit was stored to memory at:
> +   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
> +   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
> +   kunit_run_case_internal lib/kunit/test.c:333
> +   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
> +   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
> +   kthread+0x721/0x850 kernel/kthread.c:327
> +   ret_from_fork+0x1f/0x30 ??:?
> +
> +  Local variable uninit created at:
> +   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
> +   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
> +
> +  Bytes 4-7 of 8 are uninitialized
> +  Memory access of size 8 starts at ffff888083fe3da0
> +
> +  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G    B       E     5.16.0-rc3+ #104
> +  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> +  =====================================================
> +
> +
> +The report says that the local variable ``uninit`` was created uninitialized in
> +``do_uninit_local_array()``. The lower stack trace corresponds to the place

-> "The third stack trace ..."
(Because it looks like there's also another stack trace in the middle
and "lower" is ambiguous)

> +where this variable was created.
> +
> +The upper stack shows where the uninit value was used - in

-> "The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``)."

> +``test_uninit_kmsan_check_memory()``. The tool shows the bytes which were left
> +uninitialized in the local variable, as well as the stack where the value was
> +copied to another memory location before use.
> +
> +A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
> + - in a condition, e.g. ``if (v) { ... }``;
> + - in an indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
> + - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
> + - when it is passed as an argument to a function, and
> +   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
> +
> +The mentioned cases (apart from copying data to userspace or hardware, which is
> +a security issue) are considered undefined behavior from the C11 Standard's
> +point of view.
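
A tiny example right here might make the list more concrete, e.g.
(sketch; do_something(), ptr and uptr are made up):

  int v;  /* uninitialized */

  if (v)                              /* use in a condition - reported */
          do_something();
  ptr[v] = 0;                         /* use as an index - reported */
  copy_to_user(uptr, &v, sizeof(v));  /* escapes to userspace - reported */
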
> +
> +KMSAN and Clang
> +===============

The KASAN documentation has a section on "Support" which lists
architectures and compilers supported. I'd try to mirror (or improve
on) that.

> +In order for KMSAN to work, the kernel must be built with Clang, which so far is
> +the only compiler that has KMSAN support. The kernel instrumentation pass is
> +based on the userspace `MemorySanitizer tool`_.
> +
> +How to build
> +============

I'd call it "Usage", like in the KASAN and KCSAN documentation.

> +In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
> +Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
> +
> +Now configure and build the kernel with CONFIG_KMSAN enabled.

I would move build/usage instructions right after introduction as
that's most likely what users of KMSAN will want to know about first.
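
Maybe also spell out the commands, e.g. (just a sketch, assuming clang
is in $PATH):

  make CC=clang defconfig
  ./scripts/config -e KMSAN
  make CC=clang olddefconfig
  make CC=clang -j$(nproc)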

> +How KMSAN works
> +===============
> +
> +KMSAN shadow memory
> +-------------------
> +
> +KMSAN associates a metadata byte (also called shadow byte) with every byte of
> +kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
> +kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
> +setting its shadow bytes to ``0xff``) is called poisoning, marking it
> +initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
> +
> +When a new variable is allocated on the stack, it is poisoned by default by
> +instrumentation code inserted by the compiler (unless it is a stack variable
> +that is immediately initialized). Any new heap allocation done without
> +``__GFP_ZERO`` is also poisoned.
> +
> +Compiler instrumentation also tracks the shadow values with the help from the
> +runtime library in ``mm/kmsan/``.

This sentence might still be confusing. I think it should highlight
that runtime and compiler go together, but depending on the scope of
the value, the compiler invokes the runtime to persist the shadow.

> +The shadow value of a basic or compound type is an array of bytes of the same
> +length. When a constant value is written into memory, that memory is unpoisoned.
> +When a value is read from memory, its shadow memory is also obtained and
> +propagated into all the operations which use that value. For every instruction
> +that takes one or more values the compiler generates code that calculates the
> +shadow of the result depending on those values and their shadows.
> +
> +Example::
> +
> +  int a = 0xff;  // i.e. 0x000000ff
> +  int b;
> +  int c = a | b;
> +
> +In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
> +shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
> +``c`` are uninitialized, while the lower byte is initialized.
> +
> +

There are 2 blank lines here, which is inconsistent with the rest of
the document.

> +Origin tracking
> +---------------
> +
> +Every four bytes of kernel memory also have a so-called origin assigned to

Is "assigned" or "mapped" more appropriate here?

> +them. This origin describes the point in program execution at which the
> +uninitialized value was created. Every origin is associated with either the
> +full allocation stack (for heap-allocated memory), or the function containing
> +the uninitialized variable (for locals).
> +
> +When an uninitialized variable is allocated on stack or heap, a new origin
> +value is created, and that variable's origin is filled with that value.
> +When a value is read from memory, its origin is also read and kept together
> +with the shadow. For every instruction that takes one or more values the origin

s/values the origin/values, the origin/

> +of the result is one of the origins corresponding to any of the uninitialized
> +inputs. If a poisoned value is written into memory, its origin is written to the
> +corresponding storage as well.
> +
> +Example 1::
> +
> +  int a = 42;
> +  int b;
> +  int c = a + b;
> +
> +In this case the origin of ``b`` is generated upon function entry, and is
> +stored to the origin of ``c`` right before the addition result is written into
> +memory.
> +
> +Several variables may share the same origin address, if they are stored in the
> +same four-byte chunk. In this case every write to either variable updates the
> +origin for all of them. We have to sacrifice precision in this case, because
> +storing origins for individual bits (and even bytes) would be too costly.
> +
> +Example 2::
> +
> +  int combine(short a, short b) {
> +    union ret_t {
> +      int i;
> +      short s[2];
> +    } ret;
> +    ret.s[0] = a;
> +    ret.s[1] = b;
> +    return ret.i;
> +  }
> +
> +If ``a`` is initialized and ``b`` is not, the shadow of the result would be
> +0xffff0000, and the origin of the result would be the origin of ``b``.
> +``ret.s[0]`` would have the same origin, but it will be never used, because

s/be never/never be/

> +that variable is initialized.
> +
> +If both function arguments are uninitialized, only the origin of the second
> +argument is preserved.
> +
> +Origin chaining
> +~~~~~~~~~~~~~~~
> +
> +To ease debugging, KMSAN creates a new origin for every store of an
> +uninitialized value to memory. The new origin references both its creation stack
> +and the previous origin the value had. This may cause increased memory
> +consumption, so we limit the length of origin chains in the runtime.
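
Might be worth showing how chained origins appear in reports, e.g.
(schematic, based on the example report above):

  Uninit was stored to memory at:    <- newest link of the chain
   ...
  Uninit was stored to memory at:    <- an earlier link
   ...
  Local variable uninit created at:  <- the original origin
   ...
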
> +
> +Clang instrumentation API
> +-------------------------
> +
> +The Clang instrumentation pass inserts calls to functions defined in
> +``mm/kmsan/instrumentation.c`` into the kernel code.
> +
> +Shadow manipulation
> +~~~~~~~~~~~~~~~~~~~
> +
> +For every memory access the compiler emits a call to a function that returns a
> +pair of pointers to the shadow and origin addresses of the given memory::
> +
> +  typedef struct {
> +    void *shadow, *origin;
> +  } shadow_origin_ptr_t
> +
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
> +
> +The function name depends on the memory access size.
> +
> +The compiler makes sure that for every loaded value its shadow and origin
> +values are read from memory. When a value is stored to memory, its shadow and
> +origin are also stored using the metadata pointers.
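
An example of the emitted code might help, e.g. for a 4-byte load
(schematic, not the exact generated code):

  int x = *p;

becomes, roughly:

  shadow_origin_ptr_t m = __msan_metadata_ptr_for_load_4(p);
  u32 shadow_x = *(u32 *)m.shadow;  /* shadow of x */
  u32 origin_x = *(u32 *)m.origin;  /* origin of x */
  int x = *p;
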
> +
> +Handling locals
> +~~~~~~~~~~~~~~~
> +
> +A special function is used to create a new origin value for a local variable and
> +set the origin of that variable to that value::
> +
> +  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
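
Maybe add an example, e.g. (schematic; the actual contents of @descr are
generated by the compiler):

  void foo(void)
  {
          int buf[8];
          /* emitted by the instrumentation: */
          __msan_poison_alloca(buf, sizeof(buf), "buf" /* descr */);
          ...
  }
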
> +
> +Access to per-task data
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +At the beginning of every instrumented function KMSAN inserts a call to
> +``__msan_get_context_state()``::
> +
> +  kmsan_context_state *__msan_get_context_state(void)
> +
> +``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
> +
> +  struct kmsan_context_state {
> +    char param_tls[KMSAN_PARAM_SIZE];
> +    char retval_tls[KMSAN_RETVAL_SIZE];
> +    char va_arg_tls[KMSAN_PARAM_SIZE];
> +    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> +    u64 va_arg_overflow_size_tls;
> +    char param_origin_tls[KMSAN_PARAM_SIZE];
> +    depot_stack_handle_t retval_origin_tls;
> +  };
> +
> +This structure is used by KMSAN to pass parameter shadows and origins between
> +instrumented functions (unless the parameters are checked immediately by
> +``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
> +
> +Passing uninitialized values to functions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +KMSAN instrumentation pass has an option, ``-fsanitize-memory-param-retval``,

"KMSAN instrumentation pass" -> "Clang's instrumentation support" ?
Because it seems wrong to say that KMSAN has the instrumentation pass.

> +which makes the compiler check function parameters passed by value, as well as
> +function return values.
> +
> +The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
> +enabled by default to let KMSAN report uninitialized values earlier.
> +Please refer to the `LKML discussion`_ for more details.
> +
> +Because of the way the checks are implemented in LLVM (they are only applied to
> +parameters marked as ``noundef``), not all parameters are guaranteed to be
> +checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
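
A short example could illustrate the difference, e.g. (sketch, foo() is
made up):

  int x;  /* uninitialized */

  foo(x); /* with CONFIG_KMSAN_CHECK_PARAM_RETVAL=y this is reported
             right here, at the call site; otherwise only when foo()
             actually uses @x */
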
> +
> +String functions
> +~~~~~~~~~~~~~~~~
> +
> +The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
> +following functions. These functions are also called when data structures are
> +initialized or copied, making sure shadow and origin values are copied alongside
> +the data::
> +
> +  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
> +  void *__msan_memmove(void *dst, void *src, uintptr_t n)
> +  void *__msan_memset(void *dst, int c, uintptr_t n)
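
E.g. a plain struct assignment goes through the wrapper, so the metadata
travels with the data (sketch):

  struct foo a, b;
  ...
  a = b;  /* compiled as __msan_memcpy(&a, &b, sizeof(a)) */
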
> +
> +Error reporting
> +~~~~~~~~~~~~~~~
> +
> +For each use of a value the compiler emits a shadow check that calls
> +``__msan_warning()`` if that value is poisoned::
> +
> +  void __msan_warning(u32 origin)
> +
> +``__msan_warning()`` causes the KMSAN runtime to print an error report.
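
Maybe show the emitted check, e.g. (pseudocode; shadow_of()/origin_of()
are not real functions):

  if (shadow_of(v))  /* is any bit of v uninitialized? */
          __msan_warning(origin_of(v));
  if (v)             /* the original use */
          ...
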
> +
> +Inline assembly instrumentation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +KMSAN instruments every inline assembly output with a call to::
> +
> +  void __msan_instrument_asm_store(void *addr, uintptr_t size)
> +
> +This function unpoisons the memory region.
> +
> +This approach may mask certain errors, but it also helps to avoid a lot of
> +false positives in bitwise operations, atomics etc.
> +
> +Sometimes the pointers passed into inline assembly do not point to valid memory.
> +In such cases they are ignored at runtime.
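
An example might help, e.g. (schematic, assuming u32 lo, hi):

  asm("rdtsc" : "=a"(lo), "=d"(hi));

is followed by compiler-emitted calls, roughly:

  __msan_instrument_asm_store(&lo, sizeof(lo));
  __msan_instrument_asm_store(&hi, sizeof(hi));
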
> +
> +Disabling the instrumentation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It would be useful to move this section somewhere to the beginning,
closer to usage and the example, as this is information that a user of
KMSAN might want to know (but they might not want to know much about
how KMSAN works).

> +A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
> +ignore uninitialized values in that function and mark its output as initialized.
> +As a result, the user will not get KMSAN reports related to that function.
> +
> +Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
> +Applying this attribute to a function will result in KMSAN not instrumenting it,
> +which can be helpful if we do not want the compiler to mess up some low-level

s/mess up/interfere with/

> +code (e.g. that marked with ``noinstr``).

maybe "... (e.g. that marked with ``noinstr``, which implicitly adds
``__no_sanitize_memory``)."

otherwise people might think that it's necessary to add
__no_sanitize_memory explicitly to noinstr.

> +
> +This, however, comes at a cost: stack allocations from such functions will have
> +incorrect shadow/origin values, likely leading to false positives. Functions
> +called from non-instrumented code may also receive incorrect metadata for their
> +parameters.
> +
> +As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
> +
> +It is also possible to disable KMSAN for a single file (e.g. main.o)::
> +
> +  KMSAN_SANITIZE_main.o := n
> +
> +or for the whole directory::
> +
> +  KMSAN_SANITIZE := n
> +
> +in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
> +function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
> +their code gets broken by KMSAN (e.g. runs at early boot time).
> +
> +Runtime library
> +---------------
> +
> +The code is located in ``mm/kmsan/``.
> +
> +Per-task KMSAN state
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Every task_struct has an associated KMSAN task state that holds the KMSAN
> +context (see above) and a per-task flag disallowing KMSAN reports::
> +
> +  struct kmsan_ctx {
> +    ...
> +    bool allow_reporting;
> +    struct kmsan_context_state cstate;
> +    ...
> +  }
> +
> +  struct task_struct {
> +    ...
> +    struct kmsan_ctx kmsan_ctx;
> +    ...
> +  }
> +
> +

1 blank line instead of 2?

> +KMSAN contexts
> +~~~~~~~~~~~~~~
> +
> +When running in a kernel task context, KMSAN uses ``current->kmsan_ctx.cstate`` to
> +hold the metadata for function parameters and return values.
> +
> +But when the kernel is running in interrupt, softirq or NMI context, where
> +``current`` is unavailable, KMSAN switches to the per-CPU interrupt state::
> +
> +  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
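
Perhaps show how the context is picked, e.g. (a sketch of what the
runtime does, not the exact code):

  struct kmsan_ctx *ctx = in_task() ? &current->kmsan_ctx :
                                      raw_cpu_ptr(&kmsan_percpu_ctx);
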
> +
> +Metadata allocation
> +~~~~~~~~~~~~~~~~~~~
> +
> +There are several places in the kernel where KMSAN metadata is stored.
> +
> +1. Each ``struct page`` instance contains two pointers to its shadow and
> +origin pages::
> +
> +  struct page {
> +    ...
> +    struct page *shadow, *origin;
> +    ...
> +  };
> +
> +At boot-time, the kernel allocates shadow and origin pages for every available
> +kernel page. This is done quite late, when the kernel address space is already
> +fragmented, so normal data pages may arbitrarily interleave with the metadata
> +pages.
> +
> +This means that the shadow/origin pages of two contiguous memory pages may,
> +in general, not be contiguous. So, if a memory access crosses the boundary

s/So, /Consequently, /

> +of a memory block, accesses to shadow/origin memory may potentially corrupt
> +other pages or read incorrect values from them.
> +
> +In practice, contiguous memory pages returned by the same ``alloc_pages()``
> +call will have contiguous metadata, whereas if these pages belong to two
> +different allocations their metadata pages can be fragmented.
> +
> +For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
> +there are also no guarantees of metadata contiguity.
> +
> +If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
> +pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
> +
> +  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> +  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> +
> +``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
> +All stores to ``dummy_store_page`` are ignored.
> +
> +2. For vmalloc memory and modules, there is a direct mapping between the memory
> +range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
> +the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
> +area contains shadow memory for the first quarter, the third one holds the
> +origins. A small part of the fourth quarter contains shadow and origins for the
> +kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
> +more details.
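
A small diagram might make the layout easier to grasp, e.g. (schematic,
not to scale):

  VMALLOC_START     VMALLOC_END
  |    vmalloc    |    shadow    |    origins    | modules' metadata |
       1st 1/4        2nd 1/4        3rd 1/4       part of 4th 1/4
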
> +
> +When an array of pages is mapped into a contiguous virtual memory space, its
> +shadow and origin pages are similarly mapped into contiguous regions.
> +
> +References
> +==========
> +
> +E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
> +memory use in C++
> +<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
> +In Proceedings of CGO 2015.
> +
> +.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
> +.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
> +.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space
  2022-07-01 14:22 ` [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space Alexander Potapenko
@ 2022-07-11 16:12   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-11 16:12 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> KMSAN is going to use 3/4 of existing vmalloc space to hold the
> metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
> allocate past the first 1/4.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- added x86: to the title
>
> Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
> ---
>  arch/x86/include/asm/pgtable_64_types.h | 41 ++++++++++++++++++++++++-
>  arch/x86/mm/init_64.c                   |  2 +-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 70e360a2e5fb7..ad6ded5b1dedf 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -139,7 +139,46 @@ extern unsigned int ptrs_per_p4d;
>  # define VMEMMAP_START         __VMEMMAP_BASE_L4
>  #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
>
> -#define VMALLOC_END            (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
> +#define VMEMORY_END            (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)

Comment what VMEMORY_END is? (Seems obvious, but less guessing is better here.)

> +#ifndef CONFIG_KMSAN
> +#define VMALLOC_END            VMEMORY_END
> +#else
> +/*
> + * In KMSAN builds vmalloc area is four times smaller, and the remaining 3/4
> + * are used to keep the metadata for virtual pages. The memory formerly
> + * belonging to vmalloc area is now laid out as follows:
> + *
> + * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
> + * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
> + *              VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
> + * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
> + *              VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
> + * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
> + *              - shadow for modules,
> + *              KMSAN_MODULES_ORIGIN_START to
> + *              KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
> + */
> +#define VMALLOC_QUARTER_SIZE   ((VMALLOC_SIZE_TB << 40) >> 2)
> +#define VMALLOC_END            (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
> +
> +/*
> + * vmalloc metadata addresses are calculated by adding shadow/origin offsets
> + * to vmalloc address.
> + */
> +#define KMSAN_VMALLOC_SHADOW_OFFSET    VMALLOC_QUARTER_SIZE
> +#define KMSAN_VMALLOC_ORIGIN_OFFSET    (VMALLOC_QUARTER_SIZE << 1)
> +
> +#define KMSAN_VMALLOC_SHADOW_START     (VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
> +#define KMSAN_VMALLOC_ORIGIN_START     (VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
> +
> +/*
> + * The shadow/origin for modules are placed one by one in the last 1/4 of
> + * vmalloc space.
> + */
> +#define KMSAN_MODULES_SHADOW_START     (VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
> +#define KMSAN_MODULES_ORIGIN_START     (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
> +#endif /* CONFIG_KMSAN */
>
>  #define MODULES_VADDR          (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  /* The module sections ends with the start of the fixmap */
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 39c5246964a91..5806331172361 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
>         unsigned long addr;
>         const char *lvl;
>
> -       for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
> +       for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
>                 pgd_t *pgd = pgd_offset_k(addr);
>                 p4d_t *p4d;
>                 pud_t *pud;
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2022-07-01 14:22 ` [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE Alexander Potapenko
@ 2022-07-11 16:26   ` Marco Elver
  2022-08-03  9:41     ` Alexander Potapenko
  2022-08-03  9:44     ` Alexander Potapenko
  0 siblings, 2 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-11 16:26 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> KMSAN adds extra metadata fields to struct page, so it does not fit into
> 64 bytes anymore.

Does this somehow cause extra space to be used in all kernel configs?
If not, it would be good to note this in the commit message.


> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
> Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
> ---
>  drivers/nvdimm/nd.h       | 2 +-
>  drivers/nvdimm/pfn_devs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index ec5219680092d..85ca5b4da3cf3 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
>                 struct nd_namespace_common *ndns);
>  #if IS_ENABLED(CONFIG_ND_CLAIM)
>  /* max struct page size independent of kernel config */
> -#define MAX_STRUCT_PAGE_SIZE 64
> +#define MAX_STRUCT_PAGE_SIZE 128
>  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
>  #else
>  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 0e92ab4b32833..61af072ac98f9 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>                  * when populating the vmemmap. This *should* be equal to
>                  * PMD_SIZE for most architectures.
>                  *
> -                * Also make sure size of struct page is less than 64. We
> +                * Also make sure size of struct page is less than 128. We
>                  * want to make sure we use large enough size here so that
>                  * we don't have a dynamic reserve space depending on
>                  * struct page size. But we also want to make sure we notice
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
  2022-07-02  0:18   ` Hillf Danton
@ 2022-07-11 16:49   ` Marco Elver
  2022-08-03 18:14     ` Alexander Potapenko
  2022-07-13 10:04   ` Marco Elver
  2 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-11 16:49 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> For each memory location KernelMemorySanitizer maintains two types of
> metadata:
> 1. The so-called shadow of that location - a byte:byte mapping describing
>    whether individual bits of memory are initialized (shadow is 0)
>    or not (shadow is 1).
> 2. The origins of that location - a 4-byte:4-byte mapping containing
>    4-byte IDs of the stack traces where uninitialized values were
>    created.
>
> Each struct page now contains pointers to two struct pages holding
> KMSAN metadata (shadow and origins) for the original struct page.
> Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
> metadata creation, addressing, copying and checking.
> mm/kmsan/report.c performs error reporting in the cases an uninitialized
> value is used in a way that leads to undefined behavior.
>
> KMSAN compiler instrumentation is responsible for tracking the metadata
> along with the kernel memory. mm/kmsan/instrumentation.c provides the
> implementation for instrumentation hooks that are called from files
> compiled with -fsanitize=kernel-memory.
>
> To aid parameter passing (also done at instrumentation level), each
> task_struct now contains a struct kmsan_task_state used to track the
> metadata of function parameters and return values for that task.
>
> Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and
> declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN.
> The KMSAN_SANITIZE:=n Makefile directive can be used to completely
> disable KMSAN instrumentation for certain files.
>
> Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
> created stack memory initialized.
>
> Users can also use functions from include/linux/kmsan-checks.h to mark
> certain memory regions as uninitialized or initialized (this is called
> "poisoning" and "unpoisoning") or check that a particular region is
> initialized.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Acked-by: Marco Elver <elver@google.com>

Overall looks clean and well organized now.

> ---
> v2:
>  -- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
>     rewrote the patch description;
>  -- addressed comments by Dmitry Vyukov;
>  -- added a note about KMSAN being not intended for production use.
>  -- fix case of unaligned dst in kmsan_internal_memmove_metadata()
>
> v3:
>  -- print build IDs in reports where applicable
>  -- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
>  -- remove a stray BUG()
>
> v4: (mostly fixes suggested by Marco Elver)
>  -- add missing SPDX headers
>  -- move CC_IS_CLANG && CLANG_VERSION under HAVE_KMSAN_COMPILER
>  -- replace occurrences of |var| with @var
>  -- reflow KMSAN_WARN_ON(), fix code comments
>  -- remove x86-specific code from shadow.c to improve portability
>  -- convert kmsan_report_lock to raw spinlock
>  -- add enter_runtime/exit_runtime around kmsan_internal_memmove_metadata()
>  -- remove unnecessary include from kmsan.h (reported by <lkp@intel.com>)
>  -- introduce CONFIG_KMSAN_CHECK_PARAM_RETVAL (on by default), which
>     maps to -fsanitize-memory-param-retval and makes KMSAN eagerly check
>     values passed as function parameters and returned from functions.
>  -- use real shadow in instrumented functions called from runtime
>
> Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
> ---
>  Makefile                     |   1 +
>  include/linux/kmsan-checks.h |  64 +++++
>  include/linux/kmsan.h        |  46 ++++
>  include/linux/mm_types.h     |  12 +
>  include/linux/sched.h        |   5 +
>  lib/Kconfig.debug            |   1 +
>  lib/Kconfig.kmsan            |  50 ++++
>  mm/Makefile                  |   1 +
>  mm/kmsan/Makefile            |  23 ++
>  mm/kmsan/core.c              | 458 +++++++++++++++++++++++++++++++++++
>  mm/kmsan/hooks.c             |  66 +++++
>  mm/kmsan/instrumentation.c   | 271 +++++++++++++++++++++
>  mm/kmsan/kmsan.h             | 190 +++++++++++++++
>  mm/kmsan/report.c            | 211 ++++++++++++++++
>  mm/kmsan/shadow.c            | 147 +++++++++++
>  scripts/Makefile.kmsan       |   8 +
>  scripts/Makefile.lib         |   9 +
>  17 files changed, 1563 insertions(+)
>  create mode 100644 include/linux/kmsan-checks.h
>  create mode 100644 include/linux/kmsan.h
>  create mode 100644 lib/Kconfig.kmsan
>  create mode 100644 mm/kmsan/Makefile
>  create mode 100644 mm/kmsan/core.c
>  create mode 100644 mm/kmsan/hooks.c
>  create mode 100644 mm/kmsan/instrumentation.c
>  create mode 100644 mm/kmsan/kmsan.h
>  create mode 100644 mm/kmsan/report.c
>  create mode 100644 mm/kmsan/shadow.c
>  create mode 100644 scripts/Makefile.kmsan
>
> diff --git a/Makefile b/Makefile
> index 8973b285ce6c7..7c93482f6df3d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1014,6 +1014,7 @@ include-y                 := scripts/Makefile.extrawarn
>  include-$(CONFIG_DEBUG_INFO)   += scripts/Makefile.debug
>  include-$(CONFIG_KASAN)                += scripts/Makefile.kasan
>  include-$(CONFIG_KCSAN)                += scripts/Makefile.kcsan
> +include-$(CONFIG_KMSAN)                += scripts/Makefile.kmsan
>  include-$(CONFIG_UBSAN)                += scripts/Makefile.ubsan
>  include-$(CONFIG_KCOV)         += scripts/Makefile.kcov
>  include-$(CONFIG_RANDSTRUCT)   += scripts/Makefile.randstruct
> diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
> new file mode 100644
> index 0000000000000..a6522a0c28df9
> --- /dev/null
> +++ b/include/linux/kmsan-checks.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN checks to be used for one-off annotations in subsystems.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#ifndef _LINUX_KMSAN_CHECKS_H
> +#define _LINUX_KMSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_KMSAN
> +
> +/**
> + * kmsan_poison_memory() - Mark the memory range as uninitialized.
> + * @address: address to start with.
> + * @size:    size of buffer to poison.
> + * @flags:   GFP flags for allocations done by this function.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * uninitialized. Error reports for this memory will reference the call site of
> + * kmsan_poison_memory() as origin.
> + */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
> +
> +/**
> + * kmsan_unpoison_memory() -  Mark the memory range as initialized.
> + * @address: address to start with.
> + * @size:    size of buffer to unpoison.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * initialized.
> + */
> +void kmsan_unpoison_memory(const void *address, size_t size);
> +
> +/**
> + * kmsan_check_memory() - Check the memory range for being initialized.
> + * @address: address to start with.
> + * @size:    size of buffer to check.
> + *
> + * If any piece of the given range is marked as uninitialized, KMSAN will report
> + * an error.
> + */
> +void kmsan_check_memory(const void *address, size_t size);
> +
> +#else
> +
> +static inline void kmsan_poison_memory(const void *address, size_t size,
> +                                      gfp_t flags)
> +{
> +}
> +static inline void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> +}
> +static inline void kmsan_check_memory(const void *address, size_t size)
> +{
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_CHECKS_H */
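
The kernel-doc is nice; a short usage example somewhere might still
help, e.g. (hypothetical driver code, read_from_device() is made up):

  char buf[64];

  /* The device fills buf via DMA, which KMSAN cannot observe. */
  read_from_device(dev, buf, sizeof(buf));
  kmsan_unpoison_memory(buf, sizeof(buf));

  /* Report now if any part of buf is still uninitialized. */
  kmsan_check_memory(buf, sizeof(buf));
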
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> new file mode 100644
> index 0000000000000..99e48c6b049d9
> --- /dev/null
> +++ b/include/linux/kmsan.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN API for subsystems.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +#ifndef _LINUX_KMSAN_H
> +#define _LINUX_KMSAN_H
> +
> +#include <linux/gfp.h>
> +#include <linux/kmsan-checks.h>
> +#include <linux/stackdepot.h>
> +#include <linux/types.h>
> +
> +struct page;

Maybe I'm not looking hard enough, but I can't see where this is used
in this file.

> +#ifdef CONFIG_KMSAN
> +
> +/* These constants are defined in the MSan LLVM instrumentation pass. */
> +#define KMSAN_RETVAL_SIZE 800
> +#define KMSAN_PARAM_SIZE 800
> +
> +struct kmsan_context_state {
> +       char param_tls[KMSAN_PARAM_SIZE];
> +       char retval_tls[KMSAN_RETVAL_SIZE];
> +       char va_arg_tls[KMSAN_PARAM_SIZE];
> +       char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> +       u64 va_arg_overflow_size_tls;
> +       char param_origin_tls[KMSAN_PARAM_SIZE];
> +       depot_stack_handle_t retval_origin_tls;
> +};
> +
> +#undef KMSAN_PARAM_SIZE
> +#undef KMSAN_RETVAL_SIZE
> +
> +struct kmsan_ctx {
> +       struct kmsan_context_state cstate;
> +       int kmsan_in_runtime;
> +       bool allow_reporting;
> +};
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index c29ab4c0cd5c6..3cc0ebdd9625f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -218,6 +218,18 @@ struct page {
>                                            not kmapped, ie. highmem) */
>  #endif /* WANT_PAGE_VIRTUAL */
>
> +#ifdef CONFIG_KMSAN
> +       /*
> +        * KMSAN metadata for this page:
> +        *  - shadow page: every bit indicates whether the corresponding
> +        *    bit of the original page is initialized (0) or not (1);
> +        *  - origin page: every 4 bytes contain an id of the stack trace
> +        *    where the uninitialized value was created.
> +        */
> +       struct page *kmsan_shadow;
> +       struct page *kmsan_origin;
> +#endif
> +
>  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
>         int _last_cpupid;
>  #endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c46f3a63b758f..f9bb2c954e794 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -14,6 +14,7 @@
>  #include <linux/pid.h>
>  #include <linux/sem.h>
>  #include <linux/shm.h>
> +#include <linux/kmsan.h>
>  #include <linux/mutex.h>
>  #include <linux/plist.h>
>  #include <linux/hrtimer.h>
> @@ -1353,6 +1354,10 @@ struct task_struct {
>  #endif
>  #endif
>
> +#ifdef CONFIG_KMSAN
> +       struct kmsan_ctx                kmsan_ctx;
> +#endif
> +
>  #if IS_ENABLED(CONFIG_KUNIT)
>         struct kunit                    *kunit_test;
>  #endif
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2e24db4bff192..59819e6fa5865 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW
>
>  source "lib/Kconfig.kasan"
>  source "lib/Kconfig.kfence"
> +source "lib/Kconfig.kmsan"
>
>  endmenu # "Memory Debugging"
>
> diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> new file mode 100644
> index 0000000000000..8f768d4034e3c
> --- /dev/null
> +++ b/lib/Kconfig.kmsan
> @@ -0,0 +1,50 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config HAVE_ARCH_KMSAN
> +       bool
> +
> +config HAVE_KMSAN_COMPILER
> +       # Clang versions <14.0.0 also support -fsanitize=kernel-memory, but not
> +       # all the features necessary to build the kernel with KMSAN.
> +       depends on CC_IS_CLANG && CLANG_VERSION >= 140000
> +       def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)
> +
> +config HAVE_KMSAN_PARAM_RETVAL
> +       # Separate check for -fsanitize-memory-param-retval support.
> +       depends on CC_IS_CLANG && CLANG_VERSION >= 140000
> +       def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)
> +
> +
> +config KMSAN
> +       bool "KMSAN: detector of uninitialized values use"
> +       depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> +       depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
> +       select STACKDEPOT
> +       select STACKDEPOT_ALWAYS_INIT
> +       help
> +         KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
> +         uninitialized values in the kernel. It is based on compiler
> +         instrumentation provided by Clang and thus requires Clang to build.
> +
> +         An important note is that KMSAN is not intended for production use,
> +         because it drastically increases kernel memory footprint and slows
> +         the whole system down.
> +
> +         See <file:Documentation/dev-tools/kmsan.rst> for more details.
> +
> +if KMSAN
> +
> +config KMSAN_CHECK_PARAM_RETVAL
> +       bool "Check for uninitialized values passed to and returned from functions"
> +       default HAVE_KMSAN_PARAM_RETVAL
> +       help
> +         If the compiler supports -fsanitize-memory-param-retval, KMSAN will
> +         eagerly check every function parameter passed by value and every
> +         function return value.
> +
> +         Disabling KMSAN_CHECK_PARAM_RETVAL will result in tracking shadow for
> +         function parameters and return values across function borders. This
> +         is a more relaxed mode, but it generates more instrumentation code and
> +         may potentially report errors in corner cases when non-instrumented
> +         functions call instrumented ones.
> +
> +endif
> diff --git a/mm/Makefile b/mm/Makefile
> index 6f9ffa968a1a1..ff96830153221 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -89,6 +89,7 @@ obj-$(CONFIG_SLAB) += slab.o
>  obj-$(CONFIG_SLUB) += slub.o
>  obj-$(CONFIG_KASAN)    += kasan/
>  obj-$(CONFIG_KFENCE) += kfence/
> +obj-$(CONFIG_KMSAN)    += kmsan/
>  obj-$(CONFIG_FAILSLAB) += failslab.o
>  obj-$(CONFIG_MEMTEST)          += memtest.o
>  obj-$(CONFIG_MIGRATION) += migrate.o
> diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
> new file mode 100644
> index 0000000000000..550ad8625e4f9
> --- /dev/null
> +++ b/mm/kmsan/Makefile
> @@ -0,0 +1,23 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for KernelMemorySanitizer (KMSAN).
> +#
> +#
> +obj-y := core.o instrumentation.o hooks.o report.o shadow.o
> +
> +KMSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +UBSAN_SANITIZE := n
> +
> +# Disable instrumentation of KMSAN runtime with other tools.
> +CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
> +CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
> +CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
> +
> +CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
> diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
> new file mode 100644
> index 0000000000000..16fb8880a9c6d
> --- /dev/null
> +++ b/mm/kmsan/core.c
> @@ -0,0 +1,458 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN runtime library.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include <asm/page.h>
> +#include <linux/compiler.h>
> +#include <linux/export.h>
> +#include <linux/highmem.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/kmsan.h>
> +#include <linux/memory.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/mmzone.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/preempt.h>
> +#include <linux/slab.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Avoid creating too long origin chains, as these are unlikely to participate
> + * in real reports.
> + */
> +#define MAX_CHAIN_DEPTH 7
> +#define NUM_SKIPPED_TO_WARN 10000
> +
> +bool kmsan_enabled __read_mostly;
> +
> +/*
> + * Per-CPU KMSAN context to be used in interrupts, where current->kmsan is
> + * unavailable.
> + */
> +DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> +                                 unsigned int poison_flags)
> +{
> +       u32 extra_bits =
> +               kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
> +       bool checked = poison_flags & KMSAN_POISON_CHECK;
> +       depot_stack_handle_t handle;
> +
> +       handle = kmsan_save_stack_with_flags(flags, extra_bits);
> +       kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
> +}
> +
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
> +{
> +       kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> +                                                unsigned int extra)
> +{
> +       unsigned long entries[KMSAN_STACK_DEPTH];
> +       unsigned int nr_entries;
> +
> +       nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
> +
> +       /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
> +       flags &= ~__GFP_DIRECT_RECLAIM;
> +
> +       return __stack_depot_save(entries, nr_entries, extra, flags, true);
> +}
> +
> +/* Copy the metadata following the memmove() behavior. */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
> +{
> +       depot_stack_handle_t old_origin = 0, new_origin = 0;
> +       int src_slots, dst_slots, i, iter, step, skip_bits;
> +       depot_stack_handle_t *origin_src, *origin_dst;
> +       void *shadow_src, *shadow_dst;
> +       u32 *align_shadow_src, shadow;
> +       bool backwards;
> +
> +       shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
> +       if (!shadow_dst)
> +               return;
> +       KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
> +
> +       shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
> +       if (!shadow_src) {
> +               /*
> +                * @src is untracked: zero out destination shadow, ignore the
> +                * origins, we're done.
> +                */
> +               __memset(shadow_dst, 0, n);
> +               return;
> +       }
> +       KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
> +
> +       __memmove(shadow_dst, shadow_src, n);
> +
> +       origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
> +       origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
> +       KMSAN_WARN_ON(!origin_dst || !origin_src);
> +       src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
> +                    ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
> +                   KMSAN_ORIGIN_SIZE;
> +       dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
> +                    ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
> +                   KMSAN_ORIGIN_SIZE;
> +       KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
> +       KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
> +                     (dst_slots - src_slots < -1));
> +
> +       backwards = dst > src;
> +       i = backwards ? min(src_slots, dst_slots) - 1 : 0;
> +       iter = backwards ? -1 : 1;
> +
> +       align_shadow_src =
> +               (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
> +       for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
> +               KMSAN_WARN_ON(i < 0);
> +               shadow = align_shadow_src[i];
> +               if (i == 0) {
> +                       /*
> +                        * If @src isn't aligned on KMSAN_ORIGIN_SIZE, don't
> +                        * look at the first @src % KMSAN_ORIGIN_SIZE bytes
> +                        * of the first shadow slot.
> +                        */
> +                       skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
> +                       shadow = (shadow >> skip_bits) << skip_bits;
> +               }
> +               if (i == src_slots - 1) {
> +                       /*
> +                        * If @src + n isn't aligned on
> +                        * KMSAN_ORIGIN_SIZE, don't look at the last
> +                        * (@src + n) % KMSAN_ORIGIN_SIZE bytes of the
> +                        * last shadow slot.
> +                        */
> +                       skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
> +                       shadow = (shadow << skip_bits) >> skip_bits;
> +               }
> +               /*
> +                * Overwrite the origin only if the corresponding
> +                * shadow is nonempty.
> +                */
> +               if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
> +                       old_origin = origin_src[i];
> +                       new_origin = kmsan_internal_chain_origin(old_origin);
> +                       /*
> +                        * kmsan_internal_chain_origin() may return
> +                        * NULL, but we don't want to lose the previous
> +                        * origin value.
> +                        */
> +                       if (!new_origin)
> +                               new_origin = old_origin;
> +               }
> +               if (shadow)
> +                       origin_dst[i] = new_origin;
> +               else
> +                       origin_dst[i] = 0;
> +       }
> +       /*
> +        * If dst_slots is greater than src_slots (i.e.
> +        * dst_slots == src_slots + 1), there is an extra origin slot at the
> +        * beginning or end of the destination buffer, for which we take the
> +        * origin from the previous slot.
> +        * This is only done if the part of the source shadow corresponding to
> +        * that slot is non-zero.
> +        *
> +        * E.g. if we copy 8 aligned bytes that are marked as uninitialized
> +        * and have origins o111 and o222, to an unaligned buffer with offset 1,
> +        * these two origins are copied to three origin slots, so one of them
> +        * needs to be duplicated, depending on the copy direction (@backwards):
> +        *
> +        *   src shadow: |uuuu|uuuu|....|
> +        *   src origin: |o111|o222|....|
> +        *
> +        * backwards = 0:
> +        *   dst shadow: |.uuu|uuuu|u...|
> +        *   dst origin: |....|o111|o222| - fill the empty slot with o111
> +        * backwards = 1:
> +        *   dst shadow: |.uuu|uuuu|u...|
> +        *   dst origin: |o111|o222|....| - fill the empty slot with o222
> +        */
> +       if (src_slots < dst_slots) {
> +               if (backwards) {
> +                       shadow = align_shadow_src[src_slots - 1];
> +                       skip_bits = (((u64)dst + n) % KMSAN_ORIGIN_SIZE) * 8;
> +                       shadow = (shadow << skip_bits) >> skip_bits;
> +                       if (shadow)
> +                               /* src_slots > 0, therefore dst_slots is at least 2 */
> +                               origin_dst[dst_slots - 1] = origin_dst[dst_slots - 2];
> +               } else {
> +                       shadow = align_shadow_src[0];
> +                       skip_bits = ((u64)dst % KMSAN_ORIGIN_SIZE) * 8;
> +                       shadow = (shadow >> skip_bits) << skip_bits;
> +                       if (shadow)
> +                               origin_dst[0] = origin_dst[1];
> +               }
> +       }
> +}
> +
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
> +{
> +       unsigned long entries[3];
> +       u32 extra_bits;
> +       int depth;
> +       bool uaf;
> +
> +       if (!id)
> +               return id;
> +       /*
> +        * Make sure we have enough spare bits in @id to hold the UAF bit and
> +        * the chain depth.
> +        */
> +       BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
> +
> +       extra_bits = stack_depot_get_extra_bits(id);
> +       depth = kmsan_depth_from_eb(extra_bits);
> +       uaf = kmsan_uaf_from_eb(extra_bits);
> +
> +       if (depth >= MAX_CHAIN_DEPTH) {
> +               static atomic_long_t kmsan_skipped_origins;
> +               long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
> +
> +               if (skipped % NUM_SKIPPED_TO_WARN == 0) {
> +                       pr_warn("not chained %ld origins\n", skipped);
> +                       dump_stack();
> +                       kmsan_print_origin(id);
> +               }
> +               return id;
> +       }
> +       depth++;
> +       extra_bits = kmsan_extra_bits(depth, uaf);
> +
> +       entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
> +       entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
> +       entries[2] = id;
> +       /*
> +        * @entries is a local var in non-instrumented code, so KMSAN does not
> +        * know it is initialized. Explicitly unpoison it to avoid false
> +        * positives when __stack_depot_save() passes it to instrumented code.
> +        */
> +       kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
> +       return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
> +                                 GFP_ATOMIC, true);
> +}
> +
> +void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
> +                                     u32 origin, bool checked)
> +{
> +       u64 address = (u64)addr;
> +       void *shadow_start;
> +       u32 *origin_start;
> +       size_t pad = 0;
> +       int i;
> +
> +       KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> +       shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
> +       if (!shadow_start) {
> +               /*
> +                * kmsan_metadata_is_contiguous() is true, so either all shadow
> +                * and origin pages are NULL, or all are non-NULL.
> +                */
> +               if (checked) {
> +                       pr_err("%s: not memsetting %ld bytes starting at %px, because the shadow is NULL\n",
> +                              __func__, size, addr);
> +                       KMSAN_WARN_ON(true);
> +               }
> +               return;
> +       }
> +       __memset(shadow_start, b, size);
> +
> +       if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
> +               pad = address % KMSAN_ORIGIN_SIZE;
> +               address -= pad;
> +               size += pad;
> +       }
> +       size = ALIGN(size, KMSAN_ORIGIN_SIZE);
> +       origin_start =
> +               (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
> +
> +       for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
> +               origin_start[i] = origin;
> +}
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
> +{
> +       struct page *page;
> +
> +       if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
> +           !kmsan_internal_is_module_addr(vaddr))
> +               return NULL;
> +       page = vmalloc_to_page(vaddr);
> +       if (pfn_valid(page_to_pfn(page)))
> +               return page;
> +       else
> +               return NULL;
> +}
> +
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> +                                int reason)
> +{
> +       depot_stack_handle_t cur_origin = 0, new_origin = 0;
> +       unsigned long addr64 = (unsigned long)addr;
> +       depot_stack_handle_t *origin = NULL;
> +       unsigned char *shadow = NULL;
> +       int cur_off_start = -1;
> +       int i, chunk_size;
> +       size_t pos = 0;
> +
> +       if (!size)
> +               return;
> +       KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> +       while (pos < size) {
> +               chunk_size = min(size - pos,
> +                                PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
> +               shadow = kmsan_get_metadata((void *)(addr64 + pos),
> +                                           KMSAN_META_SHADOW);
> +               if (!shadow) {
> +                       /*
> +                        * This page is untracked. If there were uninitialized
> +                        * bytes before, report them.
> +                        */
> +                       if (cur_origin) {
> +                               kmsan_enter_runtime();
> +                               kmsan_report(cur_origin, addr, size,
> +                                            cur_off_start, pos - 1, user_addr,
> +                                            reason);
> +                               kmsan_leave_runtime();
> +                       }
> +                       cur_origin = 0;
> +                       cur_off_start = -1;
> +                       pos += chunk_size;
> +                       continue;
> +               }
> +               for (i = 0; i < chunk_size; i++) {
> +                       if (!shadow[i]) {
> +                               /*
> +                                * This byte is unpoisoned. If there were
> +                                * poisoned bytes before, report them.
> +                                */
> +                               if (cur_origin) {
> +                                       kmsan_enter_runtime();
> +                                       kmsan_report(cur_origin, addr, size,
> +                                                    cur_off_start, pos + i - 1,
> +                                                    user_addr, reason);
> +                                       kmsan_leave_runtime();
> +                               }
> +                               cur_origin = 0;
> +                               cur_off_start = -1;
> +                               continue;
> +                       }
> +                       origin = kmsan_get_metadata((void *)(addr64 + pos + i),
> +                                                   KMSAN_META_ORIGIN);
> +                       KMSAN_WARN_ON(!origin);
> +                       new_origin = *origin;
> +                       /*
> +                        * Encountered new origin - report the previous
> +                        * uninitialized range.
> +                        */
> +                       if (cur_origin != new_origin) {
> +                               if (cur_origin) {
> +                                       kmsan_enter_runtime();
> +                                       kmsan_report(cur_origin, addr, size,
> +                                                    cur_off_start, pos + i - 1,
> +                                                    user_addr, reason);
> +                                       kmsan_leave_runtime();
> +                               }
> +                               cur_origin = new_origin;
> +                               cur_off_start = pos + i;
> +                       }
> +               }
> +               pos += chunk_size;
> +       }
> +       KMSAN_WARN_ON(pos != size);
> +       if (cur_origin) {
> +               kmsan_enter_runtime();
> +               kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
> +                            user_addr, reason);
> +               kmsan_leave_runtime();
> +       }
> +}
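
For example, checking an 8-byte buffer whose bytes 1-2 share origin A and
whose byte 5 carries origin B produces two reports: one for bytes 1-2 with
origin A and one for byte 5 with origin B; the initialized bytes in between
terminate the current range.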
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size)
> +{
> +       char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
> +            *next_origin = NULL;
> +       u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
> +       depot_stack_handle_t *origin_p;
> +       bool all_untracked = false;
> +
> +       if (!size)
> +               return true;
> +
> +       /* The whole range belongs to the same page. */
> +       if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
> +           ALIGN_DOWN(cur_addr, PAGE_SIZE))
> +               return true;
> +
> +       cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
> +       if (!cur_shadow)
> +               all_untracked = true;
> +       cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
> +       if (all_untracked && cur_origin)
> +               goto report;
> +
> +       for (; next_addr < (u64)addr + size;
> +            cur_addr = next_addr, cur_shadow = next_shadow,
> +            cur_origin = next_origin, next_addr += PAGE_SIZE) {
> +               next_shadow = kmsan_get_metadata((void *)next_addr, false);
> +               next_origin = kmsan_get_metadata((void *)next_addr, true);
> +               if (all_untracked) {
> +                       if (next_shadow || next_origin)
> +                               goto report;
> +                       continue;
> +               }
> +               if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
> +                   ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
> +                       continue;
> +               goto report;
> +       }
> +       return true;
> +
> +report:
> +       pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
> +       pr_err("Access of size %ld at %px.\n", size, addr);
> +       pr_err("Addresses belonging to different ranges: %px and %px\n",
> +              (void *)cur_addr, (void *)next_addr);
> +       pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
> +              next_shadow);
> +       pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
> +              next_origin);
> +       origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
> +       if (origin_p) {
> +               pr_err("Origin: %08x\n", *origin_p);
> +               kmsan_print_origin(*origin_p);
> +       } else {
> +               pr_err("Origin: unavailable\n");
> +       }
> +       return false;
> +}
> +
> +bool kmsan_internal_is_module_addr(void *vaddr)
> +{
> +       return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
> +}
> +
> +bool kmsan_internal_is_vmalloc_addr(void *addr)
> +{
> +       return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
> +}
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> new file mode 100644
> index 0000000000000..4ac62fa67a02a
> --- /dev/null
> +++ b/mm/kmsan/hooks.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN hooks for kernel subsystems.
> + *
> + * These functions handle creation of KMSAN metadata for memory allocations.
> + *
> + * Copyright (C) 2018-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include <linux/cacheflush.h>
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +
> +#include "../internal.h"
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Instrumented functions shouldn't be called under
> + * kmsan_enter_runtime()/kmsan_leave_runtime(), because the effects of
> + * functions like memset() called from instrumented code would then be
> + * skipped.
> + */
> +
> +/* Functions from kmsan-checks.h follow. */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
> +{
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       kmsan_enter_runtime();
> +       /* The users may want to poison/unpoison arbitrary memory. */
> +       kmsan_internal_poison_memory((void *)address, size, flags,
> +                                    KMSAN_POISON_NOCHECK);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_poison_memory);
> +
> +void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> +       unsigned long ua_flags;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       ua_flags = user_access_save();
> +       kmsan_enter_runtime();
> +       /* The users may want to poison/unpoison arbitrary memory. */
> +       kmsan_internal_unpoison_memory((void *)address, size,
> +                                      KMSAN_POISON_NOCHECK);
> +       kmsan_leave_runtime();
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(kmsan_unpoison_memory);
> +
> +void kmsan_check_memory(const void *addr, size_t size)
> +{
> +       if (!kmsan_enabled)
> +               return;
> +       kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
> +                                   REASON_ANY);
> +}
> +EXPORT_SYMBOL(kmsan_check_memory);
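
For illustration, a hypothetical driver receiving device data into a buffer
could use the kmsan-checks.h hooks above roughly as follows (a sketch only;
struct mydrv and its fields are made up):

	/* Not part of this patch: example consumer of the check API. */
	static void mydrv_handle_rx(struct mydrv *drv, size_t len)
	{
		/*
		 * The device has written @len bytes: mark them as
		 * initialized so that later reads of the buffer do not
		 * produce false positives.
		 */
		kmsan_unpoison_memory(drv->rx_buf, len);
	}

	static void mydrv_send(struct mydrv *drv)
	{
		/*
		 * Before handing a buffer to the hardware, assert that
		 * every byte of it has been initialized.
		 */
		kmsan_check_memory(drv->tx_buf, drv->tx_len);
	}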
> diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
> new file mode 100644
> index 0000000000000..1b705162be8c2
> --- /dev/null
> +++ b/mm/kmsan/instrumentation.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN compiler API.
> + *
> + * This file implements __msan_XXX hooks that Clang inserts into the code
> + * compiled with -fsanitize=kernel-memory.
> + * See Documentation/dev-tools/kmsan.rst for more information on how KMSAN
> + * instrumentation works.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include "kmsan.h"
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/uaccess.h>
> +
> +static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
> +{
> +       if ((u64)addr < TASK_SIZE)
> +               return true;
> +       if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
> +               return true;
> +       return false;
> +}
> +
> +static inline struct shadow_origin_ptr
> +get_shadow_origin_ptr(void *addr, u64 size, bool store)
> +{
> +       unsigned long ua_flags = user_access_save();
> +       struct shadow_origin_ptr ret;
> +
> +       ret = kmsan_get_shadow_origin_ptr(addr, size, store);
> +       user_access_restore(ua_flags);
> +       return ret;
> +}
> +
> +/* Get shadow and origin pointers for a memory load with non-standard size. */
> +struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
> +                                                       uintptr_t size)
> +{
> +       return get_shadow_origin_ptr(addr, size, /*store*/ false);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
> +
> +/* Get shadow and origin pointers for a memory store with non-standard size. */
> +struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
> +                                                        uintptr_t size)
> +{
> +       return get_shadow_origin_ptr(addr, size, /*store*/ true);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
> +
> +/*
> + * Declare functions that obtain shadow/origin pointers for loads and stores
> + * with fixed size.
> + */
> +#define DECLARE_METADATA_PTR_GETTER(size)                                      \
> +       struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size(          \
> +               void *addr)                                                    \
> +       {                                                                      \
> +               return get_shadow_origin_ptr(addr, size, /*store*/ false);     \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size);                    \
> +       struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size(         \
> +               void *addr)                                                    \
> +       {                                                                      \
> +               return get_shadow_origin_ptr(addr, size, /*store*/ true);      \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
> +
> +DECLARE_METADATA_PTR_GETTER(1);
> +DECLARE_METADATA_PTR_GETTER(2);
> +DECLARE_METADATA_PTR_GETTER(4);
> +DECLARE_METADATA_PTR_GETTER(8);
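
For a fixed-size store such as "*p = v;" with a 4-byte v, instrumented code
conceptually consults these getters along the following lines (illustrative
pseudo-C, not literal compiler output; the temporaries are made up):

	struct shadow_origin_ptr sop = __msan_metadata_ptr_for_store_4(p);

	*(u32 *)sop.shadow = shadow_of_v;	/* propagate v's shadow */
	if (shadow_of_v)			/* v is (partly) uninit? */
		*(u32 *)sop.origin = origin_of_v;
	*p = v;					/* the actual store */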
> +
> +/*
> + * Handle a memory store performed by inline assembly. KMSAN conservatively
> + * attempts to unpoison the outputs of asm() directives to prevent false
> + * positives caused by missed stores.
> + */
> +void __msan_instrument_asm_store(void *addr, uintptr_t size)
> +{
> +       unsigned long ua_flags;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       ua_flags = user_access_save();
> +       /*
> +        * Most of the accesses are below 32 bytes. The two exceptions so far
> +        * are clwb() (64 bytes) and FPU state (512 bytes).
> +        * It's unlikely that the assembly will touch more than 512 bytes.
> +        */
> +       if (size > 512) {
> +               WARN_ONCE(1, "assembly store size too big: %ld\n", size);
> +               size = 8;
> +       }
> +       if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
> +               user_access_restore(ua_flags);
> +               return;
> +       }
> +       kmsan_enter_runtime();
> +       /* Unpoison the memory on a best-effort basis. */
> +       kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
> +       kmsan_leave_runtime();
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_instrument_asm_store);
> +
> +/* Handle llvm.memmove intrinsic. */
> +void *__msan_memmove(void *dst, const void *src, uintptr_t n)
> +{
> +       void *result;
> +
> +       result = __memmove(dst, src, n);
> +       if (!n)
> +               /* Some people call memmove() with zero length. */
> +               return result;
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return result;
> +
> +       kmsan_enter_runtime();
> +       kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +       kmsan_leave_runtime();
> +
> +       return result;
> +}
> +EXPORT_SYMBOL(__msan_memmove);
> +
> +/* Handle llvm.memcpy intrinsic. */
> +void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
> +{
> +       void *result;
> +
> +       result = __memcpy(dst, src, n);
> +       if (!n)
> +               /* Some people call memcpy() with zero length. */
> +               return result;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return result;
> +
> +       kmsan_enter_runtime();
> +       /* Using memmove instead of memcpy doesn't affect correctness. */
> +       kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +       kmsan_leave_runtime();
> +
> +       return result;
> +}
> +EXPORT_SYMBOL(__msan_memcpy);
> +
> +/* Handle llvm.memset intrinsic. */
> +void *__msan_memset(void *dst, int c, uintptr_t n)
> +{
> +       void *result;
> +
> +       result = __memset(dst, c, n);
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return result;
> +
> +       kmsan_enter_runtime();
> +       /*
> +        * Clang doesn't pass parameter metadata here, so it is impossible to
> +        * use shadow of @c to set up the shadow for @dst.
> +        */
> +       kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
> +       kmsan_leave_runtime();
> +
> +       return result;
> +}
> +EXPORT_SYMBOL(__msan_memset);
> +
> +/*
> + * Create a new origin from an old one. This is done when storing an
> + * uninitialized value to memory. When reporting an error, KMSAN unrolls and
> + * prints the whole chain of stores that preceded the use of this value.
> + */
> +depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
> +{
> +       depot_stack_handle_t ret = 0;
> +       unsigned long ua_flags;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return ret;
> +
> +       ua_flags = user_access_save();
> +
> +       /* Creating new origins may allocate memory. */
> +       kmsan_enter_runtime();
> +       ret = kmsan_internal_chain_origin(origin);
> +       kmsan_leave_runtime();
> +       user_access_restore(ua_flags);
> +       return ret;
> +}
> +EXPORT_SYMBOL(__msan_chain_origin);
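
Judging by how kmsan_print_origin() decodes chains in report.c below, each
chained origin appears to be a three-entry stack depot record roughly of the
following form (a sketch of the layout, not a literal excerpt):

	entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;	/* tags a chain node */
	entries[1] = <handle of the stack trace of this store>;
	entries[2] = <origin the value carried before the store>;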
> +
> +/* Poison a local variable when entering a function. */
> +void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
> +{
> +       depot_stack_handle_t handle;
> +       unsigned long entries[4];
> +       unsigned long ua_flags;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       ua_flags = user_access_save();
> +       entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
> +       entries[1] = (u64)descr;
> +       entries[2] = (u64)__builtin_return_address(0);
> +       /*
> +        * With frame pointers enabled, it is possible to quickly fetch the
> +        * second frame of the caller stack without calling the unwinder.
> +        * Without them, simply do not bother.
> +        */
> +       if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
> +               entries[3] = (u64)__builtin_return_address(1);
> +       else
> +               entries[3] = 0;
> +
> +       /* stack_depot_save() may allocate memory. */
> +       kmsan_enter_runtime();
> +       handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
> +       kmsan_leave_runtime();
> +
> +       kmsan_internal_set_shadow_origin(address, size, -1, handle,
> +                                        /*checked*/ true);
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_poison_alloca);
> +
> +/* Unpoison a local variable. */
> +void __msan_unpoison_alloca(void *address, uintptr_t size)
> +{
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       kmsan_enter_runtime();
> +       kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_unpoison_alloca);
> +
> +/*
> + * Report that an uninitialized value with the given origin was used in a way
> + * that constituted undefined behavior.
> + */
> +void __msan_warning(u32 origin)
> +{
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       kmsan_enter_runtime();
> +       kmsan_report(origin, /*address*/ 0, /*size*/ 0,
> +                    /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
> +                    REASON_ANY);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_warning);
> +
> +/*
> + * At the beginning of an instrumented function, obtain the pointer to
> + * `struct kmsan_context_state` holding the metadata for function parameters.
> + */
> +struct kmsan_context_state *__msan_get_context_state(void)
> +{
> +       return &kmsan_get_context()->cstate;
> +}
> +EXPORT_SYMBOL(__msan_get_context_state);
> diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
> new file mode 100644
> index 0000000000000..d3c400ca097ba
> --- /dev/null
> +++ b/mm/kmsan/kmsan.h
> @@ -0,0 +1,190 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Functions used by the KMSAN runtime.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#ifndef __MM_KMSAN_KMSAN_H
> +#define __MM_KMSAN_KMSAN_H
> +
> +#include <asm/pgtable_64_types.h>
> +#include <linux/irqflags.h>
> +#include <linux/sched.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/nmi.h>
> +#include <linux/mm.h>
> +#include <linux/printk.h>
> +
> +#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
> +#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
> +
> +#define KMSAN_POISON_NOCHECK 0x0
> +#define KMSAN_POISON_CHECK 0x1
> +#define KMSAN_POISON_FREE 0x2
> +
> +#define KMSAN_ORIGIN_SIZE 4
> +
> +#define KMSAN_STACK_DEPTH 64
> +
> +#define KMSAN_META_SHADOW (false)
> +#define KMSAN_META_ORIGIN (true)
> +
> +extern bool kmsan_enabled;
> +extern int panic_on_kmsan;
> +
> +/*
> + * KMSAN performs a lot of consistency checks that are currently enabled by
> + * default. BUG_ON is normally discouraged in the kernel, unless used for
> + * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
> + * recover if something goes wrong.
> + */
> +#define KMSAN_WARN_ON(cond)                                            \
> +       ({                                                              \
> +               const bool __cond = WARN_ON(cond);                      \
> +               if (unlikely(__cond)) {                                 \
> +                       WRITE_ONCE(kmsan_enabled, false);               \
> +                       if (panic_on_kmsan) {                           \
> +                               /* Can't call panic() here because */   \
> +                               /* of uaccess checks. */                \
> +                               BUG();                                  \
> +                       }                                               \
> +               }                                                       \
> +               __cond;                                                 \
> +       })
> +
> +/*
> + * A pair of metadata pointers to be returned by the instrumentation functions.
> + */
> +struct shadow_origin_ptr {
> +       void *shadow, *origin;
> +};
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
> +                                                    bool store);
> +void *kmsan_get_metadata(void *addr, bool is_origin);
> +
> +enum kmsan_bug_reason {
> +       REASON_ANY,
> +       REASON_COPY_TO_USER,
> +       REASON_SUBMIT_URB,
> +};
> +
> +void kmsan_print_origin(depot_stack_handle_t origin);
> +
> +/**
> + * kmsan_report() - Report a use of uninitialized value.
> + * @origin:    Stack ID of the uninitialized value.
> + * @address:   Address at which the memory access happens.
> + * @size:      Memory access size.
> + * @off_first: Offset (from @address) of the first byte to be reported.
> + * @off_last:  Offset (from @address) of the last byte to be reported.
> + * @user_addr: When non-NULL, denotes the userspace address to which the kernel
> + *             is leaking data.
> + * @reason:    Error type from enum kmsan_bug_reason.
> + *
> + * kmsan_report() prints an error message for a consecutive group of bytes
> + * sharing the same origin. If an uninitialized value is used in a comparison,
> + * this function is called once without specifying the addresses. When checking
> + * a memory range, KMSAN may call kmsan_report() multiple times with the same
> + * @address, @size, @user_addr and @reason, but different @off_first and
> + * @off_last corresponding to different @origin values.
> + */
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> +                 int off_first, int off_last, const void *user_addr,
> +                 enum kmsan_bug_reason reason);
> +
> +DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +static __always_inline struct kmsan_ctx *kmsan_get_context(void)
> +{
> +       return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
> +}
> +
> +/*
> + * When a compiler hook or KMSAN runtime function is invoked, it may make a
> + * call to instrumented code and eventually call itself recursively. To avoid
> + * that, we guard the runtime entry regions with
> + * kmsan_enter_runtime()/kmsan_leave_runtime() and exit the hook if
> + * kmsan_in_runtime() is true.
> + *
> + * Non-runtime code may occasionally get executed in nested IRQs from the
> + * runtime code (e.g. when called via smp_call_function_single()). Because some
> + * KMSAN routines may take locks (e.g. for memory allocation), we conservatively
> + * bail out instead of calling them. To minimize the effect of this (potentially
> + * missing initialization events) kmsan_in_runtime() is not checked in
> + * non-blocking runtime functions.
> + */
> +static __always_inline bool kmsan_in_runtime(void)
> +{
> +       if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> +               return true;
> +       return kmsan_get_context()->kmsan_in_runtime;
> +}
> +
> +static __always_inline void kmsan_enter_runtime(void)
> +{
> +       struct kmsan_ctx *ctx;
> +
> +       ctx = kmsan_get_context();
> +       KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
> +}
> +
> +static __always_inline void kmsan_leave_runtime(void)
> +{
> +       struct kmsan_ctx *ctx = kmsan_get_context();
> +
> +       KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack(void);
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> +                                                unsigned int extra_bits);
> +
> +/*
> + * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
> + * provided by the stack depot.
> + * The UAF flag is stored in the lowest bit, followed by the depth in the upper
> + * bits.
> + * stack_depot_set_extra_bits() is responsible for clamping the value.
> + */
> +static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
> +                                                    bool uaf)
> +{
> +       return (depth << 1) | uaf;
> +}
> +
> +static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
> +{
> +       return extra_bits & 1;
> +}
> +
> +static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
> +{
> +       return extra_bits >> 1;
> +}
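
For example, kmsan_extra_bits(5, true) packs a chain depth of 5 with the UAF
flag set into (5 << 1) | 1 == 11; kmsan_depth_from_eb(11) then recovers 5 and
kmsan_uaf_from_eb(11) recovers true.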
> +
> +/*
> + * kmsan_internal_ functions are supposed to be very simple and not require the
> + * kmsan_in_runtime() checks.
> + */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> +                                 unsigned int poison_flags);
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
> +void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
> +                                     u32 origin, bool checked);
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size);
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> +                                int reason);
> +bool kmsan_internal_is_module_addr(void *vaddr);
> +bool kmsan_internal_is_vmalloc_addr(void *addr);
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
> +
> +#endif /* __MM_KMSAN_KMSAN_H */
> diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
> new file mode 100644
> index 0000000000000..c298edcf49ee5
> --- /dev/null
> +++ b/mm/kmsan/report.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN error reporting routines.
> + *
> + * Copyright (C) 2019-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include <linux/console.h>
> +#include <linux/moduleparam.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/uaccess.h>
> +
> +#include "kmsan.h"
> +
> +static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
> +#define DESCR_SIZE 128
> +/* Protected by kmsan_report_lock */
> +static char report_local_descr[DESCR_SIZE];
> +int panic_on_kmsan __read_mostly;
> +
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +#define MODULE_PARAM_PREFIX "kmsan."
> +module_param_named(panic, panic_on_kmsan, int, 0);
> +
> +/*
> + * Skip internal KMSAN frames.
> + */
> +static int get_stack_skipnr(const unsigned long stack_entries[],
> +                           int num_entries)
> +{
> +       int len, skip;
> +       char buf[64];
> +
> +       for (skip = 0; skip < num_entries; ++skip) {
> +               len = scnprintf(buf, sizeof(buf), "%ps",
> +                               (void *)stack_entries[skip]);
> +
> +               /* Never show __msan_* or kmsan_* functions. */
> +               if ((strnstr(buf, "__msan_", len) == buf) ||
> +                   (strnstr(buf, "kmsan_", len) == buf))
> +                       continue;
> +
> +               /*
> +                * No match for runtime functions -- @skip entries to skip to
> +                * get to first frame of interest.
> +                */
> +               break;
> +       }
> +
> +       return skip;
> +}
> +
> +/*
> + * Currently the descriptions of locals generated by Clang look as follows:
> + *   ----local_name@function_name
> + * We want to print only the name of the local, as other information in that
> + * description can be confusing.
> + * The meaningful part of the description is copied to a global buffer to avoid
> + * allocating memory.
> + */
> +static char *pretty_descr(char *descr)
> +{
> +       int i, pos = 0, len = strlen(descr);
> +
> +       for (i = 0; i < len; i++) {
> +               if (descr[i] == '@')
> +                       break;
> +               if (descr[i] == '-')
> +                       continue;
> +               report_local_descr[pos] = descr[i];
> +               if (pos + 1 == DESCR_SIZE)
> +                       break;
> +               pos++;
> +       }
> +       report_local_descr[pos] = 0;
> +       return report_local_descr;
> +}
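
For example, a description like "----packet_len@tcp_recvmsg" (the names here
are made up) would be printed as just "packet_len".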
> +
> +void kmsan_print_origin(depot_stack_handle_t origin)
> +{
> +       unsigned long *entries = NULL, *chained_entries = NULL;
> +       unsigned int nr_entries, chained_nr_entries, skipnr;
> +       void *pc1 = NULL, *pc2 = NULL;
> +       depot_stack_handle_t head;
> +       unsigned long magic;
> +       char *descr = NULL;
> +
> +       if (!origin)
> +               return;
> +
> +       while (true) {
> +               nr_entries = stack_depot_fetch(origin, &entries);
> +               magic = nr_entries ? entries[0] : 0;
> +               if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
> +                       descr = (char *)entries[1];
> +                       pc1 = (void *)entries[2];
> +                       pc2 = (void *)entries[3];
> +                       pr_err("Local variable %s created at:\n",
> +                              pretty_descr(descr));
> +                       if (pc1)
> +                               pr_err(" %pSb\n", pc1);
> +                       if (pc2)
> +                               pr_err(" %pSb\n", pc2);
> +                       break;
> +               }
> +               if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
> +                       head = entries[1];
> +                       origin = entries[2];
> +                       pr_err("Uninit was stored to memory at:\n");
> +                       chained_nr_entries =
> +                               stack_depot_fetch(head, &chained_entries);
> +                       kmsan_internal_unpoison_memory(
> +                               chained_entries,
> +                               chained_nr_entries * sizeof(*chained_entries),
> +                               /*checked*/ false);
> +                       skipnr = get_stack_skipnr(chained_entries,
> +                                                 chained_nr_entries);
> +                       stack_trace_print(chained_entries + skipnr,
> +                                         chained_nr_entries - skipnr, 0);
> +                       pr_err("\n");
> +                       continue;
> +               }
> +               pr_err("Uninit was created at:\n");
> +               if (nr_entries) {
> +                       skipnr = get_stack_skipnr(entries, nr_entries);
> +                       stack_trace_print(entries + skipnr, nr_entries - skipnr,
> +                                         0);
> +               } else {
> +                       pr_err("(stack is not available)\n");
> +               }
> +               break;
> +       }
> +}
> +
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> +                 int off_first, int off_last, const void *user_addr,
> +                 enum kmsan_bug_reason reason)
> +{
> +       unsigned long stack_entries[KMSAN_STACK_DEPTH];
> +       int num_stack_entries, skipnr;
> +       char *bug_type = NULL;
> +       unsigned long ua_flags;
> +       bool is_uaf;
> +
> +       if (!kmsan_enabled)
> +               return;
> +       if (!current->kmsan_ctx.allow_reporting)
> +               return;
> +       if (!origin)
> +               return;
> +
> +       current->kmsan_ctx.allow_reporting = false;
> +       ua_flags = user_access_save();
> +       raw_spin_lock(&kmsan_report_lock);
> +       pr_err("=====================================================\n");
> +       is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
> +       switch (reason) {
> +       case REASON_ANY:
> +               bug_type = is_uaf ? "use-after-free" : "uninit-value";
> +               break;
> +       case REASON_COPY_TO_USER:
> +               bug_type = is_uaf ? "kernel-infoleak-after-free" :
> +                                         "kernel-infoleak";
> +               break;
> +       case REASON_SUBMIT_URB:
> +               bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
> +                                         "kernel-usb-infoleak";
> +               break;
> +       }
> +
> +       num_stack_entries =
> +               stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
> +       skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +
> +       pr_err("BUG: KMSAN: %s in %pSb\n",
> +              bug_type, (void *)stack_entries[skipnr]);
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +       pr_err("\n");
> +
> +       kmsan_print_origin(origin);
> +
> +       if (size) {
> +               pr_err("\n");
> +               if (off_first == off_last)
> +                       pr_err("Byte %d of %d is uninitialized\n", off_first,
> +                              size);
> +               else
> +                       pr_err("Bytes %d-%d of %d are uninitialized\n",
> +                              off_first, off_last, size);
> +       }
> +       if (address)
> +               pr_err("Memory access of size %d starts at %px\n", size,
> +                      address);
> +       if (user_addr && reason == REASON_COPY_TO_USER)
> +               pr_err("Data copied to user address %px\n", user_addr);
> +       pr_err("\n");
> +       dump_stack_print_info(KERN_ERR);
> +       pr_err("=====================================================\n");
> +       add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> +       raw_spin_unlock(&kmsan_report_lock);
> +       if (panic_on_kmsan)
> +               panic("kmsan.panic set ...\n");
> +       user_access_restore(ua_flags);
> +       current->kmsan_ctx.allow_reporting = true;
> +}
> diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
> new file mode 100644
> index 0000000000000..e5ad2972d7362
> --- /dev/null
> +++ b/mm/kmsan/shadow.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN shadow implementation.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include <asm/kmsan.h>
> +#include <asm/tlbflush.h>
> +#include <linux/cacheflush.h>
> +#include <linux/memblock.h>
> +#include <linux/mm_types.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/stddef.h>
> +
> +#include "../internal.h"
> +#include "kmsan.h"
> +
> +#define shadow_page_for(page) ((page)->kmsan_shadow)
> +
> +#define origin_page_for(page) ((page)->kmsan_origin)
> +
> +static void *shadow_ptr_for(struct page *page)
> +{
> +       return page_address(shadow_page_for(page));
> +}
> +
> +static void *origin_ptr_for(struct page *page)
> +{
> +       return page_address(origin_page_for(page));
> +}
> +
> +static bool page_has_metadata(struct page *page)
> +{
> +       return shadow_page_for(page) && origin_page_for(page);
> +}
> +
> +static void set_no_shadow_origin_page(struct page *page)
> +{
> +       shadow_page_for(page) = NULL;
> +       origin_page_for(page) = NULL;
> +}
> +
> +/*
> + * Dummy load and store pages to be used when the real metadata is unavailable.
> + * There are separate pages for loads and stores: loads from the dummy page
> + * always return zeroes, and stores to it never affect subsequent loads.
> + */
> +static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +static unsigned long vmalloc_meta(void *addr, bool is_origin)
> +{
> +       unsigned long addr64 = (unsigned long)addr, off;
> +
> +       KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
> +       if (kmsan_internal_is_vmalloc_addr(addr)) {
> +               off = addr64 - VMALLOC_START;
> +               return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
> +                                               KMSAN_VMALLOC_SHADOW_START);
> +       }
> +       if (kmsan_internal_is_module_addr(addr)) {
> +               off = addr64 - MODULES_VADDR;
> +               return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
> +                                               KMSAN_MODULES_SHADOW_START);
> +       }
> +       return 0;
> +}
> +
> +static struct page *virt_to_page_or_null(void *vaddr)
> +{
> +       if (kmsan_virt_addr_valid(vaddr))
> +               return virt_to_page(vaddr);
> +       else
> +               return NULL;
> +}
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
> +                                                    bool store)
> +{
> +       struct shadow_origin_ptr ret;
> +       void *shadow;
> +
> +       /*
> +        * Even if we redirect this memory access to the dummy page, it will
> +        * go out of bounds.
> +        */
> +       KMSAN_WARN_ON(size > PAGE_SIZE);
> +
> +       if (!kmsan_enabled)
> +               goto return_dummy;
> +
> +       KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
> +       shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
> +       if (!shadow)
> +               goto return_dummy;
> +
> +       ret.shadow = shadow;
> +       ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
> +       return ret;
> +
> +return_dummy:
> +       if (store) {
> +               /* Ignore this store. */
> +               ret.shadow = dummy_store_page;
> +               ret.origin = dummy_store_page;
> +       } else {
> +               /* This load will return zero. */
> +               ret.shadow = dummy_load_page;
> +               ret.origin = dummy_load_page;
> +       }
> +       return ret;
> +}
> +
> +/*
> + * Obtain the shadow or origin pointer for the given address, or NULL if there's
> + * none. Callers must check the return value for NULL before dereferencing it.
> + * The return value of this function should not depend on whether we're in the
> + * runtime or not.
> + */
> +void *kmsan_get_metadata(void *address, bool is_origin)
> +{
> +       u64 addr = (u64)address, pad, off;
> +       struct page *page;
> +
> +       if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
> +               pad = addr % KMSAN_ORIGIN_SIZE;
> +               addr -= pad;
> +       }
> +       address = (void *)addr;
> +       if (kmsan_internal_is_vmalloc_addr(address) ||
> +           kmsan_internal_is_module_addr(address))
> +               return (void *)vmalloc_meta(address, is_origin);
> +
> +       page = virt_to_page_or_null(address);
> +       if (!page)
> +               return NULL;
> +       if (!page_has_metadata(page))
> +               return NULL;
> +       off = addr % PAGE_SIZE;
> +
> +       return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
> +}
> diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
> new file mode 100644
> index 0000000000000..b5b0aa61322ec
> --- /dev/null
> +++ b/scripts/Makefile.kmsan
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0
> +kmsan-cflags := -fsanitize=kernel-memory
> +
> +ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
> +kmsan-cflags += -fsanitize-memory-param-retval
> +endif
> +
> +export CFLAGS_KMSAN := $(kmsan-cflags)
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index d1425778664b9..46ebf7cb081f6 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -157,6 +157,15 @@ _c_flags += $(if $(patsubst n%,, \
>  endif
>  endif
>
> +ifeq ($(CONFIG_KMSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +               $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
> +               $(CFLAGS_KMSAN))
> +_c_flags += $(if $(patsubst n%,, \
> +               $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
> +               , -mllvm -msan-disable-checks=1)
> +endif
> +
>  ifeq ($(CONFIG_UBSAN),y)
>  _c_flags += $(if $(patsubst n%,, \
>                 $(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
> --
> 2.37.0.rc0.161.g10f37bed90-goog
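
Taken together with the Makefile.lib hunk above, this lets individual objects
or whole directories opt out of instrumentation, or keep the instrumentation
but disable the checks, e.g. (hypothetical Makefile lines; foo.o and bar.o are
placeholders):

	KMSAN_SANITIZE_foo.o := n	# do not instrument foo.o at all
	KMSAN_ENABLE_CHECKS_bar.o := n	# instrument bar.o, but compile it
					# with -mllvm -msan-disable-checks=1
	KMSAN_SANITIZE := n		# opt out the entire directory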

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code
  2022-07-01 14:22 ` [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code Alexander Potapenko
@ 2022-07-12 11:54   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 11:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> The EFI stub cannot be linked with the KMSAN runtime, so we disable
> instrumentation for it.
>
> Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
> caused by instrumentation hooks calling instrumented code again.
>
> This patch was previously part of "kmsan: disable KMSAN instrumentation
> for certain kernel parts", but was split away per Mark Rutland's
> request.

The "This patch..." paragraph feels out of place, and feels like it
should be part of a v4 changelog below ---.

> Signed-off-by: Alexander Potapenko <glider@google.com>

Otherwise,

Reviewed-by: Marco Elver <elver@google.com>

> ---
> Link: https://linux-review.googlesource.com/id/I41ae706bd3474f074f6a870bfc3f0f90e9c720f7
> ---
>  drivers/firmware/efi/libstub/Makefile | 1 +
>  kernel/Makefile                       | 1 +
>  kernel/locking/Makefile               | 3 ++-
>  lib/Makefile                          | 1 +
>  4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index d0537573501e9..81432d0c904b1 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -46,6 +46,7 @@ GCOV_PROFILE                  := n
>  # Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE                 := n
>  KCSAN_SANITIZE                 := n
> +KMSAN_SANITIZE                 := n
>  UBSAN_SANITIZE                 := n
>  OBJECT_FILES_NON_STANDARD      := y
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index a7e1f49ab2b3b..e47f0526c987f 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -38,6 +38,7 @@ KCOV_INSTRUMENT_kcov.o := n
>  KASAN_SANITIZE_kcov.o := n
>  KCSAN_SANITIZE_kcov.o := n
>  UBSAN_SANITIZE_kcov.o := n
> +KMSAN_SANITIZE_kcov.o := n
>  CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector
>
>  # Don't instrument error handlers
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index d51cabf28f382..ea925731fa40f 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -5,8 +5,9 @@ KCOV_INSTRUMENT         := n
>
>  obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> -# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
> +# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
>  KCSAN_SANITIZE_lockdep.o := n
> +KMSAN_SANITIZE_lockdep.o := n
>
>  ifdef CONFIG_FUNCTION_TRACER
>  CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
> diff --git a/lib/Makefile b/lib/Makefile
> index f99bf61f8bbc6..5056769d00bb6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -272,6 +272,7 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
>  CFLAGS_stackdepot.o += -fno-builtin
>  obj-$(CONFIG_STACKDEPOT) += stackdepot.o
>  KASAN_SANITIZE_stackdepot.o := n
> +KMSAN_SANITIZE_stackdepot.o := n
>  KCOV_INSTRUMENT_stackdepot.o := n
>
>  obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN
  2022-07-01 14:22 ` [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN Alexander Potapenko
@ 2022-07-12 12:06   ` Marco Elver
  2022-08-02 16:39     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 12:06 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> Add entry for KMSAN maintainers/reviewers.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
> ---
>  MAINTAINERS | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fe5daf1415013..f56281df30284 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11106,6 +11106,18 @@ F:     kernel/kmod.c
>  F:     lib/test_kmod.c
>  F:     tools/testing/selftests/kmod/
>
> +KMSAN
> +M:     Alexander Potapenko <glider@google.com>
> +R:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kmsan.rst
> +F:     include/linux/kmsan*.h
> +F:     lib/Kconfig.kmsan
> +F:     mm/kmsan/
> +F:     scripts/Makefile.kmsan
> +

It's missing:

  arch/*/include/asm/kmsan.h
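
i.e., presumably one more pattern along the lines of:

  F:     arch/*/include/asm/kmsan.h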

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations
  2022-07-01 14:22 ` [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations Alexander Potapenko
@ 2022-07-12 12:20   ` Marco Elver
  2022-08-03 10:30     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 12:20 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Insert KMSAN hooks that make the necessary bookkeeping changes:
>  - poison page shadow and origins in alloc_pages()/free_page();
>  - clear page shadow and origins in clear_page(), copy_user_highpage();
>  - copy page metadata in copy_highpage(), wp_page_copy();
>  - handle vmap()/vunmap()/iounmap();
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- move page metadata hooks implementation here
>  -- remove call to kmsan_memblock_free_pages()
>
> v3:
>  -- use PAGE_SHIFT in kmsan_ioremap_page_range()
>
> v4:
>  -- change sizeof(type) to sizeof(*ptr)
>  -- replace occurrences of |var| with @var
>  -- swap mm: and kmsan: in the subject
>  -- drop __no_sanitize_memory from clear_page()
>
> Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
> ---
>  arch/x86/include/asm/page_64.h |  12 ++++
>  arch/x86/mm/ioremap.c          |   3 +
>  include/linux/highmem.h        |   3 +
>  include/linux/kmsan.h          | 123 +++++++++++++++++++++++++++++++++
>  mm/internal.h                  |   6 ++
>  mm/kmsan/hooks.c               |  87 +++++++++++++++++++++++
>  mm/kmsan/shadow.c              | 114 ++++++++++++++++++++++++++++++
>  mm/memory.c                    |   2 +
>  mm/page_alloc.c                |  11 +++
>  mm/vmalloc.c                   |  20 +++++-
>  10 files changed, 379 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index baa70451b8df5..227dd33eb4efb 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -45,14 +45,26 @@ void clear_page_orig(void *page);
>  void clear_page_rep(void *page);
>  void clear_page_erms(void *page);
>
> +/* This is an assembly header, avoid including too much of kmsan.h */

All of this code is under an "#ifndef __ASSEMBLY__" guard, so does it matter?

> +#ifdef CONFIG_KMSAN
> +void kmsan_unpoison_memory(const void *addr, size_t size);
> +#endif
>  static inline void clear_page(void *page)
>  {
> +#ifdef CONFIG_KMSAN
> +       /* alternative_call_2() changes @page. */
> +       void *page_copy = page;
> +#endif
>         alternative_call_2(clear_page_orig,
>                            clear_page_rep, X86_FEATURE_REP_GOOD,
>                            clear_page_erms, X86_FEATURE_ERMS,
>                            "=D" (page),
>                            "0" (page)
>                            : "cc", "memory", "rax", "rcx");
> +#ifdef CONFIG_KMSAN
> +       /* Clear KMSAN shadow for the pages that have it. */
> +       kmsan_unpoison_memory(page_copy, PAGE_SIZE);

What happens if this is called before the alternative-call? Could this
(in the interest of simplicity) be moved above it? And if you used the
kmsan-checks.h header, it also doesn't need any "ifdef CONFIG_KMSAN"
anymore.

> +#endif
>  }
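
A sketch of what that simplification might look like (assuming
kmsan-checks.h, introduced earlier in the series, declares
kmsan_unpoison_memory() unconditionally and stubs it out for
CONFIG_KMSAN=n):

	#include <linux/kmsan-checks.h>

	static inline void clear_page(void *page)
	{
		/*
		 * Unpoison first: @page has not yet been clobbered by the
		 * alternative_call_2() outputs, so no copy is needed and
		 * the #ifdefs go away. Ordering should not matter for
		 * correctness, since unpoisoning only touches the shadow,
		 * not the page contents.
		 */
		kmsan_unpoison_memory(page, PAGE_SIZE);
		alternative_call_2(clear_page_orig,
				   clear_page_rep, X86_FEATURE_REP_GOOD,
				   clear_page_erms, X86_FEATURE_ERMS,
				   "=D" (page),
				   "0" (page)
				   : "cc", "memory", "rax", "rcx");
	}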

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code
  2022-07-01 14:22 ` [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code Alexander Potapenko
@ 2022-07-12 13:13   ` Marco Elver
  2022-08-02 16:31     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 13:13 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> In order to report uninitialized memory coming from heap allocations
> KMSAN has to poison them unless they're created with __GFP_ZERO.
>
> It's handy that we need KMSAN hooks in the places where
> init_on_alloc/init_on_free initialization is performed.
>
> In addition, we apply __no_kmsan_checks to get_freepointer_safe() to
> suppress reports when accessing freelist pointers that reside in freed
> objects.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>

But see comment below.

> ---
> v2:
>  -- move the implementation of SLUB hooks here
>
> v4:
>  -- change sizeof(type) to sizeof(*ptr)
>  -- swap mm: and kmsan: in the subject
>  -- get rid of kmsan_init(), replace it with __no_kmsan_checks
>
> Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
> ---
>  include/linux/kmsan.h | 57 ++++++++++++++++++++++++++++++
>  mm/kmsan/hooks.c      | 80 +++++++++++++++++++++++++++++++++++++++++++
>  mm/slab.h             |  1 +
>  mm/slub.c             | 18 ++++++++++
>  4 files changed, 156 insertions(+)
>
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> index 699fe4f5b3bee..fd76cea338878 100644
> --- a/include/linux/kmsan.h
> +++ b/include/linux/kmsan.h
> @@ -15,6 +15,7 @@
>  #include <linux/types.h>
>
>  struct page;
> +struct kmem_cache;
>
>  #ifdef CONFIG_KMSAN
>
> @@ -72,6 +73,44 @@ void kmsan_free_page(struct page *page, unsigned int order);
>   */
>  void kmsan_copy_page_meta(struct page *dst, struct page *src);
>
> +/**
> + * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
> + * @s:      slab cache the object belongs to.
> + * @object: object pointer.
> + * @flags:  GFP flags passed to the allocator.
> + *
> + * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
> + * newly created object, marking it as initialized or uninitialized.
> + */
> +void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
> +
> +/**
> + * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
> + * @s:      slab cache the object belongs to.
> + * @object: object pointer.
> + *
> + * KMSAN marks the freed object as uninitialized.
> + */
> +void kmsan_slab_free(struct kmem_cache *s, void *object);
> +
> +/**
> + * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
> + * @ptr:   object pointer.
> + * @size:  object size.
> + * @flags: GFP flags passed to the allocator.
> + *
> + * Similar to kmsan_slab_alloc(), but for large allocations.
> + */
> +void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
> +
> +/**
> + * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
> + * @ptr: object pointer.
> + *
> + * Similar to kmsan_slab_free(), but for large allocations.
> + */
> +void kmsan_kfree_large(const void *ptr);
> +
>  /**
>   * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
>   * @start:     start of vmapped range.
> @@ -138,6 +177,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
>  {
>  }
>
> +static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
> +                                   gfp_t flags)
> +{
> +}
> +
> +static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
> +{
> +}
> +
> +static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
> +                                      gfp_t flags)
> +{
> +}
> +
> +static inline void kmsan_kfree_large(const void *ptr)
> +{
> +}
> +
>  static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
>                                                   unsigned long end,
>                                                   pgprot_t prot,
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> index 070756be70e3a..052e17b7a717d 100644
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -26,6 +26,86 @@
>   * skipping effects of functions like memset() inside instrumented code.
>   */
>
> +void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
> +{
> +       if (unlikely(object == NULL))
> +               return;
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       /*
> +        * There's a ctor or this is an RCU cache - do nothing. The memory
> +        * status hasn't changed since last use.
> +        */
> +       if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
> +               return;
> +
> +       kmsan_enter_runtime();
> +       if (flags & __GFP_ZERO)
> +               kmsan_internal_unpoison_memory(object, s->object_size,
> +                                              KMSAN_POISON_CHECK);
> +       else
> +               kmsan_internal_poison_memory(object, s->object_size, flags,
> +                                            KMSAN_POISON_CHECK);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_slab_alloc);
> +
> +void kmsan_slab_free(struct kmem_cache *s, void *object)
> +{
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       /* RCU slabs could be legally used after free within the RCU period */
> +       if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
> +               return;
> +       /*
> +        * If there's a constructor, freed memory must remain in the same state
> +        * until the next allocation. We cannot save its state to detect
> +        * use-after-free bugs, instead we just keep it unpoisoned.
> +        */
> +       if (s->ctor)
> +               return;
> +       kmsan_enter_runtime();
> +       kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
> +                                    KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_slab_free);
> +
> +void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
> +{
> +       if (unlikely(ptr == NULL))
> +               return;
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       kmsan_enter_runtime();
> +       if (flags & __GFP_ZERO)
> +               kmsan_internal_unpoison_memory((void *)ptr, size,
> +                                              /*checked*/ true);
> +       else
> +               kmsan_internal_poison_memory((void *)ptr, size, flags,
> +                                            KMSAN_POISON_CHECK);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_kmalloc_large);
> +
> +void kmsan_kfree_large(const void *ptr)
> +{
> +       struct page *page;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       kmsan_enter_runtime();
> +       page = virt_to_head_page((void *)ptr);
> +       KMSAN_WARN_ON(ptr != page_address(page));
> +       kmsan_internal_poison_memory((void *)ptr,
> +                                    PAGE_SIZE << compound_order(page),
> +                                    GFP_KERNEL,
> +                                    KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_kfree_large);
> +
>  static unsigned long vmalloc_shadow(unsigned long addr)
>  {
>         return (unsigned long)kmsan_get_metadata((void *)addr,
> diff --git a/mm/slab.h b/mm/slab.h
> index db9fb5c8dae73..d0de8195873d8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -752,6 +752,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
>                         memset(p[i], 0, s->object_size);
>                 kmemleak_alloc_recursive(p[i], s->object_size, 1,
>                                          s->flags, flags);
> +               kmsan_slab_alloc(s, p[i], flags);
>         }
>
>         memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> diff --git a/mm/slub.c b/mm/slub.c
> index b1281b8654bd3..b8b601f165087 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -22,6 +22,7 @@
>  #include <linux/proc_fs.h>
>  #include <linux/seq_file.h>
>  #include <linux/kasan.h>
> +#include <linux/kmsan.h>
>  #include <linux/cpu.h>
>  #include <linux/cpuset.h>
>  #include <linux/mempolicy.h>
> @@ -359,6 +360,17 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
>         prefetchw(object + s->offset);
>  }
>
> +/*
> + * When running under KMSAN, get_freepointer_safe() may return an uninitialized
> + * pointer value in case the current thread loses the race for the next
> + * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
> + * slab_alloc_node() will fail, so the uninitialized value won't be used, but
> + * KMSAN will still check all arguments of cmpxchg because of imperfect
> + * handling of inline assembly.
> + * To work around this problem, we apply __no_kmsan_checks to ensure that
> + * get_freepointer_safe() returns initialized memory.
> + */
> +__no_kmsan_checks
>  static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
>  {
>         unsigned long freepointer_addr;
> @@ -1709,6 +1721,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
>         ptr = kasan_kmalloc_large(ptr, size, flags);
>         /* As ptr might get tagged, call kmemleak hook after KASAN. */
>         kmemleak_alloc(ptr, size, 1, flags);
> +       kmsan_kmalloc_large(ptr, size, flags);
>         return ptr;
>  }
>
> @@ -1716,12 +1729,14 @@ static __always_inline void kfree_hook(void *x)
>  {
>         kmemleak_free(x);
>         kasan_kfree_large(x);
> +       kmsan_kfree_large(x);
>  }
>
>  static __always_inline bool slab_free_hook(struct kmem_cache *s,
>                                                 void *x, bool init)
>  {
>         kmemleak_free_recursive(x, s->flags);
> +       kmsan_slab_free(s, x);
>
>         debug_check_no_locks_freed(x, s->object_size);
>
> @@ -3756,6 +3771,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>          */
>         slab_post_alloc_hook(s, objcg, flags, size, p,
>                                 slab_want_init_on_alloc(flags, s));
> +

Remove unnecessary whitespace change.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 16/45] kmsan: handle task creation and exiting
  2022-07-01 14:22 ` [PATCH v4 16/45] kmsan: handle task creation and exiting Alexander Potapenko
@ 2022-07-12 13:17   ` Marco Elver
  2022-08-02 15:47     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 13:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> Tell KMSAN that a new task is created, so the tool creates a backing
> metadata structure for that task.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- move implementation of kmsan_task_create() and kmsan_task_exit() here
>
> v4:
>  -- change sizeof(type) to sizeof(*ptr)
>
> Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
> ---
>  include/linux/kmsan.h | 17 +++++++++++++++++
>  kernel/exit.c         |  2 ++
>  kernel/fork.c         |  2 ++
>  mm/kmsan/core.c       | 10 ++++++++++
>  mm/kmsan/hooks.c      | 19 +++++++++++++++++++
>  mm/kmsan/kmsan.h      |  2 ++
>  6 files changed, 52 insertions(+)
>
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> index fd76cea338878..b71e2032222e9 100644
> --- a/include/linux/kmsan.h
> +++ b/include/linux/kmsan.h
> @@ -16,6 +16,7 @@
>
>  struct page;
>  struct kmem_cache;
> +struct task_struct;
>
>  #ifdef CONFIG_KMSAN
>
> @@ -42,6 +43,14 @@ struct kmsan_ctx {
>         bool allow_reporting;
>  };
>
> +void kmsan_task_create(struct task_struct *task);
> +
> +/**
> + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> + * @task: task about to finish.
> + */
> +void kmsan_task_exit(struct task_struct *task);
> +
>  /**
>   * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
>   * @page:  struct page pointer returned by alloc_pages().
> @@ -163,6 +172,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
>
>  #else
>
> +static inline void kmsan_task_create(struct task_struct *task)
> +{
> +}
> +
> +static inline void kmsan_task_exit(struct task_struct *task)
> +{
> +}
> +
>  static inline int kmsan_alloc_page(struct page *page, unsigned int order,
>                                    gfp_t flags)
>  {
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f072959fcab7f..1784b7a741ddd 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -60,6 +60,7 @@
>  #include <linux/writeback.h>
>  #include <linux/shm.h>
>  #include <linux/kcov.h>
> +#include <linux/kmsan.h>
>  #include <linux/random.h>
>  #include <linux/rcuwait.h>
>  #include <linux/compat.h>
> @@ -741,6 +742,7 @@ void __noreturn do_exit(long code)
>         WARN_ON(tsk->plug);
>
>         kcov_task_exit(tsk);
> +       kmsan_task_exit(tsk);
>
>         coredump_task_exit(tsk);
>         ptrace_event(PTRACE_EVENT_EXIT, code);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9d44f2d46c696..6dfca6f00ec82 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -37,6 +37,7 @@
>  #include <linux/fdtable.h>
>  #include <linux/iocontext.h>
>  #include <linux/key.h>
> +#include <linux/kmsan.h>
>  #include <linux/binfmts.h>
>  #include <linux/mman.h>
>  #include <linux/mmu_notifier.h>
> @@ -1026,6 +1027,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>         tsk->worker_private = NULL;
>
>         kcov_task_init(tsk);
> +       kmsan_task_create(tsk);
>         kmap_local_fork(tsk);
>
>  #ifdef CONFIG_FAULT_INJECTION
> diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
> index 16fb8880a9c6d..7eabed03ed10b 100644
> --- a/mm/kmsan/core.c
> +++ b/mm/kmsan/core.c
> @@ -44,6 +44,16 @@ bool kmsan_enabled __read_mostly;
>   */
>  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
>
> +void kmsan_internal_task_create(struct task_struct *task)
> +{
> +       struct kmsan_ctx *ctx = &task->kmsan_ctx;
> +       struct thread_info *info = current_thread_info();
> +
> +       __memset(ctx, 0, sizeof(*ctx));
> +       ctx->allow_reporting = true;
> +       kmsan_internal_unpoison_memory(info, sizeof(*info), false);
> +}
> +
>  void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
>                                   unsigned int poison_flags)
>  {
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> index 052e17b7a717d..43a529569053d 100644
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -26,6 +26,25 @@
>   * skipping effects of functions like memset() inside instrumented code.
>   */
>
> +void kmsan_task_create(struct task_struct *task)
> +{
> +       kmsan_enter_runtime();
> +       kmsan_internal_task_create(task);
> +       kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_task_create);
> +
> +void kmsan_task_exit(struct task_struct *task)
> +{
> +       struct kmsan_ctx *ctx = &task->kmsan_ctx;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +
> +       ctx->allow_reporting = false;
> +}
> +EXPORT_SYMBOL(kmsan_task_exit);

Why are these EXPORT_SYMBOL? Will they be used from some kernel module?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code
  2022-07-01 14:22 ` [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code Alexander Potapenko
@ 2022-07-12 13:43   ` Marco Elver
  2022-08-03 10:52     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 13:43 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
[...]
> ---
>  arch/x86/boot/Makefile            | 1 +
>  arch/x86/boot/compressed/Makefile | 1 +
>  arch/x86/entry/vdso/Makefile      | 3 +++
>  arch/x86/kernel/Makefile          | 2 ++
>  arch/x86/kernel/cpu/Makefile      | 1 +
>  arch/x86/mm/Makefile              | 2 ++
>  arch/x86/realmode/rm/Makefile     | 1 +
>  lib/Makefile                      | 2 ++
[...]
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -272,6 +272,8 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
>  CFLAGS_stackdepot.o += -fno-builtin
>  obj-$(CONFIG_STACKDEPOT) += stackdepot.o
>  KASAN_SANITIZE_stackdepot.o := n
> +# In particular, instrumenting stackdepot.c with KMSAN will result in infinite
> +# recursion.
>  KMSAN_SANITIZE_stackdepot.o := n
>  KCOV_INSTRUMENT_stackdepot.o := n

This is generic code and not x86, should it have been in the earlier patch?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 18/45] instrumented.h: add KMSAN support
  2022-07-01 14:22 ` [PATCH v4 18/45] instrumented.h: add KMSAN support Alexander Potapenko
@ 2022-07-12 13:51   ` Marco Elver
  2022-08-03 11:17     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 13:51 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, Alexander Potapenko <glider@google.com> wrote:
>
> To avoid false positives, KMSAN needs to unpoison the data copied from
> userspace. To detect infoleaks, check the memory buffer passed to
> copy_to_user().
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>

With the code simplification below.

[...]
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -212,6 +212,44 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
>  }
>  EXPORT_SYMBOL(kmsan_iounmap_page_range);
>
> +void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
> +                       size_t left)
> +{
> +       unsigned long ua_flags;
> +
> +       if (!kmsan_enabled || kmsan_in_runtime())
> +               return;
> +       /*
> +        * At this point we've copied the memory already. It's hard to check it
> +        * before copying, as the size of the actually copied buffer is unknown.
> +        */
> +
> +       /* copy_to_user() may copy zero bytes. No need to check. */
> +       if (!to_copy)
> +               return;
> +       /* Or maybe copy_to_user() failed to copy anything. */
> +       if (to_copy <= left)
> +               return;
> +
> +       ua_flags = user_access_save();
> +       if ((u64)to < TASK_SIZE) {
> +               /* This is a user memory access, check it. */
> +               kmsan_internal_check_memory((void *)from, to_copy - left, to,
> +                                           REASON_COPY_TO_USER);

This could just do "} else {" and the stuff below, and would result in
simpler code with no explicit "return" and no duplicated
user_access_restore().
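
I.e., something like this (untested sketch, just restructuring the code
quoted here):

        ua_flags = user_access_save();
        if ((u64)to < TASK_SIZE) {
                /* This is a user memory access, check it. */
                kmsan_internal_check_memory((void *)from, to_copy - left, to,
                                            REASON_COPY_TO_USER);
        } else {
                /*
                 * Otherwise this is a kernel memory access; don't check
                 * anything, just copy the shadow of the copied bytes.
                 */
                kmsan_internal_memmove_metadata((void *)to, (void *)from,
                                                to_copy - left);
        }
        user_access_restore(ua_flags);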

> +               user_access_restore(ua_flags);
> +               return;
> +       }
> +       /*
> +        * Otherwise this is a kernel memory access. This happens when a
> +        * compat syscall passes an argument allocated on the kernel stack
> +        * to a real syscall.
> +        * Don't check anything, just copy the shadow of the copied bytes.
> +        */
> +       kmsan_internal_memmove_metadata((void *)to, (void *)from,
> +                                       to_copy - left);
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(kmsan_copy_to_user);

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines
  2022-07-01 14:22 ` [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines Alexander Potapenko
@ 2022-07-12 14:05   ` Marco Elver
  2022-08-02 20:07     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:05 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, Alexander Potapenko <glider@google.com> wrote:
>
> kmsan_init_shadow() scans the mappings created at boot time and creates
> metadata pages for those mappings.
>
> When the memblock allocator returns pages to pagealloc, we reserve 2/3
> of those pages and use them as metadata for the remaining 1/3. Once KMSAN
> starts, every page allocated by pagealloc has its associated shadow and
> origin pages.
>
> kmsan_initialize() initializes the bookkeeping for init_task and enables
> KMSAN.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- move mm/kmsan/init.c and kmsan_memblock_free_pages() to this patch
>  -- print a warning that KMSAN is a debugging tool (per Greg K-H's
>     request)
>
> v4:
>  -- change sizeof(type) to sizeof(*ptr)
>  -- replace occurrences of |var| with @var
>  -- swap init: and kmsan: in the subject
>  -- do not export __init functions
>
> Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
> ---
>  include/linux/kmsan.h |  48 +++++++++
>  init/main.c           |   3 +
>  mm/kmsan/Makefile     |   3 +-
>  mm/kmsan/init.c       | 238 ++++++++++++++++++++++++++++++++++++++++++
>  mm/kmsan/kmsan.h      |   3 +
>  mm/kmsan/shadow.c     |  36 +++++++
>  mm/page_alloc.c       |   3 +
>  7 files changed, 333 insertions(+), 1 deletion(-)
>  create mode 100644 mm/kmsan/init.c
>
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> index b71e2032222e9..82fd564cc72e7 100644
> --- a/include/linux/kmsan.h
> +++ b/include/linux/kmsan.h
> @@ -51,6 +51,40 @@ void kmsan_task_create(struct task_struct *task);
>   */
>  void kmsan_task_exit(struct task_struct *task);
>
> +/**
> + * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
> + *
> + * Allocate and initialize KMSAN metadata for early allocations.
> + */
> +void __init kmsan_init_shadow(void);
> +
> +/**
> + * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
> + */
> +void __init kmsan_init_runtime(void);
> +
> +/**
> + * kmsan_memblock_free_pages() - handle freeing of memblock pages.
> + * @page:      struct page to free.
> + * @order:     order of @page.
> + *
> + * Freed pages are either returned to buddy allocator or held back to be used
> + * as metadata pages.
> + */
> +bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
> +
> +/**
> + * kmsan_task_create() - Initialize KMSAN state for the task.
> + * @task: task to initialize.
> + */
> +void kmsan_task_create(struct task_struct *task);
> +
> +/**
> + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> + * @task: task about to finish.
> + */
> +void kmsan_task_exit(struct task_struct *task);

Something went wrong with patch shuffling here, I think: the
kmsan_task_create + kmsan_task_exit declarations are duplicated by this
patch.

>  /**
>   * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
>   * @page:  struct page pointer returned by alloc_pages().
> @@ -172,6 +206,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
>
>  #else
>
> +static inline void kmsan_init_shadow(void)
> +{
> +}
> +
> +static inline void kmsan_init_runtime(void)
> +{
> +}
> +
> +static inline bool kmsan_memblock_free_pages(struct page *page,
> +                                            unsigned int order)
> +{
> +       return true;
> +}
> +
>  static inline void kmsan_task_create(struct task_struct *task)
>  {
>  }
> diff --git a/init/main.c b/init/main.c
> index 0ee39cdcfcac9..7ba48a9ff1d53 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -34,6 +34,7 @@
>  #include <linux/percpu.h>
>  #include <linux/kmod.h>
>  #include <linux/kprobes.h>
> +#include <linux/kmsan.h>
>  #include <linux/vmalloc.h>
>  #include <linux/kernel_stat.h>
>  #include <linux/start_kernel.h>
> @@ -835,6 +836,7 @@ static void __init mm_init(void)
>         init_mem_debugging_and_hardening();
>         kfence_alloc_pool();
>         report_meminit();
> +       kmsan_init_shadow();
>         stack_depot_early_init();
>         mem_init();
>         mem_init_print_info();
> @@ -852,6 +854,7 @@ static void __init mm_init(void)
>         init_espfix_bsp();
>         /* Should be run after espfix64 is set up. */
>         pti_init();
> +       kmsan_init_runtime();
>  }
>
>  #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
> index 550ad8625e4f9..401acb1a491ce 100644
> --- a/mm/kmsan/Makefile
> +++ b/mm/kmsan/Makefile
> @@ -3,7 +3,7 @@
>  # Makefile for KernelMemorySanitizer (KMSAN).
>  #
>  #
> -obj-y := core.o instrumentation.o hooks.o report.o shadow.o
> +obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o
>
>  KMSAN_SANITIZE := n
>  KCOV_INSTRUMENT := n
> @@ -18,6 +18,7 @@ CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
>
>  CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
> diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
> new file mode 100644
> index 0000000000000..abbf595a1e359
> --- /dev/null
> +++ b/mm/kmsan/init.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN initialization routines.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include "kmsan.h"
> +
> +#include <asm/sections.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +
> +#include "../internal.h"
> +
> +#define NUM_FUTURE_RANGES 128
> +struct start_end_pair {
> +       u64 start, end;
> +};
> +
> +static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
> +static int future_index __initdata;
> +
> +/*
> + * Record a range of memory for which the metadata pages will be created once
> + * the page allocator becomes available.
> + */
> +static void __init kmsan_record_future_shadow_range(void *start, void *end)
> +{
> +       u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
> +       bool merged = false;
> +       int i;
> +
> +       KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
> +       KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
> +       nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
> +       nend = ALIGN(nend, PAGE_SIZE);
> +
> +       /*
> +        * Scan the existing ranges to see if any of them overlaps with
> +        * [start, end). In that case, merge the two ranges instead of
> +        * creating a new one.
> +        * The number of ranges is less than 20, so there is no need to organize
> +        * them into a more intelligent data structure.
> +        */
> +       for (i = 0; i < future_index; i++) {
> +               cstart = start_end_pairs[i].start;
> +               cend = start_end_pairs[i].end;
> +               if ((cstart < nstart && cend < nstart) ||
> +                   (cstart > nend && cend > nend))
> +                       /* ranges are disjoint - do not merge */
> +                       continue;
> +               start_end_pairs[i].start = min(nstart, cstart);
> +               start_end_pairs[i].end = max(nend, cend);
> +               merged = true;
> +               break;
> +       }
> +       if (merged)
> +               return;
> +       start_end_pairs[future_index].start = nstart;
> +       start_end_pairs[future_index].end = nend;
> +       future_index++;
> +}
> +
> +/*
> + * Initialize the shadow for existing mappings during kernel initialization.
> + * These include kernel text/data sections, NODE_DATA and future ranges
> + * registered while creating other data (e.g. percpu).
> + *
> + * Allocations via memblock can only be done before slab is initialized.
> + */
> +void __init kmsan_init_shadow(void)
> +{
> +       const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
> +       phys_addr_t p_start, p_end;
> +       int nid;
> +       u64 i;
> +
> +       for_each_reserved_mem_range(i, &p_start, &p_end)
> +               kmsan_record_future_shadow_range(phys_to_virt(p_start),
> +                                                phys_to_virt(p_end));
> +       /* Allocate shadow for .data */
> +       kmsan_record_future_shadow_range(_sdata, _edata);
> +
> +       for_each_online_node(nid)
> +               kmsan_record_future_shadow_range(
> +                       NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
> +
> +       for (i = 0; i < future_index; i++)
> +               kmsan_init_alloc_meta_for_range(
> +                       (void *)start_end_pairs[i].start,
> +                       (void *)start_end_pairs[i].end);
> +}
> +
> +struct page_pair {

'struct shadow_origin_pages' for a more descriptive name?
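
I.e. (sketch):

        struct shadow_origin_pages {
                struct page *shadow, *origin;
        };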

> +       struct page *shadow, *origin;
> +};
> +static struct page_pair held_back[MAX_ORDER] __initdata;
> +
> +/*
> + * Eager metadata allocation. When the memblock allocator is freeing pages to
> + * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
> + * We store the pointers to the returned blocks of pages in held_back[] grouped
> + * by their order: when kmsan_memblock_free_pages() is called for the first
> + * time with a certain order, the block is reserved as a shadow block; the
> + * second time, as an origin block. On the third call the incoming block
> + * receives its shadow and origin ranges from the previously saved blocks,
> + * after which held_back[order] can be used again.
> + *
> + * At the very end there may be leftover blocks in held_back[]. They are
> + * collected later by kmsan_memblock_discard().
> + */
> +bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
> +{
> +       struct page *shadow, *origin;

Can this just be 'struct page_pair'?
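
E.g. (untested sketch of the same function using the struct directly):

        bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
        {
                struct page_pair *pair = &held_back[order];

                if (!pair->shadow) {
                        pair->shadow = page;
                        return false;
                }
                if (!pair->origin) {
                        pair->origin = page;
                        return false;
                }
                kmsan_setup_meta(page, pair->shadow, pair->origin, order);
                pair->shadow = NULL;
                pair->origin = NULL;
                return true;
        }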

> +       if (!held_back[order].shadow) {
> +               held_back[order].shadow = page;
> +               return false;
> +       }
> +       if (!held_back[order].origin) {
> +               held_back[order].origin = page;
> +               return false;
> +       }
> +       shadow = held_back[order].shadow;
> +       origin = held_back[order].origin;
> +       kmsan_setup_meta(page, shadow, origin, order);
> +
> +       held_back[order].shadow = NULL;
> +       held_back[order].origin = NULL;
> +       return true;
> +}
> +
> +#define MAX_BLOCKS 8
> +struct smallstack {
> +       struct page *items[MAX_BLOCKS];
> +       int index;
> +       int order;
> +};
> +
> +static struct smallstack collect = {
> +       .index = 0,
> +       .order = MAX_ORDER,
> +};
> +
> +static void smallstack_push(struct smallstack *stack, struct page *pages)
> +{
> +       KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
> +       stack->items[stack->index] = pages;
> +       stack->index++;
> +}
> +#undef MAX_BLOCKS
> +
> +static struct page *smallstack_pop(struct smallstack *stack)
> +{
> +       struct page *ret;
> +
> +       KMSAN_WARN_ON(stack->index == 0);
> +       stack->index--;
> +       ret = stack->items[stack->index];
> +       stack->items[stack->index] = NULL;
> +       return ret;
> +}
> +
> +static void do_collection(void)
> +{
> +       struct page *page, *shadow, *origin;
> +
> +       while (collect.index >= 3) {
> +               page = smallstack_pop(&collect);
> +               shadow = smallstack_pop(&collect);
> +               origin = smallstack_pop(&collect);
> +               kmsan_setup_meta(page, shadow, origin, collect.order);
> +               __free_pages_core(page, collect.order);
> +       }
> +}
> +
> +static void collect_split(void)
> +{
> +       struct smallstack tmp = {
> +               .order = collect.order - 1,
> +               .index = 0,
> +       };
> +       struct page *page;
> +
> +       if (!collect.order)
> +               return;
> +       while (collect.index) {
> +               page = smallstack_pop(&collect);
> +               smallstack_push(&tmp, &page[0]);
> +               smallstack_push(&tmp, &page[1 << tmp.order]);
> +       }
> +       __memcpy(&collect, &tmp, sizeof(tmp));
> +}
> +
> +/*
> + * Memblock is about to go away. Split the page blocks left over in held_back[]
> + * and return 1/3 of that memory to the system.
> + */
> +static void kmsan_memblock_discard(void)
> +{
> +       int i;
> +
> +       /*
> +        * For each order=N:
> +        *  - push held_back[N].shadow and .origin to @collect;
> +        *  - while there are >= 3 elements in @collect, do garbage collection:
> +        *    - pop 3 ranges from @collect;
> +        *    - use two of them as shadow and origin for the third one;
> +        *    - repeat;
> +        *  - split each remaining element from @collect into 2 ranges of
> +        *    order=N-1,
> +        *  - repeat.
> +        */
> +       collect.order = MAX_ORDER - 1;
> +       for (i = MAX_ORDER - 1; i >= 0; i--) {
> +               if (held_back[i].shadow)
> +                       smallstack_push(&collect, held_back[i].shadow);
> +               if (held_back[i].origin)
> +                       smallstack_push(&collect, held_back[i].origin);
> +               held_back[i].shadow = NULL;
> +               held_back[i].origin = NULL;
> +               do_collection();
> +               collect_split();
> +       }
> +}
> +
> +void __init kmsan_init_runtime(void)
> +{
> +       /* Assuming current is init_task */
> +       kmsan_internal_task_create(current);
> +       kmsan_memblock_discard();
> +       pr_info("Starting KernelMemorySanitizer\n");
> +       pr_info("ATTENTION: KMSAN is a debugging tool! Do not use it on production machines!\n");
> +       kmsan_enabled = true;
> +}
> diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
> index c7fb8666607e2..2f17912ef863f 100644
> --- a/mm/kmsan/kmsan.h
> +++ b/mm/kmsan/kmsan.h
> @@ -66,6 +66,7 @@ struct shadow_origin_ptr {
>  struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
>                                                      bool store);
>  void *kmsan_get_metadata(void *addr, bool is_origin);
> +void __init kmsan_init_alloc_meta_for_range(void *start, void *end);
>
>  enum kmsan_bug_reason {
>         REASON_ANY,
> @@ -188,5 +189,7 @@ bool kmsan_internal_is_module_addr(void *vaddr);
>  bool kmsan_internal_is_vmalloc_addr(void *addr);
>
>  struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
> +void kmsan_setup_meta(struct page *page, struct page *shadow,
> +                     struct page *origin, int order);
>
>  #endif /* __MM_KMSAN_KMSAN_H */
> diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
> index 416cb85487a1a..7b254c30d42cc 100644
> --- a/mm/kmsan/shadow.c
> +++ b/mm/kmsan/shadow.c
> @@ -259,3 +259,39 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
>         kfree(s_pages);
>         kfree(o_pages);
>  }
> +
> +/* Allocate metadata for pages allocated at boot time. */
> +void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
> +{
> +       struct page *shadow_p, *origin_p;
> +       void *shadow, *origin;
> +       struct page *page;
> +       u64 addr, size;
> +
> +       start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
> +       size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
> +       shadow = memblock_alloc(size, PAGE_SIZE);
> +       origin = memblock_alloc(size, PAGE_SIZE);
> +       for (addr = 0; addr < size; addr += PAGE_SIZE) {
> +               page = virt_to_page_or_null((char *)start + addr);
> +               shadow_p = virt_to_page_or_null((char *)shadow + addr);
> +               set_no_shadow_origin_page(shadow_p);
> +               shadow_page_for(page) = shadow_p;
> +               origin_p = virt_to_page_or_null((char *)origin + addr);
> +               set_no_shadow_origin_page(origin_p);
> +               origin_page_for(page) = origin_p;
> +       }
> +}
> +
> +void kmsan_setup_meta(struct page *page, struct page *shadow,
> +                     struct page *origin, int order)
> +{
> +       int i;
> +
> +       for (i = 0; i < (1 << order); i++) {

Noticed this in many places, but we can just make these "for (int i =.." now.
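
I.e.:

        for (int i = 0; i < (1 << order); i++) {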

> +               set_no_shadow_origin_page(&shadow[i]);
> +               set_no_shadow_origin_page(&origin[i]);
> +               shadow_page_for(&page[i]) = &shadow[i];
> +               origin_page_for(&page[i]) = &origin[i];
> +       }
> +}
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 785459251145e..e8d5a0b2a3264 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1731,6 +1731,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
>  {
>         if (early_page_uninitialised(pfn))
>                 return;
> +       if (!kmsan_memblock_free_pages(page, order))
> +               /* KMSAN will take care of these pages. */
> +               return;

Add {} because the then-statement is not right below the if.
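
I.e.:

        if (!kmsan_memblock_free_pages(page, order)) {
                /* KMSAN will take care of these pages. */
                return;
        }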

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 25/45] kmsan: add tests for KMSAN
  2022-07-01 14:22 ` [PATCH v4 25/45] kmsan: add tests for KMSAN Alexander Potapenko
@ 2022-07-12 14:16   ` Marco Elver
  2022-08-02 17:29     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:16 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> The testing module triggers KMSAN warnings in different cases and checks
> that the errors are properly reported, using console probes to capture
> the tool's output.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> ---
> v2:
>  -- add memcpy tests
>
> v4:
>  -- change sizeof(type) to sizeof(*ptr)
>  -- add test expectations for CONFIG_KMSAN_CHECK_PARAM_RETVAL
>
> Link: https://linux-review.googlesource.com/id/I49c3f59014cc37fd13541c80beb0b75a75244650
> ---
>  lib/Kconfig.kmsan     |  12 +
>  mm/kmsan/Makefile     |   4 +
>  mm/kmsan/kmsan_test.c | 552 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 568 insertions(+)
>  create mode 100644 mm/kmsan/kmsan_test.c
>
> diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> index 8f768d4034e3c..f56ed7f7c7090 100644
> --- a/lib/Kconfig.kmsan
> +++ b/lib/Kconfig.kmsan
> @@ -47,4 +47,16 @@ config KMSAN_CHECK_PARAM_RETVAL
>           may potentially report errors in corner cases when non-instrumented
>           functions call instrumented ones.
>
> +config KMSAN_KUNIT_TEST
> +       tristate "KMSAN integration test suite" if !KUNIT_ALL_TESTS
> +       default KUNIT_ALL_TESTS
> +       depends on TRACEPOINTS && KUNIT
> +       help
> +         Test suite for KMSAN, testing various error detection scenarios,
> +         and checking that reports are correctly output to console.
> +
> +         Say Y here if you want the test to be built into the kernel and run
> +         during boot; say M if you want the test to build as a module; say N
> +         if you are unsure.
> +
>  endif
> diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
> index 401acb1a491ce..98eab2856626f 100644
> --- a/mm/kmsan/Makefile
> +++ b/mm/kmsan/Makefile
> @@ -22,3 +22,7 @@ CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
>  CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +
> +obj-$(CONFIG_KMSAN_KUNIT_TEST) += kmsan_test.o
> +KMSAN_SANITIZE_kmsan_test.o := y
> +CFLAGS_kmsan_test.o += $(call cc-disable-warning, uninitialized)
> diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
> new file mode 100644
> index 0000000000000..1b8da71ae0d4f
> --- /dev/null
> +++ b/mm/kmsan/kmsan_test.c
> @@ -0,0 +1,552 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Test cases for KMSAN.
> + * For each test case checks the presence (or absence) of generated reports.
> + * Relies on 'console' tracepoint to capture reports as they appear in the
> + * kernel log.
> + *
> + * Copyright (C) 2021-2022, Google LLC.
> + * Author: Alexander Potapenko <glider@google.com>
> + *
> + */
> +
> +#include <kunit/test.h>
> +#include "kmsan.h"
> +
> +#include <linux/jiffies.h>
> +#include <linux/kernel.h>
> +#include <linux/kmsan.h>
> +#include <linux/mm.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/string.h>
> +#include <linux/tracepoint.h>
> +#include <trace/events/printk.h>
> +
> +static DEFINE_PER_CPU(int, per_cpu_var);
> +
> +/* Report as observed from console. */
> +static struct {
> +       spinlock_t lock;
> +       bool available;
> +       bool ignore; /* Stop console output collection. */
> +       char header[256];
> +} observed = {
> +       .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
> +};
> +
> +/* Probe for console output: obtains observed lines of interest. */
> +static void probe_console(void *ignore, const char *buf, size_t len)
> +{
> +       unsigned long flags;
> +
> +       if (observed.ignore)
> +               return;
> +       spin_lock_irqsave(&observed.lock, flags);
> +
> +       if (strnstr(buf, "BUG: KMSAN: ", len)) {
> +               /*
> +                * A KMSAN report related to the test.
> +                *
> +                * The provided @buf is not NUL-terminated; copy no more than
> +                * @len bytes and let strscpy() add the missing NUL-terminator.
> +                */
> +               strscpy(observed.header, buf,
> +                       min(len + 1, sizeof(observed.header)));
> +               WRITE_ONCE(observed.available, true);
> +               observed.ignore = true;
> +       }
> +       spin_unlock_irqrestore(&observed.lock, flags);
> +}
> +
> +/* Check if a report related to the test exists. */
> +static bool report_available(void)
> +{
> +       return READ_ONCE(observed.available);
> +}
> +
> +/* Information we expect in a report. */
> +struct expect_report {
> +       const char *error_type; /* Error type. */
> +       /*
> +        * Kernel symbol from the error header, or NULL if no report is
> +        * expected.
> +        */
> +       const char *symbol;
> +};
> +
> +/* Check observed report matches information in @r. */
> +static bool report_matches(const struct expect_report *r)
> +{
> +       typeof(observed.header) expected_header;
> +       unsigned long flags;
> +       bool ret = false;
> +       const char *end;
> +       char *cur;
> +
> +       /* Double-checked locking. */
> +       if (!report_available() || !r->symbol)
> +               return (!report_available() && !r->symbol);
> +
> +       /* Generate expected report contents. */
> +
> +       /* Title */
> +       cur = expected_header;
> +       end = &expected_header[sizeof(expected_header) - 1];
> +
> +       cur += scnprintf(cur, end - cur, "BUG: KMSAN: %s", r->error_type);
> +
> +       scnprintf(cur, end - cur, " in %s", r->symbol);
> +       /* The exact offset won't match, remove it; also strip module name. */
> +       cur = strchr(expected_header, '+');
> +       if (cur)
> +               *cur = '\0';
> +
> +       spin_lock_irqsave(&observed.lock, flags);
> +       if (!report_available())
> +               goto out; /* A new report is being captured. */
> +
> +       /* Finally match expected output to what we actually observed. */
> +       ret = strstr(observed.header, expected_header);
> +out:
> +       spin_unlock_irqrestore(&observed.lock, flags);
> +
> +       return ret;
> +}
> +
> +/* ===== Test cases ===== */
> +
> +/* Prevent replacing branch with select in LLVM. */
> +static noinline void check_true(char *arg)
> +{
> +       pr_info("%s is true\n", arg);
> +}
> +
> +static noinline void check_false(char *arg)
> +{
> +       pr_info("%s is false\n", arg);
> +}
> +
> +#define USE(x)                                                                 \
> +       do {                                                                   \
> +               if (x)                                                         \
> +                       check_true(#x);                                        \
> +               else                                                           \
> +                       check_false(#x);                                       \
> +       } while (0)
> +
> +#define EXPECTATION_ETYPE_FN(e, reason, fn)                                    \
> +       struct expect_report e = {                                             \
> +               .error_type = reason,                                          \
> +               .symbol = fn,                                                  \
> +       }
> +
> +#define EXPECTATION_NO_REPORT(e) EXPECTATION_ETYPE_FN(e, NULL, NULL)
> +#define EXPECTATION_UNINIT_VALUE_FN(e, fn)                                     \
> +       EXPECTATION_ETYPE_FN(e, "uninit-value", fn)
> +#define EXPECTATION_UNINIT_VALUE(e) EXPECTATION_UNINIT_VALUE_FN(e, __func__)
> +#define EXPECTATION_USE_AFTER_FREE(e)                                          \
> +       EXPECTATION_ETYPE_FN(e, "use-after-free", __func__)
> +
> +/* Test case: ensure that kmalloc() returns uninitialized memory. */
> +static void test_uninit_kmalloc(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE(expect);
> +       int *ptr;
> +
> +       kunit_info(test, "uninitialized kmalloc test (UMR report)\n");
> +       ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
> +       USE(*ptr);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that kmalloc'ed memory becomes initialized after memset().
> + */
> +static void test_init_kmalloc(struct kunit *test)
> +{
> +       EXPECTATION_NO_REPORT(expect);
> +       int *ptr;
> +
> +       kunit_info(test, "initialized kmalloc test (no reports)\n");
> +       ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
> +       memset(ptr, 0, sizeof(*ptr));
> +       USE(*ptr);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/* Test case: ensure that kzalloc() returns initialized memory. */
> +static void test_init_kzalloc(struct kunit *test)
> +{
> +       EXPECTATION_NO_REPORT(expect);
> +       int *ptr;
> +
> +       kunit_info(test, "initialized kzalloc test (no reports)\n");
> +       ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
> +       USE(*ptr);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/* Test case: ensure that local variables are uninitialized by default. */
> +static void test_uninit_stack_var(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE(expect);
> +       volatile int cond;
> +
> +       kunit_info(test, "uninitialized stack variable (UMR report)\n");
> +       USE(cond);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/* Test case: ensure that local variables with initializers are initialized. */
> +static void test_init_stack_var(struct kunit *test)
> +{
> +       EXPECTATION_NO_REPORT(expect);
> +       volatile int cond = 1;
> +
> +       kunit_info(test, "initialized stack variable (no reports)\n");
> +       USE(cond);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +static noinline void two_param_fn_2(int arg1, int arg2)
> +{
> +       USE(arg1);
> +       USE(arg2);
> +}
> +
> +static noinline void one_param_fn(int arg)
> +{
> +       two_param_fn_2(arg, arg);
> +       USE(arg);
> +}
> +
> +static noinline void two_param_fn(int arg1, int arg2)
> +{
> +       int init = 0;
> +
> +       one_param_fn(init);
> +       USE(arg1);
> +       USE(arg2);
> +}
> +
> +static void test_params(struct kunit *test)
> +{
> +#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL

if (IS_ENABLED(...))
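
E.g. (sketch; the macro just initializes .symbol, so a conditional
expression works and the #ifdef can go away):

        EXPECTATION_UNINIT_VALUE_FN(expect,
                IS_ENABLED(CONFIG_KMSAN_CHECK_PARAM_RETVAL) ?
                        /* eager checking reports before the call */
                        "test_params" :
                        "two_param_fn");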

> +       /*
> +        * With eager param/retval checking enabled, KMSAN will report an error
> +        * before the call to two_param_fn().
> +        */
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_params");
> +#else
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "two_param_fn");
> +#endif
> +       volatile int uninit, init = 1;
> +
> +       kunit_info(test,
> +                  "uninit passed through a function parameter (UMR report)\n");
> +       two_param_fn(uninit, init);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +static int signed_sum3(int a, int b, int c)
> +{
> +       return a + b + c;
> +}
> +
> +/*
> + * Test case: ensure that uninitialized values are tracked through function
> + * arguments.
> + */
> +static void test_uninit_multiple_params(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE(expect);
> +       volatile char b = 3, c;
> +       volatile int a;
> +
> +       kunit_info(test, "uninitialized local passed to fn (UMR report)\n");
> +       USE(signed_sum3(a, b, c));
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/* Helper function to make an array uninitialized. */
> +static noinline void do_uninit_local_array(char *array, int start, int stop)
> +{
> +       volatile char uninit;
> +       int i;
> +
> +       for (i = start; i < stop; i++)
> +               array[i] = uninit;
> +}
> +
> +/*
> + * Test case: ensure kmsan_check_memory() reports an error when checking
> + * uninitialized memory.
> + */
> +static void test_uninit_kmsan_check_memory(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_uninit_kmsan_check_memory");
> +       volatile char local_array[8];
> +
> +       kunit_info(
> +               test,
> +               "kmsan_check_memory() called on uninit local (UMR report)\n");
> +       do_uninit_local_array((char *)local_array, 5, 7);
> +
> +       kmsan_check_memory((char *)local_array, 8);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: check that a virtual memory range created with vmap() from
> + * initialized pages is still considered as initialized.
> + */
> +static void test_init_kmsan_vmap_vunmap(struct kunit *test)
> +{
> +       EXPECTATION_NO_REPORT(expect);
> +       const int npages = 2;
> +       struct page **pages;
> +       void *vbuf;
> +       int i;
> +
> +       kunit_info(test, "pages initialized via vmap (no reports)\n");
> +
> +       pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
> +       for (i = 0; i < npages; i++)
> +               pages[i] = alloc_page(GFP_KERNEL);
> +       vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
> +       memset(vbuf, 0xfe, npages * PAGE_SIZE);
> +       for (i = 0; i < npages; i++)
> +               kmsan_check_memory(page_address(pages[i]), PAGE_SIZE);
> +
> +       if (vbuf)
> +               vunmap(vbuf);
> +       for (i = 0; i < npages; i++)

add { }

> +               if (pages[i])
> +                       __free_page(pages[i]);
> +       kfree(pages);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that memset() can initialize a buffer allocated via
> + * vmalloc().
> + */
> +static void test_init_vmalloc(struct kunit *test)
> +{
> +       EXPECTATION_NO_REPORT(expect);
> +       int npages = 8, i;
> +       char *buf;
> +
> +       kunit_info(test, "vmalloc buffer can be initialized (no reports)\n");
> +       buf = vmalloc(PAGE_SIZE * npages);
> +       buf[0] = 1;
> +       memset(buf, 0xfe, PAGE_SIZE * npages);
> +       USE(buf[0]);
> +       for (i = 0; i < npages; i++)
> +               kmsan_check_memory(&buf[PAGE_SIZE * i], PAGE_SIZE);
> +       vfree(buf);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/* Test case: ensure that use-after-free reporting works. */
> +static void test_uaf(struct kunit *test)
> +{
> +       EXPECTATION_USE_AFTER_FREE(expect);
> +       volatile int value;
> +       volatile int *var;
> +
> +       kunit_info(test, "use-after-free in kmalloc-ed buffer (UMR report)\n");
> +       var = kmalloc(80, GFP_KERNEL);
> +       var[3] = 0xfeedface;
> +       kfree((int *)var);
> +       /* Copy the invalid value before checking it. */
> +       value = var[3];
> +       USE(value);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that uninitialized values are propagated through per-CPU
> + * memory.
> + */
> +static void test_percpu_propagate(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE(expect);
> +       volatile int uninit, check;
> +
> +       kunit_info(test,
> +                  "uninit local stored to per_cpu memory (UMR report)\n");
> +
> +       this_cpu_write(per_cpu_var, uninit);
> +       check = this_cpu_read(per_cpu_var);
> +       USE(check);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that passing uninitialized values to printk() leads to an
> + * error report.
> + */
> +static void test_printk(struct kunit *test)
> +{
> +#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL

if (IS_ENABLED(CONFIG_KMSAN_CHECK_PARAM_RETVAL))
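
Same pattern as in test_params, e.g. (sketch):

        EXPECTATION_UNINIT_VALUE_FN(expect,
                IS_ENABLED(CONFIG_KMSAN_CHECK_PARAM_RETVAL) ?
                        "test_printk" : "number");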

> +       /*
> +        * With eager param/retval checking enabled, KMSAN will report an error
> +        * before the call to pr_info().
> +        */
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_printk");
> +#else
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "number");
> +#endif
> +       volatile int uninit;
> +
> +       kunit_info(test, "uninit local passed to pr_info() (UMR report)\n");
> +       pr_info("%px contains %d\n", &uninit, uninit);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that memcpy() correctly copies uninitialized values between
> + * aligned `src` and `dst`.
> + */
> +static void test_memcpy_aligned_to_aligned(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_aligned");
> +       volatile int uninit_src;
> +       volatile int dst = 0;
> +
> +       kunit_info(test, "memcpy()ing aligned uninit src to aligned dst (UMR report)\n");
> +       memcpy((void *)&dst, (void *)&uninit_src, sizeof(uninit_src));
> +       kmsan_check_memory((void *)&dst, sizeof(dst));
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that memcpy() correctly copies uninitialized values between
> + * aligned `src` and unaligned `dst`.
> + *
> + * Copying aligned 4-byte value to an unaligned one leads to touching two
> + * aligned 4-byte values. This test case checks that KMSAN correctly reports an
> + * error on the first of the two values.
> + */
> +static void test_memcpy_aligned_to_unaligned(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned");
> +       volatile int uninit_src;
> +       volatile char dst[8] = {0};
> +
> +       kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst (UMR report)\n");
> +       memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
> +       kmsan_check_memory((void *)dst, 4);
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +/*
> + * Test case: ensure that memcpy() correctly copies uninitialized values between
> + * aligned `src` and unaligned `dst`.
> + *
> + * Copying aligned 4-byte value to an unaligned one leads to touching two
> + * aligned 4-byte values. This test case checks that KMSAN correctly reports an
> + * error on the second of the two values.
> + */
> +static void test_memcpy_aligned_to_unaligned2(struct kunit *test)
> +{
> +       EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned2");
> +       volatile int uninit_src;
> +       volatile char dst[8] = {0};
> +
> +       kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst - part 2 (UMR report)\n");
> +       memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
> +       kmsan_check_memory((void *)&dst[4], sizeof(uninit_src));
> +       KUNIT_EXPECT_TRUE(test, report_matches(&expect));
> +}
> +
> +static struct kunit_case kmsan_test_cases[] = {
> +       KUNIT_CASE(test_uninit_kmalloc),
> +       KUNIT_CASE(test_init_kmalloc),
> +       KUNIT_CASE(test_init_kzalloc),
> +       KUNIT_CASE(test_uninit_stack_var),
> +       KUNIT_CASE(test_init_stack_var),
> +       KUNIT_CASE(test_params),
> +       KUNIT_CASE(test_uninit_multiple_params),
> +       KUNIT_CASE(test_uninit_kmsan_check_memory),
> +       KUNIT_CASE(test_init_kmsan_vmap_vunmap),
> +       KUNIT_CASE(test_init_vmalloc),
> +       KUNIT_CASE(test_uaf),
> +       KUNIT_CASE(test_percpu_propagate),
> +       KUNIT_CASE(test_printk),
> +       KUNIT_CASE(test_memcpy_aligned_to_aligned),
> +       KUNIT_CASE(test_memcpy_aligned_to_unaligned),
> +       KUNIT_CASE(test_memcpy_aligned_to_unaligned2),
> +       {},
> +};
> +
> +/* ===== End test cases ===== */
> +
> +static int test_init(struct kunit *test)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&observed.lock, flags);
> +       observed.header[0] = '\0';
> +       observed.ignore = false;
> +       observed.available = false;
> +       spin_unlock_irqrestore(&observed.lock, flags);
> +
> +       return 0;
> +}
> +
> +static void test_exit(struct kunit *test)
> +{
> +}
> +
> +static struct kunit_suite kmsan_test_suite = {
> +       .name = "kmsan",
> +       .test_cases = kmsan_test_cases,
> +       .init = test_init,
> +       .exit = test_exit,
> +};
> +static struct kunit_suite *kmsan_test_suites[] = { &kmsan_test_suite, NULL };
> +
> +static void register_tracepoints(struct tracepoint *tp, void *ignore)
> +{
> +       check_trace_callback_type_console(probe_console);
> +       if (!strcmp(tp->name, "console"))
> +               WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
> +}
> +
> +static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
> +{
> +       if (!strcmp(tp->name, "console"))
> +               tracepoint_probe_unregister(tp, probe_console, NULL);
> +}
> +
> +/*
> + * We only want to do tracepoint setup and teardown once; therefore we have to
> + * customize the init and exit functions and cannot rely on kunit_test_suite().
> + */

This is no longer true. See a recent version of
mm/kfence/kfence_test.c, which uses the new suite_init/suite_exit.
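
E.g. (untested sketch using the suite_init/suite_exit hooks, which lets
kunit_test_suites() replace the custom initcalls):

        static int kmsan_suite_init(struct kunit_suite *suite)
        {
                /*
                 * To be able to build the test as a module, iterate through
                 * all known tracepoints; static registration won't work here.
                 */
                for_each_kernel_tracepoint(register_tracepoints, NULL);
                return 0;
        }

        static void kmsan_suite_exit(struct kunit_suite *suite)
        {
                for_each_kernel_tracepoint(unregister_tracepoints, NULL);
                tracepoint_synchronize_unregister();
        }

        static struct kunit_suite kmsan_test_suite = {
                .name = "kmsan",
                .test_cases = kmsan_test_cases,
                .init = test_init,
                .exit = test_exit,
                .suite_init = kmsan_suite_init,
                .suite_exit = kmsan_suite_exit,
        };
        kunit_test_suites(&kmsan_test_suite);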

> +static int __init kmsan_test_init(void)
> +{
> +       /*
> +        * Because we want to be able to build the test as a module, we need to
> +        * iterate through all known tracepoints, since the static registration
> +        * won't work here.
> +        */
> +       for_each_kernel_tracepoint(register_tracepoints, NULL);
> +       return __kunit_test_suites_init(kmsan_test_suites);
> +}
> +
> +static void kmsan_test_exit(void)
> +{
> +       __kunit_test_suites_exit(kmsan_test_suites);
> +       for_each_kernel_tracepoint(unregister_tracepoints, NULL);
> +       tracepoint_synchronize_unregister();
> +}
> +
> +late_initcall_sync(kmsan_test_init);
> +module_exit(kmsan_test_exit);
> +
> +MODULE_LICENSE("GPL v2");

A recent version of checkpatch should complain about this, wanting
only "GPL" instead of "GPL v2".

> +MODULE_AUTHOR("Alexander Potapenko <glider@google.com>");

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t
  2022-07-01 14:22 ` [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t Alexander Potapenko
@ 2022-07-12 14:17   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Some users (currently only KMSAN) may want to use spare bits in
> depot_stack_handle_t. Let them do so by adding @extra_bits to
> __stack_depot_save() to store arbitrary flags, and providing
> stack_depot_get_extra_bits() to retrieve those flags.
>
> Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN
> does not intend to store additional information in the stack handle.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>


> ---
> v4:
>  -- per Marco Elver's request, fold "kasan: common: adapt to the new
>     prototype of __stack_depot_save()" into this patch to prevent
>     bisection breakages.
>
> Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
> ---
>  include/linux/stackdepot.h |  8 ++++++++
>  lib/stackdepot.c           | 29 ++++++++++++++++++++++++-----
>  mm/kasan/common.c          |  2 +-
>  3 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index bc2797955de90..9ca7798d7a318 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -14,9 +14,15 @@
>  #include <linux/gfp.h>
>
>  typedef u32 depot_stack_handle_t;
> +/*
> + * Number of bits in the handle that stack depot doesn't use. Users may store
> + * information in them.
> + */
> +#define STACK_DEPOT_EXTRA_BITS 5
>
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> +                                       unsigned int extra_bits,
>                                         gfp_t gfp_flags, bool can_alloc);
>
>  /*
> @@ -59,6 +65,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>                                unsigned long **entries);
>
> +unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
> +
>  int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
>                        int spaces);
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ca0d086ef4a3..3d1dbdd5a87f6 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -42,7 +42,8 @@
>  #define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
>                                         STACK_ALLOC_ALIGN)
>  #define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
> -               STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
> +               STACK_ALLOC_NULL_PROTECTION_BITS - \
> +               STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
>  #define STACK_ALLOC_SLABS_CAP 8192
>  #define STACK_ALLOC_MAX_SLABS \
>         (((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
> @@ -55,6 +56,7 @@ union handle_parts {
>                 u32 slabindex : STACK_ALLOC_INDEX_BITS;
>                 u32 offset : STACK_ALLOC_OFFSET_BITS;
>                 u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
> +               u32 extra : STACK_DEPOT_EXTRA_BITS;
>         };
>  };
>
> @@ -76,6 +78,14 @@ static int next_slab_inited;
>  static size_t depot_offset;
>  static DEFINE_RAW_SPINLOCK(depot_lock);
>
> +unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
> +{
> +       union handle_parts parts = { .handle = handle };
> +
> +       return parts.extra;
> +}
> +EXPORT_SYMBOL(stack_depot_get_extra_bits);
> +
>  static bool init_stack_slab(void **prealloc)
>  {
>         if (!*prealloc)
> @@ -139,6 +149,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         stack->handle.slabindex = depot_index;
>         stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>         stack->handle.valid = 1;
> +       stack->handle.extra = 0;
>         memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
>         depot_offset += required_size;
>
> @@ -343,6 +354,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   *
>   * @entries:           Pointer to storage array
>   * @nr_entries:                Size of the storage array
> + * @extra_bits:                Flags to store in unused bits of depot_stack_handle_t
>   * @alloc_flags:       Allocation gfp flags
>   * @can_alloc:         Allocate stack slabs (increased chance of failure if false)
>   *
> @@ -354,6 +366,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   * If the stack trace in @entries is from an interrupt, only the portion up to
>   * interrupt entry is saved.
>   *
> + * Additional opaque flags can be passed in @extra_bits, stored in the unused
> + * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
> + * without calling stack_depot_fetch().
> + *
>   * Context: Any context, but setting @can_alloc to %false is required if
>   *          alloc_pages() cannot be used from the current context. Currently
>   *          this is the case from contexts where neither %GFP_ATOMIC nor
> @@ -363,10 +379,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   */
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> +                                       unsigned int extra_bits,
>                                         gfp_t alloc_flags, bool can_alloc)
>  {
>         struct stack_record *found = NULL, **bucket;
> -       depot_stack_handle_t retval = 0;
> +       union handle_parts retval = { .handle = 0 };
>         struct page *page = NULL;
>         void *prealloc = NULL;
>         unsigned long flags;
> @@ -450,9 +467,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
>         }
>         if (found)
> -               retval = found->handle.handle;
> +               retval.handle = found->handle.handle;
>  fast_exit:
> -       return retval;
> +       retval.extra = extra_bits;
> +
> +       return retval.handle;
>  }
>  EXPORT_SYMBOL_GPL(__stack_depot_save);
>
> @@ -472,6 +491,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>                                       unsigned int nr_entries,
>                                       gfp_t alloc_flags)
>  {
> -       return __stack_depot_save(entries, nr_entries, alloc_flags, true);
> +       return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_save);
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index c40c0e7b3b5f1..ba4fceeec173c 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
>         unsigned int nr_entries;
>
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> -       return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> +       return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
>  }
>
>  void kasan_set_track(struct kasan_track *track, gfp_t flags)
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
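As a usage sketch of the new interface (MY_FLAG and both helpers below are
made up for illustration; __stack_depot_save(), stack_depot_get_extra_bits()
and STACK_DEPOT_EXTRA_BITS are what the patch provides):

  #include <linux/stackdepot.h>
  #include <linux/stacktrace.h>

  /* Hypothetical flag; must fit into the STACK_DEPOT_EXTRA_BITS (5) bits. */
  #define MY_FLAG 0x1

  static depot_stack_handle_t save_stack_with_flag(gfp_t flags)
  {
          unsigned long entries[64];
          unsigned int nr_entries;

          nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
          /* Stash MY_FLAG in the otherwise unused bits of the handle. */
          return __stack_depot_save(entries, nr_entries, MY_FLAG, flags, true);
  }

  static bool handle_has_flag(depot_stack_handle_t handle)
  {
          /* The flag comes back without fetching the stack trace itself. */
          return stack_depot_get_extra_bits(handle) & MY_FLAG;
  }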


* Re: [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user()
  2022-07-01 14:22 ` [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user() Alexander Potapenko
@ 2022-07-12 14:17   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Introduce instrument_copy_from_user_before() and
> instrument_copy_from_user_after() hooks to be invoked before and after
> the call to copy_from_user().
>
> KASAN and KCSAN will be only using instrument_copy_from_user_before(),
> but for KMSAN we'll need to insert code after copy_from_user().
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>


> ---
> v4:
>  -- fix _copy_from_user_key() in arch/s390/lib/uaccess.c (Reported-by:
>     kernel test robot <lkp@intel.com>)
>
> Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
> ---
>  arch/s390/lib/uaccess.c      |  3 ++-
>  include/linux/instrumented.h | 21 +++++++++++++++++++--
>  include/linux/uaccess.h      | 19 ++++++++++++++-----
>  lib/iov_iter.c               |  9 ++++++---
>  lib/usercopy.c               |  3 ++-
>  5 files changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c
> index d7b3b193d1088..58033dfcb6d45 100644
> --- a/arch/s390/lib/uaccess.c
> +++ b/arch/s390/lib/uaccess.c
> @@ -81,8 +81,9 @@ unsigned long _copy_from_user_key(void *to, const void __user *from,
>
>         might_fault();
>         if (!should_fail_usercopy()) {
> -               instrument_copy_from_user(to, from, n);
> +               instrument_copy_from_user_before(to, from, n);
>                 res = raw_copy_from_user_key(to, from, n, key);
> +               instrument_copy_from_user_after(to, from, n, res);
>         }
>         if (unlikely(res))
>                 memset(to + (n - res), 0, res);
> diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
> index 42faebbaa202a..ee8f7d17d34f5 100644
> --- a/include/linux/instrumented.h
> +++ b/include/linux/instrumented.h
> @@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
>  }
>
>  /**
> - * instrument_copy_from_user - instrument writes of copy_from_user
> + * instrument_copy_from_user_before - add instrumentation before copy_from_user
>   *
>   * Instrument writes to kernel memory, that are due to copy_from_user (and
>   * variants). The instrumentation should be inserted before the accesses.
> @@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
>   * @n number of bytes to copy
>   */
>  static __always_inline void
> -instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
> +instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
>  {
>         kasan_check_write(to, n);
>         kcsan_check_write(to, n);
>  }
>
> +/**
> + * instrument_copy_from_user_after - add instrumentation after copy_from_user
> + *
> + * Instrument writes to kernel memory, that are due to copy_from_user (and
> + * variants). The instrumentation should be inserted after the accesses.
> + *
> + * @to destination address
> + * @from source address
> + * @n number of bytes to copy
> + * @left number of bytes not copied (as returned by copy_from_user)
> + */
> +static __always_inline void
> +instrument_copy_from_user_after(const void *to, const void __user *from,
> +                               unsigned long n, unsigned long left)
> +{
> +}
> +
>  #endif /* _LINUX_INSTRUMENTED_H */
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 5a328cf02b75e..da16e96680cf1 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -58,20 +58,28 @@
>  static __always_inline __must_check unsigned long
>  __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
>  {
> -       instrument_copy_from_user(to, from, n);
> +       unsigned long res;
> +
> +       instrument_copy_from_user_before(to, from, n);
>         check_object_size(to, n, false);
> -       return raw_copy_from_user(to, from, n);
> +       res = raw_copy_from_user(to, from, n);
> +       instrument_copy_from_user_after(to, from, n, res);
> +       return res;
>  }
>
>  static __always_inline __must_check unsigned long
>  __copy_from_user(void *to, const void __user *from, unsigned long n)
>  {
> +       unsigned long res;
> +
>         might_fault();
> +       instrument_copy_from_user_before(to, from, n);
>         if (should_fail_usercopy())
>                 return n;
> -       instrument_copy_from_user(to, from, n);
>         check_object_size(to, n, false);
> -       return raw_copy_from_user(to, from, n);
> +       res = raw_copy_from_user(to, from, n);
> +       instrument_copy_from_user_after(to, from, n, res);
> +       return res;
>  }
>
>  /**
> @@ -115,8 +123,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
>         unsigned long res = n;
>         might_fault();
>         if (!should_fail_usercopy() && likely(access_ok(from, n))) {
> -               instrument_copy_from_user(to, from, n);
> +               instrument_copy_from_user_before(to, from, n);
>                 res = raw_copy_from_user(to, from, n);
> +               instrument_copy_from_user_after(to, from, n, res);
>         }
>         if (unlikely(res))
>                 memset(to + (n - res), 0, res);
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 0b64695ab632f..fe5d169314dbf 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -159,13 +159,16 @@ static int copyout(void __user *to, const void *from, size_t n)
>
>  static int copyin(void *to, const void __user *from, size_t n)
>  {
> +       size_t res = n;
> +
>         if (should_fail_usercopy())
>                 return n;
>         if (access_ok(from, n)) {
> -               instrument_copy_from_user(to, from, n);
> -               n = raw_copy_from_user(to, from, n);
> +               instrument_copy_from_user_before(to, from, n);
> +               res = raw_copy_from_user(to, from, n);
> +               instrument_copy_from_user_after(to, from, n, res);
>         }
> -       return n;
> +       return res;
>  }
>
>  static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
> diff --git a/lib/usercopy.c b/lib/usercopy.c
> index 7413dd300516e..1505a52f23a01 100644
> --- a/lib/usercopy.c
> +++ b/lib/usercopy.c
> @@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
>         unsigned long res = n;
>         might_fault();
>         if (!should_fail_usercopy() && likely(access_ok(from, n))) {
> -               instrument_copy_from_user(to, from, n);
> +               instrument_copy_from_user_before(to, from, n);
>                 res = raw_copy_from_user(to, from, n);
> +               instrument_copy_from_user_after(to, from, n, res);
>         }
>         if (unlikely(res))
>                 memset(to + (n - res), 0, res);
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
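For completeness, a sketch of the body this hook grows later in the series
(based on patch 18/45; kmsan_unpoison_memory() is the KMSAN runtime entry
point):

  static __always_inline void
  instrument_copy_from_user_after(const void *to, const void __user *from,
                                  unsigned long n, unsigned long left)
  {
          /*
           * Only the n - left bytes actually copied from userspace
           * become initialized in KMSAN's shadow.
           */
          kmsan_unpoison_memory(to, n - left);
  }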


* Re: [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h
  2022-07-01 14:22 ` [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h Alexander Potapenko
@ 2022-07-12 14:17   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> Notify memory tools about usercopy events in copy_to_user_page() and
> copy_from_user_page().
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>


> ---
> Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
> ---
>  include/asm-generic/cacheflush.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> index 4f07afacbc239..0f63eb325025f 100644
> --- a/include/asm-generic/cacheflush.h
> +++ b/include/asm-generic/cacheflush.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_GENERIC_CACHEFLUSH_H
>  #define _ASM_GENERIC_CACHEFLUSH_H
>
> +#include <linux/instrumented.h>
> +
>  struct mm_struct;
>  struct vm_area_struct;
>  struct page;
> @@ -105,6 +107,7 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
>  #ifndef copy_to_user_page
>  #define copy_to_user_page(vma, page, vaddr, dst, src, len)     \
>         do { \
> +               instrument_copy_to_user(dst, src, len); \
>                 memcpy(dst, src, len); \
>                 flush_icache_user_page(vma, page, vaddr, len); \
>         } while (0)
> @@ -112,7 +115,11 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
>
>  #ifndef copy_from_user_page
>  #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
> -       memcpy(dst, src, len)
> +       do { \
> +               instrument_copy_from_user_before(dst, src, len); \
> +               memcpy(dst, src, len); \
> +               instrument_copy_from_user_after(dst, src, len, 0); \
> +       } while (0)
>  #endif
>
>  #endif /* _ASM_GENERIC_CACHEFLUSH_H */
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
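A note on the constant 0 passed as @left above: memcpy() cannot leave bytes
uncopied, so the hook treats all len bytes as copied. For KMSAN the hooked
macro therefore boils down to roughly this (kmsan_unpoison_memory() is the
runtime call backing the hook later in the series):

  memcpy(dst, src, len);
  /* left == 0, so every one of the len bytes becomes initialized. */
  kmsan_unpoison_memory(dst, len);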


* Re: [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
  2022-07-01 14:22 ` [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks Alexander Potapenko
@ 2022-07-12 14:17   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> __no_sanitize_memory is a function attribute that instructs KMSAN to
> skip a function during instrumentation. This is needed to e.g. implement
> the noinstr functions.
>
> __no_kmsan_checks is a function attribute that makes KMSAN
> ignore the uninitialized values coming from the function's
> inputs, and initialize the function's outputs.
>
> Functions marked with this attribute can't be inlined into functions
> not marked with it, and vice versa. This behavior is overridden by
> __always_inline.
>
> __SANITIZE_MEMORY__ is a macro that's defined iff the file is
> instrumented with KMSAN. This is not the same as CONFIG_KMSAN, which is
> defined for every file.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>



> ---
> Link: https://linux-review.googlesource.com/id/I004ff0360c918d3cd8b18767ddd1381c6d3281be
> ---
>  include/linux/compiler-clang.h | 23 +++++++++++++++++++++++
>  include/linux/compiler-gcc.h   |  6 ++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index c84fec767445d..4fa0cc4cbd2c8 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -51,6 +51,29 @@
>  #define __no_sanitize_undefined
>  #endif
>
> +#if __has_feature(memory_sanitizer)
> +#define __SANITIZE_MEMORY__
> +/*
> + * Unlike other sanitizers, KMSAN still inserts code into functions marked with
> + * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation
> + * provides the behavior consistent with other __no_sanitize_ attributes,
> + * guaranteeing that __no_sanitize_memory functions remain uninstrumented.
> + */
> +#define __no_sanitize_memory __disable_sanitizer_instrumentation
> +
> +/*
> + * The __no_kmsan_checks attribute ensures that a function does not produce
> + * false positive reports by:
> + *  - initializing all local variables and memory stores in this function;
> + *  - skipping all shadow checks;
> + *  - passing initialized arguments to this function's callees.
> + */
> +#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
> +#else
> +#define __no_sanitize_memory
> +#define __no_kmsan_checks
> +#endif
> +
>  /*
>   * Support for __has_feature(coverage_sanitizer) was added in Clang 13 together
>   * with no_sanitize("coverage"). Prior versions of Clang support coverage
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index a0c55eeaeaf16..63eb90eddad77 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -125,6 +125,12 @@
>  #define __SANITIZE_ADDRESS__
>  #endif
>
> +/*
> + * GCC does not support KMSAN.
> + */
> +#define __no_sanitize_memory
> +#define __no_kmsan_checks
> +
>  /*
>   * Turn individual warnings and errors on and off locally, depending
>   * on version.
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
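To make the difference between the two attributes concrete, a usage sketch
(the struct and function names are invented; only the attributes come from
this patch):

  struct hw_desc {
          u8 raw[16];     /* filled in by a device, invisibly to KMSAN */
  };

  /*
   * Still instrumented, but shadow checks are skipped, inputs are treated
   * as initialized, and outputs are initialized:
   */
  static __no_kmsan_checks int hw_desc_type(const struct hw_desc *d)
  {
          return d->raw[0] & 0x0f;        /* no false positive here */
  }

  /* Not instrumented at all - what noinstr code needs: */
  static __no_sanitize_memory void low_level_entry(void)
  {
          /* KMSAN inserts no code and no runtime calls here. */
  }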


* Re: [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory
  2022-07-01 14:22 ` [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory Alexander Potapenko
@ 2022-07-12 14:17   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> noinstr functions should never be instrumented, so make KMSAN skip them
> by applying the __no_sanitize_memory attribute.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
> v2:
>  -- moved this patch earlier in the series per Mark Rutland's request
>
> Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
> ---
>  include/linux/compiler_types.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index d08dfcb0ac687..fb5777e5228e7 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -227,7 +227,8 @@ struct ftrace_likely_data {
>  /* Section for code which can't be instrumented at all */
>  #define noinstr                                                                \
>         noinline notrace __attribute((__section__(".noinstr.text")))    \
> -       __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
> +       __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
> +       __no_sanitize_memory
>
>  #endif /* __KERNEL__ */
>
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
  2022-07-01 14:22 ` [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu() Alexander Potapenko
@ 2022-07-13  9:28   ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-13  9:28 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, 1 Jul 2022 at 16:24, Alexander Potapenko <glider@google.com> wrote:
>
> This is a hack to reduce stackdepot pressure.

Will it cause false negatives or other issues? If not, I'd just call
it an optimization and not a hack.

> struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
> int value. The remaining 25 bits remain uninitialized and are never used,
> but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
> thus creating very long origin chains. This is technically correct, but
> consumes too much memory.
>
> Unpoisoning the whole structure will prevent creating such chains.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>

Acked-by: Marco Elver <elver@google.com>

> ---
> Link: https://linux-review.googlesource.com/id/I76abee411b8323acfdbc29bc3a60dca8cff2de77
> ---
>  mm/mmu_gather.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index a71924bd38c0d..add4244e5790d 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -1,6 +1,7 @@
>  #include <linux/gfp.h>
>  #include <linux/highmem.h>
>  #include <linux/kernel.h>
> +#include <linux/kmsan-checks.h>
>  #include <linux/mmdebug.h>
>  #include <linux/mm_types.h>
>  #include <linux/mm_inline.h>
> @@ -265,6 +266,15 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
>  static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
>                              bool fullmm)
>  {
> +       /*
> +        * struct mmu_gather contains 7 1-bit fields packed into a 32-bit
> +        * unsigned int value. The remaining 25 bits remain uninitialized
> +        * and are never used, but KMSAN updates the origin for them in
> +        * zap_pXX_range() in mm/memory.c, thus creating very long origin
> +        * chains. This is technically correct, but consumes too much memory.
> +        * Unpoisoning the whole structure will prevent creating such chains.
> +        */
> +       kmsan_unpoison_memory(tlb, sizeof(*tlb));
>         tlb->mm = mm;
>         tlb->fullmm = fullmm;
>
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
  2022-07-02  0:18   ` Hillf Danton
  2022-07-11 16:49   ` Marco Elver
@ 2022-07-13 10:04   ` Marco Elver
  2022-08-03 17:45     ` Alexander Potapenko
  2 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-13 10:04 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel

On Fri, Jul 01, 2022 at 04:22PM +0200, 'Alexander Potapenko' via kasan-dev wrote:
[...]
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2e24db4bff192..59819e6fa5865 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW
>  
>  source "lib/Kconfig.kasan"
>  source "lib/Kconfig.kfence"
> +source "lib/Kconfig.kmsan"
>  
>  endmenu # "Memory Debugging"
>  
> diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> new file mode 100644
> index 0000000000000..8f768d4034e3c
> --- /dev/null
> +++ b/lib/Kconfig.kmsan
> @@ -0,0 +1,50 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config HAVE_ARCH_KMSAN
> +	bool
> +
> +config HAVE_KMSAN_COMPILER
> +	# Clang versions <14.0.0 also support -fsanitize=kernel-memory, but not
> +	# all the features necessary to build the kernel with KMSAN.
> +	depends on CC_IS_CLANG && CLANG_VERSION >= 140000
> +	def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)
> +
> +config HAVE_KMSAN_PARAM_RETVAL
> +	# Separate check for -fsanitize-memory-param-retval support.

This comment doesn't add much value; maybe instead say "Supported only by
Clang >= 15."

> +	depends on CC_IS_CLANG && CLANG_VERSION >= 140000

Why not just "depends on HAVE_KMSAN_COMPILER"? (Every compiler that
supports -fsanitize-memory-param-retval is also a KMSAN compiler.)

> +	def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)
> +
> +

HAVE_KMSAN_PARAM_RETVAL should be moved under "if KMSAN" so that it isn't
unnecessarily evaluated in every kernel build (saving one shell-out to
clang in most builds).

> +config KMSAN
> +	bool "KMSAN: detector of uninitialized values use"
> +	depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> +	depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
> +	select STACKDEPOT
> +	select STACKDEPOT_ALWAYS_INIT
> +	help
> +	  KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
> +	  uninitialized values in the kernel. It is based on compiler
> +	  instrumentation provided by Clang and thus requires Clang to build.
> +
> +	  An important note is that KMSAN is not intended for production use,
> +	  because it drastically increases kernel memory footprint and slows
> +	  the whole system down.
> +
> +	  See <file:Documentation/dev-tools/kmsan.rst> for more details.
> +
> +if KMSAN
> +
> +config KMSAN_CHECK_PARAM_RETVAL
> +	bool "Check for uninitialized values passed to and returned from functions"
> +	default HAVE_KMSAN_PARAM_RETVAL

This can be enabled even if !HAVE_KMSAN_PARAM_RETVAL. Should this be:

	default y
	depends on HAVE_KMSAN_PARAM_RETVAL

instead?

> +	help
> +	  If the compiler supports -fsanitize-memory-param-retval, KMSAN will
> +	  eagerly check every function parameter passed by value and every
> +	  function return value.
> +
> +	  Disabling KMSAN_CHECK_PARAM_RETVAL will result in tracking shadow for
> +	  function parameters and return values across function borders. This
> +	  is a more relaxed mode, but it generates more instrumentation code and
> +	  may potentially report errors in corner cases when non-instrumented
> +	  functions call instrumented ones.
> +


* Re: [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN
  2022-07-01 14:22 ` [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN Alexander Potapenko
@ 2022-07-13 10:22   ` Marco Elver
  2022-08-02 17:47     ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Marco Elver @ 2022-07-13 10:22 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, linux-mm, linux-arch, linux-kernel,
	Eric Biggers

On Fri, Jul 01, 2022 at 04:22PM +0200, 'Alexander Potapenko' via kasan-dev wrote:
[...]
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -867,6 +867,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
>  		return false;
>  
>  	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> +	if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
> +		return false;
>  	if (*same_page)
>  		return true;

  	if (*same_page)
  		return true;
	else if (IS_ENABLED(CONFIG_KMSAN))
		return false;

>  	return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
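With that restructuring, the tail of page_is_mergeable() would read (a
sketch; the surrounding lines are unchanged):

  	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
  	if (*same_page)
  		return true;
  	else if (IS_ENABLED(CONFIG_KMSAN))
  		/* Metadata of distinct pages is not contiguous: don't merge. */
  		return false;
  	return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);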


* Re: [PATCH v4 06/45] kmsan: add ReST documentation
  2022-07-07 12:34   ` Marco Elver
@ 2022-07-15  7:42     ` Alexander Potapenko
  2022-07-15  8:52       ` Marco Elver
  0 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-15  7:42 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

> To be consistent with other tools, I think we have settled on "The
> Kernel <...> Sanitizer (K?SAN)", see
> Documentation/dev-tools/k[ac]san.rst. So this will be "The Kernel
> Memory Sanitizer (KMSAN)".

Done (will appear in v5).


> -> "The third stack trace ..."
> (Because it looks like there's also another stack trace in the middle
> and "lower" is ambiguous)

Done

>
> > +where this variable was created.
> > +
> > +The upper stack shows where the uninit value was used - in
>
> -> "The first stack trace shows where the uninit value was used (in
> ``test_uninit_kmsan_check_memory()``)."
Done

> > +KMSAN and Clang
> > +===============
>
> The KASAN documentation has a section on "Support" which lists
> architectures and compilers supported. I'd try to mirror (or improve
> on) that.

Renamed this section to "Support" and added a line about the supported
architectures (x86_64).

>
> > +In order for KMSAN to work the kernel must be built with Clang, which so far is
> > +the only compiler that has KMSAN support. The kernel instrumentation pass is
> > +based on the userspace `MemorySanitizer tool`_.
> > +
> > +How to build
> > +============
>
> I'd call it "Usage", like in the KASAN and KCSAN documentation.
Done

>
> > +In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
> > +Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
> > +
> > +Now configure and build the kernel with CONFIG_KMSAN enabled.
>
> I would move build/usage instructions right after introduction as
> that's most likely what users of KMSAN will want to know about first.

Done

> > +How KMSAN works
> > +===============
> > +
> > +KMSAN shadow memory
> > +-------------------
> > +
> > +KMSAN associates a metadata byte (also called shadow byte) with every byte of
> > +kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
> > +kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
> > +setting its shadow bytes to ``0xff``) is called poisoning, marking it
> > +initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
> > +
> > +When a new variable is allocated on the stack, it is poisoned by default by
> > +instrumentation code inserted by the compiler (unless it is a stack variable
> > +that is immediately initialized). Any new heap allocation done without
> > +``__GFP_ZERO`` is also poisoned.
> > +
> > +Compiler instrumentation also tracks the shadow values with the help from the
> > +runtime library in ``mm/kmsan/``.
>
> This sentence might still be confusing. I think it should highlight
> that runtime and compiler go together, but depending on the scope of
> the value, the compiler invokes the runtime to persist the shadow.

Changed to:
"""
Compiler instrumentation also tracks the shadow values as they are used along
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.
"""

> > +
> > +
>
> There are 2 blank lines here, which is inconsistent with the rest of
> the document.

Fixed

> > +Origin tracking
> > +---------------
> > +
> > +Every four bytes of kernel memory also have a so-called origin assigned to
>
> Is "assigned" or "mapped" more appropriate here?

I think initially this was more about origin values that exist in SSA form
as well as in memory, so not all of them were "mapped".
On the other hand, here we are talking about bytes in memory, so "mapped" is fine.

> > +them. This origin describes the point in program execution at which the
> > +uninitialized value was created. Every origin is associated with either the
> > +full allocation stack (for heap-allocated memory), or the function containing
> > +the uninitialized variable (for locals).
> > +
> > +When an uninitialized variable is allocated on stack or heap, a new origin
> > +value is created, and that variable's origin is filled with that value.
> > +When a value is read from memory, its origin is also read and kept together
> > +with the shadow. For every instruction that takes one or more values the origin
>
> s/values the origin/values, the origin/
Done, thanks!


> > +
> > +If ``a`` is initialized and ``b`` is not, the shadow of the result would be
> > +0xffff0000, and the origin of the result would be the origin of ``b``.
> > +``ret.s[0]`` would have the same origin, but it will be never used, because
>
> s/be never/never be/
Done
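For readers without the document at hand, a compact reconstruction of the
example under discussion (the exact types are assumed from context):

  union {
          u32 whole;
          u16 s[2];
  } ret;
  u16 a = 1;      /* shadow(a) = 0x0000, initialized */
  u16 b;          /* shadow(b) = 0xffff, uninitialized */

  ret.s[0] = a;   /* low half:  shadow 0x0000 */
  ret.s[1] = b;   /* high half: shadow 0xffff */
  /*
   * shadow(ret.whole) = 0xffff0000 and origin(ret.whole) = origin(b).
   * ret.s[0] shares that 4-byte origin slot, but since it is initialized
   * it can never cause a report.
   */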

> > +Passing uninitialized values to functions
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +KMSAN instrumentation pass has an option, ``-fsanitize-memory-param-retval``,
>
> "KMSAN instrumentation pass" -> "Clang's instrumentation support" ?
> Because it seems wrong to say that KMSAN has the instrumentation pass.
How about "Clang's MSan instrumentation pass"?

> > +
> > +Sometimes the pointers passed into inline assembly do not point to valid memory.
> > +In such cases they are ignored at runtime.
> > +
> > +Disabling the instrumentation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> It would be useful to move this section somewhere to the beginning,
> closer to usage and the example, as this is information that a user of
> KMSAN might want to know (but they might not want to know much about
> how KMSAN works).

I restructured the TOC as follows:

== The Kernel Memory Sanitizer (KMSAN)
== Usage
--- Building the kernel
--- Example report
--- Disabling the instrumentation
== Support
== How KMSAN works
--- KMSAN shadow memory
--- Origin tracking
~~~~ Origin chaining
--- Clang instrumentation API
~~~~ Shadow manipulation
~~~~ Handling locals
~~~~ Access to per-task data
~~~~ Passing uninitialized values to functions
~~~~ String functions
~~~~ Error reporting
~~~~ Inline assembly instrumentation
--- Runtime library
~~~~ Per-task KMSAN state
~~~~ KMSAN contexts
~~~~ Metadata allocation
== References


> > +Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
> > +Applying this attribute to a function will result in KMSAN not instrumenting it,
> > +which can be helpful if we do not want the compiler to mess up some low-level
>
> s/mess up/interfere with/
Done

> > +code (e.g. that marked with ``noinstr``).
>
> maybe "... (e.g. that marked with ``noinstr``, which implicitly adds
> ``__no_sanitize_memory``)."

Done

> otherwise people might think that it's necessary to add
> __no_sanitize_memory explicitly to noinstr.

Good point!

> > +    ...
> > +    struct kmsan_context kmsan;
> > +    ...
> > +  }
> > +
> > +
>
> 1 blank line instead of 2?
Done

> > +This means that in general for two contiguous memory pages their shadow/origin
> > +pages may not be contiguous. So, if a memory access crosses the boundary
>
> s/So, /Consequently, /
Done


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


* Re: [PATCH v4 06/45] kmsan: add ReST documentation
  2022-07-15  7:42     ` Alexander Potapenko
@ 2022-07-15  8:52       ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-07-15  8:52 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Fri, Jul 15, 2022 at 09:42AM +0200, Alexander Potapenko wrote:
[...]
> > This sentence might still be confusing. I think it should highlight
> > that runtime and compiler go together, but depending on the scope of
> > the value, the compiler invokes the runtime to persist the shadow.
> 
> Changed to:
> """
> Compiler instrumentation also tracks the shadow values as they are used along
> the code. When needed, instrumentation code invokes the runtime library in
> ``mm/kmsan/`` to persist shadow values.
> """

Ack.

[...]
> > > +Passing uninitialized values to functions
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +KMSAN instrumentation pass has an option, ``-fsanitize-memory-param-retval``,
> >
> > "KMSAN instrumentation pass" -> "Clang's instrumentation support" ?
> > Because it seems wrong to say that KMSAN has the instrumentation pass.
> How about "Clang's MSan instrumentation pass"?

Maybe just "Clang's MemorySanitizer instrumentation" - no abbreviation;
"pass" is very compiler-implementation-specific, and not everyone will know
what a "pass" means in this context, so I'd leave it out.

[...]
> > It would be useful to move this section somewhere to the beginning,
> > closer to usage and the example, as this is information that a user of
> > KMSAN might want to know (but they might not want to know much about
> > how KMSAN works).
> 
> I restructured the TOC as follows:
> 
> == The Kernel Memory Sanitizer (KMSAN)
> == Usage
> --- Building the kernel
> --- Example report
> --- Disabling the instrumentation
> == Support
> == How KMSAN works
> --- KMSAN shadow memory
> --- Origin tracking
> ~~~~ Origin chaining
> --- Clang instrumentation API
> ~~~~ Shadow manipulation
> ~~~~ Handling locals
> ~~~~ Access to per-task data
> ~~~~ Passing uninitialized values to functions
> ~~~~ String functions
> ~~~~ Error reporting
> ~~~~ Inline assembly instrumentation
> --- Runtime library
> ~~~~ Per-task KMSAN state
> ~~~~ KMSAN contexts
> ~~~~ Metadata allocation
> == References

LGTM.

Thanks,
-- Marco


* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-02  3:47   ` kernel test robot
@ 2022-07-15 14:03       ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-15 14:03 UTC (permalink / raw)
  To: kernel test robot
  Cc: kbuild-all, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Linux Memory Management List, Andrey Konovalov, Andy Lutomirski,
	Arnd Bergmann, Borislav Petkov, Christoph Hellwig,
	Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Dumazet,
	Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar,
	Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner

On Sat, Jul 2, 2022 at 5:47 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Alexander,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on masahiroy-kbuild/for-next]
> [also build test WARNING on linus/master v5.19-rc4 next-20220701]
> [cannot apply to tip/x86/core tip/x86/mm]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
> config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220702/202207021129.palrTLrL-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.4-39-gce1a6720-dirty
>         # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
>         git checkout 0ca0e4029535365a65588446ba55a952ca186079
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/kernel/ mm/
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
>
>
> sparse warnings: (new ones prefixed by >>)
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:360:9: sparse:     expected void const volatile [noderef] __user *ptr
>    arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
> >> arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:360:9: sparse:     expected void [noderef] __user *to
>    arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:420:9: sparse:     expected void const volatile [noderef] __user *ptr
>    arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:420:9: sparse:     expected void [noderef] __user *to
>    arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:953:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
>    arch/x86/kernel/signal.c:953:9: sparse:     expected struct lockdep_map const *lock
>    arch/x86/kernel/signal.c:953:9: sparse:     got struct lockdep_map [noderef] __rcu *

Looks like sparse is complaining about the missing __user attribute in the cast:
============================================
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 9c7265b524c73..437de52e2ecaa 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -357,7 +357,7 @@ __setup_frame(int sig, struct ksignal *ksig, sigset_t *set,
         * reasons and because gdb uses it as a signature to notice
         * signal handler stack frames.
         */
-       unsafe_put_user(*((u64 *)&retcode), (u64 *)frame->retcode, Efault);
+       unsafe_put_user(*((u64 *)&retcode), (__user u64 *)frame->retcode, Efault);
        user_access_end();

        /* Set up registers for signal handler */
============================================

The only reason it blames KMSAN patches is because those add yet
another hook inside unsafe_put_user() that expects a __user pointer.
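For reference, the hook in question is the usercopy instrumentation from
patch 04/45; in simplified form (the KMSAN body is filled in later in the
series, and the exact macro shape may differ):

  /* __put_user_size(), and thus unsafe_put_user(), now ends with roughly: */
  #define instrument_put_user(x, ptr, size)               \
          kmsan_copy_to_user(ptr, &(x), size, 0)
  /*
   * ptr is declared void __user *, hence the new sparse warning when a
   * plain (u64 *) cast strips the address space.
   */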



* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
@ 2022-07-15 14:03       ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-15 14:03 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7557 bytes --]

On Sat, Jul 2, 2022 at 5:47 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Alexander,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on masahiroy-kbuild/for-next]
> [also build test WARNING on linus/master v5.19-rc4 next-20220701]
> [cannot apply to tip/x86/core tip/x86/mm]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
> config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220702/202207021129.palrTLrL-lkp(a)intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.4-39-gce1a6720-dirty
>         # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
>         git checkout 0ca0e4029535365a65588446ba55a952ca186079
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/kernel/ mm/
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
>
>
> sparse warnings: (new ones prefixed by >>)
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:360:9: sparse:     expected void const volatile [noderef] __user *ptr
>    arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
> >> arch/x86/kernel/signal.c:360:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:360:9: sparse:     expected void [noderef] __user *to
>    arch/x86/kernel/signal.c:360:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:360:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const volatile [noderef] __user *ptr @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:420:9: sparse:     expected void const volatile [noderef] __user *ptr
>    arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void [noderef] __user *to @@     got unsigned long long [usertype] * @@
>    arch/x86/kernel/signal.c:420:9: sparse:     expected void [noderef] __user *to
>    arch/x86/kernel/signal.c:420:9: sparse:     got unsigned long long [usertype] *
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:420:9: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/kernel/signal.c:953:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct lockdep_map const *lock @@     got struct lockdep_map [noderef] __rcu * @@
>    arch/x86/kernel/signal.c:953:9: sparse:     expected struct lockdep_map const *lock
>    arch/x86/kernel/signal.c:953:9: sparse:     got struct lockdep_map [noderef] __rcu *

Looks like sparse is complaining about the missing __user attribute in the cast:
============================================
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 9c7265b524c73..437de52e2ecaa 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -357,7 +357,7 @@ __setup_frame(int sig, struct ksignal *ksig, sigset_t *set,
         * reasons and because gdb uses it as a signature to notice
         * signal handler stack frames.
         */
-       unsafe_put_user(*((u64 *)&retcode), (u64 *)frame->retcode, Efault);
+       unsafe_put_user(*((u64 *)&retcode), (__user u64 *)frame->retcode, Efault);
        user_access_end();

        /* Set up registers for signal handler */
============================================

The only reason sparse blames the KMSAN patches is that they add yet
another hook inside unsafe_put_user() that expects a __user pointer.

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-02 10:45   ` kernel test robot
@ 2022-07-15 16:44       ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-07-15 16:44 UTC (permalink / raw)
  To: kernel test robot
  Cc: llvm, kbuild-all, Alexander Viro, Alexei Starovoitov,
	Andrew Morton, Linux Memory Management List, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner

On Sat, Jul 2, 2022 at 12:46 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Alexander,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on masahiroy-kbuild/for-next]
> [also build test ERROR on linus/master v5.19-rc4 next-20220701]
> [cannot apply to tip/x86/core tip/x86/mm]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git for-next
> config: i386-randconfig-a011 (https://download.01.org/0day-ci/archive/20220702/202207021844.0J4s1Gjz-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a9119143a2d1f4d0d0bc1fe0d819e5351b4e0deb)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/0ca0e4029535365a65588446ba55a952ca186079
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220701-222712
>         git checkout 0ca0e4029535365a65588446ba55a952ca186079
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>

Uh oh, this is a very good catch.

> All errors (new ones prefixed by >>):
>
> >> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
>                    FPU_get_user(instruction_address.selector,
>                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    arch/x86/math-emu/fpu_system.h:127:36: note: expanded from macro 'FPU_get_user'
>    #define FPU_get_user(x,y) do { if (get_user((x),(y))) FPU_abort; } while (0)
>                                   ~~~~^~~~~~~~~~~~~~~~~~
>    arch/x86/include/asm/uaccess.h:131:43: note: expanded from macro 'get_user'
>    #define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
>                                              ^
>    arch/x86/include/asm/uaccess.h:103:43: note: expanded from macro 'do_get_user_call'
>            instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
>                                                     ^
>    include/linux/compiler.h:56:47: note: expanded from macro 'if'
>    #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
>                               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
>    include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
>    #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
>                                                       ^~~~
> >> arch/x86/math-emu/reg_ld_str.c:1043:3: error: address of bit-field requested
>                    FPU_get_user(instruction_address.selector,
>                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Actually, `x` in get_user(x, ptr) and put_user(x, ptr) is not
guaranteed to be addressable (e.g. in this case it is a bit-field).
We could copy the value to a local variable, as is done in
__put_user_size():

        __typeof__(*(ptr)) __pus_val = x;                               \
        __chk_user_ptr(ptr);                                            \
        instrument_copy_to_user(ptr, &(__pus_val), size);               \

but this won't help KASAN or KCSAN detect bugs that access x.
Perhaps a better solution would be to declare instrument_get_user()
and instrument_put_user(), which would unpoison/check the values for
KMSAN and do nothing for the other tools.
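
For illustration, here is a minimal sketch of what such helpers could
look like (the helper names come from the proposal above; the bodies
are my assumption built on the KMSAN hooks used elsewhere in this
series, not a final API):
============================================
#ifdef CONFIG_KMSAN
/*
 * Mark the value read from userspace as initialized. @to is an
 * integer-typed lvalue (e.g. the macro's intermediate register
 * variable): unpoisoning an addressable local copy and assigning it
 * back makes the later ordinary assignment into the final destination,
 * even a bit-field, propagate the initialized state.
 */
#define instrument_get_user(to)                                 \
({                                                              \
        u64 __tmp = (u64)(to);                                  \
        kmsan_unpoison_memory(&__tmp, sizeof(__tmp));           \
        (to) = __tmp;                                           \
})

/*
 * Check that the value about to be written to userspace is
 * initialized. @from is expected to be an addressable local copy
 * made by the calling macro.
 */
#define instrument_put_user(from, ptr, size)                    \
        kmsan_copy_to_user(ptr, &(from), sizeof(from), 0)
#else
#define instrument_get_user(to) do { } while (0)
#define instrument_put_user(from, ptr, size) do { } while (0)
#endif
============================================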


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 16/45] kmsan: handle task creation and exiting
  2022-07-12 13:17   ` Marco Elver
@ 2022-08-02 15:47     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 15:47 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 3:18 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:24, 'Alexander Potapenko' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
> >
> > Tell KMSAN that a new task is created, so the tool creates a backing
> > metadata structure for that task.
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > ---
> > v2:
> >  -- move implementation of kmsan_task_create() and kmsan_task_exit() here
> >
> > v4:
> >  -- change sizeof(type) to sizeof(*ptr)
> >
> > Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
> > ---
> >  include/linux/kmsan.h | 17 +++++++++++++++++
> >  kernel/exit.c         |  2 ++
> >  kernel/fork.c         |  2 ++
> >  mm/kmsan/core.c       | 10 ++++++++++
> >  mm/kmsan/hooks.c      | 19 +++++++++++++++++++
> >  mm/kmsan/kmsan.h      |  2 ++
> >  6 files changed, 52 insertions(+)
> >
> > diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> > index fd76cea338878..b71e2032222e9 100644
> > --- a/include/linux/kmsan.h
> > +++ b/include/linux/kmsan.h
> > @@ -16,6 +16,7 @@
> >
> >  struct page;
> >  struct kmem_cache;
> > +struct task_struct;
> >
> >  #ifdef CONFIG_KMSAN
> >
> > @@ -42,6 +43,14 @@ struct kmsan_ctx {
> >         bool allow_reporting;
> >  };
> >
> > +void kmsan_task_create(struct task_struct *task);
> > +
> > +/**
> > + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> > + * @task: task about to finish.
> > + */
> > +void kmsan_task_exit(struct task_struct *task);
> > +
> >  /**
> >   * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
> >   * @page:  struct page pointer returned by alloc_pages().
> > @@ -163,6 +172,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
> >
> >  #else
> >
> > +static inline void kmsan_task_create(struct task_struct *task)
> > +{
> > +}
> > +
> > +static inline void kmsan_task_exit(struct task_struct *task)
> > +{
> > +}
> > +
> >  static inline int kmsan_alloc_page(struct page *page, unsigned int order,
> >                                    gfp_t flags)
> >  {
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index f072959fcab7f..1784b7a741ddd 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -60,6 +60,7 @@
> >  #include <linux/writeback.h>
> >  #include <linux/shm.h>
> >  #include <linux/kcov.h>
> > +#include <linux/kmsan.h>
> >  #include <linux/random.h>
> >  #include <linux/rcuwait.h>
> >  #include <linux/compat.h>
> > @@ -741,6 +742,7 @@ void __noreturn do_exit(long code)
> >         WARN_ON(tsk->plug);
> >
> >         kcov_task_exit(tsk);
> > +       kmsan_task_exit(tsk);
> >
> >         coredump_task_exit(tsk);
> >         ptrace_event(PTRACE_EVENT_EXIT, code);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 9d44f2d46c696..6dfca6f00ec82 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/fdtable.h>
> >  #include <linux/iocontext.h>
> >  #include <linux/key.h>
> > +#include <linux/kmsan.h>
> >  #include <linux/binfmts.h>
> >  #include <linux/mman.h>
> >  #include <linux/mmu_notifier.h>
> > @@ -1026,6 +1027,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> >         tsk->worker_private = NULL;
> >
> >         kcov_task_init(tsk);
> > +       kmsan_task_create(tsk);
> >         kmap_local_fork(tsk);
> >
> >  #ifdef CONFIG_FAULT_INJECTION
> > diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
> > index 16fb8880a9c6d..7eabed03ed10b 100644
> > --- a/mm/kmsan/core.c
> > +++ b/mm/kmsan/core.c
> > @@ -44,6 +44,16 @@ bool kmsan_enabled __read_mostly;
> >   */
> >  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> >
> > +void kmsan_internal_task_create(struct task_struct *task)
> > +{
> > +       struct kmsan_ctx *ctx = &task->kmsan_ctx;
> > +       struct thread_info *info = current_thread_info();
> > +
> > +       __memset(ctx, 0, sizeof(*ctx));
> > +       ctx->allow_reporting = true;
> > +       kmsan_internal_unpoison_memory(info, sizeof(*info), false);
> > +}
> > +
> >  void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> >                                   unsigned int poison_flags)
> >  {
> > diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> > index 052e17b7a717d..43a529569053d 100644
> > --- a/mm/kmsan/hooks.c
> > +++ b/mm/kmsan/hooks.c
> > @@ -26,6 +26,25 @@
> >   * skipping effects of functions like memset() inside instrumented code.
> >   */
> >
> > +void kmsan_task_create(struct task_struct *task)
> > +{
> > +       kmsan_enter_runtime();
> > +       kmsan_internal_task_create(task);
> > +       kmsan_leave_runtime();
> > +}
> > +EXPORT_SYMBOL(kmsan_task_create);
> > +
> > +void kmsan_task_exit(struct task_struct *task)
> > +{
> > +       struct kmsan_ctx *ctx = &task->kmsan_ctx;
> > +
> > +       if (!kmsan_enabled || kmsan_in_runtime())
> > +               return;
> > +
> > +       ctx->allow_reporting = false;
> > +}
> > +EXPORT_SYMBOL(kmsan_task_exit);
>
> Why are these EXPORT_SYMBOL? Will they be used from some kernel module?

You're right, most of them will not be used from modules. Will fix.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code
  2022-07-12 13:13   ` Marco Elver
@ 2022-08-02 16:31     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 16:31 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 3:14 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, 'Alexander Potapenko' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
> >
> > In order to report uninitialized memory coming from heap allocations
> > KMSAN has to poison them unless they're created with __GFP_ZERO.
> >
> > It's handy that we need KMSAN hooks in the places where
> > init_on_alloc/init_on_free initialization is performed.
> >
> > In addition, we apply __no_kmsan_checks to get_freepointer_safe() to
> > suppress reports when accessing freelist pointers that reside in freed
> > objects.
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>
>
> But see comment below.
>

>
> Remove unnecessary whitespace change.
Will do, thanks for catching!


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN
  2022-07-12 12:06   ` Marco Elver
@ 2022-08-02 16:39     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 16:39 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

> >
> > +KMSAN
> > +M:     Alexander Potapenko <glider@google.com>
> > +R:     Marco Elver <elver@google.com>
> > +R:     Dmitry Vyukov <dvyukov@google.com>
> > +L:     kasan-dev@googlegroups.com
> > +S:     Maintained
> > +F:     Documentation/dev-tools/kmsan.rst
> > +F:     include/linux/kmsan*.h
> > +F:     lib/Kconfig.kmsan
> > +F:     mm/kmsan/
> > +F:     scripts/Makefile.kmsan
> > +
>
> It's missing:
>
>   arch/*/include/asm/kmsan.h

Done

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 25/45] kmsan: add tests for KMSAN
  2022-07-12 14:16   ` Marco Elver
@ 2022-08-02 17:29     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 17:29 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 4:17 PM Marco Elver <elver@google.com> wrote:
>

> > +static void test_params(struct kunit *test)
> > +{
> > +#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
>
> if (IS_ENABLED(...))
>
Not sure this is valid C, given that EXPECTATION_UNINIT_VALUE_FN
introduces a variable declaration.
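
To illustrate the scoping problem (DECLARE_EXPECTATION is a
hypothetical stand-in for a macro that, like
EXPECTATION_UNINIT_VALUE_FN(), expands to a variable declaration):
============================================
#define DECLARE_EXPECTATION(name) const char *name = "uninit-value"

static void example_test(struct kunit *test)
{
#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
        DECLARE_EXPECTATION(expect); /* in scope until the end of the function */
        (void)expect;
#endif
        /*
         * "if (IS_ENABLED(CONFIG_KMSAN_CHECK_PARAM_RETVAL)) { ... }" would
         * confine 'expect' to its braces, so checks later in the function
         * could not reference it; hence the #ifdef must stay.
         */
        (void)test;
}
============================================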

> > +       if (vbuf)
> > +               vunmap(vbuf);
> > +       for (i = 0; i < npages; i++)
>
> add { }
>
Done.


> if (IS_ENABLED(CONFIG_KMSAN_CHECK_PARAM_RETVAL))
>
Same as above.

> > +static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
> > +{
> > +       if (!strcmp(tp->name, "console"))
> > +               tracepoint_probe_unregister(tp, probe_console, NULL);
> > +}
> > +
> > +/*
> > + * We only want to do tracepoints setup and teardown once, therefore we have to
> > + * customize the init and exit functions and cannot rely on kunit_test_suite().
> > + */
>
> This is no longer true. See a recent version of
> mm/kfence/kfence_test.c which uses the new suite_init/exit.

Done.
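
A sketch of the suite_init/suite_exit form, following the kfence test
pattern referenced above (the kmsan_* names are my assumption, and
register_tracepoints() is assumed to be the registering counterpart of
the quoted unregister_tracepoints() callback):
============================================
static int kmsan_suite_init(struct kunit_suite *suite)
{
        /* Hook the "console" tracepoint once for the whole suite. */
        for_each_kernel_tracepoint(register_tracepoints, NULL);
        return 0;
}

static void kmsan_suite_exit(struct kunit_suite *suite)
{
        for_each_kernel_tracepoint(unregister_tracepoints, NULL);
        tracepoint_synchronize_unregister();
}

static struct kunit_suite kmsan_test_suite = {
        .name = "kmsan",
        .test_cases = kmsan_test_cases,
        .suite_init = kmsan_suite_init,
        .suite_exit = kmsan_suite_exit,
};
kunit_test_suites(&kmsan_test_suite);
============================================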

> > +late_initcall_sync(kmsan_test_init);
> > +module_exit(kmsan_test_exit);
> > +
> > +MODULE_LICENSE("GPL v2");
>
> A recent version of checkpatch should complain about this, wanting
> only "GPL" instead of "GPL v2".
>
Fixed.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN
  2022-07-13 10:22   ` Marco Elver
@ 2022-08-02 17:47     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 17:47 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML, Eric Biggers

On Wed, Jul 13, 2022 at 12:23 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, Jul 01, 2022 at 04:22PM +0200, 'Alexander Potapenko' via kasan-dev wrote:
> [...]
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -867,6 +867,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
> >               return false;
> >
> >       *same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> > +     if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
> > +             return false;
> >       if (*same_page)
> >               return true;
>
>         if (*same_page)
>                 return true;
>         else if (IS_ENABLED(CONFIG_KMSAN))
>                 return false;
>
Done.
> >       return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
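
For reference, the resulting check in page_is_mergeable() would read
(a sketch combining the quoted context with the suggestion above, not
necessarily the final patch):
============================================
        *same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
        if (*same_page)
                return true;
        else if (IS_ENABLED(CONFIG_KMSAN))
                /* KMSAN cannot treat separately allocated pages as contiguous. */
                return false;
        return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
============================================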



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines
  2022-07-12 14:05   ` Marco Elver
@ 2022-08-02 20:07     ` Alexander Potapenko
  2022-08-03  9:08       ` Marco Elver
  0 siblings, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-02 20:07 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 4:05 PM Marco Elver <elver@google.com> wrote:
>

> > +/**
> > + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> > + * @task: task about to finish.
> > + */
> > +void kmsan_task_exit(struct task_struct *task);
>
> Something went wrong with patch shuffling here I think,
> kmsan_task_create + kmsan_task_exit decls are duplicated by this
> patch.
Right, I've messed it up. Will fix.

> > +
> > +struct page_pair {
>
> 'struct shadow_origin_pages' for a more descriptive name?
How about "metadata_page_pair"?

> > + * At the very end there may be leftover blocks in held_back[]. They are
> > + * collected later by kmsan_memblock_discard().
> > + */
> > +bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
> > +{
> > +       struct page *shadow, *origin;
>
> Can this just be 'struct page_pair'?

Not sure this is worth it. We'd save one line by assigning this
struct to held_back[order], but the call to kmsan_setup_meta() would
become more verbose (and passing a whole struct page_pair to
kmsan_setup_meta() looks excessive).
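
A sketch of the renamed structure (assuming the fields keep the
shadow/origin naming used throughout the runtime, and one held-back
slot per allocation order):
============================================
struct metadata_page_pair {
        struct page *shadow;
        struct page *origin;
};

/* Leftover blocks from memblock freeing, collected later by
 * kmsan_memblock_discard(). */
static struct metadata_page_pair held_back[MAX_ORDER] __initdata;
============================================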


> > +                     struct page *origin, int order)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < (1 << order); i++) {
>
> Noticed this in many places, but we can just make these "for (int i =.." now.
Fixed here and all over the runtime.

> > @@ -1731,6 +1731,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
> >  {
> >         if (early_page_uninitialised(pfn))
> >                 return;
> > +       if (!kmsan_memblock_free_pages(page, order))
> > +               /* KMSAN will take care of these pages. */
> > +               return;
>
> Add {} because the then-statement is not right below the if.

Done.

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines
  2022-08-02 20:07     ` Alexander Potapenko
@ 2022-08-03  9:08       ` Marco Elver
  0 siblings, 0 replies; 147+ messages in thread
From: Marco Elver @ 2022-08-03  9:08 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, 2 Aug 2022 at 22:08, Alexander Potapenko <glider@google.com> wrote:
>
> On Tue, Jul 12, 2022 at 4:05 PM Marco Elver <elver@google.com> wrote:
> >
>
> > > +/**
> > > + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> > > + * @task: task about to finish.
> > > + */
> > > +void kmsan_task_exit(struct task_struct *task);
> >
> > Something went wrong with patch shuffling here I think,
> > kmsan_task_create + kmsan_task_exit decls are duplicated by this
> > patch.
> Right, I've messed it up. Will fix.
>
> > > +
> > > +struct page_pair {
> >
> > 'struct shadow_origin_pages' for a more descriptive name?
> How about "metadata_page_pair"?

Sure - this is local anyway, but page_pair was too generic.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2022-07-11 16:26   ` Marco Elver
@ 2022-08-03  9:41     ` Alexander Potapenko
  2022-08-03  9:44     ` Alexander Potapenko
  1 sibling, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03  9:41 UTC (permalink / raw)
  To: Marco Elver, Dan Williams
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

(+ Dan Williams)

On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > 64 bytes anymore.
>
> Does this somehow cause extra space being used in all kernel configs?
> If not, it would be good to note this in the commit message.

I actually couldn't verify this on QEMU, because the driver never got loaded.
It looks like this increases the amount of memory used by the nvdimm
driver in all kernel configs that enable it (including those that
don't use KMSAN), but I am not sure by how much.

Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?

>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2022-07-11 16:26   ` Marco Elver
  2022-08-03  9:41     ` Alexander Potapenko
@ 2022-08-03  9:44     ` Alexander Potapenko
  2023-01-05 22:08       ` Dan Williams
  1 sibling, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03  9:44 UTC (permalink / raw)
  To: Marco Elver, Dan Williams
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

(+ Dan Williams)
(resending with patch context included)

On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > 64 bytes anymore.
>
> Does this somehow cause extra space being used in all kernel configs?
> If not, it would be good to note this in the commit message.
>
I actually couldn't verify this on QEMU, because the driver never got loaded.
It looks like this increases the amount of memory used by the nvdimm
driver in all kernel configs that enable it (including those that
don't use KMSAN), but I am not sure by how much.

Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
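
For a rough sense of the scale (my back-of-the-envelope numbers, not
from this thread): with struct pages kept on the device, the reserved
space is roughly namespace_size / PAGE_SIZE * MAX_STRUCT_PAGE_SIZE,
so going from 64 to 128 bytes doubles it:
============================================
1 TiB namespace / 4 KiB pages = 2^28 struct pages
2^28 * 64 B  = 16 GiB reserved (~1.6% of the namespace)
2^28 * 128 B = 32 GiB reserved (~3.1% of the namespace)
============================================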

>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>
>
> > ---
> > Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
> > ---
> >  drivers/nvdimm/nd.h       | 2 +-
> >  drivers/nvdimm/pfn_devs.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > index ec5219680092d..85ca5b4da3cf3 100644
> > --- a/drivers/nvdimm/nd.h
> > +++ b/drivers/nvdimm/nd.h
> > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> >                 struct nd_namespace_common *ndns);
> >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> >  /* max struct page size independent of kernel config */
> > -#define MAX_STRUCT_PAGE_SIZE 64
> > +#define MAX_STRUCT_PAGE_SIZE 128
> >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> >  #else
> >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > index 0e92ab4b32833..61af072ac98f9 100644
> > --- a/drivers/nvdimm/pfn_devs.c
> > +++ b/drivers/nvdimm/pfn_devs.c
> > @@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> >                  * when populating the vmemmap. This *should* be equal to
> >                  * PMD_SIZE for most architectures.
> >                  *
> > -                * Also make sure size of struct page is less than 64. We
> > +                * Also make sure size of struct page is less than 128. We
> >                  * want to make sure we use large enough size here so that
> >                  * we don't have a dynamic reserve space depending on
> >                  * struct page size. But we also want to make sure we notice
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations
  2022-07-12 12:20   ` Marco Elver
@ 2022-08-03 10:30     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 10:30 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 2:21 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > Insert KMSAN hooks that make the necessary bookkeeping changes:
> >  - poison page shadow and origins in alloc_pages()/free_page();
> >  - clear page shadow and origins in clear_page(), copy_user_highpage();
> >  - copy page metadata in copy_highpage(), wp_page_copy();
> >  - handle vmap()/vunmap()/iounmap();
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > ---
> > v2:
> >  -- move page metadata hooks implementation here
> >  -- remove call to kmsan_memblock_free_pages()
> >
> > v3:
> >  -- use PAGE_SHIFT in kmsan_ioremap_page_range()
> >
> > v4:
> >  -- change sizeof(type) to sizeof(*ptr)
> >  -- replace occurrences of |var| with @var
> >  -- swap mm: and kmsan: in the subject
> >  -- drop __no_sanitize_memory from clear_page()
> >
> > Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
> > ---
> >  arch/x86/include/asm/page_64.h |  12 ++++
> >  arch/x86/mm/ioremap.c          |   3 +
> >  include/linux/highmem.h        |   3 +
> >  include/linux/kmsan.h          | 123 +++++++++++++++++++++++++++++++++
> >  mm/internal.h                  |   6 ++
> >  mm/kmsan/hooks.c               |  87 +++++++++++++++++++++++
> >  mm/kmsan/shadow.c              | 114 ++++++++++++++++++++++++++++++
> >  mm/memory.c                    |   2 +
> >  mm/page_alloc.c                |  11 +++
> >  mm/vmalloc.c                   |  20 +++++-
> >  10 files changed, 379 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> > index baa70451b8df5..227dd33eb4efb 100644
> > --- a/arch/x86/include/asm/page_64.h
> > +++ b/arch/x86/include/asm/page_64.h
> > @@ -45,14 +45,26 @@ void clear_page_orig(void *page);
> >  void clear_page_rep(void *page);
> >  void clear_page_erms(void *page);
> >
> > +/* This is an assembly header, avoid including too much of kmsan.h */
>
> All of this code is under an "#ifndef __ASSEMBLY__" guard, does it matter?
Actually, the comment is a bit outdated. kmsan-checks.h doesn't
introduce any unnecessary declarations and can be used here.

> > +#ifdef CONFIG_KMSAN
> > +void kmsan_unpoison_memory(const void *addr, size_t size);
> > +#endif
> >  static inline void clear_page(void *page)
> >  {
> > +#ifdef CONFIG_KMSAN
> > +       /* alternative_call_2() changes @page. */
> > +       void *page_copy = page;
> > +#endif
> >         alternative_call_2(clear_page_orig,
> >                            clear_page_rep, X86_FEATURE_REP_GOOD,
> >                            clear_page_erms, X86_FEATURE_ERMS,
> >                            "=D" (page),
> >                            "0" (page)
> >                            : "cc", "memory", "rax", "rcx");
> > +#ifdef CONFIG_KMSAN
> > +       /* Clear KMSAN shadow for the pages that have it. */
> > +       kmsan_unpoison_memory(page_copy, PAGE_SIZE);
>
> What happens if this is called before the alternative-call? Could this
> (in the interest of simplicity) be moved above it? And if you used the
> kmsan-checks.h header, it also doesn't need any "ifdef CONFIG_KMSAN"
> anymore.

Good idea, that'll work.

> > +#endif
> >  }
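
A sketch of the simplified function (assuming <linux/kmsan-checks.h>
provides a no-op kmsan_unpoison_memory() stub when CONFIG_KMSAN is
off, so the #ifdefs can go away entirely):
============================================
#include <linux/kmsan-checks.h>

static inline void clear_page(void *page)
{
        /*
         * Unpoisoning before the call also sidesteps the problem that
         * alternative_call_2() clobbers @page.
         */
        kmsan_unpoison_memory(page, PAGE_SIZE);
        alternative_call_2(clear_page_orig,
                           clear_page_rep, X86_FEATURE_REP_GOOD,
                           clear_page_erms, X86_FEATURE_ERMS,
                           "=D" (page),
                           "0" (page)
                           : "cc", "memory", "rax", "rcx");
}
============================================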



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code
  2022-07-12 13:43   ` Marco Elver
@ 2022-08-03 10:52     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 10:52 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 3:44 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:24, 'Alexander Potapenko' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
> [...]
> > ---
> >  arch/x86/boot/Makefile            | 1 +
> >  arch/x86/boot/compressed/Makefile | 1 +
> >  arch/x86/entry/vdso/Makefile      | 3 +++
> >  arch/x86/kernel/Makefile          | 2 ++
> >  arch/x86/kernel/cpu/Makefile      | 1 +
> >  arch/x86/mm/Makefile              | 2 ++
> >  arch/x86/realmode/rm/Makefile     | 1 +
> >  lib/Makefile                      | 2 ++
> [...]
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -272,6 +272,8 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
> >  CFLAGS_stackdepot.o += -fno-builtin
> >  obj-$(CONFIG_STACKDEPOT) += stackdepot.o
> >  KASAN_SANITIZE_stackdepot.o := n
> > +# In particular, instrumenting stackdepot.c with KMSAN will result in infinite
> > +# recursion.
> >  KMSAN_SANITIZE_stackdepot.o := n
> >  KCOV_INSTRUMENT_stackdepot.o := n
>
> This is generic code and not x86, should it have been in the earlier patch?
Ack.


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 18/45] instrumented.h: add KMSAN support
  2022-07-12 13:51   ` Marco Elver
@ 2022-08-03 11:17     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 11:17 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jul 12, 2022 at 3:52 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:24, Alexander Potapenko <glider@google.com> wrote:
> >
> > To avoid false positives, KMSAN needs to unpoison the data copied from
> > the userspace. To detect infoleaks - check the memory buffer passed to
> > copy_to_user().
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>
>
> With the code simplification below.
>
> [...]
> > --- a/mm/kmsan/hooks.c
> > +++ b/mm/kmsan/hooks.c
> > @@ -212,6 +212,44 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
> >  }
> >  EXPORT_SYMBOL(kmsan_iounmap_page_range);
> >
> > +void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
> > +                       size_t left)
> > +{
> > +       unsigned long ua_flags;
> > +
> > +       if (!kmsan_enabled || kmsan_in_runtime())
> > +               return;
> > +       /*
> > +        * At this point we've copied the memory already. It's hard to check it
> > +        * before copying, as the size of actually copied buffer is unknown.
> > +        */
> > +
> > +       /* copy_to_user() may copy zero bytes. No need to check. */
> > +       if (!to_copy)
> > +               return;
> > +       /* Or maybe copy_to_user() failed to copy anything. */
> > +       if (to_copy <= left)
> > +               return;
> > +
> > +       ua_flags = user_access_save();
> > +       if ((u64)to < TASK_SIZE) {
> > +               /* This is a user memory access, check it. */
> > +               kmsan_internal_check_memory((void *)from, to_copy - left, to,
> > +                                           REASON_COPY_TO_USER);
>
> This could just do "} else {" and the stuff below, and would result in
> simpler code with no explicit "return" and no duplicated
> user_access_restore().

Sounds good, will do.
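
The simplified tail of kmsan_copy_to_user() would then look roughly
like this (the kernel-address branch is elided from the quote above;
its body here, copying the metadata instead of checking it, is my
assumption based on the runtime's kmsan_internal_memmove_metadata()
helper):
============================================
        ua_flags = user_access_save();
        if ((u64)to < TASK_SIZE) {
                /* This is a user memory access, check it. */
                kmsan_internal_check_memory((void *)from, to_copy - left, to,
                                            REASON_COPY_TO_USER);
        } else {
                /* A kernel memory access: don't check anything, just
                 * copy the metadata of the copied bytes. */
                kmsan_internal_memmove_metadata((void *)to, (void *)from,
                                                to_copy - left);
        }
        user_access_restore(ua_flags);
============================================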


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-02  0:18   ` Hillf Danton
@ 2022-08-03 17:25     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 17:25 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linux Memory Management List, LKML

On Sat, Jul 2, 2022 at 2:18 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Fri,  1 Jul 2022 16:22:36 +0200 Alexander Potapenko wrote:
> > +
> > +bool kmsan_internal_is_module_addr(void *vaddr)
> > +{
> > +     return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
> > +}
> > +
> > +bool kmsan_internal_is_vmalloc_addr(void *addr)
> > +{
> > +     return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
> > +}
>
> Given is_vmalloc_addr(), feel free to add a one-line comment showing the
> reason for adding the kmsan internal version.

Ok, will do. I'm also going to move these two to mm/kmsan/kmsan.h, so
that they can be inlined.
Keeping internal versions allows us to avoid having these two
functions instrumented by KMSAN, which gains us some performance and
(more importantly) prevents potential recursion.

In fact, right now the original is_vmalloc_addr() doesn't contain
instrumented memory accesses and is thus safe to call from the KMSAN
runtime without causing recursion.
I am not sure, though, whether we can rely on that remaining true, so
I added the internal version alongside that of is_module_address(),
which actually does have the problem.
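
After the move, the helpers in mm/kmsan/kmsan.h could look like this
(a sketch; the comment wording is mine):
============================================
/*
 * Dedicated internal versions of is_vmalloc_addr()/is_module_address():
 * the generic helpers may be instrumented, and calling instrumented
 * code from the KMSAN runtime could recurse.
 */
static inline bool kmsan_internal_is_module_addr(void *vaddr)
{
        return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
}

static inline bool kmsan_internal_is_vmalloc_addr(void *addr)
{
        return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
}
============================================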

> Hillf




--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-13 10:04   ` Marco Elver
@ 2022-08-03 17:45     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 17:45 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Wed, Jul 13, 2022 at 12:04 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, Jul 01, 2022 at 04:22PM +0200, 'Alexander Potapenko' via kasan-dev wrote:
> [...]
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 2e24db4bff192..59819e6fa5865 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW
> >
> >  source "lib/Kconfig.kasan"
> >  source "lib/Kconfig.kfence"
> > +source "lib/Kconfig.kmsan"
> >
> >  endmenu # "Memory Debugging"
> >
> > diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> > new file mode 100644
> > index 0000000000000..8f768d4034e3c
> > --- /dev/null
> > +++ b/lib/Kconfig.kmsan
> > @@ -0,0 +1,50 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config HAVE_ARCH_KMSAN
> > +     bool
> > +
> > +config HAVE_KMSAN_COMPILER
> > +     # Clang versions <14.0.0 also support -fsanitize=kernel-memory, but not
> > +     # all the features necessary to build the kernel with KMSAN.
> > +     depends on CC_IS_CLANG && CLANG_VERSION >= 140000
> > +     def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)
> > +
> > +config HAVE_KMSAN_PARAM_RETVAL
> > +     # Separate check for -fsanitize-memory-param-retval support.
>
> This comment doesn't add much value, maybe instead say that "Supported
> only by Clang >= 15."
Fixed.

> > +     depends on CC_IS_CLANG && CLANG_VERSION >= 140000
>
> Why not just "depends on HAVE_KMSAN_COMPILER"? (All
> fsanitize-memory-param-retval supporting compilers must also be KMSAN
> compilers.)
Good idea, will do.

> > +     def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)
> > +
> > +
>
> HAVE_KMSAN_PARAM_RETVAL should be moved under "if KMSAN" so that this
> isn't unnecessarily evaluated in every kernel build (saving 1 shelling
> out to clang in most builds).
Ack.

> > +config KMSAN
> > +     bool "KMSAN: detector of uninitialized values use"
> > +     depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> > +     depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
> > +     select STACKDEPOT
> > +     select STACKDEPOT_ALWAYS_INIT
> > +     help
> > +       KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
> > +       uninitialized values in the kernel. It is based on compiler
> > +       instrumentation provided by Clang and thus requires Clang to build.
> > +
> > +       An important note is that KMSAN is not intended for production use,
> > +       because it drastically increases kernel memory footprint and slows
> > +       the whole system down.
> > +
> > +       See <file:Documentation/dev-tools/kmsan.rst> for more details.
> > +
> > +if KMSAN
> > +
> > +config KMSAN_CHECK_PARAM_RETVAL
> > +     bool "Check for uninitialized values passed to and returned from functions"
> > +     default HAVE_KMSAN_PARAM_RETVAL
>
> This can be enabled even if !HAVE_KMSAN_PARAM_RETVAL. Should this be:
>
>         default y
>         depends on HAVE_KMSAN_PARAM_RETVAL
>
> instead?
>
Ack
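
Putting the acked changes together, the fragment would become
something like (a sketch of the relevant entries, not the final
Kconfig):
============================================
if KMSAN

config HAVE_KMSAN_PARAM_RETVAL
	# Supported only by Clang >= 15.
	depends on HAVE_KMSAN_COMPILER
	def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)

config KMSAN_CHECK_PARAM_RETVAL
	bool "Check for uninitialized values passed to and returned from functions"
	default y
	depends on HAVE_KMSAN_PARAM_RETVAL

endif
============================================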

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 11/45] kmsan: add KMSAN runtime core
  2022-07-11 16:49   ` Marco Elver
@ 2022-08-03 18:14     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-03 18:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

"
> > +struct page;
>
> Either I'm not looking hard enough, but I can't see where this is used
> in this file.

You are right, this declaration should belong to "init: kmsan: call
KMSAN initialization routines". I'll move it accordingly.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size()
  2022-07-07 10:13   ` Marco Elver
@ 2022-08-07 17:33     ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-07 17:33 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Thu, Jul 7, 2022 at 12:13 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > Use hooks from instrumented.h to notify bug detection tools about
> > usercopy events in get_user() and put_user_size().
> >
> > It's still unclear how to instrument put_user(), which assumes that
> > instrumentation code doesn't clobber RAX.
>
> do_put_user_call() has a comment about KASAN clobbering %ax, doesn't
> this also apply to KMSAN? If not, could we have a <asm/instrumented.h>
> that provides helpers to push registers on the stack and pop them back
> on return?

In fact, yes, it is rather simple to avoid clobbering %ax.
A more important aspect of instrumenting get_user()/put_user() is to
evaluate `x` and `ptr` exactly once, because sometimes these macros
are called like `put_user(v, sp++)`.
I might have confused the effects of evaluating sp++ twice with
register clobbering.
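
A standalone illustration of the double-evaluation hazard and the fix
(a userspace demo with hypothetical macro names, not kernel code):
============================================
#include <stdio.h>

/* Stand-in for an instrumentation hook. */
static void check(const int *p) { (void)p; }

/* Buggy: 'ptr' is evaluated twice (once by the hook, once by the store). */
#define BAD_PUT(x, ptr)  do { check(ptr); *(ptr) = (x); } while (0)

/* Fixed: snapshot 'ptr' so its side effects happen exactly once. */
#define GOOD_PUT(x, ptr) do { __typeof__(ptr) __p = (ptr); \
                              check(__p); *__p = (x); } while (0)

int main(void)
{
        int buf[4] = { 0 };
        int *sp = buf;

        BAD_PUT(1, sp++);       /* sp advances by 2; the value lands in buf[1] */
        printf("after BAD_PUT:  sp - buf = %td\n", sp - buf);  /* prints 2 */

        sp = buf;
        GOOD_PUT(1, sp++);      /* sp advances by 1, as the caller expects */
        printf("after GOOD_PUT: sp - buf = %td\n", sp - buf);  /* prints 1 */
        return 0;
}
============================================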

> Also it seems the test robot complained about this patch.
Will fix in v5.
>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > ---
> > Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
> > ---
> >  arch/x86/include/asm/uaccess.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > index 913e593a3b45f..1a8b5a234474f 100644
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -5,6 +5,7 @@
> >   * User space memory access functions
> >   */
> >  #include <linux/compiler.h>
> > +#include <linux/instrumented.h>
> >  #include <linux/kasan-checks.h>
> >  #include <linux/string.h>
> >  #include <asm/asm.h>
> > @@ -99,11 +100,13 @@ extern int __get_user_bad(void);
> >         int __ret_gu;                                                   \
> >         register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
> >         __chk_user_ptr(ptr);                                            \
> > +       instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
> >         asm volatile("call __" #fn "_%P4"                               \
> >                      : "=a" (__ret_gu), "=r" (__val_gu),                \
> >                         ASM_CALL_CONSTRAINT                             \
> >                      : "0" (ptr), "i" (sizeof(*(ptr))));                \
> >         (x) = (__force __typeof__(*(ptr))) __val_gu;                    \
> > +       instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
> >         __builtin_expect(__ret_gu, 0);                                  \
> >  })
> >
> > @@ -248,7 +251,9 @@ extern void __put_user_nocheck_8(void);
> >
> >  #define __put_user_size(x, ptr, size, label)                           \
> >  do {                                                                   \
> > +       __typeof__(*(ptr)) __pus_val = x;                               \
> >         __chk_user_ptr(ptr);                                            \
> > +       instrument_copy_to_user(ptr, &(__pus_val), size);               \
> >         switch (size) {                                                 \
> >         case 1:                                                         \
> >                 __put_user_goto(x, ptr, "b", "iq", label);              \
> > @@ -286,6 +291,7 @@ do {                                                                        \
> >  #define __get_user_size(x, ptr, size, label)                           \
> >  do {                                                                   \
> >         __chk_user_ptr(ptr);                                            \
> > +       instrument_copy_from_user_before((void *)&(x), ptr, size);      \
> >         switch (size) {                                                 \
> >         case 1: {                                                       \
> >                 unsigned char x_u8__;                                   \
> > @@ -305,6 +311,7 @@ do {                                                                        \
> >         default:                                                        \
> >                 (x) = __get_user_bad();                                 \
> >         }                                                               \
> > +       instrument_copy_from_user_after((void *)&(x), ptr, size, 0);    \
> >  } while (0)
> >
> >  #define __get_user_asm(x, addr, itype, ltype, label)                   \
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
  2022-07-02 17:23   ` Linus Torvalds
@ 2022-08-08 16:37   ` Alexander Potapenko
  1 sibling, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-08 16:37 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML, Evgenii Stepanov,
	Linus Torvalds, Nathan Chancellor, Nick Desaulniers,
	Segher Boessenkool, Vitaly Buka, linux-toolchains

On Fri, Jul 1, 2022 at 4:25 PM Alexander Potapenko <glider@google.com> wrote:
>
> Under certain circumstances initialization of `unsigned seq` and
> `struct inode *inode` passed into step_into() may be skipped.
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into() (see [1] for more info).
>
> Right now step_into() does not use these uninitialized values,
> yet passing uninitialized values to functions is considered undefined
> behavior (see [2]). To fix that, we initialize `seq` and `inode` at
> definition.

Given that Al Viro has a patch series in flight to address the
problem, I am going to drop this patch from the KMSAN v5 series.
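
For posterity, the dropped change amounted to initializing the two
locals at their definition (a sketch, not the exact hunk):

        /* in the namei helpers that later call step_into() */
        unsigned seq = 0;
        struct inode *inode = NULL;

so the values are never indeterminate by the time they are passed.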

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-07-04 20:07   ` Matthew Wilcox
  2022-07-04 20:30     ` Al Viro
@ 2022-08-25 15:39     ` Alexander Potapenko
  2022-08-25 16:33       ` Linus Torvalds
  1 sibling, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-25 15:39 UTC (permalink / raw)
  To: Matthew Wilcox, Segher Boessenkool, Linus Torvalds, Thomas Gleixner
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Mon, Jul 4, 2022 at 10:07 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jul 01, 2022 at 04:23:09PM +0200, Alexander Potapenko wrote:
> > Functions implementing the a_ops->write_end() interface accept the
> > `void *fsdata` parameter that is supposed to be initialized by the
> > corresponding a_ops->write_begin() (which accepts `void **fsdata`).
> >
> > However not all a_ops->write_begin() implementations initialize `fsdata`
> > unconditionally, so it may get passed uninitialized to a_ops->write_end(),
> > resulting in undefined behavior.
>
> ... wait, passing an uninitialised variable to a function *which doesn't
> actually use it* is now UB?  What genius came up with that rule?  What
> purpose does it serve?
>

Hi Matthew,

There is a discussion at [1], with Segher pointing out a reason for
this rule [2] and Linus requesting that we should be warning about the
cases where uninitialized variables are passed by value [3].

Right now there are only a handful of cases in the kernel where such
passing is performed (we just need one more patch in addition to this
one for KMSAN to boot cleanly). So we are in a good position to start
enforcing this rule, unless there's a reason not to.

I am not sure standard compliance alone is a convincing argument, but
from the KMSAN standpoint, performing parameter checks at callsites
noticeably eases handling of values passed between instrumented and
non-instrumented code. This lets us avoid some low-level hacks around
instrumentation_begin()/instrumentation_end() (some context available
at [4]).
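
To illustrate the pattern with a userspace toy (hypothetical names
mirroring the a_ops interface, not kernel code; built without
optimization so the calls survive, recent clang's -fsanitize=memory
with -fsanitize-memory-param-retval reports the call that passes the
garbage value):

        #include <stdio.h>

        static void toy_write_begin(int need_fsdata, void **fsdata)
        {
                if (need_fsdata)
                        *fsdata = NULL; /* some implementations skip this */
        }

        static void toy_write_end(void *fsdata)
        {
                (void)fsdata;           /* never reads fsdata */
        }

        int main(void)
        {
                void *fsdata;           /* left uninitialized */

                toy_write_begin(0, &fsdata);
                toy_write_end(fsdata);  /* passing the value is the "use" */
                return 0;
        }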

Let me know what you think,
Alex

[1] - https://lore.kernel.org/lkml/CAFKCwrjBjHMquj-adTf0_1QLYq3Et=gJ0rq6HS-qrAEmVA7Ujw@mail.gmail.com/T/
[2] - https://lore.kernel.org/lkml/20220615164655.GC25951@gate.crashing.org/
[3] - https://lore.kernel.org/lkml/CAHk-=whjz3wO8zD+itoerphWem+JZz4uS3myf6u1Wd6epGRgmQ@mail.gmail.com/
[4] - https://lore.kernel.org/lkml/20220426164315.625149-29-glider@google.com/

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-08-25 15:39     ` Alexander Potapenko
@ 2022-08-25 16:33       ` Linus Torvalds
  2022-08-25 21:57         ` Segher Boessenkool
  2022-08-25 22:13         ` Segher Boessenkool
  0 siblings, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2022-08-25 16:33 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Matthew Wilcox, Segher Boessenkool, Thomas Gleixner,
	Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Thu, Aug 25, 2022 at 8:40 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Mon, Jul 4, 2022 at 10:07 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > ... wait, passing an uninitialised variable to a function *which doesn't
> > actually use it* is now UB?  What genius came up with that rule?  What
> > purpose does it serve?
> >
>
> There is a discussion at [1], with Segher pointing out a reason for
> this rule [2] and Linus requesting that we should be warning about the
> cases where uninitialized variables are passed by value.

I think Matthew was actually more wondering how that UB rule came to be.

Personally, I pretty much despise *all* cases of "undefined behavior",
but "uninitialized argument" across a function call is one of the more
understandable ones.

For one, it's a static sanity checking issue: if function call
arguments can be uninitialized random garbage on the assumption that
the callee doesn't necessarily _use_ them, then any static checker is
going to be unhappy because it means that it can never assume that
incoming arguments have been initialized either.

Of course, that's always true for any pointer passing, but hey, at
least then it's pretty much explicit. You're passing a pointer to some
memory to another function, it's always going to be a bit ambiguous
who is supposed to initialize it - the caller or the callee.

Because one very important "static checker" is the person reading the
code. When I read a function definition, I most definitely have the
expectation that the caller has initialized all the arguments.

So I actually think that "human static checker" is a really important
case. I do not think I'm the only one who expects incoming function
arguments to have values.

But I think the immediate cause of it on a compiler side was basically
things like poison bits. Which are a nice debugging feature, even
though (sadly) I don't think they are usually added for debugging.
It's always for some other much more nefarious reason (eg ia64 and
speculative loads weren't for "hey, this will help people find bugs",
but for "hey, our architecture depends on static scheduling tricks
that aren't really valid, so we have to take faults late").

Now, imagine you're a compiler, and you see a random incoming integer
argument, and you can't even schedule simple arithmetic expressions on
it early because you don't know if the caller initialized it or not,
and it might cause some poison bit fault...

So you'd most certainly want to know that all incoming arguments are
actually valid, because otherwise you can't do even some really simple
and obvious optimizations.

Of course, on normal architectures, this only ever happens with FP
values, and it's often hard to trigger there too. But you most
definitely *could* see it.

I personally was actually surprised compilers didn't warn for "you are
using an uninitialized value" for a function call argument, because I
mentally consider function call arguments to *be* a use of a value.

Except when the function is inlined, and then it's all different - the
call itself goes away, and I *expect* the compiler to DTRT and not
"use" the argument except when it's used inside the inlined function.

Because hey, that's literally the whole point of inlining, and it
makes the "static checking" problem go away at least for a compiler.
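
A sketch of the distinction (a userspace toy, not from the kernel):

        #include <stdio.h>

        static inline int ignores_arg(int x) { (void)x; return 0; }

        __attribute__((noinline)) static int real_call(int x)
        {
                (void)x;                /* never reads x */
                return 0;
        }

        int main(void)
        {
                int garbage;            /* never initialized */

                printf("%d\n", ignores_arg(garbage)); /* inlined: the "use" vanishes */
                printf("%d\n", real_call(garbage));   /* real call: passing it is the use */
                return 0;
        }

After inlining, the first call leaves no trace of the argument; the
second one still has to materialize a value for the argument register.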

                     Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-08-25 16:33       ` Linus Torvalds
@ 2022-08-25 21:57         ` Segher Boessenkool
  2022-08-26 19:41           ` Linus Torvalds
  2022-08-25 22:13         ` Segher Boessenkool
  1 sibling, 1 reply; 147+ messages in thread
From: Segher Boessenkool @ 2022-08-25 21:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Matthew Wilcox, Thomas Gleixner,
	Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Thu, Aug 25, 2022 at 09:33:18AM -0700, Linus Torvalds wrote:
> On Thu, Aug 25, 2022 at 8:40 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > On Mon, Jul 4, 2022 at 10:07 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > ... wait, passing an uninitialised variable to a function *which doesn't
> > > actually use it* is now UB?  What genius came up with that rule?  What
> > > purpose does it serve?
> > >
> >
> > There is a discussion at [1], with Segher pointing out a reason for
> > this rule [2] and Linus requesting that we should be warning about the
> > cases where uninitialized variables are passed by value.
> 
> I think Matthew was actually more wondering how that UB rule came to be.
> 
> Personally, I pretty much despise *all* cases of "undefined behavior",

Let me start by saying you're not alone.  But some UB *cannot* be worked
around by compilers (we cannot solve the halting problem), and some is
very expensive to work around (initialising huge structures is a
typical example).

Many (if not most) instances of undefined behaviour are unavoidable with
a language like C.  A very big part of this is separate compilation,
that is, compiling translation units separately from each other, so that
the compiler does not see all the ways that something is used when it is
compiling it.  There only is UB if something is *used*.

> but "uninitialized argument" across a function call is one of the more
> understandable ones.

Allowing this essentially never allows generating better machine code,
so there are no real arguments for ever allowing it, other than just
inertia: everything else uninitialised is allowed just fine, and only
actually *using* such data is UB.  Passing it around is not!  That is
how everything used to work (with static data, automatic data, function
parameters, the lot).

But it now is clarified that passing data to a function as function
argument is a use of that data by itself, even if the function will not
even look at it ever.

> I personally was actually surprised compilers didn't warn for "you are
> using an uninitialized value" for a function call argument, because I
> mentally consider function call arguments to *be* a use of a value.

The function call is a use of all passed arguments.

> Except when the function is inlined, and then it's all different - the
> call itself goes away, and I *expect* the compiler to DTRT and not
> "use" the argument except when it's used inside the inlined function.

> Because hey, that's literally the whole point of inlining, and it
> makes the "static checking" problem go away at least for a compiler.

But UB is defined in terms of the abstract machine (like *all* of C),
not in terms of the generated machine code.  Typically things will work
fine if they "become invisible" by inlining, but this does not make the
program a correct program ever.  Sorry :-(


Segher

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-08-25 16:33       ` Linus Torvalds
  2022-08-25 21:57         ` Segher Boessenkool
@ 2022-08-25 22:13         ` Segher Boessenkool
  1 sibling, 0 replies; 147+ messages in thread
From: Segher Boessenkool @ 2022-08-25 22:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Matthew Wilcox, Thomas Gleixner,
	Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Thu, Aug 25, 2022 at 09:33:18AM -0700, Linus Torvalds wrote:
> So you'd most certainly want to know that all incoming arguments are
> actually valid, because otherwise you can't do even some really simple
> and obvious optimizations.

The C11 change was via DR 338, see
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm>
for more info.


Segher

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-08-25 21:57         ` Segher Boessenkool
@ 2022-08-26 19:41           ` Linus Torvalds
  2022-08-31 13:32             ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2022-08-26 19:41 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Alexander Potapenko, Matthew Wilcox, Thomas Gleixner,
	Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Thu, Aug 25, 2022 at 3:10 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> But UB is defined in terms of the abstract machine (like *all* of C),
> not in terms of the generated machine code.  Typically things will work
> fine if they "become invisible" by inlining, but this does not make the
> program a correct program ever.  Sorry :-(

Yeah, and the abstract machine model based on "abstract syntax" is
just wrong, wrong, wrong.

I really wish the C standard people had the guts to just fix it.  At
some point, relying on tradition when the tradition is bad is not a
great thing.

It's the same problem that made all the memory ordering discussions
completely untenable. The language to allow the whole data dependency
was completely ridiculous, because it became about the C language
syntax and theory, not about the actual code generation and actual
*meaning* that the whole thing was *about*.

Java may be a horrible language that a lot of people hate, but it
avoided a lot of problems by just making things about an actual
virtual machine and describing things within a more concrete model of
a virtual machine.

Then you can just say "this code sequence generates this set of
operations, and the compiler can optimize it any which way it likes as
long as the end result is equivalent".

Oh well.

I will repeat: a paper standard that doesn't take reality into account
is less useful than toilet paper. It's scratchy and not very
absorbent.

And the kernel will continue to care more about reality than about a C
standard that does bad things.

Inlining makes the use of the argument go away at the call site and
moves the code of the function into the body. That's how things
*work*. That's literally the meaning of inlining.

And inlining in C is so important because macros are weak, and other
facilities like templates don't exist.

But in the kernel, we also often use it because the actual semantics
of "not a function call" in terms of code generation is also important
(ie we have literal cases where "not generating the 'call'
instruction" is a correctness issue).

If the C standard thinks "undefined argument even for inlining use is
UB", then it's a case of that paperwork that doesn't reflect reality,
and we'll treat it with the deference it deserves - which is less than
toilet paper.

We have decades of history of doing that in the kernel. Sometimes the
standards are just wrong, sometimes they are just too far removed from
reality to be relevant, and then it's just not worth worrying about
them.

          Linus

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface
  2022-08-26 19:41           ` Linus Torvalds
@ 2022-08-31 13:32             ` Alexander Potapenko
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2022-08-31 13:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Matthew Wilcox, Thomas Gleixner,
	Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Vasily Gorbik, Vegard Nossum, Vlastimil Babka,
	kasan-dev, Linux Memory Management List, Linux-Arch, LKML

On Fri, Aug 26, 2022 at 9:41 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 25, 2022 at 3:10 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > But UB is defined in terms of the abstract machine (like *all* of C),
> > not in terms of the generated machine code.  Typically things will work
> > fine if they "become invisible" by inlining, but this does not make the
> > program a correct program ever.  Sorry :-(
>
> Yeah, and the abstract machine model based on "abstract syntax" is
> just wrong, wrong, wrong.
>
> I really wish the C standard people had the guts to just fix it.  At
> some point, relying on tradition when the tradition is bad is not a
> great thing.
>
> It's the same problem that made all the memory ordering discussions
> completely untenable. The language to allow the whole data dependency
> was completely ridiculous, because it became about the C language
> syntax and theory, not about the actual code generation and actual
> *meaning* that the whole thing was *about*.
>
> Java may be a horrible language that a lot of people hate, but it
> avoided a lot of problems by just making things about an actual
> virtual machine and describing things within a more concrete model of
> a virtual machine.
>
> Then you can just say "this code sequence generates this set of
> operations, and the compiler can optimize it any which way it likes as
> long as the end result is equivalent".
>
> Oh well.
>
> I will repeat: a paper standard that doesn't take reality into account
> is less useful than toilet paper. It's scratchy and not very
> absorbent.
>
> And the kernel will continue to care more about reality than about a C
> standard that does bad things.
>
> Inlining makes the use of the argument go away at the call site and
> moves the code of the function into the body. That's how things
> *work*. That's literally the meaning of inlining.
>
> And inlining in C is so important because macros are weak, and other
> facilities like templates don't exist.
>
> But in the kernel, we also often use it because the actual semantics
> of "not a function call" in terms of code generation is also important
> (ie we have literal cases where "not generating the 'call'
> instruction" is a correctness issue).
>
> If the C standard thinks "undefined argument even for inlining use is
> UB", then it's a case of that paperwork that doesn't reflect reality,
> and we'll treat it with the deference it deserves - which is less than
> toilet paper.

Just for posterity, in the case of KMSAN we are only dealing with
cases where the function call survived inlining and dead code
elimination.


> We have decades of history of doing that in the kernel. Sometimes the
> standards are just wrong, sometimes they are just too far removed from
> reality to be relevant, and then it's just not worth worrying about
> them.
>
>           Linus



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2022-08-03  9:44     ` Alexander Potapenko
@ 2023-01-05 22:08       ` Dan Williams
  2023-01-09  9:51         ` Alexander Potapenko
  2023-01-30  8:34         ` Alexander Potapenko
  0 siblings, 2 replies; 147+ messages in thread
From: Dan Williams @ 2023-01-05 22:08 UTC (permalink / raw)
  To: Alexander Potapenko, Marco Elver, Dan Williams
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

Alexander Potapenko wrote:
> (+ Dan Williams)
> (resending with patch context included)
> 
> On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> >
> > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > >
> > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > 64 bytes anymore.
> >
> > Does this somehow cause extra space being used in all kernel configs?
> > If not, it would be good to note this in the commit message.
> >
> I actually couldn't verify this on QEMU, because the driver never got loaded.
> Looks like this increases the amount of memory used by the nvdimm
> driver in all kernel configs that enable it (including those that
> don't use KMSAN), but I am not sure how much that is.
> 
> Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?

Apologies I missed this several months ago. The answer is that this
causes everyone creating PMEM namespaces on v6.1+ to lose double the
capacity of their namespace even when not using KMSAN which is too
wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
something like:

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 79d93126453d..5693869b720b 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -63,6 +63,7 @@ config NVDIMM_PFN
        bool "PFN: Map persistent (device) memory"
        default LIBNVDIMM
        depends on ZONE_DEVICE
+       depends on !KMSAN
        select ND_CLAIM
        help
          Map persistent memory, i.e. advertise it to the memory


...otherwise, what was the rationale for increasing this value? Were you
actually trying to use KMSAN for DAX pages?

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-05 22:08       ` Dan Williams
@ 2023-01-09  9:51         ` Alexander Potapenko
  2023-01-09 22:06           ` Dan Williams
  2023-01-30  8:34         ` Alexander Potapenko
  1 sibling, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2023-01-09  9:51 UTC (permalink / raw)
  To: Dan Williams
  Cc: Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Alexander Potapenko wrote:
> > (+ Dan Williams)
> > (resending with patch context included)
> >
> > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > >
> > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > 64 bytes anymore.
> > >
> > > Does this somehow cause extra space being used in all kernel configs?
> > > If not, it would be good to note this in the commit message.
> > >
> > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > Looks like this increases the amount of memory used by the nvdimm
> > driver in all kernel configs that enable it (including those that
> > don't use KMSAN), but I am not sure how much that is.
> >
> > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
>
> Apologies I missed this several months ago. The answer is that this
> causes everyone creating PMEM namespaces on v6.1+ to lose double the
> capacity of their namespace even when not using KMSAN which is too
> wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> something like:
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 79d93126453d..5693869b720b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -63,6 +63,7 @@ config NVDIMM_PFN
>         bool "PFN: Map persistent (device) memory"
>         default LIBNVDIMM
>         depends on ZONE_DEVICE
> +       depends on !KMSAN
>         select ND_CLAIM
>         help
>           Map persistent memory, i.e. advertise it to the memory
>
>
> ...otherwise, what was the rationale for increasing this value? Were you
> actually trying to use KMSAN for DAX pages?

I was just building the kernel with nvdimm driver and KMSAN enabled.
Because KMSAN adds extra data to every struct page, it immediately hit
the following assert:

drivers/nvdimm/pfn_devs.c:796:3: error: call to
__compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
fE
                BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);

The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
page size independent of kernel config", but maybe we can afford
making it dependent on CONFIG_KMSAN (and possibly other config options
that increase struct page size)?

I don't mind disabling the driver under KMSAN, but having an extra
ifdef to keep KMSAN support sounds reasonable, WDYT?
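
E.g., a minimal sketch of what I have in mind (untested):

        /* max struct page size for production kernel configs */
        #ifdef CONFIG_KMSAN
        #define MAX_STRUCT_PAGE_SIZE 128
        #else
        #define MAX_STRUCT_PAGE_SIZE 64
        #endif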



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-09  9:51         ` Alexander Potapenko
@ 2023-01-09 22:06           ` Dan Williams
  2023-01-10  5:56             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 147+ messages in thread
From: Dan Williams @ 2023-01-09 22:06 UTC (permalink / raw)
  To: Alexander Potapenko, Dan Williams
  Cc: Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

Alexander Potapenko wrote:
> On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Alexander Potapenko wrote:
> > > (+ Dan Williams)
> > > (resending with patch context included)
> > >
> > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > >
> > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > >
> > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > 64 bytes anymore.
> > > >
> > > > Does this somehow cause extra space being used in all kernel configs?
> > > > If not, it would be good to note this in the commit message.
> > > >
> > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > Looks like this increases the amount of memory used by the nvdimm
> > > driver in all kernel configs that enable it (including those that
> > > don't use KMSAN), but I am not sure how much that is.
> > >
> > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> >
> > Apologies I missed this several months ago. The answer is that this
> > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > capacity of their namespace even when not using KMSAN which is too
> > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > something like:
> >
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index 79d93126453d..5693869b720b 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> >         bool "PFN: Map persistent (device) memory"
> >         default LIBNVDIMM
> >         depends on ZONE_DEVICE
> > +       depends on !KMSAN
> >         select ND_CLAIM
> >         help
> >           Map persistent memory, i.e. advertise it to the memory
> >
> >
> > ...otherwise, what was the rationale for increasing this value? Were you
> > actually trying to use KMSAN for DAX pages?
> 
> I was just building the kernel with nvdimm driver and KMSAN enabled.
> Because KMSAN adds extra data to every struct page, it immediately hit
> the following assert:
> 
> drivers/nvdimm/pfn_devs.c:796:3: error: call to
> __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> fE
>                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> 
> The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> page size independent of kernel config", but maybe we can afford
> making it dependent on CONFIG_KMSAN (and possibly other config options
> that increase struct page size)?
> 
> I don't mind disabling the driver under KMSAN, but having an extra
> ifdef to keep KMSAN support sounds reasonable, WDYT?

How about a module parameter to opt-in to the increased permanent
capacity loss?

-- >8 --
From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Thu, 5 Jan 2023 13:27:34 -0800
Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE

Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")

...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
doubles the amount of capacity stolen from user addressable capacity for
everyone, regardless of whether they are using the debug option. Revert
that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
allow for debug scenarios to proceed with creating debug sized page maps
with a new 'libnvdimm.page_struct_override' module parameter.

Note that this only applies to cases where the page map is permanent,
i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
create-namespace" terms). For the "--map=mem" case, since the allocation
is ephemeral for the lifespan of the namespace, there is no explicit
restriction. However, the implicit restriction, of having enough
available "System RAM" to store the page map for the typically large
pmem, still applies.

Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
---
 drivers/nvdimm/nd.h       |  2 +-
 drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 85ca5b4da3cf..ec5219680092 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
 		struct nd_namespace_common *ndns);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
 /* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 128
+#define MAX_STRUCT_PAGE_SIZE 64
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 #else
 static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 61af072ac98f..978d63559c0e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -13,6 +13,11 @@
 #include "pfn.h"
 #include "nd.h"
 
+static bool page_struct_override;
+module_param(page_struct_override, bool, 0644);
+MODULE_PARM_DESC(page_struct_override,
+		 "Force namespace creation in the presence of mm-debug.");
+
 static void nd_pfn_release(struct device *dev)
 {
 	struct nd_region *nd_region = to_nd_region(dev->parent);
@@ -758,12 +763,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	/*
-	 * Note, we use 64 here for the standard size of struct page,
-	 * debugging options may cause it to be larger in which case the
-	 * implementation will limit the pfns advertised through
-	 * ->direct_access() to those that are included in the memmap.
-	 */
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
@@ -782,20 +781,33 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	}
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
+		unsigned long page_map_size = MAX_STRUCT_PAGE_SIZE * npfns;
+
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 *
-		 * Also make sure size of struct page is less than 128. We
-		 * want to make sure we use large enough size here so that
-		 * we don't have a dynamic reserve space depending on
-		 * struct page size. But we also want to make sure we notice
-		 * when we end up adding new elements to struct page.
+		 * Also make sure size of struct page is less than
+		 * MAX_STRUCT_PAGE_SIZE. The goal here is compatibility in the
+		 * face of production kernel configurations that reduce the
+		 * 'struct page' size below MAX_STRUCT_PAGE_SIZE. For debug
+		 * kernel configurations that increase the 'struct page' size
+		 * above MAX_STRUCT_PAGE_SIZE, the page_struct_override allows
+		 * for continuing with the capacity that will be wasted when
+		 * reverting to a production kernel configuration. Otherwise,
+		 * those configurations are blocked by default.
 		 */
-		BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
-		offset = ALIGN(start + SZ_8K + MAX_STRUCT_PAGE_SIZE * npfns, align)
-			- start;
+		if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE) {
+			if (page_struct_override)
+				page_map_size = sizeof(struct page) * npfns;
+			else {
+				dev_err(&nd_pfn->dev,
+					"Memory debug options prevent using pmem for the page map\n");
+				return -EINVAL;
+			}
+		}
+		offset = ALIGN(start + SZ_8K + page_map_size, align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;
 	else
@@ -818,7 +830,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	pfn_sb->version_minor = cpu_to_le16(4);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
-	pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
+	if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE && page_struct_override)
+		pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+	else
+		pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
 	pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
-- 
2.38.1

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-09 22:06           ` Dan Williams
@ 2023-01-10  5:56             ` Greg Kroah-Hartman
  2023-01-10  6:55               ` Dan Williams
  0 siblings, 1 reply; 147+ messages in thread
From: Greg Kroah-Hartman @ 2023-01-10  5:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alexander Potapenko, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Herbert Xu, Ilya Leoshkevich,
	Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> Alexander Potapenko wrote:
> > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Alexander Potapenko wrote:
> > > > (+ Dan Williams)
> > > > (resending with patch context included)
> > > >
> > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > >
> > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > >
> > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > 64 bytes anymore.
> > > > >
> > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > If not, it would be good to note this in the commit message.
> > > > >
> > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > Looks like this increases the amount of memory used by the nvdimm
> > > > driver in all kernel configs that enable it (including those that
> > > > don't use KMSAN), but I am not sure how much that is.
> > > >
> > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > >
> > > Apologies I missed this several months ago. The answer is that this
> > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > capacity of their namespace even when not using KMSAN which is too
> > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > something like:
> > >
> > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > index 79d93126453d..5693869b720b 100644
> > > --- a/drivers/nvdimm/Kconfig
> > > +++ b/drivers/nvdimm/Kconfig
> > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > >         bool "PFN: Map persistent (device) memory"
> > >         default LIBNVDIMM
> > >         depends on ZONE_DEVICE
> > > +       depends on !KMSAN
> > >         select ND_CLAIM
> > >         help
> > >           Map persistent memory, i.e. advertise it to the memory
> > >
> > >
> > > ...otherwise, what was the rationale for increasing this value? Were you
> > > actually trying to use KMSAN for DAX pages?
> > 
> > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > Because KMSAN adds extra data to every struct page, it immediately hit
> > the following assert:
> > 
> > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > fE
> >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > 
> > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > page size independent of kernel config", but maybe we can afford
> > making it dependent on CONFIG_KMSAN (and possibly other config options
> > that increase struct page size)?
> > 
> > I don't mind disabling the driver under KMSAN, but having an extra
> > ifdef to keep KMSAN support sounds reasonable, WDYT?
> 
> How about a module parameter to opt-in to the increased permanent
> capacity loss?

Please no, this isn't the 1990's, we should never force users to keep
track of new module parameters that you then have to support
forever.


> 
> -- >8 --
> >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> From: Dan Williams <dan.j.williams@intel.com>
> Date: Thu, 5 Jan 2023 13:27:34 -0800
> Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> 
> Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> 
> ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> doubles the amount of capacity stolen from user addressable capacity for
> everyone, regardless of whether they are using the debug option. Revert
> that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> allow for debug scenarios to proceed with creating debug sized page maps
> with a new 'libnvdimm.page_struct_override' module parameter.
> 
> Note that this only applies to cases where the page map is permanent,
> i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> create-namespace" terms). For the "--map=mem" case, since the allocation
> is ephemeral for the lifespan of the namespace, there is no explicit
> restriction. However, the implicit restriction, of having enough
> available "System RAM" to store the page map for the typically large
> pmem, still applies.
> 
> Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> Cc: <stable@vger.kernel.org>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Marco Elver <elver@google.com>
> Reported-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  drivers/nvdimm/nd.h       |  2 +-
>  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
>  2 files changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index 85ca5b4da3cf..ec5219680092 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
>  		struct nd_namespace_common *ndns);
>  #if IS_ENABLED(CONFIG_ND_CLAIM)
>  /* max struct page size independent of kernel config */
> -#define MAX_STRUCT_PAGE_SIZE 128
> +#define MAX_STRUCT_PAGE_SIZE 64
>  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
>  #else
>  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 61af072ac98f..978d63559c0e 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -13,6 +13,11 @@
>  #include "pfn.h"
>  #include "nd.h"
>  
> +static bool page_struct_override;
> +module_param(page_struct_override, bool, 0644);
> +MODULE_PARM_DESC(page_struct_override,
> +		 "Force namespace creation in the presence of mm-debug.");

I can't figure out from this description what this is for, so perhaps it
should be either removed and made dynamic (if you know you want to debug
the mm core, why not turn it on then?) or made more obvious what is
happening?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  5:56             ` Greg Kroah-Hartman
@ 2023-01-10  6:55               ` Dan Williams
  2023-01-10  8:48                 ` Alexander Potapenko
  0 siblings, 1 reply; 147+ messages in thread
From: Dan Williams @ 2023-01-10  6:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Dan Williams
  Cc: Alexander Potapenko, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Herbert Xu, Ilya Leoshkevich,
	Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

Greg Kroah-Hartman wrote:
> On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> > Alexander Potapenko wrote:
> > > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > Alexander Potapenko wrote:
> > > > > (+ Dan Williams)
> > > > > (resending with patch context included)
> > > > >
> > > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > > >
> > > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > > >
> > > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > > 64 bytes anymore.
> > > > > >
> > > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > > If not, it would be good to note this in the commit message.
> > > > > >
> > > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > > Looks like this increases the amount of memory used by the nvdimm
> > > > > driver in all kernel configs that enable it (including those that
> > > > > don't use KMSAN), but I am not sure how much that is.
> > > > >
> > > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > > >
> > > > Apologies I missed this several months ago. The answer is that this
> > > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > > capacity of their namespace even when not using KMSAN which is too
> > > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > > something like:
> > > >
> > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > > index 79d93126453d..5693869b720b 100644
> > > > --- a/drivers/nvdimm/Kconfig
> > > > +++ b/drivers/nvdimm/Kconfig
> > > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > > >         bool "PFN: Map persistent (device) memory"
> > > >         default LIBNVDIMM
> > > >         depends on ZONE_DEVICE
> > > > +       depends on !KMSAN
> > > >         select ND_CLAIM
> > > >         help
> > > >           Map persistent memory, i.e. advertise it to the memory
> > > >
> > > >
> > > > ...otherwise, what was the rationale for increasing this value? Were you
> > > > actually trying to use KMSAN for DAX pages?
> > > 
> > > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > > Because KMSAN adds extra data to every struct page, it immediately hit
> > > the following assert:
> > > 
> > > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > > fE
> > >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > > 
> > > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > > page size independent of kernel config", but maybe we can afford
> > > making it dependent on CONFIG_KMSAN (and possibly other config options
> > > that increase struct page size)?
> > > 
> > > I don't mind disabling the driver under KMSAN, but having an extra
> > > ifdef to keep KMSAN support sounds reasonable, WDYT?
> > 
> > How about a module parameter to opt-in to the increased permanent
> > capacity loss?
> 
> Please no, this isn't the 1990's, we should never force users to keep
> track of new module parameters that you then have to support
> forever.

Fair enough, premature enabling. If someone really wants this they can
find this thread in the archives and ask for another solution like a
compile-time override.

> 
> 
> > 
> > -- >8 --
> > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > From: Dan Williams <dan.j.williams@intel.com>
> > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > 
> > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > 
> > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > doubles the amount of capacity stolen from user addressable capacity for
> > everyone, regardless of whether they are using the debug option. Revert
> > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > allow for debug scenarios to proceed with creating debug sized page maps
> > with a new 'libnvdimm.page_struct_override' module parameter.
> > 
> > Note that this only applies to cases where the page map is permanent,
> > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > create-namespace" terms). For the "--map=mem" case, since the allocation
> > is ephemeral for the lifespan of the namespace, there is no explicit
> > restriction. However, the implicit restriction, of having enough
> > available "System RAM" to store the page map for the typically large
> > pmem, still applies.
> > 
> > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > Cc: <stable@vger.kernel.org>
> > Cc: Alexander Potapenko <glider@google.com>
> > Cc: Marco Elver <elver@google.com>
> > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > ---
> >  drivers/nvdimm/nd.h       |  2 +-
> >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> >  2 files changed, 31 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > index 85ca5b4da3cf..ec5219680092 100644
> > --- a/drivers/nvdimm/nd.h
> > +++ b/drivers/nvdimm/nd.h
> > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> >  		struct nd_namespace_common *ndns);
> >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> >  /* max struct page size independent of kernel config */
> > -#define MAX_STRUCT_PAGE_SIZE 128
> > +#define MAX_STRUCT_PAGE_SIZE 64
> >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> >  #else
> >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > index 61af072ac98f..978d63559c0e 100644
> > --- a/drivers/nvdimm/pfn_devs.c
> > +++ b/drivers/nvdimm/pfn_devs.c
> > @@ -13,6 +13,11 @@
> >  #include "pfn.h"
> >  #include "nd.h"
> >  
> > +static bool page_struct_override;
> > +module_param(page_struct_override, bool, 0644);
> > +MODULE_PARM_DESC(page_struct_override,
> > +		 "Force namespace creation in the presence of mm-debug.");
> 
> I can't figure out from this description what this is for so perhaps it
> should be either removed and made dynamic (if you know you want to debug
> the mm core, why not turn it on then?) or made more obvious what is
> happening?

I'll kill it and update the KMSAN Documentation to note that KMSAN has
interactions with the NVDIMM subsystem that may cause some namespaces to
fail to enable. That Documentation needs to be a part of this patch
regardless, as that would be the default behavior of this module
parameter.

Unfortunately, it cannot be dynamically enabled because the size of
'struct page' is recorded in the metadata of the device. Recall this is
for supporting platform configurations where the capacity of the
persistent memory exceeds System RAM or consumes too much of it.
Consider that 4TB of PMEM consumes 64GB of space just for 'struct
page'. So, the NVDIMM subsystem has a mode to store that page array in
a reservation on the PMEM device itself.
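
(To spell out the arithmetic: 4TB / 4KiB pages = 2^30 pages, at 64
bytes of 'struct page' each = 64GB of page map; at the 128-byte
MAX_STRUCT_PAGE_SIZE that reservation doubles to 128GB.)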

KMSAN mandates either that all namespaces always reserve the extra
capacity, or that those namespaces cannot be mapped while KMSAN is
enabled.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  6:55               ` Dan Williams
@ 2023-01-10  8:48                 ` Alexander Potapenko
  2023-01-10  8:52                   ` Alexander Potapenko
  2023-01-10  8:53                   ` Eric Dumazet
  0 siblings, 2 replies; 147+ messages in thread
From: Alexander Potapenko @ 2023-01-10  8:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: Greg Kroah-Hartman, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Herbert Xu, Ilya Leoshkevich,
	Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

On Tue, Jan 10, 2023 at 7:55 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Greg Kroah-Hartman wrote:
> > On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> > > Alexander Potapenko wrote:
> > > > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >
> > > > > Alexander Potapenko wrote:
> > > > > > (+ Dan Williams)
> > > > > > (resending with patch context included)
> > > > > >
> > > > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > > > >
> > > > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > > > >
> > > > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > > > 64 bytes anymore.
> > > > > > >
> > > > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > > > If not, it would be good to note this in the commit message.
> > > > > > >
> > > > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > > > Looks like this increases the amount of memory used by the nvdimm
> > > > > > driver in all kernel configs that enable it (including those that
> > > > > > don't use KMSAN), but I am not sure how much that is.
> > > > > >
> > > > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > > > >
> > > > > Apologies I missed this several months ago. The answer is that this
> > > > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > > > capacity of their namespace even when not using KMSAN which is too
> > > > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > > > something like:
> > > > >
> > > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > > > index 79d93126453d..5693869b720b 100644
> > > > > --- a/drivers/nvdimm/Kconfig
> > > > > +++ b/drivers/nvdimm/Kconfig
> > > > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > > > >         bool "PFN: Map persistent (device) memory"
> > > > >         default LIBNVDIMM
> > > > >         depends on ZONE_DEVICE
> > > > > +       depends on !KMSAN
> > > > >         select ND_CLAIM
> > > > >         help
> > > > >           Map persistent memory, i.e. advertise it to the memory
> > > > >
> > > > >
> > > > > ...otherwise, what was the rationale for increasing this value? Were you
> > > > > actually trying to use KMSAN for DAX pages?
> > > >
> > > > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > > > Because KMSAN adds extra data to every struct page, it immediately hit
> > > > the following assert:
> > > >
> > > > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > > > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > > > failed
> > > >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > > >
> > > > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > > > page size independent of kernel config", but maybe we can afford
> > > > making it dependent on CONFIG_KMSAN (and possibly other config options
> > > > that increase struct page size)?
> > > >
> > > > I don't mind disabling the driver under KMSAN, but having an extra
> > > > ifdef to keep KMSAN support sounds reasonable, WDYT?
> > >
> > > How about a module parameter to opt-in to the increased permanent
> > > capacity loss?
> >
> > Please no, this isn't the 1990's, we should never force users to keep
> > track of new module parameters that you then have to support forever.
>
> Fair enough, premature enabling. If someone really wants this they can
> find this thread in the archives and ask for another solution like a
> compile-time override.
>
> >
> >
> > >
> > > -- >8 --
> > > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > > From: Dan Williams <dan.j.williams@intel.com>
> > > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > >
> > > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > >
> > > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > > doubles the amount of capacity stolen from user addressable capacity for
> > > everyone, regardless of whether they are using the debug option. Revert
> > > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > > allow for debug scenarios to proceed with creating debug sized page maps
> > > with a new 'libnvdimm.page_struct_override' module parameter.
> > >
> > > Note that this only applies to cases where the page map is permanent,
> > > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > > create-namespace" terms). For the "--map=mem" case, since the allocation
> > > is ephemeral for the lifespan of the namespace, there is no explicit
> > > restriction. However, the implicit restriction, of having enough
> > > available "System RAM" to store the page map for the typically large
> > > pmem, still applies.
> > >
> > > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > Cc: <stable@vger.kernel.org>
> > > Cc: Alexander Potapenko <glider@google.com>
> > > Cc: Marco Elver <elver@google.com>
> > > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > > ---
> > >  drivers/nvdimm/nd.h       |  2 +-
> > >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> > >  2 files changed, 31 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > > index 85ca5b4da3cf..ec5219680092 100644
> > > --- a/drivers/nvdimm/nd.h
> > > +++ b/drivers/nvdimm/nd.h
> > > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> > >             struct nd_namespace_common *ndns);
> > >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> > >  /* max struct page size independent of kernel config */
> > > -#define MAX_STRUCT_PAGE_SIZE 128
> > > +#define MAX_STRUCT_PAGE_SIZE 64
> > >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> > >  #else
> > >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > index 61af072ac98f..978d63559c0e 100644
> > > --- a/drivers/nvdimm/pfn_devs.c
> > > +++ b/drivers/nvdimm/pfn_devs.c
> > > @@ -13,6 +13,11 @@
> > >  #include "pfn.h"
> > >  #include "nd.h"
> > >
> > > +static bool page_struct_override;
> > > +module_param(page_struct_override, bool, 0644);
> > > +MODULE_PARM_DESC(page_struct_override,
> > > +            "Force namespace creation in the presence of mm-debug.");
> >
> > I can't figure out from this description what this is for, so perhaps it
> > should either be removed and made dynamic (if you know you want to debug
> > the mm core, why not turn it on then?) or made more obvious about what is
> > happening?
>
> I'll kill it and update the KMSAN Documentation to note that KMSAN has
> interactions with the NVDIMM subsystem that may cause some namespaces to
> fail to enable. That Documentation needs to be a part of this patch
> regardless, as that would be the default behavior of this module
> parameter.
>
> Unfortunately, it cannot be dynamically enabled because the size of
> 'struct page' is recorded in the metadata of the device.
> Recall this is for supporting platform configurations where the capacity
> of the persistent memory exceeds, or consumes too much of, System RAM.
> Consider that 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
> the NVDIMM subsystem has a mode to store that page array in a reservation on
> the PMEM device itself.

Sorry, I might be missing something, but why cannot we have

#ifdef CONFIG_KMSAN
#define MAX_STRUCT_PAGE_SIZE 128
#else
#define MAX_STRUCT_PAGE_SIZE 64
#endif

?

KMSAN is a debug-only tool: it already consumes more than two thirds
of system memory, so you don't want to enable it in any production
environment anyway.

> KMSAN mandates either that all namespaces always reserve the extra
> capacity, or that those namespaces cannot be mapped while KMSAN is
> enabled.

The size of struct page depends on a couple of config options, and it
has already been approaching the 64-byte boundary.
It is unfortunate that the introduction of KMSAN was the last straw,
but it could've been any other debug config that needs to store data
in struct page.
Keeping the struct within cacheline size sounds reasonable for the
default configuration, but having a build-time assert that prevents us
from building debug configs at all sounds excessive.




-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  8:48                 ` Alexander Potapenko
@ 2023-01-10  8:52                   ` Alexander Potapenko
  2023-01-10  8:53                   ` Eric Dumazet
  1 sibling, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2023-01-10  8:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Greg Kroah-Hartman, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Eric Dumazet, Herbert Xu, Ilya Leoshkevich,
	Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

> [..]
>
> Sorry, I might be missing something, but why cannot we have
>
> #ifdef CONFIG_KMSAN
> #define MAX_STRUCT_PAGE_SIZE 128
By the way, KMSAN only adds 16 bytes to struct page - would it help to
reduce MAX_STRUCT_PAGE_SIZE to 80 bytes?
> #else
> #define MAX_STRUCT_PAGE_SIZE 64
> #endif
>
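
For reference, the 16 bytes in question are the two per-page metadata
pointers the KMSAN series adds to struct page; a minimal sketch of the
resulting layout (field names as in the KMSAN patches, all other fields
elided):

struct page {
	/* ... roughly 64 bytes of existing fields on x86-64 ... */
#ifdef CONFIG_KMSAN
	/*
	 * KMSAN metadata: a shadow page tracking which bits of this
	 * page are initialized, and an origin page recording where
	 * any uninitialized bits came from.
	 */
	struct page *kmsan_shadow;
	struct page *kmsan_origin;
#endif
};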

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  8:48                 ` Alexander Potapenko
  2023-01-10  8:52                   ` Alexander Potapenko
@ 2023-01-10  8:53                   ` Eric Dumazet
  2023-01-10  8:55                     ` Christoph Hellwig
  2023-01-10  9:14                     ` Alexander Potapenko
  1 sibling, 2 replies; 147+ messages in thread
From: Eric Dumazet @ 2023-01-10  8:53 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dan Williams, Greg Kroah-Hartman, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Herbert Xu, Ilya Leoshkevich, Ingo Molnar,
	Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Tue, Jan 10, 2023 at 9:49 AM Alexander Potapenko <glider@google.com> wrote:
>
> [..]
>
> Sorry, I might be missing something, but why cannot we have
>
> #ifdef CONFIG_KMSAN
> #define MAX_STRUCT_PAGE_SIZE 128
> #else
> #define MAX_STRUCT_PAGE_SIZE 64
> #endif
>

Possibly because this needs to be a fixed size on permanent storage
(like an inode on a disk file system)
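
A simplified illustration of that point: the pfn info block written to
the device records the geometry the reservation was sized for, so the
per-page budget is frozen once the namespace is created. The sketch
below abridges the real struct nd_pfn_sb from drivers/nvdimm/pfn.h;
the field names are illustrative, not the exact on-media layout.

#include <linux/types.h>

struct nd_pfn_sb_sketch {
	u8	signature[16];		/* "NVDIMM_PFN_INFO" */
	__le64	dataoff;		/* start of user data, past the page map */
	__le64	npfns;			/* number of pages the map covers */
	__le32	mode;			/* PFN_MODE_RAM or PFN_MODE_PMEM */
	__le16	page_struct_size;	/* bytes reserved per struct page */
	__le64	checksum;
};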



^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  8:53                   ` Eric Dumazet
@ 2023-01-10  8:55                     ` Christoph Hellwig
  2023-01-10 15:35                       ` Steven Rostedt
  2023-01-10  9:14                     ` Alexander Potapenko
  1 sibling, 1 reply; 147+ messages in thread
From: Christoph Hellwig @ 2023-01-10  8:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Potapenko, Dan Williams, Greg Kroah-Hartman,
	Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Herbert Xu, Ilya Leoshkevich,
	Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

Folks, can you please trim your quotes to what is relevant?  I've not
actually been finding any relevant information in the last three mails
before giving up scrolling after multiple pages.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  8:53                   ` Eric Dumazet
  2023-01-10  8:55                     ` Christoph Hellwig
@ 2023-01-10  9:14                     ` Alexander Potapenko
  1 sibling, 0 replies; 147+ messages in thread
From: Alexander Potapenko @ 2023-01-10  9:14 UTC (permalink / raw)
  To: Eric Dumazet, Dan Williams
  Cc: Greg Kroah-Hartman, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Hellwig, Christoph Lameter, David Rientjes,
	Dmitry Vyukov, Herbert Xu, Ilya Leoshkevich, Ingo Molnar,
	Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

> > > [..]
> >
> > Sorry, I might be missing something, but why cannot we have
> >
> > #ifdef CONFIG_KMSAN
> > #define MAX_STRUCT_PAGE_SIZE 128
> > #else
> > #define MAX_STRUCT_PAGE_SIZE 64
> > #endif
> >
>
> Possibly because this needs to be a fixed size on permanent storage
> (like an inode on a disk file system)
>

Ah, thank you, that makes sense.
Then I'm assuming some contents of struct page are also stored in the
persistent memory. What happens if sizeof(struct page) stays below
MAX_STRUCT_PAGE_SIZE, but the layout changes?
E.g. doesn't it cause problems if the user enables/disables CONFIG_MEMCG?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-10  8:55                     ` Christoph Hellwig
@ 2023-01-10 15:35                       ` Steven Rostedt
  0 siblings, 0 replies; 147+ messages in thread
From: Steven Rostedt @ 2023-01-10 15:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Dumazet, Alexander Potapenko, Dan Williams,
	Greg Kroah-Hartman, Marco Elver, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Andrey Konovalov,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
	Christoph Lameter, David Rientjes, Dmitry Vyukov, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin,
	Pekka Enberg, Peter Zijlstra, Petr Mladek, Thomas Gleixner,
	Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML

On Tue, 10 Jan 2023 09:55:49 +0100
Christoph Hellwig <hch@lst.de> wrote:

> Folks, can you please trim your quotes to what is relevant?  I've not
> actually beeng finding any relevant information in the last three mails
> before giving up the scrolling after multiple pages.

If I don't see a response after three scrolls, I ignore the email.

-- Steve

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-05 22:08       ` Dan Williams
  2023-01-09  9:51         ` Alexander Potapenko
@ 2023-01-30  8:34         ` Alexander Potapenko
  2023-01-30 18:57           ` Dan Williams
  1 sibling, 1 reply; 147+ messages in thread
From: Alexander Potapenko @ 2023-01-30  8:34 UTC (permalink / raw)
  To: Dan Williams
  Cc: Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Alexander Potapenko wrote:
> > (+ Dan Williams)
> > (resending with patch context included)
> >
> > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > >
> > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > 64 bytes anymore.
> > >
> > > Does this somehow cause extra space being used in all kernel configs?
> > > If not, it would be good to note this in the commit message.
> > >
> > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > Looks like this increases the amount of memory used by the nvdimm
> > driver in all kernel configs that enable it (including those that
> > don't use KMSAN), but I am not sure how much that is.
> >
> > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
>
> Apologies, I missed this several months ago. The answer is that this
> causes everyone creating PMEM namespaces on v6.1+ to lose double the
> capacity of their namespace even when not using KMSAN, which is too
> wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> something like:
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 79d93126453d..5693869b720b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -63,6 +63,7 @@ config NVDIMM_PFN
>         bool "PFN: Map persistent (device) memory"
>         default LIBNVDIMM
>         depends on ZONE_DEVICE
> +       depends on !KMSAN
>         select ND_CLAIM
>         help
>           Map persistent memory, i.e. advertise it to the memory
>

Looks like we still don't have a resolution for this problem.
I have the following options in mind:

1. Set MAX_STRUCT_PAGE_SIZE to 80 (i.e. increase it by 2*sizeof(struct
page *) added by KMSAN) instead of 128.
2. Disable storing of struct pages on device for KMSAN builds.

But if those are infeasible, we can always go for:

3. Disable KMSAN for NVDIMM and reflect it in Documentation. I am
happy to send the patch if we decide this is the best option.
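
Option 1 as a minimal sketch (hypothetical, not a submitted patch):
grow the on-media budget only when KMSAN's two extra page pointers are
actually configured in.

#if IS_ENABLED(CONFIG_KMSAN)
/* struct page grows by two metadata pointers (16 bytes) under KMSAN */
#define MAX_STRUCT_PAGE_SIZE	80
#else
#define MAX_STRUCT_PAGE_SIZE	64
#endif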

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
  2023-01-30  8:34         ` Alexander Potapenko
@ 2023-01-30 18:57           ` Dan Williams
  0 siblings, 0 replies; 147+ messages in thread
From: Dan Williams @ 2023-01-30 18:57 UTC (permalink / raw)
  To: Alexander Potapenko, Dan Williams
  Cc: Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux Memory Management List,
	Linux-Arch, LKML

Alexander Potapenko wrote:
[..]
> >
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index 79d93126453d..5693869b720b 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> >         bool "PFN: Map persistent (device) memory"
> >         default LIBNVDIMM
> >         depends on ZONE_DEVICE
> > +       depends on !KMSAN
> >         select ND_CLAIM
> >         help
> >           Map persistent memory, i.e. advertise it to the memory
> >
> 
> Looks like we still don't have a resolution for this problem.
> I have the following options in mind:
> 
> 1. Set MAX_STRUCT_PAGE_SIZE to 80 (i.e. increase it by 2*sizeof(struct
> page *) added by KMSAN) instead of 128.
> 2. Disable storing of struct pages on device for KMSAN builds.
> 
> , but if those are infeasible, we can always go for:
> 
> 3. Disable KMSAN for NVDIMM and reflect it in Documentation. I am
> happy to send the patch if we decide this is the best option.

I copied you on the new proposal here:

https://lore.kernel.org/nvdimm/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com/

It disables PMEM namespace creation with page-array reservations when
sizeof(struct page) > 64.

Note that it was pre-existing behavior for PMEM namespaces with too
small a reservation to fail to enable. That gives me confidence that
losing access to some PMEM namespaces when these memory debug
facilities are enabled is an acceptable restriction.
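
Roughly, that proposal trades the build-time assert for a runtime check
along these lines (a kernel-code sketch under the assumption that the
check runs at namespace-enable time; not the literal patch):

static int nd_pfn_validate_struct_page(struct nd_pfn *nd_pfn)
{
	/* PMEM-backed page maps carve out a fixed 64 bytes per page */
	if (nd_pfn->mode == PFN_MODE_PMEM &&
	    sizeof(struct page) > MAX_STRUCT_PAGE_SIZE) {
		dev_err(&nd_pfn->dev,
			"struct page too large for a pmem page map; use --map=mem\n");
		return -ENXIO;
	}
	return 0;
}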

^ permalink raw reply	[flat|nested] 147+ messages in thread

end of thread, other threads:[~2023-01-30 18:57 UTC | newest]

Thread overview: 147+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-01 14:22 [PATCH v4 00/45] Add KernelMemorySanitizer infrastructure Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 01/45] x86: add missing include to sparsemem.h Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 02/45] stackdepot: reserve 5 extra bits in depot_stack_handle_t Alexander Potapenko
2022-07-12 14:17   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 03/45] instrumented.h: allow instrumenting both sides of copy_from_user() Alexander Potapenko
2022-07-12 14:17   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 04/45] x86: asm: instrument usercopy in get_user() and __put_user_size() Alexander Potapenko
2022-07-02  3:47   ` kernel test robot
2022-07-15 14:03     ` Alexander Potapenko
2022-07-15 14:03       ` Alexander Potapenko
2022-07-02 10:45   ` kernel test robot
2022-07-15 16:44     ` Alexander Potapenko
2022-07-15 16:44       ` Alexander Potapenko
2022-07-02 13:09   ` kernel test robot
2022-07-07 10:13   ` Marco Elver
2022-08-07 17:33     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 05/45] asm-generic: instrument usercopy in cacheflush.h Alexander Potapenko
2022-07-12 14:17   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 06/45] kmsan: add ReST documentation Alexander Potapenko
2022-07-07 12:34   ` Marco Elver
2022-07-15  7:42     ` Alexander Potapenko
2022-07-15  8:52       ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 07/45] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks Alexander Potapenko
2022-07-12 14:17   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 08/45] kmsan: mark noinstr as __no_sanitize_memory Alexander Potapenko
2022-07-12 14:17   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 09/45] x86: kmsan: pgtable: reduce vmalloc space Alexander Potapenko
2022-07-11 16:12   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE Alexander Potapenko
2022-07-11 16:26   ` Marco Elver
2022-08-03  9:41     ` Alexander Potapenko
2022-08-03  9:44     ` Alexander Potapenko
2023-01-05 22:08       ` Dan Williams
2023-01-09  9:51         ` Alexander Potapenko
2023-01-09 22:06           ` Dan Williams
2023-01-10  5:56             ` Greg Kroah-Hartman
2023-01-10  6:55               ` Dan Williams
2023-01-10  8:48                 ` Alexander Potapenko
2023-01-10  8:52                   ` Alexander Potapenko
2023-01-10  8:53                   ` Eric Dumazet
2023-01-10  8:55                     ` Christoph Hellwig
2023-01-10 15:35                       ` Steven Rostedt
2023-01-10  9:14                     ` Alexander Potapenko
2023-01-30  8:34         ` Alexander Potapenko
2023-01-30 18:57           ` Dan Williams
2022-07-01 14:22 ` [PATCH v4 11/45] kmsan: add KMSAN runtime core Alexander Potapenko
2022-07-02  0:18   ` Hillf Danton
2022-08-03 17:25     ` Alexander Potapenko
2022-07-11 16:49   ` Marco Elver
2022-08-03 18:14     ` Alexander Potapenko
2022-07-13 10:04   ` Marco Elver
2022-08-03 17:45     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 12/45] kmsan: disable instrumentation of unsupported common kernel code Alexander Potapenko
2022-07-12 11:54   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 13/45] MAINTAINERS: add entry for KMSAN Alexander Potapenko
2022-07-12 12:06   ` Marco Elver
2022-08-02 16:39     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 14/45] mm: kmsan: maintain KMSAN metadata for page operations Alexander Potapenko
2022-07-12 12:20   ` Marco Elver
2022-08-03 10:30     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 15/45] mm: kmsan: call KMSAN hooks from SLUB code Alexander Potapenko
2022-07-12 13:13   ` Marco Elver
2022-08-02 16:31     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 16/45] kmsan: handle task creation and exiting Alexander Potapenko
2022-07-12 13:17   ` Marco Elver
2022-08-02 15:47     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 17/45] init: kmsan: call KMSAN initialization routines Alexander Potapenko
2022-07-12 14:05   ` Marco Elver
2022-08-02 20:07     ` Alexander Potapenko
2022-08-03  9:08       ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 18/45] instrumented.h: add KMSAN support Alexander Potapenko
2022-07-12 13:51   ` Marco Elver
2022-08-03 11:17     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 19/45] kmsan: unpoison @tlb in arch_tlb_gather_mmu() Alexander Potapenko
2022-07-13  9:28   ` Marco Elver
2022-07-01 14:22 ` [PATCH v4 20/45] kmsan: add iomap support Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 21/45] Input: libps2: mark data received in __ps2_command() as initialized Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 22/45] dma: kmsan: unpoison DMA mappings Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 23/45] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg() Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 24/45] kmsan: handle memory sent to/from USB Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 25/45] kmsan: add tests for KMSAN Alexander Potapenko
2022-07-12 14:16   ` Marco Elver
2022-08-02 17:29     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 26/45] kmsan: disable strscpy() optimization under KMSAN Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 27/45] crypto: kmsan: disable accelerated configs " Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 28/45] kmsan: disable physical page merging in biovec Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 29/45] block: kmsan: skip bio block merging logic for KMSAN Alexander Potapenko
2022-07-13 10:22   ` Marco Elver
2022-08-02 17:47     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 30/45] kcov: kmsan: unpoison area->list in kcov_remote_area_put() Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 31/45] security: kmsan: fix interoperability with auto-initialization Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 32/45] objtool: kmsan: list KMSAN API functions as uaccess-safe Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 33/45] x86: kmsan: disable instrumentation of unsupported code Alexander Potapenko
2022-07-12 13:43   ` Marco Elver
2022-08-03 10:52     ` Alexander Potapenko
2022-07-01 14:22 ` [PATCH v4 34/45] x86: kmsan: skip shadow checks in __switch_to() Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 35/45] x86: kmsan: handle open-coded assembly in lib/iomem.c Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 36/45] x86: kmsan: use __msan_ string functions where possible Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 37/45] x86: kmsan: sync metadata pages on page fault Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 38/45] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 39/45] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 40/45] x86: kmsan: don't instrument stack walking functions Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 41/45] entry: kmsan: introduce kmsan_unpoison_entry_regs() Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 42/45] bpf: kmsan: initialize BPF registers with zeroes Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
2022-07-02 17:23   ` Linus Torvalds
2022-07-03  3:59     ` Al Viro
2022-07-04  2:52     ` Al Viro
2022-07-04  8:20       ` Alexander Potapenko
2022-07-04 13:44         ` Al Viro
2022-07-04 13:55           ` Al Viro
2022-07-04 15:49           ` Alexander Potapenko
2022-07-04 16:03             ` Greg Kroah-Hartman
2022-07-04 16:33               ` Alexander Potapenko
2022-07-04 18:23             ` Segher Boessenkool
2022-07-04 16:00           ` Al Viro
2022-07-04 16:47             ` Alexander Potapenko
2022-07-04 17:36       ` Linus Torvalds
2022-07-04 19:02         ` Al Viro
2022-07-04 19:16           ` Linus Torvalds
2022-07-04 19:55             ` Al Viro
2022-07-04 20:24               ` Linus Torvalds
2022-07-04 20:46                 ` Al Viro
2022-07-04 20:51                   ` Linus Torvalds
2022-07-04 21:04                     ` Al Viro
2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
2022-07-05  0:06                           ` Linus Torvalds
2022-07-05  3:48                             ` Al Viro
2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
2022-08-08 16:37   ` Alexander Potapenko
2022-07-01 14:23 ` [PATCH v4 44/45] mm: fs: initialize fsdata passed to write_begin/write_end interface Alexander Potapenko
2022-07-04 20:07   ` Matthew Wilcox
2022-07-04 20:30     ` Al Viro
2022-08-25 15:39     ` Alexander Potapenko
2022-08-25 16:33       ` Linus Torvalds
2022-08-25 21:57         ` Segher Boessenkool
2022-08-26 19:41           ` Linus Torvalds
2022-08-31 13:32             ` Alexander Potapenko
2022-08-25 22:13         ` Segher Boessenkool
2022-07-01 14:23 ` [PATCH v4 45/45] x86: kmsan: enable KMSAN builds for x86 Alexander Potapenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.